Adaptive particle swarm optimization algorithm based long short-term memory networks for sentiment analysis

Abstract

Text sentiment analysis is the process of predicting whether a segment of text carries opinionated or objective content and of analyzing the polarity of its sentiment. Understanding the needs and behavior of target customers plays a vital role in the success of a business, so sentiment analysis helps marketers improve product quality and helps shoppers choose the right product. Owing to its automatic learning capability, deep learning is a current research focus in natural language processing. The proposed model uses the skip-gram architecture to better extract the semantic relationships and contextual information of words. The main contribution of this work, however, is an Adaptive Particle Swarm Optimization (APSO) based LSTM for sentiment analysis. The LSTM is used to capture complex patterns in textual data, and its weight parameters are tuned by the APSO algorithm to improve performance. Combining the opposition based learning (OBL) method with the PSO algorithm yields the APSO optimizer, which helps the LSTM select optimal weights in fewer iterations. The ability of APSO-LSTM to adjust attributes such as weights and learning rates, combined with good hyperparameter choices, leads to improved accuracy and reduced loss. Extensive experiments on four datasets show that the proposed APSO-LSTM model achieves higher accuracy than classical methods such as the traditional LSTM, ANN, and SVM. According to the simulation results, the proposed model outperforms the other existing models.

Keywords

Sentiment analysis, adaptive particle swarm optimization, LSTM, skip-gram, feature extraction

1 Introduction

The number of Web 2.0 applications, such as online social networks and e-commerce sites where customers freely express their opinions, has grown significantly. This growth generates a large amount of data. Sentiment analysis was therefore introduced as a tool for automatically extracting insightful and valuable information from user-generated data [1]. Sentiment analysis is one of the major natural language processing (NLP) tasks. Owing to its usefulness and its challenges, the field has attracted numerous researchers and professional communities [2, 3]. Social media lets people share information, messages, opinions, and ideas. A huge number of people use sites such as Facebook, Twitter, WhatsApp, Instagram, Google Plus, and LinkedIn to express their opinions. About 2.46 billion people currently use social media, and the number is expected to reach 3.02 billion by 2021 [4].

People also share their daily life events, which produces large volumes of varied data. They share their experience of specific products through posts, likes, and reviews, which gives companies a chance to gather this information and analyze the popularity of their products and services [5]. The way media is created and distributed through the uncontrolled sharing of messages is a revolution. Moreover, social media has an important impact on the business, advertising, and e-commerce industries, as it reflects consumers’ behavior and perceptions of specific business plans, services, and products. Monitoring social media activity is an effective way to measure customer loyalty, track impressions of products or brands, gauge the impact of campaigns and the success of marketing messages, and identify the influencers most relevant to a campaign, product, or brand. Social media is thus recognized as the next logical marketing platform [6, 7].

Specifically, product comments in tweets merit mining. Vendors can obtain buyers’ feedback in real time and then update their products to be more competitive in the marketplace; buyers can learn from the experience of others through these comments when deciding whether to purchase a product [8]. Real-time tweets have a large influence on network transmission [9]. Opinions also matter to organizations, which need to know whether what customers say is positive or negative. If it is negative, they can improve product quality to satisfy their customers. Because identifying customer opinion is a difficult task, many machine learning algorithms and feature extraction methods have been developed for sentiment analysis. The main contributions of the proposed work are summarized as follows:

We propose an Adaptive Particle Swarm Optimization based LSTM model (APSO-LSTM) for effective sentiment analysis. Because the APSO algorithm selects the weight parameters of the LSTM, computational complexity is reduced and accuracy is improved.

To obtain higher overall accuracy, the skip-gram word embedding method is used. The skip-gram word2vec representation achieves superior results over other word representations.

The evaluation carried out on four datasets shows that our APSO-LSTM model is effective. Furthermore, we analyze several metrics, namely accuracy, recall, precision, and F-measure. The results are supported by the experiments conducted.

2 Literature review

Numerous machine learning algorithms are used for sentiment analysis. In [10], sentiment analysis was developed using the Naive Bayes classifier to identify whether a particular sentence is positive, negative, or neutral. In [2], sentiment analysis of online movie reviews using an enhanced k-Nearest Neighbor classifier was explained. In [11], sentiment analysis of customer product reviews using a support vector machine was analyzed. Moreover, deep neural networks (DNNs) have recently achieved large gains in various NLP tasks, for example language modeling [12], sentiment analysis [13], syntactic parsing [14], and machine translation [15]. A recurrent neural network (RNN) is a special kind of neural network in which the connections between units form a directed cycle, which allows the model to exhibit dynamic temporal behavior. One special variant of the RNN is the LSTM. Many researchers have recently applied LSTM to sentiment analysis, but the accuracy achieved so far is not sufficient. An efficient sentiment analysis methodology is therefore presented in this work.

The purpose of the proposed methodology is the automatic prediction of customer opinion about different products. The proposed model has three modules, namely text pre-processing, implicit and explicit feature engineering, and sentiment-based polarity prediction. Reviews are first gathered from social media and then pre-processed to enhance data quality. Feature engineering is done using skip-gram based word embedding. The extracted features are then given to the LSTM network. The remainder of the paper is organized as follows: Section 2 discusses some of the existing literature.

Many researchers have developed sentiment analysis methods based on customer reviews. Some of them are discussed in this section. Sheng et al. [16] described rumor detection based on consumer opinion. For prediction, they used a convolutional neural network with LSTM (CNN-LSTM), in which the LSTM was incorporated into the pooling layer of the CNN. Sentiment was also added as an important element of the rumor detection model. The effectiveness of this method was verified experimentally. Moreover, Zhigang et al. [17] described stock closing price forecasting based on LSTM and sentiment analysis. First, they included investor sentiment in stock forecasting, which improved the forecasting accuracy of the model. Second, because the stock price sequence is a complex time series with fluctuations at varying time scales, which makes accurate forecasting more challenging, they decomposed the sequence step by step using empirical mode decomposition (EMD) and attained better predictive accuracy. Third, they adopted LSTM because its memory function is well suited to evaluating relationships in time-series data. The experimental results showed that their model not only improves accuracy but also reduces the time delay.

Xiangua Fu et al. [18] described a lexicon-enhanced LSTM with an attention mechanism. The research focused on improving word embedding quality with a sentiment lexicon and thereby improving the accuracy of the sentiment classifier. They carried out experiments on three English datasets, namely IMDB, Yelp2013, and MR, and two Chinese datasets, namely NB4000 and Book4000. Compared with ALE-LSTM and WALE-LSTM, the proposed method obtained higher accuracies of 89%, 60.6%, 79.9%, 93%, and 96%. Guixian Xu et al. [19] proposed a BiLSTM model for sentiment analysis of comment text. Sentiment information concatenated with the traditional TF-IDF word representation is used in this work. The ReLU activation function is used to overcome the overfitting and vanishing gradient problems with the help of a gating mechanism.

Similarly, Latif et al. [20] described an ensemble approach to sentiment analysis. They combined three kinds of features, namely unigrams, lexicon features, and phrases. Two-level ensembles were then introduced for feature selection using Logistic Regression (LR), Gini Index (GI), SVM, and Information Gain (IG). Finally, classification was done by SVM. They considered reviews of different products, namely books, DVDs, electronics, kitchen items, and movies. The method attains maximum accuracies of 81.85%, 91.45%, 89.70%, 93.05%, and 97.60% for electronics, DVDs, books, kitchen items, and movie reviews, respectively.

In [21], Graber et al. described aspect-based opinion mining of drug reviews. Sentiment analysis was first performed to predict the sentiment regarding overall satisfaction, effectiveness, and side effects in user reviews of particular drugs. The features were then given to a logistic regression model to predict recommended drugs based on positive reviews. This method attained a maximum accuracy of 75.8%. Da’U and Salim [22] presented an efficient neural attention based recommender system. The model is composed of an LSTM encoder, a semi-supervised topic model, a co-attention mechanism, and an analysis layer for predicting user ratings. Its strength is a better capacity for learning product aspects and sentiment lexicons, which improves the efficiency of the recommender system. Kashfia Sailunaz and Reda Alhajj [23] addressed sentiment analysis of Twitter comments. The main contribution of their work is the detection of emotions and sentiments in the posts and tweets of people in social networks using the Naïve Bayes classifier. The model provides a topic-based general recommendation and a user-based customized recommendation.

In [24], Shoieb and Ajit described emoticon-based sentiment analysis using web data. Reviews were first collected from the web and then pre-processed with POS tagging, stop-word removal, lemmatization, and emoticon processing. Emotion-based features were then derived using SentiWordNet. After feature extraction, classification was done with different classifiers, namely SVM, IBK, MLP, and Naive Bayes. Among these, the Naive Bayes classifier attained the maximum precision of 84.7% on the college dataset and 83.3% on the hospital dataset. In that paper, reviews are labeled as neutral, positive, strong-positive, weak-positive, negative, strong-negative, and weak-negative. Additionally, Zeeshan et al. [25] presented lexicon and ANN based sentiment analysis. They used a movie review dataset with two labels, namely positive and negative. The trained network achieved a final accuracy of 91%. Ankit et al. [26] suggested an ensemble classifier for Twitter sentiment analysis. The aim of that work is to predict the category of a tweet as positive, neutral, or negative. Four datasets collected from Twitter, namely the Stanford Sentiment140 corpus, Health Care Reform (HCR), First GOP Debate, and a Sentiment Analysis dataset, were used for testing. The proposed ensemble technique achieves 75.79%, 70.28%, 76.85%, and 73.33% on these datasets.

Weijiang et al. [27] described a bidirectional LSTM with multi-channel features and a self-attention mechanism (SAMF-BiLSTM) for sentiment classification. The SAMF-BiLSTM model fully exploits the relationship between sentiment polarity words and target words in a sentence and does not depend on a manually compiled sentiment lexicon. They also presented the SAMF-BiLSTM-D model, based on SAMF-BiLSTM, for document-level text classification tasks. This method obtains the representation of every sentence in the document through SAMF-BiLSTM training, then integrates a BiLSTM to learn the representation of all sentences, and further obtains the sentiment feature information of the entire document. Finally, they evaluated the experimental results on various datasets. The results show that SAMF-BiLSTM and SAMF-BiLSTM-D outperform other advanced methods in classification accuracy in most cases.

Moreover, Alqaryouti et al. [28] presented aspect-based sentiment analysis using government review data. The approach was adopted to address the challenges posed by language analysis techniques, rules, and lexicons in sentiment analysis and to provide concise results. It identifies implicit features, which improves the accuracy of the feature extraction process. Furthermore, the combined classification model surpasses the lexicon-based baselines and other rule combinations by an average accuracy of 5%. On the same dataset, the method also surpasses machine learning schemes that use a support vector machine (SVM). Paramita Ray and Amlan Chakrabati [29] proposed a combined rule-based and deep learning approach to aspect-level sentiment analysis. That research used dependency parsing, machine learning techniques, and a seven-layer deep convolutional neural network (CNN) for tagging each aspect in the comments.

Akyol et al. [30] described opinion mining based on a Social Impact Theory based Optimization algorithm and a whale optimization model. A trading strategy based on the strength of sentiment feedback between tweets and news, obtained with a conventional programming optimization strategy, was discussed by Yang et al. [31]. Keshavarz et al. [32] proposed a genetic algorithm based sentiment analysis model. Six different datasets were used in the experiments, and the results showed higher accuracy. In [33], the authors proposed a cross-domain aspect-based sentiment analysis method. They introduced a heterogeneous network based representation that merges different features into a single network.

Donatas Meskele and Flavius Frasincar [34] presented a neural network based ontology model for sentence-level opinion mining. A bidirectional mechanism is used to estimate the importance of the words in a given sentence based on the aspect’s sentiment value. Wenxin Liang et al. [35] described a topic embedding model for short texts. A Gibbs sampling process is used in that research to enhance topic coherence. Global and local word embeddings contribute most to the performance of the model.

Yao Hu et al. [36] proposed a safety forecast model in which an LSTM network is combined with the PSO algorithm. The enhanced PSO-GD LSTM is well suited to the analysis of time series data. Gradient descent (GD) is applied iteratively to the LSTM parameters to reduce the cost and improve the accuracy. Peng Wang et al. [37] presented a solution for scheduling toll station lanes using toll data. The LSTM and PSO algorithms predict the average queue length of a lane from three input parameters: traffic volume, average queue length, and service time. The proposed PSO-LSTM method improved accuracy by 2% and 3% compared with the SVR models and the traditional LSTM, respectively. Finally, the toll station operating cost is used to estimate the work schedule of the toll lanes.

3 Dataset description

For conducting experiments, four datasets are used namely, Amazon reviews [38], trip advisor [39], demonetization reviews [40], and book reviews [41]. Table 1 describes the datasets.

Table 1
Description of datasets

Datasets	Total Reviews	Total positive reviews	Total negative reviews	Total Neutral reviews
Amazon reviews	60000	25627	19567	14806
Trip Advisor	20000	10000	7000	3000
Demonetization Twitter data set	12974	2974	4936	5064
Books reviews	213269	147268	38434	27567

The Amazon dataset includes 60000 Amazon customer reviews and star ratings. The 60000 reviews are classified as 25627 positive, 19567 negative, and 14806 neutral reviews. The Trip Advisor dataset contains customer reviews of 1000 hotels provided by Datafiniti’s Business Database. It includes the hotel name, location, review date, title, username, rating, and more. It has 20000 reviews, of which 10000 are positive, 7000 are negative, and 3000 are neutral. The demonetization Twitter dataset of 12974 tweets is classified as 2974 positive, 4936 negative, and 5064 neutral tweets. Finally, the books review dataset contains 213269 reviews, of which 147268, 38434, and 27567 are positive, negative, and neutral, respectively.

4 Proposed sentiment analysis using APSO based LSTM

The workflow of the proposed model is depicted in Fig. 1. As shown in the figure, reviews or tweets from the Amazon, Trip Advisor, Demonetization, and Book review datasets are pre-processed by tokenization, stop-word removal, stemming, and segregation, in that order. The pre-processed words are then represented as vectors using the one-hot encoding method. The skip-gram based word2vec architecture is used to represent the words in a lower-dimensional space and to make the representation more accurate. Finally, the input tweet or review is classified as having positive or negative polarity using the LSTM network. To improve the performance of the LSTM, optimal weight parameters are chosen using the Adaptive Particle Swarm Optimization (APSO) algorithm.

Fig. 1

The Flow diagram of proposed model.

4.1 Pre-processing

Before the sentiment of tweets or reviews is classified, the following steps are applied to the datasets.

Tokenization: In this process, tweets are split into tokens such as words, phrases, and symbols.

Stop words removal: Stop words such as I, and, the, for, should, etc. are removed from the tweets using the stop word list of NLTK (Natural Language Toolkit).

Stemming: The stemming process reduces words to their base forms with the help of morphological stemming. For instance, the words ‘advising’ and ‘advised’ are reduced to their root word ‘advise’.

Segregations: In this process, the special characters such as “ ’ ? !;: # $ % & () *+–/< > =[] n ∧ _ { } | ∼ are removed from the tweets.

Padding the excess data: Padding is one of the significant tasks in opinion analysis. It handles reviews of different lengths by padding or truncating them to a fixed length, which is given by the sequence length. Padding helps reduce wasted memory. The sequence length is defined in the same way as the number of time steps in the LSTM layer. If a review is shorter than seq_length, it is padded with 0s, and if it is longer than seq_length, it is truncated to its first seq_length words.
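The pre-processing steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the stop-word set is a hypothetical miniature stand-in for the NLTK list, `naive_stem` is a crude suffix stripper rather than a real morphological stemmer, and right-padding with 0 is an assumption, since the text does not say on which side the zeros are added.

```python
import re

# Hypothetical miniature stop-word list; the paper uses NLTK's list instead.
STOP_WORDS = {"i", "and", "the", "for", "should", "a", "is", "to", "it"}

def tokenize(text):
    """Lowercase, replace special characters by spaces, split on whitespace."""
    cleaned = re.sub(r"[^a-z0-9\s]", " ", text.lower())
    return cleaned.split()

def remove_stop_words(tokens):
    return [t for t in tokens if t not in STOP_WORDS]

def naive_stem(token):
    """Crude suffix stripping (assumption); a real pipeline would use a stemmer."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def pad_or_truncate(ids, seq_length, pad_id=0):
    """Keep the first seq_length ids, or right-pad with pad_id (assumed side)."""
    ids = ids[:seq_length]
    return ids + [pad_id] * (seq_length - len(ids))

def preprocess(text, vocab, seq_length):
    tokens = [naive_stem(t) for t in remove_stop_words(tokenize(text))]
    # Word ids start at 1 because id 0 is reserved for padding.
    ids = [vocab.setdefault(t, len(vocab) + 1) for t in tokens]
    return tokens, pad_or_truncate(ids, seq_length)

vocab = {}
tokens, ids = preprocess("I advised the hotel, and it is AMAZING!!!", vocab, seq_length=6)
print(tokens)
print(ids)
```

Note that the crude stemmer yields stems such as `advis` rather than dictionary words, which is typical of suffix-stripping stemmers.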

4.2 Words to a vector representation

To convert words into vectors, the one-hot encoding technique is used. Figure 2 illustrates the one-hot encoding strategy. It represents each word as a vector whose size equals that of the vocabulary dictionary. For illustration, as in Fig. 2, if there are 100 words in the dictionary, each word in a sentence is represented by a vector of dimension 100. The skip-gram algorithm is then used to map words to a lower-dimensional space and to capture their meaning in a vector.

Fig. 2

An example of a one-hot encoding technique.
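A minimal sketch of one-hot encoding follows. The three example sentences and the helper names `build_vocab` and `one_hot` are illustrative assumptions, not part of the paper.

```python
def build_vocab(sentences):
    """Map each distinct word to an index, in order of first appearance."""
    vocab = {}
    for sentence in sentences:
        for word in sentence.split():
            vocab.setdefault(word, len(vocab))
    return vocab

def one_hot(word, vocab):
    """Vector of vocabulary size with a single 1 at the word's index."""
    vec = [0] * len(vocab)
    vec[vocab[word]] = 1
    return vec

vocab = build_vocab(["good room", "bad service", "good service"])
vec = one_hot("service", vocab)
print(len(vocab), vec)
```

The vector length grows with the vocabulary, which is why a lower-dimensional embedding is used afterwards.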

4.2.1 Skip-gram based word embedding model

Word embedding maps the words extracted from text into vectors of lower dimension, usually from 10 to 1000 dimensions. Text analysis based on word frequency largely neglects the order of words, sentences, and paragraphs. This kind of analysis can limit the understanding of a word’s importance in a sentence, because the contextual meaning of the words and their co-occurrence are excluded from the analysis. We therefore use a word embedding method to understand users’ reviews more precisely and to capture the basic properties and meaning of words.

The skip-gram algorithm [42] is used in this work because it shows better accuracy in extracting semantic relationships between words. The structure of the skip-gram word2vec representation is shown in Fig. 3. The skip-gram word2vec model predicts the words that can appear in the context of the current word. The input layer takes words represented as one-hot encoded vectors. In such a vector, just one unit out of the U units, $\{x_{1}, x_{2}, \ldots, x_{U}\}$ , is 1, and all other units are 0, as shown in Fig. 1. The size of the one-hot vector equals the size of the vocabulary dictionary used in the training phase. The words at the input layer are projected to an M-dimensional vector by the weight matrix W of size U×M and passed to the projection layer. The output of the projection layer is multiplied by the weight matrix W’ of size M×U and given to the output layer. Finally, with the help of the softmax function, the output layer predicts the neighboring words. However, the softmax function is computationally costly and makes training less efficient. Hierarchical softmax is therefore used in the output layer instead of the softmax function. To estimate the probability, hierarchical softmax predicts words using a binary tree structure. Unlike the softmax function, hierarchical softmax does not need to evaluate the entire vocabulary, so the computation time in the output layer decreases. With $[\![ x ]\!]$ equal to 1 if x is true and to -1 otherwise, the output probability of a word is calculated as follows: $pr (w \mid w_{input}) = \prod_{j = 1}^{L - 1} \sigma \left( [\![ m (w, j + 1) = child (m (w, j)) ]\!] \cdot {v'_{m (w, j)}}^{\top} v_{w_{input}} \right)$ (1)

Where the input word is represented as w_input, v denotes the input vector representation and v' the output vector representation, and σ () is the sigmoid function. The length of the path from the root to the word w is denoted as L. The j-th node on that path in the binary tree is represented as m(w,j), and child(m) denotes an arbitrarily chosen, fixed child node of m. The predicted words should minimize the error function, which is the negative average log-probability of the context words: $Error = - \frac{1}{N} \sum_{n = 1}^{N} \sum_{- c \leq j \leq c, j \neq 0} \log pr (w_{n + j} \mid w_{n})$ (2)

Where the size of the context is represented as c and the number of words in the sequence is represented as N. The goal of this function is to update the weight matrix W’ so that the error is minimized. Words that are used in the same context are assigned similar vector values. Various inferences and analyses can then be made on the basis of these vectors.
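To make the path product of Eq. (1) concrete, the sketch below evaluates it on a hypothetical four-word vocabulary arranged in a complete binary tree. The tree layout, the inner-node vectors, and the input vector are all made-up illustrative values; the only property being demonstrated is that the word probabilities sum to 1 without computing a full softmax.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Hypothetical tree: each path lists (inner node id, sign), with sign +1 when
# the next node on the path is the designated child and -1 otherwise.
PATHS = {
    "good":  [(0, +1), (1, +1)],
    "bad":   [(0, +1), (1, -1)],
    "room":  [(0, -1), (2, +1)],
    "hotel": [(0, -1), (2, -1)],
}

# Made-up output vectors v' of the three inner nodes (M = 2).
INNER_VECTORS = {0: [0.5, -0.2], 1: [0.1, 0.9], 2: [-0.7, 0.3]}

def hs_probability(word, v_input):
    """Product of sigmoids along the root-to-leaf path of `word`, as in Eq. (1)."""
    prob = 1.0
    for node, sign in PATHS[word]:
        prob *= sigmoid(sign * dot(INNER_VECTORS[node], v_input))
    return prob

v_input = [0.4, 1.2]  # made-up input vector of the current word
probs = {w: hs_probability(w, v_input) for w in PATHS}
total = sum(probs.values())
print(probs, total)
```

Each word costs only one sigmoid per inner node on its path (two here), instead of one term per vocabulary word as in a full softmax.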

Fig. 3

Structure of the skip-gram model.
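The double sum in Eq. (2) runs over every word in the sequence and every context offset within the window of size c. A small sketch of how these (current word, context word) training pairs could be enumerated is given below; the function name and the example sentence are illustrative assumptions.

```python
def skipgram_pairs(tokens, c):
    """Return (current, context) pairs for offsets j with -c <= j <= c, j != 0."""
    pairs = []
    for n, current in enumerate(tokens):
        for j in range(-c, c + 1):
            # Skip the word itself and offsets that fall outside the sequence.
            if j != 0 and 0 <= n + j < len(tokens):
                pairs.append((current, tokens[n + j]))
    return pairs

pairs = skipgram_pairs(["room", "was", "very", "clean"], c=1)
print(pairs)
```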

4.3 Sentimental analysis using APSO based LSTM

The output vectors, or features, are given as input to the classifier, which classifies the tweet or review as positive or negative.

The set of features is denoted as follows, $Y = \{Y_{1}, Y_{2}, \ldots, Y_{S}\}$ (3) Where $Y_{S}$ denotes the S-th feature.

A special type of recurrent neural network (RNN) [13], called the LSTM neural network [44], is used in the proposed model for classification. Conventional neural structures do not consider sequential factors and cannot remember earlier content. The RNN was designed to address this issue. Figure 4 displays the structure of an RNN. The hidden state H_t at time t is obtained from the input Y_t and the previous output H_{t-1}. It is used to compute the loss of the current layer and to determine H_{t+1} of the following layer. However, because the gradient of an RNN tends to vanish, the hidden structure at each sequence index t was improved to avoid the vanishing gradient problem. The resulting special RNN model, called LSTM, can retain long-distance dependency information. The LSTM differs somewhat from the general neural network module of RNNs. In an RNN [44], the repeating neural network module A has a simple structure, for example a tanh layer.

Fig. 4

Structure of RNN.

$H_{t} = \tanh [w_{H} (Y_{t}, H_{t - 1}) + c_{H}]$ (4)

Where c_H denotes the control (bias) parameter of the hidden state and w_H is the weight parameter of the hidden state H_t. The tanh layer allows the neural network to add information to or remove information from the previous input.
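A sketch of the recurrence in Eq. (4) is shown below, reading (Y_t, H_{t-1}) as the concatenation of the current input and the previous hidden state. The layer sizes and weight values are made up for illustration.

```python
import math

def rnn_step(y_t, h_prev, w_h, c_h):
    """One step of Eq. (4): H_t = tanh(w_H . [Y_t, H_{t-1}] + c_H)."""
    z = y_t + h_prev  # list concatenation of input and previous hidden state
    return [math.tanh(sum(w * x for w, x in zip(row, z)) + b)
            for row, b in zip(w_h, c_h)]

# Made-up sizes: 2 input features and 2 hidden units, so w_H is 2 x 4.
w_h = [[0.5, -0.3, 0.1, 0.2],
       [0.0, 0.4, -0.6, 0.3]]
c_h = [0.0, 0.1]

h = [0.0, 0.0]
for y_t in ([1.0, 0.0], [0.0, 1.0], [1.0, 1.0]):
    h = rnn_step(y_t, h, w_h, c_h)  # the same weights are reused at every step
print(h)
```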

On the other hand, Fig. 5 shows that an LSTM consists of four neural network layers that interact in a special way. Using a specially designed structure called a “gate”, the LSTM can add information to or delete information from the memory cell state. A gate selectively lets information, namely features of the input, pass through. It consists of a sigmoid neural network layer and a multiplication operation. The sigmoid layer maps the input feature values through the sigmoid function to values between 0 and 1, which indicate how much of each input feature may pass through that part of the network. “0” indicates that no data is permitted to pass, and “1” indicates that all data is permitted to pass. The gate system of the LSTM is applied at each sequence index t. Because every gate uses a sigmoid, its output lies in the range [0,1]. Equations (5)–(9) describe how an LSTM works, as shown in Fig. 5.

Fig. 5

Structure of LSTM.

The forget gate selects which information from the memory of the previous time step to keep or discard: $F_{t} = \sigma [w_{F} (Y_{t}, H_{t - 1}) + c_{F}]$ (5) Where F_t represents the forget gate, and c_F and w_F denote the control and weight parameters of the forget gate. Y_t represents the input at the current timestamp, and H_t-1 denotes the output obtained at timestamp t-1 from the previous LSTM block. σ denotes the logistic sigmoid function, which gives an output value between 0 and 1. An output of ‘0’ means the gate blocks everything, and an output of ‘1’ means the gate lets everything pass.

The input gate chooses the information that should be stored: $I_{t} = \sigma [w_{I} (Y_{t}, H_{t - 1}) + c_{I}]$ (6)

Where,

I_t → input gate

σ → sigmoid function

H_t-1 → output of the previous timestamp

Y_t → input at the current timestamp

w_I → weight parameter of the input gate neurons

c_I → bias of the input gate

A new candidate value vector is created by a tanh layer and is denoted as follows: $\tilde{V}_{t} = \tanh [w_{V} (Y_{t}, H_{t - 1}) + c_{V}]$ (7)

The tanh layer allows the LSTM to add information to or remove information from the previous input. $\tilde{V}_{t}$ denotes the candidate for the cell state at timestamp t. The control and weight parameters of the tanh layer are denoted as c_V and w_V.

The input gate selects how much of the candidate value vector is used, and the forget gate selects whether to keep or discard the previous information, to create the final memory: $V_{t} = F_{t} * V_{t - 1} + I_{t} * \tilde{V}_{t}$ (8)

Where V_t denotes the memory cell state at the current timestamp t and * represents element-wise multiplication of vectors. Finally, the output gate determines which part of the memory is eventually output: $O_{t} = \sigma [w_{O} (Y_{t}, H_{t - 1}) + c_{O}]$ (9)

The cell state is then passed through a tanh layer, whose output lies in [–1, 1], and the result is multiplied by the output of the output gate. O_t denotes the output gate and w_O the weight parameter of the output gate neurons. σ is the sigmoid function, H_t-1 is the output obtained at the previous timestamp, Y_t is the input at the current timestamp, and c_O is the bias of the output gate. Finally, the output is evaluated by $H_{t} = O_{t} * \tanh (V_{t})$ (10)

Where * denotes element-wise multiplication of vectors. H_t is the output of the current block, from which the predicted output is obtained through the softmax output layer. V_t denotes the memory cell state at the current timestamp t.
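Equations (5)–(10) can be put together as a single forward step of an LSTM cell. The sketch below is a plain-Python illustration, not the authors' implementation: the layer sizes, the random weights drawn from [0, 1], the zero biases, and the helper names are assumptions, and the candidate vector of Eq. (7) is held in the variable `cand` to keep it apart from the cell state.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def affine(w, x, c):
    """Compute w . x + c for a weight matrix w given as a list of rows."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + b for row, b in zip(w, c)]

def lstm_step(y_t, h_prev, v_prev, p):
    z = y_t + h_prev                                                 # (Y_t, H_{t-1})
    f_t = [sigmoid(a) for a in affine(p["w_F"], z, p["c_F"])]        # Eq. (5)
    i_t = [sigmoid(a) for a in affine(p["w_I"], z, p["c_I"])]        # Eq. (6)
    cand = [math.tanh(a) for a in affine(p["w_V"], z, p["c_V"])]     # Eq. (7)
    v_t = [f * vp + i * g
           for f, vp, i, g in zip(f_t, v_prev, i_t, cand)]           # Eq. (8)
    o_t = [sigmoid(a) for a in affine(p["w_O"], z, p["c_O"])]        # Eq. (9)
    h_t = [o * math.tanh(v) for o, v in zip(o_t, v_t)]               # Eq. (10)
    return h_t, v_t

# Made-up sizes and random weights in [0, 1]; biases start at zero.
rng = random.Random(0)
n_in, n_hid = 3, 2
params = {}
for gate in "FIVO":
    params["w_" + gate] = [[rng.uniform(0.0, 1.0) for _ in range(n_in + n_hid)]
                           for _ in range(n_hid)]
    params["c_" + gate] = [0.0] * n_hid

h, v = [0.0] * n_hid, [0.0] * n_hid
for y_t in ([0.1, 0.2, 0.3], [0.0, -0.5, 0.4]):
    h, v = lstm_step(y_t, h, v, params)
print(h, v)
```

With all weights and biases set to zero, every gate outputs 0.5 and the candidate is 0, so the cell state is simply halved at each step; this gives a quick sanity check of the update rule.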

Finally, the loss function of this model is estimated by calculating the mean square error (MSE). The calculation is done as follows: $Loss = \frac{1}{N} \sum_{t = 1}^{N} (H_{t} - T_{t})^{2}$ (11)

Where T_t denotes the desired output and N is the number of predictions generated from the sample of data points. Loss is the mean squared prediction error.

The mean square error (MSE) is the average squared difference between the estimated values and the actual values.

If the estimated score is below 0 (negative), the tweet or review is considered to express negative sentiment; if the estimated score is above 0 (positive), it is considered to express positive sentiment; and if the estimated score is 0, it is considered neutral.
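The loss and the decision rule can be written down directly. The sketch below computes the mean squared error as described in the text (the average of the squared differences) and maps a score to a polarity label; the function names and sample values are illustrative.

```python
def mse_loss(predicted, target):
    """Average squared difference between predicted and desired outputs."""
    return sum((h - t) ** 2 for h, t in zip(predicted, target)) / len(predicted)

def polarity(score):
    """Map an estimated score to a sentiment label using the sign rule."""
    if score < 0:
        return "negative"
    if score > 0:
        return "positive"
    return "neutral"

loss = mse_loss([0.5, -0.5], [1.0, -1.0])
labels = [polarity(s) for s in (-0.7, 0.0, 0.3)]
print(loss, labels)
```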

4.3.1 Weight optimization using APSO

To enhance the performance of the LSTM, the weight parameters $\{w_{F}, w_{I}, w_{V}, w_{O}\}$ from Equations (5)–(9) are optimized using the proposed APSO algorithm. These weight parameters are used to attain the target output of the LSTM. To reduce computational complexity in the training phase, the LSTM network is trained by selecting the weight parameters with the proposed APSO algorithm. Before the LSTM network is executed, the weight parameters of each layer, within the range [0,1], are given as input to the APSO algorithm. The PSO algorithm was developed in 1995 by Kennedy and Eberhart and is motivated by the behavior of flocks of birds. It is a population-based evolutionary algorithm that starts with a population of particles holding random solutions. Each particle represents a candidate solution to the optimization problem and is initialized with a random position and velocity. The particle swarm optimization (PSO) algorithm is an attractive optimization algorithm because it has only a few parameters to adjust. The PSO technique is popular because it is straightforward to implement and is able to reach a good solution.

The PSO has 4 levels:

Initializing the particles.

Evaluating the particles with the fitness function.

Updating the particle positions and velocities.

Updating the experience of each particle with the general knowledge of the swarm.
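The four levels listed above can be sketched as a plain PSO loop. This is a generic textbook-style variant, not the authors' configuration: the inertia weight, the acceleration coefficients `c1` and `c2`, the swarm size, the clamping of positions to the bounds, and the toy quadratic fitness are all assumptions made for illustration.

```python
import random

def pso(fitness, dim, n_particles=20, iters=100, lo=0.0, hi=1.0,
        inertia=0.7, c1=1.5, c2=1.5, seed=0):
    rng = random.Random(seed)
    # Level 1: initialise positions and velocities at random.
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[rng.uniform(-1.0, 1.0) * 0.1 * (hi - lo) for _ in range(dim)]
           for _ in range(n_particles)]
    # Level 2: evaluate every particle with the fitness function.
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = min(range(n_particles), key=pbest_fit.__getitem__)
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    for _ in range(iters):
        for k in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Level 3: update velocity, then position (clamped to bounds).
                vel[k][d] = (inertia * vel[k][d]
                             + c1 * r1 * (pbest[k][d] - pos[k][d])
                             + c2 * r2 * (gbest[d] - pos[k][d]))
                pos[k][d] = min(hi, max(lo, pos[k][d] + vel[k][d]))
            fit = fitness(pos[k])
            # Level 4: update personal experience and the swarm's knowledge.
            if fit < pbest_fit[k]:
                pbest[k], pbest_fit[k] = pos[k][:], fit
                if fit < gbest_fit:
                    gbest, gbest_fit = pos[k][:], fit
    return gbest, gbest_fit

# Toy fitness (assumption): squared distance to a made-up target weight vector.
target = [0.3, 0.8, 0.5]
best, best_fit = pso(lambda x: sum((a - b) ** 2 for a, b in zip(x, target)), dim=3)
print(best, best_fit)
```

In the setting of this paper the fitness would be the LSTM loss evaluated with the candidate weights, which is far more expensive than the toy function used here.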

For increasing population diversity and for avoiding the premature convergence of PSO, the opposition based learning (OBL) method [45] is used with PSO. Opposition based learning (OBL) combined with PSO becomes the Adaptive Particle Swarm Optimization (APSO).

The APSO optimizer plays a key role in increasing the accuracy of the proposed LSTM neural network model by adjusting attributes such as weights and learning rates to reduce the loss.

Opposition based learning (OBL): OBL is well known for its ability to select the best solution from a set of feasible solutions with fast convergence. OBL is integrated with the PSO variant to direct the particles using velocity clamping and to regulate their speed. It helps to choose the fittest particles from the current swarm and its counter swarm, which improves the fitness of the whole swarm.

When a candidate solution $X$ to a given problem is evaluated, computing its opposite solution $\bar{X}$ at the same time gives another chance to find a candidate closer to the global optimum. The concept of opposition-based learning can be integrated with neural networks as follows:

• Opposite Weights: For the weights chosen for the ANN, the opposite weights are generated. This process is analogous to mutation mechanisms in genetic algorithms. The number of weights to choose and the way to select them offer a wide variety of feasible schemes that can be explored. Using this algorithm, the optimal weight parameters are chosen as described below:

Initialization: The particles, or candidate solutions, are initialized as d-dimensional vectors. In this algorithm, the weight parameters of the LSTM are the candidate solutions and are selected within the range [0, 1].

The initialization is done as $X_{k}(i) = \{X_{k1}(i), X_{k2}(i), \ldots, X_{kd}(i)\}$ (12) where $X_{kd}(i)$ denotes the position of the k-th particle in the d-th dimension at iteration i, defined as $X_{kd}(i) = \{w_{F}, w_{I}, w_{V}, w_{O}\}_{kd}(i)$ (13) where

$w_{F}$ → weight parameter of the forget gate neurons

$w_{I}$ → weight parameter of the input gate neurons

$w_{O}$ → weight parameter of the output gate neurons

$w_{V}$ → weight parameter of the tanh layer

Besides, the opposite solution is represented as follows.

For a given problem, whenever a solution $X$ is found, its opposite solution $\bar{X}$ is also evaluated. The starting solution may be based on experience or on a random guess. $\bar{X}_{k}(i) = \{\bar{X}_{k1}(i), \bar{X}_{k2}(i), \ldots, \bar{X}_{kd}(i)\}$ (14)

For a real number $X \in [a, b]$, the opposite solution $\bar{X}$ is estimated as $\bar{X} = a + b - X$ (15) where a and b are the minimum and maximum weight values, respectively.
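As an illustration of Equations (12)–(15), the following minimal sketch draws a random candidate in [a, b] and computes its opposite component by component. The function names `init_particle` and `opposite` are hypothetical, and a plain list of floats stands in for the LSTM weight vector.

```python
import random


def init_particle(d, a=0.0, b=1.0, rng=random):
    """Draw a d-dimensional candidate solution uniformly from [a, b]."""
    return [rng.uniform(a, b) for _ in range(d)]


def opposite(x, a=0.0, b=1.0):
    """Opposite solution of Eq. (15): x_bar = a + b - x, per dimension."""
    return [a + b - xi for xi in x]


x = init_particle(4)
x_bar = opposite(x)
print(x)
print(x_bar)
```

With a = 0 and b = 1, a weight of 0.2 has the opposite weight 0.8, so each candidate and its opposite lie symmetrically about the midpoint of the range.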

Fitness: After the candidate solutions and opposite solutions have been initialized, the fitness value of every solution is evaluated using Equation (16). The fitness function is defined from the loss in Equation (11): $F(i) = \min(Loss(i))$ (16)

The optimal solution is the solution with the least fitness value, that is, the lowest loss. APSO finds the optimal weights for the environment in a short time. Since the computational complexity grows with the number of iterations, reaching the optimum in fewer iterations reduces the computational cost while also minimizing the loss.

Updating Velocity and Position: After the fitness of each solution has been estimated, the solution is updated according to its velocity vector and position. Every solution is updated using Equations (17) and (18) until the best solution is determined.

In each iteration, the velocity of each particle is adjusted toward its personal best position Pbest and the global best position Gbest. The velocity V of each particle is updated as $V_{kd}(t + 1) = w \, V_{kd}(t) + c_{1} r_{1} \left(P_{best_{kd}}(t) - X_{kd}(t)\right) + c_{2} r_{2} \left(G_{best_{d}}(t) - X_{kd}(t)\right)$ (17) where $V_{kd}(t)$ denotes the velocity and $X_{kd}(t)$ denotes the position of the k-th particle, d denotes the dimension and t is the iteration. Pbest represents the personal (local) best value and Gbest represents the global best value. Two differences must be calculated, (Gbest − P) and (Pbest − P), where P is the current position of the particle, since the velocity of a particle depends on its distance from both the global best (Gbest) and the particle best (Pbest). $c_{1}$ and $c_{2}$ denote the acceleration coefficients, both set to 2. $r_{1}$ and $r_{2}$ denote random variables in the range [0, 1]. The position is then updated as $X_{kd}(t + 1) = X_{kd}(t) + V_{kd}(t + 1)$ (18)
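A minimal sketch of the update in Equations (17) and (18) for a single particle is given below. The function name `update_particle` is hypothetical, and drawing fresh values of r1 and r2 for every dimension is an assumption of this sketch rather than something the text specifies.

```python
import random


def update_particle(x, v, p_best, g_best, w, c1=2.0, c2=2.0, rng=random):
    """Return the new position and velocity of one particle.

    x, v, p_best and g_best are lists of equal length (one entry per dimension).
    """
    new_x, new_v = [], []
    for xd, vd, pd, gd in zip(x, v, p_best, g_best):
        r1, r2 = rng.random(), rng.random()      # random factors in [0, 1)
        vd_next = (w * vd
                   + c1 * r1 * (pd - xd)         # pull toward the personal best
                   + c2 * r2 * (gd - xd))        # pull toward the global best
        new_v.append(vd_next)                    # Eq. (17)
        new_x.append(xd + vd_next)               # Eq. (18)
    return new_x, new_v


x_next, v_next = update_particle(
    x=[0.5, 0.5], v=[0.1, -0.1], p_best=[0.4, 0.6], g_best=[0.3, 0.7], w=0.7)
print(x_next, v_next)
```

When a particle already sits on both its personal best and the global best, the two attraction terms vanish and only the inertia term w·V remains.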

w denotes the inertia weight, which is used to control the search process. The inertia weight decreases as the iteration count increases and is estimated as $w = w_{maximum} - \frac{w_{maximum} - w_{minimum}}{t_{maximum}} \times t$ (19) where $w_{minimum}$ and $w_{maximum}$ denote the minimum and maximum inertia weight, respectively, and $t_{maximum}$ denotes the maximum number of iterations. $P_{best_{kd}}(t)$ and $G_{best_{d}}(t)$ represent the best position of particle k and the best position of the group at iteration t. If the fitness of the k-th particle $X_{kd}(t + 1)$ is less than or equal to that of the previous $P_{best_{kd}}(t)$, then the particle is taken as the new $P_{best_{kd}}(t + 1)$. Otherwise, the particle $X_{kd}(t)$ is considered as the new $P_{best_{kd}}(t + 1)$. Additionally, if the fitness of the k-th particle $X_{kd}(t + 1)$ is less than or equal to that of the previous $G_{best_{d}}(t)$, then the particle is considered as the new $G_{best_{d}}(t + 1)$. Otherwise, the particle $X_{kd}(t)$ is considered as the new $G_{best_{d}}(t + 1)$. $P_{best_{kd}}(t + 1) = \begin{cases} X_{kd}(t + 1) & \text{if } F(X_{kd}(t + 1)) \leq F(P_{best_{kd}}(t)) \\ X_{kd}(t) & \text{otherwise} \end{cases}$ (20) $G_{best_{d}}(t + 1) = \begin{cases} X_{kd}(t + 1) & \text{if } F(X_{kd}(t + 1)) \leq F(G_{best_{d}}(t)) \\ X_{kd}(t) & \text{otherwise} \end{cases}$ (21)
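The inertia schedule of Equation (19) and the best-position updates of Equations (20) and (21) can be sketched as follows. The bounds `w_max=0.9` and `w_min=0.4` are assumed values chosen for illustration and are not given in the text. The sketch also keeps the previously stored best in the "otherwise" branch, as is common in PSO implementations; this is an assumption, since Equations (20) and (21) write $X_{kd}(t)$ there.

```python
def inertia_weight(t, t_max, w_max=0.9, w_min=0.4):
    """Linearly decreasing inertia weight, Eq. (19)."""
    return w_max - (w_max - w_min) / t_max * t


def update_bests(x_new, f_new, p_best, f_p_best, g_best, f_g_best):
    """Update the personal and global bests for a minimized fitness."""
    if f_new <= f_p_best:            # condition of Eq. (20)
        p_best, f_p_best = list(x_new), f_new
    if f_new <= f_g_best:            # condition of Eq. (21)
        g_best, f_g_best = list(x_new), f_new
    return p_best, f_p_best, g_best, f_g_best


print(inertia_weight(0, 100), inertia_weight(50, 100), inertia_weight(100, 100))
```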

Termination: The above steps are repeated until the best solution, that is, the optimal set of LSTM weight parameters, is determined. The selected optimal weight parameters are then used in the LSTM network for testing. Figure 6 depicts the flow diagram of the proposed APSO algorithm.

Algorithm: Optimize the weight parameters of LSTM using APSO
1. Set the inertia weight (w), the random variables (r1 and r2), and the acceleration coefficients (c1 and c2).
Here, w denotes the inertia weight, which controls the impact of a particle's previous velocity on its current velocity. c1 is the cognitive weight and c2 is the social weight. The random values r1 and r2 are in the range [0, 1].
2. Initialize the candidate solutions or weight parameters.
Initialize the velocity (v), the position of the particles (p), the number of particles (k),
the dimensional space (d), the iteration (t), the individual particle experience (P_best), and the common knowledge of the swarm (G_best).
3. Initialize the opposite solutions using OBL.
Find P_best using opposition-based learning. Choose and update G_best. The fundamental principle of OBL is to consider a solution and its corresponding counter-solution simultaneously in order to approximate the global optimum.
4. Evaluate the fitness of each solution and opposite solution using (16).
Compute the fitness value of each particle using the evaluation function. P_best is the particle's local best position and G_best is the swarm's global best position. Obtain the opposition-based solutions of the personal best positions.
5. Update the velocity and position of each solution and opposite solution using (17) and (18).
At every iteration, each particle adjusts its velocity to move toward both G_best and its own P_best. The velocity v of each particle is updated according to (17).
6. Compare the fitness values at every iteration. Each particle's current fitness value is compared with its previous best fitness (P_best). If the current value is better than P_best, set P_best equal to the current value and the P_best location equal to the current location in the d-dimensional space.
7. If the fitness of $X_{kd}(t + 1)$ is less than or equal to the fitness of $P_{best_{kd}}(t)$, as in Equation (20),
8. Then $X_{kd}(t + 1)$ is considered as the new $P_{best_{kd}}(t + 1)$
9. Else $X_{kd}(t)$ is considered as $P_{best_{kd}}(t + 1)$.
10. Determine the best fitness (G_best) by comparing the P_best of the particles with each other, and replace the global best position of the swarm.
11. If the fitness of $X_{kd}(t + 1)$ is less than or equal to the fitness of $G_{best_{d}}(t)$, as in Equation (21),
12. Then $X_{kd}(t + 1)$ is considered as the new $G_{best_{d}}(t + 1)$
13. Else $X_{kd}(t)$ is considered as $G_{best_{d}}(t + 1)$
14. End
15. The velocity of every particle is accelerated toward its P_best and G_best, and a random term weighs this acceleration. Every particle's new position in the solution space is determined by adding its new velocity to its position vector.
Steps 4–14 are repeated until the best weight parameters, that is, the optimal solution, are found.
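The steps of the algorithm can be sketched end to end as below. This is an illustrative sketch under stated assumptions, not the authors' implementation: the LSTM loss of Equation (11) is replaced by a hypothetical quadratic `toy_loss`, the inertia bounds 0.9 and 0.4 and the swarm size are assumed values, positions are clipped to [a, b], and the stored best is kept when a new position is not better.

```python
import random


def apso_minimize(loss, dim, n_particles=20, t_max=100, a=0.0, b=1.0,
                  w_max=0.9, w_min=0.4, c1=2.0, c2=2.0, seed=0):
    """PSO with opposition-based initialization, minimizing `loss`."""
    rng = random.Random(seed)
    # Steps 2-4: random candidates, their opposites (Eq. 15), keep the fitter one.
    swarm = []
    for _ in range(n_particles):
        x = [rng.uniform(a, b) for _ in range(dim)]
        x_opp = [a + b - xi for xi in x]
        swarm.append(min(x, x_opp, key=loss))
    vel = [[rng.uniform(-(b - a), b - a) for _ in range(dim)]
           for _ in range(n_particles)]
    p_best = [list(x) for x in swarm]
    p_fit = [loss(x) for x in swarm]
    g_idx = min(range(n_particles), key=p_fit.__getitem__)
    g_best, g_fit = list(p_best[g_idx]), p_fit[g_idx]

    for t in range(t_max):
        w = w_max - (w_max - w_min) / t_max * t              # Eq. (19)
        for k in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[k][d] = (w * vel[k][d]
                             + c1 * r1 * (p_best[k][d] - swarm[k][d])
                             + c2 * r2 * (g_best[d] - swarm[k][d]))  # Eq. (17)
                # Eq. (18); clipping to [a, b] is an assumption of this sketch.
                swarm[k][d] = min(b, max(a, swarm[k][d] + vel[k][d]))
            f = loss(swarm[k])
            if f <= p_fit[k]:                                # Eq. (20)
                p_best[k], p_fit[k] = list(swarm[k]), f
            if f <= g_fit:                                   # Eq. (21)
                g_best, g_fit = list(swarm[k]), f
    return g_best, g_fit


def toy_loss(weights):
    """Hypothetical stand-in for the LSTM loss: mean squared distance to a target."""
    target = [0.2, 0.7, 0.4, 0.9]
    return sum((wi - ti) ** 2 for wi, ti in zip(weights, target)) / len(target)


best_w, best_loss = apso_minimize(toy_loss, dim=4)
print(best_w, best_loss)
```

In an actual APSO-LSTM setup, `toy_loss` would be replaced by a function that loads the candidate weights into the network and returns its training loss, which makes every fitness evaluation considerably more expensive than in this toy example.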

Fig. 6

Flowchart of the APSO algorithm.

5 Results and discussion

The proposed sentiment analysis is implemented in Python 3.7 on a 64-bit Windows 2007 PC with a 2 GHz dual-core processor and 4 GB of main memory. In this simulation, four datasets are used: Amazon, Trip advisor, Demonetization, and Books. From each dataset, 80% of the data is used to train the proposed APSO-LSTM classifier and 20% is used to test it. Pre-processing and word embedding are performed on the training dataset, and the APSO-LSTM, LSTM, ANN, and SVM classifiers are trained. Pre-processing and word embedding are likewise performed on the testing dataset. Finally, the embedded word features from the testing dataset are given as input to the trained classifiers, which are used to analyze the sentiment score of the input tweets. After the input tweets have been classified as positive, negative, or neutral, the performance of each classifier is evaluated in terms of precision, recall, accuracy, and F-score. Section 5.2 defines these performance metrics.

5.1 Classifiers

For performance analysis, we evaluate the classifiers on four datasets, namely Amazon reviews, Trip advisor, demonetization reviews, and book reviews. The results of the classical sentiment classifiers, namely ANN (artificial neural network) [25], SVM (support vector machine) [12], traditional LSTM [17], and PSO-LSTM [36, 37], are compared with those of our proposed APSO-LSTM classifier.

Artificial neural network: This classifier is very useful for tasks such as prediction and recognition in many applications. An ANN is mainly composed of three layers. The input layer receives information (data), signals, features, or measurements from the external environment. These inputs (samples or patterns) are usually normalized within the limit values produced by the activation functions. The hidden layer is a middle layer placed between the input and output layers. An activation function is applied in the hidden layer, and the weights of the hidden nodes are fitted using the training data. The nodes in the output layer are active ones. This layer is also composed of neurons and is responsible for producing and presenting the final network outputs, which result from the processing performed by the neurons in the previous layers.

Support vector machine [12]: The SVM approach (Tehrany et al. 2015) aims to find the hyperplane that separates the classes, based on the training instances located at the edges of the class descriptions. Support vectors are derived from these training cases; the other training vectors are rejected. The purpose of SVM is to improve the generalization capability by widening the classification margin through a discriminant function. For a linear classification problem, let the training sample set be $\{u_{i}, v_{i}\}$, $i = 1, 2, \ldots, m$. The mathematical expression of the optimal hyperplane is given in Equation (22). $f(u) = \omega \cdot \varphi(u) + a$ (22)

Where;

a → Threshold value

ω → Weight factor

The discriminant function over the training samples is given in Equation (23). $f(u) = \mathrm{sgn}\left(\sum_{i = 1}^{m} v_{i} \, b_{i} \, r(u_{i}, u) + a\right)$ (23)

Where;

b_i → Lagrange multiplier

a → Threshold value

u_i, v_i → Training sample and its class label

r (u_i, u) → Kernel function
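As an illustration of Equation (23), the sketch below evaluates the discriminant function for a query point. The RBF kernel, its `gamma` parameter, and the toy support vectors are assumptions made for the example; the text does not state which kernel is used. The sketch also returns +1 when the sum is exactly zero.

```python
import math


def rbf_kernel(u_i, u, gamma=1.0):
    """Assumed kernel r(u_i, u): exp(-gamma * squared Euclidean distance)."""
    sq_dist = sum((p - q) ** 2 for p, q in zip(u_i, u))
    return math.exp(-gamma * sq_dist)


def svm_decision(u, samples, labels, multipliers, a, kernel=rbf_kernel):
    """Discriminant of Eq. (23): sign of sum_i v_i * b_i * r(u_i, u) + a."""
    total = sum(v_i * b_i * kernel(u_i, u)
                for u_i, v_i, b_i in zip(samples, labels, multipliers))
    return 1 if total + a >= 0 else -1


# Toy example with one support vector per class.
samples = [[0.0, 0.0], [1.0, 1.0]]
labels = [-1, 1]
multipliers = [1.0, 1.0]
print(svm_decision([0.9, 0.9], samples, labels, multipliers, a=0.0))
print(svm_decision([0.1, 0.1], samples, labels, multipliers, a=0.0))
```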

LSTM [17, 44]: This neural network model is well suited to various natural language processing tasks, such as predictions over whole data sequences and over time series data. It is also very suitable for classification and for handling time series predictions. LSTM solves the vanishing gradient problem that arises in RNNs [13]. Compared to ANN and SVM, the conventional LSTM classifier attains the best results on the datasets used.

5.2 Performance metric

The metrics accuracy, recall, precision and F-score are defined in terms of TP, TN, FP and FN, where TP denotes true positives, TN true negatives, FP false positives and FN false negatives. The performance metrics are defined as follows. $Accuracy = \frac{TN + TP}{TN + TP + FN + FP}$ (24) $Recall = \frac{TP}{FN + TP}$ (25) $Precision = \frac{TP}{FP + TP}$ (26) $F\text{-}measure = \frac{2 \, (Precision \times Recall)}{Precision + Recall}$ (27)
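Equations (24)–(27) translate directly into code. The sketch below uses hypothetical confusion counts (TP = 40, TN = 45, FP = 5, FN = 10) purely for illustration; they are not taken from the experiments.

```python
def classification_metrics(tp, tn, fp, fn):
    """Return accuracy, precision, recall and F-measure, Eqs. (24)-(27)."""
    accuracy = (tn + tp) / (tn + tp + fn + fp)
    recall = tp / (fn + tp)
    precision = tp / (fp + tp)
    f_measure = 2 * (precision * recall) / (precision + recall)
    return accuracy, precision, recall, f_measure


acc, prec, rec, f1 = classification_metrics(tp=40, tn=45, fp=5, fn=10)
print(f"accuracy={acc:.4f} precision={prec:.4f} recall={rec:.4f} F={f1:.4f}")
```

For these counts the accuracy is 85/100 = 0.85, the recall is 40/50 = 0.8, the precision is 40/45 ≈ 0.8889, and the F-measure is 80/95 ≈ 0.8421.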

5.3 Comparative experiments

To demonstrate the efficacy of the proposed technique, several comparative analyses were carried out. A good choice of feature extraction technique contributes substantially to a better sentiment prediction.

5.3.1 Selection of parameter for skip-gram word embedding

Table 2 lists the hyperparameter choices for word embedding.

Table 2
Hyperparameters for word embedding

Hyper Parameters	Choices
The Architecture	Skip gram (s1)
Window size(w)	10
Iteration	30
Vector size	400
Activation function	Negative sampling
Subsampling rate	1e-3
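For readers who wish to reproduce a comparable embedding, the settings of Table 2 could be expressed as keyword arguments for a skip-gram trainer. The sketch below uses the parameter names of gensim's `Word2Vec` class (gensim 4.x); the use of gensim is an assumption, since the text does not name the library, and the number of negative samples (`negative=5`) is likewise an assumed value that Table 2 does not give.

```python
# Table 2 expressed as keyword arguments (gensim 4.x Word2Vec parameter names).
skipgram_params = {
    "sg": 1,             # 1 selects the skip-gram architecture
    "window": 10,        # context window size
    "epochs": 30,        # training iterations over the corpus
    "vector_size": 400,  # embedding dimension
    "hs": 0,             # no hierarchical softmax, so negative sampling is used
    "negative": 5,       # number of negative samples (assumed, not in Table 2)
    "sample": 1e-3,      # subsampling rate for frequent words
}

# Hypothetical usage, assuming `sentences` is a list of tokenized reviews:
# from gensim.models import Word2Vec
# model = Word2Vec(sentences=sentences, **skipgram_params)
print(skipgram_params)
```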

5.3.2 Results of different word embedding size

In this paper, the skip-gram algorithm [42] is used for the feature extraction process. This model converts words into vector format. Here, the performance of the N-gram model [43] is compared with that of the proposed skip-gram model.

Table 3 shows the comparative analysis of the models at different word embedding sizes for the Amazon dataset. The skip-gram model reduces the computational complexity and also yields higher accuracy: at a word embedding size of 100, the skip-gram model attains an accuracy of 90.1% while the existing n-gram model attains 86%. At a word embedding size of 500, the skip-gram based word representation obtains 96.8% accuracy while the existing n-gram model attains 91.5%. The graphical representation corresponding to Table 3 is given in Fig. 7(a).

Table 3
Accuracy obtained at different word embedding size on Amazon Review dataset

WE Methods\WE sizes	100	200	300	400	500
N-gram	86	88	89	90.5	91.5
Skip-gram	90.1	91.5	93.7	95.2	96.8

Fig. 7

Accuracy plot for different feature extraction methods at different Word Embedding dimensions (a) using Amazon dataset (b) using Trip advisor dataset.

Table 4 shows the results obtained at different word embedding sizes on the Trip advisor dataset. The skip-gram model attains an accuracy of 91.5% at word embedding size 100. As tabulated, the skip-gram model performs best at word embedding size 500, where it reaches 97.8%. The results show that the proposed model consistently achieves the highest performance across datasets. The graphical representation corresponding to Table 4 is given in Fig. 7(b).

Table 4

Accuracy obtained at different word embedding size on Trip advisor dataset

WE Methods\WE sizes	100	200	300	400	500
N-gram	88	90	91	92.5	93.5
Skip-gram	91.5	93.5	95.6	96.2	97.8

While making predictions, the skip-gram model learns better representations for rare words because the embedding vectors are not averaged. Skip-gram based word embedding consistently yields the best accuracy in models that use large corpora and a high number of dimensions.

5.3.3 Parameter setting

The parameter settings for the sentiment classifier are listed in Table 5.

Table 5
Parameters used in this work

Hyper parameter	Values
Algorithm	LSTM
Epoch	500
Batchsize	32
Optimizer	APSO
Number of layers	3
Loss function	MSE
Learning_rate	0.001
Dropout_rate	0.5
Number of hidden unit	20
Training	80%
Testing	20%

5.3.4 Performance analysis of Proposed APSO-LSTM at different Iterations on Amazon Reviews dataset

The goal of the proposed methodology is to predict the reviewer's opinion using the APSO-LSTM algorithm. The proposed model is evaluated on the Amazon review dataset to assess its performance.

The experimental results obtained on the Amazon dataset are given in Table 6. To enhance the performance of the LSTM, the APSO algorithm selects the optimal weight parameters. As per Table 6, the proposed strategy attains a maximum accuracy of 96.8%, precision of 85.28%, recall of 76.08% and F-measure of 80.45%. This is due to the optimal weight selection using APSO. The corresponding graphical representation is given in Fig. 8.

Table 6
Performance analysis of Proposed APSO-LSTM at different Iterations on Amazon review dataset

Iterations\Metrics	Accuracy (%)	Precision (%)	Recall (%)	F-measure (%)
100th	90.1	80.12	71.25	75.43
200th	91.5	81.16	72.27	76.46
300th	93.7	82.58	73.67	77.87
400th	95.2	83.79	74.92	79.11
500th	96.8	85.28	76.08	80.45

Fig. 8

Experimental results using Amazon dataset at different Iterations.

5.3.5 Comparison of the performance evaluation of different classifier

To demonstrate the efficacy of the proposed technique, the presented APSO-LSTM based sentiment analysis is compared with sentiment analysis based on other algorithms, namely PSO-LSTM, LSTM, ANN and SVM. The experimental results are analyzed on four datasets, namely the Amazon review dataset, the Trip advisor dataset, the demonetization dataset and the book review dataset. Table 7 shows the comparative analysis.

Table 7
The results of Accuracy, Recall, Precision and F-measure of different classifiers

Dataset	Methods	Accuracy (%)	Precision (%)	Recall (%)	F-measure (%)
Amazon product review	APSO-LSTM	96.8	85.28	76.08	80.4
	PSO-LSTM	94.4	83.3	74.7	78.6
	LSTM	92.5	82.3	72.6	77.1
	ANN	81.9	79.5	72	76.8
	SVM	91.8	81.0	73	75.6
Trip advisor	APSO-LSTM	97.8	87.28	78.08	82.42
	PSO-LSTM	95.1	85	76.5	80.5
	LSTM	93.5	84.2	75.8	79.1
	ANN	82.1	81.5	74	77.6
	SVM	92.8	83	75	78.8
Demonetization	APSO-LSTM	93.2	82.3	73.5	77.4
	PSO-LSTM	91.2	80.3	71.3	75.4
	LSTM	89.2	79.5	70.9	74.3
	ANN	79	76.5	69	72
	SVM	88.9	78	70	73.8
Book review dataset	APSO-LSTM	95.2	83.7	74.9	79.1
	PSO-LSTM	92.8	81.5	72.9	76.9
	LSTM	91.7	80.8	70.5	76
	ANN	80.2	78	69.5	73.5
	SVM	90.1	80	71	75.2

As Table 7 shows, for the Amazon product review dataset the proposed strategy attains the maximum accuracy of 96.8%, compared with 94.4% for PSO-LSTM, 92.5% for LSTM, 91.8% for SVM and 81.9% for ANN based sentiment analysis. Compared to ANN and SVM based sentiment analysis, the APSO-LSTM and PSO-LSTM methods attain better results. Similarly, the proposed method attains the highest precision (85.28%), recall (76.08%) and F-measure (80.4%). The comparative analysis of the proposed method against the existing methods on the Amazon product review dataset is given in Fig. 9. On the Trip advisor dataset, the proposed method attains the maximum accuracy of 97.8%, precision of 87.28%, recall of 78.08% and F-measure of 82.42%.

Fig. 9

Comparative analysis using Amazon dataset.

Owing to the skip-gram model and APSO, the proposed method attains better results than the existing methods. The comparative analysis of the proposed method against the existing methods on the Trip advisor dataset is given in Fig. 10. For the demonetization dataset, the proposed strategy attains the maximum accuracy of 93.2%, compared with 91.2% for PSO-LSTM, 79% for ANN and 88.9% for SVM based sentiment analysis. Similarly, the proposed model attains a higher precision of 82.3%, compared with 80.3% for PSO-LSTM, 76.5% for ANN and 78% for SVM based sentiment analysis. The comparative analysis on the demonetization review dataset is given in Fig. 11. On the book review dataset, the proposed method attains the maximum accuracy compared to the existing methods. The graphical representation for the book review dataset is given in Fig. 12.

Fig. 10

Comparative analysis using Trip advisor dataset.

Fig. 11

Comparative analysis using demonetization.

Fig. 12

Comparative analysis using Book review dataset.

PSO-LSTM shows superior results over the classic algorithms LSTM, SVM and ANN owing to its long-term memory capabilities in predicting text sequences. PSO-LSTM solves the vanishing and exploding gradient problems. APSO assists the LSTM in selecting the best weights for the environment in fewer iterations, and the computational complexity decreases as the number of iterations decreases. Thus, the ability of APSO-LSTM to select optimal weights for the neural network, combined with good hyperparameter choices, leads to improved accuracy compared to the traditional LSTM.

6 Conclusion and future work

In this work, an APSO-LSTM based sentiment analysis model is proposed. Skip-gram based feature extraction is used for word embedding. The skip-gram word-to-vector representation requires less memory space and yields consistently higher accuracy than other word-to-vector representations. An LSTM encoder is used to capture global sentiment information. The aim of the work is to improve accuracy in sentiment analysis tasks, so the model combines APSO with LSTM to enhance performance. The contribution of APSO in selecting the weight parameters of the LSTM neural network increases accuracy and decreases loss. APSO-LSTM prevents the over-fitting problem and significantly reduces the training time of the large neural network. The mathematical expressions of the LSTM, the skip-gram model and APSO have been explained. For experimental analysis, four datasets have been used. The performance of the proposed method has been evaluated using the metrics accuracy, recall, precision and F-measure. The proposed methodology attains a maximum accuracy of 96.8% on the Amazon dataset, 97.8% on the Trip advisor dataset, 93.2% on the demonetization dataset and 95.2% on the book review dataset. The empirical results demonstrate the superior performance of the proposed methodology. As future work, deep learning techniques will be incorporated to improve the results on different natural language processing tasks such as sentiment analysis, named entity recognition and text summarization.

References

1. Pozzi F.A., Fersini, Messina, et al., Challenges of sentiment analysis in social networks: an overview, Journal of Sentiment Anal Soc Netw 1 (2017), 1–11.

2. Liu, Sentiment Analysis and Opinion Mining, Williston: Morgan & Claypool (2012).

3. Pang and Lee, Opinion mining and sentiment analysis, FNT Inf Retrieval 2 (2008), 1–135.

4. Kharde V.A. and Sonawane S.S., Sentiment Analysis of Twitter Data: A Survey of Techniques, International Journal of Computer Applications 139(11) (2016).

5. Pak and Paroubek, Twitter as a Corpus for Sentiment Analysis and Opinion Mining, In Proceedings of the Seventh Conference on International Language Resources and Evaluation (2010), 1320–1326.

6. Bifet and Frank, Sentiment Knowledge Discovery in Twitter Streaming Data, In Proceedings of the 13th International Conference on Discovery Science, Berlin, Germany: Springer (2010), 1–15.

7. Agarwal, Xie, Vovsha and Rambow R.P., Sentiment Analysis of Twitter Data, In Proceedings of the ACL 2011 Workshop on Languages in Social Media (2011), 30–38.

8. Xia, Zong and Li, Ensemble of feature sets and classification algorithms for sentiment classification, Information Sciences: an International Journal 181(6) (2011), 1138–1152.

9. Apoorv, Boyi, Ilia, Owen and Rebecca, Sentiment analysis of Twitter data, In Proceedings of the Workshop on Languages in Social Media, LSM'11, Association for Computational Linguistics, Stroudsburg, PA, USA (2011), 30–38.

10. Suppala and Rao, Sentiment Analysis Using Naïve Bayes Classifier, International Journal of Innovative Technology and Exploring Engineering 8(8) (2019).

11. Arora, Dhawan and Singh, Sentiment Analysis of Online Movies' Reviews Using Improved k-Nearest Neighbor Classifier, Advances in Computer Science and Information Technology (ACSIT) 3(4) (2016), 241–245.

12. Zainuddin and Selamat, Sentiment analysis using support vector machine, In 2014 International Conference on Computer, Communications, and Control Technology (I4CT), IEEE (2014), 333–337.

13. Mikolov, Karafiat, Burget, Cernocký and Khudanpur, Recurrent neural network based language model, In Interspeech 2 (2010), 3.

14. Legrand and Collobert, Deep neural networks for syntactic parsing of morphologically rich languages, In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (2016).

15. Hemalatha, SaradhiVarma G.P. and Govardhan, Sentiment analysis tool using machine learning algorithms, International Journal of Emerging Trends & Technology in Computer Science (IJETTCS) 2(2) (2013), 105–109.

16. Zhang, He and Chen, Microblog Rumor Detection Based on Comment Sentiment and CNN-LSTM, Journal of Artificial Intelligence in China (2020), 148–156.

17. Jin, Yang and Liu, Stock closing price prediction based on sentiment analysis and LSTM, Neural Computing and Applications (2019).

18. Yang, Li, Fang and Wang, Lexicon-Enhanced LSTM With Attention for General Sentiment Analysis, IEEE Access 6 (2018). https://doi.org/10.1109/ACCESS.2018.2878425.

19. Meng, Qiu, Yu and Wu, Sentiment Analysis of Comment Texts Based on BiLSTM, IEEE Access 7 (2019). https://doi.org/10.1109/ACCESS.2019.2909919.

20. Latif and Qamar, A Novel Ensemble Approach for Feature Selection to Improve and Simplify the Sentimental Analysis, In Intelligent Computing: Proceedings of the Computing Conference, Springer, Cham (2019), 573–592.

21. Graber, Kallumadi, Malberg and Zaunseder, Aspect-based sentiment analysis of drug reviews applying cross-domain and cross-data learning, In Proceedings of the 2018 International Conference on Digital Health (2018), 121–125.

22. Da'u and Salim, Sentiment-Aware Deep Recommender System With Neural Attention Networks, IEEE Access 7 (2019). https://doi.org/10.1109/ACCESS.2019.2907729.

23. Sailunaz and Alhajj, Emotion and Sentiment analysis from Twitter text, Journal of Computational Science, Elsevier (2019).

24. Ahamed and Danti, Effective Emoticon Based Framework for Sentimental Analysis of Web Data, In International Conference on Recent Trends in Image Processing and Pattern Recognition, Springer, Singapore (2018), 622–633.

25. Shaukat, Zulfiqar A.A., Xiao, Azeem and Mahmood, Sentiment analysis on IMDB using lexicon and neural networks, SN Applied Sciences 2(2) (2020), 1–10.

26. Ankita and Saleena, An Ensemble Classification System for Twitter Sentiment Analysis, In International Conference on Computational Intelligence and Data Science, Elsevier (2018).

27. Qi, Tang and Yu, Bidirectional LSTM with self-attention mechanism and multi-channel features for sentiment classification, Neurocomputing 387 (2020), 63–77.

28. Alqaryouti, Siyam, Monem A.A. and Shaalan, Aspect-based sentiment analysis using smart government review data, Applied Computing and Informatics (2019).

29. Ray and Chakrabarti, A Mixed approach of Deep Learning method and Rule-Based method to improve Aspect Level Sentiment Analysis.

30. Akyol and Alatas, Sentiment classification within online social media using whale optimization algorithm and social impact theory based optimization, Physica A: Statistical Mechanics and its Applications 540 (2020), 123094.

31. Yang S.Y., Mo S.Y.K., Liu and Kirilenko A.A., Genetic programming optimization for a sentiment feedback strength based trading strategy, Neurocomputing 264 (2017), 29–41.

32.

Keshavarz

and Abadeh

M.S.

, ALGA: Adaptive lexicon learning using genetic algorithm for sentiment analysis of microblogs, Knowledge-Based Systems 122 (2017), 1–16.

33.

Marcacini

R.M.

, Rossi

R.G.

, Matsuno

I.P.

and Rezende

S.O.

, Cross-domain aspect extraction for sentiment analysis: A transductive learning approach, Decision Support Systems 114 (2018), 70–80.

34.

Meškel≐

and Frasincar

, ALDONAr: A hybrid solution for sentence-level aspect-based sentiment analysis using a lexicalized domain ontology and a regularized neural attention model, Information Processing and Management (2020), 0306-4573/©2020 The Authors. Published by Elsevier Ltd, https://doi.org/10.1016/j.ipm.2020.102211.

35.

Liang

, (Member, IEEE), R. Feng, X. Liu, Y. Li, and X. Zhang, Gltm: A Global and Local Word Embedding-Based Topic Model for Short Texts, IEEE. Translations and content mining, 6 (2018), date of current version August 28, 2018. 2169–3536 2018 IEEE, Digital Object Identifier 10.1109/ACCESS.2018.2863260.

36.

, Sun

, Nie

, Li

and Liu

, An Enhanced LSTM for Trend Following of Time Series”, Xuzhou 221116, China, IEEE Access 7 (2019), Digital Object Identifier 10.1109/ACCESS.2019.2896621.

37.

Wang

, Zhao

, Gao

, Sotelo

M.A.

and Li

, Lane Work-Schedule of Toll Station Based on Queuing Theory and PSO-LSTM Model, IEEE Access 8 (2020), Digital Object Identifier 10.1109/ACCESS.2020.2992070.

38.

https://www.kaggle.com/bittlingmayer/amazonreviews.

39.

https://www.kaggle.com/andrewmvd/trip-advisor-hotel-reviews.

40.

https://www.kaggle.com/abhimicro3/demonetization-twitter-sentiment-analysis.

41.

https://www.kaggle.com/jealousleopard/goodreadsbooks.

42.

Kim

A.Y.

, Ha

J.G.

, Choi

and Moon

, Automated text analysis based on skip-gram model for food evaluation in predicting consumer acceptance, Computational Intelligence and Neuroscience 2018 (2018).

43.

Brahimi

, Touahria

and Tari

, Improving sentiment analysis: A combined approach, Journal of King Saud University – Computer and Information Sciences (2019), 1319–1578@

44.

Sherstinsky, Fundamentals of recurrent neural network (RNN) and long short-term memory (LSTM) network, Physica D: Nonlinear Phenomena 404 (2020), 132306. https://doi.org/10.1016/j.physd.2019.132306.

45.

Tizhoosh H.R., Opposition-based learning: A new scheme for machine intelligence, in: International Conference on Computational Intelligence for Modelling, Control and Automation and International Conference on Intelligent Agents, Web Technologies and Internet Commerce (CIMCA-IAWTIC’06), vol. 1, IEEE, 2005, pp. 695–701.


Table 1
Description of datasets

| Dataset | Total reviews | Positive reviews | Negative reviews | Neutral reviews |
| --- | --- | --- | --- | --- |
| Amazon reviews | 60000 | 25627 | 19567 | 14806 |
| Trip Advisor | 20000 | 10000 | 7000 | 3000 |
| Demonetization Twitter data set | 12974 | 2974 | 4936 | 5064 |
| Books reviews | 213269 | 147268 | 38434 | 27567 |
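
The class balance implied by Table 1 can be checked with a few lines of arithmetic. The sketch below copies the counts from the table; the names `DATASETS`, `class_shares` and `SHARES` are illustrative and do not come from the paper.

```python
# Counts copied from Table 1: (total, positive, negative, neutral).
DATASETS = {
    "Amazon reviews": (60000, 25627, 19567, 14806),
    "Trip Advisor": (20000, 10000, 7000, 3000),
    "Demonetization Twitter data set": (12974, 2974, 4936, 5064),
    "Books reviews": (213269, 147268, 38434, 27567),
}


def class_shares(total, positive, negative, neutral):
    """Return each class's share of the dataset, in percent."""
    if positive + negative + neutral != total:
        raise ValueError("class counts do not add up to the total")
    return {
        "positive": 100.0 * positive / total,
        "negative": 100.0 * negative / total,
        "neutral": 100.0 * neutral / total,
    }


# Share of each sentiment class per dataset.
SHARES = {name: class_shares(*counts) for name, counts in DATASETS.items()}

if __name__ == "__main__":
    for name, shares in SHARES.items():
        print(name, {label: round(value, 1) for label, value in shares.items()})
```

The counts in every row add up to the stated total. The datasets are not equally balanced: Books reviews is roughly 69% positive, while the Demonetization Twitter data set is only about 23% positive.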

Table 2
Hyperparameters for word embedding

| Hyperparameter | Choice |
| --- | --- |
| Architecture | Skip-gram (s1) |
| Window size (w) | 10 |
| Iterations | 30 |
| Vector size | 400 |
| Activation function | Negative sampling |
| Subsampling rate | 1e-3 |
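
As a rough illustration, the settings in Table 2 could be written down as a small configuration object. The paper does not name the word-embedding library, so the mapping to gensim 4.x `Word2Vec` keyword names below is an assumption. `SkipGramConfig`, `to_word2vec_kwargs` and `KWARGS` are hypothetical names, and the number of negative samples is left at the library default because the table does not state it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SkipGramConfig:
    """Word-embedding settings as listed in Table 2."""
    window: int = 10                # window size (w)
    iterations: int = 30            # training iterations
    vector_size: int = 400          # embedding dimension
    subsampling_rate: float = 1e-3  # subsampling of frequent words


def to_word2vec_kwargs(cfg):
    """Map the config to gensim 4.x Word2Vec keyword names (assumed library)."""
    return {
        "sg": 1,    # 1 selects skip-gram, 0 would select CBOW
        "hs": 0,    # no hierarchical softmax, so negative sampling is used
        "window": cfg.window,
        "epochs": cfg.iterations,
        "vector_size": cfg.vector_size,
        "sample": cfg.subsampling_rate,
    }


KWARGS = to_word2vec_kwargs(SkipGramConfig())
```

If gensim is installed, a call such as `Word2Vec(sentences=tokenized_reviews, **KWARGS)` would train embeddings with these settings, where `tokenized_reviews` is a hypothetical list of token lists.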

Table 3
Accuracy (%) obtained at different word embedding (WE) sizes on the Amazon Review dataset

| WE method \ WE size | 100 | 200 | 300 | 400 | 500 |
| --- | --- | --- | --- | --- | --- |
| N-gram | 86 | 88 | 89 | 90.5 | 91.5 |
| Skip-gram | 90.1 | 91.5 | 93.7 | 95.2 | 96.8 |

Table 5
Parameters used in this work

| Hyperparameter | Value |
| --- | --- |
| Algorithm | LSTM |
| Epochs | 500 |
| Batch size | 32 |
| Optimizer | APSO |
| Number of layers | 3 |
| Loss function | MSE |
| Learning rate | 0.001 |
| Dropout rate | 0.5 |
| Number of hidden units | 20 |
| Training | 80% |
| Testing | 20% |
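
To give a sense of scale for the settings in Table 5, the sketch below computes the sizes of an 80/20 split and the number of mini-batches per epoch at batch size 32. The rounding rule (floor for the training part, a final partial batch counted as one step) is an assumption, the paper does not say whether the split was shuffled or stratified, and the function names are illustrative.

```python
import math

# Values taken from Table 5.
TRAIN_PERCENT = 80
BATCH_SIZE = 32
EPOCHS = 500


def split_sizes(n_samples, train_percent=TRAIN_PERCENT):
    """Return (n_train, n_test); integer arithmetic avoids float rounding."""
    n_train = n_samples * train_percent // 100
    return n_train, n_samples - n_train


def batches_per_epoch(n_train, batch_size=BATCH_SIZE):
    """Number of mini-batches per epoch, counting a final partial batch."""
    return math.ceil(n_train / batch_size)


if __name__ == "__main__":
    # Amazon reviews total from Table 1.
    n_train, n_test = split_sizes(60000)
    steps = batches_per_epoch(n_train)
    print(n_train, n_test, steps, steps * EPOCHS)
```

Under these assumptions, the 60000 Amazon reviews give 48000 training and 12000 test samples, which is 1500 batches per epoch and 750000 batch evaluations over 500 epochs.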