Abstract
Sentiment analysis has been gaining importance in many applications, such as recommendation systems, decision support, and prediction models. It helps to understand and evaluate public opinion regarding social events, products and services, and political trends, especially the feelings that users express through comments on social networks such as Twitter, Facebook, and Instagram. Many studies have addressed tweet sentiment analysis with high accuracy, particularly for tweets that express a single sentiment towards a single object. However, classification results are less accurate in the following cases: a user expresses multiple sentiments towards a single object in a tweet; a user expresses multiple sentiments towards multiple objects; or a user expresses a single sentiment towards multiple objects. Furthermore, previous studies analyze only the sentiment of each tweet, without considering the objects or the sentiment towards each object across an entire set of tweets. To address these limitations, this study proposes an approach based on integrating the sentiment towards a particular object from all tweets related to that object. The proposed method determines the objects and indicates the sentiment towards each specific object by combining the sentiments related to that object from the entire set of tweets. In the experimental evaluation, the proposed method achieved fairly good results in terms of the error ratio and the information obtained.
Introduction
In recent years, social network services have become widespread and necessary communication platforms in everyday life. Twitter is among the most popular social networks worldwide. As of the third quarter of 2018, Twitter averaged 326 million monthly active users¹. Twitter is very convenient to use, and celebrities, corporations, and organizations create and manage Twitter accounts to interact with their fans and the public. Moreover, Twitter is probably the primary communication channel for many people. Twitter users can freely publish tweets to express their feelings, find their friends, follow famous people they are interested in, and access information at any time. Hence, the number of tweets is increasing rapidly, and their rate of generation is remarkably high. Twitter has therefore become a source of meaningful information for anyone who knows how to use it. The sentiments that users express through tweets can serve as reviews and survey responses reflecting user opinions on social media, for applications ranging from marketing to customer service. Therefore, sentiment analysis is a topic of interest to many researchers, as it can be used to exploit the useful information on social networks [10].
Sentiment analysis is defined as the process of extracting subjective information from a document, sentence, or term [19]. Not only does it include extracting the polarity [24], stance [18], target [8], and holder of an opinion [13], but it also involves analyzing the implicatures or effects to indicate the aspects of the goal that people like or dislike [11].
The methods used for sentiment analysis can be divided into three levels, namely the document level, the sentence level, and the entity and aspect level [14]. The document level provides the polarity to classify a document as positive, negative, or neutral. The sentence level provides the polarity for classifying a sentence as either objective or subjective; the sentiment of the subjective sentences is in turn classified into two classes: positive or negative. The entity and aspect level explains what people like or dislike. In this study, the second and the third levels of sentiment analysis are examined.
Several methods have been published to solve the sentiment analysis problem. The methods used in previous studies can be divided into three types [23]. The first approach relies on well-known machine learning algorithms. However, this approach has the following limitations: supervised methods need a large number of labeled training tweets, while unsupervised techniques still have to collect unlabeled training tweets [16]. The second approach is based on lexicon-based methods [1, 6] and depends on finding a sentiment lexicon. The limitation of lexicon-based methods is that they are very time consuming and cannot be used alone; they must be combined with other automated methods as a final check to avoid mistakes. The third approach consists of hybrid methods that combine the two approaches above [2, 12]. These methods are widespread and play a pivotal role in the majority of sentiment analysis systems.
In general, the previous methods achieve accurate results only for simple tweets that express a single sentiment towards a single object; for instance, the tweet “I have the worst anxiety because of the weather” expresses a negative sentiment towards a single object, weather. However, they often produce inaccurate classification results in the following cases:
Tweets expressing two different sentiments for two different objects: For example, the tweet “I do not like rain, but I like the sunshine” would be classified as neutral by the aforementioned methods. This result is inaccurate because the user expresses a negative sentiment for the object “rain” and a positive sentiment for the object “sunshine.”
Tweets that present multiple sentiments for a single object: For example, the tweet “Rain is nothing but a thorn of roses” would be classified as neutral by the aforementioned methods although the actual expression is negative.
Tweets that express a single sentiment for two different objects: For example, the tweet “I like rain, and I like the sunshine” would be classified as positive by the aforementioned methods. Although this is correct, it does not fully capture the user’s opinion.
Additionally, most of the previous studies implemented the following two steps: The first step was to determine whether a tweet expresses an opinion or not, and the second step classified the opinion tweets as positive and negative classes. The result of this process provides only the polarity scores of the tweets, while the intended objects remain unknown. For example, regarding the iPhone, one user tweeted: “The color is good,” while another user tweeted: “The sound is not good.” A sentiment analysis using the previous methods would classify the first tweet as positive and the second tweet as negative. Therefore, the manufacturer would not know which part of the iPhone received a positive response and which part received a negative one.
Therefore, a mechanism is required to address the above problems; and this motivates us to propose a method that would perform a sentiment analysis by integrating the sentiments from tweets with respect to a particular object. The proposed method involves the following steps: First, a large number of tweets that belong to a specific topic from several users are collected. Second, the objects and the sentiments with respect to each object are determined by using the Bidirectional Long Short-Term Memory (BiLSTM) and Conditional Random Field (CRF) algorithms. Third, the sentiment of each tweet is determined by combining the objects and sentiments related to the objects in each tweet. Finally, the sentiment for each object from all tweets is integrated by computing the sentiment score of that object based on the frequency of the positive and negative components in all tweets.
The difference between the proposed method and the previous methods is that not only does this method classify the sentiment of each tweet such as negative, positive, or neutral but it also provides the objects related to the sentiments. In particular, this method integrates the overall sentiment for each object from tweets. The presented differences are also the main contributions of this study.
The rest of the paper is organized as follows: In Section 2, we briefly summarize the previous work on sentiment analysis approaches. The research problem of this study is described in Section 3. Section 4 presents the proposed method and the experimental results are presented in Section 5. The conclusion and future works are presented in Section 6.
Related works
Many studies on the sentiment analysis of tweets have been published, and they can be divided into the following three methods: lexicon-based, machine learning-based, and hybrid. The machine learning-based method focuses on extracting various features related to the lexical, syntactic, semantic, and sentiment-polarity properties of tweets, such as context information, Part-Of-Speech (POS) tags, n-grams, sentiment words, and word embeddings. A classifier is then built on these features by using a machine learning algorithm, for example, Naive Bayes [7] or Support Vector Machine [3]. The lexicon-based method indicates the overall sentiment tendency of a given text by employing lexicons of words weighted with their sentiment, such as SentiWordNet (SWN) [4]. The hybrid approach combines the speed of the lexical approach and the accuracy of the machine learning-based approach; moreover, deep learning methods are known as the state-of-the-art methods in the area of sentiment analysis [20, 27]. Thus, the existing research has focused on classifying tweets into three classes, namely positive, negative, and neutral. In this section, we analyze some specific cases to explain the motivation for our proposal.
Atanu Dey et al. [6] presented a methodology to create a lexicon known as Senti-N-Gram. In this approach, the n-gram sentiment scores were extracted by using rules. These scores were then compared with the analysis made by human annotators and found to be statistically equivalent. Finally, a ratio-based sentence-level sentiment classification approach was proposed, which counts the number of positive and negative sentences. This approach has the following advantages: i) It is fully automatic and domain independent. ii) Human annotators are eliminated, thereby reducing human bias as well as the cost. iii) The computation time is further reduced in the case of n-gram based sentiment analysis, as the score from the stored lexicon can be reused. However, the method also has certain limitations: i) It cannot process all types of sentences, for example, compound, conditional, and ironic sentences. ii) The object related to the sentiment remains unknown. iii) The sentiment of all tweets related to each object is not extracted.
Asghar et al. [1] proposed a lexicon-based method to classify user sentiments from online communities. They combined different lexicons (e.g., SentiWordNet (SWN), user-defined dictionaries) and different classifiers (e.g., an SWN-based classifier, negation classifier, domain-specific classifier). The advantage of this approach is that not only the relationships among words but also domain-specific words are extracted to resolve the issue of domain dependency. However, this approach has the following drawbacks: i) Compound sentences and irony sentences are not considered. ii) The object of the sentiment is not identified. iii) The sentiment of all tweets related to each object is not extracted.
In study [2], the authors presented a unified framework for classifying tweets using a hybrid classification scheme. This method aims to improve the performance of Twitter-based sentiment analysis systems by combining four classifiers, namely a slang classifier, an emoticon classifier, the SWN classifier, and an improved domain-specific classifier. This approach has the following advantages: i) Classification results are better than those of the baseline methods. ii) The framework is generalized and can classify tweets in any domain. However, the method has the following limitations: i) The results are inaccurate for tweets containing compound sentences. ii) The method only focuses on analyzing the sentiment of each tweet. iii) The object of the sentiment is not extracted.
In the paper [27], the authors introduced a word embeddings method using unsupervised learning based on contextual latent semantic relationships and co-occurrence characteristics of words in tweets. These word embeddings were combined with n-grams features and the sentiment scores for words to create a set of sentiment features of tweets. This features set was integrated into a deep convolutional neural network for training and predicting sentiment labels. The authors compared the performance of their model to the baseline model on five Twitter data sets, and the results indicated that their model performed better for tweet sentiment analysis. However, there were some limitations: they could only classify tweets into two sentiment groups, i.e., positive and negative; they could not decide the object of the sentiment; and could not integrate sentiment for each object from all tweets.
In general, the previous studies have been able to classify tweets in any domain with high accuracy. However, these methods produce inaccurate results for tweets that express multiple sentiments; for example, a tweet such as “wind, wind, wind, I hate wind, I only like snow” would be labeled as neutral by these systems, whereas the actual result is negative for wind and positive for snow. Moreover, the accuracy of the classification results decreases for compound tweets; for example, a tweet such as “Rain is a kind of devil” is labeled as neutral by these systems although it should be labeled as negative. In addition, the object of the sentiment is not considered; for example, the tweet “I like the most wind and hate the most sunshine” contains two objects, “wind” and “sunshine,” corresponding to two sentiments, positive and negative. Moreover, the previous methods only indicate the sentiment of each tweet without presenting the general sentiment of all tweets towards each object.
Based on the above analysis of the advantages and limitations of several previous methods, this study addresses several of these issues together: extracting the object of the sentiment, analyzing the sentiment of each tweet, and integrating the sentiment of all tweets towards each object. To this end, we combine the BiLSTM and CRF algorithms and use a rating technique based on sentiment scores.
Research problem formalization
This section presents the basic concepts and definitions related to tweets such as objects, the sentiment of each tweet, and the sentiment related to each object in the entire set of tweets. The research problems are stated at the end of this section.
Definitions
We assume a finite set of tweets, $T$, that represents the opinions of users about a specific topic, and is represented as follows:
$T = \{t_1, t_2, \ldots, t_n\}$
Let Pt be a set of positive words and Nt be a set of negative words. To determine the necessary elements in tweets, each tweet has to be separated into a set of tokens. Let $t_i = \{w_1, w_2, \ldots, w_g\}$ be the set of tokens of tweet $t_i$ from $T$.
A tweet can have many objects. Let $O_i = \{o_1, o_2, \ldots, o_m\}$ be the set of objects of $t_i$, in which $o_j$ ($j = 1, 2, \ldots, m$) is an object assigned to the sentiment related to the chosen topic.
Assume that $S_T = \{s_{t_1}, s_{t_2}, \ldots, s_{t_n}\}$ is the set of sentiments from $T$, where $s_{t_i}$ is the sentiment of tweet $t_i$.
Assume that $S_O = \{s_{o_1}, s_{o_2}, \ldots, s_{o_m}\}$ is the set of sentiments according to objects in the entire set of tweets, and elements in $S_O$ can be repetitive. Let
In this study, we focus on finding a method to answer the following main question: How can the sentiment of objects be determined by integrating the sentiments of those objects from tweets? This question is partitioned into the following three sub-questions: For a given set of tweets (T), how can the objects and their sentiments in each tweet be determined? How can the sentiment of each tweet be expressed based on the sentiments of the objects in that tweet? How can a sentiment integration process be performed to determine the sentiment of objects from the entire set of tweets?
Methods
This section presents the method for solving the problems identified in Section 3.2. The proposed method consists of three main steps: first, determining the objects to which sentiments refer; second, indicating the sentiment of each tweet; and finally, analyzing the sentiment for each object by integrating the sentiments from tweets. The workflow of the method is shown in Fig.1. The following subsections explain the details of the proposed method.

Fig. 1. Workflow of sentiment analysis of objects by integrating sentiment from tweets.
To determine the objects related to sentiments, features are extracted from each tweet. In this work, information about the lexical, syntactic, semantic, and sentiment-polarity properties of words is employed as features.
n-grams: The n-grams include 1-grams, 2-grams, and 3-grams. Each type of n-gram is weighted by its term frequency-inverse document frequency (TF-IDF) value. Each n-gram appearing in a tweet becomes an entry in the feature vector, with its TF-IDF value as the corresponding feature value.
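As an illustration of this feature, the following minimal sketch builds 1-gram to 3-gram TF-IDF features for a toy set of tweets. The function names (`ngrams`, `tfidf_features`), the whitespace tokenization, and the plain `tf * log(N / df)` weighting are assumptions made for the example; the exact TF-IDF variant used in the study is not specified.

```python
import math
from collections import Counter

def ngrams(tokens, n_max=3):
    """Return all 1-grams up to n_max-grams of a token list as strings."""
    return [" ".join(tokens[i:i + n])
            for n in range(1, n_max + 1)
            for i in range(len(tokens) - n + 1)]

def tfidf_features(tweets, n_max=3):
    """Map each tweet to a {ngram: tf-idf} dict (assumed weighting: tf * log(N / df))."""
    grams_per_tweet = [Counter(ngrams(t.lower().split(), n_max)) for t in tweets]
    n_docs = len(tweets)
    # Document frequency: number of tweets that contain each n-gram.
    df = Counter(g for grams in grams_per_tweet for g in grams)
    features = []
    for grams in grams_per_tweet:
        total = sum(grams.values())
        features.append({g: (c / total) * math.log(n_docs / df[g])
                         for g, c in grams.items()})
    return features

tweets = ["i like rain", "i like the sunshine"]
feats = tfidf_features(tweets)
# An n-gram shared by every tweet (e.g. "i like") gets weight 0,
# while an n-gram unique to one tweet (e.g. "rain") gets a positive weight.
```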
POS tags of words: POS tags are used as features to identify words that are adjectives, verbs, adverbs, nouns, and noun phrases. These words help to determine the sentiments and the objects of sentiments in each tweet. The NLTK toolkit [5] is used to annotate the POS tags and to tokenize all the tweets. The POS tags and their corresponding TF-IDF values are the syntactic features and feature values, respectively.
Word embeddings: A significant challenge in handling tweets is that the vocabulary and syntax used in a tweet are informal and very different from those of normal sentences. Hence, it is challenging to correctly capture the properties related to the lexical and syntactic features. This problem is addressed by applying the word embeddings approach to compute tweet embedding representations.
In this study, the 300-dimensional pre-trained word embeddings from GloVe² are used to compute a tweet embedding as the average of the embeddings of the words in the tweet.
The GloVe model is applied because it is an extension of the Word2vec model. This approach combines the global statistics of matrix factorization techniques, such as latent semantic analysis, with the local context-based learning in Word2vec [25]. Moreover, rather than using a window to define the local context, the GloVe model constructs an explicit word-context or word co-occurrence matrix using statistics across the entire text corpus. Therefore, in general, the model may result in better word embeddings.
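The averaging step can be sketched as follows. The 3-dimensional `toy` vectors stand in for the 300-dimensional pre-trained embeddings, and skipping out-of-vocabulary tokens is an assumption of this example, since the handling of unknown words is not described.

```python
def tweet_embedding(tokens, embeddings, dim):
    """Average the vectors of the tokens that are present in `embeddings`."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        # Assumed fallback: a zero vector when no token has an embedding.
        return [0.0] * dim
    return [sum(v[k] for v in vectors) / len(vectors) for k in range(dim)]

# Hypothetical toy vectors, used only to make the sketch runnable.
toy = {"like": [1.0, 0.0, 2.0], "rain": [3.0, 2.0, 0.0]}
vec = tweet_embedding(["i", "like", "rain"], toy, dim=3)
```

Here "i" has no vector and is skipped, so the tweet embedding is the mean of the vectors for "like" and "rain".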
Special words: Special words include negation, intensifier, and diminisher words. Special words can modify the polarity of sentiment words; for example, if a negation word appears before a positive word, then the sentiment of the tweet is changed to negative, and vice versa. These features are extracted by searching a window of one to three words before each sentiment word for such special words.
Sentiment words: Sentiment words have a significant role in determining the sentiment polarity of tweets. Sentiment words consist of positive and negative words. The number of sentiment words in each tweet is used as a feature. The sentiment dictionaries provided by Hu and Liu [8] are employed for determining the positive and negative words in a tweet.
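The special-word and sentiment-word features can be sketched as below. The small word sets are hypothetical stand-ins for the sentiment dictionaries and for the lists of negation, intensifier, and diminisher words, which are not reproduced here; the function names are also illustrative.

```python
# Toy stand-ins for the real lexicons (assumptions for illustration only).
POSITIVE = {"like", "good", "lovely", "nice"}
NEGATIVE = {"hate", "sad", "uncomfortable"}
NEGATIONS = {"not", "never", "no"}
INTENSIFIERS = {"very", "really"}
DIMINISHERS = {"slightly", "somewhat"}

def special_word_features(tokens, window=3):
    """For each sentiment word, look at up to `window` preceding tokens."""
    features = []
    for i, tok in enumerate(tokens):
        if tok not in POSITIVE and tok not in NEGATIVE:
            continue
        context = tokens[max(0, i - window):i]
        features.append({
            "word": tok,
            "polarity": "positive" if tok in POSITIVE else "negative",
            "negated": any(w in NEGATIONS for w in context),
            "intensified": any(w in INTENSIFIERS for w in context),
            "diminished": any(w in DIMINISHERS for w in context),
        })
    return features

def sentiment_word_counts(tokens):
    """Return (number of positive words, number of negative words)."""
    return (sum(t in POSITIVE for t in tokens),
            sum(t in NEGATIVE for t in tokens))

tokens = "i do not like rain but i really like the sunshine".split()
feats = special_word_features(tokens)
counts = sentiment_word_counts(tokens)
```

In this toy tweet the first "like" is flagged as negated and the second as intensified. A fixed window of this kind can cross clause boundaries, which is one reason such features are only inputs to the learned model rather than final decisions.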
Determining objects of sentiments in each tweet
A combination of the BiLSTM and CRF models [9] is used to determine objects and their sentiments in each tweet. The combination leverages the advantages of both models: the ability of the LSTM model to extract features and the stable predictive ability of the CRF model [22].
The model operates as follows. The word embedding of each word is fed into the BiLSTM layer to extract the features presented in Section 4.1. The CRF layer then uses these features to predict a label for each word. In addition to the information received from the BiLSTM layer, the CRF layer also relies on information from the previously predicted labels; for example, if the previous word was an object, it is highly probable that the present word is a sentiment word.
The parameters of the model include the word-embedding matrix, the weight matrix of the BiLSTM layer, and the transition matrix of the CRF layer. All these parameters are updated during training on the labeled tweets via the back-propagation algorithm with Stochastic Gradient Descent (SGD). Moreover, during training, the dropout technique is applied to avoid over-fitting.
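To make the role of the CRF layer concrete, the sketch below shows Viterbi decoding, a standard way to find the best label sequence from per-token scores and label-transition scores. It is not the authors' implementation: the emission scores, which would come from the BiLSTM layer, and the transition scores are hypothetical numbers chosen for illustration.

```python
def viterbi(emissions, transitions, labels):
    """Return the highest-scoring label sequence.

    emissions: list of {label: score}, one dict per token.
    transitions: {(previous_label, label): score}; missing pairs score 0.
    """
    # best[i][y] is the best score of any label path ending in y at token i.
    best = [{y: emissions[0][y] for y in labels}]
    back = []
    for emit in emissions[1:]:
        scores, pointers = {}, {}
        for y in labels:
            prev = max(labels,
                       key=lambda p: best[-1][p] + transitions.get((p, y), 0.0))
            scores[y] = best[-1][prev] + transitions.get((prev, y), 0.0) + emit[y]
            pointers[y] = prev
        best.append(scores)
        back.append(pointers)
    # Follow the back-pointers from the best final label.
    last = max(labels, key=lambda y: best[-1][y])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]

labels = ["Object", "Positive", "Negative", "Other"]
# Hypothetical scores for the tokens "sunshine", "is", "lovely".
emissions = [
    {"Object": 2.0, "Positive": 0.5, "Negative": 0.0, "Other": 1.0},
    {"Object": 0.0, "Positive": 0.0, "Negative": 0.0, "Other": 2.0},
    {"Object": 1.2, "Positive": 1.0, "Negative": 0.0, "Other": 0.5},
]
transitions = {("Object", "Other"): 0.5,
               ("Other", "Positive"): 1.0,
               ("Other", "Object"): -0.5}
path = viterbi(emissions, transitions, labels)
```

With these numbers, the token scores alone would slightly prefer Object for the last token, but the transition scores make Positive the better choice in context, which mirrors the dependence on previous labels described above.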
In addition, to increase the accuracy of identifying sentiments and the objects of sentiments in each tweet, we define a set of rules and label the training data based on these rules as follows:
Rule 1 (r1): If the tweet contains sentiment words that modify a noun or noun phrase (denoted by n), then n is marked as a word indicating the object; for example, in “I have a nice bag.” or “She has nice eyes.”, “bag” and “eyes” are identified as objects of sentiments.
Rule 2 (r2): If the tweet contains sentiment words and n follows a preposition, then n is a word indicating the object in the tweet; e.g., in “iPhone is good at sound”, “sound” is one object.
Rule 3 (r3): If the tweet contains sentiment words and n follows a normal verb that carries sentiment, then n is a word indicating the object in the tweet; e.g., in “I like sunshine”, “sunshine” is determined as one object.
Rule 4 (r4): If the tweet contains sentiment words and n precedes a form of the verb “to be”, then n is determined as a word indicating the object; e.g., in “The flower is nice.”, following this rule, we have one object, “flower.”
The order of priority for the rules is r1, r2, r3, r4; this means that r1 takes precedence over r2, and so on; for example, when in a tweet, n satisfies both r2 and r4 then r2 is preferred.
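The priority order can be sketched as a simple lookup. The function name `select_rule` and the representation of matched rules as a set of rule identifiers are assumptions for illustration; how each rule is matched against a tweet is not shown.

```python
RULE_PRIORITY = ["r1", "r2", "r3", "r4"]  # highest priority first

def select_rule(matched_rules):
    """Return the highest-priority rule among those a noun satisfies, or None."""
    for rule in RULE_PRIORITY:
        if rule in matched_rules:
            return rule
    return None

chosen = select_rule({"r4", "r2"})
```

For a noun that satisfies both r2 and r4, the lookup returns r2, matching the example above.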
The following example (Ex.1) illustrates in detail the steps to determine the objects of sentiments in tweets.
t1: “I do not like rain, but I like the sunshine.”
t2: “Rain is nothing but a thorn of roses.”
t3: “I like rain, and I like the sunshine.”
t4: “Sunshine is lovely, but the wind is uncomfortable.”
t5: “The rain is sad, and the sunshine is also sad.”
t6: “Sunshine is good for me.”
a) For t1: tag(rain) = ‘NOUN’, not like ∈Nt, and Θ (notlike, rain) =1. Thus o1 = rain. tag(sunshine)=‘NOUN’, like ∈Pt, and Θ (like, sunshine) =1. Thus o2 = sunshine. From o1 and o2 as a result O1 = {rain, sunshine}.
b) For t2: tag(rain) = ‘NOUN’, nothing ∈Nt, and Θ (nothing, rain) =1. Thus o1 = rain. From o1 as a result O2 = {rain}.
c) For t3: tag(rain) = ‘NOUN’, like ∈Pt, and Θ (like, rain) =1. Thus o1 = rain. tag(sunshine) = ‘NOUN’, like ∈Pt, and Θ (like, sunshine) =1. Thus o2 = sunshine. From o1 and o2 as a result O3 = {rain, sunshine}.
d) For t4: tag(sunshine) = ‘NOUN’, lovely ∈Pt, and Θ (lovely, sunshine) =1. Thus o1 = sunshine. tag(wind) = ‘NOUN’, uncomfortable ∈Nt, and Θ (uncomfortable, wind) =1. Thus o2 = wind. From o1 and o2 as a result O4 = {sunshine, wind}.
e) For t5: tag(rain) = ‘NOUN’, sad ∈Nt, and Θ (sad, rain) =1. Thus o1 = rain. tag(sunshine) = ‘NOUN’, sad ∈Nt, and Θ (sad, sunshine) =1. Thus o2 = sunshine. From o1 and o2 as a result O5 = {rain, sunshine}.
f) For t6: tag(sunshine) = ‘NOUN’, good ∈Pt, and Θ (good, sunshine) =1. Thus o1 = sunshine. From o1 as a result O6 = {sunshine}.
From O1, O2, O3, O4, O5, and O6 as a result O = {rain, sunshine, wind}.
Determining sentiment of each tweet
The sentiment of each tweet is determined in two steps. First, the sentiments of the objects in each tweet are determined by using the combination of the BiLSTM and CRF models, as explained in Section 4.2. Second, the objects and the sentiments related to these objects in each tweet are combined to create the sentiment of the tweet. In addition, to support the determination of the sentiments of objects, the following set of rules is devised to label the training data:
Rule 5 (r5): Rule related to negation words. This rule is used to identify the sentiments of objects when negation words appear before sentiment words in the tweet (see Table 1).
Table 1. Description of r5
Rule 6 (r6): Rule related to intensifier words. This rule is applied to decide the sentiments of objects when the intensifier words appear before sentiment words in the tweet (see Table 2).
Table 2. Description of r6
Rule 7 (r7): Rule related to diminisher words. This rule is employed to determine the sentiments of objects when the diminisher words precede the sentiment words in the tweet (see Table 3).
Table 3. Description of r7
Rule 8 (r8): Rule related to sentiment words. This rule is used to identify the sentiments of objects when consecutive sentiment words appear in the tweet (see Table 4).
Table 4. Description of r8
The order of priority of the rules r5, r6, r7, and r8 is from left to right, meaning that if a rule to the left is satisfied in the tweet, that rule takes precedence. After the objects and the sentiments of the objects are extracted, the sentiment of each tweet is analyzed by applying Definition 3 in Section 3.1. For greater clarity, the following example (Ex.2) is worked through based on the objects extracted in Ex.1.
a) For t1: o1 = rain, notlike ∈ Nt, Θ(notlike, rain) = 1. Hence, $e_{o_1}$ = ¬rain. o2 = sunshine, like ∈ Pt, Θ(like, sunshine) = 1. Hence, $e_{o_2}$ = sunshine. From $e_{o_1}$ and $e_{o_2}$ as a result $s_{t_1}$ = {¬rain, sunshine}.
b) For t2: o1 = rain, nothing ∈ Nt, Θ(nothing, rain) = 1. Hence, $e_{o_1}$ = ¬rain. From $e_{o_1}$ as a result $s_{t_2}$ = {¬rain}.
c) For t3: o1 = rain, like ∈ Pt, Θ(like, rain) = 1. Hence, $e_{o_1}$ = rain. o2 = sunshine, like ∈ Pt, Θ(like, sunshine) = 1. Hence, $e_{o_2}$ = sunshine. From $e_{o_1}$ and $e_{o_2}$ as a result $s_{t_3}$ = {rain, sunshine}.
d) For t4: o1 = sunshine, lovely ∈ Pt, Θ(lovely, sunshine) = 1. Hence, $e_{o_1}$ = sunshine. o2 = wind, uncomfortable ∈ Nt, Θ(uncomfortable, wind) = 1. Hence, $e_{o_2}$ = ¬wind. From $e_{o_1}$ and $e_{o_2}$ as a result $s_{t_4}$ = {sunshine, ¬wind}.
e) For t5: o1 = rain, sad ∈ Nt, Θ(sad, rain) = 1. Hence, $e_{o_1}$ = ¬rain. o2 = sunshine, sad ∈ Nt, Θ(sad, sunshine) = 1. Hence, $e_{o_2}$ = ¬sunshine. From $e_{o_1}$ and $e_{o_2}$ as a result $s_{t_5}$ = {¬rain, ¬sunshine}.
f) For t6: o1 = sunshine, good ∈ Pt, Θ(good, sunshine) = 1. Hence, $e_{o_1}$ = sunshine. From $e_{o_1}$ as a result $s_{t_6}$ = {sunshine}.
Therefore, $S_T$ = {{¬rain, sunshine}, {¬rain}, {rain, sunshine}, {sunshine, ¬wind}, {¬rain, ¬sunshine}, {sunshine}}.
The results of Ex.2 show that, unlike previous methods, this method yields more detailed results. For example, $s_{t_2}$ = {¬rain} means that t2 has a negative sentiment, which is assigned to the object rain. In other words, the object of the sentiment is identified in addition to the sentiment of the tweet.
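The combination step in Ex.2 can be sketched in a few lines. The `extracted` dictionary is a hand-written stand-in for the output of the BiLSTM-CRF tagger on the six example tweets, and the string encoding of a negative sentiment as "¬" plus the object name is an assumption that simply mirrors the notation above.

```python
# (object, polarity) pairs per tweet, as obtained in Ex.1 and Ex.2.
extracted = {
    "t1": [("rain", "negative"), ("sunshine", "positive")],
    "t2": [("rain", "negative")],
    "t3": [("rain", "positive"), ("sunshine", "positive")],
    "t4": [("sunshine", "positive"), ("wind", "negative")],
    "t5": [("rain", "negative"), ("sunshine", "negative")],
    "t6": [("sunshine", "positive")],
}

def tweet_sentiment(pairs):
    """Combine the object sentiments of one tweet into a set such as {'¬rain', 'sunshine'}."""
    return {obj if pol == "positive" else "¬" + obj for obj, pol in pairs}

# Sentiment of each tweet, and the set of all objects.
S_T = {tid: tweet_sentiment(pairs) for tid, pairs in extracted.items()}
all_objects = {obj for pairs in extracted.values() for obj, _ in pairs}
```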
This section focuses on determining the overall sentiment of each object by integrating the sentiments from all tweets. For each object, we compute the sentiment scores of the positive and negative components by applying Equations 5 and 6, based on the frequency of these components in all tweets. Then the sentiment score of the object is calculated by using Equation 7. Furthermore, to analyze the sentiment of each object in more detail, we define a scale that divides the sentiment of each object into different levels based on its sentiment score. In this work, the thresholds for the sentiment score are chosen in the interval [-1,+1] based on empirical observation and some previous methods [21, 26]. The steps to integrate the sentiments for each object from tweets are explained below in Ex.3.
a) For o1 = rain: frequency(rain+) = 1, frequency(rain-) = 3, score(rain+) = 0.25, score(rain-) = 0.75, score(rain) = 0.25 - 0.75 = -0.5. Thus $se_{rain}$ = negative. Therefore, $s_{rain}$ = ⟨rain, negative⟩.
b) For o2 = sunshine: frequency(sunshine+) = 4, frequency(sunshine-) = 1, score(sunshine+) = 0.8, score(sunshine-) = 0.2, score(sunshine) = 0.8 - 0.2 = 0.6. Thus $se_{sunshine}$ = very positive. Therefore, $s_{sunshine}$ = ⟨sunshine, very positive⟩.
c) For o3 = wind: frequency(wind+) = 0, frequency(wind-) = 1, score(wind+) = 0, score(wind-) = 1, score(wind) = 0 - 1 = -1. Thus $se_{wind}$ = very negative. Therefore, $s_{wind}$ = ⟨wind, very negative⟩.
Hence, $S_O$ = {⟨rain, negative⟩, ⟨sunshine, very positive⟩, ⟨wind, very negative⟩}.
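The arithmetic of Ex.3 can be reproduced with the sketch below. The positive and negative scores are computed as relative frequencies, which matches the numbers in the example. The thresholds in `level` are an assumption chosen only to be consistent with the three results of Ex.3; the scale actually used in the study is not reproduced here.

```python
from collections import Counter

# Tweet-level sentiments from Ex.2 as (object, polarity) pairs.
tweet_sentiments = [
    [("rain", "negative"), ("sunshine", "positive")],
    [("rain", "negative")],
    [("rain", "positive"), ("sunshine", "positive")],
    [("sunshine", "positive"), ("wind", "negative")],
    [("rain", "negative"), ("sunshine", "negative")],
    [("sunshine", "positive")],
]

def object_scores(sentiments):
    """Return {object: positive score - negative score} over all tweets."""
    pos, neg = Counter(), Counter()
    for pairs in sentiments:
        for obj, pol in pairs:
            (pos if pol == "positive" else neg)[obj] += 1
    scores = {}
    for obj in set(pos) | set(neg):
        total = pos[obj] + neg[obj]
        scores[obj] = pos[obj] / total - neg[obj] / total
    return scores

def level(score):
    """Map a score in [-1, +1] to a sentiment level (assumed thresholds)."""
    if score > 0.5:
        return "very positive"
    if score > 0:
        return "positive"
    if score == 0:
        return "neutral"
    if score >= -0.5:
        return "negative"
    return "very negative"

scores = object_scores(tweet_sentiments)
S_O = {obj: level(s) for obj, s in scores.items()}
```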
Experiment
Focusing on three research problems presented in Section 3.2, we proposed a method and verified it through experimentation on a real dataset. This section presents and analyzes the experimental results of the proposed method.
Data acquisition
In this study, the Python package Tweepy³ is used to collect 4758 English tweets related to the Weather topic. The data is pre-processed to remove unnecessary elements such as punctuation marks, re-tweet symbols, URLs, hashtags, and the query term. In addition, each emoji icon in a tweet is replaced with a descriptive text by using the Python emoji package⁴. Since tweets are usually informal, and users often use acronyms and misspellings, the accuracy of the results can be affected. Therefore, the Python-based Aspell library⁵ is employed to correct the spelling in tweets. We scan all words in the tweets and search for words belonging to the two available sentiment dictionaries. Tweets that do not contain any word from these dictionaries are not considered. Since each tweet represents a person’s opinion on a topic, a tweet is unlikely to contain two opposing sentiments directed at the same object. Therefore, we do not consider tweets that indicate two opposing sentiments for the same object. After pre-processing, 2057 tweets remain. These tweets are then divided and stored in two separate database files for training and testing. The training data consists of 1440 tweets, and the testing data includes 617 tweets. The data was annotated with three labels (i.e., Positive, Negative, and Object) by five human annotators. The training data file contains two columns separated by a tab. Each token is placed on a separate line such that the first item on each line is a token and the second item is its label. Tokens that do not belong to the categories of interest were annotated as Other. We annotated the test set as the gold standard to assess the performance. In the test set, 849 tokens indicating objects, 557 tokens indicating positive sentiment, and 292 tokens indicating negative sentiment need to be predicted.
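The cleaning and filtering steps described above can be sketched with the standard library alone. The regular expressions, the function names, and the toy lexicon are assumptions for illustration; emoji replacement and spelling correction are omitted because they rely on the external packages mentioned in the text.

```python
import re
import string

def clean_tweet(text, query_term=None):
    """Remove URLs, re-tweet symbols, hashtags, the query term, and punctuation."""
    text = re.sub(r"https?://\S+", " ", text)   # URLs
    text = re.sub(r"\bRT\b", " ", text)         # re-tweet symbol
    text = re.sub(r"#\w+", " ", text)           # hashtags
    if query_term:
        text = re.sub(r"\b%s\b" % re.escape(query_term), " ", text,
                      flags=re.IGNORECASE)
    text = text.translate(str.maketrans("", "", string.punctuation))
    return " ".join(text.lower().split())

def has_sentiment_word(text, lexicon):
    """Keep only tweets that contain at least one word from the sentiment lexicon."""
    return any(tok in lexicon for tok in text.split())

raw = "RT I do not like rain, but I like the sunshine! #weather https://t.co/abc"
cleaned = clean_tweet(raw)
toy_lexicon = {"like", "sad", "good"}  # hypothetical stand-in for the dictionaries
keep = has_sentiment_word(cleaned, toy_lexicon)
```

URLs are removed before punctuation so that the address is dropped as a whole rather than being split into stray tokens.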
Evaluation
The effectiveness of the proposed method is evaluated by using different evaluation metrics namely Precision (P), Recall (R), F-measure (F1), and Accuracy as follows:
In addition, the mean absolute error $MAE^{m}$ [17] is applied to estimate the error ratio of the object polarity method. $MAE^{m}$ is calculated as follows:
a. Determining objects
In the testing set, there are twelve objects that are described by at least one sentiment word including: rain, wind, sunshine, snow, fog, ice, cloud, hail, rainbow, frost, temperature, storm. Because of the limitation of space, the objects are denoted as o j (j = 1, . . . , 12), respectively. The result of the object detection process is shown in Table 5.
Table 5. Confusion matrix of object detection
Using the confusion matrix in Table 5 and Equations 8, 9, and 10, the performance of the object detection in the tweets was calculated.
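As a sketch of this calculation, the code below derives per-class precision, recall, and F1 from a multi-class confusion matrix using the standard definitions. The 3 x 3 matrix contains made-up counts for illustration only and is not the data of Table 5; the row and column convention (rows are gold labels, columns are predictions) is also an assumption.

```python
def per_class_metrics(matrix, labels):
    """matrix[i][j] = number of tokens with gold label i predicted as label j."""
    metrics = {}
    for k, label in enumerate(labels):
        tp = matrix[k][k]
        fp = sum(matrix[i][k] for i in range(len(labels))) - tp
        fn = sum(matrix[k]) - tp
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * p * r / (p + r) if p + r else 0.0
        metrics[label] = (p, r, f1)
    return metrics

labels = ["rain", "wind", "sunshine"]
toy_matrix = [[8, 1, 1],   # gold: rain
              [2, 6, 2],   # gold: wind
              [0, 1, 9]]   # gold: sunshine
metrics = per_class_metrics(toy_matrix, labels)
# Macro-averaged F1: the unweighted mean of the per-class F1 scores.
macro_f1 = sum(f1 for _, _, f1 in metrics.values()) / len(labels)
```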
Table 6. Performance of detecting objects
Table 6 shows that the performance of the object detection is fairly good: the average P is 72.82%, the average R is 80.05%, and the average F1 score is 76.03%. According to our assessment, one of the main reasons for this performance is that the combination of the BiLSTM and CRF models exploits the advantages of both models in detecting objects and their sentiments. In addition, the rules provided help the trained model to determine more accurately the location of the words that indicate objects and the words that define the sentiments of objects.
Fig. 2 shows that the method generally performs well on specific objects such as o2, o3, o8, o11, and o12. However, the performance on object o10 is not as good: its F1 score is only 56.3%. One of the main reasons for this low performance is likely that the training data contain fewer words indicating object o10. We believe that this gap can be significantly reduced by constructing a larger data warehouse in which the words indicating each factor are balanced.

Comparison of performance on specific objects.
b. Analyzing sentiment of each tweet
From the 617 tweets in the testing set, we extracted the sentiment of each tweet. The result is presented in Table 7, where 1s - 1o means that the tweet expresses a single sentiment for one object, and the remaining symbols have similar meanings.
Result of sentiment of each tweet
The results indicate that tweets presenting a single sentiment for a single object and tweets presenting two different sentiments for two objects achieved better performance than the other cases. Since such tweets have a balance between the words indicating objects and the sentiment words, the algorithm can easily detect and label them. Moreover, unlike previous methods, our method analyzes the sentiment of each tweet by combining the sentiments of the objects in that tweet. The previous methods only extract the sentiments of users, while the objects at which the sentiments are directed remain unknown; most of them report only the sentiment of each tweet without explicitly specifying the sentiment for each object in it. The effectiveness of the proposed method is further clarified by the data in Table 7 and the result of Ex. 2: we know not only the sentiments of the speaker but also the objects of those sentiments (e.g., “I do not like rain” shows a negative sentiment towards the “rain” object), and the sentiment towards each object over the entire set of tweets is also expressed (e.g., for the weather topic, most of the opinions are positive towards the “wind” object, very positive towards “snow”, etc.). Therefore, the result of the proposed method contains more information.
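The idea of combining per-object sentiments across tweets can be sketched as follows. This is a minimal illustration under stated assumptions, not the integration rule used in this study: each tweet is assumed to have already been reduced to (object, polarity) pairs, the score (pos - neg) / (pos + neg) is a stand-in aggregation, and the cut-offs that map the score to the five levels VP, P, NE, N, and VN are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical cut-offs on the score (pos - neg) / (pos + neg),
# chosen only for illustration.
LEVELS = [(0.6, "VP"), (0.2, "P"), (-0.2, "NE"), (-0.6, "N")]


def integrate(tweets):
    """Map each object to an overall sentiment level over all tweets.

    tweets is a list of tweets; each tweet is a list of
    (object, polarity) pairs with polarity "positive" or "negative".
    """
    counts = defaultdict(Counter)
    for pairs in tweets:
        for obj, polarity in pairs:
            counts[obj][polarity] += 1

    overall = {}
    for obj, c in counts.items():
        pos, neg = c["positive"], c["negative"]
        score = (pos - neg) / (pos + neg) if pos + neg else 0.0
        # First level whose cut-off the score reaches; otherwise very negative.
        overall[obj] = next((lab for cut, lab in LEVELS if score >= cut), "VN")
    return overall


if __name__ == "__main__":
    toy = [
        [("rain", "negative")],
        [("rain", "negative"), ("snow", "positive")],
        [("snow", "positive")],
    ]
    print(integrate(toy))
```

A tweet that mentions several objects contributes one pair per object, so the cases with multiple sentiments and multiple objects are handled by the same counting step.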
c. Analyzing sentiment from tweets according to each object
To evaluate the performance of the proposed method, we asked five human experts to define the sentiment of the twelve objects by integrating the sentiments from the 617 tweets. These results are then compared with the results given by our method by using the mean absolute error. The actual and predicted results of sentiment integration from the entire set of tweets according to each specific object are shown in Table 8.
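The comparison between expert and predicted labels can be illustrated with a plain mean absolute error over an ordinal scale. This is a sketch under assumptions: the numeric encoding of the five levels (VN = -2 to VP = 2) is hypothetical, and the MAE_m measure of [17] may be defined differently (for example, as a per-class averaged variant), so the code should not be read as the formula used in this study.

```python
# Assumed ordinal encoding of the five sentiment levels.
SCALE = {"VN": -2, "N": -1, "NE": 0, "P": 1, "VP": 2}


def mean_absolute_error(actual, predicted):
    """Average absolute distance between actual and predicted levels."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("actual and predicted must be non-empty and equal in length")
    total = sum(abs(SCALE[a] - SCALE[p]) for a, p in zip(actual, predicted))
    return total / len(actual)


if __name__ == "__main__":
    # Toy labels, not the values of Table 8.
    print(mean_absolute_error(["VP", "P", "N", "NE"], ["P", "P", "VN", "NE"]))
```

On an ordinal scale, this measure penalizes a prediction of Negative for a Very positive object more heavily than a prediction of Positive, which a simple accuracy count would not capture.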
The actual and predicted result of sentiment integration from tweets according to specific objects
From Table 8 and by applying Equation 12, we have the mean absolute error of the proposed method as
Table 9 presents the final result of sentiment integration from the sentiments of tweets according to objects. The result shows that the method generally did not perform well with the very positive sentiment (the true predicted ratio is
From the above analysis, we find that the results of the sentiment analysis for each object using the proposed method are fairly good in terms of the error ratio and achieved information.
The result of integrating sentiments from tweets according to specific objects (Very positive, Positive, Neutral, Negative, and Very Negative are denoted by VP, P, NE, N, and VN, respectively)
This study proposed a method for analyzing the overall sentiment related to each object from the entire set of tweets by integrating the sentiments towards each object from different tweets. The method includes the following steps. First, features are extracted to determine the objects and the sentiments of the objects. Second, the sentiment in each tweet is determined in detail by clearly presenting the objects and the sentiment related to each of them. Finally, the sentiment related to a specific object is analyzed by integrating the sentiments from all the tweets. The experimental results show that the proposed method significantly improved the performance in terms of the error ratio and achieved information. Such results can help, for example, a manufacturer grasp the users’ opinions about individual parts of a product; using this, the manufacturer can devise the right development strategy and create products that meet the needs of the users.
There are, however, some limitations of the proposed approach: sarcastic tweets and tweets containing conditional sentences are not considered in the sentiment analysis. In the future, we plan to address these limitations; in particular, we hope to analyze the impact of sarcastic tweets and tweets containing conditional sentences on sentiment analysis, which would be another interesting research direction.
