A sentiment analysis method of objects by integrating sentiments from tweets

Abstract

Sentiment analysis has been gaining importance in many applications such as recommendation systems, decision-making support, and prediction models. It helps to understand and evaluate public opinion regarding social events, product services, and political trends, especially the feelings that users express through comments on social networks such as Twitter, Facebook, and Instagram. Many studies have addressed the tweet sentiment analysis problem with high accuracy, particularly for tweets that express a single sentiment towards a single object. However, the classification results are less accurate in the following cases: a user expresses multiple sentiments towards a single object in a tweet; a user expresses multiple sentiments towards multiple objects; or a user expresses a single sentiment towards multiple objects. Furthermore, previous studies analyze only the sentiment of each tweet, without considering the objects or the sentiment towards each object across an entire set of tweets. To address these limitations, this study proposes an approach based on integrating the sentiment towards a particular object from all tweets related to that object. The proposed method focuses on determining the objects and indicating the sentiment towards each specific object by combining the sentiments related to that object from the entire set of tweets. In the experimental evaluation, the proposed method achieved a fairly good result in terms of the error ratio and achieved information.

Keywords

Sentiment analysis, sentiment integration, object determination

1 Introduction

In recent years, social network services have become widespread and necessary communication platforms in everyday life. Twitter is among the most popular social networks worldwide. As of the third quarter of 2018, Twitter averaged 326 million monthly active users¹. Twitter is very convenient to use, and celebrities, corporations, and organizations create and manage Twitter accounts to interact with their fans and the public. Moreover, Twitter is probably the primary communication channel used by many people. Twitter users can freely publish tweets to express their feelings, find their friends, follow famous people they are interested in, and access information at any time. Hence, the number of tweets is increasing rapidly, and their rate of generation is remarkably high. As a result, Twitter has become a source of meaningful information for anyone who knows how to use it. The sentiments that users express through tweets can serve as reviews and survey responses reflecting user opinions, for applications ranging from marketing to customer service. Therefore, sentiment analysis is a topic of interest to many researchers, as it can be used to exploit the useful information on social networks [10].

Sentiment analysis is defined as the process of extracting subjective information from a document, sentence, or term [19]. It includes not only extracting the polarity [24], stance [18], target [8], and holder of an opinion [13], but also analyzing the implicatures or effects to indicate the aspects of the goal that people like or dislike [11].

The methods used for sentiment analysis can be divided into three levels, namely the document level, the sentence level, and the entity and aspect level [14]. The document level provides the polarity to classify a document as positive, negative, or neutral. The sentence level provides the polarity for classifying a sentence as either objective or subjective; the sentiment of a subjective sentence is in turn classified as positive or negative. The entity and aspect level explains what people like or dislike. In this study, the second and the third levels of sentiment analysis are examined.

Several methods have been published to solve the sentiment analysis problem. The methods used in previous studies can be divided into three types [23]. The first approach relies on well-known machine learning algorithms. However, this approach has the following limitations: the supervised methods need a large number of labeled training tweets, while the unsupervised techniques have to seek unlabeled training tweets [16]. The second approach is based on lexicon-based methods [1, 6] and depends on finding the sentiment lexicon. The limitation of the lexicon-based methods is that they are very time consuming and cannot be used alone; they must be combined with other automated methods as a final check to avoid mistakes. The third approach comprises hybrid methods that combine both of the above approaches [2, 12]. These methods are widespread and play a pivotal role in the majority of sentiment analysis methods.

In general, the previous methods could achieve accurate results only in the case of simple tweets that reflect a single sentiment for a single object; for instance, the tweet “I have the worst anxiety because of the weather” presents a negative sentiment for a single object, weather. However, they often generate inaccurate classification results in cases such as the following:

Tweets expressing two different sentiments for two different objects: For example, a tweet such as “I do not like rain, but I like the sunshine” would be classified as neutral by the aforementioned sentiment analysis methods. However, this result is inaccurate because the user expresses a negative sentiment for the object “rain” and a positive sentiment for the object “sunshine.”

Tweets that present multiple sentiments for a single object: For example, a tweet such as “Rain is nothing but a thorn of roses” would be classified as neutral by the aforementioned sentiment analysis methods, although the actual expression is negative.

Tweets that express a single sentiment for two different objects: For example, a tweet such as “I like rain, and I like the sunshine” would be classified as positive by the aforementioned sentiment analysis methods. Although this is correct, it does not fully extract the user’s opinion.

Additionally, most of the previous studies implemented the following two steps: The first step was to determine whether a tweet expresses an opinion or not, and the second step classified the opinion tweets as positive and negative classes. The result of this process provides only the polarity scores of the tweets, while the intended objects remain unknown. For example, regarding the iPhone, one user tweeted: “The color is good,” while another user tweeted: “The sound is not good.” A sentiment analysis using the previous methods would classify the first tweet as positive and the second tweet as negative. Therefore, the manufacturer would not know which part of the iPhone received a positive response and which part received a negative one.

Therefore, a mechanism is required to address the above problems; and this motivates us to propose a method that would perform a sentiment analysis by integrating the sentiments from tweets with respect to a particular object. The proposed method involves the following steps: First, a large number of tweets that belong to a specific topic from several users are collected. Second, the objects and the sentiments with respect to each object are determined by using the Bidirectional Long Short-Term Memory (BiLSTM) and Conditional Random Field (CRF) algorithms. Third, the sentiment of each tweet is determined by combining the objects and sentiments related to the objects in each tweet. Finally, the sentiment for each object from all tweets is integrated by computing the sentiment score of that object based on the frequency of the positive and negative components in all tweets.

The difference between the proposed method and the previous methods is that this method not only classifies the sentiment of each tweet as negative, positive, or neutral but also provides the objects related to the sentiments. In particular, this method integrates the overall sentiment for each object from all tweets. These differences are also the main contributions of this study.

The rest of the paper is organized as follows: In Section 2, we briefly summarize the previous work on sentiment analysis approaches. The research problem of this study is described in Section 3. Section 4 presents the proposed method and the experimental results are presented in Section 5. The conclusion and future works are presented in Section 6.

2 Related works

There have been many published studies regarding the sentiment analysis of tweets, which can be divided into the following three methods: lexicon-based, machine learning-based, and hybrid. The machine learning-based method focuses on extracting various features related to the lexical, syntactic, semantic, and sentiment polarity properties of tweets, such as context information, Part-Of-Speech (POS) tags, n-grams, sentiment words, and word embeddings. A classifier is then built on these features by using a machine learning algorithm, for example, Naive Bayes [7] or Support Vector Machine [3]. The lexicon-based method indicates the overall sentiment tendency of a given text by employing lexicons of words weighted with their sentiment, such as SentiWordNet (SWN) [4]. The hybrid approach combines the speed of the lexical approach and the accuracy of the machine learning-based approach; moreover, deep learning methods are known as the state-of-the-art methods in the area of sentiment analysis [20, 27]. Thus, the existing research has focused on classifying tweets into three classes, namely positive, negative, and neutral. In this section, we analyze some specific cases to understand the motivation for our proposal.

Atanu Dey et al. [6] presented a methodology to create a lexicon known as the Senti-N-Gram. In this approach, the n-gram sentiment scores were extracted by using rules. These scores were then compared with the analysis made by human annotators and found to be statistically equivalent. Finally, a ratio-based sentence-level sentiment classification approach was proposed by counting the number of positive and negative sentences. This approach has the following advantages: i) It is fully automatic and domain independent. ii) Human annotators are eliminated, thereby reducing human bias as well as the cost. iii) The computation time is further reduced in the case of n-gram based sentiment analysis, as the score from the stored lexicon can be reused. However, the method also has certain limitations: i) It cannot process all types of sentences, for example, compound, conditional, and ironic sentences. ii) The object related to the sentiment remains unknown. iii) The sentiment of all tweets related to each object is not extracted.

Asghar et al. [1] proposed a lexicon-based method to classify user sentiments from online communities. They combined different lexicons (e.g., SentiWordNet (SWN), user-defined dictionaries) and different classifiers (e.g., an SWN-based classifier, a negation classifier, a domain-specific classifier). The advantage of this approach is that not only the relationships among words but also domain-specific words are extracted to resolve the issue of domain dependency. However, this approach has the following drawbacks: i) Compound sentences and ironic sentences are not considered. ii) The object of the sentiment is not identified. iii) The sentiment of all tweets related to each object is not extracted.

In study [2], the authors presented a unified framework for classifying tweets using a hybrid classification scheme. This method aims to improve the performance of Twitter-based sentiment analysis systems by combining four classifiers, namely a slang classifier, an emoticon classifier, the SWN classifier, and an improved domain-specific classifier. This approach has the following advantages: i) Classification results are better than those of the baseline methods. ii) The framework is generalized and can classify tweets in any domain. However, the method has the following limitations: i) The results are inaccurate for tweets containing compound sentences. ii) The method only focuses on analyzing the sentiment of each tweet. iii) The object of the sentiment is not extracted.

In paper [27], the authors introduced a word embeddings method using unsupervised learning based on contextual latent semantic relationships and co-occurrence characteristics of words in tweets. These word embeddings were combined with n-gram features and the sentiment scores of words to create a set of sentiment features of tweets. This feature set was integrated into a deep convolutional neural network for training and predicting sentiment labels. The authors compared the performance of their model to the baseline model on five Twitter data sets, and the results indicated that their model performed better for tweet sentiment analysis. However, there were some limitations: the model could only classify tweets into two sentiment groups, i.e., positive and negative; it could not determine the object of the sentiment; and it could not integrate the sentiment for each object from all tweets.

In general, the previous studies have been able to classify tweets in any domain with high accuracy. However, the methods produced inaccurate results for tweets that expressed multiple sentiments; for example, a tweet such as “wind, wind, wind, I hate wind, I only like snow” would be labeled as neutral in their systems, while the actual result is negative for wind and positive for snow. Moreover, the accuracy of the classification results decreased in the case of compound tweets; for example, a tweet such as “Rain is a kind of devil” is labeled neutral in their systems although it should be labeled negative. In addition, the object of the sentiment is not considered; for example, a tweet such as “I like the most wind and hate the most sunshine” contains two objects, “wind” and “sunshine,” corresponding to two sentiments, positive and negative. Moreover, the previous methods only indicated the sentiment of each tweet without presenting the general sentiment of all tweets towards each object.

Based on the above analysis of the advantages and limitations of several previous methods, this study focuses on comprehensively addressing several issues, including extracting the object of the sentiment, analyzing the sentiment of each tweet, and integrating the sentiment of all tweets towards each object, by combining the BiLSTM and CRF algorithms and using a rating technique based on sentiment scores.

3 Research problem formalization

This section presents the basic concepts and definitions related to tweets such as objects, the sentiment of each tweet, and the sentiment related to each object in the entire set of tweets. The research problems are stated at the end of this section.

3.1 Definitions

We assume a finite set of tweets, T, that represents the opinions of users about a specific topic and is represented as follows: $T = \{t_{1}, t_{2}, \ldots, t_{n}\},$ where n is the number of gathered tweets.

Let Pt be a set of positive words and Nt be a set of negative words. To determine necessary elements in tweets, each tweet has to be separated into a set of tokens. Let t_i = {w₁, w₂, . . . , w_g} be a set of tokens of tweet t_i from T.

Definition 1. A sentiment relation between tokens w_k and w_h (w_h ∈ Pt ∪ Nt) is defined by the function Θ, as represented by the following expression: $\Theta (w_{h}, w_{k}) = \begin{cases} 1, & \text{if } w_{k} \text{ refers to } w_{h} \\ 0, & \text{otherwise.} \end{cases}$ (1)

A tweet can have many objects. Let O_i = {o₁, o₂, . . . , o_m} be the set of objects of t_i, in which o_j (j = 1, 2, …, m) is an object associated with a sentiment related to the chosen topic.

Definition 2. An object o_j of sentiments in a tweet is a token w_k that satisfies two conditions simultaneously: w_k is a noun or noun phrase, and w_k is related to at least one sentiment word in that tweet. o_j is defined as follows: $o_{j} = \{ w_{k} \mid \mathrm{tag}(w_{k}) = \text{'NOUN'},\ \exists w_{h} \in t_{i} : \Theta (w_{h}, w_{k}) = 1 \}.$

Assume that $S_T = \{s_{t_1}, s_{t_2}, \ldots, s_{t_n}\}$ is a set of sentiments from T, where $s_{t_i}$ is the sentiment of tweet t_i.

Definition 3. The sentiment of a tweet is represented by a set of sentiments associated with its objects. The sentiment of an object is denoted by the object variable itself if the sentiment is positive, and by the object variable preceded by a negation symbol (¬) if the sentiment is negative. $s_{t_i}$ is defined as follows: $s_{t_{i}} = \{ e_{o_{j}},\ j = 1, \ldots, m \}$ (2) where $e_{o_j}$ is the sentiment of the object o_j and is determined as

$e_{o_{j}} = \begin{cases} \neg o_{j}, & \text{if } (\exists w_{k} \in Nt) \land (\Theta (w_{k}, o_{j}) = 1) \\ o_{j}, & \text{if } (\exists w_{k} \in Pt) \land (\Theta (w_{k}, o_{j}) = 1) \end{cases}$ (3)

Assume that $S_O = \{s_{o_1}, s_{o_2}, \ldots, s_{o_m}\}$ is a set of sentiments according to objects in the entire set of tweets, and elements in S_O can be repetitive. Let $o_{j}^{+}$ and $o_{j}^{-}$ be the positive and negative sentiment components of the object o_j, respectively [15].

Definition 4. The sentiment of the object o_j in T, denoted by $s_{o_j}$, is determined by integrating the sentiments in S_T that are associated with o_j. Let $se_{o_j}$ be the sentiment degree of object o_j; $s_{o_j}$ is represented by the pair 〈o_j, $se_{o_j}$〉, and $se_{o_j}$ is assessed as follows:

${se}_{o_{j}} = \begin{cases} \text{very negative}, & \text{if } score (o_{j}) < -0.5 \\ \text{negative}, & \text{if } -0.5 \leq score (o_{j}) < -0.1 \\ \text{neutral}, & \text{if } -0.1 \leq score (o_{j}) \leq 0.1 \\ \text{positive}, & \text{if } 0.1 < score (o_{j}) \leq 0.5 \\ \text{very positive}, & \text{if } score (o_{j}) > 0.5 \end{cases}$ (4)

where

$score (o_{j}^{+}) = \frac{frequency (o_{j}^{+})}{frequency (o_{j}^{+}) + frequency (o_{j}^{-})}$ (5)

$score (o_{j}^{-}) = \frac{frequency (o_{j}^{-})}{frequency (o_{j}^{+}) + frequency (o_{j}^{-})}$ (6)

$score (o_{j}) = score (o_{j}^{+}) - score (o_{j}^{-})$ (7)
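As an illustration of Equations 4 to 7, the following minimal Python sketch maps the positive and negative frequencies of an object to its sentiment degree. The function names `sentiment_score` and `sentiment_degree` are hypothetical, and returning a score of 0 for an object with no sentiment mentions is an assumption, since the definitions do not cover that case. The score is computed as (f⁺ − f⁻)/(f⁺ + f⁻), which is algebraically equal to Equation 7.

```python
def sentiment_score(pos_freq: int, neg_freq: int) -> float:
    """Eq. (7): score(o) = score(o+) - score(o-), with Eqs. (5) and (6)."""
    total = pos_freq + neg_freq
    if total == 0:
        # Assumption: an object without any sentiment mention scores 0.
        return 0.0
    # Same value as pos_freq / total - neg_freq / total.
    return (pos_freq - neg_freq) / total


def sentiment_degree(score: float) -> str:
    """Eq. (4): map a score in [-1, 1] to one of five sentiment degrees."""
    if score < -0.5:
        return "very negative"
    if score < -0.1:
        return "negative"
    if score <= 0.1:
        return "neutral"
    if score <= 0.5:
        return "positive"
    return "very positive"


# One positive and three negative mentions give a score of -0.5.
print(sentiment_degree(sentiment_score(1, 3)))  # negative
```

Note that the boundary value −0.5 falls in the "negative" class, because the "very negative" class uses a strict inequality.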

3.2 Problems

In this study, we focus on finding a method to answer the following main question: How can the sentiment of objects be determined by integrating the sentiments of those objects from tweets? This question is partitioned into the following three sub-questions:

For a given set of tweets (T), how can the objects and their sentiments in each tweet be determined?

How can the sentiment of each tweet be expressed based on the sentiments of the objects in that tweet?

How can a sentiment integration process be performed to determine the sentiment of objects from the entire set of tweets?

4 Methods

This section presents the method to solve the problems identified in Section 3.2. The proposed method consists of three main steps: first, determining the objects that sentiments refer to; second, indicating the sentiment of each tweet; and finally, analyzing the sentiment of each object by integrating sentiments from all tweets. The workflow of the method is shown in Fig. 1. The following subsections explain the details of the proposed method.

Fig. 1

Workflow of sentiment analysis of objects by integrating sentiment from tweets.

4.1 Feature extraction

To determine objects related to sentiments, features are extracted from each tweet. In this work, information on the lexical, syntactic, semantic, and sentiment polarity properties of words is employed as features.

n-grams: The n-grams include 1-grams, 2-grams, and 3-grams. Each type of n-gram is created based on the term frequency-inverse document frequency (TF-IDF) values. Each n-gram appearing in a tweet becomes an entry in the feature vector, with its TF-IDF value as the corresponding feature value.
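The n-gram features can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the text does not specify the TF-IDF variant, so the sketch assumes tf = count / (number of n-grams in the tweet) and idf = ln(N / df); it also assumes simple whitespace tokenization, and the function names `ngrams` and `tfidf_features` are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n_max=3):
    """Return all 1-grams up to n_max-grams of a token list as strings."""
    return [
        " ".join(tokens[i:i + n])
        for n in range(1, n_max + 1)
        for i in range(len(tokens) - n + 1)
    ]


def tfidf_features(tweets, n_max=3):
    """Map each tweet to a dict {n-gram: TF-IDF value} (assumed weighting)."""
    grams_per_tweet = [ngrams(t.lower().split(), n_max) for t in tweets]
    # Document frequency: number of tweets that contain each n-gram.
    df = Counter(g for grams in grams_per_tweet for g in set(grams))
    n_docs = len(tweets)
    features = []
    for grams in grams_per_tweet:
        tf = Counter(grams)
        features.append({
            g: (count / len(grams)) * math.log(n_docs / df[g])
            for g, count in tf.items()
        })
    return features


feats = tfidf_features(["i like rain", "i like sunshine"])
print(sorted(feats[0]))
```

With this weighting, an n-gram that occurs in every tweet (here "i like") receives the value 0, while an n-gram specific to one tweet (here "rain") receives a positive value.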

POS tags of words: POS tags are used as features to indicate words that are adjectives, verbs, adverbs, nouns, and noun phrases. These words help to determine the sentiments and the objects of sentiments in each tweet. The NLTK toolkit [5] is used to annotate the POS tags and to tokenize all the tweets. POS tags and their corresponding TF-IDF values are the syntactic features and feature values, respectively.

Word embeddings: A significant challenge in handling tweets is that the vocabulary and syntax used in a tweet are informal and differ considerably from those of normal sentences. Hence, it is challenging to correctly capture the lexical and syntactic properties. This problem is addressed by applying the word embeddings approach to compute tweet embedding representations.

In this study, the 300-dimensional pre-trained word embeddings from GloVe² are used to compute a tweet embedding as the average of the embeddings of the words in the tweet.
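The averaging step can be sketched as follows. The toy 3-dimensional vectors stand in for the 300-dimensional pre-trained vectors, the function name `tweet_embedding` is hypothetical, and mapping a tweet with no known word to the zero vector is an assumption that the text does not state.

```python
def tweet_embedding(tokens, embeddings, dim):
    """Average the vectors of the tokens that have an embedding."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        # Assumption: a tweet without any known word maps to the zero vector.
        return [0.0] * dim
    return [sum(v[k] for v in vectors) / len(vectors) for k in range(dim)]


# Toy vectors (hypothetical values) in place of real pre-trained embeddings.
toy_embeddings = {
    "like": [1.0, 0.0, 2.0],
    "rain": [3.0, 2.0, 0.0],
}
print(tweet_embedding(["i", "like", "rain"], toy_embeddings, dim=3))
# [2.0, 1.0, 1.0]
```

Out-of-vocabulary tokens (here "i") are skipped in this sketch rather than counted in the denominator.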

The GloVe model is applied because it extends the Word2vec model: it combines the global statistics of matrix factorization techniques, such as latent semantic analysis, with the local context-based learning of Word2vec [25]. Moreover, rather than using a window to define the local context, GloVe constructs an explicit word-context (co-occurrence) matrix using statistics across the entire text corpus. Therefore, in general, the model may result in better word embeddings.

Special words: Special words include negation, intensifier, and diminisher words. Special words can reverse the polarity of sentiment words; for example, if a negation word appears before a positive word, the sentiment of the tweet changes to negative, and vice versa. These features are extracted by searching for such words in a window of 1 to 3 words before a sentiment word.
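The window search can be sketched as follows. The word lists are small hypothetical examples rather than the lexicons used in the study, and the function name `special_words_before` is likewise hypothetical.

```python
# Hypothetical word lists, for illustration only.
NEGATIONS = {"not", "no", "never"}
INTENSIFIERS = {"so", "too", "very"}
DIMINISHERS = {"quite", "slightly"}
SENTIMENT_WORDS = {"good", "bad", "like", "sad"}


def special_words_before(tokens, window=3):
    """For each sentiment word, list the special words in the preceding window."""
    found = []
    for i, token in enumerate(tokens):
        if token not in SENTIMENT_WORDS:
            continue
        # Look at up to `window` tokens immediately before the sentiment word.
        for prev in tokens[max(0, i - window):i]:
            if prev in NEGATIONS:
                found.append((token, prev, "negation"))
            elif prev in INTENSIFIERS:
                found.append((token, prev, "intensifier"))
            elif prev in DIMINISHERS:
                found.append((token, prev, "diminisher"))
    return found


print(special_words_before("i do not like rain".split()))
# [('like', 'not', 'negation')]
```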

Sentiment words: Sentiment words have a significant role in determining the sentiment polarity of tweets. Sentiment words consist of positive and negative words. The number of sentiment words in each tweet is used as a feature. The sentiment dictionaries provided by Hu and Liu [8] are employed for determining the positive and negative words in a tweet.

4.2 Determining objects of sentiments in each tweet

A combination of BiLSTM and CRF models [9] is used to determine the objects and their sentiments in each tweet. The combination leverages the advantages of both models: the feature extraction ability of the LSTM model and the stable predictions of the CRF model [22].

The above model operates according to the following steps: The word embedding of each word is fed into the BiLSTM layer to extract the features presented in Section 4.1. The CRF layer then uses these features to predict a label for each word. In addition to the information received from the BiLSTM layer, the CRF layer also relies on information from previously predicted labels; for example, if the previous word was an object, it is highly probable that the present word is a sentiment word.

The parameters of the model include the word-embedding matrix, the weight matrices of the BiLSTM layer, and the transition matrix of the CRF layer. All these parameters are updated during training on the labeled tweets via the back-propagation algorithm with Stochastic Gradient Descent (SGD). Moreover, during training, the dropout technique is applied to avoid over-fitting.
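To illustrate how a CRF layer can combine per-token scores with label-to-label transition scores, the following sketch decodes the best label sequence with the Viterbi algorithm, which is the standard decoder for linear-chain CRFs. The text does not state which decoding procedure is used, so this is an assumption. The emission and transition values are hand-set toy numbers (in the real model they would come from the BiLSTM layer and the learned CRF parameters), and the function name `viterbi` is hypothetical.

```python
def viterbi(emissions, transitions, labels):
    """Return the highest-scoring label sequence.

    emissions[i][y]   : score of label y for token i (BiLSTM output)
    transitions[p][y] : score of moving from label p to label y (CRF)
    """
    # best[y] = best score of any path that ends in label y at this token.
    best = {y: emissions[0][y] for y in labels}
    backpointers = []
    for scores in emissions[1:]:
        step, new_best = {}, {}
        for y in labels:
            prev = max(labels, key=lambda p: best[p] + transitions[p][y])
            step[y] = prev
            new_best[y] = best[prev] + transitions[prev][y] + scores[y]
        backpointers.append(step)
        best = new_best
    # Follow the back-pointers from the best final label.
    path = [max(labels, key=lambda y: best[y])]
    for step in reversed(backpointers):
        path.append(step[path[-1]])
    return path[::-1]


labels = ["Other", "Positive", "Object"]
# Toy scores for the tokens "i", "like", "rain" (hypothetical values).
emissions = [
    {"Other": 2.0, "Positive": 0.0, "Object": 0.0},
    {"Other": 0.5, "Positive": 1.0, "Object": 0.0},
    {"Other": 1.0, "Positive": 0.0, "Object": 0.9},
]
transitions = {p: {y: 0.0 for y in labels} for p in labels}
transitions["Positive"]["Object"] = 1.0  # an object often follows a sentiment word

print(viterbi(emissions, transitions, labels))
# ['Other', 'Positive', 'Object']
```

In this toy setting the emission scores alone would label "rain" as Other (1.0 versus 0.9); the transition score from Positive to Object changes the decision, which mirrors the role of previously predicted labels described above.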

In addition, to increase the accuracy of identifying sentiments and the objects of sentiments in each tweet, we define a set of rules and label the training data based on these rules as follows:

Rule 1 (r₁): If the tweet contains a sentiment word that modifies a noun or noun phrase (denoted by n), then n is marked as a word indicating the object; for example, in “I have a nice bag.” or “She has nice eyes.”, “bag” and “eyes” are identified as objects of sentiments.

Rule 2 (r₂): If the tweet contains sentiment words and n stands after a preposition, then n is a word determining the object in the tweet; e.g., in “iPhone is good at sound”, “sound” is one object.

Rule 3 (r₃): If the tweet contains sentiment words and n stands after a normal verb that carries sentiment, then n is a word determining the object in the tweet; e.g., in “I like sunshine”, “sunshine” is determined as one object.

Rule 4 (r₄): If the tweet contains sentiment words and n stands before a “to be” verb, then n is determined as a word indicating the object; e.g., in “The flower is nice.”, following this rule, the object is “flower.”

The order of priority for the rules is r₁, r₂, r₃, r₄; this means that r₁ takes precedence over r₂, and so on; for example, when in a tweet, n satisfies both r₂ and r₄ then r₂ is preferred.
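The priority order can be illustrated with a simplified Python sketch. It is one possible reading of rules r₁ to r₄, not the labeling procedure itself: tokens are assumed to be lower-cased and hand-tagged with a hypothetical coarse tag set (NOUN, ADJ, VERB, ADP for prepositions, DET for determiners), determiners are skipped when looking at the preceding word, r₄ only checks that a form of "to be" follows the noun, and the sentiment word list is a toy example.

```python
SENTIMENT_WORDS = {"nice", "good", "like", "hate", "lovely"}  # toy lexicon
BE_FORMS = {"am", "is", "are", "was", "were"}


def matching_rule(tagged, i):
    """Return the highest-priority rule (r1..r4) matched by the noun at index i."""
    # Nearest preceding token that is not a determiner, if there is one.
    before = [(w, t) for w, t in tagged[:i] if t != "DET"]
    prev_word, prev_tag = before[-1] if before else (None, None)
    next_word = tagged[i + 1][0] if i + 1 < len(tagged) else None
    rules = [
        ("r1", prev_tag == "ADJ" and prev_word in SENTIMENT_WORDS),
        ("r2", prev_tag == "ADP"),
        ("r3", prev_tag == "VERB" and prev_word in SENTIMENT_WORDS),
        ("r4", next_word in BE_FORMS),
    ]
    for name, matched in rules:  # ordered by priority, so the first match wins
        if matched:
            return name
    return None


def find_objects(tagged):
    """Return {noun: rule} for a tweet that contains at least one sentiment word."""
    if not any(word in SENTIMENT_WORDS for word, _ in tagged):
        return {}
    objects = {}
    for i, (word, tag) in enumerate(tagged):
        rule = matching_rule(tagged, i) if tag == "NOUN" else None
        if rule is not None:
            objects[word] = rule
    return objects


tweet = [("i", "PRON"), ("like", "VERB"), ("the", "DET"), ("sunshine", "NOUN")]
print(find_objects(tweet))  # {'sunshine': 'r3'}
```

Because the rules are checked in the order r₁, r₂, r₃, r₄, a noun that follows a preposition and also precedes a "to be" verb is attributed to r₂ in this sketch.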

The following example provides a detailed analysis to clearly understand the steps to determine the objects of sentiments in tweets.

Example 1: Given a set of tweets, T = {t₁, t₂, t₃, t₄, t₅, t₆},

t₁: “I do not like rain, but I like the sunshine.”

t₂: “Rain is nothing but a thorn of roses.”

t₃: “I like rain, and I like the sunshine.”

t₄: “Sunshine is lovely, but the wind is uncomfortable.”

t₅: “The rain is sad, and the sunshine is also sad.”

t₆: “Sunshine is good for me.”

a) For t₁: tag(rain) = ‘NOUN’, not like ∈ Nt, and Θ (not like, rain) = 1. Thus o₁ = rain. tag(sunshine) = ‘NOUN’, like ∈ Pt, and Θ (like, sunshine) = 1. Thus o₂ = sunshine. From o₁ and o₂ as a result O₁ = {rain, sunshine}.

b) For t₂: tag(rain) = ‘NOUN’, nothing ∈Nt, and Θ (nothing, rain) =1. Thus o₁ = rain. From o₁ as a result O₂ = {rain}.

c) For t₃: tag(rain) = ‘NOUN’, like ∈Pt, and Θ (like, rain) =1. Thus o₁ = rain. tag(sunshine) = ‘NOUN’, like ∈Pt, and Θ (like, sunshine) =1. Thus o₂ = sunshine. From o₁ and o₂ as a result O₃ = {rain, sunshine}.

d) For t₄: tag(sunshine) = ‘NOUN’, lovely ∈Pt, and Θ (lovely, sunshine) =1. Thus o₁ = sunshine. tag(wind) = ‘NOUN’, uncomfortable ∈Nt, and Θ (uncomfortable, wind) =1. Thus o₂ = wind. From o₁ and o₂ as a result O₄ = {sunshine, wind}.

e) For t₅: tag(rain) = ‘NOUN’, sad ∈Nt, and Θ (sad, rain) =1. Thus o₁ = rain. tag(sunshine) = ‘NOUN’, sad ∈Nt, and Θ (sad, sunshine) =1. Thus o₂ = sunshine. From o₁ and o₂ as a result O₅ = {rain, sunshine}.

f) For t₆: tag(sunshine) = ‘NOUN’, good ∈Pt, and Θ (good, sunshine) =1. Thus o₁ = sunshine. From o₁ as a result O₆ = {sunshine}.

From O₁, O₂, O₃, O₄, O₅, and O₆ as a result O = {rain, sunshine, wind}.

4.3 Determining sentiment of each tweet

The sentiment of each tweet is determined in two steps. First, the sentiments of the objects in each tweet are determined by using the combination of BiLSTM and CRF models, as explained in Section 4.2. Second, the objects and the sentiments related to these objects in each tweet are combined to create the sentiment of the tweet. In addition, to support the determination of the sentiments of objects, the following set of rules is devised to label the training data:

Rule 5 (r₅): Rule related to negation words. This rule is used to identify the sentiments of objects when negation words appear before sentiment words in the tweet (see Table 1).

Table 1

Description of r₅

Rule	Example
Negation + Negative → Positive	“not bad” → Positive
Negation + Positive → Negative	“not good” → Negative
Negation + Neutral → Positive	“no problem” → Positive

Rule 6 (r₆): Rule related to intensifier words. This rule is applied to decide the sentiments of objects when the intensifier words appear before sentiment words in the tweet (see Table 2).

Table 2

Description of r₆

Rule	Example
Intensifier + Negative → Negative	“so bad” → Negative
Intensifier + Positive → Positive	“too good” → Positive
Intensifier → Positive	“absolutely” → Positive

Rule 7 (r₇): Rule related to diminisher words. This rule is employed to determine the sentiments of objects when the diminisher words precede the sentiment words in the tweet (see Table 3).

Table 3

Description of r₇

Rule	Example
Diminisher + Negative → Negative	“quite bad” → Negative
Diminisher + Positive → Positive	“slight good” → Positive
Diminisher + Neutral → Positive	“slight problem” → Positive

Rule 8 (r₈): Rule related to sentiment words. This rule is used to identify the sentiments of objects when consecutive sentiment words appear in the tweet (see Table 4).

Table 4

Description of r₈

Rule	Example
Negative + Negative → Negative	“sad, angry” → Negative
Positive + Positive → Positive	“nice, good” → Positive

The rules are applied in the priority order r₅, r₆, r₇, r₈; that is, if more than one rule is satisfied in a tweet, the rule that comes earlier in this order takes precedence. After the objects and the sentiments of the objects are extracted, the sentiment of each tweet is analyzed by applying Definition 3 in Section 3.1. For greater clarity, the following example is worked through based on the objects extracted in Ex.1.
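The combinations listed in Tables 1 to 4 can be encoded as a lookup table, as in the following sketch. The mini-lexicon `KIND` contains only the example words from the tables and is hypothetical; the function name `phrase_sentiment` is also an assumption. The sketch handles phrases of one or two words only.

```python
# Hypothetical mini-lexicon: each word is mapped to its kind or polarity.
KIND = {
    "not": "negation", "no": "negation",
    "so": "intensifier", "too": "intensifier", "absolutely": "intensifier",
    "quite": "diminisher",
    "bad": "negative", "sad": "negative", "angry": "negative",
    "good": "positive", "nice": "positive",
    "problem": "neutral",
}

# (kind of first word, kind of second word) -> resulting sentiment.
RULES = {
    # r5: negation words (Table 1)
    ("negation", "negative"): "positive",
    ("negation", "positive"): "negative",
    ("negation", "neutral"): "positive",
    # r6: intensifier words (Table 2)
    ("intensifier", "negative"): "negative",
    ("intensifier", "positive"): "positive",
    ("intensifier", None): "positive",
    # r7: diminisher words (Table 3)
    ("diminisher", "negative"): "negative",
    ("diminisher", "positive"): "positive",
    ("diminisher", "neutral"): "positive",
    # r8: consecutive sentiment words (Table 4)
    ("negative", "negative"): "negative",
    ("positive", "positive"): "positive",
}


def phrase_sentiment(first, second=None):
    """Sentiment of a one- or two-word phrase, or None if no rule applies."""
    first_kind = KIND.get(first)
    second_kind = KIND.get(second) if second is not None else None
    return RULES.get((first_kind, second_kind))


print(phrase_sentiment("not", "bad"))     # positive
print(phrase_sentiment("no", "problem"))  # positive
print(phrase_sentiment("sad", "angry"))   # negative
```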

Example 2: From the result of Ex.1, the sentiment of each tweet is identified as follows:

a) For t₁:

o₁ = rain, not like ∈ Nt, Θ (not like, rain) = 1. Hence, $e_{o_1}$ = ¬rain.

o₂ = sunshine, like ∈ Pt, Θ (like, sunshine) = 1. Hence, $e_{o_2}$ = sunshine.

From $e_{o_1}$ and $e_{o_2}$ as a result $s_{t_1}$ = {¬rain, sunshine}.

b) For t₂:

o₁ = rain, nothing ∈ Nt, Θ (nothing, rain) = 1. Hence, $e_{o_1}$ = ¬rain.

From $e_{o_1}$ as a result $s_{t_2}$ = {¬rain}.

c) For t₃:

o₁ = rain, like ∈ Pt, Θ (like, rain) = 1. Hence, $e_{o_1}$ = rain.

o₂ = sunshine, like ∈ Pt, Θ (like, sunshine) = 1. Hence, $e_{o_2}$ = sunshine.

From $e_{o_1}$ and $e_{o_2}$ as a result $s_{t_3}$ = {rain, sunshine}.

d) For t₄:

o₁ = sunshine, lovely ∈ Pt, Θ (lovely, sunshine) = 1. Hence, $e_{o_1}$ = sunshine.

o₂ = wind, uncomfortable ∈ Nt, Θ (uncomfortable, wind) = 1. Hence, $e_{o_2}$ = ¬wind.

From $e_{o_1}$ and $e_{o_2}$ as a result $s_{t_4}$ = {sunshine, ¬wind}.

e) For t₅:

o₁ = rain, sad ∈ Nt, Θ (sad, rain) = 1. Hence, $e_{o_1}$ = ¬rain.

o₂ = sunshine, sad ∈ Nt, Θ (sad, sunshine) = 1. Hence, $e_{o_2}$ = ¬sunshine.

From $e_{o_1}$ and $e_{o_2}$ as a result $s_{t_5}$ = {¬rain, ¬sunshine}.

f) For t₆:

o₁ = sunshine, good ∈ Pt, Θ (good, sunshine) = 1. Hence, $e_{o_1}$ = sunshine.

From $e_{o_1}$ as a result $s_{t_6}$ = {sunshine}.

Therefore, S_T = {{¬rain, sunshine}, {¬rain}, {rain, sunshine}, {sunshine, ¬wind}, {¬rain, ¬sunshine}, {sunshine}}.

The results of Ex.2 show that, unlike previous methods, this method yields more detailed results. For example, $s_{t_2}$ = {¬rain} means that t₂ has a negative sentiment, which is assigned to the object rain. In other words, the object of the sentiment is identified in addition to the sentiment of the tweet.
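Definition 3 can be sketched in Python by representing the sentiment of a tweet as a set of strings, where a negative sentiment is written as the object name prefixed with "¬". The input format (a list of (sentiment word, object) pairs for which Θ = 1), the small word sets `PT` and `NT`, and the function name `tweet_sentiment` are assumptions made for this illustration.

```python
# Toy positive and negative word sets taken from the running example.
PT = {"like", "lovely", "good"}
NT = {"not like", "nothing", "uncomfortable", "sad"}


def tweet_sentiment(relations):
    """Build s_t from the (sentiment word, object) pairs of one tweet."""
    sentiment = set()
    for word, obj in relations:
        if word in NT:
            sentiment.add("¬" + obj)  # negative: negated object variable
        elif word in PT:
            sentiment.add(obj)        # positive: plain object variable
    return sentiment


# t1: "I do not like rain, but I like the sunshine."
s_t1 = tweet_sentiment([("not like", "rain"), ("like", "sunshine")])
print(s_t1 == {"¬rain", "sunshine"})  # True
```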

4.4 Integrating sentiments according to each object from tweets

This section focuses on determining the overall sentiment of each object by integrating sentiments from all tweets. For each object, we compute the sentiment scores of the positive and negative components by applying Equations 5 and 6, based on the frequency of these components in all tweets. The sentiment score of the object is then calculated by using Equation 7. Furthermore, to analyze the sentiment of each object in more detail, we define a scale that divides the sentiment of each object into different levels based on its sentiment score (Equation 4). In this work, the thresholds for the sentiment score are chosen in the interval [-1, +1] based on empirical observation and some previous methods [21, 26]. The steps to integrate sentiments according to each object from tweets are illustrated below in Ex.3.

Example 3: Based on a set of objects and a set of sentiments of objects extracted in Ex.1 and Ex.2, the sentiment of each object is calculated as follows:

a) For o₁ = rain:

frequency (rain⁺) =1, frequency (rain⁻) =3,

score (rain⁺) =0.25, score (rain⁻) =0.75,

score (rain) =0.25 - 0.75 = -0.5.

Thus se_rain = negative.

Therefore, s_rain = 〈rain, negative〉.

b) For o₂ = sunshine:

frequency (sunshine⁺) =4, frequency (sunshine⁻) =1,

score (sunshine⁺) =0.8, score (sunshine⁻) =0.2,

score (sunshine) =0.8 - 0.2 = 0.6.

Thus se_sunshine = verypositive.

Therefore, s_sunshine = 〈sunshine, verypositive〉.

c) For o₃ = wind:

frequency (wind⁺) =0, frequency (wind⁻) =1,

score (wind⁺) =0, score (wind⁻) =1,

score (wind) =0 - 1 = -1.

Thus se_wind = verynegative.

Therefore, s_wind = 〈wind, verynegative〉.

Hence, S_O = {〈rain, negative〉, 〈sunshine, verypositive〉, 〈wind, verynegative〉}.
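The integration step of Ex.3 can be sketched as follows. The normalisation (each component's frequency divided by the total frequency of the object) is inferred from the worked numbers above rather than copied from Equations 5 and 6, and the cut-offs 0.1 and 0.5 in `level` are assumptions chosen to be consistent with the labels in Ex.3; the scale actually used in this work may differ. The names `integrate` and `level` are hypothetical.

```python
from collections import Counter


def level(score):
    """Map a score in [-1, +1] to a sentiment level (cut-offs are assumed)."""
    if score > 0.5:
        return "verypositive"
    if score > 0.1:
        return "positive"
    if score >= -0.1:
        return "neutral"
    if score >= -0.5:
        return "negative"
    return "verynegative"


def integrate(tweet_sentiments):
    """Return {object: (score, level)} for a list of per-tweet sentiment sets."""
    pos, neg = Counter(), Counter()
    for s_t in tweet_sentiments:
        for element in s_t:
            if element.startswith("¬"):
                neg[element[1:].strip()] += 1   # negative component o-
            else:
                pos[element] += 1               # positive component o+
    result = {}
    for obj in sorted(set(pos) | set(neg)):
        total = pos[obj] + neg[obj]
        score = pos[obj] / total - neg[obj] / total
        result[obj] = (score, level(score))
    return result


# S_T as obtained in Ex.2.
S_T = [
    ["¬ rain", "sunshine"],
    ["¬ rain"],
    ["rain", "sunshine"],
    ["sunshine", "¬ wind"],
    ["¬ rain", "¬ sunshine"],
    ["sunshine"],
]

S_O = integrate(S_T)
for obj, (score, label) in S_O.items():
    print(obj, round(score, 2), label)
```

Under these assumptions the sketch yields negative for rain (score -0.5), verypositive for sunshine (score 0.6), and verynegative for wind (score -1), matching S_O above.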

5 Experiment

Focusing on the three research problems presented in Section 3.2, we proposed a method and verified it experimentally on a real dataset. This section presents and analyzes the experimental results of the proposed method.

5.1 Data acquisition

In this study, the Python package Tweepy 3 is used to collect 4758 English tweets related to the Weather topic. The data is pre-processed to remove unnecessary elements such as punctuation marks, re-tweet symbols, URLs, hashtags, and the query term. In addition, each emoji in a tweet is replaced with a descriptive text by using the Python emoji package 4 . Because tweets are usually informal and may contain acronyms and misspellings, the accuracy of the results can be affected. Therefore, the Python-based Aspell library 5 is employed to correct the spelling of tweets. We scan all words in the tweets and search for words that belong to the two available sentiment dictionaries. Tweets that do not contain any word belonging to these dictionaries are not considered. Since each tweet represents a person’s opinion on a topic, a tweet rarely contains two opposing sentiments directed at the same object; therefore, we also exclude tweets that indicate two opposing sentiments for the same object. After pre-processing, 2057 tweets remain.

These tweets are then divided and stored in two separate database files for training and testing. The training data consists of 1440 tweets, and the testing data includes 617 tweets. The data was annotated with three labels (i.e., Positive, Negative, and Object) by five human annotators. The training data file contains two columns separated by a tab. Each token is placed on a separate line, such that the first item on each line is a token and the second item is a label. Tokens that do not belong to any of the three categories of interest were annotated as Other. We annotated the test set as the gold standard to assess the performance. In the test set, 849 tokens indicating the object, 557 tokens indicating positive sentiment, and 292 tokens indicating negative sentiment need to be predicted.
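The removal steps described above can be illustrated with a minimal sketch that uses only the Python standard library. It is not the pipeline used in this study: the function name `clean_tweet`, the regular expressions, and the order of the steps are assumptions, and the emoji replacement and Aspell spelling correction are left out because they rely on third-party packages.

```python
import re
import string


def clean_tweet(text, query_term):
    """Strip URLs, hashtags, re-tweet symbols, the query term and punctuation."""
    text = re.sub(r"https?://\S+", " ", text)      # URLs
    text = re.sub(r"#\w+", " ", text)              # hashtags
    text = re.sub(r"\bRT\b", " ", text)            # re-tweet symbol
    text = re.sub(rf"\b{re.escape(query_term)}\b", " ", text,
                  flags=re.IGNORECASE)             # query term
    # Remove ASCII punctuation marks, then collapse repeated whitespace.
    text = text.translate(str.maketrans("", "", string.punctuation))
    return " ".join(text.split())


raw = "RT I love the sunshine today! #weather http://t.co/abc"
cleaned = clean_tweet(raw, query_term="weather")
print(cleaned)  # I love the sunshine today
```

Note that removing all punctuation also removes apostrophes, so a word such as "don't" becomes "dont"; whether this matters depends on how the sentiment dictionaries are matched afterwards.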

5.2 Evaluation

The effectiveness of the proposed method is evaluated using the following metrics: Precision (P), Recall (R), F-measure (F₁), and Accuracy.

$P = \frac{TP}{TP + FP}$ (8)

$R = \frac{TP}{TP + FN}$ (9)

$F_{1} = 2 \times \frac{P \times R}{P + R}$ (10)

$Accuracy = \frac{TP + TN}{TP + TN + FP + FN}$ (11)

where True Positive (TP) is the number of correctly assigned labels (Positive, Negative, or Object); False Positive (FP) is the number of misassigned labels; True Negative (TN) is the number of correctly assigned Other labels; and False Negative (FN) is the number of misassigned Other labels.
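As a quick check of Equations 8 to 11, the following sketch evaluates them for the counts reported for object o₁ in Table 6 (TP = 70, TN = 722, FP = 31, FN = 26). The function name `detection_metrics` is hypothetical.

```python
def detection_metrics(tp, tn, fp, fn):
    """Return (precision, recall, F1, accuracy) as defined in Equations 8-11."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return precision, recall, f1, accuracy


# Counts for object o1 taken from Table 6.
p, r, f1, acc = detection_metrics(tp=70, tn=722, fp=31, fn=26)
print(f"P={p:.1%} R={r:.1%} F1={f1:.1%} Accuracy={acc:.1%}")
# P=69.3% R=72.9% F1=71.1% Accuracy=93.3%
```

The values of P, R, and F₁ agree with the o₁ column of Table 6.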

In addition, the mean absolute error MAE_m [17] is used to estimate the error of the predicted object polarities. MAE_m is calculated as follows:

${MAE}_{m} = \frac{1}{2 m} \sum_{i = 1}^{m} | v_{i} - v_{i}^{*} |$ (12)

where m is the number of actual objects; v_i is the predicted average polarity of o_i, based on the classes assigned to the occurrences of o_i by the proposed method; and $v_{i}^{*}$ is the actual average polarity of o_i, based on the classes assigned to the occurrences of o_i by the human annotators.

5.3 Result and discussion

a. Determining objects

In the testing set, twelve objects are described by at least one sentiment word: rain, wind, sunshine, snow, fog, ice, cloud, hail, rainbow, frost, temperature, and storm. Because of space limitations, these objects are denoted as o_j (j = 1, . . . , 12), respectively. The result of the object detection process is shown in Table 5.

Table 5
Confusion Matrix of object detection

Predicted classes

o ₁ o ₂ o ₃ o ₄ o ₅ o ₆ o ₇ o ₈ o ₉ o ₁₀ o ₁₁ o ₁₂

Actual classes o ₁ 70 1 2 2 10 11

o ₂ 72 1 1 13

o ₃ 71 10

o ₄ 1 57 7 2 6 1

o ₅ 6 1 30 4 1

o ₆ 2 28 6

o ₇ 2 70 9

o ₈ 3 4 22

o ₉ 11 10 88

o ₁₀ 2 3 18

o ₁₁ 10 1 3 80 1

o ₁₂ 7 5 1 1 2 80

Using the confusion matrix in Table 5 and Equations 8, 9, and 10, the performance of the object detection in the tweets was calculated.

Table 6

Performance of detecting objects

	o ₁	o ₂	o ₃	o ₄	o ₅	o ₆	o ₇	o ₈	o ₉	o ₁₀	o ₁₁	o ₁₂
TP	70	72	71	57	30	28	70	22	88	18	80	80
TN	722	756	753	771	796	797	763	815	708	803	753	739
FP	31	6	15	4	11	16	5	5	32	23	1	14
FN	26	15	10	17	12	8	11	7	21	5	15	16
P	69.3	72.3	82.6	73.4	73.2	63.6	73.3	81.5	73.3	43.9	78.8	85.1
R	72.9	82.8	87.7	77.0	71.4	77.8	86.4	75.9	80.7	78.3	84.2	83.3
F ₁	71.1	77.2	85.1	75.2	72.3	70.0	79.3	78.6	76.8	56.3	81.4	84.2

Table 6 shows that the performance of the object detection is fairly good: the average P is 72.82%, the average R is 80.05%, and the average F₁ score is 76.03%. In our assessment, one of the main reasons for this performance is that combining the BiLSTM and CRF models exploits the advantages of both models in detecting objects and their sentiments. In addition, the provided rules help the trained model to determine more accurately the location of the words that indicate objects and the words that define the sentiments of objects.

Fig.2 shows that the detection of specific objects generally gives good results for objects such as o₂, o₃, o₈, o₁₁, and o₁₂. However, the performance for object o₁₀ is not as good as desired; concretely, its F₁ score is only 56.3%. One of the main reasons for this low performance appears to be that the training data contains fewer words indicating object o₁₀. We believe that, with a larger dataset and a better balance among the words indicating each object, this gap can be significantly reduced.

Fig. 2

Comparison of performance for specific objects.

b. Analyzing sentiment of each tweet

From the 617 tweets in the testing set, we extracted the sentiment of each tweet; the results are presented in Table 7, where 1s-1o means that the tweet expresses a single sentiment towards one object, and the remaining symbols have analogous meanings.

Table 7

Result of sentiment of each tweet

	1s-1o	1s-2o	1s-3o	2s-2o	2s-3o
# t_i	393	87	11	86	40
# t_i contains o_j	219	35	4	0	0
# t_i contains ¬o_j	174	52	7	0	0
# t_i contains both o_j and ¬o_k	0	0	0	86	40
Accuracy	81.3	77.1	75.4	80.0	73.6

The results indicate that tweets presenting a single sentiment towards a single object and tweets presenting two different sentiments towards two objects achieve better performance than the other cases. Because such tweets have a balance between the words indicating objects and the sentiment words, the algorithm can detect and label them more easily. Moreover, unlike previous methods, our method analyzes the sentiment of each tweet by combining the sentiments of the objects that appear in it. Previous methods extract only the sentiments of users, leaving the objects at which those sentiments are directed unknown, and most of them report the sentiment of a tweet without explicitly specifying the sentiment towards each object in it. As the data in Table 7 and the result of Ex.2 show, the proposed method reveals not only the sentiment of the speaker but also the object of that sentiment (e.g., “I do not like rain” expresses a negative sentiment towards the object “rain”). The sentiment towards each object over the entire set of tweets is also expressed (e.g., for the weather topic, most opinions are positive towards the object “wind” and very positive towards “snow”). Therefore, the result of the proposed method contains more information.

c. Analyzing sentiment from tweets according to each object

To evaluate the performance of the proposed method, we asked five human experts to determine the sentiment of the twelve objects by integrating the sentiments from the 617 tweets. These results are then compared with the results of our method by using the mean absolute error. The actual and predicted results of sentiment integration over all tweets for each specific object are shown in Table 8.

Table 8

The actual and predicted result of sentiment integration from tweets according to specific objects

		o ₁	o ₂	o ₃	o ₄	o ₅	o ₆	o ₇	o ₈	o ₉	o ₁₀	o ₁₁	o ₁₂
Actual	total	96	87	81	74	42	36	81	29	109	23	95	96
	$o_{j}^{+}$	90	50	74	60	20	24	21	17	97	5	76	23
	$o_{j}^{-}$	6	37	7	14	22	12	60	12	12	18	19	73
Score	$o_{j}^{+}$	0.94	0.57	0.91	0.81	0.48	0.67	0.26	0.59	0.89	0.22	0.80	0.24
	$o_{j}^{-}$	0.06	0.43	0.09	0.19	0.52	0.33	0.74	0.41	0.11	0.78	0.20	0.76
	$o_{j}^{+}$ - $o_{j}^{-}$	0.88	0.14	0.82	0.62	–0.04	0.34	–0.48	0.18	0.78	–0.56	0.6	–0.52
Predicted	total	70	72	71	57	30	28	70	22	88	18	80	80
	$o_{j}^{+}$	6	50	50	47	14	15	31	17	80	15	61	9
	$o_{j}^{-}$	64	22	21	10	16	29	44	10	8	3	19	71
Score	$o_{j}^{+}$	0.09	0.69	0.70	0.82	0.47	0.34	0.41	0.63	0.91	0.83	0.76	0.11
	$o_{j}^{-}$	0.91	0.31	0.30	0.18	0.53	0.66	0.59	0.37	0.09	0.17	0.24	0.89
	$o_{j}^{+}$ - $o_{j}^{-}$	–0.82	0.38	0.4	0.64	–0.06	–0.32	–0.18	0.26	0.82	0.66	0.52	–0.78

From Table 8 and by applying Equation 12, we have the mean absolute error of the proposed method as ${MAE}_{12} = \frac{1}{2 \times 12} \times 5.06 = 0.21$ , which is an acceptable error ratio.
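The computation can be retraced with a short sketch that applies Equation 12 to the score differences ($o_{j}^{+}$ - $o_{j}^{-}$) printed in Table 8. Because the table values are rounded to two decimals, the sum of absolute differences obtained here is about 5.04 rather than 5.06, which still gives 0.21 after rounding. The function name `mae_m` is hypothetical.

```python
# Score differences (o+ minus o-) for o1..o12, copied from Table 8.
actual = [0.88, 0.14, 0.82, 0.62, -0.04, 0.34,
          -0.48, 0.18, 0.78, -0.56, 0.60, -0.52]
predicted = [-0.82, 0.38, 0.40, 0.64, -0.06, -0.32,
             -0.18, 0.26, 0.82, 0.66, 0.52, -0.78]


def mae_m(pred, act):
    """Mean absolute error of Equation 12: sum of |v_i - v_i*| divided by 2m."""
    m = len(act)
    return sum(abs(v - v_star) for v, v_star in zip(pred, act)) / (2 * m)


mae = mae_m(predicted, actual)
print(round(mae, 2))  # 0.21
```

The largest contributions to the error come from o₁ (rain) and o₁₀ (frost), whose predicted polarities have the opposite sign of the actual ones.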

Table 9 presents the final result of integrating the sentiments of tweets according to objects. The result shows that the method generally did not perform well on the very positive sentiment (the ratio of correct predictions is $\frac{3}{5}$ ) and the very negative sentiment (the ratio of correct predictions is $\frac{1}{2}$ ); however, it performed well on the two remaining sentiments (the ratio of correct predictions is 1 in both cases). Moreover, the average accuracy (78.9%, obtained by applying Equation 11) indicates that this method is relatively good at determining the sentiment of each object by integrating sentiments from all tweets.

From the above analysis, we find that the results of the sentiment analysis for each object using the proposed method are fairly good in terms of the error ratio and achieved information.

Table 9

The result of integrating sentiments from tweets according to specific objects (Very positive, Positive, Neutral, Negative, and Very Negative are denoted by VP, P, NE, N, and VN, respectively)

	Rain	Wind	Sunshine	Snow	Fog	Ice	Cloud	Hail	Rainbow	Frost	Temperature	Storm
Actual	VP	P	VP	VP	NE	P	N	P	VP	VN	VP	VN
Predicted	VN	P	P	VP	NE	N	N	P	VP	VP	VP	VN

6 Conclusion and Future Work

This study proposed a method for analyzing the overall sentiment related to each object over an entire set of tweets by integrating the sentiments expressed towards that object in different tweets. The method consists of the following steps. First, features are extracted to determine the objects and the sentiments of the objects. Second, the sentiment of each tweet is determined in detail by explicitly presenting the objects and the sentiment related to each of them. Finally, the sentiment related to a specific object is analyzed by integrating the sentiments from all the tweets. The experimental results show that the proposed method significantly improved the performance in terms of the error ratio and achieved information. Such results can help, for example, a manufacturer to grasp users’ opinions about individual parts of a product; the manufacturer can then devise a suitable development strategy and create products that meet the needs of the users.

There are, however, some limitations of the proposed approach, including the impact of sarcastic tweets and of tweets containing conditional sentences, which are not considered in the sentiment analysis. In the future, we plan to address these limitations; in particular, we hope to analyze the impact of sarcastic tweets and tweets containing conditional sentences on sentiment analysis, which would be another interesting research direction.

Acknowledgment

This work was supported by the 2018 Yeungnam University Research Grant. This research was also supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Science, ICT & Future Planning (2017R1A2B4009410).

References

1. Asghar M.Z., Khan, Ahmad, Qasim and Khan I.A., Lexicon-enhanced sentiment analysis framework using rule-based classification scheme, PloS One 12(2) (2017), e0171649.

2. Asghar M.Z., Kundi F.M., Ahmad, Khan and Khan, T-SAF: Twitter sentiment analysis framework using a hybrid classification scheme, Expert Systems 35(1) (2018), DOI: 10.1111/exsy.12233.

3. Ahmad, Aftab and Ali, Sentiment analysis of tweets using svm, Int J Comput Appl 177(5) (2017), 25–29.

4. Baccianella, Esuli and Sebastiani, Sentiwordnet 3.0: An enhanced lexical resource for sentiment analysis and opinion mining, In: Proceedings of the International Conference on Language Resources and Evaluation, LREC 2010, 17–23 May 2010, Valletta, Malta, 2010, http://www.lrec-conf.org/proceedings/lrec2010/summaries/769.html.

5. Bird and Loper, Nltk: The natural language toolkit, In: Proceedings of the ACL 2004 on Interactive Poster and Demonstration Sessions, p. 31, Association for Computational Linguistics, 2004.

6. Dey, Jenamani and Thakkar J.J., Senti-N-Gram: An n-gram lexicon for sentiment analysis, Expert Syst Appl 103 (2018), 92–105.

7. Gamallo and García, Citius: A naive-bayes strategy for sentiment analysis on english tweets, In: Proceedings of the 8th International Workshop on Semantic Evaluation, SemEval@COLING 2014, Dublin, Ireland, 2014, pp. 171–175, http://aclweb.org/anthology/S/S14/S14-2026.pdf.

8. and Liu, Mining and summarizing customer reviews, In: Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, 2004, pp. 168–177.

9. Huang, Xu and Yu, Bidirectional lstm-crf models for sequence tagging, arXiv preprint arXiv:1508.01991, 2015.

10. Jankowski, Ziemba, Watrobski and Kazienko, Towards the Tradeoff Between Online Marketing Resources Exploitation and the User Experience with the Use of Eye Tracking, Intelligent Information And Database Systems, ACIIDS 2016, Edited by: N.T. Nguyen, B. Trawinski, H. Fujita and T.P. Hong, Book Series: Lecture Notes in Artificial Intelligence, Volume:962, 2016, pp. 330–343. DOI: 10.1007/978-3-662-49381-6_32

11. and Oh A.H., Aspect and sentiment unification model for online review analysis, In: Proceedings of the Fourth ACM International Conference on Web Search and Data Mining, ACM, 2011, pp. 815–824.

12. Khan A.Z., Atique and Thakare, Combining lexicon-based and learning-based methods for twitter sentiment analysis, International Journal of Electronics, Communication and Soft Computing Science & Engineering (IJECSCSE) (2015), 89.

13. Kim S.M. and Hovy, Extracting opinions, opinion holders, and topics expressed in online news media text, In: Proceedings of the Workshop on Sentiment and Subjectivity in Text, Association for Computational Linguistics, 2006, pp. 1–8.

14. Liu, Sentiment analysis and opinion mining, Synthesis Lectures on Human Language Technologies 5(1) (2012), 1–167.

15. Nguyen N.T., Advanced methods for inconsistent knowledge management, Springer Science & Business Media, 2007.

16. Pablos A.G., Cuadros and Rigau, W2VLDA: Almost unsupervised system for aspect based sentiment analysis, Expert Syst Appl 91 (2018), 127–137. DOI: 10.1016/j.eswa.2017.08.049.

17. Pavlopoulos, Aspect based sentiment analysis, Athens University of Economics and Business, 2014.

18. Remus, Quasthoff and Heyer, Sentiws – a publicly available German-language resource for sentiment analysis, In: Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC’10), European Language Resources Association (ELRA), Valletta, Malta, 2010.

19. Rojas-Barahona L.M., Deep learning for sentiment analysis, Language and Linguistics Compass 10(12) (2016), 701–719.

20. dos Santos and Gatti, Deep convolutional neural networks for sentiment analysis of short texts, In: Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers, 2014, pp. 69–78.

21. Saldaña Z.W., Sentiment analysis for exploratory data analysis, Programming Historian, 2018.

22. Tang and Zhang, Deep learning in sentiment analysis, In: Deep Learning in Natural Language Processing, Springer, 2018, pp. 219–253.

23. Thakkar and Patel, Approaches for sentiment analysis on twitter: A state-of-art study, arXiv preprint arXiv:3, 2015.

24. Turney P.D., Thumbs up or thumbs down?: Semantic orientation applied to unsupervised classification of reviews, In: Proceedings of the 40th Annual Meeting on Association for Computational Linguistics, Association for Computational Linguistics, 2002, pp. 417–424.

25. Tran V.C., Nguyen N.T., Fujita, Hoang D.T. and Hwang, A combination of active learning and self-learning for named entity recognition on twitter using conditional random fields, Knowledge-Based Systems 132 (2017), 179–187.

26. Tian, Lai and Moore J.D., Polarity and intensity: The two aspects of sentiment analysis, arXiv preprint arXiv:1807.01466, 2018.

27.

Zhao

, Gui