Accurate frequency-based lexicon generation for opinion mining

Abstract

Sentiment analysis deals with classifying the opinions in text. Twitter is the most popular microblogging platform in social media, with hundreds of millions of tweets posted every day. A considerable number of tweets contain opinions. The goal of this paper is to classify the polarity of the tweets into positive and negative classes using dynamic sentiment lexicons based on frequencies of words in positive and negative classes. We extract five meta-level features incorporating the generated sentiment lexicons and classify the text based on them. We also incorporate some previously known lexicon-based and corpus-based features. The proposed method is assessed on six datasets, and outperforms previous papers on accuracy on four datasets, and on f-measure on three datasets. This method generates sentiment lexicons dynamically. The changes of meanings of words can be captured by the generated lexicons. Our research produces very promising results in sentiment analysis in terms of accuracy and f-measure. The accuracy of our method on four datasets and the f-measure of our method on three datasets are higher than 85%.

Keywords

Sentiment analysis, opinion mining, sentiment lexicons, Twitter

1 Introduction

Social media users generate massive amounts of text. They share their feelings about different subjects and entities, such as products, politicians, and corporations [1]. There are numerous media outlets in which people can express themselves. One of those is Twitter, in which the posts are called tweets and should not exceed 140 characters. Twitter has about 320 million active users and about 500 million tweets per day.

As an increasing number of users share their views online, microblogging websites are a valuable source of opinions [2]. The opinions people share on microblogging platforms can be used for marketing of products [3], social studies [2], world news [4], and prediction of future events [5]. User-generated content on the web enables information technologies to benefit from the diversity of knowledge the users provide [6]. Hence, companies are eager to mine Twitter to find out about people’s opinions [7].

The automatic analysis of opinions expressed in text is a discipline known as opinion mining or sentiment analysis. This field concerns the sentiments, opinions and emotions people express about various subjects, such as products, organizations, companies and famous people [1]. Sentiment analysis deals with opinionated text such as reviews [8, 9], blog posts [10, 11], and news [12].

The limit on tweet length makes opinion mining in Twitter most similar to sentence-level opinion mining [7]. The special culture and language of Twitter users is a challenge for sentiment analysis in Twitter [13]. Some studies [14–16] treat sentiment analysis as a three-class classification problem with positive, negative and neutral classes. There are other related fields in sentiment analysis, such as emotion mining and strength detection [17–20].

In this research, we create sentiment lexicons from a corpus, and then use these lexicons to classify text. Our focus in this paper is on polarity classification of subjective tweets, in which we classify text into positive and negative classes. One of the challenges of sentiment analysis in Twitter is the slang that is widely used in tweets. Slang reflects the different backgrounds of Twitter users [13]. Our method addresses this problem by creating dynamic sentiment lexicons from all of the words present in the datasets; some existing sentiment lexicons, such as AFINN [21] and Bing Liu’s lexicon [1], do not contain most slang words. Also, the shortness of tweets makes them difficult to evaluate using existing lexicons [22]. Malformed words are also prevalent in Twitter, which is another shortcoming of using existing lexicons [23].

An optimal general-purpose lexicon for all domains cannot be created and the words are sensitive to the domain [24]. Our method generates dynamic lexicons by using the training datasets, and then extracts five meta-level features from tweets using these lexicons. We classify the text using a machine learning technique, SMO [25]. We divide the dataset into training and test datasets, and the lexicons are built from training data. Using these lexicons, the values for meta-level features are calculated for both training and test data.

Our contributions in this paper are as follows:

we propose a novel approach for generating sentiment lexicons and assigning numerical scores to words present in text, based on the frequencies of words in positive and negative text; these lexicons are used to calculate meta-level features;

we show that the features of these lexicons can be used alongside the features of other lexicons to improve their accuracy and f-measure.

The paper is organized as follows: In Section 2, we explore the related work in the field of polarity classification. We describe our method in Section 3. In Section 4, the results are presented and discussed. Finally, in Section 5, we conclude the paper.

2 Related work

We describe both work on sentiment analysis in Twitter and work that focuses on sentiment lexicons. We also explore fuzzy sentiment analysis and web resources.

2.1 Sentiment analysis in twitter

Tweets are 140 characters or less, and hence, are usually straightforward. They are considered a great resource for sentiment analysis [26].

Records in the Twitter datasets, i.e. tweets, must be labeled for classification. It is difficult to label a huge number of records manually, so some works use emoticons to label tweets [27–29]. The authors of [26] argue that using emoticons introduces noise into the labels, since a positive emoticon does not necessarily make a tweet positive.

Go et al. [28] used emoticons for creating an automatically labeled dataset of 1,600,000 tweets. Liu et al. [29] created another dataset and labeled the records with use of emoticons and manual labeling.

To classify sentiments of Tweets, Gonçalves et al. [30] proposed a combination of sentiment analysis methods. Agarwal et al. [3] used POS-specific features and a tree kernel for feature engineering for sentiment analysis in Twitter. Zhang et al. [31] combined lexicon-based and learning-based methods for sentiment classification of tweets. Mohammad et al. [32] proposed two SVM classifiers; one for sentiment classification of tweets, and one for sentiment classification of a term within a tweet. Hu et al. [33] used networked data, and investigated if social relations can be useful in detection of sentiments in tweets. Saif et al. [34] added semantic features as additional features for sentiment classification of tweets. Bravo-Marquez et al. [35] used meta-level features based on sentiment lexicons for classification of tweets. Kaewpitakkun et al. [36] used a hybrid approach that incorporated sentiment lexicons and machine learning techniques. Saif et al. [37] proposed a lexicon adaptation method that considers the context in which the words are used. Coletta et al. [38] combined classification and clustering for classification of tweets. Da Silva et al. [13] used an ensemble of classifiers for sentiment classification of tweets. Speriosu et al. [39] used label propagation with a maximum entropy classifier on Twitter datasets. Carvalho et al. [40] used word co-occurrences in a statistical and evolutionary model for sentiment classification of tweets. Lu [41] used a semi-supervised approach and incorporated microblog-microblog relations. Saif et al. [42] captured patterns of words whose contextual semantics are similar. Baecchi et al. [43] incorporated a multimodal approach. Hu et al. [44] used emotional signals, such as emoticons or product ratings, in sentiment classification. Saif et al. [45] incorporated a lexicon-based method. Keshavarz and Saniee [46] proposed a genetic algorithm approach for polarity classification of microblogs, which they formulated and solved as an optimization problem. Another work in the field of optimization algorithms is [47], in which cuckoo search is incorporated. Rout et al. [48] used n-gram and part-of-speech features to analyze social media texts from sentiment and emotion points of view. The problem of tweet sentiment analysis in the Spanish language has been explored in [49] using convolutional neural networks.

2.2 Web resources

Sentiment analysis works on text gathered from the web, i.e. a web resource. Numerous web resource features used in web mining, such as n-grams, phrases, terms, hypernyms, document categories, and named entities, are cited in [50]. Incorporating text features in web mining is addressed in [51]. Features such as terms, keywords and phrases are used in [52]. Linguistic features are incorporated in [53] as well. Adding further aspects to web resources, such as geographical information, is explored in [54].

2.3 Fuzzy sentiment analysis

Several research papers have also studied the effect of combining fuzzy logic and sentiment analysis. Fuzzy logic is used for modeling polarity in [55], under the assumption that a text can be highly or mildly positive or negative. The problem studied in [56] is building a fuzzy product ontology for aspect-based opinion mining. In [57], the reviews are divided into groups ranging from very positive to very negative. A fuzzy lexicon is used in [58], in which the degree of positivity or negativity of words and reviews is decided by fuzzy sets. A neuro-fuzzy model is incorporated for sentiment analysis in [59]. WordNet is explored and the membership of nearly 8000 words in fuzzy categories of sentiment is calculated in [60]. Propagation in social networks is explored using fuzzy sets for better sentiment analysis in [61]. In [62], the authors argue that there is an inherent vagueness in the definition of positivity, objectivity and negativity, and try to address the issue using fuzzy sets. Finally, the problem of creating a fuzzy domain sentiment ontology is studied in [63].

2.4 Sentiment lexicons

One of the main approaches for sentiment analysis is to use sentiment lexicons. A sentiment lexicon is a set of sentiment words or phrases with assigned scores. Since our work here is to generate dynamic lexicons, we describe existing well-known lexicons. Bradley and Lang [65] proposed the ANEW lexicon, which stands for Affective Norms for English Words. This lexicon was introduced before the rise of microblogging. Nielsen [21] proposed a new lexicon inspired from ANEW, and named it AFINN. Bing Liu’s lexicon is created by Bing Liu [1]. This lexicon contains positive and negative words. The words of EmoLex (NRC-emotion) [66] are labelled by their polarity and emotion. The NRC-hashtag lexicon is for the SemEval task by the NRC-Canada team [32]. The OpinionFinder lexicon is based on Multi-Perspective Question-Answering dataset (MPQA) and is proposed by Wilson et al. [67]. The Sentiment140 lexicon is also created by the NRC-Canada team. Another lexicon is SentiWordNet which was first proposed by Esuli and Sebastiani [68]. SentiWordNet 3.0 is created by Baccianella et al. [69].

When using these lexicons, one should consider challenges such as intensification and negation [70].

One shortcoming of these lexicons is that they rate some words as positive (or negative) without considering that these words may have different meanings in social media [46]. Also, the coverage of some of these lexicons, such as AFINN [21], Bing Liu’s lexicon [1] and OpinionFinder [67], on tweets is low.

3 Frequency-based sentiment analysis (FBSA)

We propose an algorithm, FBSA, that first generates a sentiment lexicon and then uses the generated lexicon for classification. In our method, a sentiment lexicon is generated in the training phase of the algorithm on the training dataset, and this lexicon is used for classification of records in the test dataset. We use the 10-fold cross-validation scheme for our method: we build sentiment lexicons based on the 9 parts that form the training set, create a model on the training data, and test the model on the test data. In the proposed method, all of the words are considered for lexicon generation, because tweets are short and every word can be decisive in classification [46]. A salient advantage of our algorithm is that it does not omit any words in the datasets, not even stop-words.

3.1 The proposed method

FBSA first calculates the sentiment score for each word, based on term frequencies of words in the training dataset. Suppose we have p datasets, D₁ to D_p, that contain a number of tweets and labels showing if they are positive or negative. We split a dataset into training and test datasets according to the 10-fold cross-validation mechanism. For each word w_j in the training dataset D_i, we define two cumulative values, the positive and negative frequencies:

$freq_{+}(w_{j}, D_{i}) = \sum_{\substack{1 \le k \le n_{i} \\ Class(R_{i,k}) = \text{positive}}} TF(w_{j}, R_{i,k})$ (1)

$freq_{-}(w_{j}, D_{i}) = \sum_{\substack{1 \le k \le n_{i} \\ Class(R_{i,k}) = \text{negative}}} TF(w_{j}, R_{i,k})$ (2)

in which n_i is the number of records in D_i, R_{i,k} is record number k in the dataset D_i, and TF(w_j, R_{i,k}) is the number of occurrences of w_j in R_{i,k}. In other words, freq₊(w_j, D_i) and freq₋(w_j, D_i) are the number of occurrences of w_j in positive and negative records in the dataset D_i, respectively.

The D_i dataset is the training dataset, and thus it does not include records from the test dataset. Comparing freq₊ and freq₋ is only meaningful when the numbers of positive and negative records are equal. We therefore use a coefficient based on the number of records in the positive and negative classes, and calculate the normalized negative frequency using Equation (3):

$nfreq_{-}(w_{j}, D_{i}) = \frac{n_{P}(i)}{n_{N}(i)} \cdot freq_{-}(w_{j}, D_{i})$ (3)

n_P (i) and n_N (i) represent the number of positive and negative records in D_i, respectively.

We propose Equation (4) to calculate the sentiment score for each word in the lexicon:

$Score(w_{j}, D_{i}) = \frac{freq_{+}(w_{j}, D_{i}) - nfreq_{-}(w_{j}, D_{i})}{freq_{+}(w_{j}, D_{i}) + nfreq_{-}(w_{j}, D_{i})}$ (4)

Equation (4) provides a sentiment score between –1 and +1 for each word. If the score is near 0, it means that word is more objective than subjective. Scores near +1 indicate the positivity of the word, and words with scores near –1 are more negative.

For example, if the word “love” appears 28 times in positive tweets (e.g. in 28 separate tweets) and 3 times in negative tweets (e.g. twice in one negative tweet and once in another) in dataset number 1, then $freq_{+}(\text{“love”}, D_{1}) = 28$ and $freq_{-}(\text{“love”}, D_{1}) = 3$. However, in our example, the training dataset is imbalanced and contains 182 positive and 177 negative tweets. Therefore, we compute the normalized frequency to tackle the imbalance: $nfreq_{-}(\text{“love”}, D_{1}) = \frac{182}{177} \cdot 3 \approx 3.085$. The score is computed as follows: $Score(\text{“love”}, D_{1}) = \frac{28 - 3.085}{28 + 3.085} \approx 0.802$. This implies that “love” is a very positive word.
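
Equations (1) to (4) can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used in the experiments: the function name `build_lexicon`, the `"pos"`/`"neg"` label strings and the lowercase whitespace tokenizer are assumptions made for the example.

```python
from collections import Counter


def build_lexicon(records):
    """Map each word to a score in [-1, +1] from (text, label) training pairs.

    `label` is "pos" or "neg" (hypothetical encoding). Both classes are
    assumed to be non-empty, otherwise the ratio in Equation (3) is undefined.
    """
    freq_pos, freq_neg = Counter(), Counter()
    n_pos = n_neg = 0
    for text, label in records:
        tokens = text.lower().split()  # assumed tokenizer: lowercase + whitespace
        if label == "pos":
            freq_pos.update(tokens)    # Equation (1)
            n_pos += 1
        else:
            freq_neg.update(tokens)    # Equation (2)
            n_neg += 1

    ratio = n_pos / n_neg              # coefficient of Equation (3)
    lexicon = {}
    for word in set(freq_pos) | set(freq_neg):
        nfreq_neg = ratio * freq_neg[word]                      # Equation (3)
        lexicon[word] = ((freq_pos[word] - nfreq_neg)
                         / (freq_pos[word] + nfreq_neg))        # Equation (4)
    return lexicon


# The "love" example: 182 positive and 177 negative tweets,
# 28 positive occurrences and 3 negative occurrences of "love".
records = ([("love", "pos")] * 28 + [("filler", "pos")] * 154
           + [("love love", "neg"), ("love", "neg")] + [("other", "neg")] * 175)
print(round(build_lexicon(records)["love"], 3))  # 0.802
```

Every word seen in training occurs at least once in one of the two classes, so the denominator of Equation (4) is never zero.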

We introduce five meta-level features to do the classification task on the datasets. The meta-features are as follows:

FPos: Sum of Score for positive words in tweet

FNeg: Sum of Score for negative words in tweet

PWords: Number of positive words in the record based on Score

NWords: Number of negative words in the record based on Score

Score: Sum of Score for all the words in tweet

In this paper, an FBSA-generated sentiment lexicon on a dataset refers to a lexicon generated from the whole dataset. The exception is the classification experiments, in which the lexicons are generated from the training datasets only.

The core of the feature vector is made of the aforementioned features. An example of computing the features for a certain record is as follows. Consider a very simple lexicon, created by the FBSA method, which is demonstrated in Table 1.

Table 1

A simple lexicon created using FBSA

It	For	Is	Good	Of
+0.228	+0.307	–0.186	+0.621	–0.074

Now, each record in the dataset (each tweet) should be transformed into a feature vector. Assume a record, consisting of the following text: “It is good”. Its feature vector is calculated as shown in Table 2.

Table 2

The feature values for the record “It is good”

FPos	FNeg	PWords	NWords	Score
0.849	–0.186	2	1	0.663

Here, each record is converted into five features. Then, a model is built based on the training data, and applied on the test data. The lexicon is created solely based on the training data, and it is used to calculate features for both training and test data.
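
The computation behind Table 2 can be sketched as follows. The function name `meta_features` and the tokenizer are assumptions for this example, as are two edge-case choices the text does not specify: words missing from the lexicon are skipped, and a word with a score of exactly 0 counts as neither positive nor negative.

```python
def meta_features(text, lexicon):
    """Return the five FBSA meta-level features of one record."""
    tokens = text.lower().split()                  # assumed tokenizer
    scores = [lexicon[t] for t in tokens if t in lexicon]  # unseen words skipped
    positives = [s for s in scores if s > 0]
    negatives = [s for s in scores if s < 0]
    return {
        "FPos": sum(positives),    # sum of scores of positive words
        "FNeg": sum(negatives),    # sum of scores of negative words
        "PWords": len(positives),  # number of positive words
        "NWords": len(negatives),  # number of negative words
        "Score": sum(scores),      # sum of scores of all words
    }


# The lexicon of Table 1, with lowercased keys.
table1 = {"it": 0.228, "for": 0.307, "is": -0.186, "good": 0.621, "of": -0.074}
features = meta_features("It is good", table1)
print({name: round(value, 3) for name, value in features.items()})
```

For the record “It is good”, the rounded values agree with Table 2: FPos is 0.228 + 0.621 = 0.849 and Score is 0.849 − 0.186 = 0.663.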

Since the datasets are from Twitter, words may be preceded by a hashtag (#), which is used to indicate the keywords of a tweet and to emphasize a word. It is interesting to know whether there is a significant difference between words with and without hashtags. For this reason, a word with a hashtag and the same word without a hashtag are treated as two different words.
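
One way to keep hashtagged words distinct is to make the leading “#” part of the token. The regular expression below is an assumption made for illustration, not the tokenizer used in the experiments; among other simplifications, it splits words at apostrophes.

```python
import re

# An optional leading "#" followed by word characters, so that "#siri"
# and "siri" become two different lexicon entries.
TOKEN = re.compile(r"#?\w+")


def tokenize(text):
    """Lowercase the text and split it into words, preserving hashtags."""
    return TOKEN.findall(text.lower())


print(tokenize("Asking #Siri about siri"))  # ['asking', '#siri', 'about', 'siri']
```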

3.2 Additional features

We also incorporate additional features from five sentiment lexicons to improve the accuracy and f-measure. Moreover, incorporating additional features can be used to check if our lexicons complement other lexicons by improving their accuracy.

For each additional sentiment lexicon, we add two meta-level features to our five features. These meta-level features are presented in Table 3 [26]. The lexicons are chosen with a focus on diversity. Some of the lexicons are manually created, such as Bing Liu’s lexicon, while NRC-hashtag is created automatically. Bing Liu’s lexicon and OpinionFinder only group words into positive and negative ones, whereas the scores in Sentiment140 are real-valued with three decimal places. We also add bigram features, which are term frequencies of unigrams and bigrams.

Table 3
Features based on lexicons for classification of sentiment

Lexicon	Features	Descriptions
Bing Liu	BP/BN	Number of positive/negative words in the record
AFINN	AP/AN	Sum of scores of positive/negative words in the record
OpinionFinder	OP/ON	Number of positive/negative words in the record
NRC-hash	NP/NN	Sum of scores of positive/negative words in the record
Sentiment140	SP/SN	Sum of scores of positive/negative words in the record
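
The feature pairs of Table 3 come in two kinds: word counts for polarity-only lexicons and score sums for scored lexicons. The sketch below shows one way to compute either kind. The function name, the `use_scores` flag and the two tiny lexicons are hypothetical; the entries are toy values, not taken from the real lexicons.

```python
def lexicon_pair(tokens, lexicon, use_scores):
    """Return the (positive, negative) feature pair of one record for one lexicon.

    With use_scores=False the pair counts positive/negative words (the
    BP/BN and OP/ON style); with use_scores=True it sums their scores
    (the AP/AN, NP/NN and SP/SN style).
    """
    pos = neg = 0.0
    for token in tokens:
        value = lexicon.get(token, 0)   # words outside the lexicon contribute nothing
        if value > 0:
            pos += value if use_scores else 1
        elif value < 0:
            neg += value if use_scores else 1
    return pos, neg


# Toy lexicons for illustration only.
polarity_only = {"good": 1, "bad": -1}
scored = {"good": 3, "bad": -3}

tokens = ["good", "good", "bad", "movie"]
print(lexicon_pair(tokens, polarity_only, use_scores=False))  # (2.0, 1.0)
print(lexicon_pair(tokens, scored, use_scores=True))          # (6.0, -3.0)
```

With five static lexicons, this adds ten features to the five FBSA features of each record.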

Our method uses SMO of WEKA 3.6 as our classifier. It should be noted that the HCR dataset defines the training and test datasets beforehand, and thus, no cross-validation is applied on this dataset.

4 Experimental evaluation

In this section, the datasets are introduced, and then the results of running our method are reported and discussed. Only the subjective text in the datasets (positive and negative) are considered here.

4.1 Datasets

We run FBSA on six datasets that are made of tweets [28, 72]. These datasets are widely used for sentiment analysis: Sanders, OMD and Strict OMD [72], HCR [39], STS-Test [28], and STS-Gold [71]. The Obama-McCain Debate (OMD) dataset consists of 3238 tweets [72], of which only the tweets with a majority vote are considered. The strict version of OMD (SOMD) consists of the tweets on which the votes are unanimous. The Healthcare Reform (HCR) dataset consists of a training dataset with 621 tweets and a test dataset with 665 tweets.

The Stanford dataset consists of a training dataset with 1.6 million tweets, and a test dataset (STS-Test) with 177 negative and 182 positive tweets that are manually labeled [28]. Several studies, such as [26], consider only the STS-Test dataset. We likewise use the test portion of the Stanford dataset. We also compare our results with methods that use STS-Train for training.

Table 4 gives an overview of these datasets.

Table 4
The negative, positive and total tweets in each dataset

	Sanders	SOMD	HCR	OMD	STS	STS-Gold
Negative	654	569	917 (406+511)	1196	177	1402
Positive	570	347	369 (215+154)	710	182	632
Total	1224	916	1286 (621+665)	1906	359	2034

4.2 Experimental setup

A program was written in C# (2015) to implement the algorithm. Since it only calculates word frequencies and meta-level features, it runs very quickly. The runtimes of feature generation on the different datasets are shown in Table 5.

Table 5
Runtime of feature generation in datasets (in seconds)

Dataset	Sanders	SOMD	HCR	OMD	STS	STS-Gold
Time	0.246	0.226	0.255	0.276	0.069	0.645

We used 10-fold cross-validation for our algorithm in the datasets, and report the average value for each of the measures.
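
The evaluation protocol can be sketched as below. The important point is that the lexicon and the classifier are both fitted inside each fold, on the nine training parts only. The helper names and the `fit`/`evaluate` callbacks are hypothetical, and the split is a plain shuffled one with no stratification, which is a simplification for this example.

```python
import random


def k_fold_splits(n_records, k=10, seed=0):
    """Yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k nearly equal parts
    for i in range(k):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, folds[i]


def cross_validate(records, fit, evaluate, k=10):
    """Average a measure over k folds.

    `fit(train_records)` should build the lexicon and the classifier from
    the training part only; `evaluate(model, test_records)` returns the
    measure (e.g. accuracy) on the held-out part.
    """
    results = []
    for train_idx, test_idx in k_fold_splits(len(records), k):
        model = fit([records[i] for i in train_idx])
        results.append(evaluate(model, [records[i] for i in test_idx]))
    return sum(results) / len(results)
```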

4.3 Results

We compare the results of our algorithm (Tables 6 to 11) with state-of-the-art methods, such as [13, 26, 41], and baseline methods, such as [21]. The baseline results are based on the meta-level features explained in Table 3. The FBSA method uses FPos, FNeg, PWords, NWords and Score as meta-level features. We then incrementally add additional features, such as meta-level features from other lexicons and bigrams. We performed the Wilcoxon Signed-Ranks test, a non-parametric statistical test, to compare our results with other methods over the datasets, for accuracy (Table 12) and f-measure (Table 13). Tables 14 and 15 show the results of this test for comparing the FBSA method (without additional features) with bigrams and static lexicons. The results show that using only the five core FBSA features outperforms the individual static lexicons, and is competitive with bigrams. However, classification based on bigram features is very time-consuming compared to the FBSA features.
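
For readers unfamiliar with the test, the asymptotic form of the Wilcoxon Signed-Ranks statistic can be sketched with the standard library alone. The sketch assumes the plain normal approximation, with no continuity correction and no tie correction in the variance; the paired values in the usage lines are made up for illustration and are not results from this paper.

```python
import math


def wilcoxon_signed_rank(xs, ys):
    """Return (W, Z, two-tailed asymptotic p) for paired samples xs, ys."""
    diffs = [x - y for x, y in zip(xs, ys) if x != y]  # zero differences dropped
    n = len(diffs)
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:                       # assign average ranks to tied |differences|
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = (i + j) / 2 + 1
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    w = min(w_plus, w_minus)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))  # two-tailed normal probability
    return w, z, p


# Hypothetical accuracies: one method wins on all five datasets.
ours = [80.0, 82.0, 84.0, 86.0, 88.0]
theirs = [79.0, 80.0, 81.0, 82.0, 83.0]
w, z, p = wilcoxon_signed_rank(ours, theirs)
print(w, round(z, 3), round(p, 3))  # 0 -2.023 0.043
```

With five wins, no losses and no ties, the statistic depends only on the number of pairs, so these values coincide with the Z and asymptotic significance that Table 12 lists for a 5 to 0 comparison.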

Table 6
Results of FBSA on HCR dataset (%)

Method	Accuracy	Average F1
FBSA	77.29	68.31
FBSA + bigrams	78.65	68.69
FBSA + Bing Liu’s Lexicon	77.29	68.31
FBSA + Bing Liu’s Lexicon + bigrams	78.95	68.47
FBSA + AFINN	76.54	67.90
FBSA + AFINN + bigrams	78.95	69.28
FBSA + OpinionFinder	76.54	67.90
FBSA + OpinionFinder + bigrams	79.25	69.57
FBSA + Sentiment140	74.59	65.73
FBSA + Sentiment140 + bigrams	78.80	68.83
FBSA + NRC Hashtag	75.49	66.81
FBSA + NRC Hashtag + bigrams	78.00	68.99
FBSA + All 5 lexicons	74.29	65.18
FBSA + All 5 lexicons + bigrams	80.00	70.60
FBSA + All 5 lexicons + balanced train	65.86	60.85
FBSA + All 5 lexicons + balanced train + bigrams	80.15	69.33

Table 7

Results of FBSA on OMD dataset (%)

Method	Accuracy	Average F1
FBSA	79.52	77.66
FBSA + bigrams	80.04	78.16
FBSA + Bing Liu’s Lexicon	80.52	78.51
FBSA + Bing Liu’s Lexicon + bigrams	81.12	79.35
FBSA + AFINN	79.73	77.89
FBSA + AFINN + bigrams	80.59	78.70
FBSA + OpinionFinder	80.05	78.10
FBSA + OpinionFinder + bigrams	80.85	78.74
FBSA + Sentiment140	80.32	78.33
FBSA + Sentiment140 + bigrams	80.23	78.02
FBSA + NRC Hashtag	79.17	77.19
FBSA + NRC Hashtag + bigrams	80.33	78.19
FBSA + All 5 lexicons	80.84	79.16
FBSA + All 5 lexicons + bigrams	81.98	79.93
FBSA + All 5 lexicons + balanced train	79.09	77.59
FBSA + All 5 lexicons + balanced train + bigrams	81.97	79.06

Table 8

Results of FBSA on SOMD Dataset (%)

Method	Accuracy	Average F1
FBSA	81.40	79.48
FBSA + bigrams	83.68	82.40
FBSA + Bing Liu’s Lexicon	83.39	82.01
FBSA + Bing Liu’s Lexicon + bigrams	85.53	84.17
FBSA + AFINN	83.77	82.47
FBSA + AFINN + bigrams	85.74	84.62
FBSA + OpinionFinder	82.47	80.65
FBSA + OpinionFinder + bigrams	84.31	83.40
FBSA + Sentiment140	83.06	81.57
FBSA + Sentiment140 + bigrams	84.13	83.19
FBSA + NRC Hashtag	81.82	80.29
FBSA + NRC Hashtag + bigrams	84.03	83.11
FBSA + All 5 lexicons	84.69	83.58
FBSA + All 5 lexicons + bigrams	86.44	85.58
FBSA + All 5 lexicons + balanced train	84.32	83.28
FBSA + All 5 lexicons + balanced train + bigrams	86.91	85.87

Table 9

Results of FBSA on STS dataset (%)

Method	Accuracy	Average F1
FBSA	79.95	79.73
FBSA + bigrams	80.97	80.79
FBSA + Bing Liu’s Lexicon	81.87	81.72
FBSA + Bing Liu’s Lexicon + bigrams	83.44	83.30
FBSA + AFINN	80.39	80.22
FBSA + AFINN + bigrams	81.62	81.45
FBSA + OpinionFinder	83.14	83.01
FBSA + OpinionFinder + bigrams	83.60	83.51
FBSA + Sentiment140	83.45	83.37
FBSA + Sentiment140 + bigrams	83.26	83.11
FBSA + NRC Hashtag	80.84	80.80
FBSA + NRC Hashtag + bigrams	81.69	81.54
FBSA + All 5 lexicons	84.85	84.71
FBSA + All 5 lexicons + bigrams	85.34	85.35
FBSA + All 5 lexicons + balanced train	83.94	83.78
FBSA + All 5 lexicons + balanced train + bigrams	81.31	80.87

Table 10

Results of FBSA on STS-Gold dataset (%)

Method	Accuracy	Average F1
FBSA	82.61	78.09
FBSA + bigrams	83.89	79.59
FBSA + Bing Liu’s Lexicon	83.26	79.12
FBSA + Bing Liu’s Lexicon + bigrams	84.49	81.21
FBSA + AFINN	84.17	80.20
FBSA + AFINN + bigrams	85.16	81.42
FBSA + OpinionFinder	82.49	78.10
FBSA + OpinionFinder + bigrams	83.87	79.78
FBSA + Sentiment140	84.57	80.76
FBSA + Sentiment140 + bigrams	85.20	81.36
FBSA + NRC Hashtag	83.16	78.85
FBSA + NRC Hashtag + bigrams	84.52	80.39
FBSA + All 5 lexicons	85.87	82.47
FBSA + All 5 lexicons + bigrams	87.21	84.17
FBSA + All 5 lexicons + balanced train	85.61	82.92
FBSA + All 5 lexicons + balanced train + bigrams	87.03	83.79

Table 11

Results of FBSA on Sanders dataset (%)

Method	Accuracy	Average F1
FBSA	81.99	81.78
FBSA + bigrams	83.61	83.41
FBSA + Bing Liu’s Lexicon	82.68	82.50
FBSA + Bing Liu’s Lexicon + bigrams	84.72	84.52
FBSA + AFINN	82.90	81.96
FBSA + AFINN + bigrams	84.41	85.03
FBSA + OpinionFinder	82.69	82.49
FBSA + OpinionFinder + bigrams	84.19	84.00
FBSA + Sentiment140	83.18	82.99
FBSA + Sentiment140 + bigrams	84.04	83.91
FBSA + NRC Hashtag	82.66	82.46
FBSA + NRC Hashtag + bigrams	83.52	83.31
FBSA + All 5 lexicons	83.81	83.58
FBSA + All 5 lexicons + bigrams	85.67	85.46
FBSA + All 5 lexicons + balanced train	83.39	83.18
FBSA + All 5 lexicons + balanced train + bigrams	84.74	84.38

Table 12

Wilcoxon Signed-Ranks test on accuracy values

Paper	Wins	Losses	Mean (FBSA)	Mean (Paper)	Std. Dev. (FBSA)	Std. Dev. (Paper)	Z	Asymp. Sig. (2-tailed)	Exact Sig. (2-tailed)
[13]	5	0	83.89	80.02	2.76	3.33	–2.023	0.043	0.063
[35]	2	0	85.50	78.65	0.23	1.92	–1.342	0.180	0.500
[71]	4	1	84.04	82.21	2.96	2.81	–1.753	0.080	0.375
[36]	1	0	85.67	81.20	0.00	0.00	N/A	N/A	N/A
[26]	1	1	85.50	90.85	0.23	10.25	–0.477	0.655	1.000
[38]	4	0	83.25	79.70	2.73	3.22	–1.826	0.068	0.125
[42]	3	0	83.25	80.34	2.73	3.40	–1.461	0.144	0.625
[43]	2	0	85.50	83.41	0.23	0.61	–1.342	0.180	0.500
[34]	3	0	82.44	77.18	2.70	5.64	–1.604	0.109	0.250
[37]	3	0	83.06	72.90	3.72	8.20	–1.604	0.109	0.250
[39]	3	0	82.44	74.20	2.70	9.37	–1.604	0.109	0.250
[41]	1	1	80.99	78.90	1.40	4.67	–0.447	0.655	1.000
[45]	3	0	83.06	74.52	3.73	6.74	–1.604	0.109	0.250
[40]	2	0	82.67	76.35	3.78	1.20	–1.342	0.180	0.500
[44]	2	0	83.66	70.90	2.38	2.38	–1.342	0.180	0.500
[33]	2	0	83.66	77.95	2.38	2.33	–1.342	0.180	0.500

Table 13

Wilcoxon Signed-Ranks test on f-measure values

Paper	Wins	Losses	Mean (FBSA)	Mean (Paper)	Std. Dev. (FBSA)	Std. Dev. (Paper)	Z	Asymp. Sig. (2-tailed)	Exact Sig. (2-tailed)
[13]	5	0	81.44	76.70	6.54	8.11	–2.023	0.043	0.063
[35]	2	0	85.40	79.50	0.08	2.12	–1.342	0.180	0.500
[71]	3	2	81.10	79.90	6.29	4.63	–1.214	0.225	1.000
[36]	1	0	85.46	81.20	0.00	0.00	N/A	N/A	N/A
[26]	1	1	85.40	90.95	0.08	10.25	–0.447	0.655	1.000
[38]	4	0	80.33	74.08	6.98	10.37	–1.826	0.068	0.125
[42]	3	1	80.33	77.97	6.98	6.41	–1.461	0.144	0.625
[34]	3	0	78.62	76.63	7.46	7.95	–1.604	0.109	0.250
[37]	3	0	78.23	67.98	6.94	10.80	–1.604	0.109	0.250
[45]	3	0	78.23	67.78	6.94	9.34	–1.604	0.109	0.250

Table 14

Wilcoxon Signed-Ranks test on accuracy values for FBSA without additional features

Method	Wins	Losses	Mean (FBSA)	Mean (Method)	Std. Dev. (FBSA)	Std. Dev. (Method)	Z	Asymp. Sig. (2-tailed)	Exact Sig. (2-tailed)
AFINN	6	0	80.46	72.51	1.95	5.08	–2.201	0.028	0.031
Bing Liu’s Lexicon	6	0	80.46	71.86	1.95	4.62	–2.201	0.028	0.031
OpinionFinder	6	0	80.46	67.62	1.95	6.25	–2.201	0.028	0.031
Sentiment140	5	1	80.46	75.82	1.95	8.81	–1.572	0.116	0.219
NRC Hashtag	6	0	80.46	69.24	1.95	5.71	–2.201	0.028	0.031
Bigrams	3	3	80.46	80.23	1.95	1.92	–0.734	0.463	1.000
All Lexicons	3	3	80.46	80.85	1.95	5.72	–0.105	0.917	1.000

Table 15

Wilcoxon Signed-Ranks test on f-measure values for FBSA without additional features

Method	Wins	Losses	Mean (FBSA)	Mean (Method)	Std. Dev. (FBSA)	Std. Dev. (Method)	Z	Asymp. Sig. (2-tailed)	Exact Sig. (2-tailed)
AFINN	6	0	77.51	62.96	4.73	12.02	–2.201	0.028	0.031
Bing Liu’s Lexicon	6	0	77.51	61.39	4.73	11.78	–2.201	0.028	0.031
OpinionFinder	6	0	77.51	52.36	4.73	52.37	–2.201	0.028	0.031
Sentiment140	5	1	77.51	66.40	4.73	19.01	–1.363	0.173	0.219
NRC Hashtag	6	0	77.51	53.31	4.73	15.45	–2.201	0.028	0.031
Bigrams	4	2	77.51	76.72	4.73	5.77	–1.153	0.249	0.688
All Lexicons	4	2	77.51	77.63	4.73	8.30	–0.314	0.753	0.688

The comparisons have these implications: (i) the five core features extracted using the FBSA method (the FBSA row) are competitive with, and in f-measure generally outperform, the combination of all of the static lexicons. This means that a dynamic lexicon, which can be created very quickly, can outperform single static lexicons and can match their combination as well. This can be attributed to the dynamic lexicons being domain-specific, which calls for the creation of dynamic lexicons; and (ii) the dynamic lexicons are useful additions to the existing lexicons, and improve their accuracy values.

In one of the experiments, we used balanced training datasets. To balance them, we applied sampling with replacement to the training datasets. The weights learned by SMO show that the FPos and Score features carry more weight in classifying tweets than the other features.
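
Balancing by sampling with replacement can be sketched as follows. The text does not specify the exact procedure, so this example assumes one common variant: every class is topped up to the size of the largest class by drawing extra records, with replacement, from that class. The function name and the `(text, label)` record layout are hypothetical.

```python
import random


def balance_by_resampling(records, seed=0):
    """Oversample smaller classes (with replacement) to the largest class size.

    `records` is a list of (text, label) pairs from the training data only;
    the test data must stay untouched.
    """
    rng = random.Random(seed)
    by_label = {}
    for record in records:
        by_label.setdefault(record[1], []).append(record)
    target = max(len(group) for group in by_label.values())
    balanced = []
    for group in by_label.values():
        balanced.extend(group)
        # Draw the missing records from the same class, with replacement.
        balanced.extend(rng.choices(group, k=target - len(group)))
    rng.shuffle(balanced)
    return balanced


train = [("bad", "neg")] * 5 + [("good", "pos")] * 2
balanced = balance_by_resampling(train)
print(len(balanced))  # 10
```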

The intersection of all the lexicons is shown in Fig. 1. It shows the number of words shared by the lexicons generated by FBSA and the static lexicons. The coverage of AFINN and Bing Liu’s lexicon on the words of the Sanders and HCR datasets is less than 10 percent. The high accuracy and f-measure results suggest that there can be implicit sentiment words. For example, the score of the word “for” is usually mildly positive. It can be inferred that when “for” is used in a sentence, the sentence tends to be positive. This effect is shown in [46] as well. However, in numerous research papers, this word is omitted as a stop-word. Saif et al. [73] and Keshavarz and Saniee [46] did not omit the stop-words.

Fig.1

Intersection of words in lexicons.

The concordance of lexicons is shown in Fig. 2. It shows how far sentiment lexicons agree on the sentiment direction of their common words. A bolder color indicates higher concordance.

The effect of using hashtags alongside words can be seen in Table 16, which shows the scores of words with and without hashtags. Because these scores can differ considerably, it is better to treat words with and without hashtags as different words.
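
Coverage and concordance can be computed along the following lines. The exact definitions behind the figures are not given in the text, so both functions rest on assumptions: coverage is taken as the fraction of a dataset's vocabulary that appears in a lexicon, and concordance as the fraction of common non-neutral words on whose sign two lexicons agree. The two small lexicons are toy values.

```python
def coverage(lexicon, vocabulary):
    """Fraction of the dataset vocabulary that is present in the lexicon."""
    vocabulary = set(vocabulary)
    return len(vocabulary & set(lexicon)) / len(vocabulary)


def concordance(lexicon_a, lexicon_b):
    """Fraction of common, non-neutral words with the same sentiment sign."""
    common = [w for w in set(lexicon_a) & set(lexicon_b)
              if lexicon_a[w] != 0 and lexicon_b[w] != 0]
    if not common:
        return None  # undefined when the lexicons share no scored words
    agree = sum(1 for w in common if (lexicon_a[w] > 0) == (lexicon_b[w] > 0))
    return agree / len(common)


dynamic = {"good": 0.6, "for": 0.3, "bad": -0.5}   # toy FBSA-style scores
static = {"good": 1, "bad": -1, "for": -1}         # toy polarity lexicon

print(coverage(static, {"good", "bad", "iphone", "siri"}))  # 0.5
print(round(concordance(dynamic, static), 3))               # 0.667
```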

Fig. 2

Concordance of lexicons over common words.

Table 16

The score of words with and without hashtags

Word	Score	Dataset
#siri/siri	–0.506/0.040	Sanders
#ios5/ios5	–0.287/–0.524	Sanders
#economy/economy	–1.000/–0.151	OMD
#tcot/tcot	–0.492/–1.000	HCR
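Treating the two forms as different words only requires a tokenizer that keeps the leading “#”. The sketch below illustrates this; the scoring formula (the normalized difference of positive and negative tweet counts, which lies in [–1, 1]) is an illustrative stand-in and not necessarily the exact FBSA score, and all names and example tweets are hypothetical.

```python
import re
from collections import Counter

# A token is an optional '#' followed by word characters, so "#siri" and
# "siri" stay distinct entries in the lexicon.
TOKEN = re.compile(r"#?\w+")


def tokenize(text):
    return TOKEN.findall(text.lower())


def build_lexicon(tweets):
    """Build a word -> score map from (text, label) pairs.

    Assumed score: (pos - neg) / (pos + neg), where pos and neg are the
    numbers of positive and negative tweets that contain the word.
    """
    pos, neg = Counter(), Counter()
    for text, label in tweets:
        counts = pos if label == "positive" else neg
        counts.update(set(tokenize(text)))  # count each word once per tweet
    return {
        word: (pos[word] - neg[word]) / (pos[word] + neg[word])
        for word in set(pos) | set(neg)
    }


if __name__ == "__main__":
    tweets = [
        ("love siri", "positive"),
        ("siri is great", "positive"),
        ("siri broke", "negative"),
        ("#siri fails", "negative"),
    ]
    lexicon = build_lexicon(tweets)
    print(lexicon["siri"], lexicon["#siri"])
```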

One particular problem is that the polarity of a word can depend on the context. For example, the word “mouse” may be negative in a hotel review, but objective in a computing context. Most domain-dependent works suffer from this problem. However, domain-specific lexicons can help. For example, a hotel manager can create a lexicon from hotel reviews in which the word “mouse” gets a negative score, and then use this lexicon for hotel reviews.

5 Conclusion and future work

In this paper, we performed sentiment analysis on Twitter. We considered the frequencies of words in positive and negative records and calculated sentiment scores for them. We then extracted five meta-level features based on the resulting lexicons.

We also incorporated other lexicon-based and corpus-based features to improve the accuracy. The results show that the lexicons we created, the existing lexicons, and bigrams complement each other. Our results outperform previous methods in f-measure on three of the six datasets.

The advantages of the proposed method are as follows: (i) a low runtime, as the lexicons are built and the features are calculated very quickly; (ii) high accuracy and f-measure, especially when used alongside other lexicon features; (iii) insight into the sentiment of individual words; and (iv) the ability to use different classifiers.

We want to explore low-power algorithms for sentiment analysis on social media, for mood detection and stress monitoring. Since the runtime of the algorithm is very low, it is suitable for integration into small devices that can benefit from sentiment analysis, for example of the text that users type on their smartphones. Smartphones are capable of collecting contextual information [74], and individuals use them to access their social media [75]. Another direction for future work is therefore to create a low-power algorithm (low power consumption is a requirement for these devices [76]) that can accurately extract lexicons and operate on users’ smartphones. This can have several applications, ranging from mood detection and stress monitoring [77] to quantified-self systems.

The fast runtime of the algorithm also makes it possible to take timing into account. Because of the dynamic nature of social media, the meanings of words may change over time, for instance during events that cause peaks in Twitter activity. The important role of timing in social media has been addressed in [79, 80] as well. The lexicons can be updated very quickly during such events [46].
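As an illustration of how a frequency-based lexicon could follow such changes, the sketch below keeps counts over a sliding window of the most recent labeled tweets. The windowing scheme, the class name `WindowedLexicon`, and the normalized-difference score are all assumptions made for this example; they are not part of the evaluated method.

```python
from collections import Counter, deque


class WindowedLexicon:
    """Word scores computed over the `window_size` most recent tweets."""

    def __init__(self, window_size):
        self.window_size = window_size
        self.window = deque()
        self.pos = Counter()
        self.neg = Counter()

    def _counts(self, label):
        return self.pos if label == "positive" else self.neg

    def add(self, tokens, label):
        """Add one labeled tweet and evict the oldest one if the window is full."""
        tokens = set(tokens)  # count each word once per tweet
        self.window.append((tokens, label))
        self._counts(label).update(tokens)
        if len(self.window) > self.window_size:
            old_tokens, old_label = self.window.popleft()
            self._counts(old_label).subtract(old_tokens)

    def score(self, word):
        """Assumed score in [-1, 1]; 0.0 for words not seen in the window."""
        p, n = self.pos[word], self.neg[word]
        return (p - n) / (p + n) if p + n else 0.0


if __name__ == "__main__":
    lexicon = WindowedLexicon(window_size=2)
    lexicon.add(["debate", "good"], "positive")
    lexicon.add(["debate", "bad"], "negative")
    lexicon.add(["debate", "awful"], "negative")
    print(lexicon.score("debate"))
```

Each update touches only the words of one incoming and one outgoing tweet, so the cost per tweet does not grow with the size of the window.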

Footnotes

The bold value represents the best value.

Acknowledgments

This work was supported by Iran National Science Foundation (INSF) under grant number 93036378.

References

1. Liu, Sentiment analysis and opinion mining, Morgan & Claypool Publishers, 2012.

2. Pak and Paroubek, Twitter as a corpus for sentiment analysis and opinion mining, LREC 10 (2010), 1320–1326.

3. Agarwal, Xie, Vovsha, Rambow and Passonneau, Sentiment analysis of twitter data, in Proceedings of the Workshop on Languages in Social Media, LSM’11, Association for Computational Linguistics, Stroudsburg, PA, USA, 2011, pp. 30–38.

4. Wang, Wei, Liu, Zhou and Zhang, Topic sentiment analysis in twitter: A graph-based hashtag sentiment classification approach, in Proceedings of the 20th ACM International Conference on Information and Knowledge Management, ACM, 2011, pp. 1031–1040.

5. Maynard, Bontcheva and Rout, Challenges in developing opinion mining tools for social media, Proceedings of the @NLP can u tag #usergeneratedcontent, 2012, pp. 15–22.

6. Momeni, Cardie and Diakopoulos, A survey on assessment and ranking methodologies for user-generated content on the web, ACM Computing Surveys (CSUR) 48(3) (2015), 41.

7. Kouloumpis, Wilson and Moore, Twitter sentiment analysis: The good the bad and the omg!, ICWSM 11 (2011), 538–541.

8. Turney P.D., Thumbs up or thumbs down? Semantic orientation applied to unsupervised classification of reviews, Proceedings of the 40th Annual Meeting on Association for Computational Linguistics, ACL’02, Association for Computational Linguistics, Stroudsburg, PA, USA, 2002, pp. 417–424.

9. Pang, Lee and Vaithyanathan, Thumbs up? Sentiment classification using machine learning techniques, Proceedings of EMNLP, 2002, pp. 79–86.

10. […], Macdonald, He and Ounis, An effective statistical approach to blog post opinion retrieval, Proceedings of the 17th ACM Conference on Information and Knowledge Management, CIKM’08, 2008, pp. 1063–1072.

11. Melville, Gryc and Lawrence R.D., Sentiment analysis of blogs by combining lexical knowledge with text classification, Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD’09, ACM, New York, NY, USA, 2009, pp. 1275–1284.

12. Balahur, Steinberger, Kabadjov, Zavarella, van der Goot, Halkia, Pouliquen and Belyaeva, Sentiment analysis in the news, in Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC’10), 2010.

13. da Silva N.F.F., Hruschka E.R. and Hruschka E.R., Tweet sentiment analysis with classifier ensembles, Decision Support Systems 66 (2014), 170–179.

14. Jiang, Yu, Zhou, Liu and Zhao, Target-dependent twitter sentiment classification, in Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies-Volume 1, 2011, pp. 151–160.

15. Barbosa and Feng, Robust sentiment detection on twitter from biased and noisy data, in Proceedings of the 23rd International Conference on Computational Linguistics: Posters, 2010, pp. 36–44.

16. Zhao, Liu and Wang, Adding redundant features for CRFs-based sentence sentiment classification, in Proceedings of the Conference on Empirical Methods in Natural Language Processing, 2008, pp. 117–126.

17. Shen, Li, Zheng, Ren and Cheng, Emotion mining research on micro-blog, in Web Society, 2009, 1st IEEE Symposium on SWS’09, 2009, pp. 71–75.

18. Cambria and White, Jumping NLP curves: A review of natural language processing research, IEEE Comput Intell Mag 9(2) (2014), 48–57.

19. Chaffar and Inkpen, Using a heterogeneous dataset for emotion analysis in text, Advances in Artificial Intelligence (2011), 62–67.

20. Godbole, Srinivasaiah and Skiena, Large-scale sentiment analysis for news and blogs, ICWSM 7(21) (2007), 219–222.

21. Nielsen, A new ANEW: Evaluation of a word list for sentiment analysis in microblogs, in Proceedings of the ESWC2011 Workshop on ‘Making Sense of Microposts’: Big Things Come in Small Packages, 2011.

22. Zimbra, Ghiassi and Lee, Brand-Related Twitter Sentiment Analysis Using Feature Engineering and the Dynamic Architecture for Artificial Neural Networks, in 2016 49th Hawaii International Conference on System Sciences (HICSS), 2016, pp. 1930–1938.

23. Saif, He, Fernandez and Alani, Contextual semantics for sentiment analysis of Twitter, Information Processing & Management 52(1) (2016), 5–19.

24. […], Castellanos, Dayal and Zhai, Automatic construction of a context-aware sentiment lexicon: An optimization approach, in Proceedings of the 20th International Conference on World Wide Web, 2011, pp. 347–356.

25. Platt J.C., 12 fast training of support vector machines using sequential minimal optimization, Advances in Kernel Methods (1999), 185–208.

26. Bravo-Marquez, Mendoza and Poblete, Meta-level sentiment models for big social data analysis, Knowledge-Based Systems 69 (2014), 86–99.

27. Carvalho, Sarmento, Silva M.J. and de Oliveira, Clues for detecting irony in user-generated contents: Oh...!! it’s so easy;-), in Proceedings of the 1st International CIKM Workshop on Topic-sentiment Analysis for Mass Opinion, 2009.

28. […], Bhayani and Huang, Twitter sentiment classification using distant supervision, Technical report, Stanford University, 2010.

29. Liu, Li and Guo, Emoticon smoothed language models for Twitter sentiment analysis, in Proceedings of the 26th AAAI Conference on Artificial Intelligence, 2012.

30. Gonçalves, Araújo, Benevenuto and Cha, Comparing and combining sentiment analysis methods, in Proceedings of the First ACM Conference on Online Social Networks, COSN ’13, ACM, New York, NY, USA, 2013, pp. 27–38.

31. Zhang, Ghosh, Dekhil, Hsu and Liu, Combining lexicon-based and learning based methods for twitter sentiment analysis, HP Laboratories, Technical Report HPL-2011, 89, 2011.

32. Mohammad, Kiritchenko and Zhu, NRC-Canada: Building the state-of-the-art in sentiment analysis of tweets, in Proceedings of the Seventh International Workshop on Semantic Evaluation Exercises (SemEval-2013), Atlanta, Georgia, USA, 2013.

33. […], Tang and Liu, Exploiting social relations for sentiment analysis in microblogging, in Proceedings of the Sixth ACM International Conference on Web Search and Data Mining, 2013, pp. 537–546.

34. Saif, He and Alani, Semantic sentiment analysis of twitter, in Proceedings of the 11th International Conference on The Semantic Web - Volume Part I, ISWC’12, Springer-Verlag, Berlin, Heidelberg, 2012, pp. 508–524.

35. Bravo-Marquez, Mendoza and Poblete, Combining strengths, emotions and polarities for boosting twitter sentiment analysis, in Proceedings of the Second International Workshop on Issues of Sentiment Discovery and Opinion Mining, ACM, 2013.

36. Kaewpitakkun, Shirai and Mohd, Sentiment Lexicon Interpolation and Polarity Estimation of Objective and Out-Of-Vocabulary Words to Improve Sentiment Classification on Microblogging, in Proceedings of 28th Pacific Asia Conference on Language, Information and Computation, 2014, pp. 204–213.

37. Saif, He, Fernandez and Alani, Adapting Sentiment Lexicons Using Contextual Semantics for Sentiment Analysis of Twitter, The Semantic Web: ESWC 2014 Satellite Events, Springer International Publishing, 2014, pp. 54–63.

38. Coletta L.F.S., da Silva N.F.F., Hruschka E.R. and Hruschka E.R. Jr, Combining Classification and Clustering for Tweet Sentiment Analysis, in Proceedings of 2014 Brazilian Conference on Intelligent Systems (BRACIS), 2014.

39. Speriosu, Sudan, Upadhyay and Baldridge, Twitter polarity classification with label propagation over lexical links and the follower graph, Proceedings of the First Workshop on Unsupervised Learning in NLP, EMNLP’11, Association for Computational Linguistics, 2011, pp. 53–63.

40. Carvalho, Prado and Plastino, A Statistical and Evolutionary Approach to Sentiment Analysis, Web Intelligence (WI) and Intelligent Agent Technologies (IAT), 2014 IEEE/WIC/ACM International Joint Conferences on, Vol. 2, IEEE, 2014.

41. […] T.J., Semi-supervised microblog sentiment analysis using social relation and text similarity, Big Data and Smart Computing (BigComp), 2015 International Conference on, 2015.

42. Saif, He, Fernandez and Alani, Semantic patterns for sentiment analysis of Twitter, in The Semantic Web–ISWC, Springer International Publishing, 2014, pp. 324–340.

43. Baecchi, Uricchio, Bertini and Del Bimbo, A multimodal feature learning approach for sentiment analysis of social network multimedia, Multimedia Tools and Applications (2015), 1–19.

44. […], Tang, Gao and Liu, Unsupervised sentiment analysis with emotional signals, in Proceedings of the 22nd International Conference on World Wide Web, 2013, pp. 607–618.

45. Saif, Fernandez, He and Alani, Senticircles for contextual and conceptual semantic sentiment analysis of twitter, in The Semantic Web: Trends and Challenges, 2014, pp. 83–98.

46. Keshavarz and Abadeh M.S., ALGA: Adaptive lexicon learning using genetic algorithm for sentiment analysis of microblogs, Knowledge-Based Systems 122 (2017), 1–16.

47. Pandey A.C., Rajpoot D.S. and Saraswat, Twitter sentiment analysis using hybrid cuckoo search method, Information Processing & Management 53(4) (2017), 764–779.

48. Rout J.K., Choo K.R., Dash A.K., Bakshi, Jena S.K. and Williams K.L., A model for sentiment and emotion analysis of unstructured social media text, Electronic Commerce Research (2017), 1–19.

49. Segura-Bedmar, Quirós and Martinez, Exploring Convolutional Neural Networks for Sentiment Analysis of Spanish tweets.

50. Kosala and Blockeel, Web mining research: A survey, ACM SIGKDD Explorations Newsletter 2(1) (2000), 1–15.

51. Mobasher, Dai, Luo, Sun and Zhu, Integrating web usage and content mining for more effective personalization, in International Conference on Electronic Commerce and Web Technologies, Springer Berlin Heidelberg, pp. 165–176.

52. Kao H.Y., Lin S.H., Ho J.M. and Chen M.S., Mining web informative structures and contents based on entropy analysis, IEEE Transactions on Knowledge and Data Engineering 16(1) (2004), 41–55.

53. […], Bao, Yu, Fei and Su, Towards effective browsing of large scale social annotations, in Proceedings of the 16th International Conference on World Wide Web, ACM, 2007, pp. 943–952.

54. Silva M.J., Martins, Chaves, Afonso A.P. and Cardoso, Adding geographic scopes to web resources, Computers, Environment and Urban Systems 30(4) (2006), 378–399.

55. Dragoni, Tettamanzi A.G. and da Costa Pereira, A fuzzy system for concept-level sentiment analysis, in Semantic Web Evaluation Challenge (2014), 21–27.

56. Lau R.Y.K., Li and Liao S.S.Y., Social analytics: Learning fuzzy product ontologies for aspect-oriented sentiment analysis, Decision Support Systems 65 (2014), 80–94.

57. Indhuja and Reghu R.P.C., Fuzzy logic based sentiment analysis of product review documents, Computational Systems and Communications (ICCSC), 2014 First International Conference on, IEEE, 2014.

58. Jusoh and Alfawareh H.M., Applying fuzzy sets for opinion mining, Computer Applications Technology (ICCAT), 2013 International Conference on, IEEE, 2013.

59. Rustamov, Mustafayev and Clements M.A., Sentiment analysis using Neuro-Fuzzy and Hidden Markov models of text, Southeastcon, 2013 Proceedings of IEEE, IEEE, 2013.

60. Andreevskaia and Bergler, Mining WordNet for a fuzzy sentiment: Sentiment tag extraction from WordNet glosses, EACL 6 (2006).

61. Trung D.N. and Jung J.J., Sentiment analysis based on fuzzy propagation in online social networks: A case study on TweetScope, Computer Science and Information Systems 11(1) (2014), 215–228.

62. […] and Wang, Chinese sentence-level sentiment classification based on fuzzy sets, Proceedings of the 23rd International Conference on Computational Linguistics, 2010.

63. Liu, Nie and Wang, Toward a fuzzy domain sentiment ontology tree for sentiment analysis, Image and Signal Processing (CISP), 2012 5th International Congress on, IEEE, 2012.

64. O’Connor, Balasubramanyan, Routledge B.R. and Smith N.A., From tweets to polls: Linking text sentiment to public opinion time series, Proceedings of the International AAAI Conference on Weblogs and Social Media, 2010.

65. Bradley M.M. and Lang P.J., Affective Norms for English Words (ANEW) Instruction Manual and Affective Ratings, Technical Report C-1, The Center for Research in Psychophysiology, University of Florida, 2009.

66. Mohammad S.M. and Turney P.D., Crowdsourcing a word–emotion association lexicon, Computational Intelligence 29(3) (2013), 436–465.

67. Wilson, Wiebe and Hoffmann, Recognizing contextual polarity in phrase-level sentiment analysis, in Proceedings of Human Language Technologies Conference/Conference on Empirical Methods in Natural Language Processing (HLT/EMNLP 2005), Vancouver, British Columbia, Canada, 2005, pp. 347–354.

68. Esuli and Sebastiani, SentiWordNet: A publicly available lexical resource for opinion mining, in Proceedings of the 5th Conference on Language Resources and Evaluation, 2006, pp. 417–422.

69. Baccianella, Esuli and Sebastiani, SentiWordNet 3.0: An enhanced lexical resource for sentiment analysis and opinion mining, in Proceedings of the Seventh International Conference on Language Resources and Evaluation, Valletta, Malta, 2010, pp. 2200–2204.

70. Taboada, Brooke, Tofiloski, Voll and Stede, Lexicon-based methods for sentiment analysis, Computational Linguistics 37(2) (2011), 267–307.

71. Saif, Fernandez, He and Alani, Evaluation datasets for twitter sentiment analysis: A survey and a new dataset, the STS-Gold, ESSEM@AI*IA, Vol 1096 of CEUR Workshop Proceedings, CEUR-WS.org, 2013, pp. 9–21.

72. Shamma D.A., Kennedy and Churchill E.F., Tweet the debates: Understanding community annotation of uncollected sources, in WSM’09: Proceedings of the International Workshop on Workshop on Social, 2009.

73. Rawassizadeh, Momeni, Dobbins, Gharibshah and Pazzani, Scalable daily human behavioral pattern mining from multivariate temporal data, IEEE Transactions on Knowledge and Data Engineering 28(11) (2016), 3098–3112.

74. Rawassizadeh, Momeni, Dobbins, Mirza-Babaei and Rahnamoun, Lesson learned from collecting quantified self information via mobile and wearable devices, Journal of Sensor and Actuator Networks 4(4) (2015), 315–335.

75. Rawassizadeh, Tomitsch, Nourizadeh, Momeni, Peery, Ulanova and Pazzani, Energy-efficient integration of continuous context sensing and prediction into smartwatches, Sensors 15(9) (2015), 22616–22645.

76. Momeni, Kalchgruber, Ramsauer and Rawassizadeh, Leveraging Social Affect for Identifying Individual Mood, Proceedings of SEMANTiCS, Vienna, Austria, 2015.

77. Saif, Fernandez, He and Alani, On stopwords, filtering and data sparsity for sentiment analysis of Twitter, in Proceedings of the 9th Language Resources and Evaluation Conference (LREC), 2014, pp. 810–817.

78. De Maio, Fenza, Loia and Parente, Time aware knowledge extraction for microblog summarization on twitter, Information Fusion 28 (2016), 60–74.

79. Cuzzocrea, De Maio, Fenza, Loia and Parente, Towards OLAP analysis of multidimensional tweet streams, in Proceedings of the ACM Eighteenth International Workshop on Data Warehousing and OLAP, ACM, 2015, pp. 69–73.

80. Stilo and Velardi, Time makes sense: Event discovery in twitter using temporal similarity, Web Intelligence (WI) and Intelligent Agent Technologies (IAT), 2014 IEEE/WIC/ACM International Joint Conferences on, Vol. 2, IEEE, 2014.


Table 4

The negative, positive and total tweets in each dataset

Class	Sanders	SOMD	HCR	OMD	STS	STS-Gold
Negative	654	569	917 (406+511)	1196	177	1402
Positive	570	347	369 (215+154)	710	182	632
Total	1224	916	1286 (621+665)	1906	359	2034

Table 5

Runtime of feature generation in datasets (in seconds)

Dataset	Sanders	SOMD	HCR	OMD	STS	STS-Gold
Time	0.246	0.226	0.255	0.276	0.069	0.645