Sentiment analysis using lexico-semantic features

Abstract

Sentiment analysis deals with mining people’s opinions from their written communication. With the increasing use of online social media platforms for user interaction, abundant opinionated textual data has emerged, which has led to increased mining of opinions and sentiments and hence to greater interest in sentiment analysis. This article introduces novel Lexico-Semantic features and their use in the sentiment polarity task for English-language text. These features are derived through the semantic extension of lexicons, employing sentiment lexicons and semantic models. They make the data sample size consistent when used in deep learning settings, thereby eliminating zero padding. For evaluation, we use different semantic models and lexicons to determine the role and impact of Lexico-Semantic features on classification performance. These features, along with other features, are used to train different classifiers. Our experimental evaluation shows that introducing Lexico-Semantic features into various state-of-the-art machine learning and deep learning methods improves the overall performance of the classifiers.

Keywords

Lexico-semantic features; sentiment analysis; sentiment lexicon; supervised learning

1. Introduction

Sentiment analysis, or opinion analysis, is a research field in cognitive computing concerned with effectively mining and leveraging individuals’ opinions or sentiments towards issues, such as product use, organisation topics or their attributes [1]. Sentiment analysis provides insights into how people perceive specific events, products or services.

Sentiment analysis draws on machine learning, natural language processing (NLP), data mining and computational linguistics, and it has deep connections with sociology and psychology. Although it emerged in the early 1950s, it has gained considerable interest only since 2005, with the widespread usage of social media [2–4]. As the volume of opinionated data has grown with the inception and growth of online social media networks, sentiment analysis has become more relevant and desirable.

Sentiment analysis is leveraged across various domains ranging from commercial use to social–political forums. On the commercial end, both customers and sellers use it. Customers check user reviews before buying anything. Online reviews help e-commerce sites obtain valuable feedback on their products and improve their products and services. On the political end, political parties leverage public sentiment about their policies. Online social media users typically share their political opinions with their network, and these conversations are mined for public sentiment towards political parties. Moreover, sociopolitical events are monitored using efficient sentiment analysis systems to maintain public security [5].

Typically, sentiment analysis engines receive a textual data stream as input, look for different patterns and then classify the text into positive or negative polarity based on the identified patterns. Automatic sentiment analysers use computer programs to classify the text. The methods that the automatic analysers use are categorised into three groups, namely supervised, unsupervised and semi-supervised. Supervised learning algorithms learn from a sufficient number of training examples annotated with sentiment labels, producing trained models that are used to classify unseen data into different classes. These supervised algorithms produce excellent results. However, they require abundant training data that are difficult to create. Unsupervised algorithms employ sentiment lexicons that are then used to bootstrap the classification task. Usually, techniques such as clustering and bagging are used by these classes of algorithms.

Both these approaches to sentiment analysis rely on the presence of affective words in the text. However, they lack a critical component of any textual form of communication, that is, the semantics of the text. Some advances have been made in using semantics for sentiment analysis [6–11]. The approach used by previous works [7–10] relies on the Framester API¹ and BabelNet synsets to obtain semantic frames and then replaces the data set with these frames, thus obtaining semantic extensions of the sentences in the data set. Saif et al. [11] use AlchemyAPI and Zemanta API to obtain semantic frames for the text. The main drawback of both these approaches is that they rely on an external API for obtaining semantic frames, which are then used to replace the original data set. This extension is computationally expensive. Also, this approach may fail in cases where an empty semantic frame is obtained for data in the data set.

We propose to use semantics through sentiment lexicons and distributional semantic models. We incorporate sentiment lexicons and semantic models to obtain new features during feature engineering. We use these expanded features to train sentiment classifiers. Our system uses the data set in its original form instead of relying on external APIs and hence, overcomes the drawbacks of previous studies that change the data set using semantic frames.

Sentiment lexicons are a rich source of polarity words. A lexicon contains lists of positive and negative words that describe polarity. However, the scope of these lexicons is limited and domain-dependent; for example, the National Research Council Canada (NRC) hashtag lexicon is mainly useful for social media posts [12,13]. Thus, to improve lexicon coverage and make it domain-independent, we propose a semantic extension of the lexicon using distributional semantic models. These models work on the distributional hypothesis, which states that words used in the same contexts have similar meanings; the models can therefore infer and correlate words based on their usage.

In classification tasks, if lexicons are used in their original form, they contribute as lexical features. We propose using semantic extensions of the lexicons to obtain new features, which we call Lexico-Semantic features, that aid in sentiment classification tasks. We also use these features to make the data uniform in length when used with deep learning systems. In deep learning settings, data need to be of uniform length, and usually zero padding is employed. Because zero padding leads to bias, we augment the data with semantically rich sentiment words instead of adding zeros. The main motive is to study how Lexico-Semantic features enhance accuracy in sentiment classification.

Our proposed approach consists of three main steps: (1) extension of the lexicon, (2) capturing Lexico-Semantic features and (3) classification into binary sentiment polarity using Lexico-Semantic features. Our usage of Lexico-Semantic features allows us to achieve cross-data set generalisation [14]. To validate our approach’s applicability in different domains, we evaluated it on eight different state-of-the-art sentiment data sets. The data sets include reviews, social media tweets and comments. We also use different classifiers and lexicons to investigate the performance of Lexico-Semantic features. We compare our method with other semantic approaches to sentiment analysis and find that our approach outperforms them.

In this article, we also investigate whether the usage of different sentiment lexicons has any effect on the generation of Lexico-Semantic features. In particular, we try to answer the following research questions:

How beneficial are the Lexico-Semantic features for boosting the performance of the classification task?

Is there any effect on the performance of the classifier if we use different lexicons?

The rest of the article is organised into the following sections: section ‘Related works’ discusses the methods, lexicons and data sets used in sentiment analysis, section ‘Methodology’ discusses the methodology employed, section ‘Evaluation’ discusses the experiments and the results obtained, section ‘Discussion’ includes the discussion, and section ‘Conclusions and future work’ describes conclusions and future work.

2. Related works

In this section, we review the various aspects of sentiment analysis and highlight the drawbacks and benefits of each area to draw conclusions and determine the outstanding issues. On the one hand, we present a review of methods and data sets, and on the other hand, we analyse the lexicons and semantic approaches employed in sentiment analysis.

2.1. Supervised learning

Supervised learning aims to infer unseen data from a sufficient number of training examples that are annotated by human annotators. Several widely used supervised algorithms are as follows.

2.1.1. Decision tree algorithms

In the sentiment classification tasks, decision trees are commonly employed. A decision tree is a tree with branches that leads to many classification paths. For sentiment classification of English documents, Phu et al. [15] use the Iterative Dichotomiser 3 (ID3) algorithm. They use training examples to build an ID3 tree that classifies text into positive, negative and neutral categories.

2.1.2. Rule-based algorithms

For the sentiment polarity detection task, this method employs if-then-else rules together with a lexicon. The text is classified using rule-sets that are tied to the words of the lexicon. Asghar et al. [16] developed a lexicon based on public product feedback. The lexicon is divided into positive and negative labels to identify and classify sentiments in text using the rule-based method. This method is simple to use but has limited scope and results in classification bias. Lexicon expansion can improve this strategy.

2.1.3. Support vector machine

In the literature, support vector machines (SVMs) are often used for classification tasks, with different features extracted for training the classifiers. Go et al. [17] employed n-grams and part-of-speech (POS) tags to train a binary SVM for detecting the sentiment polarity of tweets. Gu et al. [18] employ v-parameters to generalise the support vectors. Balabantaray et al. [19] used a multi-class SVM for emotion mining of Twitter data. Huq et al. [20] use an SVM to classify a Twitter data set into positive and negative polarities. Liu et al. [21] employ the one-vs-one method with a multi-class SVM, with features identified by information gain.

2.2. Unsupervised algorithms

Unsupervised learning identifies patterns from examples without labels, in contrast to supervised learning, which relies on annotated samples. Some of the most popular unsupervised methods used for sentiment analysis are described in the following.

2.2.1. Clustering

Clustering is the method for arranging data into clusters or groups wherein elements in the same group are highly similar and elements in two different groups are highly distinct. The lexicon is composed of two sets of positive and negative words employed as cluster centroids, and sentences are mapped for polarities based on similarities to the centroids. Suresh and Raj [22] employed a fuzzy clustering for sentiment classification of the tweets [23]. Phu et al. [24] used parallel network, and also the Hadoop platform with fuzzy c-means for sentiment identification in big data. Hassan et al. [25] employed density-based clustering to detect outliers and produced an effective sentiment detection model. Riaz et al. [26] used the k-means clustering at the phrase level and mapped sentences to different clusters based on the coherence of centroid and the sentences [27].

Wu et al. [28] created an opinion-flow system for visual analysis by Bayesian rose tree and stacked topic tree, which is used for training opinion flow among Twitter users. Vaziripour et al. [29] employed cluster merging to detect the change in user sentiments over time. Data are organised in a tree-like cluster structure called a dendrogram in hierarchical clustering [30]. The dendrograms are useful for determining the appropriate level of clustering for a particular process.

2.3. Artificial neural networks and deep learning

Artificial neural networks (ANNs) are computational models inspired by the way the human central nervous system works. Yessenalina and Cardie [31] utilised iterative matrix multiplication to represent all the words of the document as matrices. Tang et al. [32] employed neural networks in a bottom-up strategy to learn vector-based document representation. They used a gated recurrent neural network (RNN) on the IMDB and Yelp data sets for document-level sentiment classification.

Deep learning, which falls within the machine learning paradigm, is based on learning data representations rather than task-specific algorithms. Bespalov et al. [33] employed latent semantic analysis (LSA) to initialise word embeddings and represented the documents as weighted n-grams for document-level sentiment analysis. Glorot et al. [34] employed auto-encoders for sentiment analysis. Hermann and Blunsom [35] used recursive auto-encoders and combinatory categorial autoencoders for sentiment classification. Bengio [36] employed continuous representations of words as a feature for sentiment detection. Tang et al. [37] acquired Sentiment-Specific Word Embeddings (SSWE) by training their system on 10 million tweets containing both positive and negative words. They proposed the ‘COOOOLL’ sentiment analysis system, which combined the SSWE and handcrafted features. Ombabi et al. [38] use a one-layer convolutional neural network (CNN) architecture and a two-layered long short-term memory (LSTM). The features generated by the CNN and LSTM are fed to an SVM classifier for classification. Maqsood et al. [39] use stock exchange-related tweets to determine event sentiments by employing deep learning, linear regression and support vector regression. Yang et al. [40] proposed a sentiment analysis model based on a sentiment lexicon, a CNN and a bidirectional gated recurrent unit.

2.4. Ensemble learning

Ensemble learning combines multiple learning algorithms to produce better results. The existing machine learning techniques are integrated with a deep layering strategy for effective classification and thus reducing error rates. Araque et al. [41] created a sentiment analysis system by the word embedding model and linear machine learning model. They also employed models to integrate the deep- and surface-level features. Akhtar et al. [42] employed an ensemble classifier using cascaded features for aspect-based sentiment classification.

2.5. Sentiment lexicon

Lexicons are the principal part of the unsupervised learning techniques used for sentiment analysis. Furthermore, lexicon-based features are used for sentiment detection in both supervised and unsupervised techniques. Thus, the lexicon is a central component of an efficient sentiment detection process. Sentiment lexicons are collections of polarity words annotated according to their polarity orientations; for example, the AFINN lexicon uses integers between −5 (negative) and +5 (positive) to describe the polarity of words.

Lexicons are created using three approaches, namely the manual approach, the dictionary-based approach and the corpus-based approach. In the manual approach, all the entries in the lexicon are added manually by the compiler of the lexicon. In the dictionary-based approach, the compiler of the lexicon starts with a small set of words that are known to have positive and negative polarities. Then these words are searched for synonyms in WordNet [43] or other online dictionaries. The process continues until no new words can be added to the lexicon [44,45]. In the literature, we find the following commonly used sentiment lexicons.

General Inquirer [46]: General Inquirer is the oldest sentiment lexicon available. The Harvard IV-4 and Lasswell’s dictionaries are used to derive different category words. As far as valence is concerned, it contains 1915 positive and 2291 negative words.

SentiWordNet [47]: SentiWordNet is a lexical resource built using the synsets of WordNet. There are two versions of SentiWordNet available, 1.0 [47] and 3.0 [48], built using WordNet versions 2.0 and 3.0, respectively. There are three annotations to the synsets: positive, negative and neutral.

NTUSD [49]: National Taiwan University Sentiment Dictionary has 8726 negative words and 2812 positive words. It is constructed automatically by translating General Inquirer and Chinese network sentiment dictionary.

Bing Liu’s opinion lexicon [50]: This lexicon is widely used and created using manual annotations. There are 2006 positive and 4783 negative words in the lexicon.

Sentiment140 [51]: The Sentiment140 lexicon is a list of words and their associations with positive and negative sentiment. The lexicon has both unigrams and bigrams with positive and negative scores for each term.

WordNet Affect [45]: WordNet Affect is a lexical extension of affective words created using WordNet [43] synonyms. It is a subset of synsets of affective words built over WordNet and is organised as a collection of synsets with affective labels (a-labels).

Emolex [52]: Emolex, also known as the NRC word-emotion lexicon, consists of 14,000 English words annotated with Plutchik’s eight basic emotions and two sentiments: positive and negative. The annotations for each lexical entry were made using Amazon Mechanical Turk (AMT).

AFINN [53]: AFINN has 2477 words and phrases. There are 1610 negative words and 867 positive words.

MPQA [54]: Multi-Perspective Question Answering is a subjectivity lexicon developed automatically from several sources. There are 4914 negative words, 2721 positive words and 571 neutral words in the MPQA lexicon.

LIWC [55]: The LIWC2015 is the master lexicon composed of LIWC2001, LIWC2007 and LIWC 2015. It has 6400 words, word stems and emoticons. The lexicon is arranged in a way that, for each word, there exists one or more categories corresponding to that word.

Table 1 summarises the details of these lexicons.

Table 1.

Sentiment lexicons.

Lexicon	Size	Positive	Negative	Others	Type
General Inquirer	4206	1915	2291	None	Manual
NTUSD	11,538	2812	8726	None	Manual
Bing Liu’s opinion	6789	2006	4783	None	Manual
Emolex	14,000	1915	2291	9794	AMT
AFINN	2477	867	1610	None	Manual
MPQA	8206	2721	4914	571	Manual

NTUSD: National Taiwan University Sentiment Dictionary; AMT: Amazon Mechanical Turk; MPQA: Multi-Perspective Question Answering.

2.6. Sentiment data sets

Sentiment data sets are key for evaluating the performance of the system. Several studies annotated rich sentiment corpus by employing annotators or AMT. Table 2 summarises the commonly used data sets for sentiment polarity detection task. Table 2 also defines the nomenclature that is used for naming data sets in the rest of the article.

Table 2.

Statistics of sentiment data sets.

Data set	Size	Positive	Negative	Annotation
BBC comments [56]	1000	99	653	Manual
DIGG comments [56]	1077	210	572	Manual
New York times [57]	5190	2204	2742	AMT
Movie reviews [57]	106,605	5242	5326	AMT
Amazon product reviews [57]	3708	2128	1482	AMT
Twitter debates [58]	3238	730	1249	AMT
Random tweets [59]	500	139	119	Manual
Specific domain tweets [17]	359	182	177	Manual
SemEval-2013 tweets	6087	2223	837	Manual
TED talks [60]	3407	1665	767	Manual
IMDB movie review [61]	50,000	25,000	25,000	Manual
UMICH SI650	7068	3990	3050	Manual
SemEval-2015 Task 11	7230	690	6540	Manual

BBC: British Broadcasting Corporation; AMT: Amazon Mechanical Turk; TED: technology, entertainment and design; IMDB: Internet Movie Database; UMICH: University of Michigan.

2.7. Semantic approach

The sentiment conveyed through text is not limited to individual words; it also depends on the relations and dependencies between words. These relations and inter-dependencies are directly related to the semantics of the text. Hence, semantics should be leveraged for extracting meaning and related semantic features. Saif et al. [11] introduced semantic features to sentiment analysis. They extracted entities from the data set and added the semantic concepts of these entities to the training set. Using this approach, they recorded an increase of around 6.5% over the unigram baseline and 4.8% over the POS baseline. Furthermore, Saif et al. [62] designed a system called ‘SentiCircle’ that involves a lexicon-based approach together with conceptual and contextual features. An extension of SentiCircle, Saif et al. [63], uses fixed and static prior sentiment polarities of lexicon words regardless of the context of words. Their system took into account the co-occurrence patterns of words used in different contexts to capture semantics. Maas et al. [61] captured semantics using word vectors through a probabilistic model of documents. Dridi et al. [7] used BabelNet to extract semantic features for sentiment classification of social media posts.

The supervised algorithms are data-dependent, that is, they learn through the annotated training examples, and hence these approaches are domain-dependent. To overcome this problem, unsupervised algorithms are centred around lexicons. Lexicons play an important role to capture the domain-dependent aspects of sentiment analysis. Although these algorithms overcome the data domain-dependency shortcomings of supervised algorithms, they are not competitive in comparison to the supervised approach in terms of accuracy. Moreover, several systems are using semantics as a feature, but none used it as a Lexico-Semantic feature; instead, they created extensions of the data set by including semantic features in the data sets.

In this article, we propose the Lexico-Semantic feature in the supervised setting. We capture semantics using distributional hypothesis and then use these features in the supervised learning for sentiment classification. We compare our system with the baseline features and report the effects introduced by Lexico-Semantic features. We also compare our system with the other semantic approaches to show the competitive efficacy of our algorithm. Figure 1 shows the steps followed during literature selection.

Figure 1.

Research flow diagram.

3. Methodology

This section describes our model for sentiment analysis using a supervised learning approach based on lexicon supervision. We extend the lexicon with the distributional semantic model and then extract Lexico-Semantic features from the text along with other features. We classify the sentences based on these extracted features. The architecture of our model is shown in Figure 2. Precisely, our approach consists of (1) preprocessing for removing irregularities and normalisation of the data set; (2) lexicon expansion using distributional semantic models; (3) feature engineering; (4) vectorisation of all the extracted features and (5) classification.

Figure 2.

Overall working of the system.

3.1. Preprocessing

Preprocessing is the first step in our sentiment classification process. It cleans the data for uniformity and removes noise and inconsistencies. Preprocessing has the following steps:

Tokenisation: Each sentence in the data set is tokenised into words. We use the Stanford CoreNLP package [64] for this process.

Removing punctuation: Punctuation marks do not add any meaning during the classification task and are hence removed. However, text enclosed in double quotes is not removed, as it indicates shouting or raising the voice and also reflects the person’s opinion.

Removing stopwords: Stopwords are prepositions, articles and similar function words, such as a, and, the, is. They are not significant for the sentiment classification task. Moreover, these words do not emphasise any sentiment and are thus removed to reduce noise in the main text.

Lemmatisation: Lemmatisation is an important part of NLP. It reduces words to their base forms (lemmas), so that all inflected forms of the same word are brought to a common form, thus allowing uniformity in the text. We use the Stanford CoreNLP package [64] for this process.
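The preprocessing steps above can be sketched as a small pipeline. This is only an illustration: the stopword set, the lemma table and the function name `preprocess` are assumptions made for the example and stand in for the Stanford CoreNLP components used in our system. The special handling of text in double quotes is not reproduced here.

```python
import re

# Illustrative stopword list and lemma table; both are assumptions for this
# sketch and far smaller than what a real NLP toolkit provides.
STOPWORDS = {"a", "an", "and", "the", "is", "of", "to", "in"}
TOY_LEMMAS = {"loved": "love", "movies": "movie", "was": "be"}

# Alphabetic tokens, optionally with an internal apostrophe (e.g. "won't").
TOKEN_PATTERN = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)?")


def preprocess(sentence):
    """Tokenise, drop punctuation and stopwords, then map tokens to lemmas."""
    # Tokenisation: the pattern keeps alphabetic tokens only, so punctuation
    # is discarded in the same pass.
    tokens = TOKEN_PATTERN.findall(sentence.lower())
    # Stopword removal.
    tokens = [t for t in tokens if t not in STOPWORDS]
    # Lemmatisation through the toy lookup table (unknown words pass through).
    return [TOY_LEMMAS.get(t, t) for t in tokens]


print(preprocess("The movies, and the cast, loved!"))
```

In practice the toy tables would be replaced by a full tokeniser, stopword list and lemmatiser; the order of the steps is the part that mirrors the list above.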

3.2. Lexicon expansion

We use the distributional semantic models (Word2Vec, Glove and FastText) to expand the lexicon since they are generic and do not require any lexical and linguistic training. The expansion incorporates semantics in the lexicon to make it semantically rich, thus enabling us to extract Lexico-Semantic features under lexicon supervision. The distributional semantic models capture semantics by creating a semantic information base. Moreover, these models are independent of external sources [65].

The distributional semantic models employ the distributional hypothesis, which states that words used in the same contexts have similar meanings; the models can therefore infer and correlate words based on their usage. The distributional models use statistics derived from the occurrences of words to construct a vector space model. Thus, for all words, we have high-dimensional real-valued vectors, known as word vectors or semantically rich word embeddings. These word embeddings, along with their geometric properties, are syntactically, semantically and contextually useful for finding the coherence of words and are hence leveraged for finding semantics at different granularity levels of text [66].
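One geometric property commonly used to compare word embeddings is the cosine of the angle between two vectors. The sketch below assumes cosine similarity as the similarity measure, which is a common choice for word vectors but is our assumption for this illustration; the function name and the toy vectors are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length real-valued vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # A zero vector has no direction, so the similarity is defined as 0 here.
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy three-dimensional "embeddings"; real models use hundreds of dimensions.
bloom = [0.9, 0.1, 0.3]
blossom = [0.8, 0.2, 0.4]
dilemma = [-0.2, 0.9, 0.1]

print(cosine_similarity(bloom, blossom) > cosine_similarity(bloom, dilemma))
```

Words whose vectors point in nearly the same direction receive a score close to 1, which is what a similarity threshold can then act on.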

To capture the semantics of the lexical units for the expansion of the lexicon, we employ the distributional semantic models Word2Vec [67], FastText [68] and Glove [69]. These models are used in several NLP tasks for capturing semantics: [70–72] employed Word2Vec for emotion mining, [73,74] used Word2Vec and LSA for bootstrapping an opinion corpus, [66] used it in text summarisation and [75] used Word2Vec for recommending multimedia in a recommendation setting. Word2Vec has also been used effectively in other domains, such as graph embeddings [76–82], to learn the embeddings of nodes in a graph for machine learning tasks.

Word2Vec is a two-layer neural network that processes textual data and produces a vector for each given word. The produced vectors are semantically rich feature vectors, and there is a one-to-one correspondence between each word and its retrieved feature vector. The model is trained as a vector space representation of terms and uses the distributional hypothesis to derive the semantics of a lexical entity.

Word2Vec has two architectures: the skip-gram model and continuous bag of words (CBOW). These architectures define how the embeddings are learned. The CBOW model predicts a word from its surrounding context, whereas the skip-gram model predicts the surrounding context words from a given word. In this article, we use the pretrained Word2Vec model², trained on the Google News data set.
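The difference between the two architectures is easiest to see in the training examples each one builds: CBOW predicts a centre word from its surrounding context, whereas skip-gram predicts each surrounding context word from the centre word. The following sketch only constructs these examples; it does not train a network, and the function names and window size are illustrative assumptions.

```python
def context_window(tokens, index, window):
    """Words within `window` positions of tokens[index], excluding itself."""
    left = tokens[max(0, index - window):index]
    right = tokens[index + 1:index + 1 + window]
    return left + right


def cbow_examples(tokens, window=2):
    """(context words, target word): the context predicts the centre word."""
    return [(context_window(tokens, i, window), tokens[i])
            for i in range(len(tokens))]


def skipgram_examples(tokens, window=2):
    """(centre word, context word): the centre word predicts each neighbour."""
    return [(tokens[i], neighbour)
            for i in range(len(tokens))
            for neighbour in context_window(tokens, i, window)]


sentence = ["i", "love", "this", "film"]
print(cbow_examples(sentence, window=1))
print(skipgram_examples(sentence, window=1))
```

With a window of one, the word "love" yields a single CBOW example with inputs `["i", "this"]`, but two skip-gram examples, one for each neighbour.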

FastText is a free and open-source tool that enables users to learn text representations and text classifiers. It is premised on n-gram features, dimensionality reduction and approximation methods. It breaks the input tokens into character n-grams, which makes it a resource for effective learning of token representations and for categorising sentences. It also performs better with rare words: if a word is not observed during training, its embedding can be determined by splitting it into n-grams [83].
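To make the idea of character n-grams concrete, the sketch below splits a word into its n-grams after adding boundary markers. It illustrates only the decomposition step; the function name is hypothetical, the default lengths of 3 to 6 are an assumption for this example, and the actual tool additionally keeps the whole word as a unit of its own.

```python
def char_ngrams(word, n_min=3, n_max=6):
    """Character n-grams of a word wrapped in '<' and '>' boundary markers."""
    marked = f"<{word}>"
    grams = []
    for n in range(n_min, n_max + 1):
        # Slide a window of length n over the marked word.
        for start in range(len(marked) - n + 1):
            grams.append(marked[start:start + n])
    return grams


print(char_ngrams("where", n_min=3, n_max=3))
```

Because an unseen word such as a misspelling still shares most of its n-grams with known words, a vector for it can be composed from the vectors of those shared n-grams.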

Glove is an unsupervised learning technique that generates word vector representations. It is trained on the frequencies with which words co-occur across the whole corpus, so that inference is based on aggregated global word co-occurrence statistics of the data [69].
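The global co-occurrence statistics mentioned above can be illustrated by counting how often two words appear within a fixed window of each other. The sketch covers only this counting step, not the fitting of word vectors to the counts; the function name, the window size and the use of unweighted counts are simplifying assumptions.

```python
from collections import Counter


def cooccurrence_counts(sentences, window=2):
    """Count ordered (word, neighbour) pairs within a symmetric window."""
    counts = Counter()
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[(word, tokens[j])] += 1
    return counts


corpus = [["good", "movie"], ["good", "plot"], ["good", "movie"]]
counts = cooccurrence_counts(corpus)
print(counts[("good", "movie")], counts[("good", "plot")])
```

The counts are aggregated over the whole corpus rather than per sentence, which is what makes the statistics global.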

The lexicon needs to be expanded because approaches that use a lexicon for sentiment analysis rely on the sentiment-conveying terms in a general-purpose lexicon and thus are not scalable: the recall of these lexicons depends on the coverage of the lexicon terms used. We first use the semantic models (Word2Vec, FastText and Glove) to obtain semantics, and then WordNet [43] to further expand the lexicon’s semantic version, which increases the recall and overall coverage of the lexicon. The lexicon expansion is explained in Algorithm 1.

Algorithm 1.

Lexicon expansion.

Let D be the input lexicon, D' the output lexicon and w_{i} a word of lexicon D.

for all w_{i} \in D do
    W \leftarrow Lemmatization(w_{i}), where W = {w_{1}, w_{2}, w_{3}, ..., w_{n}}
    Append W to output lexicon D'
end for
for all w_{i} \in W do
    V_{i} \leftarrow SemanticModels(w_{i})
    V_{i} = {v_{1} : s_{1}, v_{2} : s_{2}, v_{3} : s_{3}, ..., v_{n} : s_{n}}, where v_{1}, v_{2}, v_{3}, ..., v_{n} are the words in the word-vector V_{i} for the word w_{i} and s_{1}, s_{2}, s_{3}, ..., s_{n} are their similarity scores
    for all v_{j} \in V_{i} with s_{j} > 0.75 do
        Append v_{j} to lexicon D'
    end for
end for

The expansion consists of the following steps:

Step 1: Each word in the lexicon is lemmatised; repeated lemmas, if any, are removed.

Step 2: Each lemma in the lexicon is fed to the semantic models to retrieve its semantic word vector.

Step 3: The retrieved word vectors are filtered, and only those words with a semantic similarity score greater than 0.75 (75%) to the matching lemma are added to the lexicon.

Step 4: Since antonyms are direct opposites, their inclusion is useful for lexicon expansion. For each word in the lexicon, we retrieve its antonyms from WordNet and add those that are not already in the lexicon.

Step 5: Next, we utilise the hyponymy relation of WordNet. Hyponymy relations are essential semantic relations in WordNet [84]. To capture them, we apply the same approach as used by Neviarouskaya et al. [85]: nouns in the lexicon are probed for hyponyms in WordNet, and the retrieved hyponyms are added to the lexicon.
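Steps 1 to 3 can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in our experiments: `expand_lexicon`, `toy_lemmatise` and `toy_most_similar` are hypothetical names, the lemmatiser is reduced to lower-casing, and the neighbour table holds a few lower-cased rows from Table 3 instead of querying a real semantic model. The WordNet steps (4 and 5) are omitted.

```python
def expand_lexicon(lexicon, lemmatise, most_similar, threshold=0.75):
    """Return the expanded lexicon D' for an input lexicon D (Steps 1 to 3)."""
    # Step 1: lemmatise every entry; dict.fromkeys keeps the first occurrence
    # of each lemma and preserves order, so repeated lemmas are dropped.
    lemmas = list(dict.fromkeys(lemmatise(word) for word in lexicon))
    expanded = list(lemmas)
    seen = set(lemmas)
    # Steps 2 and 3: query the semantic model for each lemma and keep only
    # neighbours whose similarity score exceeds the threshold.
    for lemma in lemmas:
        for neighbour, score in most_similar(lemma):
            if score > threshold and neighbour not in seen:
                expanded.append(neighbour)
                seen.add(neighbour)
    return expanded


# Toy neighbour table with values taken from Table 3 (assumed lower-cased).
TOY_NEIGHBOURS = {
    "celebrate": [("celebrating", 0.82), ("celebrated", 0.77), ("celebration", 0.74)],
    "dilemma": [("quandary", 0.88), ("conundrum", 0.87), ("dilemmas", 0.75)],
}


def toy_most_similar(word):
    """Stand-in for a semantic model lookup; returns (word, score) pairs."""
    return TOY_NEIGHBOURS.get(word, [])


def toy_lemmatise(word):
    """Stand-in lemmatiser that only lower-cases its input."""
    return word.lower()


print(expand_lexicon(["Celebrate", "Dilemma", "celebrate"],
                     toy_lemmatise, toy_most_similar))
```

Passing the lemmatiser and the similarity lookup as arguments keeps the sketch independent of any particular toolkit; because the comparison is strict, a neighbour scoring exactly 0.75 is not added.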

To illustrate lexicon expansion, consider the words Cherish, Bloom, Celebrate and Dilemma, taken from the Bing Liu, AFINN, MPQA subjectivity and AFINN lexicons, respectively. These words are fed to the semantic models to retrieve their word vectors. The retrieved word vectors for each word, along with their similarity scores, are listed in Table 3.

Table 3.

Examples of lexicon expansion.

Word	Word vectors	Similarity score	Word	Word vectors	Similarity score
Cherish	Cherishes	0.67	Celebrate	Celebrating	0.82
	Cherishing	0.63		Celebrated	0.77
	Cherished	0.62		Celebration	0.74
	Treasured	0.60		Commemorate	0.71
	Cherish forever	0.60		Celebrates	0.68
	Savour	0.55		Celebrate	0.67
	Love	0.54		Celebrating	0.65
	Treasuring	0.53		Celebrations	0.61
	Enjoy	0.53		Celebrate	0.59
	Yearn	0.51		Rejoice	0.59
Bloom	Blooming	0.82	Dilemma	Quandary	0.88
	Blooms	0.82		Conundrum	0.87
	Flowering	0.76		Dilemmas	0.75
	Rebloom	0.68		Quandary	0.70
	Blossom	0.68		Predicament	0.69
	Blooming shrubs	0.67		Dilemma	0.68
	Flowering bulbs	0.66		Moral_dilemma	0.65
	Bloomed	0.66		Quandaries	0.64
	Fragrant flowers	0.65		Problem	0.64
	Blossoms	0.74		Ethical dilemma	0.62

Figures 3 and 4 summarise the lexicon expansion process using Semantic Models and WordNet, respectively.

Figure 3.

Lexicon expansion using Semantic Models.

Figure 4.

Lexicon expansion using WordNet.

3.3. Classification

In this section, we discuss the classification and the classifiers. We create sentiment classifiers that classify text from any domain according to its sentiment polarity. The classifier predicts the positive or negative label of the text [86]. For classification, we use supervised learning algorithms; in particular, we use SVM, Random Forest and LSTM as classifiers. SVM is a state-of-the-art machine learning algorithm that is effective for text classification tasks and performs robustly on large vector spaces. We use SVM^light, a C implementation of SVMs [87], with the linear kernel. The advantage of using SVM^light is that it supports standard kernel functions and allows the definition of extra functionalities over the basic ones. SVM^light is widely used for several tasks: text classification, image recognition, and medical and bioinformatics applications. SVM^light is publicly available³.

Random Forest is composed of a large number of individual decision trees that operate as an ensemble [88]. Each tree in the Random Forest outputs a class prediction, and the class with the most votes becomes the model’s prediction. The rationale behind Random Forest is simple but powerful: the wisdom of crowds. It uses bagging and feature randomness when building each tree to create an uncorrelated forest of trees whose combined prediction is more accurate than that of any individual tree.
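The voting step described above can be illustrated with a few lines of Python. This sketch covers only the aggregation of per-tree predictions; the list `votes` is made-up data, and tree construction (bagging and feature randomness) is not shown.

```python
from collections import Counter


def majority_vote(tree_predictions):
    """Return the label predicted by most trees (ties go to the label seen first)."""
    return Counter(tree_predictions).most_common(1)[0][0]


# Hypothetical predictions of five trees for one sentence.
votes = ["positive", "negative", "positive", "positive", "negative"]
forest_prediction = majority_vote(votes)
```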

LSTM maintains a separate memory cell inside it to learn long-term dependencies, which updates and exposes its content only when deemed necessary, thus making it possible to capture the content as needed [89,90].
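For reference, one common formulation of the LSTM cell is given below. The article does not state which variant is used, so the notation is an assumption: $x_t$ is the input at step $t$, $h_t$ the hidden state, $c_t$ the memory cell, $\sigma$ the logistic sigmoid, $\odot$ the element-wise product, and $W$, $U$ and $b$ are learned parameters.

```latex
$$
\begin{aligned}
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) \\
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
$$
```

The forget gate $f_t$ and input gate $i_t$ control when the memory cell is updated, and the output gate $o_t$ controls when its content is exposed.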

3.4. Feature engineering

Feature extraction is the process of selecting a subset of features that are relevant for model construction. The key to successful sentiment classification is the selection of useful features. Selecting effective features leads to better classification, as it simplifies the model and reduces the classifier’s training time. The following surface-level features are used during training:

n-grams: contiguous sequences of 1, 2, 3 or 4 tokens.

Negated-bigrams: for capturing the context of negations, we employed negated-bigrams. Negated-bigrams are captured using the following set of words ‘no, not, rather, won’t, never, none, nobody, nothing, neither, nor, nowhere, cannot, without, n’t’. If a word among them precedes a word from the expanded lexicon, we create a negated-bigram for that.

POS: Part-of-speech (POS) tags are another feature that we use. As reported by Wilks and Stevenson [91], POS serves as a raw form of word-sense disambiguation. It distinguishes the different forms in which sentiment polarity words are expressed, such as hate in ‘Hate is an expression of emotions’ (hate does not indicate polarity) and hate in ‘I cannot describe, how much I hate you’ (hate indicates sentiment polarity). In the first sentence, hate is a noun, whereas in the second, it is a verb. Hence, passing this information to the classifier is essential for sentiment analysis systems. We use the Stanford CoreNLP package [64] to identify POS tags.

All these surface-level features are assigned equal weights by our system.
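The n-gram and negated-bigram features listed above can be sketched as follows. This is an illustrative reading of the description rather than the authors' code: the tokenisation by whitespace, the toy lexicon and the interpretation of "precedes" as "directly precedes" are assumptions.

```python
# Negation cues from the text (ASCII apostrophes used in place of typographic ones).
NEGATION_WORDS = {
    "no", "not", "rather", "won't", "never", "none", "nobody", "nothing",
    "neither", "nor", "nowhere", "cannot", "without", "n't",
}


def ngrams(tokens, max_n=4):
    """Return all contiguous n-grams for n = 1..max_n as tuples."""
    return [
        tuple(tokens[i:i + n])
        for n in range(1, max_n + 1)
        for i in range(len(tokens) - n + 1)
    ]


def negated_bigrams(tokens, lexicon):
    """Return (negation, word) pairs where a negation cue directly precedes a lexicon word."""
    return [
        (prev, word)
        for prev, word in zip(tokens, tokens[1:])
        if prev in NEGATION_WORDS and word in lexicon
    ]


tokens = "i do not like this movie".split()
features = ngrams(tokens)
negated = negated_bigrams(tokens, lexicon={"like", "love", "hate"})  # toy lexicon
```

Here `negated` contains the single pair `("not", "like")`, which lets a classifier treat the negated occurrence of a polarity word differently from the plain one.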

Identification and extraction of Lexico-Semantic features using lexical supervision: Successful sentiment classification relies mainly on the proper selection of polarity words as features, since they represent sentiment polarities. Unsupervised algorithms are centred around a lexicon. We instead use the lexicon in a supervised setting, which we call lexicon supervision. If a sentence contains a lexicon word, we extract additional features through sentiment lexicons and distributional semantic models. We call these Lexico-Semantic features and give them extra importance by setting their weight higher than that of the surface-level features described above.

Identification and extraction of Lexico-Semantic features for LSTM: We use high-quality sentiment lexicons to generate the Lexico-Semantic features, which are then fed to the LSTM model to build the sentiment classifier. However, the LSTM model requires data samples of consistent length, and one of the biggest challenges for text classification is the variable length of the instances in a data set: the length of a sentence typically ranges from one word to almost one hundred words. To counter this, sentences are usually padded with zeros to a uniform length, which is commonly known as zero padding. Zero padding is also commonly used in image classification to pad the borders of an image. However, when used with textual data, it introduces a bias since the length of the text varies considerably. To overcome this problem, we employ our Lexico-Semantic features to make the data of uniform length. Enhancing the text with Lexico-Semantic information also enlarges the proportion of sentiment information in it, which is useful for sentiment polarity classification.

Zero padding leads to large quantities of invalid information and reduces the performance of classifiers. In addition, it has a large effect on the result of the LSTM and CNN family model in text classification since it influences the pooling and weight updating. For instance, if there are two sentences with different sentiment polarities containing only one or two words, the zero padding operation makes it impossible to classify these two sentences into correct sentiment polarities. Thus, we propose introducing Lexico-Semantic features, which extend a sentence or message to a fixed length to better match text classification tasks.

We fix the length of all sentences at 300 words. If a sentence is shorter than 300 words, we apply our padding algorithm to bring it to this length.

Consider a sentence $S = (w_{1}, w_{2}, w_{3}, \ldots, w_{n})$ containing $n$ words. This sentence is transformed into a word vector $V = ({\vec{v}}_{1} : s_{1}, {\vec{v}}_{2} : s_{2}, {\vec{v}}_{3} : s_{3}, \ldots, {\vec{v}}_{n} : s_{n})$, where ${\vec{v}}_{i}$ is the word vector generated for the $i$th word $w_{i}$ of sentence $S$ and $s_{i}$ is the score of each word in ${\vec{v}}_{i}$. We concatenate the word vectors of the sentiment words in the sentence that match the lexicon and have a score greater than 75%. The generated word vector $V$ then replaces the sentence $S$ in the data set. Precisely, the word vector for a sentence $S$ is given by $V = ({\vec{v}}_{1} \oplus {\vec{v}}_{2} \oplus {\vec{v}}_{3} \oplus \ldots \oplus {\vec{v}}_{m})$, where $m$ is the number of such sentiment words. If the length is still less than 300, we append to $V$ the words having a sentiment score greater than 75%, in descending order of score.
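One possible reading of this padding procedure is sketched below. It is a simplification under stated assumptions, not the authors' algorithm: the function name `lexico_semantic_pad` is hypothetical, the original tokens are kept and extended rather than replaced, and the candidates are reused cyclically when they run out before the target length is reached. If the sentence contains no lexicon word, it is returned unchanged.

```python
def lexico_semantic_pad(tokens, neighbours, target_len=300, min_score=0.75):
    """Extend a token list to target_len with high-scoring neighbours of its lexicon words."""
    # Candidate fillers: neighbours (score > min_score) of every lexicon word in the sentence.
    candidates = [
        (score, word)
        for token in tokens
        for word, score in neighbours.get(token, [])
        if score > min_score
    ]
    candidates.sort(reverse=True)  # descending order of score

    padded = list(tokens)
    i = 0
    while len(padded) < target_len and candidates:
        # Assumption: cycle through the candidates until the target length is reached.
        padded.append(candidates[i % len(candidates)][1])
        i += 1
    return padded[:target_len]


# Neighbours of "celebrate" taken from Table 3; a short target length keeps the example readable.
neighbours = {"celebrate": [("celebrating", 0.82), ("celebrated", 0.77), ("celebration", 0.74)]}
padded = lexico_semantic_pad(["we", "celebrate"], neighbours, target_len=6)
```

Unlike zero padding, every added position carries a sentiment-bearing word, which is the property the text relies on.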

We then use the TensorDataset utility from PyTorch to create batches of data. Then, we use the four-layered LSTM architecture with 0.2 dropout in between the layers for the Lexico-Semantic LSTM model.

3.5. Vectorisation

Vectorisation is the process of converting extracted features into their vector representation. We introduce a new scheme for vectorisation of features: for all the extracted features, we create a dictionary of features with one feature per line of dictionary along with the class of feature to which this feature belongs. Each extracted feature is identified uniquely by appending a unique label according to the type of feature: for example, unigram features are appended by the label ‘unigram’, POS feature by POS. Features are not repeated in the dictionary. Each new feature generated is checked in the dictionary and added if it is not present in the dictionary, thus ensuring uniqueness. Since each line uniquely represents a feature in the dictionary, we choose its line number to represent the feature numerically, and hence, we map each feature uniquely to a numerical representation using this scheme.

To understand the vectorisation properly, let us consider the example ‘John likes the blue house at the end of the street’. In this sentence, the POS tags are John – noun, likes – verb, the – determiner, blue – adjective, house – noun, at – preposition, end – noun, of – preposition and street – noun. The n-grams retained for this sentence (here, bigrams over the content words) are John-likes, likes-blue, blue-house, house-end and end-street. These features are then added to the dictionary of features, each on a unique line. The line numbers of these features are then the mappings used to represent them.
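The dictionary-based mapping can be sketched as below. The class name `FeatureDictionary` and the `type:value` key format are assumptions made for illustration; the text only specifies that each feature carries a type label and is represented by its (unique) line number.

```python
class FeatureDictionary:
    """Maps each distinct typed feature to a stable 1-based index (its 'line number')."""

    def __init__(self):
        self._index = {}

    def add(self, feature_type, value):
        """Return the index of the feature, adding it first if it is new."""
        key = f"{feature_type}:{value}"
        if key not in self._index:
            self._index[key] = len(self._index) + 1
        return self._index[key]


dictionary = FeatureDictionary()
ids = [
    dictionary.add("bigram", "John-likes"),
    dictionary.add("bigram", "likes-blue"),
    dictionary.add("POS", "John/noun"),
    dictionary.add("bigram", "John-likes"),  # a repeated feature reuses its index
]
```

Prefixing the type keeps, for example, the unigram `hate` and a POS-tagged `hate` on separate lines, so they receive different indices.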

4. Evaluation

We hypothesised that using Lexico-Semantic features for sentiment analysis under lexicon supervision will improve sentiment detection. The following section describes the test setup and data sets used to verify the hypothesis.

This research aims to assess the effect of using Lexico-Semantic features for sentiment detection under lexicon supervision. To capture Lexico-Semantic features, a standard lexicon is extended by semantic algorithms, and then the model uses lexicon supervision to extract Lexico-Semantic features, combined with other features to study the effect on the classification process. To check our algorithm’s scalability, we check it on two fronts: effectiveness of expanded lexicon using different lexicons and the impact of lexicon expansion on classification by extracting Lexico-Semantic features. Another novel characteristic of our algorithm is that it is flexible to work on different domain data streams since we employ distributional semantic algorithms for capturing semantics. The distributional semantic models employ a generic model that requires no lexical or linguistic analysis and are trained in such a way that they require no external source of semantic knowledge. To test this claim, we tested our system on different genres of text data sets.

4.1. Data sets

We test our approach on eight different sentiment data sets. These data sets contain both long and short texts, allowing an extensive evaluation of our approach. They are also publicly available and have been used in various sentiment detection approaches, so our results can be used in comparison studies. Table 4 shows the statistics of the data sets used for the evaluation.

Table 4.

Statistics of the data sets.

Data set	Examples	Positive	Negative	Neutral
Comments DIGG	1077	210	572	295
Comments NYT	5170	2204	2742	244
Comments TED	839	318	409	112
Comments YTB	3407	1665	767	975
MySpace	1041	702	132	207
RW	1046	484	221	341
IMDB movie reviews	3740	1820	1920	none
UMICH SI650	7040	3990	3050	none

NYT: New York Times; TED: Technology, entertainment and design; YTB: YouTube; IMDB: Internet Movie Database; UMICH: University of Michigan; RW: Runners World Forum.

Comments DIGG [56] The Comments DIGG data set consists of comments from Digg, a social news website that allows people to vote web content up or down, called digging and burying, respectively. The data set has 1077 comments labelled into 210 positive, 572 negative and 295 neutral. The annotation was done by three non-expert annotators.

Comments NYT [57] The Comments NYT data set is created from the comments received by the newspaper The New York Times. The data set has 5170 comments labelled into 2204 positive, 2742 negative and 244 neutral. The labelling was done using Amazon Mechanical Turk (AMT).

Comments TED [60] The Comments TED data set has 839 TED comments labelled into 318 positive, 409 negative and 112 neutral labels. The data set contains comments received on different TED talks. The data set has been annotated using six non-expert annotators.

Comments YTB [92] Comments YTB refers to the data set compiled from the YouTube comments. The data set has 3407 YouTube comments labelled into 1665 positive labels, 767 negative labels and 975 neutral labels. The data set has been annotated by three non-expert annotators.

MySpace [92] The data set MySpace is composed of MySpace posts and has 1041 posts labelled into 702 positive, 132 negative and 207 neutral posts. Three non-expert annotators are used to label the data set.

RW [92] The data set RW is composed of the text snippets from the Runners World Forum. The data set has 1046 text snippets labelled into 484 positive, 221 negative and 341 neutral. Three non-expert annotators are used to annotate the data set.

IMDB movie review data set [61] The IMDB movie review data set has 50,000 sentiment-labelled movie reviews and was generated by [93]. The data set is publicly available at Kaggle⁴. Out of the 50,000 reviews, we randomly chose 3740, of which 1820 are positive and 1920 are negative (Table 4). The reduced number of reviews is chosen to reduce the computational cost.

The UMICH SI650 data set The UMICH SI650 data set is a sentiment data set created by the University of Michigan for a Kaggle competition. Each sentence in the data set is extracted from social media blogs and is labelled with 1 (positive sentiment) or 0 (negative sentiment). There are 7040 sentences in total, of which 3990 are labelled positive and 3050 negative.

In our experiments, all neutral labelled sentences are dropped. To create a balanced data set for classification, we use Synthetic Minority Oversampling Technique (SMOTE) [94] to have an equal number of positive and negative examples. Thus, all the data sets contain a balanced number of training examples.
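The idea behind SMOTE can be illustrated with a small pure-Python sketch: a synthetic minority example is placed at a random position on the line segment between a minority example and one of its nearest minority neighbours. This is a simplified illustration of the technique in [94], not the implementation used in the experiments; the function name `smote`, the two-dimensional toy points and the parameter defaults are assumptions, and the minority class must contain at least two examples.

```python
import math
import random


def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between minority points and their neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k nearest minority neighbours of the base point (excluding the point itself).
        others = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(p, base),
        )[:k]
        neighbour = rng.choice(others)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


# Toy feature vectors: four positive and two negative examples.
positive = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
negative = [(5.0, 5.0), (6.0, 5.0)]
negative_balanced = negative + smote(negative, n_new=len(positive) - len(negative), k=1)
```

After oversampling, both classes have the same number of examples, and every synthetic point lies between two real minority examples.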

4.2. Baselines

To assess the performance of our system on the sentiment analysis task in different text domains, we use the following baselines:

Lexicon supervision method using machine learning algorithms Here, we measure the effect of using Lexico-Semantic features on the classification task. We extend the lexicon using a distributional semantic algorithm and then use lexicon supervision to capture and extract semantic features. We compare our system, called Lexico-Semantic supervision, against two baselines. Baseline-1 employs the lexicon, but without semantic extension. Baseline-2 does not employ the lexicon at all.

Lexicon supervision method using deep learning algorithms Here, we measure the effectiveness of using Lexico-Semantic features in a deep learning architecture. Each word in the data set is probed against the lexicon, and if there is a hit, the word is replaced with its word vector retrieved from the semantic models. These features are then fed to an LSTM for classification. We compare this approach with an LSTM model that does not employ Lexico-Semantic features, which we call Baseline-3.

Semantic sentiment systems We also compare the performance of our approach against semantic sentiment analysis systems. In particular, we compare our system with the method of Saif et al. [11]. Their approach, named SentiCircles, uses co-occurrence patterns of words to capture semantics. They used only three data sets: the Obama-McCain Debate data set, the Health Care Reform data set and the Stanford Sentiment Gold Standard data set.

4.3. Evaluation methodology

We evaluate our approach to lexicon expansion and its effect on capturing Lexico-Semantic features for sentiment classification using three widely used standard sentiment lexicons, one per experiment. The aim is to investigate the ability of distributional semantic algorithms to capture Lexico-Semantic features and to validate our hypothesis that these features aid sentiment classification. The three lexicons are (1) Bing Liu’s opinion lexicon [50], (2) the MPQA subjectivity lexicon [54] and (3) AFFIN [53]. Tables 5–7 summarise the number of lexical entries in each lexicon before and after the different stages of lexicon expansion using the Word2Vec, Glove and FastText semantic models, respectively.

Table 5.

Statistics of lexicon using Word2Vec.

Lexicon	Initial	Word2Vec	WordNet
Bing Liu	6788	7488	16,798
MPQA subjectivity lexicon	8222	9144	12,194
AFFIN	4087	7277	9527

MPQA: Multi-Perspective Question Answering.

Table 6.

Statistics of lexicon using Glove.

Lexicon	Initial	Glove	WordNet
Bing Liu	6788	7362	16,628
MPQA subjectivity lexicon	8222	9032	12,028
AFFIN	4087	7168	9436

MPQA: Multi-Perspective Question Answering.

Table 7.

Statistics of lexicon using FastText.

Lexicon	Initial	FastText	WordNet
Bing Liu	6788	7598	16,853
MPQA subjectivity lexicon	8222	9205	12,263
AFFIN	4087	7312	9620

MPQA: Multi-Perspective Question Answering.

All lexical entries from the three different lexicons are passed to Word2Vec, Glove and FastText for semantic expansion, and then hyponyms and antonyms of the combined lexicon are retrieved from the WordNet.

For Bing Liu’s lexicon, the initial entries in the lexicon are 6788, the Word2Vec expansion yields 700 semantic extensions of the initial entries in the lexicon and we retrieve 9310 hyponyms and antonyms. Thus, the total number of entries in the extended Bing Liu’s lexicon is 16,798. By the expansion using Glove, we obtained 574 semantic extensions, and then WordNet yields 9266; thus, the total entries extended are 16,628. The FastText expansion yields 810 semantic extensions of the initial entries in the lexicon, and then WordNet yields 9255; thus, we retrieve 16,853 total entries in Bing Liu’s lexicon.

Initial lexical entries in the MPQA subjectivity lexicon are 8222. After Word2Vec extension, the number of semantic entries added is 922, and then 3050 antonyms and hyponyms are added to the lexicon, and thus the total number of entries is 12,194. By the expansion using Glove, we obtained 810 semantic extensions, and then WordNet yields 2996; thus, the total entries extended is 12,028. The FastText expansion yields 983 semantic extensions of the initial entries in the lexicon, and then WordNet yields 3058; thus, we retrieve 12,263 total entries in MPQA lexicon.

For AFFIN, the initial entries are 4087, the Word2Vec semantic extension yields 3190 entries and from WordNet, we retrieve 2250 antonyms and hyponyms. Hence, the total number of lexical entries in AFFIN is 9527. By the expansion using Glove, we obtained 3081 semantic extensions, and then WordNet yields 2268; thus, the total number of entries is 9436. The FastText expansion yields 3225 semantic extensions of the initial entries in the lexicon, and then WordNet yields 2308; thus, we retrieve 9620 total entries in the AFFIN lexicon.
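As a quick arithmetic check, each total in the WordNet columns of Tables 5–7 is the sum of the initial entries, the semantic extensions and the WordNet additions. The short script below reproduces these sums; the dictionary layout is only a convenience for the example.

```python
# (initial entries, semantic extensions, WordNet additions) per lexicon and semantic model,
# taken from the counts reported for Tables 5-7.
EXPANSION = {
    ("Bing Liu", "Word2Vec"): (6788, 700, 9310),
    ("Bing Liu", "Glove"): (6788, 574, 9266),
    ("Bing Liu", "FastText"): (6788, 810, 9255),
    ("MPQA", "Word2Vec"): (8222, 922, 3050),
    ("MPQA", "Glove"): (8222, 810, 2996),
    ("MPQA", "FastText"): (8222, 983, 3058),
    ("AFFIN", "Word2Vec"): (4087, 3190, 2250),
    ("AFFIN", "Glove"): (4087, 3081, 2268),
    ("AFFIN", "FastText"): (4087, 3225, 2308),
}

# Final size of each expanded lexicon.
totals = {key: sum(parts) for key, parts in EXPANSION.items()}
```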

After the lexicon expansion, we extract the features as discussed in section 3.4.

Then, the extracted features are vectorised using our vectorising algorithms described in section 3.5, and the vectors are passed to the different classifiers: Random Forest and SVM^light for training. Each sentence is represented as a feature vector where each feature is represented as a combination of feature and its corresponding weight.
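For SVM^light, each training example is written as one line of the form `<target> <index>:<value> ...`, with feature indices in increasing order. The helper below sketches how a weighted feature vector could be serialised in that format; the function name `to_svmlight_line` and the concrete weights (1.0 for surface-level features, 2.0 for Lexico-Semantic features) are assumptions, since the text only states that Lexico-Semantic features receive a higher weight.

```python
def to_svmlight_line(label, weighted_features):
    """Format one example as '<target> <index>:<value> ...' with indices in increasing order."""
    pairs = " ".join(
        f"{index}:{weight:g}" for index, weight in sorted(weighted_features.items())
    )
    return f"{label:+d} {pairs}"


# Feature indices are dictionary line numbers; the weights are hypothetical.
line = to_svmlight_line(+1, {17: 1.0, 3: 2.0, 42: 1.0})
```

Here the feature with index 3 stands for a Lexico-Semantic feature and therefore carries the larger weight.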

After extracting the features as discussed in section 3.4.2, the extracted features are passed to four-layered LSTM architecture for training.

4.4. Results

For each data set, we use leave-one-out cross-validation and report precision-positive (PP), precision-negative (PN), recall-positive (RP), recall-negative (RN), F1-positive (F1-P) and F1-negative (F1-N) scores. Results obtained by the SVM on different lexicons are reported in Table 8 (against Baseline-1) and Table 9 (against Baseline-2). The corresponding results for the Random Forest are reported in Tables 10 and 11. We use three lexicons: AFFIN [53], Bing Liu’s lexicon [50] and MPQA [54]. Baseline-1 uses the original unexpanded lexicon, and the classifier is simply fed with the lexical features. For Baseline-2, no lexical features are fed to the classifier, and in our approach, Lexico-Semantic features are fed to the classifier. Baseline-3 uses features extracted by the LSTM. Precision (PP and PN), recall (RP and RN) and F1 values (F1-P and F1-N) are used to evaluate the four systems (Baseline-1, Baseline-2, Baseline-3 and our system). The comparison of our system with the state-of-the-art system that uses semantic features is reported in Table 16.
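The six reported scores are the standard per-class precision, recall and F1, computed once with the positive class and once with the negative class as the target. A minimal sketch follows; the function name `class_metrics` and the toy labels are illustrative only.

```python
def class_metrics(gold, predicted, target):
    """Return (precision, recall, F1) for one class, treating `target` as the class of interest."""
    tp = sum(g == target and p == target for g, p in zip(gold, predicted))
    fp = sum(g != target and p == target for g, p in zip(gold, predicted))
    fn = sum(g == target and p != target for g, p in zip(gold, predicted))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy gold labels and predictions for six sentences.
gold = ["pos", "pos", "pos", "neg", "neg", "neg"]
predicted = ["pos", "pos", "neg", "neg", "neg", "pos"]

pp, rp, f1_p = class_metrics(gold, predicted, "pos")  # PP, RP, F1-P
pn, rn, f1_n = class_metrics(gold, predicted, "neg")  # PN, RN, F1-N
```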

Table 8.

Results of the Lexico-Semantic approach and Baseline-1 using SVM.

Data set	Lexicon	Baseline-1						Lexico-Semantic approach
		PP	RP	F1-P	PN	RN	F1-N	PP	RP	F1-P	PN	RN	F1-N
Digg	AFFIN	0.692	0.532	0.602	0.839	0.879	0.859	0.834	0.791	0.812	0.842	0.940	0.888
	BING-LIU	0.736	0.631	0.679	0.845	0.854	0.849	0.851	0.871	0.861	0.967	0.940	0.953
	MPQA	0.774	0.613	0.684	0.835	0.868	0.851	0.895	0.915	0.905	0.948	0.924	0.936
NYT	AFFIN	0.641	0.611	0.626	0.751	0.675	0.711	0.863	0.863	0.863	0.954	0.901	0.927
	BING-LIU	0.647	0.639	0.643	0.761	0.669	0.712	0.854	0.854	0.854	0.945	0.911	0.928
	MPQA	0.640	0.627	0.633	0.755	0.767	0.761	0.858	0.858	0.858	0.946	0.923	0.934
TED	AFFIN	0.637	0.623	0.630	0.759	0.750	0.754	0.871	0.891	0.881	0.972	0.923	0.947
	BING-LIU	0.482	0.530	0.505	0.675	0.704	0.689	0.922	0.942	0.932	0.938	0.911	0.924
	MPQA	0.587	0.530	0.557	0.743	0.794	0.768	0.865	0.885	0.875	0.981	0.891	0.933
YTB	AFFIN	0.691	0.648	0.669	0.693	0.733	0.712	0.834	0.834	0.834	0.826	0.921	0.871
	BING-LIU	0.706	0.648	0.676	0.699	0.746	0.722	0.828	0.828	0.828	0.829	0.928	0.876
	MPQA	0.721	0.640	0.678	0.700	0.772	0.734	0.880	0.900	0.890	0.965	0.891	0.924
MySpace	AFFIN	0.782	0.756	0.769	0.621	0.658	0.639	0.884	0.884	0.884	0.863	0.873	0.868
	BING-LIU	0.778	0.798	0.788	0.656	0.630	0.643	0.880	0.880	0.880	0.856	0.823	0.839
	MPQA	0.779	0.782	0.780	0.643	0.643	0.643	0.882	0.882	0.882	0.859	0.954	0.904
RW	AFFIN	0.723	0.703	0.713	0.800	0.820	0.810	0.901	0.901	0.901	0.942	0.963	0.952
	BING-LIU	0.712	0.709	0.710	0.764	0.766	0.765	0.895	0.895	0.895	0.945	0.929	0.937
	MPQA	0.698	0.664	0.681	0.741	0.773	0.757	0.892	0.892	0.892	0.934	0.952	0.943
IMDB	AFFIN	0.623	0.592	0.607	0.621	0.523	0.568	0.929	0.949	0.939	0.874	0.923	0.896
	BING-LIU	0.593	0.617	0.605	0.617	0.514	0.561	0.929	0.843	0.884	0.874	0.895	0.884
	MPQA	0.761	0.614	0.680	0.712	0.523	0.603	0.992	0.905	0.947	0.911	0.925	0.918
UMICH	AFFIN	0.679	0.552	0.609	0.723	0.612	0.663	0.854	0.763	0.806	0.924	0.915	0.919
	BING-LIU	0.771	0.623	0.689	0.719	0.733	0.726	0.921	0.804	0.859	0.891	0.905	0.898
	MPQA	0.792	0.712	0.750	0.873	0.764	0.815	0.954	0.914	0.934	0.983	0.923	0.952

PP: precision-positive; RP: recall-positive; F1-P: F1-positive; PN: precision-negative; RN: recall-negative; F1-N: F1-negative; MPQA: Multi-Perspective Question Answering; NYT: New York Times; TED: Technology, entertainment and design; YTB: YouTube; IMDB: Internet Movie Database; UMICH: University of Michigan. Bold value signifies that the corresponding entry has the highest value.

Table 9.

Results of the Lexico-Semantic approach and Baseline-2 using SVM.

Data set	Baseline-2						Lexico-Semantic approach
	PP	RP	F1-P	PN	RN	F1-N	PP	RP	F1-P	PN	RN	F1-N
Digg	0.720	0.684	0.691	0.833	0.793	0.813	0.864	0.888	0.876	0.952	0.935	0.943
NYT	0.651	0.698	0.704	0.806	0.653	0.721	0.858	0.858	0.858	0.948	0.912	0.929
TED	0.700	0.640	0.687	0.783	0.697	0.738	0.886	0.906	0.896	0.963	0.908	0.934
YTB	0.780	0.701	0.738	0.774	0.679	0.723	0.847	0.834	0.850	0.871	0.913	0.890
MySpace	0.740	0.670	0.703	0.721	0.625	0.670	0.882	0.882	0.882	0.859	0.883	0.870
RW	0.759	0.760	0.757	0.814	0.745	0.778	0.896	0.896	0.896	0.940	0.948	0.944
IMDB	0.653	0.680	0.680	0.732	0.503	0.596	0.950	0.899	0.923	0.886	0.913	0.899
UMICH	0.770	0.662	0.729	0.789	0.668	0.723	0.909	0.827	0.866	0.932	0.914	0.923

PP: precision-positive; RP: recall-positive; F1-P: F1-positive; PN: precision-negative; RN: recall-negative; F1-N: F1-negative; NYT: New York Times; TED: Technology, entertainment and design; YTB: YouTube; IMDB: Internet Movie Database; UMICH: University of Michigan. Bold value signifies that the corresponding entry has the highest value.

Table 10.

Results of the Lexico-Semantic approach and Baseline-1 using Random Forest.

Data set	Lexicon	Baseline-1						Lexico-Semantic approach
		PP	RP	F1-P	PN	RN	F1-N	PP	RP	F1-P	PN	RN	F1-N
Digg	AFFIN	0.702	0.552	0.618	0.829	0.878	0.853	0.846	0.801	0.823	0.853	0.950	0.899
	BING-LIU	0.746	0.641	0.690	0.735	0.853	0.790	0.863	0.872	0.867	0.966	0.950	0.958
	MPQA	0.784	0.623	0.694	0.814	0.867	0.840	0.907	0.916	0.911	0.949	0.934	0.941
NYT	AFFIN	0.661	0.621	0.640	0.730	0.686	0.707	0.875	0.864	0.869	0.955	0.911	0.932
	BING-LIU	0.667	0.649	0.658	0.740	0.680	0.709	0.866	0.855	0.860	0.942	0.912	0.927
	MPQA	0.660	0.637	0.648	0.734	0.778	0.755	0.868	0.859	0.863	0.947	0.933	0.940
TED	AFFIN	0.657	0.633	0.645	0.738	0.761	0.749	0.883	0.892	0.887	0.973	0.913	0.942
	BING-LIU	0.502	0.561	0.530	0.696	0.715	0.705	0.932	0.943	0.937	0.939	0.921	0.930
	MPQA	0.607	0.561	0.583	0.764	0.805	0.784	0.877	0.886	0.881	0.981	0.880	0.928
YTB	AFFIN	0.701	0.658	0.679	0.714	0.744	0.729	0.846	0.844	0.845	0.815	0.931	0.869
	BING-LIU	0.716	0.658	0.686	0.720	0.757	0.738	0.840	0.838	0.839	0.840	0.918	0.877
	MPQA	0.731	0.650	0.688	0.721	0.783	0.751	0.890	0.901	0.895	0.961	0.902	0.931
MySpace	AFFIN	0.792	0.766	0.779	0.632	0.669	0.650	0.896	0.885	0.890	0.864	0.862	0.863
	BING-LIU	0.788	0.808	0.798	0.667	0.740	0.702	0.890	0.881	0.885	0.855	0.834	0.844
	MPQA	0.789	0.792	0.790	0.654	0.753	0.700	0.894	0.883	0.888	0.860	0.944	0.900
RW	AFFIN	0.733	0.713	0.723	0.821	0.831	0.826	0.911	0.902	0.906	0.941	0.973	0.957
	BING-LIU	0.722	0.719	0.720	0.785	0.777	0.781	0.907	0.896	0.901	0.946	0.930	0.938
	MPQA	0.708	0.674	0.691	0.762	0.784	0.773	0.904	0.893	0.898	0.955	0.953	0.954
IMDB	AFFIN	0.643	0.604	0.623	0.632	0.633	0.632	0.941	0.950	0.945	0.875	0.921	0.897
	BING-LIU	0.613	0.627	0.620	0.628	0.624	0.626	0.941	0.853	0.895	0.875	0.906	0.890
	MPQA	0.771	0.624	0.690	0.733	0.633	0.679	0.993	0.906	0.948	0.912	0.926	0.919
UMICH	AFFIN	0.689	0.564	0.620	0.744	0.722	0.733	0.866	0.773	0.817	0.925	0.916	0.920
	BING-LIU	0.781	0.633	0.699	0.740	0.744	0.742	0.933	0.814	0.869	0.892	0.916	0.904
	MPQA	0.802	0.722	0.760	0.894	0.775	0.830	0.964	0.915	0.939	0.984	0.924	0.953

Table 11.

Results of the Lexico-Semantic approach and Baseline-2 using Random Forest.

Data set	Baseline-2						Lexico-Semantic approach
	PP	RP	F1-P	PN	RN	F1-N	PP	RP	F1-P	PN	RN	F1-N
Digg	0.740	0.704	0.722	0.863	0.800	0.830	0.872	0.863	0.867	0.922	0.944	0.933
NYT	0.690	0.718	0.704	0.836	0.670	0.744	0.869	0.859	0.864	0.938	0.918	0.928
TED	0.720	0.640	0.678	0.813	0.720	0.764	0.897	0.907	0.902	0.964	0.914	0.938
YTB	0.800	0.720	0.758	0.804	0.720	0.760	0.858	0.844	0.851	0.872	0.917	0.894
MySpace	0.760	0.690	0.723	0.731	0.630	0.677	0.893	0.883	0.888	0.859	0.880	0.869
RW	0.774	0.780	0.777	0.844	0.760	0.800	0.907	0.897	0.902	0.947	0.952	0.949
IMDB	0.700	0.680	0.690	0.730	0.510	0.600	0.958	0.903	0.930	0.887	0.917	0.902
UMICH	0.790	0.682	0.732	0.810	0.670	0.733	0.921	0.834	0.875	0.933	0.918	0.925

5. Discussion

In this section, we discuss in detail the results obtained on the data sets using our Lexico-Semantic approach. We evaluate the integration of the Lexico-Semantic features on the sentiment polarity detection classifiers.

First, we investigate the ability of distributional semantic algorithms to capture Lexico-Semantic features and their subsequent usage to enhance the sentiment classification task. Second, we discuss the scalability of our approach to data sets from different domains, since we employ distributional semantic algorithms for capturing Lexico-Semantic features. These algorithms employ a generic model that requires no lexical or linguistic analysis and no external source of semantic knowledge, and hence can be used with different domains. To verify this claim, we tested our system on text data sets of different genres. The questions that we want to answer are: How beneficial are Lexico-Semantic features for the sentiment classification task? What is the effect of using different lexicons on the performance of different classifiers? What is the impact of using different semantic models on classification?

Tables 8 and 10 report the results of our system on the eight data sets using SVM and Random Forest, respectively, and compare it with Baseline-1. Baseline-1 uses lexical and other surface-level features, whereas our system employs the Lexico-Semantic features in combination with the other surface-level features, as discussed in section 3.4.

Tables 9 and 11 report the results of our system on the eight data sets using SVM and Random Forest, respectively, and compare it with Baseline-2. Baseline-2 does not employ lexical features at all, while our system employs the Lexico-Semantic features. For comparison with Baseline-2, we averaged the results of the Lexico-Semantic approach on each data set over the three lexicons, because Baseline-2 does not use a lexicon.

Table 12 reports the results of our system on the eight data sets using LSTM and compares it with Baseline-3. Baseline-3 employs features extracted by the LSTM, while our system employs the Lexico-Semantic features. For comparison with Baseline-3, we averaged the results of the Lexico-Semantic approach on each data set over the three lexicons, because Baseline-3 does not use a lexicon.

Table 12.

Results of the Lexico-Semantic approach and Baseline-3.

Data set	Baseline-3						Lexico-Semantic approach
	PP	RP	F1-P	PN	RN	F1-N	PP	RP	F1-P	PN	RN	F1-N
Digg	0.823	0.776	0.799	0.872	0.897	0.884	0.893	0.884	0.888	0.943	0.965	0.954
NYT	0.782	0.782	0.782	0.846	0.872	0.859	0.890	0.880	0.885	0.959	0.939	0.949
TED	0.831	0.736	0.781	0.808	0.818	0.813	0.918	0.928	0.923	0.985	0.935	0.959
YTB	0.851	0.817	0.834	0.816	0.813	0.814	0.879	0.934	0.906	0.893	0.938	0.915
MySpace	0.842	0.795	0.818	0.792	0.726	0.758	0.914	0.904	0.909	0.880	0.901	0.890
RW	0.823	0.816	0.819	0.864	0.848	0.856	0.928	0.918	0.923	0.968	0.973	0.970
IMDB	0.812	0.778	0.795	0.823	0.824	0.823	0.979	0.924	0.951	0.908	0.938	0.923
UMICH	0.796	0.793	0.794	0.817	0.885	0.850	0.942	0.924	0.933	0.954	0.939	0.946

Table 13 reports the averaged results of the different baselines and our system. From Table 13, it is evident that the Lexico-Semantic features improve sentiment classification in comparison with the three baselines. Thus, we conclude that the Lexico-Semantic features aid the sentiment classification task. The baselines and our system employ the same set of features, apart from the Lexico-Semantic features, with the same classifier settings. The improvement of our system is therefore attributed to the Lexico-Semantic features.

Table 13.

Comparison of the Lexico-Semantic approach and other baselines based on averages.

Classifier	Metrics
	Baseline	PP	RP	F1-P	PN	RN	F1-N
SVM	Baseline-1	0.693	0.641	0.665	0.731	0.715	0.721
	Baseline-2	0.721	0.686	0.711	0.782	0.670	0.720
	Lexico-Semantic approach	0.886	0.873	0.880	0.918	0.915	0.916
Random forest	Baseline-1	0.707	0.665	0.678	0.734	0.749	0.741
	Baseline-2	0.740	0.702	0.723	0.803	0.685	0.738
	Lexico-Semantic approach	0.897	0.874	0.885	0.917	0.919	0.917
LSTM	Baseline-3	0.822	0.786	0.802	0.829	0.835	0.832
	Lexico-Semantic approach	0.917	0.912	0.914	0.936	0.941	0.938

PP: precision-positive; RP: recall-positive; F1-P: F1-positive; PN: precision-negative; RN: recall-negative; F1-N: F1-negative. SVM: support vector machine; LSTM: long short-term memory. Bold value signifies that the corresponding entry has the highest value.
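The entries of Table 13 appear to be per-column means over the eight data sets. As an illustration, the snippet below averages the Baseline-2 PP and RP columns of Table 9; the results agree with the SVM Baseline-2 row of Table 13 (0.721 and 0.686) to within 0.001. The variable names are only for the example.

```python
from statistics import mean

# Baseline-2 (SVM) columns PP and RP from Table 9, one value per data set.
BASELINE2_PP = [0.720, 0.651, 0.700, 0.780, 0.740, 0.759, 0.653, 0.770]
BASELINE2_RP = [0.684, 0.698, 0.640, 0.701, 0.670, 0.760, 0.680, 0.662]

avg_pp = mean(BASELINE2_PP)  # compare with 0.721 in Table 13
avg_rp = mean(BASELINE2_RP)  # compare with 0.686 in Table 13
```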

Tables 14 and 15 compare the averaged results of Baseline-1 and our system for the MPQA, AFFIN and Bing Liu’s lexicons using SVM and Random Forest, respectively. This answers our second research question: what is the impact of using different lexicons on the performance of the classifier? Since the differences in performance across lexicons are small, we conclude that it is the process of obtaining Lexico-Semantic features, and not the lexicon itself, that improves the performance of the classifier. We also tested our Lexico-Semantic approach using two other semantic models, Glove and FastText. All the semantic models provide an effective mechanism for incorporating Lexico-Semantic features and yield consistent results when compared with the baselines. The results are almost identical, and hence we have omitted the Glove and FastText results from the tables.

Table 14.

Comparison of the averages of Baseline-1 and Lexico-Semantic approach of all lexicons using SVM.

Lexicon	Baseline-1						Lexico-Semantic approach
	PP	RP	F1-P	PN	RN	F1-N	PP	RP	F1-P	PN	RN	F1-N
AFFIN	0.683	0.627	0.653	0.725	0.706	0.714	0.871	0.859	0.865	0.899	0.919	0.908
Bing Liu’s	0.678	0.649	0.661	0.717	0.702	0.708	0.885	0.864	0.874	0.905	0.905	0.904
MPQA	0.719	0.647	0.680	0.750	0.738	0.741	0.902	0.893	0.897	0.940	0.922	0.930
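As a quick check of the reasoning about lexicon choice, the F1 values in Table 14 can be compared directly. The sketch below is not part of the original evaluation; the function name `summarise` is hypothetical, and the numbers are copied from the F1-P and F1-N columns of Table 14.

```python
# (F1-P, F1-N) per lexicon, copied from Table 14 (SVM).
baseline = {
    "AFFIN": (0.653, 0.714),
    "Bing Liu's": (0.661, 0.708),
    "MPQA": (0.680, 0.741),
}
lexico_semantic = {
    "AFFIN": (0.865, 0.908),
    "Bing Liu's": (0.874, 0.904),
    "MPQA": (0.897, 0.930),
}


def summarise(index):
    """Return (smallest gain over Baseline-1, spread across lexicons)."""
    gains = [lexico_semantic[k][index] - baseline[k][index] for k in baseline]
    scores = [lexico_semantic[k][index] for k in lexico_semantic]
    return min(gains), max(scores) - min(scores)


for name, index in (("F1-P", 0), ("F1-N", 1)):
    smallest_gain, spread = summarise(index)
    print(f"{name}: smallest gain = {smallest_gain:.3f}, "
          f"spread across lexicons = {spread:.3f}")
```

With the Table 14 values, the smallest gain over Baseline-1 is roughly 0.19 to 0.21 in F1, while the spread across the three lexicons is about 0.03, which is the sense in which the lexicon choice matters less than the feature extraction process.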

Table 15.

Comparison of the averages of Baseline-1 and the Lexico-Semantic approach for all lexicons using Random Forest.

Lexicon	Baseline-1						Lexico-Semantic approach
	PP	RP	F1-P	PN	RN	F1-N	PP	RP	F1-P	PN	RN	F1-N
AFFIN	0.697	0.638	0.665	0.730	0.740	0.734	0.883	0.863	0.872	0.900	0.922	0.909
Bing Liu’s	0.691	0.662	0.675	0.713	0.736	0.724	0.896	0.869	0.881	0.906	0.910	0.908
MPQA	0.731	0.660	0.693	0.759	0.772	0.764	0.912	0.894	0.902	0.943	0.924	0.933

Table 16 compares our Lexico-Semantic system with another system that uses semantic features [62]. Our system outperforms the system of Saif et al. [62] on every reported measure except precision-negative, where it is marginally lower (0.822 versus 0.829).

Table 16.

Comparison of the Lexico-Semantic approach and another system using semantics.

System	PP	PN	F1-P	RP	RN	F1-N
Lexico-Semantic approach	0.788	0.822	0.745	0.720	0.891	0.864
Saif et al. [62]	0.714	0.829	0.663	0.621	0.885	0.856

PP: precision-positive; RP: recall-positive; F1-P: F1-positive; PN: precision-negative; RN: recall-negative; F1-N: F1-negative. Bold value signifies that the corresponding entry has the highest value.

6. Conclusion and future work

The article presents a sentiment polarity classifier that employs Lexico-Semantic features to aid the sentiment classification task. According to the evaluation and comparative analysis, the appropriateness, reliability and scalability of the sentiment classifier are better than those of the baselines. Our main conclusions are as follows: (1) The Lexico-Semantic features aid the sentiment classification task. (2) The choice of lexicon is not critical, since the results are consistent across lexicons. (3) Distributional semantic algorithms are an excellent choice for capturing semantic features because they do not require domain knowledge, which makes our algorithm scalable and applicable to data sets from different domains. The main drawback of our approach is that it is computationally expensive and thus requires more time to execute.

Future work could (1) use more than one distributional semantic model, in different combinations, so that the semantics of the text are captured at a fine-grained level and (2) use transformers for capturing semantics.

Footnotes

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

ORCID iD

Nowsheena

References

1. Liu. Sentiment analysis and opinion mining. Synth Lect Hum Lang Technol 2012; 5(1): 1–167.
2. Cherry, Mohammad, De Bruijn. Binary classifiers and latent sequence models for emotion detection in suicide notes. Biomed Inform Insights 2012; 5(Suppl. 1): 147–154.
3. Desmet, Hoste. Emotion detection in suicide notes. Exp Syst Appl 2013; 40(16): 6351–6358.
4. Mohd, Shah, Bhat, et al. Sumdoc: a unified approach for automatic text summarization. In: Pant, Deep, Bansal, et al. (eds) Advances in intelligent systems and computing: proceedings of the 5th international conference on soft computing for problem solving. Singapore: Springer, 2016, pp. 333–343.
5. Shayaa, Jaafar, Bahri, et al. Sentiment analysis of big data: methods, applications, and open challenges. IEEE Access 2018; 6: 37807–37827.
6. Madani, Erritali, Bengourram. Sentiment analysis using semantic similarity and Hadoop MapReduce. Knowl Inform Syst 2019; 59(2): 413–436.
7. Dridi, Reforgiato Recupero. Leveraging semantics for sentiment polarity detection in social media. Int J Mach Learn Cybern 2019; 10(8): 2045–2055.
8. Dridi, Recupero. Leveraging semantics for sentiment polarity detection in social media. Int J Mach Learn Cybern 2019; 10(8): 2045–2055.
9. Dridi, Atzeni, Recupero. FineNews: fine-grained semantic sentiment analysis on financial microblogs and news. Int J Mach Learn Cybern 2019; 10(8): 2199–2207.
10. Recupero, Alam, Buscaldi, et al. Frame-based detection of figurative language in tweets [application notes]. IEEE Comput Intell Mag 2019; 14(4): 77–88.
11. Saif, Fernandez, et al. Semantic patterns for sentiment analysis of Twitter. Lect Note Comput Sci 2014; 8797: 324–340.
12. Mohammad, Kiritchenko, Zhu. NRC-Canada: building the state-of-the-art in sentiment analysis of tweets, https://arxiv.org/abs/1308.6242
13. Mir, Kar AKK, Gupta. AI-enabled digital identity – inputs for stakeholders and policymakers. J Sci Technol Policy Manage 2022; 13: 514–541.
14. Kirmani, Manzoor Hakak, Mohd, et al. Hybrid text summarization: a survey. In: Ray, Sharma, Rawat (eds) Soft computing: theories and applications. Singapore: Springer, 2019, pp. 63–73.
15. Phu, Tran VTN, Chau VTN, et al. A decision tree using ID3 algorithm for English semantic analysis. Int J Speech Technol 2017; 20(3): 593–613.
16. Asghar, Khan, Ahmad, et al. Lexicon-enhanced sentiment analysis framework using rule-based classification scheme. PLoS ONE 2017; 12(2): e0171649.
17. Bhayani, Huang. Twitter sentiment classification using distant supervision. CS224N Project Report, 2009, https://www-cs.stanford.edu/people/alecmgo/papers/TwitterDistantSupervision09.pdf
18. Sheng, Wang, et al. Incremental learning for ν-support vector regression. Neur Netw 2015; 67: 140–150.
19. Balabantaray, Mohammad, Sharma. Multi-class twitter emotion classification: a new approach. Int J Appl Inform Syst 2012; 4(1): 48–53.
20. Huq, Ali, Rahman. Sentiment analysis on Twitter data using KNN and SVM. Int J Adv Comput Sci Appl 2017; 8(6): 19–25.
21. Liu, Fan. A method for multi-class sentiment classification based on an improved one-vs-one (OVO) strategy and the support vector machine (SVM) algorithm. Inform Sci 2017; 394: 38–52.
22. Suresh, Raj. An unsupervised fuzzy clustering method for Twitter sentiment analysis. In: Proceedings of the 2016 international conference on computation system and information technology for sustainable solutions (CSITSS), Bengaluru, India, 6–8 October 2016, pp. 80–85. New York: IEEE.
23. Mir, Kar, Dwivedi, et al. Realizing digital identity in government: prioritizing design and implementation objectives for Aadhaar in India. Govern Inform Quart 2020; 37(2): 101442.
24. Phu, Dat, Tran VTN, et al. Fuzzy C-means for English sentiment classification in a distributed system. Appl Intell 2017; 46(3): 717–738.
25. Hassan, Bajwa, Hassan. Prediction of terrorist activities by using unsupervised learning techniques. J Appl Emerg Sci 2016; 6(2): 56–60.
26. Riaz, Fatima, Kamran, et al. Opinion mining on large scale data using sentiment analysis and k-means clustering. Clust Comput 2019; 22(3): 7149–7164.
27. Mohd, Hashmy. Question classification using a knowledge-based semantic kernel. In: Pant, Ray, Sharma, et al. (eds) Soft computing: theories and applications. Singapore: Springer, 2018, pp. 599–606.
28. Liu, Yan, et al. OpinionFlow: visual analysis of opinion diffusion on social media. IEEE Trans Vis Comput Graph 2014; 20(12): 1763–1772.
29. Vaziripour, Giraud-Carrier, Zappala. Analyzing the political sentiment of tweets in Farsi. In: Proceedings of the 10th international AAAI conference on web and social media, 2016, https://ojs.aaai.org/index.php/ICWSM/article/view/14791
30. Yue, Chen, et al. A survey of sentiment analysis in social media. Knowl Inform Syst 2019; 60(2): 617–663.
31. Yessenalina, Cardie. Compositional matrix-space models for sentiment analysis. In: Proceedings of the conference on empirical methods in natural language processing, 2011, pp. 172–182. Edinburgh: Association for Computational Linguistics, http://lowrank.net/ainur/pubs/emnlp2011_compomatrix.pdf
32. Tang, Qin, Liu. Document modeling with gated recurrent neural network for sentiment classification. In: Proceedings of the 2015 conference on empirical methods in natural language processing, Lisbon, 17–21 September 2015, pp. 1422–1432. Lisbon: Association for Computational Linguistics.
33. Bespalov, Bai, et al. Sentiment classification based on supervised latent n-gram analysis. In: Proceedings of the 20th ACM international conference on information and knowledge management, Glasgow, 24–28 October 2011, pp. 375–382. New York: ACM.
34. Glorot, Bordes, Bengio. Domain adaptation for large-scale sentiment classification: a deep learning approach, 2011, https://icml.cc/2011/papers/342_icmlpaper.pdf
35. Hermann, Blunsom. The role of syntax in vector space models of compositional semantics. Stroudsburg, PA: Association for Computational Linguistics, 2013.
36. Bengio. Deep learning of representations: looking forward. In: Proceedings of the international conference on statistical language and speech processing, Tarragona, 29–31 July 2013, pp. 1–37. Berlin: Springer.
37. Tang, Wei, Qin, et al. Coooolll: a deep learning system for Twitter sentiment classification. In: Proceedings of the SemEval 2014, pp. 208–212, https://alt.qcri.org/semeval2014/cdrom/pdf/SemEval033.pdf
38. Ombabi, Ouarda, Alimi. Deep learning CNN–LSTM framework for Arabic sentiment analysis using textual information shared in social networks. Soc Netw Anal Min 2020; 10(1): 1–13.
39. Maqsood, Mehmood, Maqsood, et al. A local and global event sentiment based efficient stock exchange forecasting using deep learning. Int J Inform Manage 2020; 50: 432–451.
40. Yang, Wang, et al. Sentiment analysis for E-commerce product reviews in Chinese based on sentiment lexicon and deep learning. IEEE Access 2020; 8: 23522–23530.
41. Araque, Corcuera-Platas, Sánchez-Rada, et al. Enhancing deep learning sentiment analysis with ensemble techniques in social applications. Exp Syst Appl 2017; 77: 236–246.
42. Akhtar, Gupta, Ekbal, et al. Feature selection and ensemble construction: a two-step method for aspect based sentiment analysis. Knowl Based Syst 2017; 125: 116–135.
43. Miller, Beckwith, Fellbaum, et al. Introduction to WordNet: an on-line lexical database. Int J Lexicograph 1990; 3(4): 235–244.
44. Liu. Mining opinion features in customer reviews. AAAI 2004; 4: 755–760.
45. Strapparava, Valitutti. WordNet affect: an affective extension of WordNet, 2004, https://www.researchgate.net/publication/254746105_WordNet-Affect_an_Affective_Extension_of_WordNet
46. Stone, Dunphy, Smith. The general inquirer: a computer approach to content analysis. Am Soc Rev 1966; 4(4): 1161774.
47. Esuli, Sebastiani. SentiWordNet: a publicly available lexical resource for opinion mining. In: Proceedings of the LREC 2006, vol. 6, pp. 417–422, http://www.lrec-conf.org/proceedings/lrec2006/pdf/384_pdf.pdf
48. Baccianella, Esuli, Sebastiani. SentiWordNet 3.0: an enhanced lexical resource for sentiment analysis and opinion mining. LREC 2010; 10: 2200–2204.
49. Chen. Mining opinions from the Web: beyond relevance retrieval. J Am Soc Inform Sci Technol 2007; 58(12): 1838–1850.
50. Ding, Liu. A holistic lexicon-based approach to opinion mining. In: Proceedings of the 2008 international conference on web search and data mining, 2008, pp. 231–240, https://www.cs.uic.edu/~liub/FBS/opinion-mining-final-WSDM.pdf
51. Mohammad, Kiritchenko, Zhu. NRC-Canada: building the state-of-the-art in sentiment analysis of tweets. In: Proceedings of the 7th international workshop on semantic evaluation exercises (SemEval-2013), Atlanta, GA, 2013, https://www.researchgate.net/publication/256187019_NRC-Canada_Building_the_State-of-the-Art_in_Sentiment_Analysis_of_Tweets
52. Mohammad, Turney. Crowdsourcing a Word-Emotion Association Lexicon. Comput Intell 2013; 29(3): 436–465.
53. Nielsen FÅ. A new ANEW: evaluation of a word list for sentiment analysis in microblogs, https://arxiv.org/abs/1103.2903
54. Riloff, Wiebe. Learning extraction patterns for subjective expressions. In: Proceedings of the 2003 conference on empirical methods in natural language processing, 2003, pp. 105–112, https://www.cs.utah.edu/~riloff/pdfs/emnlp03.pdf
55. Pennebaker, Francis, Booth. Linguistic inquiry and word count: LIWC 2001. Mahwah, NJ: Lawrence Erlbaum Associates, 2001, p. 71.
56. Thelwall, Buckley, Paltoglou. Sentiment strength detection for the social web. J Am Soc Inform Sci Technol 2012; 63(1): 163–173.
57. Hutto, Gilbert. VADER: a parsimonious rule-based model for sentiment analysis of social media text. In: Proceedings of the 8th international AAAI conference on weblogs and social media, 2014, https://ojs.aaai.org/index.php/ICWSM/article/view/14550/14399
58. Diakopoulos, Shamma. Characterizing debate performance via aggregated twitter sentiment. In: Proceedings of the SIGCHI conference on human factors in computing systems, 2010, pp. 1195–1198, https://www.researchgate.net/publication/221515559_Characterizing_debate_performance_via_aggregated_twitter_sentiment
59. Aisopos. Manually annotated sentiment analysis Twitter dataset NTUA. Athens: National Technical University of Athens, 2014.
60. Pappas, Popescu-Belis. Sentiment analysis of user comments for one-class collaborative filtering over ted talks. In: Proceedings of the 36th international ACM SIGIR conference on research and development in information retrieval, 2013, pp. 773–776, https://www.researchgate.net/publication/262328637_Sentiment_analysis_of_user_comments_for_one-class_collaborative_filtering_over_ted_talks
61. Maas, Daly, Pham, et al. Learning word vectors for sentiment analysis. In: Proceedings of the 49th annual meeting of the association for computational linguistics: human language technologies, 2011, vol. 1, pp. 142–150. Association for Computational Linguistics, https://ai.stanford.edu/~ang/papers/acl11-WordVectorsSentimentAnalysis.pdf
62. Saif, Fernandez, et al. Senticircles for contextual and conceptual semantic sentiment analysis of Twitter. In: Proceedings of the European semantic web conference, 2014, pp. 83–98. Springer, https://www.researchgate.net/publication/300450866_SentiCircles_for_Contextual_and_Conceptual_Semantic_Sentiment_Analysis_of_Twitter
63. Saif, Fernandez, et al. Contextual semantics for sentiment analysis of Twitter. Inform Process Manage 2016; 52(1): 5–19.
64. Manning, Surdeanu, Bauer, et al. The Stanford CoreNLP natural language processing toolkit. In: Proceedings of the association for computational linguistics (ACL) system demonstrations, 2014, pp. 55–60, http://www.aclweb.org/anthology/P/P14/P14-5010
65. Mir, Sharma, Kar, et al. Critical success factors for integrating artificial intelligence and robotics. Digit Policy Regul Govern 2020; 22(4): 307–331.
66. Mohd, Jan, Shah. Text document summarization using word embedding. Exp Syst Appl 2020; 143: 112958.
67. Mikolov, Chen, Corrado, et al. Efficient estimation of word representations in vector space, https://arxiv.org/abs/1301.3781
68. Joulin, Grave, Bojanowski, et al. Bag of tricks for efficient text classification, https://arxiv.org/abs/1607.01759
69. Pennington, Socher, Manning. Glove: global vectors for word representation. In: Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP), 2014, pp. 1532–1543, https://nlp.stanford.edu/pubs/glove.pdf
70. Jan, Khan. Emotion mining using semantic similarity. In: Information Resources Management Association (ed.) Natural language processing: concepts, methodologies, tools, and applications. Hershey, PA: IGI Global, 2020, pp. 1115–1138.
71. Hakak, Mohd, Kirmani, et al. Emotion analysis: a survey. In: Proceedings of the 2017 international conference on computer, communications and electronics (COMPTELIX), Jaipur, India, 1–2 July 2017, pp. 397–402. New York: IEEE.
72. Nida, Mahira, Mudasir, et al. Automatic emotion classifier. In: Pati, Panigrahi, Misra, et al. (eds) Progress in advanced computing and intelligent engineering. Singapore: Springer, 2019, pp. 565–572.
73. Canales, Strapparava, Boldrini, et al. Intensional learning to efficiently build up automatically annotated emotion corpora. IEEE Trans Affect Comput 2017; 11: 335–347.
74. Mohd, Jan, Hakak. Enhanced bootstrapping algorithm for automatic annotation of tweets. Int J Cognit Inform Nat Intell 2020; 14(2): 35–60.
75. Baek, Chung. Multimedia recommendation using Word2Vec-based social relationship mining. Multimed Tool Appl 2021; 80: 34499–34515.
76. Sheikh, Kefato, Montresor. gat2vec: representation learning for attributed graphs. J Comput 2019; 101: 187–209.
77. Kefato, Sheikh, Montresor. REFINE: representation learning from diffusion events. In: Proceedings of the 4th international conference on machine learning, optimization and data science, LOD’18, Volterra, 13–16 September 2018.
78. Kefato, Sheikh, Montresor. MINERAL: multi-modal network representation learning. In: Proceedings of the 3rd international conference on machine learning, optimization and big data, MOD’17, Volterra, 14–17 September 2017.
79. Kefato, Sheikh, Montresor. DeepInfer: diffusion network inference through representation learning. In: Proceedings of the 13th international workshop on mining and learning with graphs, MLG’17, 2017, http://disi.unitn.it/~montreso/pubs/papers/kdd2017.pdf
80. Sheikh, Kefato, Montresor. Semi-supervised heterogeneous information network embedding for node classification using 1D-CNN. In: Proceedings of the 5th international conference on social networks analysis, management and security (SNAMS), Valencia, 15–18 October 2018.
81. Sheikh, Kefato, Montresor. A simple approach to attributed graph embedding via enhanced autoencoder. In: Proceedings of the 8th international conference on complex networks and their applications, Lisbon, 10–12 December 2019.
82. Hamid, Sheikh, Said, et al. Fake news detection in social media using graph neural networks and NLP techniques: a COVID-19 use-case. In: Proceedings of the MediaEval 2020 workshop, Online, 14–15 December 2020, http://ceur-ws.org/Vol-2882/paper54.pdf
83. Bojanowski, Grave, Joulin, et al. Enriching word vectors with subword information, https://arxiv.org/abs/1607.04606
84. Miller. WordNet: an electronic lexical database. Cambridge, MA: MIT Press, 1998.
85. Neviarouskaya, Prendinger, Ishizuka. SentiFul: a lexicon for sentiment analysis. IEEE Trans Affect Comput 2011; 2(1): 22–36.
86. Mir, Kar, Gupta, et al. Prioritizing digital identity goals – the case study of Aadhaar in India. Lect Note Comput Sci 2019; 11701: 489–501.
87. Joachims. Making large-scale SVM learning practical, 1998, https://www.cs.cornell.edu/people/tj/publications/joachims_99a.pdf
88. Random decision forests. Proc IEEE 1995; 1: 278–282.
89. Peng, Khan, et al. Sentic LSTM: a hybrid network for targeted aspect-based sentiment analysis. Cognit Comput 2018; 10(4): 639–650.
90. Wang, Jiang, Luo. Combination of convolutional and recurrent neural network for sentiment analysis of short texts. In: Proceedings of COLING 2016: the 26th international conference on computational linguistics: technical papers, 2016, pp. 2428–2437, https://aclanthology.org/C16-1229/
91. Wilks, Stevenson. The grammar of sense: using part-of-speech tags as a first step in semantic disambiguation. Nat Lang Eng 1998; 4(2): 135–143.
92. Thelwall. Heart and soul: sentiment strength detection in the social web with sentistrength. Berlin: Springer, pp. 119–134.
93. Pang, Lee, Vaithyanathan. Thumbs up? Sentiment classification using machine learning techniques. In: Proceedings of the ACL-02 conference on empirical methods in natural language processing, 2002, vol. 10, pp. 79–86. Association for Computational Linguistics, https://www.cs.cornell.edu/home/llee/papers/sentiment.pdf
94. Chawla, Bowyer, Hall, et al. SMOTE: synthetic minority over-sampling technique. J Artif Intell Res 2002; 16: 321–357.