Stance detection using diverse feature sets based on machine learning techniques

Abstract

Sentiment analysis is the field that analyzes the sentiments and opinions of people about entities such as products, businesses, and events. Because opinions influence people's behavior, it has numerous real-life applications in areas such as marketing, politics, and social media. Stance detection is a sub-field of sentiment analysis. Stance classification aims to automatically identify, from the source text, whether the author is in favor of, neutral toward, or opposed to the target. This research study proposes a framework to explore the performance of conventional (NB, DT, SVM), ensemble learning (RF, AdaBoost), and deep learning-based (DBN, CNN-LSTM, and RNN) machine learning techniques. The proposed method is feature-centric and extracts sentiment, content, tweet-specific, and part-of-speech features from both the SemEval2016 and SemEval2017 datasets. The study also explores the role of deep features such as GloVe and Word2Vec, which have not yet received attention for stance detection. Baseline features such as Bag of Words, N-gram, and TF-IDF are also extracted from both datasets for comparison with the proposed and deep features. The proposed features are ranked using feature ranking methods, namely information gain, gain ratio, and Relief-F. Further, the results are evaluated using standard performance evaluation measures for stance classification and compared with existing studies. The results show that the proposed feature sets (sentiment, part-of-speech, content, and tweet-specific) are helpful for stance classification when applied with SVM, and that the deep feature GloVe gives the best results when applied with the deep learning method RNN.

Keywords

Stance classification, deep learning, deep features, sentiment analysis, content based

1 Introduction

Web 2.0 has revolutionized the way people interact. It allows people to create, upload, and publish their content using social media such as blogs, wikis, forums, and social networking sites. The rapid growth of the social web has given birth to new communities in all fields of life, from business to academia and research. To extract information from this raw data, new research domains have emerged, such as sentiment analysis [1, 2], collaborative filtering-based recommender systems [3], community detection [4], and social network analysis and mining [5, 6]. Sentiment analysis has emerged as one of the most active of these research domains. It identifies and classifies the sentiments, emotions, and opinions expressed in user-generated content as positive, negative, or neutral. It has numerous application areas such as product recommendation [7], review helpfulness [8], social issue analysis [9], stance-based analysis of user behavior, and antisocial behavior prediction [10]. A stance is a person's belief, claim, opinion, or stand towards an event.

Stance classification is a sub-problem of sentiment analysis that detects the stance of an author towards a subject. From a linguistic point of view, stance is defined as follows: “Stance is a public act by a social actor, achieved dialogically through overt communicative means, of simultaneously evaluating objects, positioning subjects (self and others), and aligning with other subjects, with respect to any salient dimension of the socio-cultural field” [11]. Stance classification predicts whether a person's opinion is in “favor” of or “against” a target, as shown in Table 1.

Table 1
Sample tweets illustrating stance classification (taken from the target "Climate Change is a Real Concern" of the SemEval2016 dataset)

ID	Tweet	Stance
620	We are working to elect climate-friendly governments across North America #FeelTheBern #NDP #LPC #GPC #BQ #HARPERPAC.CA #SemST	FAVOR
688	Climate change?! Explain the definition of climate change.. #fraud #CCOT #liberty #CruzCrew #SemST	AGAINST
819	#Equity #financial #solidarity and #transparency is what #2015text needs says #Fabius at #UN #GA #SemST	NONE

Sentiment analysis computes the sentiment expressed in a text, while stance classification computes the polarity of that sentiment toward a target. Sentiment analysis alone does not address the issues of subjectivity and polarity calculation of opinions. Stance classification is also used for emotion detection, perspective identification and subjective evaluation of an author's text, sarcasm detection, argument mining, and biased language detection, in order to predict the inclination of the perspective in a text. Stance classification has diverse application areas. It is carried out on online text on various topics including politics, social arguments, product reviews, and elections. It can help forecast the latest trends in the market. It is also helpful for recommender systems: by analyzing stance, recommender systems can give more personalized recommendations to customers. Stance detection has been applied in public health [12], topic-based stance classification for Twitter [13], social media political debates [14], predicting electoral issues [15], and analyzing public opinion regarding a certain social issue [16].

The relevant literature reports two approaches commonly used for stance classification [17]. The first approach treats stance detection as a typical classification problem and exploits only textual information, such as sentiment lexicons and syntactic patterns, to gather stance information. The second approach claims that textual information alone cannot provide enough information for stance classification; its proponents therefore propose models that exploit the relationships between posts or users. It has also been concluded that earlier research studies utilize complex linguistic features which require additional resources [18]. Hence, there is a need to explore diverse feature sets that do not require additional resources and to analyze their significance for stance classification. In the field of sentiment analysis, accuracy is an important parameter.

To address the above-mentioned accuracy problem, this research study investigates different features and techniques and optimizes the accuracy of the results for stance classification. In this regard, a framework for stance classification is proposed which follows five steps to analyze content for the classification of stance in tweets: 1) data pre-processing, 2) feature extraction, 3) feature selection, 4) application of algorithms, and 5) results evaluation. This research contributes as follows:

The SemEval2016 and SemEval2017 datasets are selected to perform stance classification on tweets.

This research study proposes feature sets based on sentiment, part-of-speech, content, and tweet-specific characteristics, as well as deep features including GloVe and Word2vec. Baseline features such as BoW, N-gram, and TF-IDF are also extracted to compare against the performance of the proposed feature sets.

Feature selection techniques are applied to assess the usefulness of the proposed feature sets.

The role of machine learning techniques is investigated, including 1) conventional machine learning techniques: Naïve Bayes (NB), Decision Tree (DT), and Support Vector Machine (SVM); 2) ensemble learning techniques such as AdaBoost and Random Forest; and 3) deep learning techniques such as Recurrent Neural Network (RNN), Deep Belief Network (DBN), and Convolutional Neural Network (CNN).

The computed results for stance classification are evaluated using four standard performance evaluation measures: accuracy, precision, recall, and F-measure.

The computed results for stance classification outperform the results of baseline approaches.

2 Research methodology

This section briefly describes the proposed framework, data pre-processing, and feature engineering for stance classification.

2.1 The proposed framework

The proposed framework defines the steps followed for stance classification. First, data pre-processing techniques are used to clean the SemEval2016 and SemEval2017 datasets. Then the proposed feature sets (sentiment, tweet-specific, part-of-speech, and content-based features) are computed from the pre-processed datasets using Python libraries.

Baseline features such as TF-IDF, N-gram, and Bag of Words are also extracted from the selected datasets for comparison with the proposed feature sets. Deep features such as GloVe and Word2Vec have not previously been explored for stance classification. To explore the effectiveness of these well-known deep features, GloVe and Word2Vec are also extracted from the selected datasets.

After the features are computed, the feature ranking algorithms Information Gain (IG), Gain Ratio (GR), and Relief-F are applied to find the useful features and remove redundant or irrelevant ones. These features are then fed into machine learning classifiers of three types: conventional machine learning techniques (SVM, NB, DT), ensemble learning methods (RF, AdaBoost), and deep learning algorithms (DBN, RNN, CNN-LSTM). In the last step, performance evaluation measures are applied to evaluate the ability of the applied algorithms to classify the stance of tweets. The proposed framework, along with a complete definition of the steps, datasets used, and algorithms applied, is shown in Fig. 1.

Fig. 1

The proposed framework for stance classification.

2.2 Details of feature sets used

A feature represents a quality or aspect of an attribute. To complete the feature-based research task, features are extracted in various categories. These features are then selected according to their ranking using well-known feature selection algorithms. Features are ranked to find the correlation between them and to identify the more effective features in the proposed feature sets.

Diverse features are computed from the selected datasets, SemEval2016 and SemEval2017, to increase the effectiveness of the proposed model. These features are based on sentiment, tweet-specific, content, and part-of-speech characteristics, as shown in Table 2. Since feature selection plays an important role in removing irrelevant and redundant attributes, a feature selection process is carried out as described in section 2.4. The following sub-sections discuss the feature sets, including the proposed features, the baseline features used for comparison, and the deep features used for stance classification for the first time. They also present the selection and ranking techniques used to find the relevant features.

Table 2
Proposed feature sets

S#	Category	Description	Symbol
1	Sentiment	Sentiment score of the tweet	$S_{Mnt}^{T}$
2		Sentiment –Number of Positive Words	$S_{PW}^{T}$
3		Sentiment –Number of Negative Words	$S_{NW}^{T}$
4		Count of positive emoticons	$S_{PE}^{T}$
5		Count of negative emoticons	$S_{NE}^{T}$
6	POS	Number of nouns in a Tweet	$P_{N}^{T}$
7		Number of pronouns in a Tweet	$P_{P}^{T}$
8		Verb words frequency in a Tweet	$P_{V}^{T}$
9		Adjectives words frequency in a Tweet	$P_{A}^{T}$
10	Content	Number of special symbols	$C_{SS}^{T}$
11		Number of WH words in a tweet	$C_{WH}^{T}$
12		Number of question mark in a tweet	$C_{QM}^{T}$
13		Number of Exclamation Mark	$C_{EM}^{T}$
14		Number of Capital words	$C_{nW}^{T}$
15		Number of quoted words	$C_{QW}^{T}$
16	Tweet specific	Number of retweets	$T_{nT}^{T}$
17		Number of mentions	$T_{M}^{T}$
18		Number of URLs	$T_{URL}^{T}$
19		Hashtag length	$T_{HL}$
20		Is tweet or retweet?	$C_{RT}$
21		Number of hashtags	$T_{HT}^{T}$
22		Number of capitalized hashtag(s)	$T_{CH}^{T}$

2.3 Proposed feature sets

Stance classification and sentiment quantification are directly linked with the overall field of sentiment analysis; therefore, sentiment-related features are considered. For sentiment score computation, the Python-based Natural Language Toolkit (NLTK) is used; it contains the standard VADER lexicon [19, 20], which is widely used in existing studies [19, 22]. The same lexicon is used to count the positive and negative words. Since many social media platforms allow emoticons, and emoticons can be used to predict sentiments or emotions, this research also considers emoticons as features.

Linguistic features play an important role in understanding the overall nature of the content; therefore, part-of-speech (POS) tagging is used. Given standard English content, the tagger labels each word as a verb, noun, adjective, etc. Nouns help to identify the entity under discussion, so the noun count is taken as a feature. Verbs carry information about the action or behavior of the entity, so verb frequency is also used. The adjective count is taken as an important feature because adjectives convey the positive or negative sense of an entity's characteristics. The Python NLTK tagger is used for part-of-speech tagging.

Content features exploit the text within a given social media item such as a post or a tweet. Since the linguistic characteristics of the content are covered by the part-of-speech features and the sentiment aspects by the sentiment feature set, the content-based feature set focuses on the diverse nature of the content within the tweet. For instance, the use of WH words together with question marks is considered important because both features indicate whether a question has been raised in the content. It is thus assumed that questions are asked in conversations that express sentiment or stance. Similarly, exclamation marks and special characters are considered as features, since an exclamation mark suggests that someone is seeking attention. It is proposed that such markers are more likely to appear in sentiment-bearing content than in objective content. Retweets are also considered, as subjective political or social content is likely to be retweeted more than factual content containing facts and figures. Quoted words help to detect whether a tweet discusses a topic, as quotations are used to cite existing tweet content. Capital letters are used for shouting or emphasis and are therefore also considered as a feature.

Social media platforms provide different functionality; therefore, the nature of the features may vary across target platforms. Since this research considers Twitter, Twitter-specific features are used. We assume that a tweet that is retweeted more is closely related to some sentiment-bearing concept. Subjective content is more likely to give rise to threaded conversations in which replies are shared; therefore, the mention feature is considered, which is used to bring someone into the discussion or to address specific users directly within the content. Moreover, the presence of a URL is taken as a feature because users are likely to include URLs to emphasize their point of view by sharing supporting material, such as a YouTube video or a Facebook post. As hashtags depict the topic of the content, the hashtag count is included on the assumption that opinionated content is likely to have more hashtags. The use of capitalized hashtags is also considered as a feature. The resulting feature set proposed for stance classification and sentiment quantification is shown in Table 2.

In addition to the proposed features, conventional features (N-Gram and TF-IDF) and deep features (Word2Vec and GloVe) are also considered.
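As an illustration of how several of the content and tweet-specific counts in Table 2 could be computed, the following is a minimal sketch using only Python's standard library. The function name `extract_surface_features`, the WH-word list, and the regular expressions are assumptions made for this example; the study itself does not specify its extraction rules.

```python
import re

# Hypothetical WH-word list; the paper does not enumerate the words it uses.
WH_WORDS = {"what", "when", "where", "which", "who", "whom", "whose", "why", "how"}


def extract_surface_features(tweet: str) -> dict:
    """Count simple content and tweet-specific markers in one tweet."""
    tokens = tweet.split()
    # Plain words: drop hashtags, mentions and URLs, then trim punctuation.
    plain = [t.strip(".,;:!?\"'()") for t in tokens
             if not t.startswith(("#", "@", "http"))]
    plain = [t for t in plain if t]
    hashtags = re.findall(r"#\w+", tweet)
    quoted_spans = re.findall(r'"([^"]*)"', tweet)
    return {
        "wh_words": sum(1 for t in plain if t.lower() in WH_WORDS),
        "question_marks": tweet.count("?"),
        "exclamation_marks": tweet.count("!"),
        # Fully upper-case words of two or more letters (assumed definition).
        "capital_words": sum(1 for t in plain
                             if len(t) > 1 and t.isalpha() and t.isupper()),
        "quoted_words": sum(len(span.split()) for span in quoted_spans),
        "mentions": len(re.findall(r"@\w+", tweet)),
        "urls": len(re.findall(r"https?://\S+", tweet)),
        "hashtags": len(hashtags),
        "capitalized_hashtags": sum(1 for h in hashtags if h[1:].isupper()),
    }


example = ("Climate change?! Explain the definition of climate change.. "
           "#fraud #CCOT #liberty #CruzCrew #SemST")
print(extract_surface_features(example))
```

Features such as the retweet count depend on tweet metadata rather than the text, so they are left out of this sketch.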

Evaluating supervised learning models requires care. Following the standard procedure, the data is split into two sets: training and testing. The training set is used to learn the model, whereas the testing set is used to evaluate it. Performance evaluation metrics such as accuracy, precision, recall, and F-measure are then applied. This is called the holdout method; it is not reliable because the accuracy obtained for one test set can be very different from that of another test set. K-fold cross validation provides a solution to this problem. In this method, a given dataset is split into k folds, and each fold is used as the testing set exactly once. Consider 10-fold cross validation, where the data is split into 10 folds. In the first iteration, the first fold is used to test the model and the rest are used to train it. In the second iteration, the second fold is used as the testing set while the rest serve as the training set. This process is repeated until each fold has been used as the testing set. For our empirical analysis, 10-fold cross validation is used.
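The fold rotation described above can be sketched in a few lines of plain Python. The helper `k_fold_indices` is a hypothetical name introduced for illustration; it splits the sample indices into contiguous folds without shuffling, which is a simplifying assumption (in practice the data is usually shuffled or stratified first).

```python
def k_fold_indices(n_samples: int, k: int):
    """Yield (train_indices, test_indices); each fold is the test set once."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and the number of samples")
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


# 10-fold cross validation over 25 toy samples.
for fold, (train, test) in enumerate(k_fold_indices(25, 10), start=1):
    print(fold, len(train), len(test))
```

The per-fold metric values would then be averaged to obtain the reported score.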

2.3.1 Baseline features

Besides the proposed features, this research also considers baseline features for comparison with the proposed feature sets discussed in the following sections.

A. N-Gram

An N-gram is a contiguous sequence of n words taken from a text. Three categories are commonly used: 1) uni-gram, 2) bi-gram, and 3) trigram. Examples of a uni-gram, bi-gram, and trigram are “standup”, “standup slowly”, and “she stood up”, respectively.
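A minimal sketch of N-gram extraction follows, assuming whitespace tokenization; the function name `ngrams` is introduced here for illustration only.

```python
def ngrams(tokens, n):
    """Return all contiguous n-token sequences as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


tokens = "she stood up slowly".split()
unigrams = ngrams(tokens, 1)
bigrams = ngrams(tokens, 2)
trigrams = ngrams(tokens, 3)
print(unigrams, bigrams, trigrams, sep="\n")
```

A text shorter than n tokens simply yields an empty list.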

B. Bag of Words

The bag-of-words (BoW) model is a simplifying representation used in natural language processing and information retrieval (IR). In this model, a text (such as a sentence or a document) is represented as the bag (multiset) of its words, disregarding grammar and even word order but keeping multiplicity. The bag-of-words model is commonly used in methods of document classification where the (frequency of) occurrence of each word is used as a feature for training a classifier.
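The representation can be illustrated with a short sketch. Lower-casing and whitespace tokenization are assumptions of this example, and `bag_of_words` is a hypothetical helper name.

```python
from collections import Counter


def bag_of_words(docs):
    """Return (vocabulary, count vectors), ignoring word order."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({token for doc in tokenized for token in doc})
    vectors = []
    for doc in tokenized:
        counts = Counter(doc)  # multiplicity of each word is kept
        vectors.append([counts[token] for token in vocab])
    return vocab, vectors


vocab, vectors = bag_of_words([
    "climate change is real",
    "change the climate policy change",
])
print(vocab)
print(vectors)
```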

C. TF-IDF

Term Frequency-Inverse Document Frequency (TF-IDF) is an enhanced version of Bag of Words (BoW). Term frequency (TF) measures how often a word occurs in a text, while document frequency is the ratio of the number of documents containing the word to the total number of documents. TF-IDF builds a list of all the words and then computes a weight for each word in every document.

If the word list contains 300 words, each document is represented by a one-row vector with 300 columns, where each column corresponds to a word and each entry holds the calculated frequency of that term. The inverse document frequency is calculated using Equation (1). $IDF = \log \frac{D}{N}$ (1)

Where D is the total number of documents and N is the number of documents containing a given word. Finally, the TF-IDF value is obtained by multiplying the TF and IDF values. $TF\text{-}IDF = TF \times IDF$ (2)
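The weighting can be sketched as follows, assuming the common form IDF = log(D / N), with D the total number of documents and N the number of documents containing the word, and assuming TF is the relative frequency of the word within a document. Both choices are assumptions of this example; several TF-IDF variants exist and the study does not state which one it uses.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return (vocabulary, TF-IDF rows) for a list of raw text documents."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each word.
    doc_freq = Counter(token for doc in tokenized for token in set(doc))
    vocab = sorted(doc_freq)
    rows = []
    for doc in tokenized:
        counts = Counter(doc)
        length = len(doc) or 1  # guard against empty documents
        rows.append([
            (counts[token] / length) * math.log(n_docs / doc_freq[token])
            for token in vocab
        ])
    return vocab, rows


tfidf_vocab, tfidf_rows = tf_idf(["climate change", "climate policy"])
print(tfidf_vocab)
print(tfidf_rows)
```

A word that occurs in every document receives a weight of zero, which is how TF-IDF discounts uninformative terms.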

2.3.2 Deep features

In addition to the proposed and baseline features, deep features are also considered and discussed in the following sections.

A. Word2vec

Word2vec is a word embedding technique that represents text-based documents as vectors. Word2vec captures the semantic and syntactic similarity between words. It converts input text into high-dimensional vectors.

B. GloVe

GloVe is another approach used for word representation. GloVe is based on unsupervised learning, in which similar words cluster together and dissimilar words lie far apart. It maps words into a vector space to form clusters of similar words. In addition to locally generated statistics, GloVe uses the co-occurrence of words across the corpus to obtain the word vectors, so that both local and global statistics contribute to the vectors.
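To make the idea of similar words lying close together in vector space concrete, the sketch below uses a tiny hand-made embedding table. The three-dimensional vectors are invented for illustration and are not real GloVe or Word2vec values. Averaging word vectors to obtain a tweet-level vector is one common convention and is an assumption here, not a description of the study's method.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def tweet_vector(tokens, embeddings):
    """Average the vectors of known tokens; None if no token is known."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


# Hypothetical toy embeddings (not taken from any trained model).
toy_embeddings = {
    "climate": [0.9, 0.1, 0.0],
    "weather": [0.8, 0.2, 0.1],
    "fraud": [0.0, 0.1, 0.9],
}

print(cosine(toy_embeddings["climate"], toy_embeddings["weather"]))
print(cosine(toy_embeddings["climate"], toy_embeddings["fraud"]))
print(tweet_vector(["climate", "fraud", "unknown"], toy_embeddings))
```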

2.4 Feature selection

Feature selection plays an important role in identifying highly correlated and redundant attributes in the dataset. To find redundant and irrelevant attributes in the proposed feature sets, well-known feature ranking techniques are applied. An attribute is treated as irrelevant if it has a negative or the lowest information gain value, and as dependent and redundant if its correlation with another attribute is 0.9 or above. Among the proposed features, no attribute has a negative information gain value and no attribute is highly correlated with another. Therefore, none of the proposed features is irrelevant or redundant. Further, feature ranking algorithms are applied to rank the attributes according to their scores, as shown in Table 3. The details of the ranked attributes are given in section 2.4.4.
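The redundancy check can be sketched as a pairwise correlation filter. Pearson correlation, the use of its absolute value, and a threshold of 0.9 are assumptions of this example; the function names and the toy feature columns are hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient; 0.0 if either column is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def redundant_pairs(columns, threshold=0.9):
    """Return feature-name pairs whose |correlation| reaches the threshold."""
    return [(a, b) for a, b in combinations(columns, 2)
            if abs(pearson(columns[a], columns[b])) >= threshold]


# Toy feature columns, invented for illustration.
toy_columns = {
    "hashtags": [1, 2, 3, 4],
    "capitalized_hashtags": [2, 4, 6, 8],
    "urls": [1, 0, 1, 0],
}
print(redundant_pairs(toy_columns))
```

One feature from each flagged pair would then be a candidate for removal.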

2.4.1 Information gain (IG)

Information Gain (IG) is also known as “mutual information”. IG reduces the bias toward attributes that have multiple values; for this purpose, it selects an attribute on the basis of the number of its branches and their sizes. Machine learning methods widely use IG to measure how much information an attribute provides about the class, with the result expressed in bits. Information gain is used to select important attributes from the dataset. It is computed as the decrease in overall entropy obtained by including a feature. The entropy of E is given in Equation (3). $info (E) = - \sum_{i = 1}^{k} (p_{i} \log_{2} p_{i})$ (3)

Where k is the number of classes and $p_{i}$ is the probability that an instance in E belongs to class $C_{i}$, estimated as $\frac{| C_{i,E} |}{| E |}$. The base-2 logarithm is used because the information is encoded in bits.

An attribute A with values { a₁, a₂, …, a_v } splits E into v partitions { E₁, E₂, …, E_v }. Equation (4) is used to compute the entropy after this split. $info_{A} (E) = \sum_{j = 1}^{v} \frac{| E_{j} |}{| E |} \times info (E_{j})$ (4)

Where $| E_{j} | / | E |$ is the weight of the jth partition and $info (E_{j})$ is the entropy of $E_{j}$. The information gain obtained by splitting on A is: $IG (E) = info (E) - info_{A} (E)$ (5)

Attributes with the highest IG values are used to classify a document into the given classes.
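Equations (3) to (5) can be traced with a small sketch for a single discrete attribute. The function names are introduced for illustration, and continuous features are assumed to have been discretized beforehand.

```python
import math
from collections import Counter


def entropy(labels):
    """info(E): entropy of the class labels, in bits."""
    total = len(labels)
    return -sum((count / total) * math.log2(count / total)
                for count in Counter(labels).values())


def information_gain(values, labels):
    """IG = info(E) minus the size-weighted entropy of the partitions."""
    total = len(labels)
    partitions = {}
    for value, label in zip(values, labels):
        partitions.setdefault(value, []).append(label)
    conditional = sum(len(part) / total * entropy(part)
                      for part in partitions.values())
    return entropy(labels) - conditional


stance = ["FAVOR", "FAVOR", "AGAINST", "AGAINST"]
informative = ["a", "a", "b", "b"]    # splits the classes perfectly
uninformative = ["a", "b", "a", "b"]  # each partition is still mixed
print(information_gain(informative, stance))
print(information_gain(uninformative, stance))
```

The first attribute removes all uncertainty about the class, while the second removes none.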

2.4.2 Gain Ratio (GR)

Gain Ratio (GR) uses an iterative process to select a small feature set based on the GR score. GR is widely used for dimensionality reduction. GR accounts for how unevenly a feature splits the data. The features with the top GR scores are considered the most useful. The normalization term is the split information. The training documents are split into v partitions, corresponding to the v outcomes of attribute A.

$splitinfo_{A} (E) = - \sum_{j = 1}^{v} \frac{| E_{j} |}{| E |} \times \log_{2} \frac{| E_{j} |}{| E |}$ (6)

A high value of $splitinfo_{A}$ indicates that the partitions are of roughly uniform size, whereas a low value indicates that a few partitions hold most of the instances. GR is calculated as: $GR (A) = \frac{IG (E)}{splitinfo_{A} (E)}$ (7)
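The normalization can be sketched as follows, assuming the split information is the entropy of the partition sizes and that the gain ratio is defined as zero when the attribute takes a single value. The helper names are hypothetical.

```python
import math
from collections import Counter


def entropy(items):
    """Entropy in bits of the empirical distribution of items."""
    total = len(items)
    return -sum((count / total) * math.log2(count / total)
                for count in Counter(items).values())


def information_gain(values, labels):
    """Entropy of the labels minus the size-weighted partition entropy."""
    total = len(labels)
    partitions = {}
    for value, label in zip(values, labels):
        partitions.setdefault(value, []).append(label)
    conditional = sum(len(part) / total * entropy(part)
                      for part in partitions.values())
    return entropy(labels) - conditional


def gain_ratio(values, labels):
    """Information gain divided by the split information of the attribute."""
    split_info = entropy(values)  # entropy of the partition sizes
    if split_info == 0:
        return 0.0  # assumed convention for a single-valued attribute
    return information_gain(values, labels) / split_info


stance = ["FAVOR", "FAVOR", "AGAINST", "AGAINST"]
two_way = ["a", "a", "b", "b"]
four_way = ["a", "b", "c", "d"]  # same gain, but a more fragmented split
print(gain_ratio(two_way, stance))
print(gain_ratio(four_way, stance))
```

Both attributes have the same information gain here, but the four-way split is penalized for fragmenting the data into more partitions.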

2.4.3 Relief-F

Relief-F selects instances at random and finds their nearest neighbors; it is another approach suited to attributes that have multiple values. Relief-F produces a final feature weighting vector. Features with high importance separate an instance from its neighbors in other classes. The weight of each dimension is estimated from the following probabilities. $W_{f} = P (\text{different value of } f \mid \text{nearest instances from different classes}) - P (\text{different value of } f \mid \text{nearest instances from the same class})$ (8)
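The weighting idea can be illustrated with a simplified Relief-style sketch. This is not the full Relief-F algorithm: it uses a single nearest hit and a single nearest miss per instance, visits every instance instead of sampling at random, and assumes numeric features. The function name and toy data are hypothetical.

```python
def relief_weights(X, y):
    """Simplified Relief: reward features that differ on the nearest miss
    and penalize features that differ on the nearest hit."""
    n, d = len(X), len(X[0])
    # Per-feature value range, used to scale differences into [0, 1].
    spans = []
    for f in range(d):
        column = [row[f] for row in X]
        spans.append((max(column) - min(column)) or 1.0)

    def diff(f, a, b):
        return abs(a[f] - b[f]) / spans[f]

    def distance(a, b):
        return sum(diff(f, a, b) for f in range(d))

    weights = [0.0] * d
    for i in range(n):
        hits = [j for j in range(n) if j != i and y[j] == y[i]]
        misses = [j for j in range(n) if y[j] != y[i]]
        if not hits or not misses:
            continue
        hit = min(hits, key=lambda j: distance(X[i], X[j]))
        miss = min(misses, key=lambda j: distance(X[i], X[j]))
        for f in range(d):
            weights[f] += (diff(f, X[i], X[miss]) - diff(f, X[i], X[hit])) / n
    return weights


# Toy data: feature 0 separates the classes, feature 1 does not.
toy_X = [[0.0, 0.3], [0.1, 0.9], [0.9, 0.2], [1.0, 0.8]]
toy_y = ["FAVOR", "FAVOR", "AGAINST", "AGAINST"]
print(relief_weights(toy_X, toy_y))
```

In this scheme a weight can be negative, which marks a feature that varies more within a class than between classes.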

2.4.4 Useful features based on features ranking

The features in each feature set were analyzed to determine whether each feature is relevant and contributes to the prediction of the target class. For this purpose, feature selection was carried out using standard feature selection methods; features that were not relevant or had a negative influence on target class prediction were omitted, and only those features that play a significant role in target class prediction were retained.

In addition to feature selection, we applied feature selection algorithms to compute the ranking of the features within each feature set. This ranking helps to identify the top features within each category and to investigate their significance with respect to target class prediction. Based on three standard feature selection and ranking algorithms, we computed the ranking of the features shown in Table 3.

Table 3 shows that, among the sentiment-based features, the sentiment score is more important than the counts of positive or negative polar words. It is also interesting to note that negative words play a more significant role in classification than positive words; this observation is consistent with existing research studies [23]. The emoticons are ranked next, and again negative emoticons are ranked higher than positive ones. Part-of-speech tags are also important. Adverbs and adjectives are found to be important, as these are the top two features in the POS feature set; this is understandable because these two tags are used to refer to the merit or demerit of an action or entity. More specifically, adverbs modify verbs, adjectives, and other adverbs, whereas adjectives modify nouns and pronouns. Verbs are ranked higher than nouns, as verbs relate to actions, behavior, and functionalities, while nouns relate to entities and their characteristics. All these POS tags are therefore important for predicting the classes annotated for social media content.

In addition to the POS tags, content features are also important. The presence of WH words and question marks shows that subjective content contains questions raised in discussions and conversations about certain actions or entities. Quoted and repetitive content is ranked next, which shows that opinionated content contains double-quoted words referring to existing content shared by other social media users. This feature reflects the dialog or conversational nature of the content, which is common in sentiment-related classes [24]. The use of special characters is ranked next, followed by the exclamation mark, which is another special character. Among the tweet-specific features, mentions are ranked at the top, as they indicate dialog or discussion within the content. The retweet count is also ranked high. URLs occur more often in sentiment-bearing content, as users share links to other web content to emphasize their points of view. Hashtags are ranked low, as they relate to topics, which are important in both objective and opinionated content. The retweet status of the content is ranked lower, which is understandable because objective content containing facts and figures, without any sentiment or stance in either of our cases, can also be retweeted often.

Among the baseline features, TF-IDF is ranked highest, as it is directly linked to content; N-gram is ranked higher than bag-of-words, as the latter does not retain the sequence of words. Among the deep features, GloVe is ranked higher than Word2vec, as the former is easier to train over more data than the latter. In addition, GloVe retains the benefits that the Word2vec skip-gram model shows on word analogy tasks, which is useful for tasks such as sentiment analysis and stance classification.

Table 3
Feature ranking

Sentiment	Parts of speech	Content	Tweet specific	Baseline	Deep
Sentiment score of the tweet	Verb words frequency in a Tweet	Number of WH words in a tweet	Number of mentions	TF-IDF	GloVe
Sentiment –Number of Negative Words	Adverb words frequency in a Tweet	Number of question mark in a tweet	Number of retweets	N-Gram	Word2vec
Sentiment –Number of Positive Words	Number of nouns in a Tweet	Number of quoted words	Number of URLs	BoW
Count of negative emoticons	Number of pronouns in a Tweet	Number of repetitive words	Number of hashtags
Count of positive emoticons		Number of special symbols	Number of capitalized hashtag(s)
		Number of Exclamation Mark	Hashtag length
			Is tweet or retweet?

3 Experimental setup

This section describes the dataset, machine learning approaches and performance evaluation measures for stance classification.

3.1 Datasets

This section discusses the two standard datasets, SemEval2016 and SemEval2017, considered in this research work for stance classification and sentiment quantification. Each dataset is described in the following sub-sections.

3.1.1 SemEval2016

SemEval2016 is a standard dataset used for classification and quantification purposes. It contains tweets divided into five targets: Legalization of Abortion (933 tweets), Feminist Movement (949 tweets), Hillary Clinton (984 tweets), Atheism (733 tweets), and Climate Change (564 tweets). Each target contains many tweets, and each tweet carries a stance label from the set {FAVOR, AGAINST, NONE}. The details of the dataset are shown in Table 4. This dataset has been used in earlier studies [25].

Table 4
Statistics of SemEval2016 & SemEval2017

Dataset	Total tweets	Training tweets	Testing tweets
SemEval2016	68197	16346	51851
SemEval2017	62617	50,333	12,284

3.1.2 SemEval2017

SemEval2017 is a standard multilingual dataset that contains tweets in English and Arabic. Most tweets are in English; only 19% of the tweets are in Arabic. The dataset contains 50,333 training and 12,284 testing tweets in English, and 3,355 training and 6,100 testing tweets in Arabic. The details of the dataset are shown in Table 4. The dataset has been used in earlier studies [26, 27].

3.2 Conventional machine learning techniques

3.2.1 Naïve Bayes (NB)

Naïve Bayes (NB) is a traditional machine learning classifier grounded on conditional probability and Bayes' rule, which is given in Equation (9). Here w is an event (the class) and Y is the evidence. P(w) is the probability of the event before the evidence is seen, P(Y|w) is the probability of the evidence given the event, and P(w|Y) is the probability of the event after the evidence is seen.

$P(w \mid Y) = \frac{P(w)\, P(Y \mid w)}{P(Y)}$ (9)

Using Equation (10), we can find the class w with the maximum probability.

$\hat{w} = \underset{w}{\arg\max}\; P(w) \prod_{m=1}^{n} P(y_{m} \mid w)$ (10)

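Where P(w) is the probability of the class and P(y_m|w) is the conditional probability of the m-th piece of evidence given the class. When the evidence is expressed through features f_m that occur n_m(Y) times in Y, the NB posterior takes the form of Equation (11), up to the normalizing constant P(Y).

$P_{NB}(w \mid Y) \propto P(w) \prod_{m=1}^{n} P(f_{m} \mid w)^{n_{m}(Y)}$ (11)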

NB generally gives better performance on categorical data than on numerical data.
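The decision rule of Equation (10) can be illustrated with a small multinomial NB classifier. This is a minimal sketch rather than the implementation used in the experiments: the function names, the add-one (Laplace) smoothing, the use of log probabilities to avoid numerical underflow, and the four toy documents are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict

def train_nb(docs, labels):
    """Estimate log P(w) and per-class token counts from tokenised documents."""
    class_counts = Counter(labels)
    token_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in zip(docs, labels):
        token_counts[label].update(tokens)
        vocab.update(tokens)
    log_prior = {c: math.log(n / len(labels)) for c, n in class_counts.items()}
    return log_prior, token_counts, vocab

def predict_nb(tokens, log_prior, token_counts, vocab):
    """Return argmax_w [log P(w) + sum_m log P(y_m | w)] with add-one smoothing."""
    scores = {}
    for c, prior in log_prior.items():
        total = sum(token_counts[c].values()) + len(vocab)
        # Tokens never seen in training are ignored (an assumption of this sketch).
        scores[c] = prior + sum(
            math.log((token_counts[c][t] + 1) / total) for t in tokens if t in vocab
        )
    return max(scores, key=scores.get)

# Toy training data (hypothetical, not taken from SemEval).
docs = [["support", "climate", "action"], ["climate", "hoax"],
        ["great", "action"], ["hoax", "lies"]]
labels = ["FAVOR", "AGAINST", "FAVOR", "AGAINST"]
model = train_nb(docs, labels)
```

Calling `predict_nb(["climate", "action"], *model)` selects the class whose prior and token likelihoods give the largest product, which is exactly the arg max of Equation (10) computed in log space.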

3.2.2 Support Vector Machine

Support Vector Machine (SVM) is a margin-based linear classifier. It is widely applied to text classification and text processing and is well suited to high-dimensional data. SVM separates positive and negative instances with a large margin by finding an optimal boundary between the classes. The SVM optimization problem is given in Equations (12) and (13).

$\max_{e_{1}, \dots, e_{x}} f(e_{1}, e_{2}, \dots, e_{x}) = \sum_{k=1}^{x} e_{k} - \frac{1}{2} \sum_{k=1}^{x} \sum_{p=1}^{x} q_{k} q_{p} e_{k} e_{p} (m_{k} \cdot m_{p})$ (12)

$\sum_{k=1}^{x} q_{k} e_{k} = 0, \quad 0 \leqslant e_{k} \leqslant C$ (13)

Where x is the number of training instances, e_k is the coefficient that weights training input k, q_k is the training output of instance k, m_k is the k-th training input, the dot product m_k · m_p measures the similarity between inputs k and p, and C is the constant that bounds each coefficient. SVM performance is not favorable for noisy data, and on large datasets SVM needs more execution time for the training process.
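To make the optimization problem of Equations (12) and (13) concrete, the sketch below evaluates the standard soft-margin dual objective and checks its constraints for a given set of coefficients. It does not solve the problem; the function names `dual_objective` and `is_feasible` and the two-point toy dataset are assumptions chosen so that the optimum is easy to verify by hand.

```python
def dual_objective(e, q, m):
    """Sum_k e_k - 1/2 * Sum_k Sum_p q_k q_p e_k e_p (m_k . m_p)."""
    def dot(a, b):
        return sum(ai * bi for ai, bi in zip(a, b))

    n = len(e)
    pairwise = sum(
        q[k] * q[p] * e[k] * e[p] * dot(m[k], m[p])
        for k in range(n)
        for p in range(n)
    )
    return sum(e) - 0.5 * pairwise

def is_feasible(e, q, C, tol=1e-9):
    """Check the constraints Sum_k q_k e_k = 0 and 0 <= e_k <= C."""
    balanced = abs(sum(qk * ek for qk, ek in zip(q, e))) <= tol
    boxed = all(-tol <= ek <= C + tol for ek in e)
    return balanced and boxed

# Toy problem (hypothetical): one positive and one negative point on the x-axis.
m = [(1.0, 0.0), (-1.0, 0.0)]   # training inputs
q = [1, -1]                     # training outputs
e = [0.5, 0.5]                  # candidate coefficients
objective = dual_objective(e, q, m)
```

For this toy problem the coefficients `e = [0.5, 0.5]` satisfy both constraints with `C = 1.0` and give an objective value of 0.5, which corresponds to a separating boundary at x = 0 with unit margin on each side.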

3.2.3 Decision Tree

Decision Tree (DT) is an inductive learning algorithm grounded on the principle of data-driven decisions. In this method, each internal node poses a question on an attribute. The DT algorithm builds a decision tree by applying the entropy and Information Gain (IG) principles. The entropy technique helps to reduce the execution time of pre-processing. DT handles missing values without affecting the decision tree, although this requires more execution time. The entropy is calculated using Equation (14).

$En(X) = -\sum_{h=1}^{m} P_{h} \log_{2} P_{h}$ (14)

Where P_h is the probability that an instance belongs to class h, and m is the number of classes. The base-2 logarithm is used so that the information is measured in bits. En(X), also known as the entropy, is the expected information required to find the class label. Information Gain is the difference between the original information requirement and the expected information required to classify the rows in the dataset after splitting on attribute r, as given in Equation (15).

$InformationGain(X, r) = En(X) - En(X \mid r)$ (15)
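The entropy and Information Gain computations can be sketched in a few lines. This is an illustrative example only: the function names and the four-tweet toy sample are assumptions, and the entropy is written with the conventional leading minus sign so that it is non-negative.

```python
import math
from collections import Counter

def entropy(labels):
    """En(X) = -sum_h P_h * log2(P_h) over the class proportions in labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(labels, attribute_values):
    """En(X) - En(X | r): entropy reduction after splitting on attribute r."""
    total = len(labels)
    groups = {}
    for value, label in zip(attribute_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional

# Toy sample (hypothetical): four tweets with a binary attribute each.
stance = ["FAVOR", "FAVOR", "AGAINST", "AGAINST"]
has_question_mark = [False, False, True, True]   # separates the classes perfectly
has_url = [True, False, True, False]             # carries no class information
```

Here `information_gain(stance, has_question_mark)` equals the full entropy of one bit, while `information_gain(stance, has_url)` is zero, so a tree grown with this criterion would split on the first attribute.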

3.3 Ensemble based-techniques

3.3.1 AdaBoost

AdaBoost combines weak classifiers into a strong one by focusing on misclassified instances. The predictions of all weak classifiers are combined to produce the final value. AdaBoost assigns a weight to each element in the training sample. The initial weight is set as shown in Equation (16):

$Weight(E_{i}) = \frac{1}{X}$ (16)

Where X is the number of training elements and E_i is the i-th training element. The misclassification rate is computed as shown in Equation (17), where Corr is the number of correctly classified elements.

$Err = \frac{X - Corr}{X}$ (17)

AdaBoost enhances the classification accuracy of classifiers, but it is not a suitable technique for imbalanced data and needs more execution time.
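The two quantities introduced in this sub-section, the uniform initial weights and the misclassification rate, can be sketched as follows. The function names and the toy predictions are illustrative assumptions. The `weighted_error` helper is the usual AdaBoost refinement that sums the weights of the misclassified elements; it is not stated in the text above and is included only to show how the weights enter the error.

```python
def initial_weights(num_elements):
    """Every training element starts with the same weight 1 / X."""
    return [1.0 / num_elements] * num_elements

def misclassification_rate(predictions, targets):
    """Fraction of training elements that the classifier labels incorrectly."""
    correct = sum(p == t for p, t in zip(predictions, targets))
    return (len(targets) - correct) / len(targets)

def weighted_error(weights, predictions, targets):
    """Sum of the weights of misclassified elements (assumed standard AdaBoost form)."""
    return sum(w for w, p, t in zip(weights, predictions, targets) if p != t)

# Toy example (hypothetical): one of four stance predictions is wrong.
targets = ["FAVOR", "AGAINST", "NONE", "FAVOR"]
predictions = ["FAVOR", "AGAINST", "FAVOR", "FAVOR"]
weights = initial_weights(len(targets))
```

With uniform weights the weighted error coincides with the plain misclassification rate (0.25 in this toy example); the two diverge once later boosting rounds increase the weights of the misclassified elements.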

3.3.2 Random forest

Random Forest (RF) is an ensemble learner built from regression trees. RF builds deep trees to give strong predictions for irregular patterns and decreases the variance of the predicted values by taking their average. Given data D = d₁, d₂, …, d_n with responses R = r₁, r₂, …, r_n, RF applies bagging N times: for x = 1, …, N, a regression tree f_x is trained on a bootstrap sample of D and R. The prediction for an instance d is the average over all trees, as given in Equation (18).

$\frac{1}{N} \sum_{x=1}^{N} f_{x}(d)$ (18)
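The bagging-and-averaging idea behind Equation (18) can be sketched with very small trees. The example below is a simplification under stated assumptions: each tree is a depth-one regression stump on a single numeric feature rather than a deep tree, no random feature selection is performed, and the names `fit_stump` and `bagged_predict`, the seed, and the toy data are all hypothetical.

```python
import random

def fit_stump(xs, ys):
    """Depth-1 regression tree on one feature: pick the threshold with least squared error."""
    best = None
    for threshold in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        if not left or not right:
            continue
        l_mean, r_mean = sum(left) / len(left), sum(right) / len(right)
        sse = sum((y - l_mean) ** 2 for y in left) + sum((y - r_mean) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, l_mean, r_mean)
    if best is None:  # all sampled inputs identical: fall back to the mean response
        mean = sum(ys) / len(ys)
        return lambda x: mean
    _, threshold, l_mean, r_mean = best
    return lambda x: l_mean if x <= threshold else r_mean

def bagged_predict(xs, ys, query, n_trees=25, seed=0):
    """Train n_trees stumps on bootstrap samples and average their predictions."""
    rng = random.Random(seed)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(xs)) for _ in xs]  # sample with replacement
        trees.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return sum(tree(query) for tree in trees) / n_trees

# Toy data (hypothetical): low inputs have response 0, high inputs response 1.
xs = [1, 2, 3, 10, 11, 12]
ys = [0, 0, 0, 1, 1, 1]
low = bagged_predict(xs, ys, 0.5)
high = bagged_predict(xs, ys, 13)
```

Averaging over many bootstrap-trained trees smooths out the quirks of any single sample, which is the variance reduction described above.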

3.4 Deep learning techniques

The third type of techniques used in this research for stance classification and sentiment quantification are deep learning techniques, which are discussed in the following sub-sections.

3.4.1 Deep Belief Networks (DBN)

Deep Belief Network (DBN) is a type of deep neural network grounded on statistics and probability concepts. A DBN processes several layers consisting of hidden blocks. The layers are interconnected with each other, but the hidden blocks are separated from each other. DBN has recently become popular for sentiment analysis. The three-layer architecture is shown in Fig. 2 [28]. X is the input and h₁ is the hidden layer that trains the DBN classifier using the divergence method; hd denotes the hidden layers and w the weights of the layers. Of the three layers, the lower layers are "sigmoid layers" and the upper layers are "Restricted Boltzmann Machines" (RBM), which are used to freeze the weights. The DBN mathematical model is given in Equation (19) [28].

$Pb(hd_{m}^{k} \mid hd^{(k+1)}) = \delta\left(x_{m}^{k} + \sum_{n} w_{nm}^{k}\, hd_{n}^{k+1}\right)$ (19)

Fig. 2

Architecture of DBN.
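The layer-wise conditional probability of Equation (19) has the shape of a sigmoid unit: an offset plus a weighted sum of the units in the layer above, passed through a squashing function. The sketch below illustrates that computation under two assumptions that are not confirmed by the text: the squashing function is read as the logistic sigmoid, and the additive term is read as a per-unit bias. The function names and numbers are hypothetical.

```python
import math

def sigmoid(z):
    """Logistic function mapping any real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def unit_activation_probability(bias, weights, upper_layer):
    """Probability that one unit in layer k is active given the units of layer k + 1."""
    return sigmoid(bias + sum(w * h for w, h in zip(weights, upper_layer)))

# Toy values (hypothetical): two units in the upper layer, both active.
upper = [1.0, 1.0]
balanced = unit_activation_probability(0.0, [1.0, -1.0], upper)
excited = unit_activation_probability(0.0, [3.0, 3.0], upper)
```

When the weighted inputs cancel, the unit is active with probability 0.5; strongly positive inputs push the probability towards 1.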

3.4.2 CNN-LSTM

A Convolutional Neural Network (CNN), a deep learning approach, is unable to capture long-distance dependencies. To address this issue, Long Short-Term Memory (LSTM) is applied for sequential text modeling in sentence-level sentiment analysis. The CNN-LSTM architecture extracts features from the given input using the CNN and passes them to the LSTM to obtain sequence predictions. The CNN-LSTM architecture is shown in Fig. 3 [29].

Fig. 3

Architecture of CNN-LSTM.

3.4.3 Recurrent Neural Network (RNN)

A Recurrent Neural Network (RNN) is a type of Artificial Neural Network (ANN) used for speech and handwriting recognition. An RNN retains memory learned from former inputs in both the training and output phases. It depends on the weights of the input along with the knowledge held in the hidden vector representation. The RNN receives the input sequence {x⁰, …, x^t}; after reading the input up to x^t, it updates the hidden vector to hd^t, as shown in Fig. 4 [30]. The hidden vector is calculated using Equation (20), where f is the activation function and w and v are the weights applied to the previous hidden vector and the current input, respectively.

$hd^{(t)} = f(w\, hd^{(t-1)} + v\, x^{(t)})$ (20)

Fig. 4

Processing of RNN.
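The recurrence of Equation (20) can be written out directly. The following sketch assumes tanh as the activation function f, a zero initial hidden vector, and small hand-picked weight matrices; these choices and the function names are illustrative assumptions, not the configuration used in the experiments.

```python
import math

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]

def rnn_step(w, v, hd_prev, x_t):
    """One update hd_t = tanh(w * hd_{t-1} + v * x_t)."""
    pre = [a + b for a, b in zip(matvec(w, hd_prev), matvec(v, x_t))]
    return [math.tanh(p) for p in pre]

def run_rnn(w, v, inputs, hidden_size):
    """Feed the whole input sequence through the recurrence, starting from zeros."""
    hd = [0.0] * hidden_size
    for x_t in inputs:
        hd = rnn_step(w, v, hd, x_t)
    return hd

# Toy weights (hypothetical): two hidden units, one input dimension.
v = [[1.0], [-1.0]]
w_zero = [[0.0, 0.0], [0.0, 0.0]]      # no memory of earlier inputs
w_memory = [[0.5, 0.0], [0.0, 0.5]]    # each unit keeps part of its previous value
sequence = [[0.3], [0.5]]
final_without_memory = run_rnn(w_zero, v, sequence, 2)
final_with_memory = run_rnn(w_memory, v, sequence, 2)
```

With `w_zero` the final hidden vector depends only on the last input, whereas with `w_memory` the earlier input also contributes, which is the memory effect described above.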

3.5 Performance evaluation measures

To evaluate the performance of the classifiers for stance classification, the following four evaluation measures are used.

3.5.1 Accuracy

Accuracy is a performance evaluation measure widely applied in the fields of information retrieval and data mining. Accuracy is calculated using Equation (21).

$Accuracy = \frac{TP + TN}{TP + TN + FP + FN}$ (21)

Here FP, FN, TN, and TP, stand for False Positive, False Negative, True Negative, and True Positive, respectively.

3.5.2 Precision

Precision is another performance evaluation measure. It is the ratio of correctly classified positive instances to all instances classified as positive.

$Precision = \frac{TP}{TP + FP}$ (22)

3.5.3 Recall

Recall, also known as sensitivity, is the ratio of correctly classified positive instances to all actual positive instances.

$Recall = \frac{TP}{TP + FN}$ (23)

3.5.4 F-measure

The F-measure, also known as the F-score, is the harmonic mean of precision and recall. It is calculated using Equation (24).

$F = \frac{2 \times Precision \times Recall}{Precision + Recall}$ (24)
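The four measures of Equations (21) to (24) can be computed from the confusion counts as sketched below. The function names and the toy counts are assumptions for illustration; the sketch treats a single positive class and assumes that none of the denominators is zero.

```python
def f_score(precision, recall):
    """Harmonic mean of precision and recall, Equation (24)."""
    return 2 * precision * recall / (precision + recall)

def classification_metrics(tp, tn, fp, fn):
    """Return (accuracy, precision, recall, F-score) from confusion counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return accuracy, precision, recall, f_score(precision, recall)

# Toy confusion counts (hypothetical).
accuracy, precision, recall, f = classification_metrics(tp=40, tn=30, fp=10, fn=20)

# Consistency check on reported values: precision 79.74 and recall 83.28.
reported_f = round(f_score(79.74, 83.28), 2)
```

Applied to the precision of 79.74 and recall of 83.28 reported for RNN with GloVe on SemEval2016 in Table 5, `f_score` reproduces the reported F-score of 81.47.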

3.6 Details of implementation and hyper parameter

Results are computed on a 10th-generation Core i7 machine with 16 GB RAM and a 1 TB HDD. For the conventional and ensemble-based machine learning techniques, default parameter settings are used in Jupyter Notebook with Python. For the deep learning algorithms, the Rectifier, maxout, and tanh activation functions are used with and without dropout. The dropout value is set to 0.5, and 50 hidden layers are used in these models.

4 Results and discussions

This section covers a detailed discussion of the computed results for stance classification and sentiment quantification. The discussion starts with single feature set-based classification, moves to multiple feature-based classification, and concludes with the optimal feature sets using conventional, ensemble-based, and deep learning-based techniques.

4.1 Single features set based classification

To present a detailed analysis, we performed various experiments on the selected datasets for stance classification. For this purpose, single feature sets and different combinations of the proposed features were selected from SemEval2016 and SemEval2017. Diverse features, including sentiment, part-of-speech, content, and tweet-specific features, were extracted from both datasets, and the machine learning approaches Naïve Bayes (NB), Support Vector Machine (SVM), and Decision Tree (DT), together with the ensemble learners Random Forest (RF) and AdaBoost, were applied to each single feature set. According to the computed results, part-of-speech (POS) features are the most effective for stance detection when applied with the ensemble learner AdaBoost. AdaBoost recorded the highest accuracy of 69.61% and 70.90% for SemEval2016 and SemEval2017, respectively, and outperformed the other approaches owing to its ensemble nature, as shown in Table 7.

4.2 Combination of features set based classification

Further experiments were carried out on combinations of the proposed feature groups. These combinations include sentiment+POS, sentiment+content, sentiment+tweet specific, POS+content, POS+tweet specific, content+tweet specific, sentiment+POS+content, sentiment+content+tweet specific, sentiment+POS+tweet specific, POS+content+tweet specific, and the group of all features. The all-features set (sentiment+content+POS+tweet specific) proved to be the most effective when applied with the SVM classifier, which is widely used for text classification owing to its effectiveness on high-dimensional data. The computed results show that SVM outperformed all other applied approaches and the baseline methods [31, 32], with an accuracy of 79.30% and an F-score of 75.11% for SemEval2016, as shown in Table 8.

The same experiments were performed on the SemEval2017 dataset. According to the results, the sentiment+content+POS+tweet specific feature group is again the most effective when applied with SVM. SVM also outperformed all other applied approaches as well as the baseline methods [33–35], with an accuracy of 80.10% and an F-score of 75.87% for SemEval2017, as shown in Table 9.
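The eleven feature-group combinations evaluated in this sub-section are exactly the subsets of the four groups that contain at least two groups. They can be enumerated as in the following sketch, in which the function name `feature_group_combinations` is an illustrative assumption.

```python
from itertools import combinations

FEATURE_GROUPS = ["sentiment", "POS", "content", "tweet specific"]

def feature_group_combinations(groups, min_size=2):
    """List every combination of feature groups with at least min_size members."""
    return [
        combo
        for size in range(min_size, len(groups) + 1)
        for combo in combinations(groups, size)
    ]

combos = feature_group_combinations(FEATURE_GROUPS)
labels = ["+".join(combo) for combo in combos]
```

The enumeration yields six pairs, four triples, and the single group of all features, which matches the eleven rows of Tables 8 and 9.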

4.3 Discussion of optimal features set

The proposed feature sets are fed to the conventional machine learning algorithms NB, DT, and SVM and to the ensemble learning algorithms RF and AdaBoost. They are fed first alone and then as groups of features to find the optimal results. After all the algorithms are applied to a single feature set, their average is calculated for that feature set. According to the computed results, part-of-speech (POS) proved to be a promising single feature set when applied with SVM for both SemEval2016 and SemEval2017, as shown in Fig. 5.

Fig. 5

Comparison of Optimal Results using Single Feature Set for Stance Classification based on all ML classifiers.

Further analysis is performed on various combinations of feature sets fed into the above-mentioned classifiers. The average results of all the combinations are calculated for each applied classifier. The computed results show that the (sentiment+POS+content+tweet specific) feature group is the most effective when applied with SVM, giving the highest accuracy and F-score values for both SemEval2016 and SemEval2017, as shown in Fig. 6.

Fig. 6

Comparison of Optimal Results using Feature Sets Combination for Stance Classification based on all ML classifiers.

4.4 Deep features based classification

In addition to the proposed feature sets (sentiment, content, tweet specific, and part of speech), some baseline and deep features are also included in this research study. For comparison with the proposed features, the baseline features BoW and N-gram are extracted from the selected datasets. Deep features have not yet been explored for the stance classification task. To explore their effectiveness, GloVe and Word2Vec are extracted from both SemEval2016 and SemEval2017 and compared with the N-gram and Bag of Words features. Deep learning algorithms are applied along with the deep features owing to their effectiveness and reliability on unstructured and unlabelled data. The deep features are fed to the deep learning classifiers Deep Belief Network (DBN), Convolutional Neural Network (CNN) based Long Short-Term Memory (LSTM), and Recurrent Neural Network (RNN) for stance classification. The GloVe feature is the most effective when applied with RNN, which outperformed the other deep learning techniques and the other deep features. According to the computed results, RNN achieved the highest accuracy of 86.02% with an F-score of 81.47% on SemEval2016 and an accuracy of 83.18% with an F-score of 78.78% on SemEval2017, as presented in Table 5.

Table 5
Deep features based stance classification results

Algorithm	Features	SemEval2016				SemEval2017			
		Acc	Pre	Rec	F	Acc	Pre	Rec	F
DBN	GloVe	78.47	72.74	75.97	74.32	76.42	70.84	73.99	72.38
	Word2vec	76.90	71.28	74.45	72.83	72.90	67.57	70.58	69.04
	N-Gram	64.23	51.60	54.25	52.89	63.20	50.78	53.38	52.05
	BoW	62.35	48.17	50.74	49.42	63.99	52.72	55.36	54.01
CNN-LSTM	GloVe	83.69	77.58	81.03	79.27	81.14	75.21	78.56	76.85
	Word2vec	82.02	76.03	79.41	77.68	78.51	72.78	76.02	74.36
	N-Gram	68.04	55.37	58.17	56.73	66.89	54.43	57.18	55.77
	BoW	67.93	54.58	57.38	55.94	68.16	56.86	59.67	58.23
RNN	GloVe	86.02	79.74	83.28	81.47	83.18	77.11	80.53	78.78
	Word2vec	83.30	77.22	80.65	78.90	81.52	75.57	78.92	77.21
	N-Gram	68.78	56.68	59.51	58.06	68.15	55.45	58.26	56.82
	BoW	69.69	59.58	62.45	60.98	69.57	59.47	62.34	60.87

The best results of the single and combined feature sets are shown in Table 6, where each classifier is listed with the feature set on which it performed best. According to the empirical results, the deep learning models applied to deep features extracted from both datasets outperformed the traditional machine learning classifiers, owing to their complex nature and their use of the current context. The proposed features are more effective when applied with traditional classifiers, whereas deep features give more promising results when applied with deep learning approaches. Overall, the RNN model applied to the deep feature GloVe outperformed all other classifiers, including the deep learning methods CNN-LSTM and DBN, and achieved the highest accuracy of 86.02% and 83.18% for SemEval2016 and SemEval2017, respectively. Among the traditional and ensemble classifiers, SVM outperformed all other techniques when applied to the group of all proposed features (sentiment, content, part-of-speech, and tweet specific). Thus, GloVe and the combined (sentiment, content, part-of-speech, and tweet specific) feature group remained dominant for the deep learning and conventional machine learning techniques, respectively.

Table 6

Comparison of optimal results for stance classification using feature sets combination

Algorithm	SemEval2016					SemEval2017
	Feature(s)	Acc	Pre	Rec	F	Feature(s)	Acc	Pre	Rec	F
NB	SPC	63.69	59.04	61.67	60.33	S	66.44	61.59	64.32	62.93
DT	PCT	72.42	67.13	70.11	68.59	PCT	75.02	69.55	72.64	71.06
SVM	SPCT	79.30	73.51	76.78	75.11	SPCT	80.10	74.25	77.55	75.87
AdaBoost	SPCT	77.47	71.81	75.01	73.37	SPCT	79.10	73.33	76.58	74.92
RF	SPC	66.88	62.00	64.75	63.34	PCT	77.05	71.43	74.60	72.98
DBN	GloVe	78.47	72.74	75.97	74.32	GloVe	76.42	70.84	73.99	72.38
CNN-LSTM	GloVe	83.69	77.58	81.03	79.27	GloVe	81.14	75.21	78.56	76.85
RNN	GloVe	86.02	79.74	83.28	81.47	GloVe	83.18	77.11	80.53	78.78

4.5 Comparison of proposed technique with existing techniques

The proposed method for stance classification is found efficient when applied with the proposed feature sets, and it outperformed the baseline approaches on the selected SemEval2016 and SemEval2017 datasets, as presented in Table 10.

Table 7
Comparison of single feature set using ML classifiers for stance classification

Dataset	Features	NB				DT				SVM				AdaBoost				RF			
		Acc	Pre	Rec	F	Acc	Pre	Rec	F	Acc	Pre	Rec	F	Acc	Pre	Rec	F	Acc	Pre	Rec	F
SemEval2016	S	62.44	56.59	59.16	57.85	58.12	44.90	47.30	46.07	68.46	53.59	56.41	54.96	68.59	54.40	57.23	55.78	65.56	59.42	62.12	60.74
	P	52.46	42.15	44.31	43.20	60.54	47.39	49.89	48.61	69.47	56.53	59.39	57.92	69.61	57.36	60.23	58.76	55.09	43.12	45.39	44.23
	C	57.25	47.76	50.12	48.91	62.46	50.82	53.39	52.08	69.35	55.71	58.57	57.11	62.55	47.68	50.25	48.93	60.11	49.53	52.00	50.74
	T	42.58	32.45	34.21	33.31	67.57	58.46	61.24	59.82	68.56	54.38	57.20	55.75	68.70	55.19	58.02	56.57	44.71	34.08	35.92	34.97
SemEval2017	S	66.44	61.59	64.32	62.93	56.78	43.28	45.62	44.41	69.65	53.80	56.67	55.20	69.62	54.50	57.37	55.90	60.19	45.88	48.36	47.09
	P	47.81	38.41	40.38	39.37	62.61	49.01	51.59	50.27	69.65	54.52	57.39	55.92	70.90	56.96	59.88	58.38	68.04	56.07	58.87	57.43
	C	53.43	45.68	47.88	46.75	65.16	53.69	56.38	55.00	69.65	53.09	55.95	54.48	64.86	49.44	52.11	50.74	65.06	50.26	52.94	51.56
	T	23.91	18.23	19.21	18.71	69.65	60.26	63.13	61.66	69.67	56.69	59.56	58.09	69.75	55.32	58.19	56.72	69.67	58.85	61.72	60.25

Table 8

Comparison of feature sets combination using ML classifiers on SemEval2016 for stance classification

Features	NB				DT				SVM				AdaBoost				RF
	Acc	Pre	Rec	F	Acc	Pre	Rec	F	Acc	Pre	Rec	F	Acc	Pre	Rec	F	Acc	Pre	Rec	F
SP	60.32	52.81	55.30	54.03	62.30	50.05	52.62	51.30	72.41	59.67	62.65	61.12	72.56	62.03	65.02	63.49	63.34	55.45	58.06	56.73
SC	63.43	58.15	60.76	59.42	63.91	52.66	55.29	53.94	73.04	60.93	63.94	62.40	69.51	56.56	59.42	57.95	66.60	61.05	63.80	62.40
ST	56.18	46.29	48.61	47.42	67.25	57.49	60.26	58.84	73.31	61.91	64.93	63.39	73.45	63.55	66.58	65.03	58.99	48.00	50.43	49.19
PC	59.24	51.26	53.70	52.45	66.42	55.41	58.15	56.75	74.96	64.09	67.17	65.59	71.37	59.54	62.48	60.98	62.20	53.18	55.74	54.43
PT	51.80	40.55	42.68	41.59	69.82	61.85	64.72	63.25	75.23	65.09	68.19	66.60	75.38	67.55	70.65	69.07	54.39	42.01	44.25	43.11
CT	54.90	44.67	46.94	45.78	71.51	65.56	68.50	67.00	75.85	66.41	69.53	67.93	72.19	60.97	63.95	62.42	57.65	46.31	48.69	47.47
SPC	63.69	59.04	61.67	60.33	67.02	56.60	59.36	57.95	76.69	67.93	71.09	69.48	74.28	65.03	68.09	66.53	66.88	62.00	64.75	63.34
SPT	58.79	50.26	52.68	51.44	69.53	60.87	63.74	62.27	77.09	69.08	72.26	70.63	77.24	70.81	73.99	72.37	61.73	52.14	54.68	53.38
SCT	61.12	54.14	56.66	55.37	70.87	63.51	66.43	64.93	77.73	70.46	73.66	72.02	75.28	66.68	69.78	68.19	64.17	56.84	59.49	58.14
PCT	57.87	48.88	51.26	50.04	72.42	67.13	70.11	68.59	78.81	72.24	75.49	73.83	76.33	69.18	72.33	70.72	60.76	50.69	53.20	51.92
SPCT	61.73	55.32	57.86	56.56	71.50	64.81	67.75	66.25	79.30	73.51	76.78	75.11	77.47	71.81	75.01	73.37	64.82	58.08	60.75	59.39

Table 9

Comparison of feature sets combination using ML classifiers on SemEval2017 for stance classification

Features	NB				DT				SVM				AdaBoost				RF
	Acc	Pre	Rec	F	Acc	Pre	Rec	F	Acc	Pre	Rec	F	Acc	Pre	Rec	F	Acc	Pre	Rec	F
SP	59.98	53.75	56.22	54.96	62.68	49.71	52.29	50.97	73.13	60.26	63.27	61.73	73.77	62.31	65.35	63.79	67.32	54.09	56.86	55.44
SC	63.53	58.24	60.86	59.52	64.63	52.59	55.25	53.89	73.83	61.59	64.64	63.08	71.28	58.73	61.67	60.16	66.38	52.65	55.38	53.98
ST	48.34	40.33	42.32	41.30	67.64	56.43	59.22	57.79	74.54	62.95	66.02	64.45	74.56	64.51	67.58	66.01	69.48	57.97	60.83	59.36
PC	54.67	47.86	50.12	48.96	69.00	58.99	61.83	60.38	75.22	64.31	67.40	65.82	73.31	61.16	64.18	62.63	71.88	62.19	65.15	63.63
PT	39.09	30.20	31.81	30.98	72.08	63.85	66.82	65.30	75.93	65.69	68.82	67.22	76.65	67.90	71.06	69.44	75.05	68.03	71.12	69.54
CT	42.54	33.30	35.05	34.15	74.15	67.97	71.02	69.46	76.63	67.09	70.24	68.63	74.03	63.29	66.34	64.78	74.10	66.40	69.46	67.90
SPC	62.04	56.23	58.79	57.48	68.29	57.67	60.49	59.05	77.31	68.48	71.67	70.04	75.99	66.53	69.66	68.06	71.52	61.14	64.09	62.58
SPT	51.58	43.56	45.69	44.60	70.58	61.79	64.70	63.21	78.01	69.91	73.12	71.48	78.50	71.96	75.20	73.54	73.89	65.45	68.49	66.94
SCT	54.16	46.86	49.09	47.95	72.17	64.67	67.64	66.12	78.71	71.34	74.59	72.93	76.93	68.93	72.10	70.48	73.42	64.28	67.31	65.76
PCT	47.56	37.72	39.68	38.67	75.02	69.55	72.64	71.06	79.41	72.79	76.06	74.39	78.09	70.78	74.00	72.36	77.05	71.43	74.60	72.98
SPCT	55.08	48.79	51.06	49.90	73.08	66.24	69.25	67.71	80.10	74.25	77.55	75.87	79.10	73.33	76.58	74.92	75.60	69.31	72.42	70.83

5 Related work

This section presents a review of existing studies on stance classification using feature-based, ensemble-based, and deep learning-based approaches.

5.1 Categorization of existing studies of stance classification

This section discusses earlier research studies on stance classification based on machine learning approaches. The earlier work is divided into three categories: feature-based, deep learning-based, and ensemble-based machine learning techniques.

5.1.1 Feature-based approaches

Diverse machine learning approaches that rely on feature sets have been applied to stance classification and are discussed here. Support Vector Machine (SVM) is widely used as a feature-based technique for stance classification and has served as a baseline classifier for comparison with other techniques. Saif et al. targeted tweets to perform sentiment analysis and stance detection, using SemEval-2016 to carry out the experiments, and a word embedding technique was used to improve stance classification [31, 37]. SVM-based studies include the following feature sets for stance classification: n-grams and character n-grams, syntactic n-grams, target transfer features, syntactic and positional features, word embeddings, word length, number of words, number of hashtags, number of words starting with capital letters, parts of speech, and bag-of-words features. Further, to address the problem of the target-training dataset for SemEval-2016, a new approach based on Bag-of-words was introduced to represent the features of tweets. Logistic Regression was used to classify the labelled tweets and achieved optimal results on the sub-tasks of SemEval-2016 [38].

Table 10
Comparison of Performance with Existing Techniques for Stance Classification

SemEval2016			SemEval2017		
Techniques	Accuracy	Ref.	Techniques	Accuracy	Ref.
SVM with (sentiment+semantic) features	74.44%	[32]	SVM (content+pos+n-gram)	78.0%	[35]
Bi-LSTM (word embedding)	70.03%	[36]	Bi-LSTM (vectors, lexicons, content)	78.4%	[34]
SVM (N-gram)	67.82%	[31]	SVM (BoW+sentiment)	70.9%	[33]
Proposed model	75.11%		Proposed Model	80.10%

Another approach predicts the voting intentions of Twitter users based on stance detection. A semi-supervised approach is selected along with additional features of sensitive text and network information. The approach was tested in a real-time environment and achieved effective and robust results [39]. Logistic Regression studies include the following feature sets: linguistic, topic model, word vector, similarity, sentiment, and tweet-specific features [40], as well as headline features, claim-headline features, domain-independent features, and pattern around prior-situation/effect features [41, 42]. Naïve Bayes (NB) is the third most frequently used traditional classifier for stance detection. Addawood et al. [43] investigated users' stance in debate. Their method used lexical, Twitter-specific, and syntactic features for classifying stance, and the models trained on these features achieved significant accuracy. Another approach [44] performs stance classification based on sentiment polarity and target information. Target-specific features are useful for stance classification. NB-based studies used the following feature sets: content-based features, sentiment-based features (AFINN, Hu & Liu, LIWC, DAL), structural features (hashtags, punctuation marks), and context-based features (target of interest mentioned by name, target by pronouns, target party, target party colleagues, target oppositions, nobody) [45].

Various other classifiers employed for feature-based stance classification are cited here. Artificial Neural Network (ANN) related studies are based on the TF-IDF feature set [46]. Corpus-based quantitative and computational features have also been used: adjective frequency, adverb frequency, number of characters per sentence, number of words per sentence, comma frequency, conjugation frequency, digit frequency, full-stop frequency, pronoun frequency, and noun frequency. These features improved the performance of stance classification for categories including necessity, prediction, source, uncertainty, and source of knowledge. The experimental results showed that stance formation is highly correlated with these features, which can be effective in automatic stance classification [47]. Another framework finds the stance in news articles by calculating the correlation between the headline and the body of the news. Deep learning is applied to extract the features, and a bidirectional LSTM achieved the highest accuracy [48, 49]. Decision Tree (DT) related studies used the following feature sets: n-grams, explicit stance predicted, explicit stance oracle, and majority class baseline [50]. Other classifiers include K Nearest Neighbor (kNN) [51] and the Maximum Entropy method, which was used for stance classification with handcrafted features, lexical features, sentiment-based features, and topic features [52]. To detect stance in topics, a method based on factorization machines was proposed. Factorization machines are applied for recommendations of users' preferences, and the approach achieved the best results for predicting the stance of silent users [53]. Further studies used active learning with SVM [54] and active learning with logistic regression [55].

5.2 Ensemble approaches

Ensemble learners combine more than one classifier to achieve the desired output. Among the ensemble learners, a majority voting technique has been used to detect stance in tweets. The technique used a part-of-speech method and hashtag segmentation to represent tweets in the form of trees. The majority voting method proved efficient for stance classification when applied to the SemEval-2016 dataset; compared with the state-of-the-art methods, it proved to be more effective for stance identification [56]. A combination of Long Short-Term Memory (LSTM) and Convolutional Neural Network (CNN) was employed for stance detection in debates [57]. Other related studies based on the ensemble approach include a semi-supervised learning approach [58] and stance detection in fake news based on LSTM and GRU [59]. Yiwei et al. [60] presented a new approach that represents tweets as vectors. The vectors are constructed with respect to the target, regardless of whether the target is mentioned in the tweet. The approach, based on CNN and GRU, focuses on the target information in the vector representation of tweets: a bi-directional GRU-based CNN processes the semantics of tweets, collects useful information about target-based stance from informative tokens, and builds conditional vectors of tweets with respect to the target. The experiments were carried out on the SemEval-2016 dataset. The empirical results demonstrated that the approach outperformed many baseline techniques based on token-level methods, GRU, and SVM classifiers.

To detect stance in tweets with respect to target information, conventional methods use attention-based approaches. These approaches combine the target information with tweets but perform poorly because the same information is used repeatedly. To address this issue, another technique has been introduced for target-specific stance detection in tweets. The model learns vector representations of tweets with respect to the target information and gathers richer information about the target content. After collecting enough information, the model runs an iterative process to extract the critical information about the target, which is gathered through multiple interactions between the target and the words of the tweet. The model was tested on SemEval-2016 and outperformed state-of-the-art methods that cannot capture opinions not expressed explicitly in tweets [61].

Some other ensemble-based studies are feature-based. SVM and Random Forest have been used with character- and word-level features, character n-grams, word n-grams, and stance-indicative tokens [62]. For consumers' health-related queries, a new ensemble-based technique was presented and tested on a dataset of consumer health information queries. The dataset contains stance-based query results and supports proper decision making. Apart from Bag of Words (BoW), linguistic characteristics are used for this dataset, and diverse features are extracted based on medical semantics, stance vectors, and sentiment polarity. An ensemble of SVM and a neural network is used to measure the impact of the proposed features on stance classification. The proposed features were more effective than BoW features and outperformed other state-of-the-art methods [63]. Hercig et al. introduced a method to detect stance in online discussions, determining whether an author is against, in favor of, or neutral towards a given target. The method is based on SVM, Maximum Entropy, and CNN, and extracts n-gram, BoW, bag-of-adverbs, bag-of-adjectives, negative emoticon, word shape, and text length features. The approach is more efficient than state-of-the-art methods [64]. SVM, Logistic Regression, Naïve Bayes, Decision Tree, and Random Forest models have been built on hashtag, sentiment, quantitative, emoji, orthographic, long-word, stop-word, onomatopoeia, slang, POS-tag, character unigram, bigram, and trigram, word unigram, bigram, and trigram, skip character n-gram, and skip word n-gram features [65]. Ensemble learners have been applied less often than traditional approaches; given their performance, they could be applied more widely in the future to improve stance detection [66].
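Several of the studies above rely on word n-grams, character n-grams, and skip n-grams. A minimal sketch of how such features can be generated is given below. The helper names are hypothetical, and the skip-gram definition used here (pairs of items with at most k items skipped between them) is one common convention that may differ from the exact definition in [65].

```python
def ngrams(seq, n):
    """Contiguous n-grams over any sequence (list of words or a string)."""
    return [tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)]


def char_ngrams(text, n):
    """Character n-grams returned as strings."""
    return ["".join(gram) for gram in ngrams(text, n)]


def skip_bigrams(seq, k):
    """Ordered pairs (seq[i], seq[j]) with at most k items skipped between them."""
    pairs = []
    for i in range(len(seq)):
        # j runs from the next item up to k skipped items further on.
        for j in range(i + 1, min(i + k + 2, len(seq))):
            pairs.append((seq[i], seq[j]))
    return pairs


if __name__ == "__main__":
    words = "climate change is real".split()
    print(ngrams(words, 2))        # word bigrams
    print(char_ngrams("real", 3))  # character trigrams
    print(skip_bigrams(words, 1))  # 1-skip word bigrams
```

With k = 0 the skip bigrams reduce to ordinary bigrams; larger k adds pairs that are tolerant of intervening words, at the cost of a larger feature space.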

5.3 Deep learning approaches

Deep neural networks are used for stance classification both alone and together with traditional approaches. Augenstein et al. [67] proposed a model based on Long Short-Term Memory (LSTM) with conditional encoding for stance classification. The model addresses the problems of targets that are missing from the tweet and of missing training data. It builds a representation of the tweet that depends on the target, and bi-directional conditional encoding further increased the performance of the LSTM. The experiments were carried out on SemEval-2016 and gave good results on partially labelled data for the target. Another method detects stance towards topics in two phases. The first phase determines the subjectivity of tweets with respect to the topic, and the second phase performs sentiment analysis on the tweets. An LSTM with an attention model is used in each phase. The experiments were carried out on SemEval-2016, where the method outperformed other deep learning methods with 60.2% accuracy [13]. Sun et al. improved stance detection through sentiment analysis. Their neural network model predicts both the sentiment and the stance of a post and learns the representations of sentiment and stance jointly. A stacking approach helps exploit the detected sentiment for stance classification [68]. Mavrin proposed a model based on LSTM and soft attention to detect the stance of tweets with respect to news. The experiments were carried out on a publicly available dataset, and the model outperformed other state-of-the-art methods [69]. To detect stance in documents, a neural network-based approach was presented. The approach models words to learn document representations and computes linguistic features of the documents. A hierarchical attention model learns the relation between documents and linguistic information. In an empirical analysis on two datasets, it outperformed other attention-based models in terms of accuracy [70]. The target-conditioned CNN and bi-directional GRU model of Zhou et al. [60], described in Section 5.2, also belongs to this category; on SemEval-2016 it outperformed many baseline techniques based on token-level methods, GRU, and SVM classifiers. Deep learning approaches have used word vector features, word embeddings, and character n-grams based on sentiment lexicons [66]. Employing neural network techniques has improved stance classification [13, 69].

Problem Statement

In the literature, various feature sets have been exploited with machine learning techniques to improve results. However, these feature sets still need to be investigated for the emerging domain of stance classification to further improve accuracy, because of the sensitive nature of sentiment in the opinion-seeking process. The problem of stance classification is formally described as follows. A Twitter user u_i ∈ U can share views and opinions by creating a new tweet t_i ∈ T on the Twitter microblog, where U is the set of m users, U = {u₁, u₂, u₃, …, u_m}. A tweet may also be a reply by user u_i to a tweet by any other user u_j ∈ U in the Twitter social network. A user can express views or opinions that are in Favor, Against, or Neutral (neither in favor nor against).

Formally, a tweet t ∈ T = {t₁, t₂, t₃, …, t_n} is a sequence of words w ∈ W. Our task is to classify each tweet, posted by a user u_j ∈ U = {u₁, u₂, u₃, …, u_m}, into one of the stance classes c_k ∈ C = {c₁, c₂, c₃}.
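The formulation above can be made concrete with a small sketch of the data model and of the simplest possible classifier, a majority-class baseline. The names `Stance`, `Tweet`, and `MajorityClassBaseline` are hypothetical and introduced only for illustration; the baseline is not the proposed method, it only shows the mapping from a tweet to one of the three classes in C.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Stance(Enum):
    """The three stance classes c1, c2, c3 of the set C."""
    FAVOR = "favor"
    AGAINST = "against"
    NEUTRAL = "neutral"


@dataclass(frozen=True)
class Tweet:
    """A tweet t: the posting user u_j and the sequence of words w."""
    user: str
    words: tuple


class MajorityClassBaseline:
    """Always predicts the most frequent stance seen during training."""

    def fit(self, tweets, labels):
        # The tweets are ignored; only the label distribution matters.
        self.label_ = Counter(labels).most_common(1)[0][0]
        return self

    def predict(self, tweet):
        return self.label_


if __name__ == "__main__":
    train = [Tweet("u1", ("ban", "it")), Tweet("u2", ("love", "it")),
             Tweet("u3", ("never", "again"))]
    labels = [Stance.AGAINST, Stance.FAVOR, Stance.AGAINST]
    model = MajorityClassBaseline().fit(train, labels)
    print(model.predict(Tweet("u4", ("no", "opinion"))))
```

Any of the classifiers studied in this work can be viewed as a replacement for `predict` that actually inspects the words of the tweet.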

6 Conclusion

The proposed research study contributes to the field of sentiment analysis in general and to the research domain of stance classification in particular. Its main objective is to improve the accuracy of classifiers for stance classification. The study explores a variety of features, including the proposed features (content, sentiment, tweet-specific, part-of-speech) and deep features (GloVe and Word2Vec). Some baseline features (Bag of Words, N-gram, and TF-IDF) are also computed from the selected datasets to compare against the performance of the proposed approach. Feature selection techniques, namely information gain (IG), gain ratio (GR), and Relief-F, are applied to find the relevant attributes.
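As a reminder of how two of these ranking criteria work, the following sketch computes information gain and gain ratio for a single discrete feature, using their standard textbook definitions. The function names are assumptions, the feature is assumed to be already discretized, and this is not the implementation used in the experiments.

```python
import math
from collections import Counter, defaultdict


def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(values)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(values).values())


def information_gain(feature_values, labels):
    """IG = H(labels) minus the weighted entropy of labels within each feature value."""
    groups = defaultdict(list)
    for value, label in zip(feature_values, labels):
        groups[value].append(label)
    total = len(labels)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def gain_ratio(feature_values, labels):
    """Gain ratio = IG divided by the entropy of the feature itself (split information)."""
    split_info = entropy(feature_values)
    if split_info == 0:
        return 0.0  # a constant feature carries no information
    return information_gain(feature_values, labels) / split_info


if __name__ == "__main__":
    has_negation = [1, 1, 0, 0]
    stance = ["against", "against", "favor", "favor"]
    print(information_gain(has_negation, stance), gain_ratio(has_negation, stance))
```

Features are then ranked by the score, and the highest-ranked ones are kept for classification.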

To assess the suitability of machine learning techniques, conventional, ensemble-based, and deep learning-based techniques are investigated. Among conventional machine learning techniques, three methods are used: Naïve Bayes (NB), Decision Tree (DT), and Support Vector Machine (SVM). AdaBoost and Random Forest are used as ensemble-based techniques. Recurrent Neural Network (RNN), Deep Belief Network (DBN), and a combined Convolutional Neural Network and LSTM (CNN-LSTM) are used in the deep learning category. The computed results show that deep features, which had not previously been explored for stance classification, gave the best results. GloVe, a deep feature, applied with RNN outperformed the other deep learning techniques on both datasets, SemEval2016 and SemEval2017. Among the proposed feature sets, the combination of all feature sets (sentiment, part-of-speech, content, and tweet-specific) applied with SVM outperformed the other machine learning techniques on both datasets. Thus, in general, the combination of all proposed feature sets and the GloVe deep feature gave the best results for stance classification of tweets.

References

1. Zamir, Khan H.U., Mehmood, Iqbal and Akram A.U., A feature-centric spam email detection model using diverse supervised machine learning algorithms, The Electronic Library, 2020.

2. Zamir, et al., Phishing web site detection using diverse machine learning algorithms, 2020.

3. Mahmood A., Identifying the influence of various factor of apps on google play apps ratings, Journal of Data, Information and Management 2(1) (2020), 15–23.

4. Malik, et al., A novel approach for finding research areas for new researchers, 25(3) (2019), 182–204.

5. Mahmood and Khan H.U., Identification of critical factors for assessing the quality of restaurants using data mining approaches, The Electronic Library, 2019.

6. Farooq, Khan H.U., Iqbal and Iqbal S., An index-based ranking of conferences in a distinctive manner, The Electronic Library, 2019.

7. Chelliah and Sarkar, Product Recommendations Enhanced with Reviews, in Proceedings of the Eleventh ACM Conference on Recommender Systems, 2017, pp. 398–399: ACM.

8. Chua A.Y. and Banerjee, Helpfulness of user-generated reviews as a function of review sentiment, product type and information quality, Computers in Human Behavior 54 (2016), 547–554.

9. Wang, Joo, Tong and Chan, Issues of social data analytics with a new method for sentiment analysis of social media data, in Cloud Computing Technology and Science (CloudCom), 2014 IEEE 6th International Conference on, 2014, pp. 899–904: IEEE.

10. Kumar, Cheng and Leskovec, Antisocial Behavior on the Web: Characterization and Detection, in Proceedings of the 26th International Conference on World Wide Web Companion, 2017, pp. 947–950: International World Wide Web Conferences Steering Committee.

11. Du Bois J.W., The stance triangle, Stancetaking in Discourse: Subjectivity, Evaluation, Interaction 164(3) (2007), 139–182.

12. Zhang, Qiu, Chen, Zhang, Yu and Elhadad, We make choices we think are going to save us: Debate and stance identification for online breast cancer CAM discussions, Proc Int World Wide Web Conf 2017, pp. 1073–1081, Apr 2017.

13. Dey, Shrivastava and Kaushik, Topical stance detection for Twitter: A two-phase LSTM model using attention, in European Conference on Information Retrieval, 2018, pp. 529–536: Springer.

14. Lai, Cignarella A.T., Farías D.I.H., Bosco, Patti and Rosso, Multilingual stance detection in social media political debates, Comput Speech Lang 63 (2020), 101075.

15. Barfar and Padmanabhan, Predicting presidential election outcomes from what people watch, Big Data 5(1) (2017), 32–41.

16. L.Y.-F., Cacciatore M.A., Liang, Brossard, Scheufele D.A. and Xenos M.A., Analyzing public sentiments online: Combining human-and computer-based content analysis, Information, Communication & Society 20(3) (2017), 406–427.

17. Wang, Zhou, Jiang, Si and Yang, A Survey on Opinion Mining: from Stance to Product Aspect, IEEE Access, 2019.

18. Sun, Wang, Li, Zhu and Zhou, Stance detection via sentiment information and neural network model, Frontiers of Computer Science 13(1) (2019), 127–138.

19. Elbagir and Yang, Sentiment Analysis on Twitter with Python’s Natural Language Toolkit and VADER Sentiment Analyzer, 2020, pp. 63–80.

20. Adarsh, Patil, Rayar and Veena K.M., Comparison of VADER and LSTM for sentiment analysis, International Journal of Recent Technology and Engineering 7 (2019), 540–543.

21. Zamir, Khan, Mehmood, Iqbal and Akram, A feature-centric spam email detection model using diverse supervised machine learning algorithms, The Electronic Library, ahead-of-print, 2020.

22. Borg and Boldt M., Using VADER Sentiment and SVM for Predicting Customer Response Sentiment, Expert Systems with Applications, p. 113746, 2020.

23. Khan H.U., Mixed-sentiment classification of web forum posts using lexical and non-lexical features, Journal of Web Engineering 16(1-2) (2017), 161–176.

24. Khan H.U. and Daud, Using Machine Learning Techniques for Subjectivity Analysis based on Lexical and Nonlexical Features, International Arab Journal of Information Technology (IAJIT) 14(4), 2017.

25. Huang, Xie, Rao, Feng and Wang F.L., Sentiment Strength Detection With a Context-dependent Lexicon-based Convolutional Neural Network, Information Sciences 520, 2020.

26. Miranda-Jiménez, Graff, Tellez and Moctezuma, INGEOTEC at SemEval 2017 Task 4: A B4MSA Ensemble based on Genetic Programming for Twitter Sentiment Analysis, 2017, pp. 771–776.

27. Rosenthal, Farra and Nakov, SemEval-2017 Task 4: Sentiment Analysis in Twitter, 2019.

28. Hassan M.M., Alam M.G.R., Uddin M.Z., Huda, Almogren and Fortino G., Human emotion recognition using deep belief network architecture, Information Fusion 51 (2019), 10–18.

29. Elzayady, Badran K.M. and Salama G.I., Arabic Opinion Mining Using Combined CNN-LSTM Models.

30. Ming, et al., Understanding hidden memories of recurrent neural networks, in 2017 IEEE Conference on Visual Analytics Science and Technology (VAST), 2017, pp. 13–24: IEEE.

31. Mohammad, Kiritchenko, Sobhani, Zhu and Cherry, SemEval-2016 task 6: Detecting stance in tweets, in Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016), 2016, pp. 31–41.

32. Dey, Shrivastava and Kaushik, Twitter stance detection: A subjectivity and sentiment polarity inspired two-phase approach, in 2017 IEEE International Conference on Data Mining Workshops (ICDMW), 2017, pp. 365–372: IEEE.

33. Enayet and El-Beltagy S.R., NileTMRG at SemEval-2017 Task 8: Determining Rumour and Veracity Support for Rumours on Twitter, in Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017), 2017, pp. 470–474.

34. Kochkina, Liakata and Augenstein I., Turing at SemEval-2017 task 8: Sequential approach to rumour stance classification with branch-LSTM, arXiv preprint, 2017.

35. Bahuleyan and Vechtomova, UWaterloo at SemEval-2017 Task 8: Detecting stance towards rumours with topic independent features, in Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017), 2017, pp. 461–464.

36. Siddiqua U.A., Chy A.N. and Aono, Tweet Stance Detection Using Multi-Kernel Convolution and Attentive LSTM Variants, IEICE Transactions on Information and Systems 102(12) (2019), 2493–2503.

37. Mohammad S.M., Sobhani and Kiritchenko S., Stance and sentiment in tweets, ACM Transactions on Internet Technology 17(3) (2017), 1–23.

38. Augenstein, Vlachos and Bontcheva, USFD at SemEval-2016 task 6: Any-target stance detection on Twitter with autoencoders, in Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016), 2016, pp. 389–393.

39. Tsakalidis, Aletras, Cristea A.I. and Liakata, Nowcasting the stance of social media users in a sudden vote: The case of the Greek Referendum, in Proceedings of the 27th ACM International Conference on Information and Knowledge Management, 2018, pp. 367–376.

40. Zhang and Lan, ECNU at SemEval 2016 task 6: relevant or not? Supportive or not? A two-step learning system for automatic detecting stance in tweets, in Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016), 2016, pp. 451–457.

41. Sasaki, Mizuno, Okazaki and Inui, Stance classification by recognizing related events about targets, in 2016 IEEE/WIC/ACM International Conference on Web Intelligence (WI), 2016, pp. 582–587: IEEE.

42. Ferreira and Vlachos, Emergent: a novel data-set for stance classification, in Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2016, pp. 1163–1168.

43. Addawood, Schneider and Bashir, Stance classification of Twitter debates: The encryption debate as a use case, in Proceedings of the 8th International Conference on Social Media & Society, 2017, pp. 1–10.

44. , Twitter Stance Detection with Textual, Sentiment, and Target-specific Models, 2020.

45. Lai, Farías D.I.H., Patti and Rosso, Friends and enemies of Clinton and Trump: using context for detecting stance in political tweets, in Mexican International Conference on Artificial Intelligence, 2016, pp. 155–168: Springer.

46. Zhang, Yilmaz and Liang, Ranking-based method for news stance detection, in Companion Proceedings of The Web Conference 2018, 2018, pp. 41–42.

47. Simaki, Simakis, Paradis and Kerren, Detection of stance-related characteristics in social media text, in Proceedings of the 10th Hellenic Conference on Artificial Intelligence, 2018, pp. 1–7.

48. Rajendran, Chitturi and Poornachandran, Stance-In-Depth Deep Neural Approach to Stance Classification, Procedia Computer Science 132 (2018), 1646–1653.

49. , Pan, Zhang and Fu, Stance Detection in Chinese MicroBlogs with Neural Networks, in NLPCC/ICCPOL, 2016.

50. Wojatzki and Zesch, Stance-based Argument Mining – Modeling Implicit Argumentation Using Stance, 2016.

51. Shenoy G.G., Dsouza E.H. and Kübler S., Performing Stance Detection on Twitter Data using Computational Linguistics Techniques, arXiv, vol. abs/1703.02019, 2017.

52. , Paris, Nepal and Sparks R., Cross-target stance classification with self-attention networks, arXiv preprint, 2018.

53. Sasaki, Hanawa, Okazaki and Inui, Predicting Stances from Social Media Posts using Factorization Machines, in COLING, 2018.

54. Simaki, Paradis, Skeppstedt, Sahlgren, Kucher and Kerren, Annotating Speaker Stance in Discourse: The Brexit Blog Corpus, Corpus Linguistics and Linguistic Theory, 2017.

55. Skeppstedt, Sahlgren, Paradis and Kerren, Active learning for detection of stance components, in Proceedings of the Workshop on Computational Modeling of People’s Opinions, Personality, and Emotions in Social Media (PEOPLES), 2016, pp. 50–59.

56. Siddiqua U.A., Chy A.N. and Aono, Stance detection on microblog focusing on syntactic tree representation, in International Conference on Data Mining and Big Data, 2018, pp. 478–490: Springer.

57. Zhang, Qiu, Chen, Zhang, Yu and Elhadad, We make choices we think are going to save us: Debate and stance identification for online breast cancer CAM discussions, in Proceedings of the 26th International Conference on World Wide Web Companion, 2017, pp. 1073–1081.

58. Fraisier, Cabanac, Pitarch, Besançon and Boughanem, Stance classification through proximity-based community detection, in Proceedings of the 29th on Hypertext and Social Media, 2018, pp. 220–228.

59. Rajendran, Chitturi and Poornachandran P., Stance-in-depth deep neural approach to stance classification, Procedia Computer Science 132 (2018), 1646–1653.

60. Zhou, Cristea A.I. and Shi, Connecting targets to tweets: Semantic attention-based model for target-specific stance detection, in International Conference on Web Information Systems Engineering, 2017, pp. 18–32: Springer.

61. Wei, Mao and Zeng, A target-guided neural memory model for stance detection in Twitter, in 2018 International Joint Conference on Neural Networks (IJCNN), 2018, pp. 1–8: IEEE.

62. Swami, Khandelwal, Singh, Akhtar S.S. and Shrivastava M., An English-Hindi code-mixed corpus: Stance annotation and baseline system, arXiv preprint, 2018.

63. Sen, Sinha, Mannarswamy and Roy, Stance classification of multi-perspective consumer health information, in Proceedings of the ACM India Joint International Conference on Data Science and Management of Data, 2018, pp. 273–281.

64. Hercig, Krejzl, Hourová, Steinberger and Lenc, Detecting Stance in Czech News Commentaries, in ITAT, 2017, pp. 176–180.

65. HaCohen-Kerner, Ido and Ya’akobov, Stance classification of tweets using skip char ngrams, in Joint European Conference on Machine Learning and Knowledge Discovery in Databases, 2017, pp. 266–278: Springer.

66. Küçük and Can, Stance Detection: A Survey, ACM Comput Surv 53(1) (2020), Article 12.

67. Augenstein, Rocktäschel, Vlachos and Bontcheva K., Stance detection with bidirectional conditional encoding, arXiv preprint, 2016.

68. Sun, Wang, Li, Zhu and Zhou G., Stance detection via sentiment information and neural network model, Frontiers of Computer Science 13(1) (2019), 127–138.

69. Mavrin, Statistical Modeling Of Stance Detection, 2017.

70. Sun, Wang, Zhu and Zhou, Stance detection with hierarchical attention network, in Proceedings of the 27th International Conference on Computational Linguistics, 2018, pp. 2399–2409.

SemEval2016			SemEval2017
Techniques	Accuracy (%)	Ref.	Techniques	Accuracy (%)	Ref.
SVM with (sentiment+semantic) features	74.44	[32]	SVM (content+pos+n-gram)	78.0	[35]
Bi-LSTM (word embedding)	70.03	[36]	Bi-LSTM (vectors, lexicons, content)	78.4	[34]
SVM (N-gram)	67.82	[31]	SVM (BoW+sentiment)	70.9	[33]
Proposed model	75.11		Proposed model	80.10	

Table 4
Statistics of SemEval2016 & SemEval2017

Dataset	Total tweets	Training tweets	Testing tweets
SemEval2016	68197	16346	51851
SemEval2017	62617	50333	12284