Abstract
A wide variety of social media platforms have become integral to contemporary forms of social engagement, including mass protests. Twitter is considered particularly indicative of public attitudes in this regard. This study examines the feasibility of using Twitter sentiment analysis to predict the 2014 revolution in Ukraine. Tweets representing public opinion are clustered by means of the ‘StreamKM++’ algorithm into three classes (likely, neutral and unlikely). The resulting prediction model for the three classes (using Naïve Bayes) achieved an accuracy of 96.75 per cent. As such, this study offers a promising way to perform an online prediction of social movements.
Introduction
With the recent technological advances in peer-to-peer communication tools, a wide variety of social media platforms such as Twitter have provided an exceptional opportunity for people to communicate thoughts and ideas. This is because these platforms offer the necessary social-oriented features that give users a pleasant social interaction experience (Lee and Cheung, 2014). Currently, opinions and expressions (in the form of tweets) about certain issues are being regarded as news that can be used to construct a meaningful understanding of the overall perception of social and political reality (Weeks et al., 2017). According to Mohammad and Kiritchenko (2018), tweets are self-contained, public posts and tend to be rich in emotions, which eventually can be used to signify different dimensions of support for certain political groups (Bora, 2012; Jorgensen et al., 2018).
Social activists, political leaders and celebrities express their opinions via social media. Twitter has become widely known as a communication tool for politicians, while Facebook has grown into a resource for social mobilization. The Arab Spring is famously known as a ‘Twitter revolution’, while the ‘Euromaidan’ protests in Ukraine began with a post on Facebook. Social media are therefore not only important for monitoring public opinion about certain issues in general (Boulianne, 2015; Perrin, 2015), but may also become a powerful tool of social mobilization. Accordingly, social media analysis has come to be associated not only with tracking public opinion but also with the potential to predict specific outcomes through their relationship to the diversity of public opinion (Asur and Huberman, 2010).
In this article, we investigate Twitter sentiment analysis as a means of predicting ‘democratic revolutions’, with the aim of providing an accurate and rapid method for gauging public reactions to political developments, especially changes. Specifically, this study examined the feasibility of using data streaming algorithms, such as StreamKM++, to construct a meaningful clustering solution of tweets for the prediction of public protests in Ukraine. The outcomes of this work are expected to provide some insights into the development of Ukraine’s Maidan Revolution, which began with the Euromaidan protests on 21 November 2013 and peaked with the Revolution of Dignity during 18–23 February 2014. This study demonstrates how the likelihood and intensity of protests can be evaluated by means of social media analysis.
Twitter sentiment analysis studies
Sentiment analysis has carved its own niche in research on social opinion and public preferences. Some research on sentiment analysis remains highly technical and is oriented towards exploring the opportunities and limitations of this methodological tool for political analysis (Haselmayer and Jenny, 2017). Yet sentiment analysis is considered a highly reliable method of analysis, which is reflected in its broad scope of application across various contexts and for different purposes. In fact, the majority of studies that apply sentiment analysis still refer to the analysis of media and news (Burscher et al., 2016; Riff et al., 2014) as well as investigations of public preferences and knowledge about certain issues (Ceron et al., 2014). The latter is especially crucial for monitoring electoral campaigns (Ceron et al., 2015; Himelboim et al., 2016) and studying the communication flow between public officials and people (Zavattaro et al., 2015).
Sentiment analysis has been applied to the case of Ukraine only to a very limited extent. For example, Etling (2014) analysed attitudes towards the Euromaidan protests as expressed in tweets in various languages. In turn, Romenskyy et al. (2018) studied Ukraine’s political polarization across the East and the West, also using Twitter data. Their research reflected on the polarization of opinion caused by the war in Donbass, which had already begun at the time of the analysis. Yet both studies concentrated on depicting the situation rather than incorporating any predictive element into their analysis.
Meanwhile, the majority of previous research concentrates on the importance of social media for media analysis (Bobichev et al., 2017) or studying public opinion. This includes the use of social media by specific right-wing political parties in Ukraine (Doroshenko et al., 2018). Furthermore, the problem of analyzing non-English tweets is the most debated in literature dealing with Ukraine or any sort of media analysis of Russian or Ukrainian media (Medagoda et al., 2013; Steinberger et al., 2011; Watanabe, 2017). The latter is especially important due to the ‘mirror’ logic of analysis that refers to the Russian-Ukrainian conflict that erupted after the Maidan Revolution.
Whereas this study distances itself from the sentimental portrayal of the Maidan Revolution and also from the debate on the technical details of how to improve Russian-Ukrainian dictionaries for the Application Programming Interface (API) search, it is innovative in terms of the potential to predict mass protests or other relevant responses to political change through the analysis of social media rather than monitoring the contents of social media debates.
Method
Data for this study were obtained from Twitter (multi-domain) using the Twitter API. The data comprised the following: (1) a timeline of political events in Ukraine that occurred between October 2013 and February 2014 and (2) a corpus of 1,986,240 tweets published by Twitter users during that time period and temporally distributed as shown in Figure 1. During the pre-processing and transformation phase, all the extracted tweets were tokenized in order to eliminate possible duplicates, hashtags and URLs from the text (Figure 2). A standard list of 214 English stopwords, including highly common verb forms, was then used to filter semantically uninformative words out of all the extracted tweets, including tokens beginning or ending with a stopword.
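The cleaning steps described above can be illustrated with a minimal Python sketch. It is not the pipeline used in the study: the stopword set is a tiny illustrative subset rather than the 214-word list, the function names `preprocess` and `clean_corpus` are hypothetical, and dropping whole hashtags (rather than only the `#` sign) is an assumption.

```python
import re

# Illustrative subset only; the study used a list of 214 English stopwords.
STOPWORDS = {"a", "an", "and", "are", "in", "is", "of", "the", "to"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
HASHTAG_RE = re.compile(r"#\w+")  # assumption: the whole hashtag is dropped
TOKEN_RE = re.compile(r"[a-z']+")


def preprocess(tweet):
    """Return the lowercase tokens of a tweet without URLs, hashtags or stopwords."""
    text = URL_RE.sub(" ", tweet.lower())
    text = HASHTAG_RE.sub(" ", text)
    return [tok for tok in TOKEN_RE.findall(text) if tok not in STOPWORDS]


def clean_corpus(tweets):
    """Preprocess every tweet, dropping empty results and exact duplicates."""
    seen = set()
    cleaned = []
    for tweet in tweets:
        tokens = tuple(preprocess(tweet))
        if tokens and tokens not in seen:
            seen.add(tokens)
            cleaned.append(list(tokens))
    return cleaned
```

In this sketch, two tweets that differ only in their URLs or hashtags collapse into a single entry once they are cleaned, which is one simple way to treat duplicates.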

Figure 1. Distribution of tweets.

Figure 2. Example of data collection, pre-processing and labelling.
The Snowball stemmer was then applied to reduce each word to its stem. This resulted in a subset of 320,711 normalized tweets. We then assessed the population’s aggregate perception of revolution in Ukraine using three classes: likely, neutral and unlikely. This was achieved during the analysis phase, where an online clustering scheme called ‘StreamKM++’ was used to produce these classes. StreamKM++ computes a small weighted sample (a coreset) of the Twitter data, maintained in a coreset tree, and applies the k-means++ algorithm to that sample to cluster the data. The Massive Online Analysis (MOA) system by Bifet et al. (2010), in conjunction with the Waikato Environment for Knowledge Analysis (Weka), was used to implement the clustering and classification processes. To determine the best-performing classifier for this study, we tested several learning algorithms: Naïve Bayes, DecisionStump, Perceptron and SPegasos. The Kappa statistic was the main measure used to evaluate the performance of these algorithms.
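To make the clustering step more concrete, the following Python sketch illustrates only the k-means++ seeding stage on a weighted sample that is assumed to have been built already. It is not the MOA implementation of StreamKM++ and omits the coreset tree entirely; the function names and the vector representation of tweets are hypothetical.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeanspp_seed(points, weights, k, seed=0):
    """Pick up to k initial centres from weighted points using k-means++ seeding."""
    rng = random.Random(seed)
    indices = range(len(points))
    # First centre: sampled in proportion to the point weights.
    first = rng.choices(indices, weights=weights, k=1)[0]
    centres = [points[first]]
    # Squared distance from each point to its nearest chosen centre.
    nearest = [squared_distance(p, centres[0]) for p in points]
    while len(centres) < k:
        # Heavy points far from every chosen centre are the likeliest picks.
        scores = [w * d for w, d in zip(weights, nearest)]
        if sum(scores) == 0:
            break  # every point coincides with an already chosen centre
        idx = rng.choices(indices, weights=scores, k=1)[0]
        centres.append(points[idx])
        nearest = [min(d, squared_distance(p, points[idx]))
                   for p, d in zip(points, nearest)]
    return centres
```

With k = 3, the returned centres would serve as starting points for the three clusters; a full k-means++ run would then alternate between assigning points to their nearest centre and updating the centres.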
Evaluation
To ensure the interrater reliability of the chosen classifiers, we used the Kappa statistic (McHugh, 2012). Based on Cohen (1960), the optimal value of the Kappa statistic is 1 (perfect agreement), and the statistic is defined as

$$\kappa = \frac{p_o - p_e}{1 - p_e}$$

where $p_o$ is the observed proportion of agreement and $p_e$ is the proportion of agreement expected by chance.

Figure 3. The Kappa statistic result for the three classes.
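As an illustration of how the Kappa statistic can be computed, the sketch below derives it from a square confusion matrix, taking the observed agreement as the share of instances on the diagonal and the chance agreement from the row and column totals. The function name `cohen_kappa` and the matrix layout (rows as actual classes, columns as predicted classes) are assumptions made for this example.

```python
def cohen_kappa(matrix):
    """Cohen's kappa for a square confusion matrix (rows: actual, columns: predicted)."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    # Observed agreement: proportion of instances on the diagonal.
    observed = sum(matrix[i][i] for i in range(n)) / total
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    # Chance agreement: sum over classes of (row share) * (column share).
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1 - expected)
```

A value close to 1 indicates that the classifier agrees with the reference labels far more often than chance alone would explain, whereas a value near 0 indicates agreement no better than chance.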
A confusion matrix was also used to evaluate the performance of the Naïve Bayes algorithm in classifying tweets into the three classes (likely, neutral and unlikely), as shown in Figure 4. Given a class, Cj, and an example instance, Ei, the classifier may or may not assign that instance to the class, and the instance may or may not actually belong to it. In the two-class case, the classification can therefore produce four possible outcomes: true positives (hits), false positives (false alarms), true negatives (correct rejections) and false negatives (misses). For a learning algorithm to have an acceptable level of accuracy, most instances should fall within the cells along the diagonal of the confusion matrix.

Figure 4. Confusion matrix results for the three classes.
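The four outcomes can be read off a three-class confusion matrix by treating one class at a time as the positive class and the other two as negative. The Python sketch below shows this under the assumption that rows hold actual classes and columns hold predicted classes; the counts in `example` are hypothetical and are not the results of this study.

```python
CLASSES = ["likely", "neutral", "unlikely"]


def one_vs_rest_counts(matrix, k):
    """TP, FP, FN and TN for class index k (rows: actual, columns: predicted)."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    tp = matrix[k][k]
    fp = sum(matrix[i][k] for i in range(n)) - tp  # predicted k, actually another class
    fn = sum(matrix[k]) - tp                       # actually k, predicted another class
    tn = total - tp - fp - fn
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn}


def accuracy(matrix):
    """Share of instances that lie on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in matrix)
    return sum(matrix[i][i] for i in range(len(matrix))) / total


# Hypothetical counts for illustration only, not the study's results.
example = [
    [50, 3, 2],
    [4, 30, 1],
    [1, 2, 7],
]
```

Calling `one_vs_rest_counts(example, CLASSES.index("likely"))` gives the four outcomes for the ‘likely’ class, while `accuracy(example)` summarizes how much of the matrix mass lies on the diagonal.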
Finally, accuracy was used to measure the percentage of correctly classified instances in each class. The overall accuracy of the prediction model for the three classes (using Naïve Bayes) was 96.75 per cent. Based on these results, it is reasonable to say that Twitter sentiment analysis can provide a promising tool to predict social movements on the web, which may mobilize people into political and social actions.
Conclusion
The results revealed that Naïve Bayes was the best classifier for predicting a stream of tweets online. Consequently, the prediction model was able to perform an online prediction of the public opinion expressed in each tweet about the possible revolution. The distribution of tweets shown in Figure 1 reveals that, prior to the end of December, there was no significant increase in social interaction around the ‘revolution theme’ on Twitter, even though protests were already happening. Furthermore, social media primarily responded to and reflected on the efforts of the opposition to gather crowds or participate in the political process (the movement that increased social interactions among people). Although sentiment analysis does not demonstrate how social media gather the crowds, it can certainly be used to map the importance of certain events in fuelling such protests. As such, it can be inferred that inclusion of the opposition in the political process accelerated the ‘social media revolution’ and, potentially, the actual ‘democratic revolution’. Nevertheless, these conclusions require more rigorous, content-based research.
It is worth mentioning that this study was limited to English tweets, meaning that the results represent the opinion of only part of the Ukrainian population. In addition, this study was limited to the use of a single data streaming algorithm; other methods can be investigated in future studies. Finally, this study demonstrated that revolutionary events may be predicted by using sentiment analysis, and this may be done not only retrospectively but also with regard to already occurring protests.
Footnotes
Funding
The author(s) received no financial support for the research, authorship and/or publication of this article.
