Italian sentiment analysis on climate change: Emerging patterns from 2016 to today

Abstract

The debate on climate change has attracted increasing attention, especially among young people, since the foundation of the Fridays for Future movement and the rising fame of Greta Thunberg.

Social media websites can be used as a data source for mining public opinion on a variety of subjects, including climate change. Twitter, in particular, allows public opinion to be evaluated over time. Although the Twitter population is known to be biased with respect to the whole population, Twitter users are also more likely to be young people. For this reason, the sentiment analysis of Twitter textual data on climate topics provides valuable insights into the climate discussion and could be considered representative of the rising climate movement.

In this study, a large dataset of Italian tweets posted between 2016 and 2022 and containing a set of keywords related to climate change (e.g., global warming, sustainable development) is analysed using volume analysis and text mining techniques such as topic modelling and sentiment analysis.

Topic modelling, performed using word embeddings, allows us to validate the keyword set and to identify the prevalent discussion in Italy about the climate agenda and the major concerns related to the climate emergency.

Both the daily volume series and the daily sentiment series of tweets have been analysed. The former allows us to assess Italian participation in the climate debate, while the latter provides useful insights into how the overall mood evolved over these years. In particular, we show that the major Italian concerns are related to global warming and are associated with a negative mood, while a positive mood is recorded when public environmental policies are implemented.

Keywords

Sentiment analysis, climate change, text mining, Twitter, word embeddings

1. Introduction

The use of Big Data for official statistical purposes is a growing field of interest, and many statistical offices now provide new experimental statistics. Big Data make it possible to integrate official estimates and to produce new indicators that complement official statistics. Interest in climate change, and more precisely in the climate emergency we are currently witnessing, is recent, and official statistics is trying to measure these new phenomena by integrating different surveys, such as the survey of weather-climate and hydrological data, with other statistical indicators on pollution or deforestation.

Twitter is one of the most popular social media platforms for sharing breaking news, individual experiences, and opinions about current events.

The debate about climate change on social media has become very intense, especially among young people, since the foundation of Greta Thunberg’s movement. In February 2020, Thunberg had 4 million followers on Twitter and 9.7 million followers on Instagram. She became so popular that [1] published a study on the perceptions and sentiments of Twitter users about Greta, which also tries to characterize the US and Swedish users following her. Indeed, the climate movement and its strikes were advertised on Twitter and many other social media. Another study [2] focuses on English tweets and applies Latent Dirichlet Allocation (LDA) for topic modelling to infer the different topics of discussion, and the Valence Aware Dictionary and Sentiment Reasoner (VADER) to perform sentiment analysis. A similar study [3] carries out an LDA topic modelling analysis of the Twitter posts of US federal government science agencies about climate change. Finally, [4] specifically focuses on Twitter data about climate and global warming: topic analysis is carried out by standard LDA, and sentiment analysis is computed by a word-list approach, the National Research Council (NRC) Word-Emotion Association Lexicon, to track people’s emotions (joy, anger or fear).

All the aforementioned approaches base the data collection process on tweets containing a specific set of keywords; the tweets then undergo text pre-processing and, finally, a sentiment score is computed. In general terms, the score can be estimated using two popular approaches: lexicon-based sentiment analysis, which relies on word lists weighted in the form of scores, and sentiment analysis based on machine learning.

Cody et al. [5] is amongst the first studies on Twitter and climate topics. The authors analyse a set of tweets collected between September 2008 and July 2014 and containing the word “climate”. Using a sentiment measurement tool called the Hedonometer (not available for Italian), they determine how collective sentiment varies in response to climate change news, events, and natural disasters.

The present study differs from the cited works because a Word Embedding (WE) approach is applied, and the sentiment analysis is lexicon-based.

Concerning the population of Italian Twitter users, estimates at the national level [5] put the number of Italian Twitter users at 11.2 million, out of a population of 60 million in 2021. Other estimates calculate that users represent around 11% of the total population and 24% of 14–29 year olds [6]. It is therefore clear that students and young people are over-represented on Twitter. Nevertheless, since interest in the climate movement and agenda is prevalent among young people, analysing the Italian mood on this subject may be considered relevant. Inference remains difficult, however: recent studies [7, 8] show that males and users living in larger cities are over-represented among Twitter users. The territorial distribution seems to be quite similar to that of the total population.

Since 2016, Istat has collected tweets using two specific filters: the first is used to publish the experimental statistics Social Mood on Economy Index [9]; the second (Istat-filter) has not yet been used to produce any other experimental statistics. This filter involves 278 keywords, derived from the themes available on Istat’s online data warehouse [10].

Messages sampled through Istat-filter are meant to represent a small-scale model of the overall population of messages that are potentially relevant from the Official Statistics perspective. This makes it possible to develop new indicators, such as the one presented in this study on social perceptions of climate change.

The aim of this index is to symmetrically measure feelings related to temperatures (hot/cold), opposite weather events (rain or drought), the environment, pollution, and carbon emissions, but also to policies related to sustainable development, renewable energies, and waste recycling, so as to record both positive and negative events.

The paper is organised as follows. Section 2 provides an overview of research activities in the field of sentiment analysis. Section 3 illustrates how the daily sentiment index has been built and presents the WordEmBox, i.e., a web application that allows users to query and visualize WE models, used here to ensure the relevance of sampled tweets and for topic analysis purposes [11]. Section 4 first explains the relevance and validation of the filter’s keywords, obtained by a data-driven analysis of the period prior to the growth of the climate change movement (2016–2017). Then both the daily sentiment series and the volumes of tweets captured by the filter are discussed by analysing their dynamics. Volume analysis highlights Italian participation in the global strikes and in Greta Thunberg’s movement. Sentiment analysis shows that the overall discussion is negative, mainly in the case of extreme weather events in summer. Finally, an ad-hoc word embedding model is used to better understand Twitter conversations and how events may lead to changes in mood.

2. Background and related works

Sentiment analysis (or opinion mining) is a natural language processing technique that aims at identifying people’s attitudes, whether positive, neutral, or negative, based on statements or texts they have produced, e.g., tweets. In recent years, sentiment analysis has attracted great attention in both academic and industrial contexts because of its broad range of applications, e.g., brand and product sentiment monitoring based on customer feedback, customer satisfaction and market research [12]. The main focus of sentiment analysis (SA) is estimating the polarity of a text (positive, negative, neutral), but it can also detect specific feelings and emotions (angry, happy, sad, etc.), urgency (urgent, not urgent) and even intentions (interested vs not interested). In general terms, SA tasks aim at labelling people’s opinions with categories such as positive and negative (or relevant and out of scope, subjective vs objective) from a given piece of text (e.g., a tweet or an article). Classifying these documents is an arduous task. In recent years many methods, techniques and tools have been implemented to increase the accuracy of SA in different tasks at different levels. The techniques used to classify opinions can be grouped as follows: i) machine learning approaches, based on Natural Language Processing (NLP); ii) lexicon-based approaches, based on lexical resources; iii) hybrid approaches.

In the context of machine learning, supervised approaches have a well-known limitation: the models are sensitive to the quantity and quality of the training data, and predictions may be very inaccurate when the training data are biased or insufficient. Semi-supervised learning (SSL) models derive from either supervised or unsupervised methods. In contrast with supervised learning, which learns only from labelled data, SSL learns from both labelled and unlabelled data. This is motivated by the lack of labelled data in real-world applications. The main idea behind SSL is that, although unlabelled data hold no information about classes, they do contain information about the joint distribution over classification features. Therefore, the use of SSL with unlabelled data can increase the accuracy of the predictions over supervised learning. In lexicon-based approaches, words in texts are labelled as positive or negative (and sometimes as neutral) with the help of a so-called valence dictionary. Finding the proper dictionary can be very challenging (e.g., very few dictionaries are available for Italian).

In the following, several state-of-the-art artificial intelligence techniques are described, with special attention to the Italian case, as most of the above-mentioned methods were primarily developed for English texts.

The first and most common approaches used in the field of sentiment analysis are based on features like unigrams, Part Of Speech (POS) tags and term position [13], opinion words and sentences [14], negations [15] and syntactic dependencies [16]. Some approaches have shown effective performance in text categorization, such as Support Vector Machine (SVM) [17], Multinomial Naïve Bayes (MNB) [18] and Maximum Entropy (ME) classifiers and derived ensembles [16, 19], even if their classification performances are limited by the high training costs due to the need for a broad vocabulary, i.e., more words from which more features can be extracted [20] to be used in conjunction with machine learning algorithms for sentiment classification.

With respect to lexicon-based methods, the Italian polarity lexicons that deserve to be mentioned are the Sentiment Italian Lexicon [21], also known as Sentix; SentIta [22]; the lexicon of the University of Bologna [23]; the CELI Sentiment Lexicon [24]; the Distributional Polarity Lexicon (DPL) [25]; and SenticNet [26]. The advantage of sentiment polarity lexicons such as DPL and Sentix is that, compared to simpler lexicon-based approaches, they do not merely assign a binary value indicating whether a lemma is positive or negative, but also indicate the degree of polarisation of each lemma.

2.1 Word embedding

Word Embedding (WE) is a technique that maps textual tokens, e.g., words, into a dense, low-dimensional vector space, built on large unlabelled corpora, where each token is related to the other tokens in its context. Methods to generate this mapping include neural networks, dimensionality reduction on the word co-occurrence matrix, and probabilistic models. Word2Vec, proposed by Mikolov et al. [27], was among the first widely used techniques for word embedding based on neural networks. Word2Vec can adopt two different models: the skip-gram model or the Continuous Bag of Words (CBOW) model. While the latter predicts a word from the words within the surrounding context, the former predicts the words within the surrounding context starting from the current word. These models map words into vectors that are closer when the words are similar and often occur close together. The Word2Vec approach gained a huge resonance in the scientific context, and it is still used in several areas in combination with deep neural networks, such as health care [28].

Pennington et al. [29] proposed another model, Global Vectors (GloVe), which generates the vector encoding of a word faster than Word2Vec because the implementation can be parallelized when run on larger amounts of data. Moreover, Cao and Rei [30] proposed a novel approach named char2vec, based on the representation of characters instead of words.

FastText is an extension of Word2Vec proposed by Facebook in 2016 [31]. Instead of feeding individual words into the neural network, FastText breaks words into several character n-grams (sub-words). This allows rare words to be handled properly, since it is highly likely that some of their n-grams also appear in other words.

In the field of sentiment analysis, several WE models were proposed, based on the prior knowledge acquired both through opinionated words from sentiment lexicons and available sentiment labels. Li et al. [32] proposed a new method for learning word embedding for sentiment analysis based on prior knowledge, which improved the results in comparison with standard WE.

2.2 Deep neural networks

Deep Neural Networks (DNNs) are Artificial Neural Networks (ANNs) that present multiple hidden layers between input and output and exist in a plethora of different architectures, depending on the topology of neurons and their connections. The architectures that have brought remarkable results over the years in the research context are [33]: Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), and, more recently, language models based on transformers [37]. A detailed description of such models is outside the scope of this paper; nevertheless, a brief description of the methods and main applications of DNNs can be found in [37, 38].

The most recent works proposed language models specifically pre-trained on tweet corpora: Thakkar and Pinnis [34] achieved encouraging performance leveraging a time balanced evaluation set for sentiment analysis on Latvian tweets, comparing several BERT-based architectures, and Nguyen et al. [14] presented BERTweet, the first public large scale pre-trained language model for English tweets.

With respect to Italian BERT models, the University of Bari developed a new model for Italian specifically devoted to Twitter texts and social media, AlBERTo, trained on 200 million tweets from 2012 to 2015 [35]. Only the uncased model is available to the community. Owing to its specific training, AlBERTo requires a particular pre-processing step that replaces hashtags, URLs, etc. and alters the official tokenisation; it is therefore not really applicable to word-based classification tasks on general texts and is best suited to Twitter or social media data. GilBERTo is a rather new CamemBERT-based Italian model trained on the large Italian Web corpus section of the OSCAR [35] corpus project, consisting of more than 11 billion tokens. For GilBERTo, too, only the uncased model is available.

The most recent model developed explicitly for Italian, as far as we know, is UmBERTo. Like GilBERTo, it has been trained on OSCAR, but the resulting model, unlike GilBERTo, is cased.

Recently, in 2021, a sentiment classifier based on UmBERTo, feel-it, was proposed [46], and a new manually annotated dataset was released in which posts are annotated with four basic emotions: anger, fear, joy, and sadness. By collapsing them, a sentiment classifier of positivity vs negativity is also available. All these models are available on GitHub.

3. Methods

In the following we briefly describe: i) the methodology used to compute the sentiment index; ii) the scoring procedure; iii) the word embedding technique used both for topic analysis and for validating filter’s keywords.

3.1 Twitter social mood index on climate change

For the calculation of the Twitter Social Mood on Climate Change Index (SMCCI in the following) we adopt an unsupervised lexicon-based approach. This approach is currently used at Istat to calculate the Social Mood on Economy Index [9, 40].

The daily pipeline, sketched in Fig. 1, consists of three fundamental steps: (i) collection and pre-processing (text cleaning) of the daily sample of public tweets related to climate change; (ii) estimation of the sentiment score of each tweet in the sample; (iii) calculation of the daily value of the index as a synthesis of the sentiment values of the entire sample.

Step (i): Daily sample data collection

Figure 1.

Daily pipeline for the calculation of SMCCI index.

Twitter’s streaming API allows samples of tweets to be downloaded in real time for research purposes. Istat has collected samples of tweets since 2016 with the aim of producing innovative indicators of interest for official statistics. The data collection procedure uses a filter (Istat-filter) composed of 278 words borrowed from Istat’s online data warehouse, which represent all the possible relevant themes for official statistics purposes. For privacy reasons, user data are not saved, and only data related to the tweet (such as text and date) are stored in the archive. To extract tweets related to climate change, we apply a second-level filter (CC-filter) containing lemmas tailored to the topic of interest. To create the CC-filter, we begin by selecting a subset of Istat-filter’s terms referring to climate events, ecological transition, alternative energy sources and similar (CC-filter-preliminar). We then integrate this preliminary list with a set of terms strictly related to the specific goal of the index; these terms may not be present in the first-level filter. The resulting list, composed of 48 lemmas, is used to extract from Istat’s tweet store a dataset of about 20 million tweets (11,000 per day in the period 2016–2021). This wide dataset is then used to verify the relevance of each of the 48 keywords in the CC-filter through the word embedding techniques described in the following. The result of this refining phase is a new list of 78 keywords that constitutes the second-level filter of SMCCI, i.e., the CC-filter used to compute the index.

In this work, we present the index calculated for the period between October 1, 2016, and October 31, 2022. The average number of tweets selected in this period using CC-filter is about 7,000 per day, and the total number of tweets in the period is 16 million.

Step (ii): Sentiment score estimate

To identify the sentiment of tweets, we adopt a lexicon-based approach. Amongst the few Italian resources available, we use an integrated version built from the two distinct Distributional Polarity Lexicons (DPL), which turned out to give better results for the Social Mood on Economy Index and was consequently adopted in [41].

Table 1

Examples of DPL sentiment lexicon scores

Italian_word	English	Positivity	Negativity	Neutrality
bello	handsome/nice	0.57	0.2	0.23
cattivo	bad	0.24	0.63	0.13
grandioso	great	0.70	0.08	0.22
infelice	unhappy	0.22	0.56	0.22
punto di vista	point of view	0.61	0.1	0.29
uguale	equal/same	0.26	0.23	0.51

The score of each term is obtained by means of a Support Vector Machine classifier based on a wide dataset of messages posted on Italian social media during 2016. The methodology is described in [42]. Castellucci et al. use a sentiment transfer from sentences to words to derive a large-scale polarity lexicon, based on Distributional Models of lexical semantics. Given a set of sentences annotated with polarity, the classifier transfers the sentiment information from sentences to words. The set of annotated examples is derived from Twitter, and the polarity is assigned to sentences by simple heuristics. The approach is mostly unsupervised.

In practical terms, three scores measure the sentiment of a word or expression: positivity (Pos), negativity (Neg) and neutrality (Neu). For each entry of the vocabulary, $\text{Pos}+\text{Neg}+\text{Neu}=1$ holds, so each score can be read as a probability. In more detail, we obtain the scores of a tweet by averaging the scores of the words in its text that match the lexicon. Table 1 shows a few examples of word annotations in our lexicon.
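The averaging step described above can be illustrated with a minimal sketch. The lexicon below contains only a few single-word entries copied from Table 1; the function name `tweet_scores`, the whitespace tokenisation and the handling of tweets with no matched word are assumptions made for illustration, and multi-word expressions are not handled.

```python
from statistics import mean

# Toy lexicon: (Pos, Neg, Neu) triples copied from Table 1.
LEXICON = {
    "bello": (0.57, 0.20, 0.23),
    "cattivo": (0.24, 0.63, 0.13),
    "grandioso": (0.70, 0.08, 0.22),
    "infelice": (0.22, 0.56, 0.22),
}


def tweet_scores(tokens, lexicon=LEXICON):
    """Average (Pos, Neg, Neu) over the tokens found in the lexicon.

    Returns None when no token matches (an assumption: the paper does not
    say how unmatched tweets are treated).
    """
    matched = [lexicon[tok] for tok in tokens if tok in lexicon]
    if not matched:
        return None
    # zip(*matched) groups all Pos values, all Neg values, all Neu values.
    return tuple(mean(column) for column in zip(*matched))


if __name__ == "__main__":
    print(tweet_scores("un panorama bello e grandioso".split()))
```

Because every lexicon entry sums to 1, the averaged triple also sums to 1 (up to rounding), so it keeps the probabilistic reading of the scores.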

Step (iii): Calculation of the daily index

Figure 2.

Cartesian and polar coordinates adopted by Sentix: $n$ is the negative score, $p$ the positive score, $i$ the intensity and $\theta$ the polarity (http://valeriobasile.github.io/twita/sentix.html).

Similarly to what is proposed in [9], we consider polar coordinates (intensity $i$ and polarity $\omega$) instead of Cartesian coordinates for each tweet. The following relationships hold, as shown in Fig. 2:

$\displaystyle\omega=1-\frac{4}{\pi}\arctan\left(\frac{\text{Neg}}{\text{Pos}}\right),\qquad i=\sqrt{\text{Neg}^{2}+\text{Pos}^{2}}$

We neglect the neutral component.

Polarity $\omega$ (which is a linear transformation of $\theta$) can assume values between $-1$ and $1$, where $-1$ indicates total negativity of the text and $1$ indicates total positivity. Intensity $i$ can assume values between 0 and 1, where 0 indicates total neutrality of the text and a value close to 1 indicates a low neutrality score.

The daily value of the index is a central tendency measure, namely the mean of the tweet polarities weighted by their intensities. The exact formula we apply is the following:

$\displaystyle\textit{SMCCI}=100*\frac{\sum_{t}i_{t}\omega_{t}}{\sum_{t}i_{t}}$ (1)

In theory, the SMCCI index can assume any value between $-100$ and $100$, but in practice the absolute value of the index is greater than 10 only in exceptional cases. When this happens, it is often due to off-topic contamination, for example a viral tweet that accidentally passed the filter.
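As a worked illustration of the polar transformation and of Eq. (1), the following minimal sketch computes polarity, intensity and the daily index from a list of per-tweet (Pos, Neg) pairs. The function names and the choice to return 0 for an empty or fully neutral sample are assumptions, not part of the production pipeline.

```python
import math


def polar(pos, neg):
    """Return (polarity, intensity) of one tweet from its Pos and Neg scores."""
    # atan2 equals arctan(Neg/Pos) for Pos > 0 and also covers Pos == 0.
    omega = 1.0 - 4.0 * math.atan2(neg, pos) / math.pi
    intensity = math.hypot(neg, pos)
    return omega, intensity


def smcci(tweets):
    """Intensity-weighted mean polarity of the daily sample, scaled by 100."""
    pairs = [polar(pos, neg) for pos, neg in tweets]
    total_intensity = sum(i for _, i in pairs)
    if total_intensity == 0:
        return 0.0  # assumption: an empty or fully neutral day scores 0
    return 100.0 * sum(w * i for w, i in pairs) / total_intensity


if __name__ == "__main__":
    day = [(0.57, 0.20), (0.24, 0.63), (0.70, 0.08)]
    print(round(smcci(day), 2))
```

A tweet with equal positive and negative scores has polarity 0, while a purely positive or purely negative tweet reaches $+1$ or $-1$, which matches the range stated for $\omega$.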

In order to make the index robust against these possible contaminations, we developed a “surveillance system” which searches for anomalous values in the daily time series by means of an outlier detection routine. We carefully verify all these outlier points before admitting them into the index series.
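The outlier detection routine itself is not detailed here. Purely as an illustration, a simple candidate is a trailing-window rule based on the median and the median absolute deviation (MAD); the window length, the threshold and the function name below are all hypothetical choices and do not describe the routine actually used in the surveillance system.

```python
from statistics import median


def flag_outliers(series, window=7, threshold=3.5):
    """Flag indices whose value is far from the median of the previous `window` days.

    Hypothetical rule: robust z-score |x - median| / (1.4826 * MAD) > threshold.
    """
    flagged = []
    for idx in range(window, len(series)):
        past = series[idx - window:idx]
        med = median(past)
        mad = median(abs(x - med) for x in past)
        scale = 1.4826 * mad or 1e-9  # avoid division by zero on flat windows
        if abs(series[idx] - med) / scale > threshold:
            flagged.append(idx)
    return flagged


if __name__ == "__main__":
    daily_index = [1.0, 1.2, 0.8, 1.1, 0.9, 1.0, 1.1, 12.0, 1.0]
    print(flag_outliers(daily_index))
```

In such a scheme the flagged days would only be candidates: as stated above, each point is checked manually before being admitted into the index series.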

3.2 The WordEmBox and WE models

Word embeddings have proven to be a powerful tool for extracting word representations from wholly unstructured sets of textual messages by means of an unsupervised machine learning model [13]. The insight underlying vector representations of words is the “distributional hypothesis”: “You shall know a word by the company it keeps”. The output of WE is a word vector space that, in general, captures syntactic and semantic regularities within language patterns. Every relationship appears as a relation-specific vector offset, enabling vector-oriented reasoning.

These classes of models share some common characteristics, namely the input space, the output space, and the architecture of the neural network. The available neural network architectures are Skip-Gram and CBOW. The latter is generally more suitable for short unstructured text and is therefore the architecture chosen for our word embedding simulations. These models require the fine-tuning of several hyper-parameters to enhance the quality of the output model; a detailed description is beyond the scope of the present paper.

Nevertheless, the main hyper-parameters are: (i) embedding space dimension: the size of the vector space to which the words of the corpus are mapped; (ii) window size: the width of the sliding window defining the size of the context; (iii) iterations: the number of training epochs of the neural network. As both the embedding space dimension and the number of iterations increase, the accuracy of the word embedding model is expected to increase. By contrast, the accuracy peaks at a window size that depends on the analysed corpus and decreases for larger or smaller values.

3.2.1 WordEmBox

WordEmBox is an open-source tool developed by Istat with the aim of providing a set of functionalities for exploring Word Embedding (WE) vector spaces [11]. WordEmBox is compatible with models generated by Word2Vec and FastText. These models produce context-independent word representations, i.e., each word has a unique vector representation (embedding) that combines all the different senses of the word.

The WordEmBox provides a set of functionalities, based on WE models standard outputs:

affinity: for a given word, it provides a list of the most affine words, i.e., the words that are closest in the output vector space according to the cosine distance;

analogy: it solves analogies between pairs of words. Given three words, two of which form a pair with a semantic relationship (such as Italy and Rome), it should return the word that completes the second pair, e.g., the capital of a chosen country;

graph analysis: a function peculiar to WordEmBox in which, starting from a root seed word, the list of the $n$ most affine words is displayed in a graph. This operation is then applied recursively to each node for a user-defined number of iterations.

In our simulations we mainly use the ‘graph analysis’ functionality, which allows us to explore the surroundings of both the root node and the nodes reached in later iterations. This graph makes it possible to investigate topics related to the seed/root word, e.g., filter keywords, and to create closed topic clusters, i.e., clusters for which the number of connected nodes saturates as the number of iterations increases.
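The affinity and graph analysis functions can be sketched in a few lines. This is not the WordEmBox code: the function names, the dictionary-of-vectors input and the toy two-dimensional embeddings are assumptions chosen only to show the recursive expansion and its saturation.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors given as sequences of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def most_affine(word, vectors, n):
    """The n words closest to `word` according to cosine similarity."""
    scored = [(cosine(vectors[word], vec), other)
              for other, vec in vectors.items() if other != word]
    return [other for _, other in sorted(scored, reverse=True)[:n]]


def affinity_graph(seed, vectors, n, iterations):
    """Expand the n most affine words of each newly reached node; return the edges."""
    edges, frontier, seen = set(), [seed], {seed}
    for _ in range(iterations):
        next_frontier = []
        for node in frontier:
            for neighbour in most_affine(node, vectors, n):
                edges.add((node, neighbour))
                if neighbour not in seen:
                    seen.add(neighbour)
                    next_frontier.append(neighbour)
        frontier = next_frontier  # empty once the cluster has saturated
    return edges


if __name__ == "__main__":
    toy = {  # hypothetical 2-d embeddings, for illustration only
        "clima": (1.0, 0.1),
        "riscaldamento": (0.9, 0.2),
        "ambiente": (0.8, 0.3),
        "calcio": (0.0, 1.0),
    }
    print(most_affine("clima", toy, 2))
    print(sorted(affinity_graph("clima", toy, 1, 3)))
```

When no new node is reached the frontier becomes empty and further iterations add nothing, which corresponds to the closed topic clusters mentioned above.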

3.2.2 WE model performance

The optimality of WE models can be addressed in terms of the robustness of their results as well as their validity. More specifically, WE performances are evaluated in terms of: (i) stability, i.e., a measure of the robustness of WE models obtained through intrinsic evaluation based on word similarity (affinity) while varying a set of hyper-parameters; (ii) validity, i.e., a test of the optimality of WE models through extrinsic evaluation based on word analogies, using a set of “ad-hoc” analogies developed for research purposes. There are few studies in the literature that address the stability of WE models [47, 48].

Given two WE models $E_{1}$ and $E_{2}$, a vocabulary of $V_{s}$ words, and, for each word $w$, the list of its $K$ most affine words in each model, i.e., its $K$ nearest neighbours ($\textit{KNNE}_{1}$ and $\textit{KNNE}_{2}$), we can evaluate the stability per word, as in [48], as:

$\displaystyle\textit{Stability}(w)=\frac{\#\left(\textit{KNNE}_{1}(w)\cap\textit{KNNE}_{2}(w)\right)}{K}$

Then the stability between two models is defined as:

$\displaystyle\textit{Stability}(E_{1}\cap E_{2})=\frac{\sum_{i=1}^{V_{s}}\textit{Stability}(w_{i})}{V_{s}}$
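The two stability measures translate directly into code. In the minimal sketch below, each model is represented by a dictionary mapping a word to its ranked list of nearest neighbours; this data layout, the function names and the toy neighbour lists are assumptions made for illustration.

```python
def word_stability(knn1, knn2, k):
    """Share of the K nearest neighbours of a word that both models have in common."""
    return len(set(knn1[:k]) & set(knn2[:k])) / k


def model_stability(neighbours1, neighbours2, vocabulary, k):
    """Mean per-word stability over the vocabulary."""
    total = sum(word_stability(neighbours1[w], neighbours2[w], k)
                for w in vocabulary)
    return total / len(vocabulary)


if __name__ == "__main__":
    # Hypothetical neighbour lists for two models E1 and E2.
    e1 = {"clima": ["a", "b", "c", "d"], "energia": ["x", "y", "z", "w"]}
    e2 = {"clima": ["b", "a", "e", "f"], "energia": ["x", "q", "r", "s"]}
    print(model_stability(e1, e2, ["clima", "energia"], k=4))
```

The order of the neighbours within the top $K$ does not matter here: only the overlap between the two sets is counted.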

In the present study, as the major purpose is to validate the filter and to carry out an implicit topic modelling on a set of keywords, we disregarded an extensive evaluation on all the hyper-parameters. A detailed analysis is carried out in [47, 48]. Both studies show that the nearest-neighbour distances are highly sensitive to small changes in the training corpus for a variety of algorithms. For all methods, the inclusion of specific documents in the training set can cause substantial variations in the output space. However, in [48] they show that, given a corpus, after 20 iterations all the models converge to a stability value and increasing the number of iterations does not provide any enhancement. They also show that the optimality of the window size depends upon the text under examination.

In the following simulations our WE models are generated using Word2Vec [43] applied to a subset of Istat-filter tweets, sampled using both CC-filter-preliminar and CC-filter of a certain period.

With respect to the specific hyper-parameters used when training our models, we have selected:

CBOW architecture, which is more suitable for the short informal nature of our corpus of tweets. In addition to that, when compared to Skip-gram it provided analogies that are more consistent.

Number of iterations equal to 20, which should ensure a robust convergence of the model.

The output vector dimension was set to 200.

The reference window size was set to 8 because, again, it was intrinsically validated by means of analogies.

Nevertheless, in Section 4.2 we extensively evaluate the stability between models with respect to two criteria: architecture and window size.

4. Results

4.1 Relevance of the CC filter

The validation of the filter is a two-step procedure: i) a top-down approach in which experts choose a preliminary list of keywords; ii) a bottom-up approach in which a data-driven analysis through WE graphs makes it possible to discard some terms or to enrich the filter with correlated terms that were not initially included.

Before analysing the text of tweets through WE, we clean and pre-process the data. We create our training corpus in several steps: removing stop words, removing URLs and punctuation, and converting all words to lower case.
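The cleaning steps listed above can be sketched as follows. The stop-word list is a tiny illustrative sample, the URL pattern is a simple heuristic, and treating the hash sign as punctuation (so that hashtags become plain words) is an assumption; the actual pre-processing may differ on each of these points.

```python
import re
import string

# Tiny illustrative stop-word list; a real one would be much longer.
STOP_WORDS = {"il", "la", "di", "e", "che", "per", "un", "una"}
URL_RE = re.compile(r"https?://\S+|www\.\S+")
# Map every ASCII punctuation character to a space.
PUNCT_TABLE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def clean_tweet(text):
    """Lower-case, strip URLs and punctuation, drop stop words; return tokens."""
    text = URL_RE.sub(" ", text.lower())
    text = text.translate(PUNCT_TABLE)
    return [tok for tok in text.split() if tok not in STOP_WORDS]


if __name__ == "__main__":
    print(clean_tweet("Il clima cambia! Leggi https://example.com/x #ambiente"))
```

URLs are removed before punctuation so that their fragments do not survive as spurious tokens.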

Starting from the initial set of words of the Istat filter, a subset of words connected to the environment was selected. We then added some words not contained in the Istat filter, such as climate or climatic, and specific climate events such as rain or drought. In this way, we extracted from the collected base of tweets those pertinent to the theme under analysis. We decided to skip natural catastrophes such as earthquakes or volcanic eruptions in order not to record too many negative events. We then trained the WE model on a corpus consisting of 2016–2017 tweets in order to validate our set of keywords. We chose this time interval because we wanted to analyse semantic relations prior to the rise of the global climate movement in 2018, so as not to risk distorted representations. In order to avoid any bias in the sentiment index, we decided not to include specific words or expressions related to the climate movement, such as “Friday for future” or “global strike”. However, expressions such as “sustainable development” or “global warming” were included. We then tested words with multiple meanings, such as environment and energy. It turned out, for instance, that energy (see Fig. 3, on the right) is linked, as expected, to elettrico (electric) and idrico (hydric), but also to adrenalina (adrenaline) and, in general, to the field of emotions. For this reason, we decided to break the word energy down into composite expressions such as “electric energy” and so on. In some cases, this analysis led us to add words such as “green” (see Fig. 3, on the left): while “green” is linked to words related to the environment, the Italian word verde (green) showed no good semantic relationships.

Figure 3.

WE graph of the word green on the left and of the word energia (energy) on the right.

In other cases, this data-driven analysis led us to add to the filter some words that had not previously been taken into account. For instance, deforestation appeared to be strongly affine to environment. A final remark concerns the main sources of contamination over time, induced by new words and their popularity. Since summer 2021, the word green has mainly been associated with “green pass”; in order to avoid any possible bias, tweets containing the expression “green pass” were filtered out.

4.2 Robustness of WE models

As the size of the corpus increases, WE models trained on large structured texts usually become more accurate, at least for frequent words. When WE models are trained on Twitter texts, which contain many hashtags and neologisms whose usage decays very rapidly, it is not evident how they vary with the choice of hyper-parameters. Twitter may be seen as a large unstructured text full of sparse words. Moreover, it is not evident whether a corpus size exceeding a certain value may cause convergence problems.

We then decided in order to validate results to extend the period of analysis of tweet for the whole period 2016-2021 of the tweet samples of the initial filter. We then analysed 20 million of tweets for a total of 634 million of tokens, roughly 10 times bigger than in [48].

We set our vocabulary as the intersection of words (thus excluding composite expressions) with the words of the cleaned corpus having at least 5 occurrences. Thus, the size turned out to be almost 21 thousand records. We then computed the stability and its standard deviation as reported in Table 2, for the whole vocabulary size and for the filter keywords. We computed stability for 25 $k$ -neighbours.
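As an illustration of the kind of computation involved, the sketch below assumes that the stability of a word is the share of its $k$ nearest neighbours (by cosine similarity) that two models have in common; the exact definition used in this study is the one given in the Methods section. The function names and the toy two-dimensional embeddings are hypothetical, and the standard deviation is computed as a population standard deviation.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors given as sequences of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest_neighbours(word, embedding, k):
    """Return the set of the k words most similar to `word`."""
    target = embedding[word]
    scored = [(cosine(target, vec), other)
              for other, vec in embedding.items() if other != word]
    scored.sort(reverse=True)
    return {other for _, other in scored[:k]}


def stability(word, emb_a, emb_b, k=25):
    """Share of the k nearest neighbours of `word` common to both models."""
    shared = nearest_neighbours(word, emb_a, k) & nearest_neighbours(word, emb_b, k)
    return len(shared) / k


def mean_and_sd(values):
    """Mean and population standard deviation of a list of stabilities."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, math.sqrt(variance)


if __name__ == "__main__":
    # Toy embeddings (hypothetical values) standing in for two trained models.
    emb_a = {"clima": [1.0, 0.0], "cop": [0.9, 0.1],
             "meteo": [0.5, 0.5], "calcio": [0.0, 1.0]}
    emb_b = {"clima": [1.0, 0.0], "cop": [0.8, 0.2],
             "meteo": [0.1, 0.9], "calcio": [0.7, 0.3]}
    print(stability("clima", emb_a, emb_b, k=2))
```

Averaging `stability` over all vocabulary words, or over the filter keywords only, would give figures comparable in kind to the means and standard deviations reported for each pair of models.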

Table 2
Stability evaluation of Word2Vec models

Architecture	Window size	Vocabulary (mean)	Vocabulary (SD)	Filter keywords (mean)	Filter keywords (SD)
CBOW	5 vs 8	0.44	0.22	0.60	0.14
CBOW vs skip-gram	5	0.21	0.18	0.24	0.17
CBOW vs skip-gram	8	0.45	0.23	0.60	0.12
Skip-gram	5 vs 8	0.46	0.22	0.60	0.12
CBOW	8 vs 10	0.57	0.21	0.71	0.10

Results are coherent with [48] and [47], which report a mean stability of 45% for Word2Vec. The highest stability is obtained with CBOW when comparing windows of size 8 and 10. It is worth noting that different architectures, when using a window of 5, lead to completely different results. However, for the purposes of our analysis the results are satisfactory, at least for our keywords, as can be seen for the word “climate”, whose overall stability is 0.84, as shown in Table 3 (which lists 21 shared neighbours out of 25). Moreover, the time span makes it possible to observe the link between the word climate and the conferences on climate change, as well as climate change, global warming and the global strike for future. Most of the related terms are hashtags rather than plain words.

4.3 Volume series analysis and comparison with Google Trends

Table 3
List of the first 25 stable affinities obtained comparing E1 (CBOW window $=$ 5) vs E2 (CBOW window $=$ 8) for the word clima (climate)

Word_English	Word_original_tweet	Affinity_E1	Affinity_E2
parisagreement	#accordodiparigi	0.60	0.51
Conference of the Parties (climate change conference)	cop	0.60	0.56
2018 COP conference	katowice	0.58	0.47
#parisagreement	#parisagreement	0.58	0.51
climatechange	#cambiamentoclimatico	0.56	0.47
2016 COP conference	marrakech	0.54	0.51
United Nations Framework Convention on Climate Change	unfcc	0.54	0.51
#climatechange	#climatechange	0.54	0.48
climatechanges	#cambiamenticlimatici	0.53	0.49
globalwarming	#riscaldamentoglobale	0.52	0.43
#globalwarming	#globalwarming	0.51	0.51
2017 COP conference	bonn	0.50	0.44
#climatecrisis	#climatecrisis	0.50	0.46
2021 COP conference	glasgow	0.50	0.48
climatecrisis	#crisiclimatica	0.49	0.39
#stopclimatechange	#stopclimatechange	0.49	0.50
#climateaction	#climateaction	0.47	0.43
#italianclimate	#italianclimate	0.46	0.43
#globalstrikeforfuture	#globalstrikeforfuture	0.44	0.41
Integrated National Plan for Energy and Climate	pniec	0.45	0.40
moreactionsforclimate	#piùazioniperilclima	0.44	0.40

Figure 4.

Daily volume series of SMCCI.

Applying the CC filter to our tweet database for the period between October 1st 2016 and December 31st 2021, we obtain a set of 16 million tweets. The daily volume series is shown in Fig. 4; the average value is 7,200 tweets per day, but this value is not stable over the period. Our analysis shows that, on average, the volume of tweets increased linearly and roughly doubled over two years (2017–19, from 5,000 to 9,300 tweets per day), but with Covid the interest went back towards 2017 levels, increasing again only in September 2021. More precisely, the average daily volume is 4,000 in 2016, 5,000 in 2017, 7,300 in 2018, 9,300 in 2019, 6,200 in 2020, 6,800 in 2021 and 9,800 in 2022.
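Yearly averages such as those quoted above can be derived from the daily series with a simple aggregation. The sketch below is illustrative only: the function name `yearly_mean_volume` and the input layout (a mapping from calendar day to tweet count) are assumptions, and the numbers in the example are made up.

```python
from collections import defaultdict
from datetime import date


def yearly_mean_volume(daily_counts):
    """Average number of tweets per day for each calendar year.

    `daily_counts` maps a datetime.date to the number of tweets on that day.
    Only the days present in the mapping contribute to the average.
    """
    totals = defaultdict(int)
    days = defaultdict(int)
    for day, count in daily_counts.items():
        totals[day.year] += count
        days[day.year] += 1
    return {year: totals[year] / days[year] for year in sorted(totals)}


if __name__ == "__main__":
    # Made-up daily counts, for illustration only.
    sample = {
        date(2017, 1, 1): 4000,
        date(2017, 1, 2): 6000,
        date(2018, 1, 1): 7300,
    }
    print(yearly_mean_volume(sample))
```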

In addition, a level shift in the volume series is observed from November 2018, coinciding with the rise of the climate movement founded by Greta Thunberg.

The figure also shows the most relevant peaks of the series, which are related to specific events: ecological crises, such as the waste emergency in Rome on May 9th 2017 and the accusation of dumping toxic waste made against the migrant ship Aquarius on November 20th 2018; and public events against climate change, such as the Friday for Future strike on March 15th 2019, the Global climate strike on October 27th 2019, World Environment Day on June 5th 2020 and the International Day of Clean Air for Blue Skies on September 7th 2022.

Figure 5.

Comparison of the interest in “climate” recorded in Google Trends and in SMCCI.

Figure 6.

Social mood on climate change index (SMCCI).

To understand how well our tweet volumes capture the overall internet interest in climate, we compared the time series of normalized shares of tweets containing the word “climate” (a filter keyword that accounts for over 30% of the sampled tweets) with Google Trends for climate in Italy. Google Trends allows analysing the popularity of top search queries in Google Search across various regions and languages (Italy and Italian in our case), and uses graphs to compare the search volume of different queries over time. The comparison was carried out over five years on weekly time series, as shown in Fig. 5; for comparability, we computed the shares as the absolute number of tweets containing the word climate divided by the maximum number of such tweets over the considered period.

Figure 7.

Comparison of the monthly moving average of the social mood on climate change index (SMCCI) with its preliminary index.

Figure 8.

WE graph of the word agenda.

These five-year weekly series show a correlation of 0.7, which suggests that the tweets are consistent with the importance of environmental issues in the media and in web searches. We disregarded minor expressions such as climate change or emergency, because they show very low interest in Google Trends, and selected among the set of keywords the word that has both the highest interest in Google Trends and the maximum absolute frequency in our index. Some differences may be due to the viral nature of the Twitter platform.
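The comparison described above (weekly counts divided by their maximum over the period, then correlated with the Google Trends series) can be sketched as follows. The function names are hypothetical and the Pearson coefficient is assumed as the correlation measure, since the text does not state which one was used. Google Trends values are already scaled relative to the peak of the period (on a 0–100 scale), and the Pearson coefficient does not change under such rescaling.

```python
import math


def normalise_by_max(counts):
    """Scale a series so that its maximum equals 1."""
    peak = max(counts)
    return [c / peak for c in counts] if peak else [0.0 for _ in counts]


def pearson(x, y):
    """Pearson correlation coefficient of two equally long series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have the same length (at least 2)")
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (sd_x * sd_y)


if __name__ == "__main__":
    # Made-up weekly values, for illustration only.
    weekly_tweets = [1200, 1500, 3000, 1800]
    google_trends = [35, 40, 100, 55]
    print(pearson(normalise_by_max(weekly_tweets), google_trends))
```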

4.4 Sentiment index analysis and major concerns related to the climate emergency

Fig. 6 shows the SMCCI, with comments on the most relevant positive and negative peaks. Major negative peaks are observed in August and are related to extremely high temperatures. In particular, the lowest value of the index was recorded in August 2020, when two workers died from the heat on the same day in Northern Italy (Pordenone and Bologna). Positive peaks are observed when the Government either promotes measures for eco-sustainability (September 2021, Decree of the Ministry of Ecological Transition) or declares its commitment to environmental measures (August 29th 2019, Conte II Government). If we smooth the daily index with a 30-day moving average (see Fig. 6), we observe that the index starts decreasing in June, reaches a local minimum in August and returns to May levels in late September. September 2021 is an exception, probably because of the Global Strike organized in that period. The summer fall of 2020 is less evident, probably owing to the general optimism related to the easing of Covid containment measures and to the positive environmental impact of the lockdown (the index keeps increasing until May 2021).

In general, we can infer that extreme temperatures induce negative peaks (and, owing to global warming, elevated temperatures are becoming more frequent).
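The 30-day smoothing mentioned above can be sketched as a simple moving average. The sketch assumes a trailing window (each value averages the current day and the 29 preceding ones); the text does not say whether the window actually used was trailing or centred, and the function name is hypothetical.

```python
from collections import deque


def moving_average(values, window=30):
    """Trailing moving average; None is returned until a full window is available."""
    if window <= 0:
        raise ValueError("window must be a positive integer")
    buffer = deque(maxlen=window)
    total = 0.0
    smoothed = []
    for value in values:
        if len(buffer) == window:
            # The oldest value is about to be dropped from the window.
            total -= buffer[0]
        buffer.append(value)
        total += value
        smoothed.append(total / window if len(buffer) == window else None)
    return smoothed


if __name__ == "__main__":
    # Made-up daily index values, smoothed with a short window for readability.
    print(moving_average([1, 2, 3, 4, 5], window=3))
```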

4.5 Comparison of sentiment index within the two filters

As mentioned in Section 4.1, it is important to highlight that the sets of tweets selected with the final version of the filter are not simply subsets of those selected with the preliminary version. As a result, the two filters have an intersection, but the comparison of the index between the two filtering settings is not straightforward.

The correlation between the volumes of the two versions is quite high (0.90), whereas the correlation between the daily index values of the two versions is only 0.59. At least when dealing with additive outliers, the choice of the filter therefore has a non-trivial impact. The second version of the filter looks more reliable according to the word embedding analysis and produces no false outliers, whereas with the previous filter eight days were “contaminated” by a high presence of out-of-context tweets in the daily sample. A rough quality indicator such as the share of genuine outliers may thus be taken as evidence of a quality improvement, at least in terms of relevance.

The impact on moving averages is less clear, as shown in Fig. 7, which plots a monthly moving average. In this smoothed series, the dynamics from December 2017 to September 2020 are very similar, suggesting that the two filtering assumptions are quite robust with respect to the rising climate change movement. The dynamics in 2017 and 2021 look different, owing to the introduction of some keywords such as temperatures. The overall volatility of the second series is higher; however, the moving average series show a good correlation (0.78).

4.6 Discussion in Italy about the climate agenda

We decided to use tweets from September 1st to November 2nd 2021 for the following reasons: i) we wanted to focus on recent data; ii) we observed an increase in the volume of tweets; iii) the level of the social mood was lower than in May 2021, which was anomalous. In total, we trained the WE model on 500,000 tweets. In this case, as the size of the corpus is much smaller, representations tend to be more robust. As we can observe in Fig. 8, where the graph of the word agenda is plotted, one major link is to climate (clima).

The other two links are to program (programma), which is related to mobility (mobilità) and development (sviluppo), and to politics (politica), in particular to Draghi, the Prime Minister, and Cingolani, the Minister of Ecological Transition (see the positive peak of the index). If we analyse the climate branch, we observe the commitment of young people (giovani, ragazzi) to the climate movement and COP (the UN Climate Change Conference of the Parties), which began on October 31st 2021.

5. Conclusions

With respect to the present study, it is evident that the Twitter debate on climate change reached its maximum in 2019, although the volumes of 2020 and 2021 are still 70% higher than those of 2016–2017. As for the dynamics of the daily raw index, it is quite difficult to identify a clear trend: the series presents numerous spikes, i.e. additive outliers related to identifiable events, mainly extreme weather conditions such as anomalous summer heat (negative) or public policies in favour of renewable energies and sustainable development (positive). As for the mood of Italian Twitter users, we observe a substantial seasonal fall during summer: the raw index appears to show a yearly seasonality, associated with a negative mood in summer.

From a topic modelling point of view, results confirm that the debate on climate change is primarily followed by a young population; the use of many hashtags also confirms the presence of a closed community.

For the first time, we exploited the set of tweets of the Istat filter to produce a new daily index. As the results seem very encouraging, we plan further investigations to measure other phenomena related to climate and the environment and to isolate the latent components of the time series by means of an econometric analysis, in order to identify the possible presence of seasonal patterns and trend.

These actions need to be implemented prior to the publication of an experimental index by Istat.

In general, the production and publication of new experimental statistics by a national statistical office (experimental because the target population is unknown, as are the underlying tweet-generating processes) involve several aspects:

The relevance of the topic and the lack of official statistical measurements in this area are crucial arguments; moreover, the climate change theme is increasingly popular, and the publication of such an index should meet users’ demand for information.

The long-term sustainability of the production process and of the source must be guaranteed: this should be ensured by the Twitter policy, which, unlike other social media data sources, permits a costless data collection process. It also ensures real-time indexes, which allow timely evaluations by policy makers.

An internal revision and validation should be carried out by the subject-matter statisticians who produce environmental statistics, so as to enhance the informative content of the index for users and to provide useful insights for interpreting climate-related phenomena.

We tried to address the concerns of Italian Twitter users about climate. Results show that, in terms of volumes and debate, our findings are coherent with the global youth strike observed in the rest of the world, as recorded by the growth of Greta Thunberg’s movement and by Google Trends.

Footnotes

A detailed description of Word-Emotion Association Lexicon is available from: https://saifmohammad.com/WebPages/NRC-Emotion-Lexicon.htm.

A detailed description of the Hedonometer project is available from: https://hedonometer.org/about.html.

Sentix vocabulary is available from: https://valeriobasile.github.io/twita/sentix.html.

References

1. Jung, Petkanic, Nan, Kim. When a Girl Awakened the World: A User and Social Message Analysis of Greta Thunberg. Sustainability. 2020; 12: 2707.

2. Dahal, Kumar SAP. Topic modeling and sentiment analysis of global climate change tweets. Soc Netw Anal Min. 2019; 9(24).

3. Depaula. Climate science communication on Twitter: A topic modeling analysis of US federal government agencies. In: iConference 2020 Proceedings. Available from: http://hdl.handle.net/2142/106591.

4. Fang QJW. Topic Modelling and Sentiment Analysis of Global Warming Tweets: Evidence From Big Data Analysis. Journal of Organizational and End User Computing (JOEUC). 2022; 34(3).

5. Cody, Reagan, Mitchell, Dodds, Danforth. Climate Change Sentiment on Twitter: An Unsolicited Public Opinion Poll. Plos One. 2015; 10(8): e0136092.

6. Number of users of leading social networks in Italy in March 2021. Statista.com; 2021. Available from: https://www.statista.com/statistics/787390/main-social-networks-users-italy/.

7. Censis. 13° rapporto censis-ucsi sulla comunicazione i media tra élite e popolo. 2016. Available from: http://www.censis.it/17?shadow_pubblicazione=120570.

8. Righi, Gentile, Bianco. Who Tweets in Italian? Demographic Characteristics of Twitter Users. In: New Statistical Developments in Data Science. SIS 2017. Springer Proceedings in Mathematics & Statistics. Springer; 2019. vol. 288.

9. Bruno, Catanese, Scannapieco, Valentino. Natural language processing in official statistics: The social mood on economy index experience. Statistical Journal of the IAOS. 2022; 38: 1-9.

10. I.stat, the online data warehouse of Istat. Available from: http://dati.istat.it/?lang=en.

11. Bruno, Catanese, De Cubellis, De Fausti, Pugliese, Scannapieco, Valentino. Analyzing textual data through Word Embedding: experiences in Istat. In: Proceedings of the 51st Scientific Meeting of the Italian Statistical Society. Caserta; 22-24 June 2022. pp. 571-583.

12. Liu, Zhang. A Survey of Opinion Mining and Sentiment Analysis. Springer: Mining Text Data; 2012. pp. 415-463.

13. Pang, Lee, Vaithyanathan. Thumbs up? Sentiment Classification using Machine Learning Techniques. In: Proceedings of the 2002 Conference on Empirical Methods in Natural Language Processing. Philadelphia, PA, USA; 6–7 July 2002. pp. 79-86.

14. Mukherjee, Joshi. Author-Specific Sentiment Aggregation for Polarity Prediction of Reviews. In: Proceedings of the Ninth International Conference on Language Resources and Evaluation, LREC 2014. Reykjavik, Iceland; 2014.

15. Diamantini, Mircoli, Potena. A Negation Handling Technique for Sentiment Analysis. In: Proceedings of the 2016 International Conference on Collaboration Technologies and Systems, CTS 2016. Orlando, FL, USA; 31 October–4 November 2016.

16. Perikos, Hatzilygeroudis. Aspect based sentiment analysis in social media with classifier ensembles. In: Proceedings of the 16th IEEE/ACIS International Conference on Computer and Information Science, ICIS 2017. Wuhan, China; 24–26 May 2017.

17. Pota, Esposito, Pietro. A Forward-Selection Algorithm for SVM-Based Question Classification in Cognitive Systems. In: Proceedings of the Intelligent Interactive Multimedia Systems and Services 2016. Puerto de la Cruz, Tenerife, Spain; 15–17 June 2016.

18. Manning, Raghavan, Schütze. Introduction to Information Retrieval. Cambridge University Press; 2008.

19. Berger, Pietra VJD. A Maximum Entropy Approach to Natural Language Processing. Computational Linguistics. 1996; 22: 39-7.

20. Pota, Fuggi, Esposito, Pietro. Extracting Compact Sets of Features for Question Classification in Cognitive Systems: A Comparative Study. In: Proceedings of the 10th International Conference on P2P, Parallel, Grid, Cloud and Internet Computing, 3PGCIC 2015. Krakow, Poland; 4–6 November 2015.

21. Basile, Nissim. Sentiment analysis on Italian tweets. In: Proceedings of the 4th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis. 2013. pp. 100-107.

22. Pelosi. SentIta and Doxa: Italian Databases and Tools for Sentiment Analysis Purposes. In: Proceedings of the Second Italian Conference on Computational Linguistics CLiC-it 2015. 2015. pp. 226-231.

23. Di Gennaro, Rossi. The FICLIT+CS@UniBO System at the EVALITA 2014 Sentiment Polarity Classification Task. In: Proceedings of the First Italian Conference on Computational Linguistics CLiC-it 2014. 2014.

24. Bolioli, Salamino, Porzionato. Social Media Monitoring in Real Life with Blogmeter Platform. In: Proceedings of the First International Workshop on Emotion and Sentiment in Social and Expressive Media: Approaches and Perspectives from AI (ESSEM 2013). 2013. vol. 1096. pp. 156-163.

25. Castellucci, Croce, Basili. A Language Independent Method for Generating Large Scale Polarity Lexicons. In: Proceedings of the Tenth International Conference on Language Resources and Evaluation LREC 2016. Portorož, Slovenia; 23–28 May 2016.

26. Cambria, Xing, Poria, Kwok. SenticNet 6: Ensemble Application of Symbolic and Subsymbolic AI for Sentiment Analysis. In: Proceedings of the CIKM ’20: The 29th ACM International Conference on Information and Knowledge Management. 2020. pp. 105-114.

27. Mikolov, Sutskever, Chen, Corrado, Dean. Distributed Representations of Words and Phrases and their Compositionality. In: Proceedings of the Advances in Neural Information Processing Systems. 5–8 December 2013. pp. 3111-3119.

28. Ali, El-Sappagh SHA, Islam SMR, Ali, Attique, Imran, Kwak. An intelligent healthcare monitoring framework using wearable sensors and social networking data. Future Generation Computer Systems. 2021; 114: 23-43.

29. Pennington, Socher, Manning. GloVe: Global Vectors for Word Representation. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing. 2014.

30. Cao, Rei. A Joint Model for Word Embedding and Word Morphology. In: Proceedings of the 1st Workshop on Representation Learning for NLP. 2016.

31. Bojanowski, Grave, Joulin, Mikolov. Enriching Word Vectors with Subword Information. Transactions of the Association for Computational Linguistics. 2017; 5: 135-146.

32. Pan, Yang, Wang, Tang, Cambria E. Learning Word Representations for Sentiment Analysis. Cognitive Computing. 2017; 9: 843-885.

33. Yadav, Vishwakarma. Sentiment analysis using deep learning architectures: A review. Artificial Intelligence Review. 2020; 53: 4335-4385.

34. Thakkar, Pinnis. Pretraining and Fine-Tuning Strategies for Sentiment Analysis of Latvian Tweets. In: Human Language Technologies, the Baltic Perspective. Proceedings of the Ninth International Conference Baltic HLT 2020. Kaunas, Lithuania: IOS Press; 2020. pp. 55-61.

35. Polignano, Basile, de Gemmis, Semeraro, Basile. ALBERTO: Italian BERT Language Understanding Model for NLP Challenging Tasks Based on Tweets. In: Proceedings of the Sixth Italian Conference on Computational Linguistics. Bari, Italy; 2019.

36. Ortis, Suarez Sagot, Romary. Asynchronous Pipeline for Processing Huge Corpora on Medium to Low Resource Infrastructures. In: 7th Workshop on the Challenges in the Management of Large Corpora (CMLC-7). Cardiff, United Kingdom; 2019.

37. Devlin, Chang, Lee, Toutanova. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019. Minneapolis, MN, USA; 2–7 June 2019.

38. Samek, Montavon, Lapuschkin, Anders, Müller. Explaining Deep Neural Networks and Beyond: A Review of Methods and Applications. Proceedings of the IEEE. 2021 March; 109(3): 247-278.

39. Pota, Ventura, Catelli, Esposito. An Effective BERT-Based Pipeline for Twitter Sentiment Analysis: A Case Study in Italian. Sensors. 2021.

40. Zardetto. Using Twitter Data for the Social Mood on Economy Index. In: Atti della XIII Conferenza nazionale di statistica. 2020. pp. 385-390.

41. Catanese, Scannapieco, Bruno, Valentino. The Italian Social Mood on Economy Index: recent methodological developments. In: Proceedings of the 16th international conference on statistical analysis of textual data (JADT). 2022. vol. 1. pp. 213-220.

42. Castellucci, Croce, Basili. Acquiring a Large Scale Polarity Lexicon through Unsupervised Distributional Methods. In: Natural Language Processing and Information Systems – 20th International Conference on Applications of Natural Language to Information Systems, Lecture Notes in Computer Science. Springer; 2015.

43. Mikolov, Yih, Zweig. Linguistic regularities in continuous space word representations. In: Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2013. pp. 746-751.

44. De Fausti, De Cubellis, Zardetto. Word Embeddings: a Powerful Tool for Innovative Statistics at Istat. JADT. 2018.

45. Levy, Goldberg, Dagan. Improving Distributional Similarity with Lessons Learned from Word Embeddings. Transactions of the Association for Computational Linguistics. 2015; 3: 211-225.

46. Bianchi, Nozza, Hovy. FEEL-IT: Emotion and Sentiment Classification for the Italian Language. In: Proceedings of the Eleventh Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis. 2021. pp. 76-83.

47. Bénédicte, Tanguy. Towards Qualitative Word Embeddings Evaluation: Measuring Neighbors Variation. North American Chapter of the Association for Computational Linguistics. 2018.

48. Angana, Manash, Amit. Are Word Embedding Methods Stable and Should We Care About It? In: Proceedings of the 32nd ACM Conference on Hypertext and Social Media. 2021.
