Abstract
Social network users often post their opinions after reading a news article. By analyzing these responses, it is possible to find the diverse emotions expressed in them. When several users react to an article, a distribution of these emotions accumulates. Writers and publishers would benefit from an estimate of how users will react to an article. This work proposes a method to predict the distribution of emotions that users would express on Twitter after reading a news article. Since more than one emotion can be expressed in the responses, this distribution is modeled as a supervised multi-target classification problem. For this purpose, a corpus of Spanish news articles and their associated responses was collected, and a group of annotators tagged the emotions expressed in the responses. This strategy naturally models instances (news articles) that have more than one associated class (emotions expressed in responses). The predicted values are expressed as the percentage of responses in which each specific emotion was expressed. The proposed method is evaluated by measuring the deviation of the predicted emotion distribution from the annotated one, obtaining a precision above 90%. In addition, the proposed method was applied to an external corpus in order to compare it with 10 state-of-the-art methods. Results show that the proposed method performs better than 9 of these methods on this corpus.
Keywords
Introduction
Social media allows new interactions between writers and readers. The writer of a post can receive several responses from readers as soon as it is published, and a direct exchange of opinions then starts. This particular form of communication has attracted the attention of the Sentiment Analysis community because both writers and readers express emotions during their conversation. There have been several efforts to automatically determine emotions from the readers’ perspective.
In [12] a method to classify news articles based on the emotions they evoke in readers was proposed. The authors collected 17,743 news articles from Yahoo! China along with the emotions users expressed from a fixed set of 8 emotions: happy, angry, sad, surprised, heartwarming, awesome, bored and useful. The authors extracted features such as unigrams, bigrams and metadata, and also used a lexicon to obtain the emotion categories of words. An SVM was then trained with these features on 12,079 articles and tested on 5,664 articles. Their results show an accuracy of 87.9% in predicting the predominant emotion (a measure also known as Acc@1) expressed by users for each article. In [18] the size of the previous corpus was increased by collecting 25,975 articles for training and 11,441 for testing. The authors used Support Vector Regression (SVR) for each emotion to predict its percentage of votes in a news article, and then sorted the emotions according to the estimated percentages. To evaluate the results, the Acc@n measure was used, which considers a proposed emotion list to be correct if its first n emotions are the same, and in the same order, as the first n emotions in the true emotion list. Acc@1 was 75%, while Acc@8 (the complete ranking of emotions) was lower than 10%. In [18] the authors proposed a model that can distinguish topics of a background theme from topics of a contextual theme. A corpus of 4,570 news articles with user ratings across eight emotions (touching, empathy, boredom, anger, amusement, sadness, surprise and warmness) was collected from the Sina website. The authors reported an Acc@1 of 54.95%. In addition, they proposed a fine-grained metric to measure how well a method predicts the distribution of emotions compared to the annotated votes over all emotion labels: the averaged Pearson correlation coefficient (AP). Values of this metric range from -1 to 1, where 1 indicates a perfect positive correlation. For this work a value of 0.52 AP was reported.
In [11] the authors created an opinion network whose nodes represent social opinions and whose edges represent the relations between them. To create the network, the authors trained word vectors on the most recent Wikipedia corpus; the semantic distance between news articles was then calculated with these word vectors, and predictions are made by neighbor analysis. To evaluate the method, the authors used the same corpus as Rao [18] and another corpus specially created for their work. The reported results were 0.62 AP and 61.27% Acc@1 on the Rao corpus, and 0.64 AP and 58.59% Acc@1 on their own corpus.
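The two evaluation measures used in these works can be sketched in a few lines of Python. This is a minimal sketch, not the code of [11], [12] or [18]: emotions are identified by their position in a score vector, ranking ties are broken by position (an assumption), and AP is taken as the Pearson correlation between the predicted and annotated distributions of each article, averaged over all articles.

```python
import math
from typing import List, Sequence


def top_n(scores: Sequence[float], n: int) -> List[int]:
    """Positions of the n highest-scoring emotions, highest first.

    Ties are broken by position (an assumption; the cited works do not specify it).
    """
    order = sorted(range(len(scores)), key=lambda i: (-scores[i], i))
    return order[:n]


def acc_at_n(predicted: Sequence[Sequence[float]],
             true: Sequence[Sequence[float]], n: int) -> float:
    """Acc@n: share of articles whose first n predicted emotions equal the
    first n true emotions, in the same order."""
    hits = sum(top_n(p, n) == top_n(t, n) for p, t in zip(predicted, true))
    return hits / len(true)


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; undefined (division by zero) for constant vectors."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def averaged_pearson(predicted: Sequence[Sequence[float]],
                     true: Sequence[Sequence[float]]) -> float:
    """AP: per-article Pearson correlation, averaged over all articles."""
    return sum(pearson(p, t) for p, t in zip(predicted, true)) / len(true)


# Hypothetical vote shares of one article over three emotions.
print(acc_at_n([[0.2, 0.5, 0.3]], [[0.1, 0.6, 0.3]], 1))  # 1.0
```

Note that Acc@n only looks at the order of the top emotions, while AP rewards predicting the whole shape of the distribution.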
The method proposed in this paper predicts the expected distribution of emotions that readers would express in response to a news article. This problem has proved difficult to solve with methods such as linear regression [13] or topics related to emotions [20]; therefore, exploring a new approach is worthwhile. The proposal described in Section 2.3 uses a multi-target strategy to solve the problem. To the best of our knowledge, a multi-target strategy has not been used before to predict the distribution of emotions. In addition, the comparison of the proposed method against similar methods (see Section 3) shows that the multi-target strategy obtains better results than 9 of the 10 compared methods and comes very close to the topline method, despite the simplicity of the features used for training.
This paper is structured as follows: Section 2 presents the proposed method. Section 3 describes the experiments. Section 4 discusses the obtained results, and Section 5 draws conclusions and outlines future work.
Proposal
This section describes the created corpus, along with the specifications of the annotation process, as well as the multi-target strategy followed by the proposed method to predict the distribution of emotions.
Corpus development
A good source of news linked to readers’ responses is Twitter. This social network has become the media’s favorite platform for posting news; that is why almost 85% of trending topics are headlines or persistent news [9]. To the best of our knowledge after a thorough search, there is no freely available corpus of news articles along with the reactions of Twitter users. Therefore, in this work a corpus with this information was created. News articles were collected from three Mexican newspapers, El Universal, La Jornada and Excelsior, from 01/01/2016 to 01/01/2017. Responses were collected from the replies to each newspaper’s official Twitter account. An example of a news article and Twitter user responses follows:
News article headline published by La Jornada (originally in Spanish): “@lajornadaonline: He is not my president!, they shout in US cities after Trump’s victory.” A sample of the corresponding Twitter user responses (translated from Spanish):
@user1: @lajornadaonline @realDonaldTrump They are making pressure, and it is obvious that this is not necessarily going to change the state of things...
@user2: @lajornadaonline Polarization has generated radicalizations that threaten to rise tone.
@user3: @lajornadaonline @realDonaldTrump That is democracy, accept it; it was a plot.
As can be seen, each Twitter user response expresses emotions, but these emotions are not always explicitly stated in its content. In order to identify the emotions in the responses, a set of human annotators was asked to perform this task. Four annotators were provided with 288 news articles collected from the three selected newspapers and 3,542 Twitter user responses (with an average of 11 responses per article). Table 1 shows the number of news articles and responses by newspaper. The emotions that annotators had to identify were defined based on the six basic emotions described in [22]: Shaver et al. proposed a hierarchical structure of emotions, and from this hierarchy the emotions love, joy, surprise, sadness, anger and fear were defined as basic. A generalized version of Cohen’s kappa (multi-kappa) presented in [3] was used to determine inter-annotator agreement. The agreement was 0.48, which [10] considers a moderate inter-annotator agreement.
Collected corpus
Users who reply to news articles can express multiple emotions in their responses. These emotions can be counted to determine the frequency (votes) of each emotion. To illustrate this idea, consider that, after reading a news article, Twitter users have expressed some emotions from the set of six basic emotions in ten replies. Table 2 shows the expressed emotions, indicated with a checkmark.
Reactions of users to a news article
The information in Table 2 can be represented as percentages by dividing the total count of each emotion by the number of responses. For instance, Love was expressed in 50% of the responses, Joy in 100%, and Fear in 0%. These percentage values will be called the distribution of emotions from now on. This notion is useful to determine the predominant emotion from a set of emotions (Joy), cf. [12]; to determine the ranking of emotions (Joy, Anger, Love, Surprise, Sadness and Fear), cf. [13]; and, more importantly for this work, to determine in which proportion each emotion was expressed in the responses to a news article.
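As a minimal sketch of this computation, the snippet below turns emotion counts into a distribution and a ranking. Table 2 is not reproduced here, so the counts are an assumption: Love, Joy and Fear follow the percentages stated above, and the remaining counts are those implied by the example tuple and ranking given in this section, with the emotions listed in the order love, joy, surprise, sadness, anger and fear.

```python
from typing import Dict, List

# Hypothetical counts for the ten responses of the Table 2 example.
COUNTS = {"Love": 5, "Joy": 10, "Surprise": 4, "Sadness": 3, "Anger": 7, "Fear": 0}
N_RESPONSES = 10


def distribution(counts: Dict[str, int], n_responses: int) -> Dict[str, float]:
    """Percentage of responses that expressed each emotion."""
    return {emotion: 100 * c / n_responses for emotion, c in counts.items()}


def ranking(dist: Dict[str, float]) -> List[str]:
    """Emotions sorted from most to least expressed (ties keep input order)."""
    return sorted(dist, key=dist.get, reverse=True)


dist = distribution(COUNTS, N_RESPONSES)
print(dist["Love"], dist["Joy"], dist["Fear"])  # 50.0 100.0 0.0
print(ranking(dist)[0])                         # Joy (predominant emotion)
print(ranking(dist))  # ['Joy', 'Anger', 'Love', 'Surprise', 'Sadness', 'Fear']
```

Because each response may express several emotions, the percentages do not need to add up to 100%.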
A multi-target strategy requires the targets to be explicitly defined. A fixed set of values allows these values to be used as targets for the predictive model. Accordingly, percentage ranges were used instead of total votes. The resulting set of percentage values is composed of 11 values, from 0% to 100% in 10% increments. The same normalization process was applied to each set of responses so that every news article is associated with its corresponding distribution of emotions. For example, the value for Joy can be one of {0%, 10%, 20%, ..., 100%}, and the same holds for the rest of the emotions.
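The normalization to the 11 target values could look like the sketch below. The text does not state how fractions falling between two steps are handled, so rounding half up to the nearest 10% is an assumption here; `quantize` and `emotion_tuple` are hypothetical helper names.

```python
from typing import Sequence, Tuple

TARGET_VALUES = tuple(range(0, 101, 10))  # 0%, 10%, ..., 100%: 11 values


def quantize(count: int, n_responses: int) -> int:
    """Map count / n_responses to the nearest multiple of 10% (half rounds up)."""
    if n_responses <= 0 or not 0 <= count <= n_responses:
        raise ValueError("expected 0 <= count <= n_responses and n_responses > 0")
    # Integer form of floor(10 * count / n_responses + 0.5), avoiding float error.
    step = (20 * count + n_responses) // (2 * n_responses)
    return 10 * step


def emotion_tuple(counts: Sequence[int], n_responses: int) -> Tuple[int, ...]:
    """Quantized distribution of emotions for one news article."""
    return tuple(quantize(c, n_responses) for c in counts)


# Hypothetical article with 11 responses.
print(emotion_tuple([4, 11, 0, 2, 7, 1], 11))  # (40, 100, 0, 20, 60, 10)
```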
An analysis was made to find out the distribution values of each emotion in the corpus. Some emotions are concentrated in specific percentages. For instance, in the El Universal newspaper the responses expressed 0% of the Love emotion in 60 out of 100 news articles. As a consequence, 7 of the 11 possible percentage values for this emotion were not associated with any news article. A similar situation occurs in the other newspapers; hence, the created corpus is imbalanced. Classes that are not equally represented are difficult for Machine Learning methods to learn properly, and this problem causes less frequent cases to be ignored during prediction. The effects of the imbalanced corpus are further addressed in Section 4.
Predicting emotion distributions can be modeled as a classification problem. Following a supervised approach, news articles are considered instances, while their distributions of emotions become their classes. More specifically, the distribution of emotions is a tuple in which each value represents the percentage of responses that expressed the corresponding emotion. For instance, for Table 2 the tuple would be (50%, 100%, 40%, 30%, 70%, 0%). A Machine Learning method can be trained with instances and tuples, and the resulting model would be able to predict the tuples of unseen news articles.
There is a problem related to the kind of classes the model needs to predict. Most Machine Learning methods are used for binary classification (two mutually exclusive classes) or multi-class classification (more than two mutually exclusive classes). If each emotion is considered a class, then the model is required to predict 6 classes that are not mutually exclusive. Well-known methods like Naïve Bayes and SVM cannot handle this kind of information directly. When a classification problem has to deal with non-exclusive classes, it is known as a multi-label classification problem. In multi-label classification each input instance is associated with a set of classes instead of just one, so the classes are not disjoint [7]. Many real-life problems are multi-label: a song may belong to more than one genre, and a novel can have elements of science fiction, thriller and horror at the same time. An important restriction of multi-label problems is that they only handle two possible values for each class: 1 if the class is associated with the input instance and 0 if it is not. As defined in Subsection 2.2, each emotion can take 11 possible values, so the problem in this case is known as a multi-dimensional or multi-target classification problem [7].
Multi-target classification is a challenging task because of the number of classes the classifier needs to learn. In the case of emotion distribution tuples, there are six emotions and each one has eleven possible values, that is, 11^6 = 1,771,561 possible outputs, any of which the classifier may be expected to predict.
Multi-class problems can be cast as binary problems by following one-vs-one or one-vs-all strategies [4]. Once the problem is reduced, binary classifiers can be used straightforwardly. A similar transformation applies to multi-label and multi-target problems. To simplify the explanation of these transformations, a multi-label transformation method is explained first.
Binary Relevance (BR) [2] is a transformation method that generates as many binary datasets as there are classes, using each one to train a binary classifier. Table 3 shows a multi-label dataset in which each instance is associated with one to three classes (A, B and C). The BR transformation was applied to this dataset, and the result is shown in Table 4. BR created three binary datasets, each specialized in one particular class. For instance, dataset 1 is specialized in class A, so all instances that contain this class are positive examples, while all instances that do not contain class A are negative examples. A binary classifier trained on this dataset determines whether an instance belongs to the positive examples, meaning that it is associated with class A, or to the negative examples, meaning that it is not. The other two datasets are specialized in classes B and C, respectively. When a new test sample arrives, it is given to each individual classifier, and their predictions are joined to obtain the final set of classes.
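The BR transformation and prediction step can be sketched as follows. Table 3 is not reproduced here, so the four-instance dataset is hypothetical, and each trained binary classifier is stood in for by any function that returns 0 or 1.

```python
from typing import Callable, Dict, List, Set, Tuple

Dataset = Dict[str, Set[str]]  # instance id -> set of associated classes


def binary_relevance(dataset: Dataset,
                     labels: List[str]) -> Dict[str, List[Tuple[str, int]]]:
    """One binary dataset per class: 1 = positive example, 0 = negative example."""
    return {label: [(inst, int(label in classes)) for inst, classes in dataset.items()]
            for label in labels}


def br_predict(classifiers: Dict[str, Callable[[str], int]], instance: str) -> Set[str]:
    """Give the instance to every binary classifier and join the positive answers."""
    return {label for label, clf in classifiers.items() if clf(instance) == 1}


data = {"I1": {"A"}, "I2": {"A", "B"}, "I3": {"A", "B", "C"}, "I4": {"C"}}
print(binary_relevance(data, ["A", "B", "C"])["A"])
# [('I1', 1), ('I2', 1), ('I3', 1), ('I4', 0)]
```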
Example of multi-label dataset
BR transformation of the multi-label dataset
There are other methods, like Label Powerset (LP) [2] and Random k-Labelsets (RAkEL) [23], that transform the multi-label problem into a multi-class problem. Further description of multi-label transformation methods can be found in [7].
A method similar to BR can be applied to a multi-target dataset. Table 5 shows an example of a multi-target dataset, where four news articles have been associated with their corresponding tuple of emotions’ distribution (Love, Joy, Surprise, Anger, Sadness, Fear). The result of applying the BR method is shown in Tables 6 and 7. As can be seen, the transformation method has created 66 binary datasets (6 emotions × 11 percentage values), each one specialized in an emotion with a particular percentage value. For instance, dataset Surprise40% has N3 and N4 as positive examples because these news articles have the value 40% for the Surprise emotion in their tuples, while the rest of the news articles are negative examples. There is an inherent problem with multi-target datasets: the sparsity of instances among classes. Most of the time, this situation creates imbalanced datasets. This characteristic affects the results of the proposed method, as explained in Section 4.
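Adapting BR to multi-target data amounts to creating one binary dataset per (emotion, value) pair. In the sketch below the four news tuples are hypothetical (Table 5 is not reproduced), chosen so that N3 and N4 share the value 40% for Surprise; the last line counts the binary datasets that end up with no positive example at all, which illustrates the sparsity problem.

```python
from typing import Dict, List, Tuple

EMOTIONS = ("Love", "Joy", "Surprise", "Anger", "Sadness", "Fear")
VALUES = tuple(range(0, 101, 10))  # 11 possible percentages


def br_multi_target(dataset: Dict[str, Tuple[int, ...]]) -> Dict[str, List[Tuple[str, int]]]:
    """One binary dataset per emotion/value pair, e.g. 'Surprise40%'."""
    binary: Dict[str, List[Tuple[str, int]]] = {}
    for j, emotion in enumerate(EMOTIONS):
        for value in VALUES:
            binary[f"{emotion}{value}%"] = [
                (news_id, int(t[j] == value)) for news_id, t in dataset.items()
            ]
    return binary


news = {  # hypothetical tuples (Love, Joy, Surprise, Anger, Sadness, Fear)
    "N1": (0, 10, 0, 70, 70, 30),
    "N2": (10, 20, 10, 80, 60, 20),
    "N3": (0, 10, 40, 90, 80, 50),
    "N4": (0, 30, 40, 60, 70, 30),
}
datasets = br_multi_target(news)
print(len(datasets))                                         # 66
print([n for n, y in datasets["Surprise40%"] if y == 1])    # ['N3', 'N4']
print(sum(1 for rows in datasets.values() if not any(y for _, y in rows)))  # 48
```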
Example of multi-target dataset
BR transformation adapted to multi-target dataset (part 1)
To summarize, the multi-target strategy makes it possible to tackle the prediction of emotion distributions as a classification problem that could not otherwise be handled directly with binary, multi-class or even multi-label approaches. Transformation methods like BR create specialized binary datasets for each possible combination of emotion and percentage value, so that well-known methods like SVM and Naïve Bayes can be used. To our knowledge, this is the first time a multi-target strategy has been used to predict the distribution of emotions.
Experiments
This section describes the experiments performed to train and use a model that predicts the tuples of emotions’ distribution. It also explains how the corpus was processed, the multi-target methods applied to it and the classifiers used to create the model. Finally, the metrics used to evaluate the model and the obtained results are presented.
Preprocessing
Raw text collected from the electronic versions of the selected newspapers requires preprocessing in order to extract useful features to train the model. The first preprocessing task is to split the text into tokens, a procedure known as tokenization. Tokenization can use the space character to separate tokens, but it is important to consider special characters like periods, commas and semicolons, which can also separate words. A procedure that tokenizes the text of the news articles was implemented.
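The tokenizer itself is not described in detail. One possible sketch, assuming a regular expression (the pattern and the `tokenize` name are hypothetical) that keeps numbers such as 3,542 together, treats runs of letters and digits as words, and makes every other non-space symbol, including the Spanish ¡ and ¿, a token of its own:

```python
import re
from typing import List

# Numbers with internal separators, words (Unicode-aware \w covers á, ñ, ü...),
# or any single non-space, non-word character such as . , ; ¡ ¿
TOKEN_RE = re.compile(r"\d+(?:[.,]\d+)*|\w+|[^\w\s]")


def tokenize(text: str) -> List[str]:
    return TOKEN_RE.findall(text)


print(tokenize("¡No es mi presidente!, gritan en EU."))
# ['¡', 'No', 'es', 'mi', 'presidente', '!', ',', 'gritan', 'en', 'EU', '.']
```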
The numbers of tokens (total number of words) and types (number of distinct words) in the 288 news articles were 74,930 and 12,353, respectively. Sparsity of words can affect the performance of the model: if words were used as features, there would be 12,353 different features. Sparsity can be reduced by applying lemmatization, the process of reducing a word to its lemma, so that all the inflections of a word are grouped together into a single item. The Spanish lemmatizer provided by the language analysis suite Freeling [15] was used. Freeling has 12 morphological modules that can be turned on or off. For the purpose of this work it was useful to keep numbers and dates with their original values, so the number detection and date detection modules were turned off to prevent Freeling from replacing those values with a predefined tag. The rest of the modules were left on by default. Further details of the morphological modules in Freeling can be found in its online manual.
After lemmatization, 118,282 tokens and 8,245 types were obtained. The number of tokens increased because Freeling splits some Spanish contractions, such as “del” into “de el”, before lemmatizing. Despite the increase in tokens, the number of types decreased by 33% (from 12,353 to 8,245), which is a significant reduction in sparsity.
Feature extraction
In this work one-hot encoding was selected to represent instances. In one-hot encoding, the words of a text are represented as a vector whose dimension corresponds to the number of different words in the corpus (types), and a binary value indicates the presence or absence of each of these words in the text. Although this text representation seems simple, it has proved to be effective in Sentiment Analysis tasks [5, 16].
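A minimal sketch of this binary representation, assuming the vocabulary is built from the lemmatized training articles and that words unseen during training are simply ignored (the lemmas in the example are hypothetical):

```python
from typing import Iterable, List


def build_vocabulary(documents: Iterable[List[str]]) -> List[str]:
    """Sorted list of the types (distinct lemmas) found in the training articles."""
    return sorted({token for doc in documents for token in doc})


def one_hot(doc: List[str], vocabulary: List[str]) -> List[int]:
    """1 if the vocabulary word occurs in the article, 0 otherwise."""
    present = set(doc)
    return [1 if word in present else 0 for word in vocabulary]


train = [["trump", "ganar", "elección"], ["protesta", "trump", "trump"]]
vocab = build_vocabulary(train)           # ['elección', 'ganar', 'protesta', 'trump']
print(one_hot(["trump", "muro"], vocab))  # [0, 0, 0, 1]  ('muro' is unseen)
```

Repeated words still produce a single 1, which is what distinguishes this encoding from the frequency vectors used later for the external datasets.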
Multi-target classification
As explained in Subsection 2.3, the proposed multi-target strategy can deal with tuples of emotion distributions by transforming the original dataset into binary or multi-class datasets. Problem transformation methods created for multi-label classification, like Binary Relevance or Label Powerset, can easily be adapted to perform these transformations (see Tables 6 and 7).
BR transformation adapted to multi-target dataset (part 2)
Some state-of-the-art problem transformation methods have been implemented in MEKA, a suite of multi-label methods. MEKA [21] is an extension of the popular Machine Learning suite WEKA [6], so it can use all the features and resources available in WEKA, such as feature selection methods, binary and multi-class classifiers and clustering methods. MEKA also provides multi-label algorithms that have been adapted to work with multi-target data. Given the availability of multi-target algorithms in MEKA, this tool was selected to perform the experiments.
Several problem transformation methods are available in MEKA; in particular, BR, LP and RAkEL were selected because of the good results they have obtained in the literature [14]. BR transforms the original dataset into binary datasets, while LP and RAkEL create multi-class datasets. Once the dataset has been transformed, binary or multi-class classifiers can be used; Naïve Bayes, Random Forest and SVM were selected for this purpose.
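For comparison with BR, the sketch below shows the idea behind LP in the multi-target setting: every distinct emotion tuple seen in training becomes one class of a single multi-class problem. It is a simplified illustration, not MEKA's implementation, and the tuples are hypothetical. RAkEL, roughly speaking, applies the same idea to random subsets of K labels (K = 3 in these experiments) and combines their votes; that part is omitted here.

```python
from typing import Dict, List, Tuple

EmotionTuple = Tuple[int, ...]


def label_powerset(
    dataset: Dict[str, EmotionTuple],
) -> Tuple[List[Tuple[str, int]], Dict[int, EmotionTuple]]:
    """Replace each emotion tuple by a class id; also return the id -> tuple decoder."""
    class_of: Dict[EmotionTuple, int] = {}
    transformed: List[Tuple[str, int]] = []
    for news_id, t in dataset.items():
        cls = class_of.setdefault(t, len(class_of))  # unseen tuple -> next free id
        transformed.append((news_id, cls))
    tuple_of = {cls: t for t, cls in class_of.items()}
    return transformed, tuple_of


news = {
    "N1": (0, 10, 0, 70, 70, 30),
    "N2": (0, 10, 0, 70, 70, 30),
    "N3": (10, 20, 40, 50, 60, 20),
}
rows, decode = label_powerset(news)
print(rows)         # [('N1', 0), ('N2', 0), ('N3', 1)]
print(len(decode))  # 2 classes, far fewer than the 11**6 possible tuples
```

A consequence of this design is that LP can only predict tuples that occur in the training data.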
The following list specifies the different features, multi-target methods, classifiers and related parameters (using default values in MEKA) considered in the experiments.
Features
- Binary values of one-hot encoding (OH)
Multi-target transformation methods
- Binary Relevance (BR): no parameters needed
- Label Powerset (LP): sub-sampling value (N) = 0; pruning value (P) = 0; seed value (S) = 0
- Random k-Labelsets (RAkEL): number of labels per partition (K) = 3; sub-sampling value (N) = 0; pruning value (P) = 0; seed value (S) = 0
Classifiers
- Naïve Bayes (NB): no parameters needed
- Random Forest (RF): size of each bag, as a percentage of the training set (P) = 100; number of iterations (I) = 100; number of execution slots (num-slots) = 1; number of attributes (K) = 0; minimum number of instances (M) = 1; minimum variance for split (V) = 0.001; seed value (S) = 1
- Support Vector Machine (SVM): complexity constant (C) = 1; tolerance parameter (L) = 0.001; epsilon for round-off error (P) = 1.0E-12; number of folds for the internal cross-validation (V) = -1; seed value (W) = 1; kernel (K) = PolyKernel; calibration model (calibrator) = Logistic
There are traditional metrics to evaluate the performance of classifiers. Accuracy is popular because it measures how many instances were correctly classified out of all the predictions the classifier made. Precision, Recall and F-measure are other well-known metrics, but in multi-label and multi-target classification these metrics are of limited use. The huge number of possible class predictions makes traditional metrics too strict, and they may ignore good partial predictions that classifiers obtain. According to the calculation made for the multi-target corpus of this work, an instance can have 11^6 different tuples of emotions’ distribution. A model guessing at random would have 1 chance in 1,771,561 of generating a correct prediction, while a binary classifier has 1 in 2 and a 5-class classifier has 1 in 5. Despite the difficulty of generating exact predictions, the model may still make partially good predictions. Hamming Loss is a metric that measures partial matches in the predicted classes. But even a partial-match metric ignores the relation between the values in tuples. In the specific problem of this work, the emotional distribution value 10% is closer to 20% than to 100%. Therefore, it is preferable to predict a close value even if it is not exactly the expected one.
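The limitation just described can be seen with a small sketch of Hamming Loss computed over emotion tuples (here, the fraction of emotions whose predicted value differs from the expected one; the tuples are hypothetical): a 10-point miss and a 90-point miss are penalized identically.

```python
from typing import Sequence


def hamming_loss(expected: Sequence[Sequence[int]],
                 predicted: Sequence[Sequence[int]]) -> float:
    """Fraction of (tuple, emotion) positions predicted with a wrong value."""
    total = sum(len(e) for e in expected)
    wrong = sum(a != b for e, p in zip(expected, predicted) for a, b in zip(e, p))
    return wrong / total


expected = [(10, 0, 50)]
print(hamming_loss(expected, [(20, 0, 50)]))   # 1/3 (close miss)
print(hamming_loss(expected, [(100, 0, 50)]))  # same value (far miss)
```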
For this purpose, a metric that measures the precision of the distribution of emotions in the predicted tuples was implemented. The emotional distribution precision (EDP) takes values in the range [0, 1]. EDP is 1 when all the predicted values of the tuples are the expected ones (an exact match). Conversely, EDP is 0 when the difference between the expected and predicted values is the greatest possible (0% expected but 100% predicted, or vice versa, for every emotion). To obtain the emotional distribution error, the differences between the expected and predicted values are accumulated for each emotion in each tuple and then divided by the maximum possible value of the accumulated differences. EDP is the complement of this error (EDP = 1 - emotional distribution error). An example of the use of this metric is shown in Table 8 (percentage values are shown in decimal form). In addition to this metric, the averaged Pearson correlation (AP) was also calculated. AP has been used in the state of the art [11, 18] to measure the correlation between the predicted and actual distributions of emotions; for more details see Section 1.
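Under two assumptions not spelled out in the text (absolute differences, and a maximum accumulated difference of 1.0 per emotion per tuple when values are in decimal form), EDP can be sketched as below. The values are illustrative, not those of Table 8.

```python
from typing import Sequence


def edp(expected: Sequence[Sequence[float]],
        predicted: Sequence[Sequence[float]]) -> float:
    """Emotional distribution precision: 1 - accumulated error / maximum possible error."""
    if len(expected) != len(predicted):
        raise ValueError("expected and predicted must contain the same number of tuples")
    accumulated = sum(abs(a - b) for e, p in zip(expected, predicted) for a, b in zip(e, p))
    max_accumulated = sum(len(e) for e in expected) * 1.0  # every value off by 100%
    return 1.0 - accumulated / max_accumulated


print(edp([(0.5, 1.0, 0.4, 0.3, 0.7, 0.0)],
          [(0.5, 0.9, 0.4, 0.3, 0.6, 0.0)]))  # about 0.967
print(edp([(0.0, 1.0)], [(1.0, 0.0)]))        # 0.0 (worst case)
```

Unlike Hamming Loss, predicting 0.2 when 0.1 was expected costs much less here than predicting 1.0.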
Example of the use of EDP metric
The selected algorithms were run on each newspaper and also on an integrated version of the three newspapers. For each newspaper and for the integrated version, 10-fold cross-validation was used, so that in each fold 90% of the news articles were used for training and 10% for testing.
To compare the results obtained by the multi-target methods, a baseline was defined. From the corpus analysis explained in Subsection 2.2, it is known in advance that, for a given emotion, some percentage values are much more frequent than others. The baseline is composed of the most frequent distribution value of each emotion in each newspaper. The baselines defined for each newspaper are listed below; each value of the tuple corresponds to a different emotion (Love, Joy, Surprise, Anger, Sadness, Fear):
- El Universal: (0%, 10%, 10%, 70%, 70%, 30%)
- Excelsior: (0%, 10%, 0%, 90%, 80%, 50%)
- La Jornada: (0%, 10%, 0%, 80%, 90%, 50%)
- Integrated newspapers: (0%, 10%, 0%, 90%, 80%, 50%)
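The baseline can be sketched as the per-emotion mode of the training tuples. When two values are equally frequent, this sketch keeps the one encountered first; that tie rule is an assumption, since the text does not mention ties. The training tuples are hypothetical.

```python
from collections import Counter
from typing import Sequence, Tuple


def most_frequent_baseline(tuples: Sequence[Tuple[int, ...]]) -> Tuple[int, ...]:
    """For each emotion position, the value that occurs most often in the tuples."""
    n_emotions = len(tuples[0])
    return tuple(
        Counter(t[j] for t in tuples).most_common(1)[0][0]  # ties: first encountered
        for j in range(n_emotions)
    )


train = [(0, 10, 0, 70), (0, 10, 10, 90), (10, 20, 0, 70)]  # hypothetical tuples
print(most_frequent_baseline(train))  # (0, 10, 0, 70)
```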
The best average results for EDP and AP are reported in Tables 9 and 10 respectively.
Best EDP results of each newspaper
Best AP results of each newspaper
Although, to our knowledge, no other method has been used before to predict the distribution of emotions expressed by Twitter users in reaction to news articles, a comparison against similar methods can be made in order to assess the performance of the multi-target strategy in other scenarios. In the work by Li et al. [11], described in Section 1, the correlation between the predicted and the real distributions of emotions was measured with the AP metric. This metric can also be applied to the tuples of emotions distribution predicted by the method proposed in this paper. The authors released the two datasets used in their research, which can be downloaded from GitHub. Both datasets contain news collected from the Sina website. dataset2012 is composed of 4,570 news articles published from January to April 2012 and is annotated with the users’ votes over 8 emotion labels: touching, empathy, boredom, anger, amusement, sadness, surprise and warmness. dataset2016 is composed of 5,257 news articles (see Footnotes) collected from January to December 2016 and is annotated with the users’ votes over 6 emotion labels: touching, anger, amusement, sadness, surprise and curiosity. The authors also provide the results of comparing their method with 9 other methods on the same datasets. Using these datasets therefore makes it possible to compare the multi-target strategy directly against 10 different methods.
In order to carry out a fair comparison, the datasets were used in their original language (Chinese), and the same split as Li et al. was applied to create the training and testing sets. In dataset2012 the training set is composed of the first 2,342 news articles and the testing set of the remaining 2,228. In dataset2016 the training set contains the first 3,109 news articles, and the remaining 2,148 were used for testing.
To apply the multi-target strategy in the same way as with the corpus of news and Twitter users’ responses, the tuple of emotions distribution was determined for each news article: the number of votes for each emotion was divided by the total number of votes of the news article, and the result was then normalized to 11 values (from 0% to 100% in 10% increments). After that, the same stages explained in Section 3 were used.
Development sets were created from the training sets; each development set was split into 90% for training and 10% for testing. The following steps were applied to the development sets.
Preprocessing. Tokenization used the space character that the authors had already inserted in each dataset to separate words. A method to reduce word sparsity was also required: words that had the same frequency in more than 90% of the instances were left out. The number of types (different words) was reduced from 38,842 to 2,167 in dataset2012 and from 77,611 to 2,245 in dataset2016. The large number of types before reduction is due to the datasets containing not only Chinese characters but also numbers, URLs and characters from other alphabets. The 90% threshold was defined experimentally over the training set, and the words selected in this set were also used for the testing set.
Feature extraction. A vector space model similar to the one in the previous experiments was used to represent each news article, but with word frequencies instead of binary values, because frequencies provided better results.
Multi-target methods and classifiers. The same multi-target methods and classifiers were used in this experiment. The BR multi-target transformation with the NB classifier yielded the best results; neither BR nor NB has parameters to set.
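One reading of the sparsity-reduction rule above is: discard a word when a single frequency value (most often 0, meaning the word is absent) is shared by more than 90% of the training articles. The sketch below implements that interpretation together with the frequency-based vectors; both are assumptions about details the text leaves open, and the function names and toy documents are hypothetical.

```python
from collections import Counter
from typing import List, Sequence


def select_words(train_docs: Sequence[List[str]], threshold: float = 0.9) -> List[str]:
    """Keep words whose most common frequency value covers at most `threshold` of articles."""
    vocabulary = sorted({w for doc in train_docs for w in doc})
    counts = [Counter(doc) for doc in train_docs]
    n = len(train_docs)
    kept = []
    for word in vocabulary:
        value_counts = Counter(c[word] for c in counts)  # Counter returns 0 if absent
        if value_counts.most_common(1)[0][1] / n <= threshold:
            kept.append(word)
    return kept


def term_frequencies(doc: List[str], vocabulary: List[str]) -> List[int]:
    """Frequency vector used instead of binary values for the external datasets."""
    counts = Counter(doc)
    return [counts[w] for w in vocabulary]


docs = [["x", "y"], ["x", "y", "y"], ["x"], ["x"], ["x"]]
words = select_words(docs)               # 'x' occurs exactly once in every article
print(words)                             # ['y']
print(term_frequencies(docs[1], words))  # [2]
```

The selected word list is computed on the training set only and then reused unchanged to build the test vectors.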
A baseline was created for each dataset using the most frequent percentage values in the tuples. For dataset2012 the baseline tuple is (0%, 0%, 0%, 0%, 0%, 0%, 0%, 0%), and for dataset2016 it is (0%, 0%, 0%, 0%, 0%, 0%, 0%, 10%). As these datasets are larger and more balanced than the corpus created in this work, the baseline was expected to obtain lower results than the other methods.
The final model was built with the configuration that obtained the best results on the development set, and the original testing set was then given to this model. Tables 11 and 12 show the results obtained by the methods reported in [11] and by the model created with the multi-target strategy, using the Acc@1 and AP metrics, respectively.
Results using Acc@1 metric
Results using AP metric
Discussion
Tables 9 and 10 show results very close to 1, which is the best possible value for both metrics. This suggests that the multi-target methods predict the distribution of emotions very well, but the baseline results are also very good, sometimes even better than those of the multi-target methods. Despite the simple criterion used to define the baseline, the multi-target methods do not obtain significant improvements in prediction. The reason for these close results is the imbalance of the corpus. As explained in Subsection 2.2 and shown in Tables 6 and 7, multi-target datasets usually have this problem, and the corpus created for this work is highly imbalanced. For instance, of the 100 news articles collected from the Excelsior newspaper, the Love emotion was associated with 0% in 85 articles, with 10% in 13 articles, with 20% in 1 article and with 30% in 1 article. Given these frequencies, almost every Machine Learning method will always predict 0% for the Love emotion for a new instance from that newspaper, ignoring the instances with other values. These highly frequent values make up the baseline tuples, and every algorithm used by the multi-target strategy learns the same values as the baseline, which is the best it can do given the provided corpus. To see a significant improvement from the proposed method, a larger and more balanced corpus is required.
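The effect of this imbalance can be checked with the Excelsior figures given above. The short computation below, restricted to the Love emotion and using EDP with absolute differences (an assumption about its exact form), shows that always predicting the majority value already yields a near-perfect score on that emotion.

```python
from collections import Counter

# Love values (%) for the 100 Excelsior articles, as reported in the text.
love = [0] * 85 + [10] * 13 + [20] * 1 + [30] * 1

majority = Counter(love).most_common(1)[0][0]
exact_match_rate = sum(v == majority for v in love) / len(love)
# EDP restricted to one emotion: accumulated error over a maximum of 100 points per article.
edp_love = 1 - sum(abs(v - majority) for v in love) / (100 * len(love))

print(majority, exact_match_rate, round(edp_love, 3))  # 0 0.85 0.982
```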
On the other hand, the experiments with the external corpus, which is larger and more balanced than the one created for this work, showed that the proposed multi-target strategy can be very competitive with other methods. According to the results in Tables 11 and 12, the proposed method obtained better results than all the other methods on both datasets and with both metrics, except for the Social Opinion Mining model, which is the topline. Although both the features and the classification algorithm used by the proposed model are very simple, the multi-target strategy obtained results very close to those of the topline. In contrast with the multi-target strategy, the best-ranked methods use more complex features such as topics, and create models that consider the context and use affective terms.
Another important point is that the proposed strategy worked on corpora in two different languages (Spanish and Chinese) and contexts, which suggests that it is largely independent of language and context; this is one of the advantages that allowed it to obtain good results. Finally, on the external datasets, the baseline computed in the same way as before obtained the worst results: this time, the number of instances and the balance of the corpus made it difficult for the baseline to compete against the classification algorithms. Note that the Acc@1 metric was not calculated for the baseline on dataset2012 because all its values are 0%, i.e., (0%, 0%, 0%, 0%, 0%, 0%, 0%, 0%). This metric takes the emotion with the highest predicted value as the top-1 and compares it against the true top-1. For this baseline, all eight emotions would tie as the top-1, so the comparison against the true top-1 would always succeed. Since this is not what the metric is meant to test, the case was excluded.
Conclusions
This work presented a method to predict the distribution of emotions that users would express on Twitter after reading a news article. Users can express more than one emotion in their responses, so a news article is associated with a set of emotions, each with a particular frequency. To deal with these characteristics, a multi-target strategy was proposed. Several multi-target problem transformation methods were used along with different features and classification algorithms. Results showed that, despite the imbalance of the created corpus, the multi-target strategy was able to predict the most frequent distribution values, obtaining good results. A comparison was made with 10 methods on two external datasets used for training and testing. Despite the simple features and classification algorithm used by the proposed method, results showed that it performed better than 9 of these methods and was very close to the method with the best state-of-the-art results. In addition, the multi-target strategy is language and context independent. Future work includes increasing the size of the created corpus and including more instances of the less frequent classes; ensembles of classifiers could also be used to improve predictions.
Footnotes
The authors stated in their paper that this corpus contains 5,258 news articles, but one is missing in the released version.
Acknowledgments
The authors wish to thank the Mexican Government for its support (SNI, Instituto Politécnico Nacional, SIP-IPN, COFAA-IPN projects 20181102, 20182114 and BEIFI-IPN). This work was partially funded by CONACYT under the Thematic Networks program (Language Technologies Thematic Network project 281795).
