Refinement of word embeddings with sentiment content using multi-output neural networks

Abstract

Word embeddings have been successfully used in diverse tasks of Natural Language Processing, including sentiment analysis and emotion classification, even though these embeddings do not contain any emotional or sentimental information. This article proposes a method to refine pre-trained embeddings with emotional and sentimental content. To this end, a Multi-output Neural Network is proposed to learn emotions and sentiments simultaneously. The resulting embeddings are tested in emotion classification and sentiment analysis tasks, showing an improvement over the pre-trained vectors and other state-of-the-art proposals for fine-grained emotion classification.

Keywords

Word embedding, multi-output neural network, VAD, polarity, emotion classification

1 Introduction

In Natural Language Processing (NLP), word representation is essential for many tasks. Two types of word representations exist in NLP: the first one proposed was one-hot encoding, where each word is represented by its index in a vocabulary; the second one, introduced by [2], is a vector-based model (also known as embeddings) where each word is represented in an n-dimensional space as a vector of continuous numbers. In contrast to one-hot encoding, which does not capture any semantic information, embeddings encode similarities between words as the distance or angle between word vectors, capturing various lexico-semantic relations. Despite their advantages, embeddings do not carry any emotional information; however, this limitation has not reduced their usage in emotion classification and sentiment analysis tasks.
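The contrast between the two representations can be illustrated with a minimal sketch. The three-dimensional dense vectors below are invented values chosen for illustration, not real embeddings, and the helper name `cosine` is an assumption of this example.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

vocab = ["happy", "glad", "table"]

# One-hot encoding: each word is a unit vector at its vocabulary index.
one_hot = {w: [1.0 if i == j else 0.0 for j in range(len(vocab))]
           for i, w in enumerate(vocab)}

# Toy dense vectors (hypothetical values, for illustration only).
dense = {
    "happy": [0.9, 0.8, 0.1],
    "glad":  [0.8, 0.9, 0.2],
    "table": [0.1, 0.0, 0.9],
}

# Distinct one-hot vectors are always orthogonal, so no similarity is encoded.
sim_one_hot = cosine(one_hot["happy"], one_hot["glad"])
# Dense vectors can place related words closer than unrelated ones.
sim_dense_close = cosine(dense["happy"], dense["glad"])
sim_dense_far = cosine(dense["happy"], dense["table"])
```

In this toy setting, the one-hot similarity between any two different words is zero, whereas the dense vectors rank "glad" as closer to "happy" than "table".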

Recent research in sentiment and emotion classification focuses on deep learning methods based on transformers. This paper focuses instead on sentiment-emotion word embeddings as a way to improve classification. To this end, it proposes refining pre-trained word embeddings with a Multi-output Neural Network trained on different sentiment-emotion lexicons¹.

This paper is organized as follows: Section 2 reviews the state of the art; Section 3 describes the proposed model in detail; Section 4 describes the corpora used for comparison with the state of the art and presents the results; Section 5 presents the conclusions and proposals for future work.

2 Related work

Two main approaches have been proposed in state-of-the-art to incorporate sentiment information in embeddings [15]:

–
Learning sentiment word embeddings from scratch in a combination of supervised and unsupervised learning (described in 2.1).
–
Refining pre-trained word embeddings using sentiment and emotion lexicons as resources (described in 2.2).

2.1 Sentiment word embeddings from scratch

For the first approach, [8] proposed two models: one to capture semantic similarities using a probabilistic model similar to Latent Dirichlet Allocation (LDA) to learn the word’s association strength with respect to each latent topic; the second model was used to capture word sentiment using logistic regression to predict the polarity of documents. The combination of these models maximizes the sum of the objective functions of the models. The authors tested their approach for the sentiment classification task using three corpora: Polarity Dataset, Subjectivity Dataset, both proposed by [17], and their proposed dataset collected from IMDB (Internet Movie Database, [10]). The results showed better accuracy for the semantic model than the combined model in two datasets (Polarity and Subjectivity).

Later on, [30] proposed SSWE (Sentiment Specific Word Embedding) using a balanced dataset of 10M tweets labeled as positive or negative. The authors developed a model that captures the sentiment information of sentences and the syntactic contexts of words using a specific loss (a linear combination of two hinge losses). To test their proposal, the authors used the dataset of the International Workshop on Semantic Evaluation 2013 (SemEval-2013 dataset) [16]. Compared to other algorithms evaluated on the same dataset, the proposal showed the best classification accuracy.

2.2 Refining pre-trained word embeddings

For the second approach, [1] proposed Emotion Word Embeddings (EWE), an emotion-enriched word representation that refines GloVe embeddings [19]. The authors used a Long Short-Term Memory (LSTM) and a Neural Network (NN) with a hidden layer, with the embedding matrix E (initialized with GloVe) added to the input layer. The authors used the six basic emotions of [3] (anger, surprise, disgust, enjoyment, fear, and sadness) for the word-emotion vector; this vector was concatenated to each of the inputs of the LSTM, and the outputs were then passed to the NN for the classification of the emotions. The final representation of EWE is the embedding matrix E after training. To measure emotion similarity, the authors clustered emotionally similar words, using the formula proposed by [30] and the emotion lexicon DepecheMood [28]. EWE outperformed the other embeddings evaluated: GloVe, Word2Vec [9], and SSWE [30].

In [14], based on the retrofitting approach (a technique that uses relational information from semantic lexicons to encourage related words to have comparable vector representations, thereby improving vector space representations [4]), the authors proposed the counter-fitting method, which uses a loss function to inject antonymy and synonymy constraints into vector space representations. Several works on generating emotional embeddings have since used counter-fitting.
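To make the idea of pulling related words together concrete, the following is a minimal sketch of a retrofitting-style update in the spirit of [4]. It illustrates retrofitting only, not counter-fitting and not the method proposed in this paper. The choices alpha = 1 for the original vector and beta = 1/degree for each neighbour, the function names, and the toy vectors are assumptions of this example.

```python
import math

def retrofit(embeddings, lexicon, iterations=10):
    """Pull each word vector toward its lexicon neighbours.

    embeddings: dict word -> list[float], the original vectors (not modified)
    lexicon:    dict word -> list of related words
    Assumes alpha = 1 for the original vector and beta = 1 / degree per neighbour.
    """
    new = {w: list(v) for w, v in embeddings.items()}
    for _ in range(iterations):
        for word, neighbours in lexicon.items():
            nbrs = [n for n in neighbours if n in new]
            if word not in new or not nbrs:
                continue
            beta = 1.0 / len(nbrs)
            updated = []
            for d in range(len(new[word])):
                # Original vector (weight 1) plus weighted neighbour vectors.
                num = embeddings[word][d] + sum(beta * new[n][d] for n in nbrs)
                updated.append(num / (1.0 + beta * len(nbrs)))
            new[word] = updated
    return new

def distance(u, v):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

# Toy vectors and a toy synonym lexicon (hypothetical values).
vectors = {"good": [1.0, 0.0], "great": [0.0, 1.0], "chair": [-1.0, -1.0]}
synonyms = {"good": ["great"], "great": ["good"]}

refined = retrofit(vectors, synonyms)
before = distance(vectors["good"], vectors["great"])
after = distance(refined["good"], refined["great"])
```

In this toy case the two synonyms end up closer than they started, while "chair", which has no lexicon entry, keeps its original vector.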

Afterwards, [24] used counter-fitting to generate emotional embeddings using the eight basic emotion categories proposed by [20] (anger, fear, anticipation, trust, surprise, sadness, joy, and disgust) and the National Research Council Canada Word-Emotion Association Lexicon (NRC-EmoLex) [13]. The authors showed that applying counter-fitting to different pre-trained vectors improves emotional similarity, measured against the emotion categorization of [26].

A similar approach was taken by [6], where two steps were used to obtain the embeddings: the first step was to concatenate the Valence, Arousal and Dominance (VAD²) [32] values to the pre-trained embeddings and the second step was to apply counter-fitting. The authors used the International Survey on Emotion Antecedents and Reactions (ISEAR) [31] and Twitter Emotion Corpus (TEC) [11] datasets to test their approach, comparing the results with the pre-trained embeddings. The authors showed that concatenating VAD improves classification accuracy for all the pre-trained embeddings, whereas applying the second step yields lower performance than the pre-trained embeddings.

In [15], the authors proposed Sentiment-Aware Word Embedding (SAWE) to refine GloVe by combining semantic and sentimental aspects of words. To obtain the sentiment aspect, the authors used a Feed-Forward Neural Network model to predict the polarity score (strongly positive, weakly positive, strongly negative, and weakly negative) of pre-trained word embeddings using the combination of two lexicons: the Extended version of Affective Norms of English Words (E-ANEW) [32] and Subjectivity clue [33]. After training, the senti-embeddings were obtained from a linear combination of the input matrix and the output of the hidden layer. Finally, the authors used Principal Component Analysis (PCA) to reduce the dimensionality of the concatenation of the input pre-trained word vectors with the senti-embeddings. Using the Stanford Sentiment Treebank (SST) [27] and SemEval-2013 [16] datasets, the authors showed an improvement in accuracy with respect to pre-trained embeddings and other state-of-the-art proposals, including SSWE.

[25] proposed a method to generate affective embeddings using retrofitting and VAD (Valence, Arousal, and Dominance). The authors used a Multi-Layer Perceptron to learn a transformation function using a custom loss function, and applied this transformation function to pre-trained vectors to obtain the final emotional embeddings. The authors used the SST, SemEval-2017 (International Workshop on Semantic Evaluation 2017) [23], and Mustard++ [22] datasets to test their proposal, obtaining the best micro F1-score for all datasets compared to pre-trained embeddings and other state-of-the-art works, such as EWE and counter-fitting, among others.

This work took a similar approach to the state-of-the-art. It proposed using a Multi-Output Neural Network to learn emotional embeddings using a combination of lexicons and PCA.

3 Proposed work: refinement of pre-trained word embeddings

Multi-output Neural Networks are used to predict multiple outputs given an input, allowing diverse output data types. In Natural Language Processing, sub-fields of Multi-output Learning have been used for different applications: Document Categorization, Language Translation, and Named Entity Recognition, among others [34].

This paper proposes the use of a Multi-output Neural Network for the refinement of pre-trained word embeddings. The proposal consists of three steps:

–
Learn a transformation function using a multi-output neural network to map pre-trained embeddings to an intermediate representation, which will be referred to as senti-embeddings, using a lexicon of around 20,000 words.
–
Use the learned transformation function to obtain the senti-embeddings representation for all the words present in the pre-trained embeddings.
–
Concatenate original pre-trained embeddings with senti-embeddings to preserve part of the semantic information. Using PCA, the resulting vector is reduced to obtain the final embeddings with a dimension size of 300.

Each part of the proposal is described in detail in the following subsections.

3.1 Multi-output neural network

A Multi-output Neural Network was trained to simultaneously learn emotional content (represented by Valence, Arousal and Dominance) and polarity content (negative, positive, and neutral). The purpose of this network was to learn a non-linear transformation function to generate VAD-polarity enriched embeddings. The following lexicons were used for the training:

–
NRC-VAD³ [12]: Lexicon of emotion words obtained from human ratings along three primary independent dimensions of emotion: Valence, Arousal, and Dominance. The authors used the comparative annotation technique Best-Worst Scaling (BWS [7]) to obtain the values of the most frequent words using a combination of different lexicons. This lexicon consists of 19,971 annotated words.
–
Subjectivity clue lexicon [33]: Lexicon created using a combination of manual review and empirical results on a small training set of manually annotated data. Words are annotated according to four contextual polarity labels: strongly positive, weakly positive, strongly negative, and weakly negative. This lexicon consists of 7,228 different labeled words.
–
NRC-EmoLex⁴ [13]: Lexicon that provides English words and their associations with the eight basic emotion categories proposed by [20] (anger, fear, anticipation, trust, surprise, sadness, joy, and disgust) and a polarity score (positive or negative). This lexicon consists of 14,154 words.

The network has two outputs: a regression output that predicts the VAD values and a multi-class classification output that predicts the polarity. A specific loss was used for each output, and the total loss is the sum of the individual losses per output. GloVe embeddings were used as input for the network⁵. Ten percent of the lexicon was held out for validation. The following subsections describe each part of the Multi-output Model in detail. Figure 1 shows the architecture of the proposed model.

Fig. 1
Multi-output Neural Network architecture. The model receives as input 20,618 GloVe embeddings (for those words in the VAD lexicon). Each input vector has a VAD value and a polarity value at the output.

3.1.1 VAD (Regression)

The initial size of the NRC-VAD lexicon is 19,971 words. To expand the number of words with a VAD value and a corresponding value in GloVe, if the lemma of a word present in GloVe is also present in VAD, this word is added to the lexicon with the corresponding VAD value of the lemma. The size of the final lexicon is 20,618.
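The lemma-based expansion described above can be sketched as follows. The paper does not name the lemmatizer, so `lemmatize` is a hypothetical callable, here backed by a tiny hand-written mapping; the VAD values and vocabulary are invented for illustration.

```python
def expand_vad_lexicon(vad, glove_vocab, lemmatize):
    """Add GloVe words whose lemma has a VAD entry, reusing the lemma's values."""
    expanded = dict(vad)
    for word in glove_vocab:
        if word in expanded:
            continue  # already annotated in the lexicon
        lemma = lemmatize(word)
        if lemma in vad:
            expanded[word] = vad[lemma]
    return expanded

# Toy data (hypothetical values); a real run would load NRC-VAD and GloVe.
vad = {"run": (0.60, 0.70, 0.55), "happy": (0.96, 0.73, 0.77)}
glove_vocab = ["run", "running", "runs", "happy", "table"]

# Stand-in lemmatizer: a lookup table instead of a real NLP library.
toy_lemmas = {"running": "run", "runs": "run"}

def lemmatize(word):
    return toy_lemmas.get(word, word)

expanded = expand_vad_lexicon(vad, glove_vocab, lemmatize)
```

Here "running" and "runs" inherit the VAD values of "run", while "table" stays out of the lexicon because neither the word nor its lemma is annotated.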

The regression output is the VAD value of each input word, i.e., the output layer has a size of 3 (corresponding to each VAD value) and a linear activation. Similar to the approach of [25], each of the VAD values per word was weighted using the density-based weighting scheme (DenseWeight) [29], using an alpha value of 1. The values obtained were averaged to obtain a weighting per word. Weighted Mean Squared Error (WMSE), shown in Equation 1, was used. All the words in the VAD lexicon were used to create the dataset. The pre-trained GloVe embeddings were used as input, considering only those values present in VAD; for words in VAD but not in GloVe, a random uniform initialization was used.
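The per-word weighting can be sketched as follows. This is a simplified illustration of density-based weighting, not the reference DenseWeight implementation [29]: the Gaussian kernel with a fixed `bandwidth`, the `eps` floor, the function names, and the VAD rows are all assumptions of this example.

```python
import math

def dense_weight(values, alpha=1.0, bandwidth=0.1, eps=1e-6):
    """Density-based weights for one target dimension (simplified sketch)."""
    n = len(values)
    # Gaussian kernel density estimate evaluated at every target value.
    density = [
        sum(math.exp(-0.5 * ((y - z) / bandwidth) ** 2) for z in values)
        / (n * bandwidth * math.sqrt(2 * math.pi))
        for y in values
    ]
    lo, hi = min(density), max(density)
    span = (hi - lo) or 1.0  # avoid division by zero when density is constant
    normalized = [(d - lo) / span for d in density]
    # Rare targets (low density) keep weights near 1; common ones shrink toward eps.
    raw = [max(1.0 - alpha * p, eps) for p in normalized]
    mean_raw = sum(raw) / n
    return [r / mean_raw for r in raw]  # rescale so the mean weight is 1

def word_weights(vad_rows, alpha=1.0):
    """Average the V, A, and D weights to obtain a single weight per word."""
    per_dim = [dense_weight([row[d] for row in vad_rows], alpha) for d in range(3)]
    return [sum(per_dim[d][i] for d in range(3)) / 3 for i in range(len(vad_rows))]

# Toy VAD rows: three similar words and one with unusual values.
vad_rows = [(0.50, 0.50, 0.50), (0.52, 0.48, 0.50), (0.49, 0.51, 0.52), (0.95, 0.90, 0.10)]
weights = word_weights(vad_rows, alpha=1.0)
```

With these toy values, the word whose VAD scores lie far from the others receives the largest weight, which is the intended effect of weighting rare target values more heavily.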

\begin{matrix} WMSE = \frac{1}{N} \sum_{i = 1}^{N} w_{i} \times (y\_gold_{i} - y\_pred_{i})^{2}, \end{matrix}

(1)

where N is the number of samples; w_i is the weight for sample i; y_gold_i is the target value for sample i; y_pred_i is the predicted value for sample i.
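A minimal sketch of Equation 1 for three-dimensional VAD targets is shown below. Averaging the squared error over the three VAD dimensions before applying the per-word weight is an assumption of this example, since the equation is written for a scalar target; the data values are invented.

```python
def wmse(y_gold, y_pred, weights):
    """Weighted mean squared error over N samples with VAD targets."""
    n = len(y_gold)
    total = 0.0
    for gold, pred, w in zip(y_gold, y_pred, weights):
        # Squared error averaged over the VAD dimensions (an assumption),
        # then scaled by the per-word weight w_i.
        sq_err = sum((g - p) ** 2 for g, p in zip(gold, pred)) / len(gold)
        total += w * sq_err
    return total / n

# Toy targets, predictions, and per-word weights (hypothetical values).
y_gold = [(0.9, 0.7, 0.8), (0.1, 0.4, 0.2)]
y_pred = [(0.8, 0.7, 0.6), (0.1, 0.6, 0.2)]
weights = [0.5, 1.5]

loss = wmse(y_gold, y_pred, weights)
unweighted = wmse(y_gold, y_pred, [1.0, 1.0])
```

Setting every weight to 1 reduces the function to the ordinary mean squared error, which is a quick way to check an implementation.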

3.1.2 Polarity (Multi-class classification)

The output of the Multi-class classification corresponds to the combination of the lexicons Subjectivity clue lexicon and NRC-EmoLex, using only those words that also exist in the VAD lexicon.

NRC-EmoLex

For the lexicon NRC-EmoLex, only two labels of the original dataset were used, negative and positive, plus a dimension for neutral words. Given that all words in NRC-EmoLex are present in NRC-VAD, the only words discarded were those annotated with both positive and negative values. The final number of words considered was 14,073; Table 1 shows the final distribution of words.

Table 1
Distribution of words in NRC-EmoLex after removing words with both positive and negative values

Negative	Positive	Neutral
3,235	2,227	8,611
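The filtering rule for NRC-EmoLex can be sketched as a small function. The word entries and flag values below are invented for illustration, and the function name is an assumption of this example. Given the sizes reported in the text (14,154 words originally, 14,073 kept), the rule implies that 81 words were discarded.

```python
def emolex_polarity(positive, negative):
    """Map NRC-EmoLex polarity flags (0/1) to a label, or None to discard."""
    if positive and negative:
        return None        # annotated as both: discarded
    if positive:
        return "positive"
    if negative:
        return "negative"
    return "neutral"       # neither flag set

# Toy entries as (positive, negative) flags; hypothetical values.
entries = {"joyful": (1, 0), "awful": (0, 1), "table": (0, 0), "mixedword": (1, 1)}
labels = {w: emolex_polarity(p, n) for w, (p, n) in entries.items()}
kept = {w: lab for w, lab in labels.items() if lab is not None}
```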

Subjectivity clue

Originally the Subjectivity clue lexicon contained 7,228 words; 2,781 words were absent in VAD, so they were discarded. The final number of words considered was 4,447; Table 2 shows the final distribution of words.

Table 2

Distribution of words in Subjectivity clues lexicon after removing words not present in VAD

Positive		Negative
Strongly	Weakly	Strongly	Weakly
950	680	1,839	1,035

The Subjectivity clue lexicon was mapped to three polarity classes according to the following rules:

–
Strongly_Positive and/or Weakly_Positive → Positive.
–
Strongly_Negative and/or Weakly_Negative → Negative.
–
Weakly_Positive and Weakly_Negative → Neutral.
–
Strongly_Positive and Weakly_Negative → Positive.
–
Strongly_Positive and Strongly_Negative → Neutral.
–
Weakly_Positive and Strongly_Negative → Negative.
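The six rules above are consistent with comparing the strongest positive label against the strongest negative label, which the sketch below uses as a compact encoding. The label spellings, the function name, and the treatment of cases the rules do not list (for example, a word with no labels is returned as neutral) are assumptions of this example.

```python
def subjectivity_polarity(labels):
    """Collapse a word's set of Subjectivity clue labels into one polarity."""
    def strength(strong, weak):
        # 2 for a strong label, 1 for a weak label, 0 when neither is present.
        return 2 if strong in labels else 1 if weak in labels else 0

    pos = strength("strongly_positive", "weakly_positive")
    neg = strength("strongly_negative", "weakly_negative")
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"  # equal strength on both sides

# One example per rule listed in the text.
examples = [
    ({"strongly_positive"}, "positive"),
    ({"weakly_negative"}, "negative"),
    ({"weakly_positive", "weakly_negative"}, "neutral"),
    ({"strongly_positive", "weakly_negative"}, "positive"),
    ({"strongly_positive", "strongly_negative"}, "neutral"),
    ({"weakly_positive", "strongly_negative"}, "negative"),
]
results = [subjectivity_polarity(labels) for labels, _ in examples]
```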

Combined polarity lexicon

Words that exist in the VAD lexicon but not in the combined polarity lexicon were assigned a neutral value. Table 3 shows the result of combining the words in NRC-EmoLex and Subjectivity clues with the lemmas added to VAD and the words present only in VAD (Combined polarity lexicon).

Table 3

Number of words in the combined polarity lexicon

Lexicon	Negative	Positive	Neutral	Total
Combined polarity lexicon	4,523	3,056	13,039	20,618

Given that the combined polarity lexicon distribution is imbalanced, the loss Weighted Categorical Cross-Entropy (WCCE) [5], described in Equation 3, was used. Equation 2 specifies the formula used to determine weights per class.

weights_{class} = \frac{n\_samples / (n\_classes \times y_{class})}{2},

(2)

where weights_class is the weight for a given class; n_samples is the total number of samples; n_classes is the number of classes; y_class is the number of samples labeled with that class.
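As a worked example, Equation 2 can be applied to the class counts of Table 3. The function name is an assumption of this sketch; the counts are those reported in the table.

```python
def class_weights(class_counts):
    """Per-class weights following Equation 2."""
    n_samples = sum(class_counts.values())
    n_classes = len(class_counts)
    return {
        label: (n_samples / (n_classes * count)) / 2
        for label, count in class_counts.items()
    }

# Class counts of the combined polarity lexicon (Table 3).
counts = {"negative": 4523, "positive": 3056, "neutral": 13039}
weights = class_weights(counts)
```

The resulting weights are roughly 0.76 for negative, 1.12 for positive, and 0.26 for neutral, so errors on the least frequent class (positive) are penalized the most.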

\begin{matrix} WCCE & = - \frac{1}{M} \sum_{k = 1}^{K} \sum_{m = 1}^{M} \alpha_{km}, \\ \alpha_{km} & = w_{k} \times y\_gold_{m} \times \log (y\_pred_{m}), \end{matrix}

(3)

where M is the number of samples; K is the number of classes; w_k is the weight for class k; y_gold_m is the target value for sample m; y_pred_m is the predicted value for sample m.
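A minimal sketch of Equation 3 follows. It assumes one-hot targets and predicted class probabilities, so that the target and prediction of sample m are read per class k inside the double sum; the `eps` clipping, the function name, and the toy values are also assumptions of this example.

```python
import math

def wcce(y_gold, y_pred, class_w, eps=1e-12):
    """Weighted categorical cross-entropy averaged over M samples."""
    m = len(y_gold)
    total = 0.0
    for gold, pred in zip(y_gold, y_pred):
        for k, w in enumerate(class_w):
            # Only the true class contributes when gold is one-hot.
            total += w * gold[k] * math.log(max(pred[k], eps))
    return -total / m

# Toy batch with classes ordered as (negative, positive, neutral).
class_w = [0.76, 1.12, 0.26]                    # rounded weights, for illustration
y_gold = [[1, 0, 0], [0, 0, 1]]                 # one-hot targets
y_pred = [[0.7, 0.2, 0.1], [0.1, 0.1, 0.8]]     # predicted probabilities

loss = wcce(y_gold, y_pred, class_w)
```

Because the neutral class has the smallest weight, a mistake on a neutral word contributes less to the loss than an equally confident mistake on a positive or negative word.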

3.2 Pre-trained embeddings transformation

The Multi-output Neural Network was trained for 200 epochs to learn the mapping function. The trained model was then used to obtain the senti-embedding representation of the pre-trained embeddings: the output layers of the model are removed, and the last hidden layers of each output are concatenated. The pre-trained embeddings (GloVe) are fed into the model, which yields the senti-embeddings at its output. The senti-embeddings are then concatenated with the original pre-trained embeddings to preserve part of the semantic information.
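The extraction step can be sketched with a toy forward pass. The layer sizes, the shared first layer, the ReLU activations, and the random weights standing in for trained parameters are all assumptions of this example, since the text does not specify the hidden layers; only the idea of dropping the output layers and concatenating the last hidden layer of each branch comes from the description above.

```python
import random

def dense(x, weights, bias, relu=True):
    """Fully connected layer: one weight row per output unit."""
    out = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]
    return [max(0.0, v) for v in out] if relu else out

def init_layer(n_in, n_out, rng):
    """Random weights as a stand-in for trained parameters."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

rng = random.Random(0)
EMB_DIM, SHARED, BRANCH = 8, 6, 4   # toy sizes (hypothetical)

shared = init_layer(EMB_DIM, SHARED, rng)       # assumed shared layer
vad_hidden = init_layer(SHARED, BRANCH, rng)    # last hidden layer, VAD branch
pol_hidden = init_layer(SHARED, BRANCH, rng)    # last hidden layer, polarity branch

def senti_embedding(vec):
    """Output layers removed: concatenate the last hidden layer of each branch."""
    h = dense(vec, *shared)
    return dense(h, *vad_hidden) + dense(h, *pol_hidden)

def concatenated_input(vec):
    """Original embedding followed by its senti-embedding (input to the PCA step)."""
    return list(vec) + senti_embedding(vec)

glove_vec = [rng.uniform(-1, 1) for _ in range(EMB_DIM)]   # toy stand-in vector
combined = concatenated_input(glove_vec)
```

The concatenated vector keeps the original embedding unchanged in its first `EMB_DIM` positions, which is how part of the semantic information is preserved before the dimensionality reduction.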

The concatenated senti-embeddings and pre-trained embeddings are reduced to obtain the final representation of the embeddings with a dimension of 300. Given the vocabulary size (around 2M words), IncrementalPCA [18] was used for the reduction.

4 Results

The embeddings were tested on the tasks of emotion classification and sentiment analysis using the following corpora:

–
SemEval-2017 [23]: International Workshop on Semantic Evaluation 2017, task 4A for Sentiment Analysis in Twitter. This dataset contains 62,617 sentences labeled with one of the labels: positive, negative, or neutral. The provided train/test partition was used, with an extra validation split of 20 percent of the training set.
–
SST-2 [27]: Stanford Sentiment Treebank binary variant (accessed via the library pytreebank [21]). This dataset contains 9,613 sentences labeled as positive or negative. The provided train/dev/test partition was used.
–
ISEAR [31]: International Survey on Emotion Antecedents and Reactions contains 7,666 samples labeled with seven emotions: joy, fear, anger, sadness, disgust, shame, and guilt. As in the state of the art, only five emotions were considered: joy, fear, anger, sadness, and disgust. A split of 20 percent was held out for testing.

A Bi-LSTM was used to evaluate the generated embeddings. The same GloVe embeddings used as input for the training⁵ were used as the baseline. The results of the classification are shown in Table 4.

Table 4
Sentiment classification for Bi-LSTM and different embeddings. The results reported are the average micro F1-scores and standard deviation after ten runs. * This is an approximation based on the specifications of the article. The best results are shown in bold, the second best result is underlined

Embeddings	SemEval-2017	SST-2	ISEAR
GloVe [19]	0.6523 (0.0067)	0.8181 (0.0104)	0.6340 (0.0070)
Shah et al. [25]*	0.6539 (0.0067)	0.8115 (0.0075)	0.6287 (0.0092)
EWE [1]	0.6493 (0.0054)	0.8233 (0.0087)	0.6344 (0.0073)
SAWE [15]	0.6495 (0.0054)	0.8169 (0.0100)	0.6301 (0.0069)
Proposal	0.6532 (0.0068)	0.8253 (0.0037)	0.6481 (0.0095)
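The reporting scheme of Table 4 (micro F1-score averaged over several runs, with its standard deviation) can be sketched as follows. The labels and run scores below are invented and are not the paper's results; using the sample standard deviation is an assumption, since the text does not say which estimator was used.

```python
import statistics

def micro_f1(y_true, y_pred):
    """Micro-averaged F1 for single-label classification."""
    if not y_true:
        return 0.0
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    errors = len(y_true) - tp
    # Each misclassified sample counts as one false positive and one false negative.
    fp = fn = errors
    return 2 * tp / (2 * tp + fp + fn)

def summarize(scores):
    """Mean and sample standard deviation over repeated runs."""
    return statistics.mean(scores), statistics.stdev(scores)

# Toy predictions for a single run (hypothetical labels).
y_true = ["joy", "fear", "anger", "joy"]
y_pred = ["joy", "fear", "joy", "joy"]
score = micro_f1(y_true, y_pred)

# Toy scores from three runs (hypothetical values).
mean_f1, std_f1 = summarize([0.64, 0.66, 0.65])
```

For single-label classification, the micro-averaged F1-score coincides with accuracy, which is why three correct predictions out of four give 0.75 here.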

The proposal of this article did not improve the polarity representation: its results on the sentiment analysis datasets are similar to those of the state of the art.

For the SemEval-2017 dataset, all the embeddings had a similar performance. The proposal of this paper had the second-best average micro F1-score (0.6532); the best one was obtained by the embeddings of Shah et al. [25] (0.6539). For binary classification, using SST-2, the proposal had the best performance on average (0.8253) and EWE had the second-best result (0.8233); however, considering the standard deviation, the difference is minimal.

For ISEAR, the difference with respect to the other embeddings is larger: the proposal obtained an average micro F1-score of 0.6481, compared with 0.6344 for EWE, the second-best result, making this article’s proposal the best one.

5 Conclusions

This paper proposed using a Multi-output Neural Network to create embeddings rich in emotion and polarity using a combination of lexicons and outputs.

For fine-grained classification, the proposal had the best representation of emotions; thus, using a Multi-Output Neural Network successfully captures emotional information. However, the polarity information requires improvement since the results obtained for SemEval-2017 and SST-2 are almost identical to those shown by the state of the art.

For future work, Multi-output Neural Networks will continue to be explored to create embeddings that better represent sentiments and emotions. Using a balanced lexicon, which avoids over-representing a specific class, could improve the representation of sentiments.

Acknowledgment

The work was done with partial support from the Mexican Government through grant A1-S-47854 of CONACYT, Mexico, and grants 20232138, 20232080, 20231567, and 20231387 of the Secretaría de Investigación y Posgrado of the Instituto Politécnico Nacional, Mexico. The authors thank the CONACYT for the computing resources provided to them through the Plataforma de Aprendizaje Profundo para Tecnologías del Lenguaje of the Laboratorio de Supercómputo of the INAOE, Mexico, and acknowledge the support of Microsoft through the Microsoft Latin America Ph.D. Award.

Notes

2

Valence or Pleasure: the pleasantness of a stimulus; Arousal: the intensity of emotion provoked by a stimulus; Dominance: the degree of control exerted by a stimulus.

3

NRC-VAD: National Research Council Canada Valence, Arousal, and Dominance Lexicon

4

NRC-EmoLex: National Research Council Canada Word-Emotion Association Lexicon

5

GloVe embeddings by Pennington et al. [19], available at

References

1. Agrawal, An, Papagelis, Learning emotion-enriched word representations. In Proceedings of the 27th International Conference on Computational Linguistics (pp. 950–961). Santa Fe, New Mexico, USA: Association for Computational Linguistics (2018). https://aclanthology.org/C18-1081.

2. Bengio, Ducharme, Vincent, Janvin, A neural probabilistic language model, J. Mach. Learn. Res. 3 (2003), 1137–1155.

3. Ekman, An argument for basic emotions, Cognition and Emotion 6 (1992), 169–200. https://doi.org/10.1080/02699939208411068.

4. Faruqui, Dodge, Jauhar S.K., Dyer, Hovy, Smith N.A., Retrofitting word vectors to semantic lexicons. In Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (pp. 1606–1615). Denver, Colorado: Association for Computational Linguistics (2015). https://doi.org/10.3115/v1/N15-1184.

5. Ho, Wookey, The real-world-weight cross-entropy loss function: Modeling the costs of mislabeling, IEEE Access 8 (2020), 4806–4813. https://doi.org/10.1109/ACCESS.2019.2962617.

6. Kulkarni, Bhattacharyya, Retrofitting of pretrained emotion words with VAD-dimensions and the Plutchik emotions. In Proceedings of the 18th International Conference on Natural Language Processing (ICON) (pp. 529–536). National Institute of Technology Silchar, Silchar, India: NLP Association of India (NLPAI) (2021).

7. Louviere J.J., Flynn T.N., Marley A.A.J., Best-Worst Scaling: Theory, Methods and Applications. Cambridge University Press (2015). https://doi.org/10.1017/CBO9781107337855.

8. Maas A.L., Daly R.E., Pham P.T., Huang, Ng A.Y., Potts, Learning word vectors for sentiment analysis. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies (pp. 142–150). Portland, Oregon, USA: Association for Computational Linguistics (2011). https://aclanthology.org/P11-1015.

9. Mikolov, Chen, Corrado G.S., Dean, Efficient estimation of word representations in vector space. In International Conference on Learning Representations (2013).

10. Miller F.P., Vandome A.F., McBrewster, Internet Movie Database. Alpha Press (2009).

11. Mohammad, #emotional tweets. In *SEM 2012: The First Joint Conference on Lexical and Computational Semantics – Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation (SemEval 2012) (pp. 246–255). Montréal, Canada: Association for Computational Linguistics (2012). https://aclanthology.org/S12-1033.

12. Mohammad, Obtaining reliable human ratings of valence, arousal, and dominance for 20,000 English words. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (pp. 174–184). Melbourne, Australia: Association for Computational Linguistics (2018). https://doi.org/10.18653/v1/P18-1017.

13. Mohammad, Turney, Crowdsourcing a word-emotion association lexicon, Computational Intelligence 29 (2013). https://doi.org/10.1111/j.1467-8640.2012.00460.x.

14. Mrkšić, Ó Séaghdha, Thomson, Gašić, Rojas-Barahona L.M., Su P.-H., Vandyke, Wen T.-H., Young, Counter-fitting word vectors to linguistic constraints. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (pp. 142–148). San Diego, California: Association for Computational Linguistics (2016). https://doi.org/10.18653/v1/N16-1018.

15. Naderalvojoud, Sezer E.A., Sentiment aware word embeddings using refinement and senti-contextualized learning approach, Neurocomputing 405 (2020), 149–160. https://doi.org/10.1016/j.neucom.2020.03.094.

16. Nakov, Rosenthal, Kozareva, Stoyanov, Ritter, Wilson, SemEval-2013 task 2: Sentiment analysis in Twitter. In Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013) (pp. 312–320). Atlanta, Georgia, USA: Association for Computational Linguistics (2013). https://aclanthology.org/S13-2052.

17. Pang, Lee, A sentimental education: Sentiment analysis using subjectivity summarization based on minimum cuts. In Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics (ACL-04) (pp. 271–278). Barcelona, Spain (2004). https://doi.org/10.3115/1218955.1218990.

18. Pedregosa, Varoquaux, Gramfort, Michel, Thirion, Grisel, Blondel, Prettenhofer, Weiss, Dubourg, Vanderplas, Passos, Cournapeau, Brucher, Perrot, Duchesnay, Scikit-learn: Machine learning in Python, Journal of Machine Learning Research 12 (2011), 2825–2830.

19. Pennington, Socher, Manning, GloVe: Global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP) (pp. 1532–1543). Doha, Qatar: Association for Computational Linguistics (2014). https://doi.org/10.3115/v1/D14-1162.

20. Plutchik, Chapter 1 – A general psychoevolutionary theory of emotion. In R. Plutchik, and H. Kellerman (Eds.), Theories of Emotion (pp. 3–33). Academic Press (1980). https://doi.org/10.1016/B978-0-12-558701-3.50007-7.

21. Raiman, Python package for loading Stanford Sentiment Treebank corpus (2020). https://pypi.org/project/pytreebank/.

22. Ray, Mishra, Nunna, Bhattacharyya, A multimodal corpus for emotion recognition in sarcasm. In Proceedings of the Thirteenth Language Resources and Evaluation Conference (pp. 6992–7003). Marseille, France: European Language Resources Association (2022). https://aclanthology.org/2022.lrec-1.756.

23. Rosenthal, Farra, Nakov, SemEval-2017 task 4: Sentiment analysis in Twitter. In Proceedings of the 11th International Workshop on Semantic Evaluation SemEval ’17. Vancouver, Canada: Association for Computational Linguistics (2017). https://doi.org/10.18653/v1/S17-2088.

24. Seyeditabari, Tabari, Gholizade, Zadrozny, Emotional embeddings: Refining word embeddings to capture emotional content of words (2019). https://doi.org/10.48550/ARXIV.1906.00112.

25. Shah, Reddy, Bhattacharyya, Affective retrofitted word embeddings. In Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) (pp. 550–561). Online only: Association for Computational Linguistics (2022). https://aclanthology.org/2022.aacl-main.42.

26. Shaver P.R., Schwartz J.C., Kirson, O’Connor, Emotion knowledge: further exploration of a prototype approach, Journal of Personality and Social Psychology 52(6) (1987), 1061–86.

27. Socher, Perelygin, Wu, Chuang, Manning C.D., Ng A.Y., Potts, Recursive deep models for semantic compositionality over a sentiment treebank. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, Grand Hyatt Seattle, Seattle, Washington, USA, A meeting of SIGDAT, a Special Interest Group of the ACL (pp. 1631–1642). ACL (2013). https://aclanthology.org/D13-1170.

28. Staiano, Guerini, Depeche mood: a lexicon for emotion analysis from crowd annotated news. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers) (pp. 427–433). Baltimore, Maryland: Association for Computational Linguistics (2014). https://doi.org/10.3115/v1/P14-2070.

29. Steininger, Kobs, Davidson, Krause, Hotho, Density-based weighting for imbalanced regression, Machine Learning 110 (2021), 2187–2211. https://doi.org/10.1007/s10994-021-06023-5.

30. Tang, Wei, Yang, Zhou, Liu, Qin, Learning sentiment-specific word embedding for Twitter sentiment classification. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (pp. 1555–1565). Baltimore, Maryland: Association for Computational Linguistics (2014). https://doi.org/10.3115/v1/P14-1146.

31. Wallbott H.G., Scherer K.R., How universal and specific is emotional experience? Evidence from 27 countries on five continents, Social Science Information 25 (1986), 763–795. https://doi.org/10.1177/053901886025004001.

32. Warriner, Kuperman, Brysbaert, Norms of valence, arousal, and dominance for 13,915 English lemmas, Behavior Research Methods 45 (2013). https://doi.org/10.3758/s13428-012-0314-x.

33. Wilson, Wiebe, Hoffmann, Recognizing contextual polarity in phrase-level sentiment analysis. In Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing (pp. 347–354). Vancouver, British Columbia, Canada: Association for Computational Linguistics (2005). https://aclanthology.org/H05-1044.

34. Xu, Shi, Tsang I.W., Ong Y.-S., Gong, Shen, Survey on multi-output learning, IEEE Transactions on Neural Networks and Learning Systems 31 (2020), 2409–2429. https://doi.org/10.1109/TNNLS.2019.2945133.
