On the need of hierarchical emotion classification: Detecting the implicit feature using constrained topic model

Abstract

Sina Weibo has become the most popular microblog platform in China, and research on it is growing steadily. In this paper, emotion classification of Weibo posts is addressed hierarchically using a constrained topic model and Support Vector Regression (SVR). Based on this topic model, a variant of Latent Dirichlet Allocation (LDA), an implicit emotion detection algorithm is proposed to identify underlying emotions. Constraints generated by prior-knowledge extraction approaches guide LDA so that it produces domain-specific topics. Furthermore, a hierarchical emotion structure is employed to classify emotions more precisely into 19 classes, and this hierarchy can serve different research granularities. The overall architecture aims to reduce misclassification caused by feature imbalance and to decrease labor cost. The experimental results show that our model outperforms traditional methods in terms of precision, recall and F-score.

Keywords

Text mining; emotion classification; microblog; topic model

1. Introduction

Owing to convenient networks and fast-paced modern life, virtual social contact has become a major mode of communication because of its flexibility and interactivity. People upload posts on social media platforms to express their opinions and record their daily lives. Sina Weibo, the representative Chinese microblog platform, attracts millions of Chinese users who share their ideas, views and opinions on it. Interactions among microbloggers, such as liking a post or participating in a trending topic, motivate them to post frequently, which makes microblog posts plentiful and valuable to companies and researchers. As a result, the extensive use of microblogs has drawn the attention of industry and academia, and more and more scholars are working on related studies. Since sentiment analysis has long been a hot topic in NLP, it is natural to apply it to microblog posts. However, because these messages are colloquial, they contain many slang words and abbreviations, and coping with informal texts full of grammar and spelling errors is a challenging task.

To accomplish this task, the following challenges must be taken into consideration. First, a microblog post is a short text limited to 140 characters. This length limit makes it difficult for LDA to extract topics, because each post provides few word co-occurrences. As a result, LDA tends to generate overly broad topics and suffers from sparsity: useful word pairs that appear with low frequency are ignored, while overly common words are assigned to many topics. Second, Chinese is a language given to euphemism and reserve. Consequently, many Chinese sentences do not express the microblogger’s emotions directly, although the holder’s opinions or feelings can be inferred from context. In addition, online texts are full of internet slang and abbreviations that models not equipped for them cannot recognize. In previous works, such implicit messages were discarded because distinguishing implicit subjective sentences from factual ones is difficult for traditional methods. Last but not least, the huge volume of messages creates a strong demand for unsupervised and semi-supervised models, since manual annotation becomes costly as data grows. Reducing labor effort while improving machine bootstrapping as much as possible is therefore the key point of data preprocessing.

With these considerations in mind, this paper presents a novel emotion classification architecture for Weibo posts. We use a variant of LDA to reduce the complexity of the feature space so that the classifier can be trained effectively. We then choose SVR as the classifier because it can determine the threshold automatically. SVR is implemented in the LIBSVM package [1] and can dynamically select the classification threshold instead of using a fixed one as in SVM. To obtain results at all granularities and satisfy different research needs, we extend our work with an emotion tree containing 19 emotion classes, which was first proposed in [2]. Our goal is to build a model that processes the corpus automatically with appropriate manual guidance.

To address the problems raised above and apply the approach to real application scenarios, the highlights of our work are as follows. A semi-supervised model based on LDA is presented. LDA is a widely used model in NLP; it is a three-level hierarchical Bayesian model that describes documents probabilistically. In a topic model, every document is represented as a mixture of topics, and every topic as a distribution over words. However, LDA yields overly broad topics on short texts because there are not enough words from which to learn. We therefore prepare constraints and pre-existing knowledge to guide the topic model; because the constraints are extracted from the corpus as prior knowledge, they push the appropriate features into the model. This constrained topic model is used to identify implicit emotions: it distinguishes implicit emotional sentences from factual ones, which yields cleaner training sets. Finally, a hierarchical classification is proposed to filter a purer corpus for each category.

The remainder of this paper is organized as follows. Section 2 reviews related work. Section 3 introduces the proposed approaches for detecting emotion classes. In Section 4, the experimental results are reported, evaluated and discussed. Finally, Section 5 presents our conclusions and future work.

2. Related work

Emotion classification of online reviews has been a hot topic in recent years. There are two main lines of research: lexicon-based methods [3, 4] and machine learning methods [5, 6, 7, 8]. The former rely on lexicons and linguistic resources. These methods generally consider the frequencies of opinion words near the aspects, and whether a feature is positive or negative depends on a nearby opinion word whose polarity is already labeled in the lexicon. Sometimes common words are assigned to many categories or to none of them, which causes many unrelated features to be gathered into the same cluster. In this sense, these are rule-dependent methods that require finely annotated lexicons.

Consequently, some researchers have introduced topic models in an attempt to solve this problem statistically. A topic model describes a document as a mixture of topics, each of which is a distribution over words. Topic models can be used to construct features, reduce data dimensionality and form word clusters [9, 10, 11], which makes them a powerful tool in NLP. LDA has been the most popular topic model since it was proposed, and many variants have been built on it. In 2008, McAuliffe and Blei proposed a supervised topic model, sLDA [7], to predict movie ratings from reviews. Brody and Elhadad presented an unsupervised approach [12] to extract aspects and determine their semantic orientations. In 2013, Xu et al. first used a topic model to identify implicit features for opinion mining of product reviews [13]. Hu et al. presented an interactive topic model that allows users to iteratively refine topics by adding constraints that force certain sets of words to appear together in the same topic [14].

In this paper, we introduce a topic model with hard constraints to ensure that each aspect relates to a single emotion class. Furthermore, to build a more accurate training set, we apply the algorithm to mine sentences whose emotion aspects are implicit. The concept of an implicit feature first appeared in Liu et al.’s work [15], which gives the definition: an implicit feature is a feature that does not appear in the review directly but can be implied, i.e., it can be deduced from the sentence. They illustrated the concept with a digital camera review, “The camera is too large”: the word ‘large’ implies the feature ‘size’ even though ‘size’ does not appear in the comment. Because the feature ‘size’ is indicated only by the phrase ‘too large’, it is implicit. In this work, we combine implicit aspect detection with the topic model to accomplish multiple subtasks in one procedure.

3. Methodology

3.1 Problem definition

The problem is formalized as follows. $D=\{S_{1},S_{2},S_{3},\cdots,S_{M}\}$ denotes a post, where $S_{j}$ is its $j$-th sentence. Each sentence is written as $S_{j}=\{w_{j1},w_{j2},w_{j3},\cdots,w_{jN}\}$, where $w_{jk}$ denotes its $k$-th word. $E$ is an emotion structure containing 19 emotion classes. Given a sentence $S$, let

$\displaystyle\delta_{S}^{e_{h}}=\begin{cases}1&\text{if sentence }S\text{ contains emotion }e_{h}\\ 0&\text{otherwise}\end{cases}$

The emotional sentences of a post can then be defined as $D^{\prime}=\{S\mid S\in D\wedge\exists e_{h}\in E:\delta_{S}^{e_{h}}=1\}$. Given a sentence $S$ that contains emotions, the model finds the emotion $e=\arg\max_{e_{i}\in E}P(e_{i}\mid S)$, and $S$ is classified into emotion $e$. To achieve this goal, the following subtasks must be accomplished: distinguishing sentences that imply emotions from factual sentences, selecting features efficiently, and introducing the hierarchical emotion tree into the model.
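To make the decision rule concrete, the following Python sketch filters a post down to $D^{\prime}$ and then takes the arg max over a posterior. It is a minimal illustration only: the detector passed as `has_emotion` and the posterior dictionary are hypothetical stand-ins for the constrained topic model and SVR described later, not the authors’ implementation.

```python
from typing import Callable, Dict, List

Sentence = List[str]  # a sentence S_j as its tokens w_j1 ... w_jN


def emotional_sentences(post: List[Sentence],
                        has_emotion: Callable[[Sentence], bool]) -> List[Sentence]:
    """D' = the sentences S of post D for which delta_S^{e_h} = 1 for some e_h."""
    return [s for s in post if has_emotion(s)]


def classify(posterior: Dict[str, float]) -> str:
    """e = argmax over e_i in E of P(e_i | S)."""
    return max(posterior, key=lambda e: posterior[e])


if __name__ == "__main__":
    post = [["today", "is", "friday"], ["so", "happy", "today"]]
    lexicon = {"happy", "sad"}  # hypothetical toy detector: a lexicon lookup
    d_prime = emotional_sentences(post, lambda s: any(w in lexicon for w in s))
    print(d_prime)  # [['so', 'happy', 'today']]
    # Hypothetical posterior P(e | S) for the remaining sentence.
    print(classify({"Happy": 0.7, "Calm": 0.2, "Sad": 0.1}))  # Happy
```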

3.2 Framework

The framework is shown in Fig. 1. First, noise is removed from the raw posts during preprocessing. Second, the preprocessed posts are segmented and refined into informative terms $\{w_{1},w_{2},w_{3},\cdots\}$. After tokenizing the sentences, we extract and select features with the constrained topic model and create training sets for SVR. At the same time, pre-existing knowledge and the implicit detection algorithm are incorporated to improve accuracy. Finally, we use SVR to classify the posts. The main steps of this framework are described in detail in Sections 3.3–3.6.

Figure 1.

The architecture of the proposed method.

3.3 Preprocessing

A microblog post contains not only text but also videos, images, emoticons, hashtags and so on. Some of these elements are noise for this research, so we delete those that are useless for sentiment analysis, as follows.

Reposted and repeated posts. Reposted content is deleted, while the users’ own comments are retained. Likewise, only one copy of repeated posts is kept.

Images and videos. Users post many images and videos to illustrate their posts. However, these are not useful for this research and are removed.

Usernames. Users often notify others with the symbol @ in their posts, for instance, “I do not agree with you @BigEye”, where ‘BigEye’ is another user’s username. Usernames carry no affective content in this work, so the usernames and the @ symbols are removed.

Hashtags. There are always many lively discussions on Weibo. Users can join a discussion by placing a popular topic between two # symbols in their posts, for instance, “Scrambled eggs with tomatoes and a little sugar are my favorite. #What do scrambled eggs with tomatoes taste like? Sweet or salty?#”. The topic segments are usually declarative sentences describing phenomena or questions that provoke discussion. Because they generally carry no emotion, the topics and their # symbols are deleted.

Links. Links in posts point to further information about something and contain no sentiment, so segments starting with “http://” are deleted.

Position information. Position information usually appears at the end of a post, in forms such as “I am here:” or “I am at:”. These segments are supplementary and carry no emotion, so they are removed.

Factual sentences. Factual sentences are declarative sentences that describe facts, e.g., “Today is Friday. It’s a cloudy day.” They contain no emotion, so they are removed after implicit emotion detection. However, distinguishing factual sentences from implicit emotional ones remains a difficult task.

Conflicting or overly common words. A word that is associated with several different emotions at high frequency is defined as a conflicting word. Such a word may describe several emotions or may not indicate any emotion at all. These words are noise for classification and are removed.
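The rule-based parts of this preprocessing can be sketched with regular expressions. This is a minimal sketch under several assumptions: the patterns are hypothetical approximations of the rules above, the location markers use the English translations quoted here (real posts would contain the Chinese originals), and the count thresholds for conflicting words are placeholders to be set experimentally. Reposts, media and factual sentences are not handled here.

```python
import re
from collections import Counter
from typing import Dict, Set

# Hypothetical patterns approximating the noise rules; real Weibo markup may differ.
MENTION = re.compile(r"@\S+")                   # @username
HASHTAG = re.compile(r"#[^#]*#")                # #topic# between two '#' symbols
LINK = re.compile(r"https?://\S+")              # links
LOCATION = re.compile(r"I am (?:here|at):.*$")  # trailing position information


def clean_post(text: str) -> str:
    """Remove usernames, hashtag topics, links and trailing location segments."""
    for pattern in (MENTION, HASHTAG, LINK, LOCATION):
        text = pattern.sub(" ", text)
    return " ".join(text.split())  # collapse the whitespace left behind


def conflicting_words(counts: Dict[str, Counter], min_count: int,
                      min_classes: int) -> Set[str]:
    """Words occurring at least `min_count` times in at least `min_classes` emotions."""
    classes_per_word: Counter = Counter()
    for word_counts in counts.values():
        for word, n in word_counts.items():
            if n >= min_count:
                classes_per_word[word] += 1
    return {w for w, k in classes_per_word.items() if k >= min_classes}


if __name__ == "__main__":
    print(clean_post("I do not agree with you @BigEye"))  # I do not agree with you
    counts = {"Happy": Counter(today=5, lucky=4), "Sad": Counter(today=6, regret=3)}
    print(conflicting_words(counts, min_count=3, min_classes=2))  # {'today'}
```

Note that `@\S+` stops only at whitespace, so on unsegmented Chinese text it may also swallow the characters after a username; a production rule would need Weibo’s actual username character set.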

3.4 Feature extraction and selection

Choosing appropriate features is important for the subsequent steps. We first apply part-of-speech (POS) tagging to select feature candidates, using NLPIR,¹ a widely used Chinese segmentation toolkit, for tokenization. When people express their feelings, most of their vocabulary falls into nouns, verbs, adjectives and adverbs, so we keep these word classes and remove the others. We then use pointwise mutual information (PMI) and the $\chi^{2}$ test to pick out words highly associated with emotions. PMI can pick out low-frequency features: we set the threshold experimentally and collect words with high PMI values into a low-frequency word set. These sets are prepared to guide the classification procedure. However, both methods only capture word pairs that co-occur and cannot capture latent semantic relations between words.
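For reference, both association scores can be computed from a 2×2 contingency table of documents (word present or absent, emotion label matching or not). The sketch below uses document-level counts and base-2 logarithms; both choices, the function names and the toy data are assumptions, since this section does not specify how co-occurrence is counted or which thresholds were used.

```python
import math
from typing import List, Set, Tuple

Doc = Tuple[Set[str], str]  # (set of tokens in a document, its emotion label)


def contingency(docs: List[Doc], word: str, emotion: str) -> Tuple[int, int, int, int]:
    """Counts (a, b, c, d): word and emotion, word only, emotion only, neither."""
    a = b = c = d = 0
    for tokens, label in docs:
        has_w, has_e = word in tokens, label == emotion
        if has_w and has_e:
            a += 1
        elif has_w:
            b += 1
        elif has_e:
            c += 1
        else:
            d += 1
    return a, b, c, d


def pmi(docs: List[Doc], word: str, emotion: str) -> float:
    """log2 P(w, e) / (P(w) P(e)) with document-level probabilities."""
    a, b, c, d = contingency(docs, word, emotion)
    if a == 0:
        return float("-inf")  # the word never co-occurs with the emotion
    n = a + b + c + d
    return math.log2(a * n / ((a + b) * (a + c)))


def chi_square(docs: List[Doc], word: str, emotion: str) -> float:
    """Pearson chi-square statistic of the 2x2 word/emotion contingency table."""
    a, b, c, d = contingency(docs, word, emotion)
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom


if __name__ == "__main__":
    docs = [({"lucky", "today"}, "Happy"), ({"lucky"}, "Happy"),
            ({"today"}, "Sad"), ({"rain"}, "Sad")]
    print(pmi(docs, "lucky", "Happy"), chi_square(docs, "lucky", "Happy"))  # 1.0 4.0
```

PMI grows when a word is rare overall but concentrated in one emotion, which is why it surfaces low-frequency features; the $\chi^{2}$ statistic also accounts for the absent cells of the table.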

A topic model is a generative statistical model that can detect latent semantic correlations. The constrained topic model is derived from basic LDA, the generative probabilistic model proposed by Blei et al. in 2003 [9]. The basic idea of LDA is that documents are represented as mixtures over latent topics, where each topic is a distribution over the words of the vocabulary. With collapsed Gibbs sampling, the probability of assigning the $i$-th word $w_{i}$ to emotion (topic) $j$ is given by Eq. (1).

$\displaystyle P(e_{i}=j\mid\mathbf{e}_{-i},\mathbf{w},\alpha,\beta)\propto\left(\frac{n_{-i,j}^{(w_{i})}+\beta}{\sum_{w^{\prime}=1}^{W}n_{-i,j}^{(w^{\prime})}+W\beta}\right)\left(\frac{n_{-i,j}^{(s_{i})}+\alpha}{\sum_{j^{\prime}=1}^{T}n_{-i,j^{\prime}}^{(s_{i})}+T\alpha}\right)$ (1)

Here $\alpha$ and $\beta$ are the hyper-parameters of the sentence-emotion and emotion-word Dirichlet distributions, respectively; topics correspond to emotions, and each sentence $s_{i}$ plays the role of a document. $e_{i}=j$ denotes the assignment of the $i$-th word to emotion $j$, and $\mathbf{e}_{-i}$ denotes all emotion assignments except that of the $i$-th word. $n_{j}^{(w^{\prime})}$ is the number of instances of word $w^{\prime}$ assigned to emotion $j$, $n_{j}^{(s_{i})}$ is the number of words in sentence $s_{i}$ assigned to emotion $j$, and the subscript $-i$ indicates that the counts exclude the current assignment $e_{i}$. $W$ is the vocabulary size and $T$ the number of emotions (topics). After $N$ iterations of Gibbs sampling over all words in all documents, the distributions $\phi$ and $\theta$ are estimated using Eqs (2) and (3).

$\displaystyle\phi_{j}^{(w_{i})}=\frac{n_{j}^{(w_{i})}+\beta}{\sum_{w^{\prime}=1}^{W}n_{j}^{(w^{\prime})}+W\beta}$ (2)

$\displaystyle\theta_{j}^{(s_{i})}=\frac{n_{j}^{(s_{i})}+\alpha}{\sum_{j^{\prime}=1}^{T}n_{j^{\prime}}^{(s_{i})}+T\alpha}$ (3)
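Eqs (1)–(3) translate directly into a collapsed Gibbs sampler. The pure-Python sketch below is a minimal, unoptimized illustration, assuming sentences are given as lists of integer word ids in `range(W)` and that topics play the role of emotions; the function name and its arguments are hypothetical. Because the second denominator of Eq. (1) does not depend on $j$, it is dropped before sampling.

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]


def gibbs_lda(docs: List[List[int]], T: int, W: int, alpha: float, beta: float,
              iters: int, seed: int = 0) -> Tuple[Matrix, Matrix]:
    """Collapsed Gibbs sampling for LDA; returns (phi, theta) from Eqs (2)-(3)."""
    rng = random.Random(seed)
    n_wt = [[0] * T for _ in range(W)]  # n_j^(w): count of word w assigned to topic j
    n_dt = [[0] * T for _ in docs]      # n_j^(s): words of sentence s in topic j
    n_t = [0] * T                       # sum over w' of n_j^(w')
    z: List[List[int]] = []
    for d, doc in enumerate(docs):      # random initial assignments
        z.append([])
        for w in doc:
            j = rng.randrange(T)
            z[d].append(j)
            n_wt[w][j] += 1
            n_dt[d][j] += 1
            n_t[j] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                j = z[d][i]             # remove token i: these become the "-i" counts
                n_wt[w][j] -= 1
                n_dt[d][j] -= 1
                n_t[j] -= 1
                # Eq. (1), up to the second denominator, which is constant in j.
                weights = [(n_wt[w][k] + beta) / (n_t[k] + W * beta) * (n_dt[d][k] + alpha)
                           for k in range(T)]
                j = rng.choices(range(T), weights=weights)[0]
                z[d][i] = j
                n_wt[w][j] += 1
                n_dt[d][j] += 1
                n_t[j] += 1
    phi = [[(n_wt[w][j] + beta) / (n_t[j] + W * beta) for w in range(W)]
           for j in range(T)]                                   # Eq. (2)
    theta = [[(n_dt[d][j] + alpha) / (len(doc) + T * alpha) for j in range(T)]
             for d, doc in enumerate(docs)]                     # Eq. (3)
    return phi, theta


if __name__ == "__main__":
    docs = [[0, 1, 0, 1], [2, 3, 2, 3]]  # toy sentences over a 4-word vocabulary
    phi, theta = gibbs_lda(docs, T=2, W=4, alpha=0.1, beta=0.01, iters=50)
    print([round(p, 3) for p in theta[0]])
```

In the denominator of Eq. (3), $\sum_{j^{\prime}}n_{j^{\prime}}^{(s_{i})}$ is simply the sentence length, which is why the sketch uses `len(doc)`. Section 4.2 reports $\alpha=2.63$, $\beta=0.015$ and 20 iterations for the constrained model; the toy values above are arbitrary.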

In this work, we extend basic LDA with pre-existing knowledge, following Andrzejewski and Zhu [16] and adopting their concept of Topic-in-Set knowledge. In their work, each word is constrained to a set of allowed topics, which narrows the topic search space and facilitates inference of the latent topics. Likewise, we use emotion lexicons to form the constraints. The constraint is formalized as the indicator $\delta(j\in E^{(i)})$, which equals 1 if emotion $j$ belongs to the allowed set $E^{(i)}$ of the $i$-th word and 0 otherwise. The sampling probability is then modified as in Eq. (4).

$\displaystyle P(e_{i}=j\mid\mathbf{e}_{-i},\mathbf{w},\alpha,\beta)\propto\delta(j\in E^{(i)})\left(\frac{n_{-i,j}^{(w_{i})}+\beta}{\sum_{w^{\prime}=1}^{W}n_{-i,j}^{(w^{\prime})}+W\beta}\right)\left(\frac{n_{-i,j}^{(s_{i})}+\alpha}{\sum_{j^{\prime}=1}^{T}n_{-i,j^{\prime}}^{(s_{i})}+T\alpha}\right)$ (4)
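The only change Eq. (4) makes to the sampler is the indicator, which zeroes out every emotion outside the word’s allowed set before normalization. The function below sketches one normalized conditional under that reading. The count layout follows a standard collapsed Gibbs sampler, and the `allowed` argument (our name for $E^{(i)}$, e.g. the emotions a lexicon lists for the word) is an assumption about how the constraint sets are represented.

```python
from typing import List, Optional, Set


def constrained_conditional(w: int, d: int, n_wt: List[List[int]], n_dt: List[List[int]],
                            n_t: List[int], alpha: float, beta: float,
                            allowed: Optional[Set[int]] = None) -> List[float]:
    """Normalized P(e_i = j | ...) from Eq. (4) for word id w in sentence d.

    The counts must already exclude the current token (the "-i" counts).
    `allowed` stands for E^(i); None leaves the word unconstrained, which
    reduces Eq. (4) to Eq. (1).
    """
    W, T = len(n_wt), len(n_t)
    weights = []
    for j in range(T):
        indicator = 1.0 if allowed is None or j in allowed else 0.0  # delta(j in E^(i))
        weights.append(indicator
                       * (n_wt[w][j] + beta) / (n_t[j] + W * beta)
                       * (n_dt[d][j] + alpha))
    total = sum(weights)
    if total == 0.0:
        raise ValueError("the allowed set E^(i) contains no valid emotion index")
    return [x / total for x in weights]


if __name__ == "__main__":
    n_wt = [[1, 1, 1], [0, 2, 0]]  # 2 words x 3 emotions
    n_t = [1, 3, 1]                # column sums of n_wt
    n_dt = [[1, 3, 1]]             # one sentence
    print(constrained_conditional(0, 0, n_wt, n_dt, n_t, 1.0, 1.0, allowed={0, 2}))
    # [0.5, 0.0, 0.5]
```

Inside a full sampler, this distribution would replace the unconstrained weights when drawing the new assignment of a constrained word.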

3.5 Implicit emotion detection

As Liu et al. first proposed [15], sentences may imply sentiment without containing any explicit emotion word, as in this post: “Tomorrow’s party is canceled. I have been waiting for it the whole summer. From now on, please do not talk to me!” No emotion word appears in these sentences, yet a human reader easily senses the poster’s disappointment: the emotion is implied. For machines, however, it is difficult to distinguish such an implicit emotional sentence from a factual one. Without manual annotation and prior information, machines tend to judge such posts as emotionless, and manual annotation becomes expensive as the data grows. We therefore present an implicit aspect detection algorithm to identify these sentences and to select useful data more accurately. Several researchers have recently worked on extracting implicit aspects [4, 6, 13, 17, 18, 19, 20].

Our implicit detection method is applied to the first-level classification, and Algorithm 1 illustrates the process. First, explicit emotion sentences are placed in the explicit data set $ES$ based on the emotion lexicon and POS (Line 1), and the remaining posts are placed in the non-explicit data set $NES$ (Line 2). Then PMI, the $\chi^{2}$ test and Topic-in-Set constraints are applied to detect word correlations, and the prior knowledge is collected in the data set $PS$ (Line 3). Guided by these correlations, the topic model samples the topic and word distributions of the corpus and captures high-probability word pairs (Lines 4–5). Using all of this knowledge, informative attributes are selected (Line 6) and used to train the SVR classifiers (Line 7). Finally, for every feature $f$ in $FSS$ and every sentence $s$ in $NES$, if the classifier assigns $s$ to $f$, then $s$ is an implicit review of feature $f$ (Lines 8–11).

Algorithm 1. Implicit Emotion Detection
1: Select explicit sentences $\rightarrow$ Explicit Set ($ES$)
2: The rest of the data $\rightarrow$ Non-Explicit Set ($NES$)
3: Prior knowledge from $ES$ $\rightarrow$ Prior Set ($PS$)
4: Seed words in Topic-in-Set $\rightarrow$ Feature Seed Set ($FSS$)
5: Explicit Topic Model ($ES$, $PS$, $FSS$) $\rightarrow$ Explicit Model ($EM$)
6: Select training attributes $\rightarrow$ Training Set for every class ($TS$)
7: Train classification model by SVR $\rightarrow$ Classifier for each feature ($CA(f)$)
8: for every feature $f\in FSS$ do
9:   SVR ($NES$, $CA(f)$) $\rightarrow$ Positive results of $f$ ($PRA(f)$)
10:   add every $S(j)\in PRA(f)$ to Set Implicit($f$)
11: end for
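The bookends of Algorithm 1 (Lines 1–2 and 8–11) can be sketched in Python. This is only an outline of the control flow under stated assumptions: a lexicon lookup stands in for the explicit-sentence selection (the POS condition is omitted), and each `CA(f)` is any callable returning a score, a hypothetical placeholder for a trained SVR. Lines 3–7 (prior knowledge, topic model and training) are not reproduced.

```python
from typing import Callable, Dict, List, Set, Tuple

Sentence = List[str]
Classifier = Callable[[Sentence], float]  # stand-in for a trained SVR model CA(f)


def split_explicit(sentences: List[Sentence],
                   lexicon: Set[str]) -> Tuple[List[Sentence], List[Sentence]]:
    """Lines 1-2: sentences containing a lexicon word go to ES, the rest to NES."""
    es: List[Sentence] = []
    nes: List[Sentence] = []
    for s in sentences:
        (es if any(w in lexicon for w in s) else nes).append(s)
    return es, nes


def detect_implicit(nes: List[Sentence], classifiers: Dict[str, Classifier],
                    threshold: float = 0.0) -> Dict[str, List[int]]:
    """Lines 8-11: indices of NES sentences that CA(f) accepts, per feature f."""
    implicit: Dict[str, List[int]] = {f: [] for f in classifiers}
    for f, ca in classifiers.items():
        for j, s in enumerate(nes):
            if ca(s) > threshold:  # positive result, i.e. S(j) is in PRA(f)
                implicit[f].append(j)
    return implicit


def keyword_classifier(words: Set[str], bias: float = 0.5) -> Classifier:
    """Hypothetical toy scorer used only for illustration."""
    return lambda s: sum(w in words for w in s) - bias


if __name__ == "__main__":
    sentences = [["so", "happy"],
                 ["party", "canceled", "waited", "whole", "summer"],
                 ["today", "is", "friday"]]
    es, nes = split_explicit(sentences, lexicon={"happy"})
    cas = {"Disappointed": keyword_classifier({"canceled", "waited"})}
    print(detect_implicit(nes, cas))  # {'Disappointed': [0]}
```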

3.6 Emotion classification

A four-level emotion hierarchy is used to serve different research granularities. The first level distinguishes emotional sentences from factual statements, which is done through implicit emotion detection. The second level has two polarities: positive and negative. The third level’s categories are derived from the six-emotion structure most commonly used in previous works. Finally, the leaf nodes refine these categories into 19 classes. Each upper level’s results provide a cleaner corpus for the level below it, so each level’s results strongly affect its subclassifications. The tree is shown in Fig. 2.
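The top-down routing implied by this hierarchy can be sketched as follows. The tree is only a partial, illustrative fragment, and the keyword scorers are hypothetical stand-ins for the per-node SVR classifiers; the point is the control flow, in which each level chooses among the children of the node selected at the level above.

```python
from typing import Callable, Dict, List, Set

# Illustrative fragment of the emotion tree in Fig. 2. Only the children of
# "Distressed" are taken from Section 4.2.1; placing the level-3 nodes under
# "Negative" is assumed, and the remaining branches are omitted.
TREE: Dict[str, List[str]] = {
    "root": ["Emotional", "Neutral"],
    "Emotional": ["Positive", "Negative"],
    "Negative": ["Distressed", "Fearful"],
    "Distressed": ["Disappointed", "Guilty", "Sad", "Missed"],
}

NodeScorer = Callable[[List[str]], Dict[str, float]]  # child class -> score


def classify_top_down(tokens: List[str], tree: Dict[str, List[str]],
                      scorers: Dict[str, NodeScorer], node: str = "root") -> List[str]:
    """Route a sentence from the root to a leaf, making one local decision per level."""
    path: List[str] = []
    while node in tree:  # nodes without children in `tree` are treated as leaves
        scores = scorers[node](tokens)
        node = max(tree[node], key=lambda child: scores.get(child, float("-inf")))
        path.append(node)
    return path


def keyword_scorer(keywords: Dict[str, Set[str]]) -> NodeScorer:
    """Hypothetical toy scorer: counts keyword hits per child class."""
    return lambda tokens: {c: float(sum(t in kw for t in tokens))
                           for c, kw in keywords.items()}


SCORERS: Dict[str, NodeScorer] = {
    "root": keyword_scorer({"Emotional": {"sad", "regret"}, "Neutral": {"friday"}}),
    "Emotional": keyword_scorer({"Positive": {"lucky"}, "Negative": {"sad", "regret"}}),
    "Negative": keyword_scorer({"Distressed": {"regret"}, "Fearful": {"scare"}}),
    "Distressed": keyword_scorer({"Sad": {"sad"}, "Missed": {"miss"}}),
}

if __name__ == "__main__":
    print(classify_top_down(["sad", "regret"], TREE, SCORERS))
    # ['Emotional', 'Negative', 'Distressed', 'Sad']
```

Because each decision only considers the children of the current node, a mistake at an upper level cannot be corrected further down, which is how each level’s results affect its subclassifications.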

Figure 2.

Emotion classification in a four-level hierarchy.

SVR is the regression form of support vector machines (SVM), a principled and very powerful method that, in the few years since its introduction, has already outperformed most other systems in a wide variety of applications [21]. In this work, the SVR implementation in LIBSVM is used to build the hierarchy. Two parameters are tuned when training SVR: the model parameter $\nu$ and the classification threshold $t$. Both are chosen iteratively to achieve the best performance, and the class whose regression value exceeds its threshold by the largest distance is selected as the final decision. Algorithm 2 shows how these two parameters are obtained. First, the maximum distance and the corresponding class are initialized (Lines 1–2). Then, for each candidate value of $\nu$, the classifiers are trained and the maximum distance and the corresponding class are found; $\nu$ is set to the value yielding the maximum distance (Lines 3–13). Finally, the classification threshold is tuned so that instances are classified correctly (Lines 14–16).

Algorithm 2. The procedure for obtaining the classifier’s parameters
1: initialize max_distance = $-500$
2: initialize max_class = $-20$
3: for $\nu$ in range(0.1, 1, 0.1) do
4:   libsvm_train($\nu$)
5:   for each classifier $i$ do
6:     prediction = classifier($i$).classify()
7:     distance = prediction $-$ classifier($i$)_threshold
8:     if distance > max_distance then
9:       max_distance = distance
10:       max_class = $i$
11:     end if
12:   end for
13: end for
14: for $t$ in range($-1$, 1, 0.05) do
15:   evaluate($t$)
16: end for
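Algorithm 2’s decision rule and threshold sweep can be sketched in Python without LIBSVM. The sketch rests on stated assumptions: regression outputs and per-class thresholds arrive as dictionaries, and `evaluate(t)` is interpreted as the F1 of the rule `score > t` on held-out data, which the algorithm does not specify. Training itself is left out; with the LIBSVM command-line tools, ν-SVR corresponds to `svm-train -s 4 -n <nu>`. The hypothetical helper `frange` mirrors the `range(start, stop, step)` notation of Algorithm 2 and excludes the stop value.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def frange(start: float, stop: float, step: float) -> List[float]:
    """Float analogue of range(start, stop, step); rounding avoids accumulated drift."""
    n = int(round((stop - start) / step))
    return [round(start + k * step, 10) for k in range(n)]


def pick_class(predictions: Dict[str, float],
               thresholds: Dict[str, float]) -> Tuple[Optional[str], float]:
    """Lines 5-12: the class whose regression value exceeds its threshold the most."""
    max_class: Optional[str] = None
    max_distance = float("-inf")
    for c, value in predictions.items():
        distance = value - thresholds[c]
        if distance > max_distance:
            max_class, max_distance = c, distance
    return max_class, max_distance


def f1_at(t: float, scores: Sequence[float], labels: Sequence[bool]) -> float:
    """F1 of the rule `score > t` against boolean gold labels."""
    tp = sum(1 for s, y in zip(scores, labels) if s > t and y)
    fp = sum(1 for s, y in zip(scores, labels) if s > t and not y)
    fn = sum(1 for s, y in zip(scores, labels) if s <= t and y)
    return 0.0 if tp == 0 else 2 * tp / (2 * tp + fp + fn)


def tune_threshold(scores: Sequence[float], labels: Sequence[bool],
                   grid: Sequence[float]) -> float:
    """Lines 14-16: keep the grid value t with the best evaluate(t), here F1."""
    return max(grid, key=lambda t: f1_at(t, scores, labels))


if __name__ == "__main__":
    print(frange(0.1, 1, 0.1))  # [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
    print(pick_class({"Sad": 0.4, "Happy": 0.9}, {"Sad": 0.0, "Happy": 0.8}))
    t = tune_threshold([0.9, 0.6, 0.2, -0.3], [True, True, False, False],
                       frange(-1, 1, 0.05))
    print(t)
```

Initializing the running maximum to negative infinity, rather than the sentinel values −500 and −20 of Algorithm 2, avoids depending on the range of the regression outputs.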

Table 1

Details of the microblog dataset (Implicit and Explicit ratios are relative to the emotional posts)

Emotion	Number of posts	Ratio (%)
Happy	780	7.83
Calm	204	2.05
Wishful	366	3.67
Praiseful	129	1.30
Trustful	53	0.53
Favored	195	1.96
Shy	44	0.44
Missed	213	2.14
Panic	62	0.62
Angry	217	2.18
Frightened	70	0.70
Sad	253	2.54
Surprised	95	0.95
Disappointed	537	5.39
Guilty	69	0.69
Dissatisfied	220	2.21
Annoyed	454	4.56
Doubtful	76	0.76
Hateful	123	1.23
Emotions	4160	41.76
Neutral	5800	58.23
Implicit	1622	38.99
Explicit	2358	61.00
Total	9960	100.00

4. Experimental results

4.1 Data set description

As there is no benchmark for fine-grained emotion classification of Chinese posts, we collected our data from posts by Sina Weibo² users. To ensure authenticity and practicality, we randomly crawled posts and reviews from 18:00 to 24:00 on June 5th, 2015, obtaining 9976 posts after preprocessing. Three annotators labeled the data separately, and disagreements were resolved by the first author. The annotation principles are as follows:

Posts containing both implicit and explicit emotions are labeled as explicit.

Posts with more than one emotion are deleted as noise.

Implicit posts are labeled separately by the three annotators, and disagreements are resolved by the first author.

About 39% of the emotional posts are implicit. The most frequent emotion is happy, which accounts for 7.83% of the data set, whereas emotions such as shy, hateful and guilty are much rarer. Because the data set is imbalanced across emotions, hierarchical classification can prune irrelevant information and obtain a purer training set for further classification.

Figure 3.

Sampling performance with different numbers of iterations.

Figure 4.

Constrained topic model under different document lengths.

4.2 Implicit emotion detection results

Tang et al. summarized the factors affecting the performance of LDA [22]: the document length $N$, the number of topics $K$, the hyper-parameters $\alpha$ and $\beta$ (which control the sparsity of the topic and word distributions) and the number of sampling iterations all strongly influence a topic model. Hong and Davison showed that aggregating short texts can improve topic model performance [23]. Our experimental results show that the model suffers when these parameters deviate far from their best values, as can be seen clearly in Figs 3–5. The number of topics $K$ is fixed at 19, and only the other parameters are estimated. The constrained topic model performs best when the number of sampling iterations is 20, the average document length is 550, $\alpha=2.63$ and $\beta=0.015$, with emotion lexicons and constraints enabled; this raises its F-measure to 0.851. Only the binary classification at level 1 is tested here, but good performance at a superclass can improve the performance of its children.
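Aggregating short posts into longer pseudo-documents is one way to reach an average document length such as the 550 reported above. The grouping rule used in this work is not described here, so the sketch below makes a plainly hypothetical choice: it greedily concatenates consecutive posts (which could, for example, be posts sharing an explicit emotion label) until each pseudo-document reaches a target number of tokens. Measuring length in tokens is also an assumption.

```python
from typing import List


def aggregate(posts: List[List[str]], target_len: int) -> List[List[str]]:
    """Greedily concatenate consecutive posts into pseudo-documents of >= target_len tokens.

    Hypothetical grouping rule for illustration; the last pseudo-document may be shorter.
    """
    if target_len <= 0:
        raise ValueError("target_len must be positive")
    docs: List[List[str]] = []
    current: List[str] = []
    for tokens in posts:
        current.extend(tokens)
        if len(current) >= target_len:
            docs.append(current)
            current = []
    if current:
        docs.append(current)
    return docs


if __name__ == "__main__":
    posts = [["a", "b"], ["c"], ["d", "e", "f"], ["g"]]
    print(aggregate(posts, target_len=3))
    # [['a', 'b', 'c'], ['d', 'e', 'f'], ['g']]
```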

Four test groups are set up for comparison, and the F1-score is used to measure the results because it is a good criterion for problems with skewed classes. Group 1 applies basic LDA to detect implicit emotion posts. Group 2 adds a dictionary of 50,000 emotion words to Group 1. Group 3 uses Topic-in-Set constraints, and Group 4, the constrained topic model, adds all the pre-existing knowledge. These settings are evaluated on the implicit posts. As Fig. 3 shows, prior knowledge, including emotion lexicons and constraints, substantially improves the topic model.
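Why F1 suits skewed classes can be seen with the counts in Table 1: a trivial classifier that never predicts Shy (44 of 9960 posts) is over 99% accurate, yet its F1 for that class is zero. The helper below is a generic sketch of the standard precision, recall and F1 definitions; the function name is ours.

```python
from typing import Tuple


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Standard per-class precision, recall and F1; empty denominators give 0.0."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    total, shy = 9960, 44                            # counts from Table 1
    accuracy = (total - shy) / total                 # classifier that never predicts Shy
    print(round(accuracy, 4))                        # 0.9956
    print(precision_recall_f1(tp=0, fp=0, fn=shy))   # (0.0, 0.0, 0.0)
```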

Following the annotation principles, we manually annotated the data; 1622 posts are implicit. When users express their feelings, their expressions often converge to a specific vocabulary. We list the high-probability words under each emotion in our corpus as estimated by the constrained topic model; these words reflect close correlations with the emotions. However, because the corpus is only a small part of the short texts online, the word associations may deviate from reality, so emotion lexicons and pre-existing knowledge are used to enrich the model’s representations. The top twenty words are selected to help form the training attributes, and some of them are listed in the following ranking table.

Emotion	High-probability words
Happy	mommy/stand by/fate/favorite/lucky/assume
Wishful	mommy/New Year’s Day/journey/loving heart
Trustful	steadfast/hope/sister/attitude/firmly/happy
Praiseful	appreciate/hope/pleasant/charming/novel/revere
Calm	unknown/gentle/true to form/quietness/spring breeze
Favoured	hope/snack/imagine/male/cheerful/sufficient
Disgusted	hate/tire/weary/geezer/girl/transpond
Doubtful	ignorant/life/reflect on/anger/mood/persuade
Dissatisfied	punctuality/thanks/eutrophication/deceive
Annoyed	tired/hard/numbness/overcome/torment/be on duty
Angry	outburst/sustain/deceive/whereby/feel/local police
Shy	shame/disgrace/endure/officer/university/China
Frightened	upset/return home/fall on evil day/life
Panic	scare/college entrance examination/teacher/beauty
Surprised	amazed/film/colleague/run away/world/wonder
Missed	figure/lucency/enjoyment/stop/crowd/take away
Disappointed	flabby/deceive/kinsfolk/misfortune/go abroad
Sad	cycle/heart/fragile/girl/sadness/regret/love
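Extracting the top words per emotion from the estimated $\phi$ of Eq. (2) amounts to a sort. The sketch assumes $\phi$ is stored as one row of word probabilities per emotion, indexed like a vocabulary list; the function name is hypothetical, and ties are broken by word id for determinism, an arbitrary choice.

```python
from typing import Dict, List


def top_words(phi: List[List[float]], vocab: List[str], emotions: List[str],
              k: int = 20) -> Dict[str, List[str]]:
    """The k most probable words of each emotion's word distribution phi_j (Eq. (2))."""
    result: Dict[str, List[str]] = {}
    for j, name in enumerate(emotions):
        ranked = sorted(range(len(vocab)), key=lambda w: (-phi[j][w], w))
        result[name] = [vocab[w] for w in ranked[:k]]
    return result


if __name__ == "__main__":
    phi = [[0.1, 0.6, 0.3], [0.5, 0.2, 0.3]]  # toy values: 2 emotions x 3 words
    print(top_words(phi, ["rain", "lucky", "fate"], ["Happy", "Sad"], k=2))
    # {'Happy': ['lucky', 'fate'], 'Sad': ['rain', 'fate']}
```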
Guilty	ring up/sweetheart/temperament/disappointment

Table 2
Group settings

Group	LDA	Hierarchical classification	Feature selection	Implicit detection
G1	${\surd}$	$\times$	$\times$	$\times$
G2	${\surd}$	${\surd}$	$\times$	$\times$
G3	${\surd}$	${\surd}$	${\surd}$	$\times$
G4	${\surd}$	${\surd}$	${\surd}$	${\surd}$

Table 3

Levels 1 and 2 results

Level	Class	Precision			Recall			F1-score
		G2	G3	G4	G2	G3	G4	G2	G3	G4
1	Emotional	0.622	0.854	0.871	0.823	0.764	0.833	0.709	0.814	0.854
2	Negative	0.874	0.921	0.924	0.769	0.902	0.934	0.823	0.921	0.933
	Positive	0.737	0.867	0.913	0.853	0.889	0.901	0.789	0.895	0.899

Figure 5.

Constrained topic model with different values of $\alpha$ and $\beta$.

Table 4

Levels 3 and 4 results

Level	Class	Precision			Recall			F1-score
		G2	G3	G4	G2	G3	G4	G2	G3	G4
3	Angry	0.734	0.931	0.942	0.624	0.789	0.833	0.656	0.861	0.882
	Distressed	0.787	0.842	0.888	0.772	0.970	0.942	0.752	0.900	0.921
	Disgusted	0.623	0.929	0.908	0.710	0.821	0.900	0.689	0.863	0.899
	Fearful	0.697	0.875	0.951	0.399	0.793	0.823	0.663	0.872	0.891
	Fond	0.751	0.878	0.869	0.573	0.873	0.944	0.667	0.878	0.925
	Joyful	0.767	0.938	0.969	0.863	0.911	0.887	0.783	0.900	0.920
	Surprised	0.111	0.945	0.956	0.321	0.833	0.859	0.000	0.884	0.898
4	Angry	0.722	0.977	0.943	0.532	0.765	0.833	0.543	0.765	0.832
	Annoyed	0.837	0.983	0.968	0.753	0.824	0.933	0.797	0.962	0.940
	Calm	0.865	0.947	0.965	0.728	0.835	0.866	0.808	0.892	0.912
	Doubtful	0.735	0.869	0.879	0.423	0.889	0.863	0.547	0.877	0.867
	Disgusted	0.778	0.979	0.977	0.545	0.922	0.934	0.628	0.924	0.954
	Disappointed	0.792	0.935	0.968	0.932	0.911	0.904	0.865	0.943	0.950
	Dissatisfied	0.507	0.853	0.872	0.765	0.963	0.932	0.645	0.900	0.893
	Frightened	0.870	0.962	0.982	0.843	1.000	1.000	0.831	0.978	0.991
	Favoured	0.705	0.946	0.898	0.662	0.954	0.972	0.683	0.950	0.934
	Guilty	0.700	0.965	0.837	0.304	0.922	0.913	0.423	0.939	0.793
	Happy	0.933	0.956	0.963	0.969	0.909	0.993	0.951	0.972	0.978
	Missed	0.855	0.927	0.937	0.806	0.925	0.963	0.796	0.912	0.948
	Surprised	0.000	0.919	0.952	0.010	0.817	0.875	0.000	0.883	0.898
	Panic	0.824	0.991	0.991	0.901	0.921	0.976	0.823	0.900	0.989
	Praiseful	0.515	0.942	0.925	0.569	0.977	0.942	0.523	0.961	0.935
	Sad	0.875	0.908	0.928	0.711	0.946	0.954	0.823	0.921	0.933
	Shy	0.867	0.914	1.000	0.877	0.988	0.987	0.837	0.993	0.993
	Trustful	0.429	0.934	0.978	0.373	0.826	0.967	0.275	0.834	0.972
	Wishful	0.900	0.961	0.987	0.885	0.949	0.943	0.897	0.909	0.961

Table 5

Hierarchical results

Class	Precision				Recall				F1-score
	G1	G2	G3	G4	G1	G2	G3	G4	G1	G2	G3	G4
Sad	0.205	0.431	0.756	0.557	0.231	0.489	0.644	0.674	0.232	0.462	0.680	0.635
Guilty	0.125	0.258	0.712	0.863	0.288	0.123	0.600	0.687	0.241	0.238	0.650	0.772
Disappointed	0.689	0.273	0.603	0.689	0.045	0.387	0.623	0.589	0.088	0.327	0.603	0.639
Missed	0.247	0.380	0.548	0.765	0.368	0.520	0.479	0.491	0.335	0.447	0.497	0.579
Surprised	0.177	0.254	0.955	0.908	0.174	0.011	0.663	0.685	0.122	0.000	0.774	0.749
Panic	0.414	0.567	0.825	0.887	0.675	0.358	0.639	0.627	0.511	0.460	0.707	0.722
Frightened	0.236	0.315	0.754	0.829	0.450	0.311	0.532	0.633	0.289	0.292	0.632	0.773
Shy	0.143	0.281	0.933	0.901	0.111	0.489	0.432	0.692	0.267	0.199	0.771	0.823
Angry	0.452	0.528	0.917	0.901	0.619	0.453	0.779	0.732	0.532	0.429	0.915	0.856
Dissatisfied	0.122	0.180	0.688	0.573	0.203	0.324	0.715	0.794	0.153	0.288	0.756	0.689
Annoyed	0.325	0.328	0.854	0.819	0.526	0.365	0.577	0.719	0.421	0.429	0.713	0.755
Doubtful	0.109	0.279	0.847	0.478	0.145	0.178	0.509	0.676	0.099	0.251	0.627	0.606
Disgusted	0.289	0.422	0.911	0.827	0.329	0.339	0.689	0.826	0.330	0.433	0.789	0.828
Favoured	0.289	0.177	0.620	0.489	0.176	0.289	0.642	0.624	0.271	0.271	0.614	0.496
Trustful	0.280	0.122	0.567	0.833	0.287	0.287	0.632	0.600	0.322	0.080	0.425	0.709
Praiseful	0.217	0.276	0.796	0.891	0.278	0.101	0.789	0.709	0.132	0.121	0.736	0.735
Wishful	0.327	0.575	0.795	0.781	0.198	0.741	0.861	0.907	0.113	0.645	0.836	0.824
Calm	0.241	0.375	0.867	0.855	0.329	0.577	0.539	0.672	0.381	0.528	0.649	0.711
Happy	0.787	0.342	0.776	0.842	0.172	0.634	0.672	0.754	0.261	0.507	0.645	0.773

4.2.1 Classification results

This section reports the performance of the whole model. We set up four experimental groups (G1–G4). As shown in Table 2, Group 1 uses only basic LDA and flat classification. Group 2, in contrast, is applied with implicit feature detection. Group 3 is optimized with the hierarchical emotion tree on the basis of Group 2. Group 4 adopts all the optimizations. All results are averages over 10 independent runs. In hierarchical classification, each test example is classified successively from the top level to the bottom level. The level 3 and level 4 results are presented in Table 4. Hierarchical classification improves the performance of more than half of the classifiers. In flat classification, when each classifier is given the whole dataset, it is hard to separate the examples that belong to its class from the others. Hierarchical classification, by contrast, cuts off most irrelevant examples with the upper-level classifiers, making classification easier for the lower-level ones. As Table 4 shows, performance improves when implicit feature selection and the psychological emotion dictionary are applied. The values in bold represent the best results in the result tables.

The level results show the performance of each individual classifier. Each classifier is trained to distinguish its own class from its siblings. For example, when the classifiers for disappointed, guilty, sad and missed are trained, the training examples are inherited from their superclass, distressed.
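The inheritance of training examples from a superclass can be sketched as a filter over labelled paths. This assumes, hypothetically, that each training example is stored with its full label path from the root (for instance `["root", "Emotional", "Negative", "Distressed", "Sad"]`); only examples passing through the parent node are kept, relabelled with the child class they fall under.

```python
from typing import Dict, List, Tuple

Example = Tuple[List[str], List[str]]  # (tokens, label path starting at "root")


def local_training_set(examples: List[Example], parent: str,
                       tree: Dict[str, List[str]]) -> List[Tuple[List[str], str]]:
    """Training data for the classifiers under `parent`, labelled with the child class."""
    children = set(tree[parent])
    local: List[Tuple[List[str], str]] = []
    for tokens, path in examples:
        if parent in path:
            k = path.index(parent)
            if k + 1 < len(path) and path[k + 1] in children:
                local.append((tokens, path[k + 1]))
    return local


if __name__ == "__main__":
    tree = {"Distressed": ["Disappointed", "Guilty", "Sad", "Missed"]}
    examples = [
        (["regret"], ["root", "Emotional", "Negative", "Distressed", "Sad"]),
        (["lucky"], ["root", "Emotional", "Positive"]),
        (["canceled"], ["root", "Emotional", "Negative", "Distressed", "Disappointed"]),
    ]
    print(local_training_set(examples, "Distressed", tree))
    # [(['regret'], 'Sad'), (['canceled'], 'Disappointed')]
```

Each sibling classifier (disappointed, guilty, sad, missed) would then be trained one-versus-rest on this reduced set rather than on the whole corpus.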

Table 3 presents the binary classification results on levels 1 and 2. The results show that Group 4, equipped with all the optimization methods, performs best: its F1-scores are close to 0.9 on Level 2 and around 0.85 on Level 1. The effectiveness of the feature selection method is shown by comparing Groups 2 and 3: Group 3 outperforms Group 2 because it filters out much of the noise and retains highly correlated features.
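For reference, the precision, recall and F1-score of one binary classifier can be computed as below, with F1 the harmonic mean 2PR/(P + R). This is a generic sketch rather than the authors' evaluation code; the helper names are hypothetical, and averaging F1 over runs mirrors the 10-run averaging described earlier in this section.

```python
from statistics import mean
from typing import List, Sequence, Tuple


def precision_recall_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float, float]:
    """Binary precision, recall and F1, with the positive class labelled 1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # correctly accepted
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # sibling wrongly accepted
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # own class wrongly rejected
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def mean_f1(runs: List[Tuple[Sequence[int], Sequence[int]]]) -> float:
    """Average F1 over independent runs, each given as (y_true, y_pred)."""
    return mean(precision_recall_f1(t, p)[2] for t, p in runs)


p, r, f = precision_recall_f1([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
print(round(p, 3), round(r, 3), round(f, 3))
```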

Table 4 displays the performance of the third and bottom levels. The table suggests that, as the number of classes grows, hierarchical classification has the main impact on the results, whereas implicit feature detection matters less at these levels because it has already played its role at the first level.

The results for the whole hierarchy are presented in Table 5. Unlike the level results, the hierarchy results are obtained by testing on the whole corpus, with each example passing through the four-level structure from the root to the leaves. This result shows that, in a multi-class problem, classifier performance weakens as the number of classes grows and the data are dispersed across more categories.
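One factor that can contribute to the gap between level results and whole-hierarchy results is that errors compound: a test example reaches the correct leaf only if every classifier on its path is correct. The sketch below multiplies per-level accuracies, each assumed to be measured only on examples routed correctly by the levels above; the function name and the numbers are hypothetical and are not taken from Table 5.

```python
from math import prod
from typing import Sequence


def end_to_end_accuracy(level_acc: Sequence[float]) -> float:
    """Chance that an example reaches the correct leaf when it must pass every level.

    Each entry is the accuracy at one level, measured only on examples that
    were routed correctly by all the levels above it (conditional accuracy).
    """
    return prod(level_acc)


# Hypothetical per-level accuracies for a four-level tree (not the paper's numbers).
print(end_to_end_accuracy([0.9, 0.9, 0.8, 0.8]))
```

Even with every level at 0.8 or above, the end-to-end figure in this toy case falls to roughly 0.52, which illustrates why a deeper hierarchy can look weaker overall than its individual classifiers.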

5. Conclusion

This paper has presented a hierarchical classification strategy to classify the emotions in microblog posts. Unlike many existing works, it applies a four-level emotion structure and combines the proposed methods to obtain different emotion granularities. This classification method can reduce the inaccuracy caused by data imbalance. It also applies an implicit emotion detection algorithm to identify implicit emotion sentences; according to our data, almost 40% of the sentences are implicit. The implicit detection algorithm is incorporated into the constrained topic model with prior knowledge, which is used to extract features and reduce dimensionality. The experimental results show that the proposed model performs well. As observed in the experiments, three aspects can be improved in future work: 1) Users tend to share funny experiences and positive attitudes rather than negative ones. Consequently, this emotional imbalance calls for more accurate training sets, and hierarchical models can generate finer-grained training data to meet this requirement. 2) In this paper, emotions are classified at the sentence level. However, the main emotion of an entire post may differ from that of its sentences. Identifying the users' overall emotions at the post level is left for future work. 3) As seen from Table 1, the dataset is not large, which may have a major influence on the experimental results. More data are needed to validate the experiments.

Footnotes

http://ictclas.nlpir.org/.

http://weibo.com/.

Acknowledgments

This work is sponsored by National Natural Science Foundation of China (Grant No: 61673235).
