Concept-LDA: Incorporating Babelfy into LDA for aspect extraction

Abstract

Latent Dirichlet allocation (LDA) is a probabilistic topic model that discovers the latent topic structure in a document collection. The basic assumption of LDA is that each document is a probabilistic mixture of latent topics, each topic is a probability distribution over words, and each document is modelled as a bag of words. Topic models such as LDA are adequate for learning hidden topics, but they do not take into account the deeper semantic knowledge of a document. In this article, we propose a novel method based on topic modelling to determine the latent aspects of online review documents. In the proposed model, called Concept-LDA, the feature space of reviews is enriched with concepts and named entities extracted with Babelfy, so that topics contain not only co-occurring words but also semantically related words. Performance in terms of topic coherence and topic quality is reported over 10 publicly available datasets, and Concept-LDA is shown to achieve better topic representations than LDA alone, as measured by topic coherence and F-measure. The topic representation learned by Concept-LDA leads to an accurate and easy aspect extraction task in an aspect-based sentiment analysis system.

Keywords

Aspect extraction; Babelfy; latent Dirichlet allocation; semantic knowledge; topic modelling

1. Introduction

The Internet, which has become an important source of information for millions of people around the world, has opened new doors for its users to share their opinions about their purchases. Online review web sites with millions of users are used to express opinions and feelings towards products, companies, services and so on, and anyone can gather others’ opinions through these media [1]. As users access these web sites from everywhere, a growing volume of data is piling up and is being intensively analysed because it exerts a powerful effect on consumers [2].

However, the number of online user reviews increases constantly, and reviews of the same product may be posted on different web sites. Thus, extracting the required information is a challenging task for individuals and companies alike. To help both consumers choosing a product and companies monitoring their brands, there is a need for automatic analysis of such product reviews [3].

Sentiment analysis is a computational task for analysing people’s opinions, sentiments and evaluations about products, topics, individuals and their attributes [4]. Recently, understanding people’s opinions from written texts by machine has become a hot research topic, because opinions are central to almost all human activities and are an essential part of our decision-making process. The purpose of sentiment analysis is to determine opinions from opinionated texts.

Sentiment analysis is mainly performed at three levels: document level, sentence level and aspect level. Document-level analysis is based on the overall sentiment of a review. For example, in a restaurant review, the aim is to determine the general sentiment (good or bad) about the restaurant from the whole document. Sentence-level analysis determines the general sentiment of each opinionated sentence. Although these two analyses give the general sentiment about a product, their results do not tell us whether every individual feature of the product is good or poor. Thus, there is a need for fine-grained analysis, and aspect-level analysis has gained popularity as a result. An aspect, about which a sentiment is expressed, is anything that defines and completes a product; a sentiment is a positive or negative feeling about an aspect [2]. In aspect-level sentiment analysis, sentiments are assigned to each aspect individually.

The automatic extraction of aspects is a crucial step in aspect-level sentiment analysis. There are many methods of aspect extraction, including frequent noun and noun phrase–based methods [5–9], rule-based methods [10–12], supervised learning [13], deep learning [14–17] and topic models [2,18–20]. Among these methods, topic models such as latent Dirichlet allocation (LDA), which can discover latent topics, are of extensive interest.

LDA, which is a probabilistic topic model, is based on the idea that a document is a mixture of many latent topics and each topic is a probability distribution over words [21]. A topic is described as a basic idea discussed in the whole document. One assumption that LDA makes is the ‘bag-of-words’ assumption, so the model does not consider the semantic structure of the documents.

In this study, to overcome this shortcoming of LDA, we incorporate semantic knowledge into a model for aspect-based sentiment analysis. For this purpose, instead of a bag-of-words, a bag of {words + concepts + named entities} is used. Concepts are defined as units of knowledge, where each unit has a unique meaning [22]. Named entities are defined as the names of real-world objects such as a person, organisation or location. To extract concepts and named entities, we use Babelfy, which carries out both multilingual word sense disambiguation (WSD) and entity linking (EL) [23]. Babelfy is based on BabelNet, a multilingual semantic network that integrates Wikipedia and WordNet [24]. The aim is to obtain more meaningful topics by adding the related concepts and named entities. The proposed approach is shown empirically to be an effective unsupervised method for aspect-based sentiment analysis.

The rest of the article is organised as follows. Section 2 reviews the literature on LDA. Section 3 gives background on Babelfy and LDA. The proposed approach is described in Section 4. The datasets, evaluation measures and evaluation results based on topic coherence and F-Measure are given in Section 5. Finally, the discussion, conclusions and future work are summarised in Section 6.

2. Literature review

Studies on aspect-based sentiment analysis have attracted great attention in recent years owing to their ability to give sentiments separately for each product aspect. The vital step in this analysis is aspect extraction: to design a powerful sentiment analysis system, the aspect extraction process must be carried out successfully.

Aspect extraction, which is one of the core tasks of sentiment analysis, has been studied by many researchers [4,25,26]. More recently, research has started to address this problem by using topic models and their variants.

Titov and McDonald [27] considered two distinct types of topics, global and local. They assumed that a word in a document is sampled either from a mixture of global topics or from a mixture of local topics, and devised Multi-Grain LDA (MG-LDA) to model both types for extracting product aspects. With the local topics, they intended to capture rateable aspects, while with the global topics, they intended to capture product properties. In their experiments, they used a set of 27,564 hotel reviews taken from TripAdvisor.com. The PRanking algorithm, a perceptron-based online learning method, was used as the aspect rater: for each aspect i, a rating score was calculated by PRanking. The authors extracted topics using both LDA and MG-LDA and then combined them with the unigrams, bigrams and trigrams in the text to represent the input features. Gibbs sampling was run for 800 iterations for both MG-LDA and LDA. All experiments were evaluated with the average ranking loss, which is the average difference between the actual and predicted rating values over N test instances. They compared four models in terms of ranking loss. The first model gives a rating of 5 to each aspect; the second applies PRanking to the input features of the raw data; the third and fourth include words obtained from LDA and MG-LDA, respectively, and then apply PRanking. They concluded that their method surpasses the other three.

Lin and He [28] implemented a fully unsupervised LDA-based Joint Sentiment/Topic (JST) model to extract sentiments and aspects from movie reviews simultaneously. With this model, they also classified the sentiment polarities of reviews. Brody and Elhadad [29] proposed Local LDA as an unsupervised method for aspect extraction. Mutual information was used to select representative words for each aspect. For example, the representative words for ‘value’ are ‘portions, quality, worth, size, cheap and so on’. Adjective extraction was carried out using conjunctions and negations, and the polarities of these adjectives were determined with a conjunction graph. Wang [30] developed a semi-supervised topic model, Co-LDA, in which aspects and sentiments are modelled simultaneously. This model considers sentiment LDA and topic LDA. Jo and Oh [31] proposed Sentence LDA, which assumes that all words in the same sentence belong to the same topic. They then extended Sentence LDA into the Aspect and Sentiment Unification Model (ASUM) to model aspects and sentiments together and obtained aspect-sentiment pairs. Xianghua et al. [32] extracted aspects from Chinese reviews, utilising LDA and a sliding window to extract global and local topics, respectively.

Bagheri et al. [33] proposed an Aspect Detection Model based on LDA (ADM-LDA), in which a Markov chain replaces the ‘bag-of-words’ assumption. Wang et al. [34] devised two new semi-supervised methods called Fine-Grained Label LDA (FL-LDA) and Unified Fine-Grained Label LDA (UFL-LDA). In FL-LDA, an aspect seed lexicon was utilised to extract the aspects in the reviews, and in UFL-LDA, unlabeled documents were used to extract high-frequency aspects. Zheng et al. [35] developed Appraisal Expression Patterns LDA (AEP-LDA) for extracting product aspects from restaurant, hotel, MP3 player and camera reviews. They assumed that words in the same sentence are covered by the same topic. Yin et al. [36] proposed a new LDA-based approach, Dependency Topic Affects Sentiment LDA (DTAS), which drops the ‘bag-of-words’ assumption in favour of a Markov chain. Poria et al. [37] utilised Sentic-LDA, an improved variant of LDA that incorporates semantic similarity; this method offered better performance than the baselines. Yang et al. [38] implemented CAT-LDA, which is based on LDA and uses two-layer categorical information. Using the hierarchical relation between products, they constructed a general category from subcategories. Their empirical evaluation used five general categories of the Amazon.com Review Dataset. Shams and Baraani-Dastjerdi [39] used Enriched LDA (ELDA) to extract aspects by combining word co-occurrence as prior knowledge with LDA. ELDA was evaluated on English and Persian datasets and showed reasonable accuracy. Compared with English and other languages, few studies have applied LDA to Turkish documents. Ekinci and Ilhan [1,40] applied the LDA model to restaurant reviews in English and hotel reviews in Turkish, and Atıcı et al. [41] used the LDA model to determine complaints and dissatisfactions about products, services or companies from a Turkish complaint dataset.

3. Background

3.1. Babelfy

In natural language processing (NLP), word sense ambiguity (WSA) causes poor performance and serious problems with annotation [42]. Thus, there is a need for a powerful approach to this problem.

Babelfy is the first approach that performs both multilingual WSD and EL at the same time [22]. It uncovers semantic relations between word meanings and named entities by using BabelNet [22,23]. BabelNet is a multilingual semantic network with 9 million concepts and named entities in 50 languages, together with lexicalisations and glosses for them. It draws on WordNet, Wikipedia, OmegaWiki, Open Multilingual WordNet and Wiktionary for annotation. Babelfy completes a task in the following steps: (1) for exact matching, all possible meanings of all words in a sentence are considered; (2) partial matching is performed; (3) to resolve ambiguity, all the candidate meanings are linked with each other; (4) a dense subgraph is extracted from these connections; (5) the most suitable meanings are chosen; and (6) the text, in any language, is disambiguated [43].

3.2. LDA

Probabilistic topic modelling methods have become an important field of research and have gained great attention in machine learning and text mining applications in recent years [21,44–46]. These methods are a group of algorithms that discover hidden thematic knowledge in document collections by representing it in a small number of dimensions [47]. Among these methods, LDA is very popular because of its high success rate.

LDA, which is based on the bag-of-words assumption, is fully unsupervised and does not need prior knowledge. Blei et al. [45] describe it as a generative probabilistic model for collections of discrete data such as text corpora. The generative model specifies how documents are created by using latent variables. The basic idea behind LDA is that documents in a collection exhibit multiple latent topics and that each topic is a probability distribution over words. A topic can be defined as a collection of words that frequently occur together and are related to the same subject. The model assumes that, given the parameters, the words in a review are independent, which is known as the ‘bag-of-words’ assumption in NLP. The generative model and posterior distribution of LDA are shown in Figure 1.

Figure 1.

Generative model for LDA.

In LDA, documents are mixtures of latent topics, and each word in a document is drawn from one of these topics. The topics are probability distributions over the words of a fixed dictionary. The generative model first samples, for each topic, a distribution over words. In the second step, a distribution over topics is sampled for each document, which gives the topic proportions of that document. In the last step, each word in a document is generated by drawing a topic from the document’s topic proportions and then drawing the word from that topic. The word distributions of topics and the topic distributions of documents are both drawn from Dirichlet distributions, which are conjugate priors for the multinomial.
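As an illustration only, the generative process described above can be sketched in a few lines of Python. The function names (`sample_dirichlet`, `sample_categorical`, `generate_corpus`), the toy vocabulary and the hyperparameter values are assumptions made for this sketch and are not taken from the original implementation; Dirichlet samples are obtained by normalising Gamma draws from the standard library.

```python
import random


def sample_dirichlet(alphas, rng):
    """Draw one Dirichlet sample by normalising independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]


def sample_categorical(probs, rng):
    """Draw an index according to the given probability vector."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def generate_corpus(vocab, num_topics, doc_lengths, alpha, beta, seed=0):
    """Generate toy documents by following the LDA generative story."""
    rng = random.Random(seed)
    # Step 1: one word distribution phi_k per topic, drawn from Dirichlet(beta).
    phi = [sample_dirichlet([beta] * len(vocab), rng) for _ in range(num_topics)]
    corpus, theta = [], []
    for n_words in doc_lengths:
        # Step 2: topic proportions theta_m of this document, from Dirichlet(alpha).
        theta_m = sample_dirichlet([alpha] * num_topics, rng)
        theta.append(theta_m)
        doc = []
        for _ in range(n_words):
            # Step 3: draw a topic for the position, then a word from that topic.
            z = sample_categorical(theta_m, rng)
            w = sample_categorical(phi[z], rng)
            doc.append(vocab[w])
        corpus.append(doc)
    return corpus, theta, phi


vocab = ["food", "service", "price", "parking"]
corpus, theta, phi = generate_corpus(vocab, num_topics=2, doc_lengths=[5, 7],
                                     alpha=0.5, beta=0.5, seed=1)
```

Inference runs this story in reverse: only the words in `corpus` are observed, and `theta`, `phi` and the per-word topics have to be recovered from them.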

The graphical model of the LDA is represented with plate notation. The plate notation for LDA describes the random variables and explains how these variables are generated from the propagation along the directional edges for the observed data. The plate notation of the LDA is given in Figure 2.

Figure 2.

Plate notation for LDA.

In Figure 2, $M$ is the total number of documents, $K$ is the number of latent topics and $V$ is the number of distinct words in the vocabulary. $α$ and $β$ are the Dirichlet parameters. $θ$ represents the multinomial distribution of topics in a document and $φ$ represents the multinomial distribution of words in a topic. $N_{m}$ is the number of words in the $m$th document. $z_{m, n}$ is the latent topic of the $n$th word in the $m$th document and $w_{m, n}$ is the $n$th word in the $m$th document.

In the given generative model, plates are used for indicating replicated variables. Nodes are random variables; edges indicate dependencies between these nodes. While the shaded nodes are observed variables, the non-shaded nodes are hidden variables. According to the graphical model given in Figure 2, the joint distribution of all hidden and observed random variables is given below

p (φ_{1 : K}, θ_{1 : M}, z_{1 : M}, w_{1 : M}) = \prod_{k = 1}^{K} p (φ_{k} | β) \prod_{m = 1}^{M} \left( p (θ_{m} | α) \prod_{n = 1}^{N_{m}} p (z_{m, n} | θ_{m}) \, p (w_{m, n} | z_{m, n}, φ_{z_{m, n}}) \right)

As mentioned above, the main aim of the LDA is to obtain the model parameters. For this purpose, the posterior distribution given below is used

p (φ_{1 : K}, θ_{1 : M}, z_{1 : M} | w_{1 : M}) = \frac{p (φ_{1 : K}, θ_{1 : M}, z_{1 : M}, w_{1 : M})}{p (w_{1 : M})}

The joint probability in the numerator can easily be computed for any setting of the hidden variables. The denominator, which is the marginal probability of the observed data, is intractable to compute because it is the probability of seeing the observed corpus under any topic model, and the number of possible topic structures is exponentially large. Thus, the posterior is approximated, typically with variational expectation maximisation or Collapsed Gibbs Sampling (CGS). In this study, CGS is used to discover the latent topics.

3.3. CGS

CGS is a special kind of Markov Chain Monte Carlo (MCMC) sampling, which is used to approximate the posterior distribution in Bayesian inference and provides information about distributions. CGS for LDA was introduced by Griffiths and Steyvers in 2004 [48]. It is used to approximate the intractable statistics in generative models such as LDA [48,49].

In this method, only the latent variable z is sampled, while θ and φ are marginalised out. Sampling proceeds by computing, for the current word, the probability of each topic conditioned on the topic assignments of all other words, which are treated as fixed [21]. This step is applied iteratively to each word in every document of the collection, based on the equation below

p (z_{i} = j | z_{- i}, w_{i}, α, β) \propto \frac{n_{ij} + α}{N_{i} - 1 + K α} \cdot \frac{n_{w_{i} j} + β}{\sum_{w = 1}^{V} n_{wj} + V β}

In the above equation, $z_{i} = j$ denotes the assignment of word $i$ to topic $j$ and $z_{- i}$ denotes the topic assignments of all other words. $n_{ij}$ is the number of words in the document containing word $i$ that are assigned to topic $j$, and $N_{i} - 1$ is the total number of words in that document except the current one. $n_{w_{i} j}$ is the number of times the word type $w_{i}$ is assigned to topic $j$, and $n_{wj}$ is the number of times word type $w$ is assigned to topic $j$; all counts exclude the current word $i$. $K$ is the number of latent topics, $V$ is the size of the vocabulary and $α$ and $β$ are the Dirichlet parameters. The first factor measures the relevance of each topic to the current document and is used to update $θ$. The second measures the relevance of each topic to the current word and is used to update $φ$. CGS is run for a predefined number of iterations.
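The sampling loop can be made concrete with a small, unoptimised Python sketch. It is not the implementation used in this study: the function name `collapsed_gibbs_lda`, the count tables and the toy corpus are assumptions introduced for illustration. The sketch uses the usual smoothed form of the collapsed conditional, in which the word factor is normalised by the total count of the topic plus $V β$, and it drops the document-length denominator because that term is identical for every topic.

```python
import random


def collapsed_gibbs_lda(docs, num_topics, alpha, beta, iterations, seed=0):
    """Toy collapsed Gibbs sampler for LDA; docs is a list of token lists."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    V, K = len(vocab), num_topics
    n_dk = [[0] * K for _ in docs]       # tokens of document d assigned to topic k
    n_wk = {w: [0] * K for w in vocab}   # occurrences of word w assigned to topic k
    n_k = [0] * K                        # all tokens assigned to topic k

    def update(d, w, k, delta):
        """Add (delta=1) or remove (delta=-1) one token from the count tables."""
        n_dk[d][k] += delta
        n_wk[w][k] += delta
        n_k[k] += delta

    # Random initial topic for every token.
    z = []
    for d, doc in enumerate(docs):
        z.append([rng.randrange(K) for _ in doc])
        for w, k in zip(doc, z[d]):
            update(d, w, k, 1)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                update(d, w, z[d][i], -1)  # exclude the current token from the counts
                weights = [
                    (n_dk[d][j] + alpha) * (n_wk[w][j] + beta) / (n_k[j] + V * beta)
                    for j in range(K)
                ]
                z[d][i] = rng.choices(range(K), weights=weights, k=1)[0]
                update(d, w, z[d][i], 1)   # re-insert the token under its new topic

    # Point estimates of theta and phi from the final counts.
    theta = [[(n_dk[d][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
             for d, doc in enumerate(docs)]
    phi = [{w: (n_wk[w][k] + beta) / (n_k[k] + V * beta) for w in vocab}
           for k in range(K)]
    return theta, phi


docs = [["pizza", "pasta", "pizza"], ["parking", "car", "parking"],
        ["pasta", "pizza"], ["car", "parking"]]
theta, phi = collapsed_gibbs_lda(docs, num_topics=2, alpha=0.5, beta=0.01,
                                 iterations=50, seed=0)
```

With the settings reported in Section 5.3 ($K = 100$ and $α = 50 / K$), the document prior would be $α = 0.5$, which is the value used in this toy call.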

4. Proposed approach

A crucial task in NLP is to build word vectors that are semantically well represented. LDA clusters co-occurring words together while ignoring the semantic relationships between them. To compensate for this lack of semantic information, we propose a semantic word enrichment method that expands reviews with concepts and named entities obtained from Babelfy. The study was motivated by our observation that some semantically related words are not included in the same topic by LDA. For example, if the document collection is represented by the bag-of-words model, then ‘waiter’ (a person whose occupation is to serve at table, as in a restaurant) and ‘waitress’ (a woman waiter) may fail to be included in the same topic. However, by adding the concepts of ‘waiter’ and ‘waitress’, which are ‘person’ and ‘restaurant’ for waiter and ‘woman’ and ‘waiter’ for waitress, we obtain a more accurate topic that includes ‘waiter’ and ‘waitress’ together. The major assumption of our model is that words that indicate the same concepts tend to have a similar meaning. Therefore, Babelfy is used in this study to extract concepts and named entities based on the true sense of the words. In addition, given the successful and extensive use of Babelfy for the WSA problem in NLP, it also helps to solve the WSA problem in topic modelling.

The proposed model includes the following steps: (1) multi-words are learned from the original dataset by using Babelfy; (2) stemming, lowercasing and stop word elimination are applied to the dataset; (3) Babelfy is used to extract the concepts and named entities from the pre-processed dataset; (4) the pre-processed dataset and the concepts and named entities are combined to create a final dataset; and (5) LDA is applied to obtain the semantic topics from the final dataset.
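Step (4), the combination of words with their concepts and named entities, can be illustrated with a minimal Python sketch. The `ANNOTATIONS` dictionary is a hypothetical stand-in for Babelfy output (only the ‘waiter’ and ‘waitress’ concepts come from the example in the text), and the function name `enrich` is likewise an assumption of this sketch.

```python
# Hypothetical stand-in for Babelfy output: word -> (concepts, named entities).
ANNOTATIONS = {
    "waiter": (["person", "restaurant"], []),
    "waitress": (["woman", "waiter"], []),
}


def enrich(tokens, annotations):
    """Return the bag of {words + concepts + named entities} for one review."""
    enriched = list(tokens)
    for token in tokens:
        concepts, entities = annotations.get(token, ([], []))
        enriched.extend(concepts)
        enriched.extend(entities)
    return enriched


review_a = enrich(["waiter", "friendly"], ANNOTATIONS)
review_b = enrich(["waitress", "slow"], ANNOTATIONS)
```

After enrichment, both reviews contain the token ‘waiter’, so the two original words now co-occur with a shared feature even though they never appeared in the same review.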

Following the steps given above, we first explain multi-word extraction using Babelfy. Then the basic pre-processing steps of stemming, lowercasing and stop word elimination are presented. Babelfy-based concept and named entity extraction is introduced next. The proposed LDA is presented at the end. The basic framework of the proposed approach is illustrated in Figure 3.

Figure 3.

Framework of the proposed approach.

In user reviews, product aspects can be either single words or multi-words, so for an in-depth sentiment analysis, multi-word aspects must be detected in addition to single-word aspects [50]. For this purpose, the first aim is to extract all the multi-words from the reviews. When the reviews are examined, four types of multi-words are observed: multi-words found in the dictionary, domain-based multi-words, multi-word named entity phrases and misspelled compounds. In the review sentence ‘… it is the one with an open English muffin’, ‘English muffin’ is a multi-word that the dictionary contains. Domain-based multi-words, such as ‘mustard sauce’, do not exist in the dictionary; ‘mustard’ and ‘sauce’ come together based on the domain and the content of the sentence. Like ‘mustard sauce’, ‘poached egg’ in the sentence ‘My girlfriend had a couple of poached eggs with spinach and cheese’ is a domain-based multi-word. In the sentence ‘… along Washington Boulevard’, ‘Washington Boulevard’ is a multi-word named entity phrase. User reviews also contain many misspellings; for example, the word ‘seafood’ is written as ‘sea food’. By using Babelfy and its properties, all types of multi-words can be easily extracted from user reviews. In our study, 1982 multi-words were extracted from the restaurant reviews.

In the second stage, stemming is applied to the dataset, rearranged with multi-words, to extract word stems. Stemming reduces the morphological variants of a word to a common form. For this task, the Stanford CoreNLP library (see https://stanfordnlp.github.io/CoreNLP) is used. Then, all uppercase letters are converted to lowercase so that case differences do not produce distinct tokens. Stop word elimination is performed to remove irrelevant words so that Babelfy does not extract concepts and named entities for them. Finally, all punctuation is removed from the dataset. We use OpenNLP (see http://opennlp.apache.org) for sentence detection.
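The pre-processing stage can be approximated with the following Python sketch. It does not call Babelfy, Stanford CoreNLP or OpenNLP: the `MULTI_WORDS` lookup, the `STOP_WORDS` subset and the `toy_stem` function are hypothetical placeholders for those tools, and the order of operations is simplified so that the example fits in a single pass.

```python
import string

# Illustrative stop word subset and a hypothetical multi-word lookup
# (in the study, multi-words are learned with Babelfy).
STOP_WORDS = {"a", "an", "and", "the", "of", "with", "is", "it", "had", "my"}
MULTI_WORDS = {("poached", "eggs"): "poached egg", ("sea", "food"): "seafood"}


def toy_stem(word):
    """Placeholder stemmer that only drops a plural 's'."""
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def preprocess(sentence):
    """Lowercase, strip ASCII punctuation, merge multi-words, drop stop words, stem."""
    cleaned = sentence.lower().translate(str.maketrans("", "", string.punctuation))
    tokens = cleaned.split()
    result, i = [], 0
    while i < len(tokens):
        pair = tuple(tokens[i:i + 2])
        if pair in MULTI_WORDS:
            result.append(MULTI_WORDS[pair])  # keep the multi-word as one token
            i += 2
        elif tokens[i] in STOP_WORDS:
            i += 1
        else:
            result.append(toy_stem(tokens[i]))
            i += 1
    return result


tokens = preprocess("My girlfriend had a couple of poached eggs with spinach and cheese")
```

Keeping ‘poached egg’ as a single token matters for the later stages, because the concept lookup and the topic model then treat the multi-word aspect as one feature rather than as two unrelated words.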

In NLP studies, the automatic acquisition of meaning from a text is a major challenge because of ambiguity [51]. Thus, accurate concept and named entity extraction is crucial for capturing true meaning and semantically related topics. Babelfy is also used in this stage, with the following extraction steps: (1) concepts and named entities in the text are linked to a set of vertices by using a lexicalised semantic network; (2) the possible meanings of the linkable fragments extracted from the text are determined by using the semantic network; and (3) suitable meanings for each fragment are selected by using a dense subgraph [51]. Concept-LDA, which is applied to the final dataset obtained from the dataset-combining step, is presented in Figure 4.

Figure 4.

Concept-LDA model.

In Figure 4, $E$ is the number of words, concepts and named entities in the review; ‘ $concept$ ’ is one of the concepts of word $w$ and ‘ $named entity$ ’ is one of the named entities of word $w$ . In Concept-LDA, each word, concept and named entity in each document is drawn from one of the topics. This is the only difference from the LDA model.

5. Evaluation

In this section, we describe the evaluation of Concept-LDA and compare it with LDA both qualitatively and quantitatively. For the qualitative analysis, we present the aspects generated by both models in terms of semantic relation and ability to capture details. For the quantitative analysis, we use topic coherence and F-Measure as comparison measures. The following sections discuss the datasets, evaluation measures and evaluation results.

5.1. Dataset

In the experiments, we use 10 different public datasets. As the first dataset, we select a subset of a restaurant dataset from the popular web site Yelp (see http://www.yelp.com). The reviews in the selected subset belong to the American (New) category. For the remaining datasets, we select 9 domains out of 50, which were crawled from Amazon.com [52]. The standard NLP pre-processing steps are applied to the datasets as explained in Section 4. After these steps, the final datasets are enriched by expanding the features with concepts and named entities as deeper semantic knowledge. The original and final datasets are listed in Table 1.

Table 1.

Details of the datasets.

Domain	Dataset	Number of reviews	Number of sentences	Number of words	Number of multi-words	Total number of words
Restaurant	Original	2647	33652	5451	1982	88779
Restaurant	Final	2647	33652	14025	6784	383126
Alarm clock	Original	5113	5113	693	385	12501
Alarm clock	Final	5113	5113	2403	1186	44797
Amplifier	Original	5731	5731	922	712	17404
Amplifier	Final	5731	5731	3235	2081	71100
Battery	Original	4056	4056	548	251	9592
Battery	Final	4056	4056	1901	839	35860
Blu-ray player	Original	9170	9170	1040	883	26991
Blu-ray player	Final	9170	9170	3296	2373	116180
Cable modem	Original	5754	5754	690	457	15282
Cable modem	Final	5754	5754	2375	1376	61787
Camera	Original	8958	8958	1217	980	27290
Camera	Final	8958	8958	3904	2601	103623
Car stereo	Original	5587	5587	760	518	15928
Car stereo	Final	5587	5587	2640	1554	64915
Cell phone	Original	5713	5713	923	608	15277
Cell phone	Final	5713	5713	3063	1855	59967
DVD player	Original	6256	6256	871	647	16473
DVD player	Final	6256	6256	2980	1879	64611

5.2. Evaluation measure

In this study, topic coherence and F-Measure (based on precision and recall) are used as performance measures. Topic coherence reflects the semantic coherence of the individual topics and is computed as follows [53]

C (t; V^{(t)}) = \sum_{m = 2}^{M} \sum_{l = 1}^{m - 1} \log \frac{D (v_{m}^{(t)}, v_{l}^{(t)}) + 1}{D (v_{l}^{(t)})}

where $V^{(t)} = (v_{1}^{(t)}, \dots, v_{M}^{(t)})$ is the list of the $M$ most probable aspects in topic $t$. $D (v_{m}^{(t)}, v_{l}^{(t)})$ is the co-document frequency of word types $v_{m}$ and $v_{l}$, that is, the number of documents containing both, and 1 is added to avoid taking the logarithm of 0. $D (v_{l}^{(t)})$ is the document frequency of word type $v_{l}$. A higher topic coherence value reflects a higher quality of the extracted aspects.
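To make the measure concrete, the coherence of a single topic can be computed with a short Python function. This is an illustrative sketch rather than the evaluation code of the study; the function name `topic_coherence`, its arguments and the toy documents are assumptions.

```python
import math


def topic_coherence(top_words, documents):
    """Coherence of one topic from document and co-document frequencies.

    top_words: the M most probable words of the topic, most probable first.
    documents: iterable of token collections; every top word is assumed to
    occur in at least one document, so no document frequency is zero.
    """
    doc_sets = [set(doc) for doc in documents]

    def doc_freq(*words):
        """Number of documents that contain all of the given words."""
        return sum(1 for doc in doc_sets if all(w in doc for w in words))

    score = 0.0
    for m in range(1, len(top_words)):
        for l in range(m):
            co_freq = doc_freq(top_words[m], top_words[l])
            score += math.log((co_freq + 1) / doc_freq(top_words[l]))
    return score


documents = [["drink", "cocktail"], ["drink"], ["parking"], ["drink", "parking"]]
related = topic_coherence(["drink", "cocktail"], documents)
unrelated = topic_coherence(["parking", "cocktail"], documents)
```

In this toy corpus, ‘drink’ and ‘cocktail’ share one document while ‘parking’ and ‘cocktail’ share none, so the first pair receives the higher (less negative) score.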

To evaluate Concept-LDA accurately, the words in the datasets are labelled manually as aspect or non-aspect by three annotators working independently. A word is labelled as an aspect if all of the annotators agree; otherwise, it is labelled as a non-aspect. The topic words extracted by the Concept-LDA and LDA models are evaluated separately against the annotators’ decisions, and F-Measure is used for this evaluation.
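The labelling rule and the resulting scores can be sketched as follows. The article does not spell out the exact definitions, so this sketch assumes that precision is the share of extracted topic words labelled as aspects and recall is the share of labelled aspects that were extracted; the function names and the toy votes are hypothetical.

```python
def gold_aspects(votes_by_word):
    """Keep a word as an aspect only if every annotator marked it as one."""
    return {word for word, votes in votes_by_word.items() if all(votes)}


def precision_recall_f1(extracted, gold):
    """Precision, recall and F-measure of extracted topic words against gold aspects."""
    extracted, gold = set(extracted), set(gold)
    true_positives = len(extracted & gold)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Toy votes from three annotators (True = aspect); the values are invented.
votes = {
    "drink": [True, True, True],
    "cocktail": [True, True, True],
    "martini": [True, True, True],
    "question": [True, False, True],
}
gold = gold_aspects(votes)
precision, recall, f1 = precision_recall_f1(["drink", "cocktail", "question", "rock"], gold)
```

Here ‘question’ is excluded from the gold set because one annotator disagreed, so two of the four extracted words count as correct.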

5.3. Evaluation results

In many earlier studies, the model parameters are set arbitrarily without justification [44,54]. To avoid this, we run CGS for 50, 100, 200, 500 and 1000 iterations for both models and use the symmetric priors $α = 50 / K$ and $β = 0.01$, values that work well across different text collections [21,35]. The number of topics is set to 100 based on experiments, and the top 10 words of each topic are taken as topic words. The experimental results in terms of topic coherence are compared in Figure 5.

Figure 5.

Average topic coherence with different iteration count selection.

Figure 5 shows that the topic coherence of Concept-LDA is better than that of the LDA model. Incorporating concepts and named entities into topic modelling therefore enhances generalisation performance. As shown in Figure 6, even for a small number of iterations such as 50, the improvement by Concept-LDA is more than 6% on average. As the number of iterations increases, the improvement in topic coherence increases slightly; for 1000 iterations, it is more than 8%. That is, Concept-LDA yields better generalisation performance in terms of topic coherence. The highest average topic coherence for Concept-LDA is obtained with 1000 iterations. Thus, we run Gibbs sampling for 1000 iterations, which is adequate to achieve the best results in our experiments.

Figure 6.

Average improvement of Concept-LDA over topic coherence with different iteration counts.

Figure 7 shows that Concept-LDA achieves a 3% improvement on average compared with LDA. This improvement arises because incorporating semantic knowledge improves the quality of the extracted aspects.

Figure 7.

Average F-measure, Precision and Recall of Concept-LDA and LDA.

The top 10 topic words, used for the qualitative analysis as the most emphasised aspects of restaurant domain, are presented in Table 2.

Table 2.

The example topics obtained from Concept-LDA and LDA in the Restaurant domain.

Drinks		Salad		Parking		Money		Fast food
LDA	Concept-LDA	LDA	Concept-LDA	LDA	Concept-LDA	LDA	Concept-LDA	LDA	Concept-LDA
drink	Drink	salad	oil	parking	region	food	money	burger	burger
cocktail	drinking	salad	vinegar	street	area	money	price	office	sandwich
dinner	cocktail	beet	cheese	spot	boundary	people	right	father	ingredient
list	vodka	goat cheese	onion	area	parking	dollar	purchase	sweet potato fries	bun
martini	Gin	vinaigrette	avocado	corner	place	pay	option	beer	cake
round	liquor	cucumber	mustard	park	patio	order	sell	ketchup	beef
question	martini	brother	bone	scientology	car	lady	cost	bun	hamburger
rock	dry vermouth	hazelnut	pepper	parking lot	park	gourmet	tip	fo	Japan
foundry	Vermouth	crouton	tomato	grub	lot	spending	style	substitution	onion
bartender	variety	mark	garlic	skirt steak	parking lot	step	bit	fries	medium

LDA: latent Dirichlet allocation.

Errors are italicised and given in red.

The topics are manually labelled on the basis of our interpretation. When Concept-LDA and LDA are compared qualitatively, it is apparent that the labelled topics of Concept-LDA are semantically related, coherent and able to capture details, while the LDA topics are of quite low quality. For example, the aspects in the ‘Drinks’ topic of Concept-LDA are coherent and semantically related, so the topic label can easily be determined. In contrast, LDA topic words such as ‘question’, ‘rock’ and ‘foundry’ in the ‘Drinks’ topic are not related to drinks. Furthermore, LDA fails to capture topic words such as ‘sandwich’, ‘ingredient’ and ‘hamburger’, which are discussed extensively in the reviews. Therefore, the aspects of Concept-LDA are more informative, coherent and semantically related than those of LDA. Consequently, when the two topic models of the restaurant domain obtained from Concept-LDA and LDA are compared, the topic words of Concept-LDA provide a better representation of the restaurant domain.

Both quantitative and qualitative results indicate that our proposed assumption about enriching the feature space with the concepts and named entities is reasonable and the quality of extracted aspects is very good.

6. Conclusion

Automatic topic detection in an unlabeled document collection is a crucial research problem in NLP. LDA is a widely used method for this kind of problem. LDA represents each document as a probability distribution over topics, where each topic is modelled by a probability distribution over words in a fixed vocabulary. Topic modelling with LDA relies on this fixed vocabulary, and the model does not consider the semantic structure of documents. However, the words in a document usually carry deeper semantic information about the topics. This kind of semantic information should therefore be considered in order to extract topics from the documents more accurately. In this context, a novel method is proposed that incorporates deeper semantic knowledge, obtained from extracted concepts and named entities, into the LDA model.

Our experimental results show that, compared with the bag-of-words representation in LDA, the bag of {words + concepts + named entities} representation applied in Concept-LDA offers better performance in terms of topic coherence, F-measure and topic quality. This is the main contribution of this work. In addition, the proposed model can be used as an aspect extraction tool in a sentiment analysis system, so that aspect extraction, one of the core tasks of sentiment analysis, can be addressed effectively by Concept-LDA. From this viewpoint, more accurate topic representations yield more accurate aspects for sentiment analysis.
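The bag of {words + concepts + named entities} representation can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in this study: the function name `enrich_tokens`, the `concept:` and `entity:` prefixes, and the hand-written annotation lists are hypothetical stand-ins for the output of a disambiguation and entity linking step such as Babelfy.

```python
from collections import Counter


def enrich_tokens(words, concepts, entities):
    """Build one bag over words, concepts and named entities.

    `concepts` and `entities` are assumed to come from an external
    annotator; here they are plain lists supplied by the caller.
    The prefixes are a hypothetical convention that keeps the three
    kinds of features distinct inside a single vocabulary.
    """
    bag = Counter(words)
    bag.update(f"concept:{c}" for c in concepts)
    bag.update(f"entity:{e}" for e in entities)
    return bag


# Toy review; the annotations are invented for illustration only.
review_words = ["great", "espresso", "and", "latte", "at", "this", "cafe"]
review_concepts = ["coffee", "coffee"]  # e.g. one concept per drink mention
review_entities = []                    # no named entities in this review

bag = enrich_tokens(review_words, review_concepts, review_entities)
```

In this toy case the shared concept is counted twice, so two different surface words contribute to the same feature. A standard LDA implementation could then be trained on such enriched bags in place of plain word counts.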

Furthermore, in NLP studies, the domain dependence of word semantics is a crucial issue for word sense disambiguation. Since Concept-LDA does not depend on the domain of the corpus, it can be applied easily to any domain without prior knowledge.

As future work, it would be interesting to compare the improvement obtained from the concepts and named entities used in Concept-LDA with the improvement obtained from word embedding methods such as word2vec or doc2vec in LDA.

Footnotes

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

ORCID iD

Ekin Ekinci
