Predictive aspect-based sentiment classification of online tourist reviews

Abstract

With the increase of online tourists reviews, discovering sentimental idea regarding a tourist place through the posted reviews is becoming a challenging task. The presence of various aspects discussed in user reviews makes it even harder to accurately extract and classify the sentiments. Aspect-based sentiment analysis aims to extract and classify user’s positive or negative orientation towards each aspect. Although several aspect-based sentiment classification methods have been proposed in the past, limited work has been targeted towards the automatic extraction of implicit, infrequent and co-referential aspects. Moreover, existing methods lack the ability to accurately classify the overall polarity of multi-aspect sentiments. This study aims to develop a predictive framework for aspect-based extraction and classification. The proposed framework utilises the semantic relations among review phrases to extract implicit and infrequent aspects for accurate sentiment predictions. Experiments have been performed using real-world data sets crawled from predominant tourist websites such as TripAdvisor and OpenTable. Experimental results and comparison with previously reported findings prove that the predictive framework not only extracts the aspects effectively but also improves the prediction accuracy of aspects.

Keywords

Aspect-based sentiment analysis machine learning opinion mining sentiment classification

1. Introduction

Tourism is a dynamically growing industry and a key to a country’s economic growth. Its significance and importance cannot be overemphasised [1]. Every year, thousands of tourists visit various places around the world and share their sentiments in the form of online reviews on traveller’s websites such as TripAdvisor and OpenTable. These sentiments present the feeling of tourists regarding various aspects of a specific place. Due to the diversity of the sentiments posted regarding a tourist place, it is a challenging task to extract and classify the sentiments into positive or negative. A number of sentiment mining methods [2 –13] have been proposed in the past to classify the sentiments. However, in majority of the online reviews, multiple aspects regarding a particular place are mentioned which are not considered by the aspect-less sentiment classification techniques [2 –13]. Aspect-based sentiment classification deals with this issue of mining sentiment polarity of aspect categories referred in each online review.

Aspect-based sentiment classification methods [14 –31] not only assist in extracting the various aspects mentioned in the reviews but also help in classifying each aspect into positive or negative polarity. Take an example of an opinion ‘Food is delicious but service is ordinary’. In this short review, ‘food’ and ‘service’ are the two aspects which are explicitly mentioned. Food is classified as a positive aspect due to the positive sentiment word ‘delicious’ and service is classified as negative aspect because of the negative word ‘ordinary’.

Aspect-based sentiment classification methods basically consist of two tasks: (1) aspects identification and (2) sentiment classification of identified aspects into positive or negative.

In aspect identification task, there are three main issues. First is the difficulty of identifying implicit aspects. For example, take an opinion about a restaurant ‘Last night my wife and I visited Hasten restaurant, the taste was awesome’. In this opinion, tourist implicitly gives a sentiment about an important aspect ‘food’ that was not mentioned explicitly in review text. The second problem is the identification of co-referential aspects. Co-referential aspects are those aspects which are mentioned in the reviews using synonyms. For example, atmosphere and ambience are co-referential aspects because both refer to the environment. The third issue is the identification of infrequent aspects, the aspects that are not frequently mentioned in the review but are of great importance for a domain. For example, swimming pool and Wi-Fi are less frequent aspects but both carry a lot of importance in hotel and restaurant domain.

In the existing aspect identification methods [14 –24], there is not a single method which deals with all the three aforementioned aspect identification issues effectively. For each issue, separate methods have been proposed in the literature, and there is a dire need of a method that can cater these issues together, with reasonable effectiveness. This motivated us to develop a single-aspect identification method for handling implicit, co-referential and infrequent aspects.

Similarly, for the second task of sentiment classification of identified aspects, there is a major problem of handling multi-aspect reviews. Classification of the multi-aspect reviews is a complex task as multiple aspects are discussed in a review and each aspect has either a positive or a negative sentiment. For example, take a review posted about a hotel ‘The hotel staff were extremely rude, but our room was very comfortable. Perfect location for a family with children. Clean bathroom, beautiful views’. In this review, tourist gives sentiments about multiple aspects namely ‘staff’, ‘room’ and ‘location’. It can be seen from the review that aspects have different sentiments, that is, for ‘staff’, there is negative sentiment while for ‘room’ and ‘location’ the sentiment is positive.

Approaches reported in the literature [19,25 –31] have not dealt with this daunting issue of multi-aspect classification. To the best of our knowledge, there is not a single machine learning (ML)–based approach available for aspect classification which effectively handles this complex problem. We argue that aspect-based sentiment classification could only be improved if multi-aspect problem is adequately handled by aspect classification methods.

In this article, we propose a predictive aspect-based sentiment classification framework, and our contributions are three fold; (1) a tree-based method has been proposed for implicit aspect extraction; (2) an extension of rule-based aspect identification method has been devised for accurate identification of infrequent aspects and (3) an effective mechanism has been developed for transforming multi-aspect reviews into single-aspect phrases for accurate classification of aspects.

Experiments have been performed using real-world data of hotels and restaurants collected from TripAdvisor and OpenTable. Five predominant ML algorithms, namely naïve Bayes multinomial (NBM) [32], support vector machine (SVM) [33], maximum entropy (ME) [34], random forest tree (RFT) [35] and fuzzy lattice reasoning (FLR) [36] were selected for experimentation. The results were compared with similar ML approaches reported in literature [19,25–26,29–30]. The comparison shows considerable improvement in performance of our proposed framework with 89.34% and 91.53% accuracy on restaurant and hotel data sets, respectively, using NBM technique.

The rest of the article is organised as follows. Section ‘Related work’ presents the overview of the related work. In section ‘Proposed framework for predictive aspect-based sentiment classification’, the details of our predictive aspect-based sentiment classification are given. Followed by section ‘Experimental results and discussion’, in which experimental results are presented and discussed. In section ‘Comparative analysis of the results’, comparative evaluation of the proposed framework with the similar approaches has been given. Finally, the findings of our work are concluded in section ‘Conclusion and future work’.

2. Related work

This section presents the related work of aspect-based sentiment classification for tourism domain. The objective of this section is to critically evaluate the existing approaches of aspect identification and classification to identify the existing gaps and limitations.

2.1 Aspect identification

Aspect identification is a primary task in opinion mining. The work done in this area has been broadly categorised into three types of methods namely rule-based methods, seeds-based methods and topic models–based methods [37].

Rule-based methods identify aspects by employing rules based on importance score and frequency of occurrence. Such an approach has been proposed by Jiménez-Zafra et al. [14] which utilises a bag of words containing aspect terms based on frequency using Freebase. Freebase is a collaborative knowledge base that contains information about more than 70 different domains including tourism. Similarly, Muangon et al. [15] extracted high-rated aspects by employing rank-based approach. In this approach, features were obtained using LexToPus lexicon and ranked according to their frequency of occurrence in reviews. Their approach helped in extracting the aspects having high ranks.

A different approach of utilising part-of-speech (POS) tagger has been proposed by Wang et al. [16]. In order to select the potential aspects, common morphological and inflexional endings were removed by employing a novel porter stemming algorithm. This algorithm helped in tagging similar aspects to enhance the classification process. Words with identical spellings were easily and automatically identified using the porter algorithm. A similar POS–based approach has been exploited by Marrese-Taylor et al. [17,18]. In their work, the authors transformed reviews into sentences and later applied POS tagger on sentences to extract nouns. A frequency threshold of 10 was set to extract similar aspects. To advance the rule-based approaches, Afzaal et al. [19] proposed an improved method using fuzzy-based learning. Authors adopted ML-based algorithm called fuzzy unordered rule induction algorithm (FURIA). In this method, frequent nouns and noun phrases from the reviews were extracted, and rules were generated using FURIA.

Although rule-based methods are easy to adopt and very effective in the identification process, there are a number of problems which are not tackled by such methods. For instances, rule-based methods tend to produce very limited number of rules. Moreover, the generated rules are unable to identify infrequent aspects, as a number of irrelevant aspects are extracted by the limited number of rules produced by these methods.

There is another category of approach called seeds-based approach. In seed-based methods, aspects are identified using the grammatical connection between seed sets and review words. In line with this category of approaches, Colhon et al. [20] chose five most-discussed aspects from reviews and formed a seed set for each aspect. In order to build seed sets, grammatical connections between the terms in sentences and every aspect in seed sets is checked; afterwards these terms were assembled accordingly. It helps in identifying the important aspects of a review by using co-occurrence of different words.

In the quest of finding interconnection of review words, Mukherjee et al. [21] proposed a method of finding semantic relationship between review words. Words which have semantic relationships are grouped in the form of a seeds set. This approach helps in identifying not only the co-occurrence of aspect terms but also the semantic relationships between them. A slightly different seed-based approach has been proposed by Kayaalp et al. [22], which uses an index-based aspects extraction method. The index-based method consists of three main steps. First, four most-discussed aspects in restaurants reviews such as food, service, price and ambience are selected. Second, reviews words were indexed and stored in different files with proper tagging. Finally, indexed words were categorised under the selected aspects based on similarity between aspects and indexed words.

Unlike the rule-based methods which did not utilise the relationships among the different review words, the seed-based methods offer a number of advantages in terms of identification of co-referential aspects. However, the limitation of seeds-based methods is that extensive domain knowledge is required for the selection of aspects and seeds key words. In addition, the words extracted with limited number of aspects are not enough to cover the complete domain. For instance, food, service, price and ambience cannot cover the complete domain of a typical hotel or restaurant, as this domain has a large number of other important aspects, that is, location and room facilities.

The last category of aspect identification is topic model–based techniques, which depend on the assumptions that every sentiment is a blend of different topics and each topic under discussion is basically a probability distribution of various words. Wu et al. [23] proposed a brought together probabilistic model on clients inclinations about different aspects. In this model, it is expected that every sentiment is associated with an aspect. Aspect importance can be described by the sentiment that relies upon three components: global importance, user importance and how much probability that an aspect will be specified in different sentiments. In light of these assumptions, they utilised added substance-generative technique to identify the aspects.

Similarly, Shams et al. [24] proposed an enriched latent Dirichlet allocation (LDA) model to discover more precise aspects from reviews by incorporating co-occurrence relations as prior domain knowledge into the LDA topic model. In the proposed method, first, the preliminary aspects are generated based on LDA. Then, in an iterative manner, the prior knowledge is extracted automatically from co-occurrence relations and similar aspects of relevant topics. Finally, the extracted knowledge is incorporated into the LDA model. The iterations improve the quality of the extracted aspects as compared to simple LDA model.

In conclusion, all the aspect identification methods have three major issues. First, aspects identification methods do not provide effective means of removing irrelevant aspects and completely neglected the extraction of infrequent aspects. Second, implicit aspects extraction has not targeted by the existing approaches. Implicit aspects are discussed in reviews but not with an explicit aspect name. For instance, ‘food’ is an understood aspect in ‘The taste was awesome’ review sentence. Finally, the co-referential aspects were less emphasised in the literature. Review sentences have different synonym words and expressions to depict an aspect; for instance, environment and atmosphere mentioned to a similar aspect in restaurant reviews. These issues motivated us to form a method which can tackle these challenging problems of aspect extraction.

2.2 Aspect-based sentiment classification

In this section, we present the review of the work conducted under aspect-based sentiment classification in the tourism domain. It is important to highlight here that we only review the ML-based approaches in this study.

In order to get accurate classification using ML, Mubarok et al. [25] proposed a naïve Bayes-based model to classify the restaurants reviews of various aspects. In this model, three simple steps were performed to achieve the classification task. First, reviews were pre-processed to remove irrelevant information and converted into POS tags. Second, feature selection was applied on POS tags by utilising chi square to select highly relevant words from each review. Finally, naïve Bayes classifier was employed on selected features to classify the reviews into polarity classes for each aspect. Results showed that the proposed model achieved 77% accuracy of aspect-based classification.

Another ML-based approach was presented by Xueke et al. [26], where the authors have used SVM on tourist reviews to predict sentiments about different aspects discussed in reviews. In this method, natural language processing (NLP) toolkit was applied for sentence segmentation and to improve the classification performance. Sentences annotated with either ‘positive’ or ‘negative’ sentiments were used for achieving better accuracy results. Experimental results on real-world data sets showed 83.9% accuracy using seven-fold cross-validation. Authors proved that for sentiment classification, SVM produces better results than naïve Bayes algorithm.

The aforementioned algorithms were utilised using an ensemble approach by Catal et al. [27] to develop a newly vote ensemble classifier. This classifier utilised each algorithm on individual basis to predict the sentiment class in tourist reviews. First, bagging algorithm was used as base classifier for ensemble classifier. Second, naïve Bayes was employed due to its features-handling power. Third, SVM was applied to find maximum margin and provide maximum separation between sentiment classes. Finally, voting mechanism was adopted to predict the sentiment class on the basis of majority voting.

In the quest of achieving better accuracy, ME algorithm was employed by Pontiki et al. [28, 29] to classify the aspect-related opinions into positive or negative. In this system, unigram features from the given sentence are extracted and then integer-valued functions were used to identify the aspect category of tuple being used. ME classifier was trained on extracted features, and prediction was tested using golden data set labelled by experts of that domain. The results of the system indicate the robustness and more stable performance with 78.69% on restaurants reviews.

Instead of adopting the traditional ML algorithms Afzaal et al. [19] utilised fuzzy-based algorithms for aspect-based sentiment classification. In the proposal, a three-stage fuzzy method was developed for classification. In the first stage, opinion-less sentences were reviewed from the reviews. In the second stage, features were built from filtered opinion sentences like n-grams and POS tags. In the last stage, fuzzy logic algorithms were applied on the built features to classify the sentiments into positive and negative.

A number of approaches have been proposed recently, which worked on features selection before classifying the reviews in order to improve the classification results. Akhtar et al. [30] utilised the particle swarm optimisation (PSO) for features selection to improve the classification of tourist reviews. In this approach, authors extracted different feature sets from reviews using lexical and semantic information and then selected the most suitable feature set based on PSO. After that ML algorithms such SVM, ME and conditional random field were employed on selected features to classify the reviews.

Similarly, Ali et al. [31] worked on feature selection to remove irrelevant reviews and enhance the performance of classification reviews. In this technique, raw reviews were converted into words to generate tokens using semantic information. Fuzzy domain ontology was developed to select the relevant features from generated tokens. In feature selection process, the authors make sure that each review should contain a noun and a verb. On selected features set, SVM was applied to classify the reviews into different polarity classes.

Although ML-based methods are easy to adopt and very effective in the classification process, there are a number of challenges which are not handled by these methods. First, existing methods are unable to classify the multi-aspect opinions into positive or negative. Review data contain sentences that discuss multiple aspects in a single sentence. Another related issue is about data noise. When reviews are crawled from third-party websites like TripAdvisor and OpenTable.com, then these reviews are assigned to different classes by the reviewers. In such a case, some irrelevant and opinion-less sentences like self-introductions and previous stories are present in the review text and affects the opinion classification accuracy of aspects in a negative way.

In order to overcome the issues of ML-based classification, there is a need of a predictive framework that is capable of extracting the various types of aspects effectively and also has the power to predict unlabelled reviews accurately.

3. Proposed framework for predictive aspect-based sentiment classification

In this section, we propose an enhanced aspect-based sentiment classification framework. The primary focus of this framework is to identify the explicit and implicit aspects from tourist reviews and then classify the sentiments about identified aspects into positive or negative. Figure 1 describes the main phases of the framework. In the first phase of data collection, tourist reviews are collected about different tourist places like hotel and restaurant from social media websites. In the second phase of data pre-processing, data redundancy is removed and transforms the reviews into sentences. In the third phase of aspect identification, a novel aspect identification approach is proposed to identify the explicit and implicit aspects from pre-processed data sets. In the last phase of classification, an ML-based sentiment classification approach is proposed, in which multi-aspect opinions are transformed into single-aspect phrases, and then generate features from these phrases to build the model for classification of aspects. Models are evaluated in the last step on the basis of advance evaluation measures, and the model with best results is selected for prediction.

Figure 1.

Proposed framework for aspect-based sentiment classification.

3.1 Data collection

In data collection phase, we collected reviews from different travel websites such as TripAdvisor, OpenTable and Expedia using existing Application Programming Interfaces (APIs). Each data set has different size and different domain. In restaurant domain, 2000 reviews were collected, including 1000 positive and 1000 negative reviews. In hotel domain, 4000 reviews were collected, including 2000 positive and 2000 negative reviews. For hotels and restaurants selection, we have chosen the top five hotels and restaurants in London city to crawl the user reviews.

3.2 Data pre-processing

The second component of the framework is for the pre-processing of reviews data. Data crawled from the travel websites contain noise and inconsistencies. Such data require data cleansing to avoid redundancies. Moreover, as mentioned previously, we intend to apply sentence-level classification; therefore, sentences are separated from each review text using the sentence stop characters, that is, full stop (.) or question mark (?). Moreover, the duplicate sentences and identical review texts are removed.

The rationale behind this is that a review is often posted by the same user twice, and it affects the accuracy of sentiment classification. Finally, data ambiguity is removed to correct all vague terms like ‘greaaaat, yummmyy and coooool’ that are not standard English terms. After performing pre-processing on review data sets, a smaller sample of cleansed data remains for further analysis and aspect identification. It is important to mention here that for our experiments, we pre-processed both the two data sets and got 3787 sentences for the restaurant data set and 7802 sentences for the hotel data set.

3.3 Aspect identification

The next component of the framework deals with the main task of aspect identification. Aspect identification is a relatively complex task, as it involves the identification of implicit and co-referential aspects. In this study, we have developed approach to identify explicit, implicit and co-referential aspects. In the following sub-sections, the approaches are explained for each type of aspect identification.

3.3.1 Explicit aspects identification

For explicit aspect identification, an extension to the already existing rule-based methods [14 –19] has been proposed. The proposed method assists in handling irrelevant and infrequent aspects. Our extension consists of three important steps namely noun recognition, pattern extraction and irrelevant aspect removal.

For noun recognition step, existing rule-based methods [18,19] are employed which takes sentences one by one and split into single words using delimiters such as space and punctuation marks. Each word of a sentence is then checked by employing POS tagger to determine whether the word is a noun or not. If it is, it checks the adjacent word to make sure whether that particular aspect is a single word or a combination of two words such as ‘food quality’ and ‘service quality’. If both adjacent words are nouns, it combines the words using space and adds it in the nouns list. If next adjacent word is not a noun but an adjective or verb, it is not added in the list. Table 1 shows a set of extracted nouns from hotels data set.

Table 1.

Recognised nouns.

Data set	Explicit aspects
Hotels	Roommate, ads, agent, distance, stay, street, family, time, business, afternoon, agency, night, anniversary, room, bed, comfort, renovation, carpet, furniture, bathroom, bed, bedroom, restroom, air conditioner, minutes, bed sheet, rock room, bedspread, service, staff, waitress, reception, front desk, group, queue, manager, supervisor, chefs, price, cent, value, amount, section, money, budget, pay, discount, bill, environment, atmosphere, baby, décor, background, morning, decoration, address, ambience, airline, architecture, brand, noon, art, food, dinner, weekend, cheese, anything, meal, dessert, breakfast, bell, lunch, salmon, course, pasta, conference, gate, seafood, wine, drink, girlfriend, beer, platter, coffee, starter, glass, dishes, couple, portion, water

Data set

Explicit aspects

Hotels

Roommate, ads, agent, distance, stay, street, family, time, business, afternoon, agency, night, anniversary, room, bed, comfort, renovation, carpet, furniture, bathroom, bed, bedroom, restroom, air conditioner, minutes, bed sheet, rock room, bedspread, service, staff, waitress, reception, front desk, group, queue, manager, supervisor, chefs, price, cent, value, amount, section, money, budget, pay, discount, bill, environment, atmosphere, baby, décor, background, morning, decoration, address, ambience, airline, architecture, brand, noon, art, food, dinner, weekend, cheese, anything, meal, dessert, breakfast, bell, lunch, salmon, course, pasta, conference, gate, seafood, wine, drink, girlfriend, beer, platter, coffee, starter, glass, dishes, couple, portion, water

It can be seen from Table 1 that a large number of irrelevant aspects were recognised which have no relevance with the domain under analysis (hotel and restaurant). These include ‘anniversary’, ‘ ads’, ‘night’, ‘agent’, ‘afternoon’ and so on.

For the second step of pattern extraction, irrelevant aspects are identified and removed after deriving meaningful patterns from sentences. Patterns are derived from sentences between sentiment words and extracted aspects. The rationale behind this extraction is that irrelevant explicit aspects are not usually discussed in specific patterns. Thus, if patterns are derived, irrelevant aspects can be identified and removed.

In order to achieve this, we propose a pattern extraction algorithm (PEA). The steps of PEA are shown in Algorithm 1. It takes reviews and POS tagger as input and provides a list of patterns with their corresponding frequencies as output. Algorithm 1 initialises the patterns list and frequency list with empty array shown at line 1. Second, it employs POS tagger on each sentence to get POS tags such as NN (Noun), JJ (Adjective) and VB (Verb) as shown in lines 4 and 5. Third, it initialises the single pattern extraction array that stores the POS tag from starting position to ending position of a pattern. Fourth, process all POS tags one by one to extract the possible patterns from a sentence. In pattern extraction procedure, noun and adjective are considered as the starting and ending points of a pattern. For example, consider a review sentence ‘The service was excellent and lovely food’. This sentence contains two patterns, the first pattern starts from ‘service’ which is a noun (NN) tag and ends at ‘excellent’ which is an adjective (JJ) tag. Second pattern starts with the word ‘lovely’ which is an adjective and ends at ‘food’ which is a noun. Fifthly, extracted patterns from a sentence are added in the pattern list. After pattern extraction, frequency of each pattern is counted and duplicate patterns are removed as shown in lines 26–30. Finally, algorithm returns the pattern list with their associated frequencies in reviews.

Algorithm 1. Patterns extraction algorithm.
Input: reviews = ${S_{1}, S_{2}, S_{3}, \dots, S_{n}}$
POS_TAGGER = ${T_{1}, T_{2}, T_{3}, \dots, T_{n}}$
Output: patterns
patterns_frequency
1. initialise patterns = {}, patterns_frequency = {}
2. for all reviewsdo
3. transform reviews_i into sentences
4. for all sentences do
5. tags = tagger (sentences_j, POS_TAGGER)
6. initialise pattern_item = {start_ position = FALSE, end_position = FALSE}
7. for all tagsdo
8. if NN in tags_kOR JJ in tags_kthen
9. ifFALSE in start_positionthen
10. $start_position \leftarrow TRUE$
11. end if
12. else
13. end_position← TRUE
14. end else
15. end if
16. if TRUE in start_position then
17. pattern_item ←tags_k
18. end if
19. if TRUE in end_position then
20. patterns←pattern_item
21. initialise pattern_item = {start_ position = FALSE, end_position = FALSE}
22. end if
23. end for
24. end for
25. end for
26. for all patterns do
27. frequency = pf(pattern_n, patterns)
28. remove_dublicate(pattern_n, patterns)
29. pattren_frequency_n← frequency
30. end for
31. return patterns, pattren_frequency

Algorithm 1 generates a large number of patterns, and generalised sequential pattern (GSP) algorithm [38] has been employed on generated patterns to select the candidate patterns. GSP takes patterns, associated frequencies, number of sentences and appropriate percentage for candidate patterns such as ‘1%’ in whole as input and provides a list of selected sequential patterns as output. Table 2 shows the candidate patterns generated after applying GSP algorithm.

Table 2.

Derived patterns.

Patterns	Frequency
NNP/VBD/JJ	3251
JJ/NN	2463
JJ/IN/NN	1874
JJ/NNS	1703
NN/VBZ/RB/JJ	1276

The final step is the removal of irrelevant aspects. Once the patterns are derived, relevance of each aspect is calculated to identify and remove the irrelevant aspects. Relevance of an aspect is calculated by the patterns in terms of number of times aspect appeared in reviews as extracted patterns.

Equation (1) shows the formula used to calculate the relevance of an aspect

R (a) = \sum_{k = 0}^{m} p_{k} (identify (a, reviews))

(1)

where p represents the derived patterns list, m indicates the number of patterns in the list, a is the target aspect whose relevance is being checked in given reviews. Equation (1) checks the aspect appearance in reviews according to derived patterns from the last step. If such patterns are not identified for any aspect, then such aspect is considered as an irrelevant aspect. Hence, it is removed from the list of explicit aspects. For example ‘Food’ aspect has two appearances in given reviews such as ‘Food is delicious’ and ‘Awesome Food’. The derived patterns list has two patterns such as NN/IS/VB or VB/NN.

Now using the patterns derived, we calculate the relevance of ‘food’ aspect. As it can be seen that the appearance of food aspect is in line with the derived patterns; therefore, ‘Food’ is considered as a relevant aspect. Table 3 shows the explicit aspects from hotel and restaurants data set after removing the irrelevant aspects. This approach helped in removing irrelevant aspects to improve the quality of aspect identification process.

Table 3.

Explicit aspects.

Data set	Explicit Aspects
Hotels	Room, bed, comfort, renovation, carpet, furniture, bathroom, bed, bedroom, restroom, air conditioner, bed sheet, rock room, bedspread, service, staff, waitress, reception, front desk, queue, manager, supervisor, chefs, price, cent, value, amount, money, budget, pay, discount, bill, environment, atmosphere, décor, background, decoration, ambience, architecture, art, food, dinner, cheese, meal, dessert, breakfast, lunch, salmon, course, pasta, seafood, wine, drink, beer, platter, coffee, starter, dishes, portion

Data set

Explicit Aspects

Hotels

Room, bed, comfort, renovation, carpet, furniture, bathroom, bed, bedroom, restroom, air conditioner, bed sheet, rock room, bedspread, service, staff, waitress, reception, front desk, queue, manager, supervisor, chefs, price, cent, value, amount, money, budget, pay, discount, bill, environment, atmosphere, décor, background, decoration, ambience, architecture, art, food, dinner, cheese, meal, dessert, breakfast, lunch, salmon, course, pasta, seafood, wine, drink, beer, platter, coffee, starter, dishes, portion

3.3.2 Grouping co-referential aspects

The next step for the proposed framework is grouping the co-referential aspects. Aspects referring to the same category with different meanings are grouped in this step. For example, location, place and venue refer to the same aspect. In order to achieve automatic grouping, we apply WordNet synonym set [39] on each identified aspect and determine the relation between aspects regarding meaning, indication and synonym. If aspects are similar in meaning, or if a synonym is used, a group of these aspects is created. Among that group, the word with the highest frequency is selected as a leader. Table 4 shows the grouping of co-referential aspects.

Table 4.

Co-referential aspects groups.

Group Leader	Co-referential aspects
Room	Room, bed, comfort, renovation, carpet, furniture, bathroom, bed, bedroom, restroom, air conditioner, bed sheet, rock room, bedspread
Service	Service, staff, waitress, reception, front desk, queue, manager, supervisor, chefs
Price	Price, cent, value, amount, money, budget, pay, discount, bill
Environment	Environment, atmosphere, décor, background, decoration, ambience, architecture, art
Food & drinks	Food, dinner, cheese, meal, dessert, breakfast, lunch, salmon, course, pasta, seafood, wine, drink, beer, platter, coffee, starter, dishes, portion

3.3.3 Implicit aspects identification

After extracting the explicit aspects, the implicit aspects are identified from each review sentence. In this article, a novel implicit aspects extraction method has been proposed, which builds binary trees using the aspect-specific lexicon to recognise the implicit aspects. First, the aspect-specific lexicon is created by extracting the sentiment words. Such words co-occur with explicit aspect in either a sequential manner or in a random order. Second, a binary tree for each aspect is being created by utilising the aspect-specific lexicon. Finally, the tree structure is employed on each sentence to identify the implicit existence of aspects. The details of each of implicit aspect identification are given in the following sub-sections.

3.3.3.1 Creation of aspect-specific lexicon

In the first step of implicit aspects identification, sequential or random co-occurrences between explicit aspects and sentiment words are extracted to create an aspect-specific lexicon. In sequential co-occurrence, the sentiment words that co-occur with aspects before clauses such as ‘and’ and ‘but’ are extracted. For example, if we take a review sentence ‘Food was delicious but expensive’, in this sentence only ‘delicious’ sentiment word is extracted with ‘food’ aspect. In random co-occurrence, the sentiment words that co-occur after the clauses in a review sentence are extracted. Considering the same example, only ‘expensive’ sentiment word is extracted. The rationale behind non-extraction of ‘expensive’ as sentiment word in sequential co-occurrence is that in majority of the sentences multiple aspects are discussed in a single sentence and aspects sentiment are often changed after ‘and’ and ‘but’ clauses. Thus, if only clause-less sentiment words are extracted in sequential co-occurrence, then aspect-specific lexicon becomes more meaningful in creating binary trees for implicit aspects extraction. Table 5 shows the created aspect-specific lexicon from hotels data set that consists of a set of sentiment words of ‘food’ and ‘price’ aspects.

Table 5.

Aspect-specific lexicon

Aspect	Sentiment word	Sequential co-occurrence	Random co-occurrence
Food	Superb	452	134
	Tasty	472	53
	Delicious	521	35
	Soggy	752	12
	Yummy	542	13
	Tasteless	341	32
	Expensive	91	174
	Mediocre	172	21
Price	Additional	152	13
	Expensive	272	39
	Pricey	321	73
	Cheap	421	74
	Reasonable	234	88
	Affordable	345	64
	Economical	432	23
	Inexpensive	172	19

It can be seen from Table 5 that each aspect is associated with a number of sentiment words based on sequential and random co-occurrences. In this aspect-specific lexicon, each sentiment word can be associated with more than one aspect on the bases of co-occurrences. For example, if we take ‘expensive’ sentiment word, this is associated with both ‘food’ and ‘price’ aspects. The next step of the proposed method is to compute the relational score of each sentiment word with all associated aspects to formulate the final association of a sentiment word with the respective aspect.

3.3.3.2 Binary trees building

In the second step of implicit aspects extraction, binary trees are created for each aspect by utilising the aspect-specific lexicon. A tree-builder algorithm, titled Algorithm 2, has been proposed, which compares the co-occurrence of a sentiment word with aspects and uses as tree node for generating an appropriate aspect tree. The workflow of the proposed algorithm for implicit aspects is as follows.

First, initialise the nodes array and process lexicon words one by one as shown in lines 1 and 2 of Algorithm 2. Second, the relational score is calculated by dividing the sequential co-occurrences with random co-occurrences of lexicon words with the current aspect. Third, calculated relational score is compared with others aspects relational scores. If current aspect score is greater than other aspects scores, this lexicon word is added into nodes array of current aspect as shown from line 4–8 of Algorithm 2. Suppose that a sentiment word ‘expensive’ sequentially co-occurs with ‘food’ 50 times and randomly co-occurs 75 times, the relational score becomes 0.70. Similarly, if sentiment word ‘expensive’ that sequentially co-occurs with ‘price’ 15 times and randomly co-occurs 16 times, the relational score becomes 0.93 and ‘expensive’ sentiment word is then added in ‘price’ nodes array due to high relational score. Finally, binary tree is created for targeted aspect using additional nodes. Figures 2 and 3 show the binary trees generated for ‘food’ and ‘room’ aspects.

Algorithm 2. Tree-builder algorithm.
Input: lexicon = {L} /* Aspect specific lexicon */
Aspects = ${A_{1}, A_{2}, A_{3}, \dots, A_{n}}$ /* Aspects */
current_asp = {A} /* Tree aspect */
Output:tree /* Tree for Implicit aspects extraction */
1. initialise nodes = {}
2. for all lexicondo
3. $rela_scor e_{current_asp} = \frac{lexico n_{i} [a] [seq_score]}{lexico n_{i} [a] [ren_score]}$
4. for allaspectsdo
5. $rela_scor e_{as p_{k}} = \frac{lexico n_{i} [as p_{k}] [seq_score]}{lexico n_{i} [as p_{k}] [ren_score]}$
6. if $rela_scor e_{current_asp}$ > $rela_scor e_{as p_{k}}$ then
7. nodes $\leftarrow$ $lexico n_{i}$
8. end if
9. end for
10. end for
11. tree = binary_tree (nodes, current_asp)
12. return tree

Figure 2.

Binary tree for food aspect identification in restaurants reviews.

Figure 3.

Binary tree for room aspect identification in hotels reviews.

3.3.3.3 Employment of binary trees

In the last step of implicit aspects extraction, the binary trees created in the last step are employed on sentences to identify the implicit aspects. The procedure of employment is as follows. First, the sentences are broken down into words by using delimiters such as space and punctuation marks. Second, binary trees are employed one by one on sentence words to check the existence of implicit aspects. Finally, if a binary tree returns the TRUE value, the tree aspect is assigned to the review, else it moves forward to check the next binary tree. Table 6 shows the set of sentences from hotels data set after extracting implicit aspects using binary tree generated using Algorithm 2.

Table 6.

Implicit aspects extraction.

Sentence	Implicit aspects
Our room was spacious, relaxing, no rowdy parties and cheaper	Environment, price
Spicy, clean and beautiful view of the dessert.	Location, food
The service was polite and accommodating, the comfortable, and the service quick and delicious	Food, room
My fiancé and I came in very loud, outdated, horrible guest services!	Environment, room

3.4 Aspect-based sentiment classification

After the identification of various types of aspects, the framework provides an ML-enabled aspect-based sentiment classification. In order to perform ML-based classification, we exploited the predominant ML classifiers for predicting the positive or negative sentiments for each aspect. Our proposed method consists of three basic stages. First, it transforms the multi-aspect sentiment sentences into single-aspect sentiment phrases and discards the irrelevant sentences. Second, it generates feature vectors from single-aspect sentiment phrases like (n-grams and POS). Finally, it builds classifiers on generated features using ML algorithms to classify the single-aspect sentiment phrases into positive or negative.

3.4.1 Transformation

A large number of sentences in online reviews contain multiple aspects, and each aspect has a sentiment associated with it. We propose a grammatical relationship-based transformation method that splits multi-aspect sentences into single-aspect phrases. In this method, we utilise typed dependency [40] to identify the grammatical relations between review words and then extract phrases having sentiments using these relations. Each opinion phrase contains one aspect and multiple sentiment words that depend on that particular aspect. Three grammatical relations amod, nsubj and advmod were identified. Amod is a relation in which adjectival phrase serves to modify the meaning of the noun phrase. Nsubj is two words relation consisting of one noun and an adjective, in which noun represents the aspect and adjective expresses the sentiment of that aspect.

For instance, consider a sentence ‘Food is good and service is bad’. In this sentence, two nsubj phrases were extracted {food – good} and {service – bad}. Advmod is the short form for adverb modifier; for example, a sentence ‘food is very good’, the advmod is {good – very}. Once the grammatical relations were identified, their opinion phrases are extracted in which each phrase consisted of one aspect and multiple sentiment words (adjective, adverb) that expressed the opinion towards the aspect. Figure 4 shows the visual representation of transforming the multi-aspect sentences into single-aspect phrases.

Figure 4.

Transformation of multi-aspect sentences to single-aspect phrases.

3.4.2 Feature generation

In this phase, we generate features from single-aspect phrases that are to be used to train ML algorithms. Four types of features were generated (unigrams, bigrams, trigrams and POST) from each data set. The process of obtaining n-grams and POS tags is as follows: in the first step of the process, we tokenised the phrases to split by spaces and punctuation marks and generated a bag of words. However, we made sure that short forms such as ‘don’t’, ‘I’ll’, ‘she’d’ are considered as a single word. For POS tags, we extract only verb, adverb and adjective from the data set. In the second step, we remove stop words (‘a’, ‘an’ and ‘the’) from the bag of words. In the last step, we dealt with negation, negation (such as ‘no’ and ‘not’) is attached to a word which precedes it or follows it. For example, a sentence ‘I do not like fish’ will form three bigrams: ‘I do + not’, ‘do + not like’, ‘not + like fish’. The last step allows improving the accuracy of the classification since the negation plays a crucial role in sentiment expressions.

3.4.3 Classifier building

The mechanism of aspect-based sentiment classification framework is based on ML algorithms. These algorithms are commonly used by the ML community. For sentiment classification, the classifiers decide the class of sentiment sentence by considering all aspects and their linkages to sentiment words. The situation becomes more complex when there exist a large number of aspects. In such cases, ML algorithms are very pivotal in predicting the correct class for each aspect discussed in a review.

In this study, we experimented with five predominant ML algorithms which were used for sentiment classification namely, NBM, SVM, RFT, ME and FLR. The purpose is to compare and evaluate these classifiers thoroughly to determine which one consistently performs better.

3.4.3.1 NBM

Generally, the term naïve Bayes is referring a model having strongly independent assumptions instead of distributing each feature particularly. In a naïve Bayes model, it is assumed that all features being used are conditionally independent of one another given some class. Formally, the calculation of probability of observing features from $f_{1}$ to $f_{n}$ , for a given class c, considering the assumptions of naïve Bayes, is performed when following holds

p (f_{1 \dots .} f_{n} | c) = Π_{i = 1}^{n} p (f_{i} | c)

(2)

Equation (2) [32] means that when a new example is classified using naïve Bayes model, it becomes simpler to work with posterior probability

p (c | f_{1 \dots} f_{n}) \propto p (c) p (f_{1} | c) \dots p (f_{n} | c)

(3)

Equation (3) [32] described that independence of assumptions is not always true, and that is the reason why some refer the model as ‘idiot Bayes’ model. The feature distribution has not been discussed until now. In simple words, $p (f_{i} | c)$ has been undefined. The term NBM allows to know that each $p (f_{i} | c)$ is a multinomial distribution. This method works well for easily countable data, like words count in text.

3.4.3.2. SVM

In this algorithm, data analysis is performed to define decision boundaries by hyper-planes. In binary classification, the document vectors are separated using hyper-planes. This separation is desired to be kept possibly large. For a training set having a labelled pair $(x_{i}, y_{i}), i = 1, 2, \dots, n$ , where $x_{i} \in R^{n}$ and $y \in {1, - 1}^{l}$ , the SVM method is needed to get solution of the following problem of optimisation that can be represented as

\begin{matrix} \begin{matrix} min \\ w, b, ε \end{matrix} \frac{1}{2} W^{T} W + C \sum_{i}^{l} ε_{i} \\ Subject to y_{i} (W^{T} φ (X_{i}) + b) \geq 1 - ε_{i} \\ ε_{i} \geq 0 \end{matrix}

(4)

In equation (4) [33] ‘W’ represents weight assigned to variables, $ε$ represents slack or added error correction and ‘C’ represents factor of regularisation. The purpose of resolving this problem is to minimise ‘ $(1 / 2) W^{T} W + CP \sum_{i = 1}^{l} E_{i}$ ’, where value of ‘ $y_{i} (w^{T} \emptyset (X_{i}) + b)$ ’ is required to be greater than ‘ $1 - ε_{i}$ ’, and ‘ $ε$ ’value is considered to be almost 0. Here training vector ‘ $x_{i}$ ’is mapped to higher dimensional space by ‘ $\emptyset$ ’. The text file of reviews is required to be converted to numeric vectors because SVM takes numeric vector input. After this, scaling process is performed, which is helpful in managing vectors and keeping them in range of [1, 0].

3.4.3.3 ME

This algorithm sets conditional distribution constraints using training data. These constraints are used to express the characteristics of training data. So, in terms of exponential function, the ME value can be expressed as

p_{ME} (c | d) = \frac{1}{Z (d)} \exp (\sum_{i} λ_{i, c} f_{i, c} (d, c))

(5)

In equation (5) [34], $P_{ME} (c | d)$ represents the chances of document ‘d’ assigning to class ‘c’, $f_{i, c} (d, c)$ represents feature/class function for $f_{i}$ feature and c class, $λ_{i, c}$ represents the estimating parameter and Z(d) represents the factor for normalising. This algorithm requires feature set selection. To classify the text, count of the words is taken as a feature. The feature/class function can be instantiated as follows

f_{i, c} (d, c) = {\begin{matrix} 0 \\ \frac{N (d, i)}{N (d)} \end{matrix} \begin{matrix} if c \neq c' \\ otherwise \end{matrix}

(6)

In equation (6) [34], $f_{i, c} (d, c)$ represents the features in word-class combination of class ‘c’ and document ‘d’, N(d,i) represents feature ‘i’ occurrence in document ‘d’ and ‘N(d)’ represents word count in ‘d’. According to the expression, each word has its word–class pair and its associated weight, when a word comes frequently in a class then that word gets higher weight of relevant word–class pair as compared with others. The word–class pairs with highest frequency are used in classification.

3.4.3.4 RFT

This algorithm is consisted of a set of free-structured classifiers. In this algorithm, two ML techniques are very helpful. These are bagging and random feature selection. This algorithm is a developed stage of bagging as only randomly selected subset of features is split at each node of growing tree. The prediction performance of this algorithm is assessed while algorithm performs a type of cross-validation by using the so-called out-of-bag (OOB) samples in parallel with the training step. The bootstrapping is sampling by performing the replacement from training data. Some sequences will be ‘left out’ of the sample, which will constitute OOB sample, while others will be repeated. On average, each tree is using $1 - e^{- 1} ≅ 2 / 3$ training sequences to grow it, while $1 - e^{- 1} ≅ 1 / 3$ training sequences are left out and used to constitute OOB sample [35]. As OOB sample is not used in tree growing, it can be used to estimate the prediction performance [41,42].

3.4.3.5 FLR

In this algorithm, some rules are derived from training data and then used to classify the testing data [36]. These rules are called as fuzzy lattice rules. Let us assume U as a set of data objects of all types that exist in this universe, but we will consider lattices only. A fuzzy lattice is designated as $〈 L, u 〉$ where L is lattice, and u is its valuation function. This is made up of many elements where each element belongs to a certain class. A set of classes that will be associated to these constituent elements of lattice is considered as C. Now fuzzy lattice rules are derived. Each fuzzy lattice rule is basically a couple like $〈 ui, ci 〉$ where ui is an object, and ci is its related class that performs function $h : U \to C$ . These couples are derived along the whole training set. When a new object comes in testing set for which no rule is available in training set, then existed rules compete to classify it. Finally, this new object is classified in one of the classes of set C on the basis of inclusion measure parameter of the object.

3.5 Classifier evaluation

After a classification model is available, the model is used to predict the sentiments of tourists regarding the various aspects. As one of the important step to ensure the model generalise well, the performance of the predictive model has to be evaluated using advanced measures such as precision, recall, F-measure and area under the curve (AUC).

Precision and recall are the measures that actually evaluate the accuracy of sentiment classification prediction model. Precision is basically the ratio of relevant records (TP) identified to the total number of irrelevant and relevant records. It is usually expressed as a percentage. In our example precision will be calculated as TP/(TP + TN).

Similarly, recall is the ratio of the number of relevant records (TP) retrieved to the total number of relevant records in the data set. It is also expressed in percentage as recall, that is, TP/(TP + FN).

A sentiment classification system should be measured by its ability to positive and negative sentiments, and we therefore use the true positive rate and receiver operating characteristic (ROC) curve to give a comprehensive evaluation of our prediction model, where value of AUC measures the effective of sentiment prediction framework.

AUC is obtained by plotting the precision and recall on a graph. So, we use AUC instead of accuracy because percentage of correctly identifying the positive opinion is not sensitive to a single cut-off value, hence recall rate and AUC are more adequate measures to evaluate sentiment prediction model.

4. Experimental results and discussion

In this section, we present the experimental results and evaluate the effectiveness of our proposed aspects-based sentiment classification framework. We evaluate the proposed framework in term of its ability to recognise the various types of aspects and its ability to accurately classify the sentiments of each aspect.

4.1 Results pertaining to aspect identification

As mentioned previously, the experiments were conducted on restaurants and hotels data sets. The purpose of the experiments was to measure the accuracy of aspect identification. The three challenging types of aspects were targeted in our experiments and the goal was to achieve high accuracy rates for explicit, implicit and co-referential aspects. Figure 5 shows the percentage correctness of each aspect type for both restaurants and hotels data sets.

Figure 5.

Correctly identified aspect percentages.

It can be seen from the results depicted in Figure 5 that our proposed aspect identification method achieved impressive performance. For explicit aspect identification, 80.5% accuracy was achieved for restaurants data sets and 87% for the hotel data set. Similarly, for implicit aspect identification, 82% implicit aspects were correctly identified for restaurants data set and 91% for Hotel data set. It is important to highlight here that the most challenging type of aspects is co-referential aspects, but our proposed tree-enabled method performed well to achieve accuracy around 78.5% for both the data sets. The primary reason to achieve these high accuracy result is a through pre-processing step to remove irrelevant, opinion-less and redundant reviews. It is a fact that the presence of noise in the data hinders the recognition accuracy. Unlike the previous approaches, we not only tackled these difficult to identify aspect types but also provided novel integration of tree-based methods with existing approaches to achieve better recognition rates.

It is also very important to highlight the percentage of aspects present in each data set. Figure 6 show the percentage of each type of identified aspects in restaurants and hotels data sets. If we analyse Figure 6, we can see that restaurants and hotels had 60% and 52% explicit aspects, respectively, from which high recognition rates were achieved. In other words, 80% of the 60% explicit aspects were correctly recognised. Similarly, the challenging type of aspect is the co-referential one, around 15% were identified and approximately 78% of this 15% were correctly identified with the help of our proposed methods.

Figure 6.

Percentage of identified aspects in both data sets.

4.2. Aspect-based sentiment classification results

The second part of our evaluation is for the sentiment classification step. In aspect-based sentiment classification, we evaluate the performance of each ML classifier by conducting experiments on different data sizes, feature-weighting methods and feature types. Moreover, we examine the time taken by each classifier on different sizes of data sets to establish the scalability issue of various classifiers used in our experiments. Table 7 shows the results of the classifiers. It can be seen that the advanced evaluation measures (precision, recall, F-measure and ROC) have been used to determine the superiority of a particular classifier. It is obvious from Table 7 that NBM classifier achieves best accuracy results for both the data sets. An accuracy of 89.34% and 91.53% has been achieved for restaurants and hotels data set, respectively. NBM works on the probability calculation method for each class. One of the reasons of getting better accuracies using NBM is that the probability calculation mechanism of NBM algorithm suits this two class problem of sentiment classification.

Table 7.

Classifiers performance on restaurants and hotels data sets.

Restaurants data set					Hotels data set
Classifier	Accuracy	Precision	Recall	F-measure	Accuracy	Precision	Recall	F-measure
NBM	89.34%	0.89	0.89	0.89	91.53%	0.92	0.91	0.92
SVM	89.02%	0.89	0.89	0.88	89.93%	0.90	0.90	0.90
ME	85.74%	0.85	0.85	0.85	83.83%	0.84	0.84	0.84
RFT	88.42%	0.88	0.88	0.88	87.79%	0.88	0.88	0.87
FLR	77.96%	0.78	0.78	0.78	80.12%	0.80	0.80	0.80

NBM: naïve Bayes multinomial; SVM: support vector machine; ME: maximum entropy; RFT: random forest tree; FLR: fuzzy lattice reasoning.

However, approaches in the past utilised this algorithm without proper pre-processing of the review text data. In our proposed method, the major reason of getting this accuracy improvement for NBM is the proper identification of aspects including implicit, co-referential and infrequent. Moreover, our proposed pre-processing method along with aspect identification mechanism has the power to augment not only NBM but also other classifiers which are known to be very accurate for two class problem such as SVM. Area under the ROC curve is an established measure to determine the supremacy of a classifier. It can be seen from Figures 7 and 8 that ROC curves plotted based also reflect that NBM and SVM outperformed other classifiers for both data sets.

Figure 7.

ROC curve on restaurants data set.

Figure 8.

ROC curve on hotels data set.

Figures 9 and 10 present accuracy of each algorithm according to the time taken to predict labels of reviews. This experiment was performed on a 64-bit Windows-based system with an Intel core (i7), 2.80 GHz processor machine with 8 Gb of RAM. The results on time-based experiments show that NBM takes the least time on both restaurants and hotels data sets while FLR takes the worst time. We argue that NBM is faster than SVM, ME, RFT, FLR or any other ML algorithm for review-related data sets and scales well with the increase in data size.

Figure 9.

Classifiers prediction time with different instances sizes on restaurants data set.

Figure 10.

Classifiers prediction time with different instances sizes on hotels data set.

Figures 11 and 12 present the effect of feature types like unigrams, bigrams, trigrams and POS on the performance of aspect-based sentiment classification. We run each algorithm on each feature type. We record that unigrams and POS provide better accuracy with NBM on both data sets, while bigrams and trigrams provide better accuracy with FLR on restaurants data set. The reason is that unigram takes single word for classification, and POS has the advantage of employing rule-based tagging algorithms. This gives an edge to POS over other feature types, as it covers computational linguistics and rule-based mechanisms.

Figure 11.

Classifiers performance with different feature types on restaurants data set.

Figure 12.

Classifiers performance with different feature types on hotels data set.

Figures 13 and 14 present the effect of feature-weighting methods such as presence, term’s frequency (TF) and term’s frequency with its inverse document frequency (TF-IDF) on the performance of aspect-based sentiment classification. We use the same approach in this experiment as applied in feature types. We run each feature-weighting method on each algorithm. We record that the presence-weighting method provides better accuracy with NBM on both data sets while TF and TF-IDF provide little bit less accuracy than presence.

Figure 13.

Classifiers performance with different feature methods on restaurants data set.

Figure 14.

Classifiers performance with different feature methods on hotels data set.

Figures 15 and 16 present effect of data set size on the performance of our proposed framework. We have two major data sets, one of 2000 restaurant reviews and second of 4000 hotel reviews. We split restaurant reviews into four chunks of 500, 1000, 1500 and 2000. Similarly, we split hotel reviews into four chunks of 1000, 2000, 3000 and 4000. We run each algorithm on each data set, and the results show that restaurants data set with 1000 reviews chunk provides better accuracy with NBM and hotels data set with 4000 reviews chunk provides better accuracy with NBM.

Figure 15.

Classifiers performance with different instances sizes on restaurants data set.

Figure 16.

Classifiers performance with different instances sizes on hotels data set.

The proposed framework has been implemented in the form of a mobile application by utilising DrupalGap and PhoneGap technologies. The developed prototype has all the methods incorporated, and it helps the tourist in situations where tourists want to know the best restaurants and hotels in a particular city he plans to visit. The other advantages of this application include intelligent recommendations about tourist places on the basis of aspects ratings, graphical representation of positive and negative opinions regarding each aspect.

5. Comparative analysis of the results

In this section, our proposed framework of aspects-based sentiment classification was compared with other aspects-based sentiment classification methods reported in the literature. Tables 8 and 9 present this comparison in which the results of our experiments were compared with the similar techniques reported in the literature. It is important to highlight at this point that we have not taken exactly the same data sets for comparison as it is not made available by the authors. We have similar data set and we compare our work in terms of overall accuracy achieved. Moreover, we compare that if the previously reported techniques have handled the various types of aspect using their proposed methods.

Table 8.

Comparison with other aspects identification methods.

Paper	Explicit aspects		Implicit aspects	Co-referential aspects	Irrelevant aspects	Method	Results
	Frequent	Infrequent
Jiménez-Zafra et al. [14]	High	Low	Null	Not handled	Handled	Rules based	71.2%
Kayaalp et al. [22]	High	Null	Null	Handled	Handled	Seeds based	66%
Shams et al. [24]	High	Medium	Null	Not handled	Not handled	Topic based	75%
Afzaal et al. [19]	High	Low	Low	Handled	Not handled	FURIA based	79%
Proposed method (restaurants data set)	High	Medium	High	Handled	Handled	Extended rule based	82%
Proposed method (hotels data set)	High	Medium	High	Handled	Handled	Extended rule based	91%

Table 9.

Comparison with other aspects-based sentiment classification methods.

Paper	Data set	Multi-aspect opinion handled	Method	Results
Mubarok et al. [25]	Review: 1202	Not handled	Naïve Bayes	77%
Xu et al. [26]	Reviews: 3214	Not handled	Support vector machine	83.9%
Akhtar et al. [30]	Reviews: 3845	Not handled	Conditional random field	77.3%
Pontiki et al. [29]	Reviews: 320	Not handled	Maximum entropy	78.69%
Afzaal et al. [19]	Reviews: 4000	Not handled	Fuzzy logical reasoning	86.02%
Proposed method (restaurants data set)	Reviews: 2000	Handled	Naïve Bayes multinomial	89.34%
Proposed method (hotels data set)	Reviews: 4000	Handled	Naïve Bayes multinomial	91.53%

The results of comparison show that our proposed framework performs more efficient and improved way during tasks of aspects identification and aspect-based sentiment classification.

In terms of aspect extraction, proposed framework handles irrelevant aspects efficiently by calculating the relevance of each aspect. On the other hand, existing approaches use ‘frequency constraint’ due to which many frequently occurring irrelevant aspects are left to discard. Our framework also focuses on implicit aspects extraction along explicit ones because of their most often occurrence in tourism domain; however, literature work less emphasised on implicit aspects extraction.

In terms of aspect-based opinion classification, proposed framework classifies the multi-aspect opinions into positive or negative by transforming into multiple single-aspect opinions, whereas existing approaches only deal with single-aspect opinions. Moreover, our framework utilised efficient ML algorithms for the classification of these multi-aspect opinions into positive and negative. Finally, we argue that our method outperforms similar methods reported in the past in terms of accuracy. For both data sets, we managed to improve the accuracy percentage from 79% to 91%. This shows that our predictive framework is more accurate, handles more types of aspects and flexible enough to accommodate different ML classifiers to achieve even better results than the reported ones in this study.

6. Conclusion and future work

In this article, a predictive aspect-based sentiment classification framework has been proposed which enables users to classify sentiments regarding the various aspects of a tourist destination. In this framework, a novel aspect identification method has been proposed that automatically identifies the explicit, implicit and co-referential aspects from tourist opinions. An ML algorithm–based opinion classification approach is proposed that classifies the opinion about extracted aspects into positive or negative. Experiments were performed on real-world on data sets taken from restaurant and hotel reviews to evaluate the proposed framework. The results of the proposed framework were better than already reported results in the literature. In aspects extraction, proposed method achieved impressive performance with 82% correctly identified aspects in restaurants data set and 91% correctly identified aspects in hotels data set. In opinion classification method, NBM algorithm showed better results than other ML classifiers with 89.34% and 91.53% accuracy on restaurants and hotels data sets, respectively. A detailed comparative analysis highlights the supremacy of our proposed framework over the exiting similar approaches reported in the past.

In future, we intend to utilise multi-label learning classifiers for classification instead of transforming multi-label review to single label for classification. Moreover, we plan to experiment with the reviews of other tourist attractions, that is, museums, beaches and parks instead of hotel and restaurant reviews which have been used quite often in the aspect-based opinion-mining literature.

Footnotes

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship and/or publication of this article.

References

Sharma

Kulshreshtha

Paygude

Tourview: sentiment based analysis on tourist domain. Int J Comput Sci Inf Tech 2015; 6: 2318–2320.

Thet

Khoo

CS.

Aspect-based sentiment analysis of movie reviews on discussion boards. J Inf Sci 2010; 36: 823–848.

Khoo

Johnkhan

SB.

Lexicon-based sentiment analysis: comparative evaluation of six sentiment lexicons. J Inf Sci. Epub ahead of print 19 April 2017. DOI: 10.1177/0165551517703514.

Zhang

Using data-driven feature enrichment of text representation and ensemble technique for sentence-level polarity classification. J Inf Sci 2015; 41: 531–549.

Onan

Korukoğlu

A feature selection model based on genetic rank aggregation for text sentiment classification. J Inf Sci 2017; 43: 25–38.

DD.

Using text mining and sentiment analysis for online forums hotspot detection and forecast. Decis Support Syst 2010; 48: 354–368.

Lin

Everson

et al . Weakly supervised joint sentiment-topic detection from text. IEEE T Knowl Data En 2012; 24: 1134–1145.

Martín-Valdivia

Martínez-Cámara

Perea-Ortega

et al . Sentiment polarity detection in Spanish reviews combining supervised and unsupervised approaches. Expert Syst Appl 2013; 40: 3934–3942.

Moraes

Valiati

Neto

WPG

. Document-level sentiment classification: an empirical comparison between SVM and ANN. Expert Syst Appl 2013; 40: 621–633.

10.

Moreo

Romero

Castro

et al . Lexicon-based comments-oriented news sentiment analyzer system. Expert Syst Appl 2012; 39: 9166–9180.

11.

Seki

Kando

Aono

Multilingual opinion holder identification using author and authority viewpoints. Inform Process Manag 2009; 45: 189–199.

12.

Wang

Sun

et al . Sentiment classification: the contribution of ensemble learning. Decis Support Syst 2014; 57: 77–93.

13.

Tan

Cheng

Wang

et al . Adapting naive Bayes to domain adaptation for sentiment analysis. In: Proceedings of the European conference on information retrieval, Grenoble, 26–29 March 2009, pp. 337–349. New York: Springer.

14.

Jiménez-Zafra

Martín-Valdivia

Martínez-Cámara

et al . Combining resources to improve unsupervised sentiment analysis at aspect-level. J Inf Sci 2016; 42: 213–229.

15.

Muangon

Thammaboosadee

Haruechaiyasak

. A lexiconizing framework of feature-based opinion mining in tourism industry. In: Proceedings of the fourth international conference on digital information and communication technology and it’s applications (DICTAP), Bangkok, Thailand, 6–8 May 2014, pp. 169–173. New York: IEEE.

16.

Wang

Song

Ranking product aspects through sentiment analysis of online reviews. J Exp Theor Artif In 2017; 29: 227–246.

17.

Marrese-Taylor

Velásquez

Bravo-Marquez

. Opinion zoom: a modular tool to explore tourism opinions on the web. In: Proceedings of the 2013 IEEE/WIC/ACM international joint conferences on web intelligence (WI) and intelligent agent technologies (IAT), vol.3, Atlanta, GA, 17–20 November 2013, pp.261–264. New York: IEEE.

18.

Marrese-Taylor

Velásquez

Bravo-Marquez

A novel deterministic approach for aspect-based opinion mining in tourism products reviews. Expert Sys Appl 2014; 41: 7764–7775.

19.

Afzaal

Usman

Fong

ACM

et al . Fuzzy aspect based opinion classification system for mining tourist reviews. Adv Fuzzy Syst 2016; 2016: 6965725.

20.

Colhon

Bădică

Şendre

. Relating the opinion holder and the review accuracy in sentiment analysis of tourist reviews. In: Proceedings of the international conference on knowledge science, engineering and management, Melbourne, VIC, Australia, 19–20 August 2014, pp. 246–257. New York: Springer.

21.

Mukherjee

Liu

Aspect extraction through semi-supervised modeling. In: Proceedings of the 50th annual meeting of the association for computational linguistics: long papers, vol. 1, Jeju Island, Korea, 8–14 July 2012, pp. 339–348. New York: ACM.

22.

Kayaalp

Weckman

WAY

et al . Extracting customer opinions associated with an aspect by using a heuristic based sentence segmentation approach. Int J Bus Inf Syst 2017; 26: 236–260.

23.

Ester

Flame: a probabilistic model combining aspect based opinion mining and collaborative filtering. In: Proceedings of the eighth ACM international conference on web search and data mining, Shanghai, China, 2–6 February 2015, pp. 199–208. New York: ACM.

24.

Shams

Baraani-Dastjerdi

Enriched LDA (ELDA): combination of latent Dirichlet allocation with word co-occurrence analysis for aspect extraction. Expert Syst Appl 2017; 80: 136–146.

25.

Mubarok

Adiwijaya Aldhi

MD.

Aspect-based sentiment analysis to review products using naïve Bayes. AIP Conf Proc 2017; 1867: 020060.

26.

Cheng

Tan

et al . Aspect-level opinion mining of online customer reviews. China Commun 2013; 10: 25–41.

27.

Catal

Nangir

A sentiment classification model based on multiple classifiers. Appl Soft Comput 2017; 50: 135–141.

28.

Pontiki

Galanis

Papageorgiou

et al . SemEval-2016 task 5: aspect based sentiment analysis. In: Proceedings of the 10th international workshop on semantic evaluation (Semeval-2016), San Diego, CA, 16–17 June 2016, pp. 19–30. Stroudsburg, PA: Association for Computational Linguistics.

29.

Pontiki

Galanis

Papageorgiou

et al . SemEval-2015 task 12: aspect based sentiment analysis. In: Proceedings of the 9th international workshop on semantic evaluation (Semeval-2015), 2015, pp. 486–495, http://alt.qcri.org/semeval2015/task12/

30.

Akhtar

Gupta

Ekbal

et al . Feature selection and ensemble construction: a two-step method for aspect based sentiment analysis. Knowl-Based Syst 2017; 125: 116–135.

31.

Ali

Kwak

Kim

YG.

Opinion mining based on fuzzy domain ontology and support vector machine: a proposal to automate online review classification. Appl Soft Comput 2016; 47: 235–250.

32.

Kibriya

Frank

Pfahringer

et al . Multinomial naive Bayes for text categorization revisited. In: Proceedings of the Australasian joint conference on artificial intelligence, Melbourne, VIC, Australia, 19–20 August 2004, pp. 488–499. New York: Springer.

33.

Suykens

Vandewalle

Least squares support vector machine classifiers. Neural Process Lett 1999; 9: 293–300.

34.

Berger

Pietra

VJD

Pietra

SAD

. A maximum entropy approach to natural language processing. Comput Linguist 1996; 22: 39–71.

35.

Jiang

Wei

et al . RF-DYMHC: detecting the yeast meiotic recombination hotspots and coldspots by random forest model using gapped dinucleotide composition features. Nucleic Acids Res 2007; 35: W47–W51.

36.

Kaburlasos

Athanasiadis

Mitkas

PA.

Fuzzy lattice reasoning (FLR) classifier and its application for ambient ozone estimation. Int J Approx Reason 2007; 45: 152–188.

37.

Zhang

Liu

. Aspect and entity extraction for opinion mining. In: Chu

(ed.) Data mining and knowledge discovery for big data. New York: Springer, 2014, pp. 1–40.

38.

Hirate

Yamana

Generalized sequential pattern mining with item intervals. J Comput 2006; 1: 51–60.

39.

Miller

GA.

WordNet: a lexical database for English. Comm ACM 1995; 38: 39–41.

40.

De Marneffe

M-C

Manning

. The Stanford typed dependencies representation. In: Proceedings of the Coling 2008: proceedings of the workshop on cross-framework and cross-domain parser evaluation, Manchester, 23 August 2008, pp. 1–8. Stroudsburg, PA: Association for Computational Linguistics.

41.

Svetnik

Liaw

Tong

et al . Random forest: a classification and regression tool for compound classification and QSAR modeling. J Chem Inf Comput Sci 2003; 43: 1947–1958.

42.

Díaz-Uriarte

De Andres

SA.

Gene selection and classification of microarray data using random forest. BMC Bioinf 2006; 7: 3.