Abstract
Online customer reviews are an important assessment tool for businesses as they contain feedback that is valuable from the customer perspective. These reviews provide a significant basis on which potential customers can select the product that best meets their preferences. In online reviews, customers describe positive or negative experiences with a product or service or any part of it (i.e. features). Consumers frequently experience difficulty finding the desired product for comparison because of the massive number of online reviews. The automatic extraction of important product features is necessary to support customers in search of relevant product features. These features are the criteria that make it possible for customers to characterise different types of products. This article proposes a domain independent approach for identifying explicit opinionated features and attributes that are strongly related to a specific domain product using lexicographer files in WordNet. In our approach, N_gram analysis and the SentiStrength opinion lexicon have been employed to support the extraction of opinionated features. The empirical evaluation of the proposed system using online reviews of two popular datasets of supervised and unsupervised systems showed that our approach achieved competitive results for feature extraction from product reviews.
1. Introduction
Social media, such as Twitter, 1 Facebook, 2 forum discussions and blogs, have become a significant reference point for Internet users. They have a profound influence on nearly all human behaviour, from customer comments regarding the best mobile phone to buy to changes in the political situations of countries brought about by citizens, as in the case of some Middle Eastern countries [1]. Customer reviews are considered a valuable source of information for businesses looking to produce products and services that meet customer needs. At the same time, these reviews assist customers in selecting the product or service that meets their expectations. For instance, travellers often rely on customer feedback from review sites, such as TripAdvisor, 3 to select hotels.
Product reviews also play a powerful role in the creation of new customers if they are positive. According to independent market research company eMarketer, 4 online users trust customer reviews 12 times more than the product details provided by businesses [2].
Tracking and monitoring online opinions are difficult tasks because of the growing number of social websites and the huge amount of opinions published on each site. For instance, Twitter users publish more than 200 million tweets daily [3], which is impossible to analyse manually. Thus, an automated treatment of opinionated information is necessary to help the average person identify relevant online reviews and extract opinions from them.
The computational treatment of opinions on social networks involving entities, such as people, events and products, is referred to as sentiment analysis [4]. Generally, sentiment analysis has been investigated in previous studies on two levels [5], namely, coarse-grained sentiment analysis and fine-grained sentiment analysis. Coarse-grained sentiment analysis views the entire document as expressing a single opinion (or sentiment) about a single entity [6]. However, this type of analysis is inappropriate if the document contains multiple perspectives about manifold entities [7]. By contrast, fine-grained sentiment analysis is focused on identifying the statements or features (aspects) of each entity mentioned in a document [1]. This approach is more comprehensive than coarse-grained sentiment analysis because the extracted opinion is not reflective of the entire document, but it could be for certain entities (e.g. a camera) and their features (e.g. battery life).
In this work, we investigate the task of extracting opinionated product features and attributes from online reviews, referred to as feature extraction. Specifically, feature extraction is a natural language processing (NLP) task that manipulates customer reviews to identify the main product features, such as ‘camera zoom’ and ‘battery life’. This task is considered the cornerstone of the feature-based opinion mining problem [8] (see Figure 1).

Figure 1. Feature-based opinion mining tasks.
Most sentiment analysis approaches are domain dependent [9], and building a feature extraction model for each domain is a complex and time-consuming process. Extending and fitting a domain-dependent method to other domains are also extremely difficult [10]. Thus, automatic extraction of product features is required to address the problem of specificity of domain-dependent methods.
Hu and Liu [11] highlighted the importance of automatic feature extraction from customer reviews, citing several reasons. First, we cannot rely on the names of features that have been provided by merchants because the customers may use different words for the same features in their comments. Second, customers may comment on some features that the merchant may disregard. Third, the merchant may intentionally hide weak features from customers. Furthermore, business managers view customer reviews as a valuable means of determining product features that are important to customers, allowing them to design efficient and focused marketing strategies [12].
Sentiment analysis assumes that the opinions expressed in customer reviews should contain target features [10]. In e-commerce websites, these product features have been associated with positive and negative opinions, indicating the customers’ satisfaction with these features [2]. These opinionated features are more likely to be the relevant features for a specific domain product [13]. This article primarily proposes a technique for extracting, from online customer reviews, opinionated features and attributes that are relevant to a specific domain product, in an unsupervised manner using lexicographer files in WordNet. The majority of the product features are explicitly mentioned in online reviews [9]; thus, our method targets the extraction of explicit features.
2. Literature review
Feature extraction is one of the most complex tasks in sentiment analysis, in which NLP methods are applied to unstructured textual data to automatically extract representative features [14].
Two approaches have been used in the feature extraction process from online reviews, namely, supervised and unsupervised techniques. Supervised techniques require human supervision in creating the training data to be used for the feature extraction process. For instance, Liu et al. [15] proposed a supervised feature extraction framework from the pros and cons of product reviews, whereby the training data are annotated by a human who identifies the main product features. The hidden Markov model is another technique used by Jin and Ho [16] to propose a novel framework for learning patterns to extract opinions and features. Jakob and Gurevych [17] applied another sequential learning method based on conditional random fields (CRFs) in feature extraction tasks; their method is applied in single and different-domain settings. By contrast, Htay and Lynn [18] proposed a supervised approach for extracting probable product features using a training dataset for each domain in addition to a set of extraction patterns. Another domain-dependent study was introduced by Toh and Wang [19] to extract opinion targets from customer reviews using CRF as the learning algorithm for the extraction task; they used external resources to extract semantic and syntactic features. In their work, they used WordNet lexicographer files only to determine whether a token belongs to the intended semantic category (like ‘noun.food’).
Supervised methods generate acceptable results in the feature extraction process, but they cannot be built without human supervision of the training data; therefore, this approach is time-consuming and even costly, particularly in domain-independent models [20].
On the contrary, unsupervised techniques address the problem of supervision; these approaches are primarily corpus-based and incorporated with statistical analysis. Hu and Liu [11] produced a major paper that has inspired several researchers to further explore feature extraction. Their approach is based on the extraction of nouns and noun phrases from the review followed by the application of the Apriori algorithm to identify frequently appearing item sets; the item set is regarded as frequently appearing if its value is larger than a specific threshold (i.e. 1% of the subjective sentences in the review). This approach is known to be efficient because numerous commercial organisations have applied it in their systems with some enhancements [1]. Moreover, Hu and Liu [11] used a simple heuristic technique to identify infrequent features by searching nearby opinion words. However, this technique retrieves several nouns/noun phrases that are irrelevant to the product domain. For instance, consider the following customer review:
The salesman was easy-going and let me try all the models on display.
In the approach of Hu and Liu [11], ‘salesman’ is extracted as an infrequent feature because the opinion word ‘easy’ is close, whereas the extracted word is not a genuine product feature (GPF). However, they argued that the process of extracting infrequent features was simply a completeness process in their work, claiming that the problem was not serious.
Precision was a problem in the previous approach, but it was improved in Popescu and Etzioni [21]. In their work, Popescu and Etzioni [21] calculated the point-wise mutual information (PMI) between the discovered noun phrases and some meronymy discriminators that were associated with the product class, such as ‘of phone’, ‘phone has’ and ‘phone comes with’, where parts of the product have been extracted from WordNet using an ‘is–a’ hierarchy.
Moreover, Balahur and Montoyo [22] proposed a method for extracting the product features and attributes that belonged to the same ‘relevant domain’. Relevant domains [23] are new resources that have been extracted semi-automatically from WordNet glosses to accumulate the most representative words for a given domain. The next step in this approach is extracting the synonyms and meronyms of the relevant domain words using different ontologies.
Recently, Eirinaki et al. [13] proposed an opinion search engine model that was the closest to our research. In their work, they introduced the algorithm HAC (High Adjective Count) to extract the most representative product features. They stipulated that these features would always be nouns; hence, they counted the number of opinion words (adjectives) that were close to a noun and considered this number as a score for that noun. These nouns were subsequently ranked to select the probable features based on a human-supervised threshold. However, the most important limitation of this work is the large number of irrelevant features that could be extracted. For example, in the sentence ‘This camera is perfect for an enthusiastic amateur person’, the noun ‘person’ is regarded as a product feature, particularly if it has been described repeatedly in the reviews. Furthermore, this system omitted the extraction of noun phrases as product features, in which customers commonly used words and phrases to describe product features [24]. Moreover, potential customers could consider these phrases important. Another limitation of Eirinaki et al.’s [13] work is the algorithm’s use of a stemming technique for each tagged review; however, the algorithm does not unify different forms of the extracted nouns with their base form. For instance, they considered the words ‘cameras’ and ‘camera’ as two different features, which severely harmed the precision of the results.
A novel similarity measure (PMI-TFIDF (term frequency-inverse document frequency)) is introduced by Quan and Ren [10] to calculate the association between the nouns contained in a specific product review and the product name, such as ‘camera’ and ‘phone’. The candidate features are selected based on the association measure. An interesting unsupervised approach is proposed by Jimenez-Zafra et al. [25] to extract product aspects from online reviews using a bag of words from a collaborative knowledgebase called Freebase, 5 which consists of information about multiple domains (games, food and drink, physics, etc.). In their work, once they identify the domain of their study, they can extract a bag of words of all aspects of a specific domain from Freebase. They compared their approach with a supervised system using the dataset SemEval-2014 Task 4 [26]. Another study was introduced by García-Pablos et al. [27] to extract and rank a list of candidate domain features in an unsupervised manner to be used in annotating other reviews in the same domain. They used a graph co-ranking approach described in Liu et al. [28] to model the domain features and the opinion words as a graph. The authors used the restaurant dataset of SemEval-2015 Task 12 to compare their approach with the baseline method provided by the organisers of SemEval-2015 [29].
The sentiment analysis domain generally seeks subjective information expressed in conversations on the web. Accordingly, simply considering frequently appearing nouns in online reviews as the real product features is insufficient. These features should also be associated with emotions, sentiments and opinions to be regarded as a significant reference for potential customers. We adopted this observation in our approach in addition to recognising the importance of extracting the features that were the most relevant to the product domain.
3. Methodology
3.1. Overview
The system architecture of opinionated-feature extraction based on unsupervised learning is illustrated in Figure 2. The model is decomposed into three stages, namely, NLP, opinionated-feature extraction and opinionated-feature selection. In the NLP stage, various text mining techniques, such as part of speech (PoS) tagging, stop word elimination and lemmatisation, are applied for noise reduction. This stage ends by identifying nouns/noun phrases in customer reviews. In the opinionated-feature extraction stage, N_gram analysis and the SentiStrength sentiment lexicon are used for extracting the candidate product features and genuine opinion words, respectively. The final stage, opinionated-feature selection, represents the essential stage of this study, in which the popular WordNet lexicon is employed to extract both product features and attributes.

Figure 2. Overall system architecture.
3.2. First stage: NLP of customer reviews
This stage begins by tagging the customer reviews using the Stanford 6 tagger, which determines the grammatical structure of the sentences by assigning a PoS tag to every word in a sentence based on its context [30]. In our case, we consider only noun-related tags (NN and NNS) as the candidate features of a product. The extracted nouns are processed further to remove extraneous characters, such as the hyphen (-) in nouns (e.g. ‘auto-focus’). Moreover, stop word removal is performed using a predefined stop word 7 list, which ensures that no noun extracted from the customer reviews is a stop word. Lemmatisation also has an important role in this stage. Specifically, to correctly extract product features, different forms of a single word must be unified with its base form, such as ‘cameras → camera’ and ‘qualities → quality’; therefore, some lemmatisation rules have been used in this stage to unify different forms of plural nouns. By the end of the lemmatisation process, a single list of all words that have been identified as nouns is generated as
Phrases share the same importance of nouns as product features, such as ‘battery life’ and ‘picture quality’. Therefore, the basic idea of extracting the noun phrase in this work is based on the following: Let
This stage ends by extracting the frequency
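The noun-related steps of this stage can be illustrated with a minimal Python sketch. It assumes the reviews have already been PoS-tagged (for example by the Stanford tagger) and are supplied as (word, tag) pairs. The stop word list, the plural rules and the function names are hypothetical stand-ins rather than the resources used in this study, and counting occurrences with a Counter is an illustrative choice.

```python
from collections import Counter

# Hypothetical stop word list; the study uses a predefined list instead.
STOP_WORDS = {"thing", "lot", "one"}


def lemmatise_plural(noun):
    """Reduce a plural noun to its base form with a few simple rules (assumed)."""
    if noun.endswith("ies") and len(noun) > 4:
        return noun[:-3] + "y"          # qualities -> quality
    if noun.endswith(("ses", "xes", "zes", "ches", "shes")):
        return noun[:-2]                # lenses -> lens
    if noun.endswith("s") and not noun.endswith("ss"):
        return noun[:-1]                # cameras -> camera
    return noun


def extract_nouns(tagged_tokens, stop_words=STOP_WORDS):
    """Count the lemmatised nouns (tags NN and NNS) that are not stop words."""
    counts = Counter()
    for word, tag in tagged_tokens:
        if tag not in ("NN", "NNS"):
            continue
        noun = word.lower().replace("-", "")  # auto-focus -> autofocus
        if tag == "NNS":
            noun = lemmatise_plural(noun)
        if noun not in stop_words:
            counts[noun] += 1
    return counts


tagged = [("cameras", "NNS"), ("are", "VBP"), ("great", "JJ"),
          ("auto-focus", "NN"), ("camera", "NN"), ("qualities", "NNS")]
noun_counts = extract_nouns(tagged)
```

Rule-based plural handling of this kind is crude (it mishandles words such as ‘houses’), so a dictionary-based lemmatiser would be a natural replacement in practice.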
3.3. Second stage: opinionated-feature extraction
The analysis of customer reviews posted on the web chiefly aims to extract positive or negative experiences with a specific product or service or any part of it. Potential customers primarily seek these experiences to guide their own buying intentions of that product. Consequently, customer reviews should contain an opinion word(s) to generate beneficial feedback for other customers.
Most state-of-the-art sentiment analysis approaches start their feature extraction process by identifying the potential product features (mostly nouns/noun phrases) mentioned in customer reviews, for example, Quan and Ren [10] and Htay and Lynn [18]. Conversely, in our approach, we start the feature extraction task by locating the opinion words in the customer reviews. The identification of an opinion word strongly indicates that the desired feature is located near that opinion word. At this point, two essential questions arise:
(1) How are these opinion words located in customer reviews?
(2) How should the candidate features, which could be around the opinion word, be identified?
Opinion words have been investigated intensively as part of the research on sentiment analysis; most of these studies consider adjectives as the main source of subjectivity in textual data [10,31–33]. By contrast, other approaches used adverbs or verbs as opinion words in addition to adjectives [18,34–36]. Thus, considering only adjectives as opinion words could omit opinion words of other syntactic categories, such as adverbs, which possibly describe some attractive product features for future customers. Accordingly, opinion lexicons have been adopted to identify sentiments about products rather than examining the syntactic structure of the customer review. In our case, we used the SentiStrength lexicon to address the first question, problem (1).
SentiStrength [37] is an opinion lexicon that has been designed using several linguistic resources, such as the Bing Liu and Multiple Perspective Question Answering (MPQA) lexicons [38]. SentiStrength is mainly focused on analysing subjective textual contents in forum discussions, blogs or social networks. Positive and negative sentiments are the major components of this lexicon, where each sentiment word is associated with a sentiment strength ranging from
To address problem (2), we used N_gram analysis in capturing the candidate features mentioned in customer reviews. N_gram models are extensively used in the task of sentiment classification, that is, determining the polarity of opinion words, positive or negative [39–43]. A few studies employed N_gram analysis in extracting product features from customer reviews. For example, Hu and Liu [44] proposed a supervised approach for identifying the features of a product from the pros and cons of customer reviews. They used 3-gram (three words) associated with their PoS tags (see Figure 3) only for partitioning long sentences into short segments. They claimed that in pro-and-con reviews, only one feature in each segment sentence was identified at the most. In our case, the proposed approach employed trigram analysis mainly for feature extraction from free format reviews, which are more complicated.

Figure 3. Hu and Liu’s approach with N_gram.
More specifically, this stage involves the following processes:
For each customer review, a sequence of lexical analysis (tokenisation, lower casing and punctuation removal) and stop word elimination processes have been applied to convert the original form of this review (physical view) to a form that facilitates processing (logical view). For instance, consider the customer review in Figure 4.
The second process involves the identification of the opinion words in the logical view of the review using the SentiStrength lexicon to prepare for the extraction of opinionated features in the review. For noun extraction, after applying the lemmatisation process on the plural nouns contained in the review, we refer to the list of nouns (N) in the previous phase to identify the nouns mentioned in the review.
The final process of this stage starts the parsing procedure of the review by locating the positions of the opinion words. Notably, the position of the opinion word in the review determines the direction of trigram analysis, that is, forward, backward or both directions, of the opinion word, as illustrated in Figures 5–7, respectively.

Figure 4. Converting the representation of a customer review from a physical to a logical view.

Figure 5. Forward N_gram direction.

Figure 6. Backward N_gram direction.

Figure 7. Bidirectional N_gram.
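The conversion from the physical view to the logical view described in the first process can be sketched as follows. This is a minimal illustration only: the stop word set and the function name are hypothetical, and whitespace splitting stands in for whatever tokeniser is actually used.

```python
import string

# Hypothetical stop word list used only for this illustration.
STOP_WORDS = {"the", "a", "an", "is", "and", "of", "to", "it", "this", "was"}


def to_logical_view(review, stop_words=STOP_WORDS):
    """Lower-case a review, strip punctuation and drop stop words."""
    lowered = review.lower()
    # Remove every ASCII punctuation character, including hyphens.
    no_punct = lowered.translate(str.maketrans("", "", string.punctuation))
    # Whitespace tokenisation is an assumption made for brevity.
    return [token for token in no_punct.split() if token not in stop_words]


logical = to_logical_view("The battery life is GREAT, and the lens is sharp!")
```

The resulting token list is the form on which the opinion words and nouns are then located.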
To emphasise, for each opinion word in the review, we apply – at most – trigram analysis on both sides of the opinion word (forward and backward) or on one side (forward or backward) based on its position in the review to identify the candidate product features (nouns/noun phrases). Once we identify a candidate feature(s) at any side of the opinion word, a numerical score called feature score (fs) for each feature is increased by 1. This step is repeated until the final opinion word in the review has been reached. Whenever the boundaries of the review are located, the searching process for the candidate features is terminated. After applying this process on all of the customer reviews, we generate the outputs of this process in the form of two lists of candidate opinionated features. The first list refers to the opinionated nouns (ON) in the customer review, whereas the second list contains the opinionated noun phrases (ONP) as follows
By the end of this stage, several uninteresting opinionated nouns/noun phrases have been extracted, and considering them as product features is difficult. Our target in this study is to mine the genuine product-related features that have been discussed positively or negatively by the reviewers; these features could aid prospective customers in their decision to buy the product. The final stage of the proposed approach is considered the backbone of this work; it suggests two major criteria with the collaboration of the knowledge of the popular lexicon WordNet to select the most representative features of a specific product. The detailed steps of this stage are discussed in the subsequent section.
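The search around each opinion word and the accumulation of the feature score (fs) in this stage can be sketched as follows. The sketch rests on stated assumptions: a trigram is taken to span the opinion word plus the two tokens next to it on a given side, the opinion lexicon is a small hypothetical set standing in for SentiStrength, and only single nouns (not noun phrases) are scored.

```python
from collections import Counter


def score_features(tokens, opinion_words, nouns, n=3):
    """Add 1 to fs for every noun found within the n-gram window of an opinion word."""
    reach = n - 1  # tokens inspected on each side of the opinion word (assumed)
    fs = Counter()
    for i, token in enumerate(tokens):
        if token not in opinion_words:
            continue
        # Slicing stops at the review boundaries, so an opinion word at the
        # start or end of a review is searched in one direction only.
        backward = tokens[max(0, i - reach):i]
        forward = tokens[i + 1:i + 1 + reach]
        for candidate in backward + forward:
            if candidate in nouns:
                fs[candidate] += 1
    return fs


tokens = ["battery", "life", "great", "lens", "blurry"]
opinion_words = {"great", "blurry"}       # hypothetical lexicon entries
nouns = {"battery", "life", "lens"}       # list of nouns from the first stage
fs = score_features(tokens, opinion_words, nouns)
```

In this toy review, ‘lens’ lies within the window of both opinion words and therefore receives a higher score than ‘battery’ or ‘life’.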
3.4. Third stage: feature selection
In the previous stage, we argued the importance of extracting those product features that have been discussed in the customer reviews either positively or negatively. We also considered the opinions of these features as valuable resources for other customers interested in the same product. However, the extracted lists of features for this approach, thus far, contain many irrelevant nouns and noun phrases that are difficult to view as product features. This limitation is overcome at this stage by selecting the GPFs upon which the customer could rely to select the intended product. In this stage, we propose the following two major criteria for selecting the most representative features of a specific product:
1. The first criterion is chiefly based on our claim that the GPF should be described positively or negatively by many customers to be considered an important product feature. Based on this claim, we compute the weight of opinionated feature
where
2. The second criterion is applied if the
The importance of this criterion is certainly derived from its goal of extracting those features that are strongly related to the product domain to add them to the GPF list; this could be considered a significant guide to potential customers who are considering the intended product.
Moreover, this criterion has been decomposed into three main sub-criteria. The first two sub-criteria employ the popular lexicon WordNet to realise our target of selecting GPFs as well as product attributes such as ‘quality’ and ‘clarity’. The proposed approach does not use WordNet to extract synonyms or hyponyms of product features, as numerous studies have already done. Instead, we focus on computing the correlation between product names, such as ‘phone’ and ‘camera’, and the candidate opinionated features using lexicographer files in WordNet. The third sub-criterion selects product features based on feature scores (fs) that have been extracted in the previous stage. In contrast to the first criterion, the value of
3.4.1. Sub-criterion 1: lexicographer files in WordNet
WordNet 8 is a large lexical database containing various syntactic categories of the English language, such as nouns, verbs, adjectives and adverbs. These categories are organised in sets of cognitive synonyms called ‘synsets’, and each synset describes a lexical concept. Among the most important components of WordNet are lexicographer files (Lex-files). Each lexicographer file contains the synsets for a specific syntactic category, and the synsets are divided among separate files based on their domains. In total, WordNet is structured into 45 lexicographer files based on PoS (see Table 1). The name of each lexicographer file takes the form pos.suffix, where pos refers to the PoS (noun, adj, verb or adv) and suffix indicates the name of the related synset category. For example, the file noun.food contains all nouns that denote foods and drinks. All lexicographer files are indexed in a separate file in WordNet called lexnames, and the format of each line in this file is shown in Table 2.
Table 1. Lexicographer files in WordNet
Table 2. Format of lexnames file in WordNet
lexcode, a two-digit decimal number, denotes the code of the lexicographer file; these codes also appear in other parts of the WordNet database, such as data.noun, where they indicate the lexicographer file to which a specific synset belongs. For instance, consider the following part of a line from the file data.noun: 02942699 06 n 02 camera 0 …; the second field of this line (06) indicates that the noun word (camera) belongs to a lexicographer file named noun.artifact. With regard to the syntactic category codes, these integers denote the PoS type as shown in Table 3. In this study, the product features are nouns/noun phrases; therefore, the second criterion is based on noun lexicographer files in WordNet. These files and their descriptions are presented in Table 4.
Table 3. Syntactic category codes.
Table 4. Noun lexicographer files in WordNet
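As an illustration of how the lexnames index can be read, the following minimal sketch parses a few lines of that file into a lookup table. It assumes each line holds three whitespace-separated fields (the two-digit file code, the file name and the syntactic category code); the function name and the three sample lines are chosen for the example and are not a complete copy of the file.

```python
def parse_lexnames(lines):
    """Map each lexicographer file code to (file name, syntactic category code)."""
    table = {}
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        lexcode, name, category = line.split()
        table[lexcode] = (name, int(category))
    return table


# Sample lines in the assumed lexnames layout: code, name, category.
sample_lines = [
    "06\tnoun.artifact\t1",
    "13\tnoun.food\t1",
    "29\tverb.body\t2",
]
lexnames = parse_lexnames(sample_lines)
noun_files = {code: name for code, (name, cat) in lexnames.items() if cat == 1}
```

Filtering on the category code for nouns keeps only the noun lexicographer files, which are the ones relevant to the second criterion.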
In sum, the second criterion is based on the degree of correlation between the product name and the candidate opinionated features that have been extracted in stage 2. This correlation chiefly depends on the domain of the product name, such as noun.artifact (06) and noun.food (13), and the extent to which the candidate product features belong to the same domain. However, this raises the question of how the product domain can be mined using WordNet.
In the WordNet database, two files are required to represent its content, namely, index.pos and data.pos, in which pos could be a noun, verb, adjective or adverb. We consider index.noun and data.noun in this study because the expected product features are nouns/noun phrases. The main components of the index.noun file are lemma (the base form of an English word) and synset_offset (synset), where each synset (synonym set of lemma) is represented as a byte offset that points to the synset containing the lemma in data.noun. The two lines in Table 5 have been obtained from index.noun and data.noun.
Table 5. Line representation in index.noun and data.noun
Based on the previous table, the byte offset ‘
Thus, we can mine the domain of the product depending on the Lex-file codes of its synsets. For instance, in Table 5, the product name (camera) has two synsets, with byte offsets presented in the index.noun file; each synset is associated with a Lex-file code in the data.noun file, indicating the lexicographer file name containing the synset, as shown in Figure 8, where the domain of the product ‘camera’ is noun.artifact. By contrast, some product names have several synsets that belong to different domains. For instance, the product name ‘Phone’ has two synsets belonging to noun.artifact and one synset belonging to noun.communication.

Figure 8. Process of identifying the domain of a product.
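The lookup from a product name to its domain(s) can be sketched as follows. This is a simplified illustration under stated assumptions: the index.noun and data.noun lines are abbreviated samples in the documented WordNet database layout, only the offset 02942699 and its code 06 are taken from the text while the second synset line is illustrative, the code-to-name table is limited to three entries, and a dictionary lookup replaces the byte-offset seek used on the real files.

```python
# Partial code-to-name table (a full table would come from lexnames).
LEX_FILE_NAMES = {"06": "noun.artifact", "10": "noun.communication",
                  "13": "noun.food"}


def synset_offsets(index_line):
    """Return (lemma, offsets); the offsets are the last synset_cnt fields."""
    fields = index_line.split()
    synset_cnt = int(fields[2])
    return fields[0], fields[-synset_cnt:]


def lex_code(data_line):
    """Return (synset offset, Lex-file code), the first two fields of a data line."""
    fields = data_line.split()
    return fields[0], fields[1]


def product_domains(product, index_lines, data_lines):
    """Collect the lexicographer file names of all synsets of a product name."""
    code_by_offset = dict(lex_code(line) for line in data_lines)
    domains = set()
    for line in index_lines:
        lemma, offsets = synset_offsets(line)
        if lemma == product:
            domains.update(LEX_FILE_NAMES[code_by_offset[o]] for o in offsets)
    return domains


# Abbreviated sample lines (illustrative, not verbatim database content).
index_lines = ["camera n 2 3 @ ~ #p 2 1 02942699 04404412"]
data_lines = ["02942699 06 n 02 camera 0",
              "04404412 06 n 03 television_camera 0"]
domains = product_domains("camera", index_lines, data_lines)
```

For a product name whose synsets fall in several lexicographer files, the returned set would simply contain more than one domain.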
The product domain has been mined using the Lex-file codes of the synsets of the product name. However, the process of extracting the correlation between this product name and the candidate opinionated features that have been extracted in stage 2 must be determined. Thus far, for each product name (e.g. ‘camera’), we obtain the Lex-file code(s) (e.g. ‘06’) of its synsets, which in turn determine the domain(s) (e.g. noun.artifact) of the product. In this criterion, the correlation between each opinionated feature and the product name
where

Example of computing C (lens, camera).
3.4.2. Sub-criterion 2: opinionated product attribute selection
Some of the candidate opinionated features in the previous sub-criterion have the value
3.4.3. Sub-criterion 3: opinionated-feature selection using arithmetic mean
In the first criterion of this stage, we select the opinionated features based on the weight of that feature
In the current sub-criterion, we employ a dynamic threshold rather than a fixed threshold because of its efficiency, which has been demonstrated by Jakob and Müller [45].
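A dynamic threshold of this kind can be sketched as follows. The sketch assumes that the threshold is the arithmetic mean of the feature scores (fs) of the candidate features and that a candidate is kept when its score is at least the mean; both the comparison and the function name are assumptions made for illustration.

```python
from statistics import mean


def select_by_mean(feature_scores):
    """Keep the candidates whose fs is at least the mean fs (assumed rule)."""
    if not feature_scores:
        return []
    threshold = mean(feature_scores.values())  # recomputed for every dataset
    return sorted(f for f, fs in feature_scores.items() if fs >= threshold)


selected = select_by_mean({"lens": 4, "zoom": 2, "person": 0})
```

Because the threshold is derived from the scores themselves, it adapts to each dataset without manual tuning.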
Overall, criterion 2 in stage 3 is a multi-criteria approach that aims to select those genuine opinionated product features that could not be selected in the first criterion of this stage. In other words, the current criterion is principally applicable only if the following sub-criterion is realised
where the value of
4. Evaluations
In this section, we perform experiments to evaluate the effectiveness of our proposed approach using the approaches and datasets from two main streams of work in the field of sentiment analysis. The first experiment compares our approach with unsupervised approaches from Hu and Liu [11] and HAC [13], whereas the second experiment employs the supervised approach from the baseline used in SemEval-2015 Task 12 [29]. Task 12 of SemEval-2015 is a continuation of SemEval-2014 Task 4, where the former task focused on extracting the attributes of each entity that have been evaluated in online reviews. This task has been decomposed into three slots. The first slot aims to identify the pair of entity (E) and attribute (A) in each review text. The second slot concentrates on extracting the Opinion Target Expression (OTE). The last slot is used to classify the sentiment for each pair of entity E and attribute A. In this study, we used Slot 2 for comparison purposes as it is the most relevant to our research.
4.1. Evaluations with unsupervised baselines
In this experiment, we used the popular datasets of the customer reviews of four electronic products that were introduced by Hu and Liu [11]. The products involved in this study are two digital cameras, one cell phone and one Mp3 player. To evaluate the proposed work, two approaches are used as baseline methods for evaluation purposes, namely, the HAC approach proposed in Eirinaki et al.’s [13] study and Hu and Liu’s [11] feature extraction approach.
On the one hand, Hu and Liu’s approach considers the process of extracting infrequent product features using nearby opinion words as a complement to their work extracting frequent features by association mining. On the other hand, whereas Hu and Liu’s approach scores each candidate feature by its frequency in the corpus, the HAC algorithm scores each feature based on the number of adjectives close to the candidate feature. We cite two reasons for selecting these studies to evaluate the effectiveness of our approach. First, as part of their studies, these approaches used opinion words in an unsupervised manner to extract product features (as discussed in the ‘Literature review’ section in this study). Second, both studies share the same limitation of extracting a number of features that are irrelevant to the product domain.
Hu and Liu’s work on this dataset chiefly focuses on extracting all frequently appearing nouns and noun phrases in customer reviews based on association mining. By contrast, our study is concerned with explicit opinionated features (nouns/noun phrases) that have been described by several customers with positive or negative sentiments. Consequently, the evaluation process of this study considers only those review sentences that include opinions about the candidate product features. The details of the product datasets used in our experiments are shown in Table 6.
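Restricting the evaluation to opinionated sentences can be sketched as a simple lexicon filter. This is only an illustration: the six-word lexicon below is a hypothetical stand-in for SentiStrength, and the function name and tokenisation rule are assumptions.

```python
import re

# Hypothetical stand-in lexicon; the study itself relies on SentiStrength.
OPINION_WORDS = {"great", "good", "excellent", "poor", "bad", "terrible"}


def is_opinionated(sentence, lexicon=OPINION_WORDS):
    """Return True if the sentence contains at least one lexicon word."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    return any(token in lexicon for token in tokens)


reviews = [
    "the picture quality is excellent",
    "the day finally arrived when i was sure i'd leave sprint",
    "battery life is poor",
]

# Keep only the sentences that carry an opinion word.
opinionated = [s for s in reviews if is_opinionated(s)]
```

With this toy lexicon, only the first and third sentences are retained. The second contains no opinion word and is dropped before any feature is extracted from it.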
Table 6. Details of review datasets
The performance of the proposed approach on each dataset is measured using the standard evaluation techniques of precision (P) and recall (R). Table 7 presents the experimental results of the baseline methods in addition to those of our approach.
Table 7. Comparison of system results against baseline algorithms
HAC: High Adjective Count.
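For reference, precision and recall over extracted feature sets can be computed as in the following sketch. The balanced F1 formula is assumed for the F_score, and the function name and example feature sets are hypothetical rather than taken from the datasets.

```python
def precision_recall_f1(extracted, gold):
    """Compare extracted features with gold-standard features."""
    extracted, gold = set(extracted), set(gold)
    true_positives = len(extracted & gold)
    # Precision: share of extracted features that are correct.
    precision = true_positives / len(extracted) if extracted else 0.0
    # Recall: share of gold features that were extracted.
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical gold annotations and system output for one product.
gold_features = {"battery", "screen", "lens", "zoom"}
system_features = {"battery", "screen", "sprint"}

p, r, f = precision_recall_f1(system_features, gold_features)
```

Here two of the three extracted features are correct (precision 2/3) and two of the four gold features are found (recall 1/2), which gives an F1 of 4/7.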
In the preceding experiments, the precision of our approach exceeded that of the baseline approaches on all of the datasets. The proposed approach also exceeded the HAC algorithm in recall, precision and F_scores, whereas Hu and Liu’s work outperformed our approach in recall. A likely explanation is that Hu and Liu’s work chiefly depends on extracting frequently appearing nouns/noun phrases regardless of whether they are opinionated. Hence, several nouns/noun phrases are retrieved and regarded as product features only on the basis of their frequencies. For instance, in the customer review ‘the day finally arrived when i was sure i’d leave sprint’, they consider the word ‘sprint’ a product feature, which it is not. Similar customer reviews increased the recall values for Hu and Liu’s work. The only exception is the Cell phone dataset, where the F_scores of the proposed approach are higher than Hu and Liu’s. We presume that this is because the dataset contains many GPFs that have been described positively or negatively by many customers.
However, Yan et al. [46] indicated that the precision (P) measure was the most popular evaluation measure in sentiment analysis and was principally efficient at ranking problems. Similarly, in the feature extraction process, we are concerned with the capacity of the system to extract the most popular product features discussed in customer reviews, that is, the features that are the most opinionated.
4.2. Evaluations with supervised baselines
Although our approach is unsupervised, this experiment compares our feature extraction with the supervised baseline system used in SemEval-2015 Task 12 on Aspect Based Sentiment Analysis (ABSA). The SemEval-2015 dataset (restaurants only) is used to measure the performance of our system. The original SemEval-2015 Task 12 Slot 2 collection used only a restaurant dataset for identifying explicitly mentioned aspects in Web reviews. The restaurant dataset consists of both training and testing reviews, as shown in Table 8. As our proposed work is unsupervised, we do not use the training dataset; only the testing dataset is used to evaluate the effectiveness of the proposed work.
Table 8. Details of SemEval-2015 reviews dataset
Although the main objective of our approach is to focus on sentences that can be characterised as opinionated, that is, expressing positive or negative opinions, the testing dataset of SemEval-2015 contains review sentences with aspect terms that are either opinionated or non-opinionated [29]. To make a fair comparison, we created a subset of the original dataset that contains only sentences with opinionated aspects. This gives us a total of 1234 sentences for Restaurant reviews. We partitioned this subset into a set of review classes (D1, D2, D3) with different splits of training and testing datasets (50–50, 67–33, 80–20), respectively, as shown in Table 9. The purpose was to investigate the performance of our unsupervised approach to feature extraction with different sizes of review datasets; we also compared the results of our approach against the performance of the supervised baseline approach of SemEval-2015 Task 12 Slot 2.
Table 9. Details of review sentences of restaurant dataset with different splits used in this study
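The three partitions can be reproduced in outline with the sketch below. The placeholder sentences, the `partition` helper and the rounding convention are assumptions made for illustration, so the exact counts for the 67–33 and 80–20 splits may differ from those reported in the study.

```python
def partition(sentences, train_fraction):
    """Split a list into (training, testing) parts by a training fraction."""
    cut = round(len(sentences) * train_fraction)
    return sentences[:cut], sentences[cut:]


# Placeholder data standing in for the 1234 opinionated restaurant sentences.
sentences = [f"sentence {i}" for i in range(1234)]

# Review classes and their training fractions (50-50, 67-33, 80-20).
SPLITS = {"D1": 0.50, "D2": 0.67, "D3": 0.80}

sizes = {}
for name, fraction in SPLITS.items():
    train, test = partition(sentences, fraction)
    sizes[name] = (len(train), len(test))
```

For D1, the 50–50 split yields 617 training and 617 testing sentences.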
In this study, the SemEval-2015 baseline was trained on the different training splits presented in Table 9, and both approaches were tested on the corresponding testing datasets of the restaurant domain to obtain the results reported in Table 10.
Table 10. Comparison of our system results against SemEval-2015 baseline algorithm
The main objective of the SemEval baseline is to identify all the aspect terms that appear in the testing dataset based on the aspect terms collected from the training dataset. Its performance therefore depends on the number of aspect terms identified from the training dataset, and the results presented in Table 10 indicate that the F-measure improves as the training dataset grows. By contrast, our approach outperformed the SemEval baseline with respect to the F-measure only when the testing dataset was large (617 review sentences), because the extraction of aspect terms is mainly based on the idea that genuine product aspects should be the subject of opinions from many customers in online reviews. As a result, our work depends on the ability of the sentiment lexicon (SentiStrength) to identify the opinion words in the review sentence; based on these opinion words, the system extracts the surrounding potential aspects. In addition, N_gram analysis helps capture the relationship between the opinion word and the product feature, which enhances the performance of the proposed work. Moreover, in Table 10, the precision of our approach exceeded that of the SemEval baseline in all the review classes (D1, D2, D3) because of the high specificity of extracting only those aspect terms that have been described positively or negatively in the customer reviews. The higher precision scores indicate that most of the aspects identified in the customer reviews are genuine aspects. However, the low recall values indicate that the system could not extract the majority of product aspects that appeared in the review sentences.
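The dependence of the baseline on the training data can be illustrated with a minimal dictionary-lookup sketch. It follows the description above rather than the organisers' actual implementation; the function names and the toy restaurant sentences are hypothetical.

```python
import re


def collect_aspect_terms(training_data):
    """Gather every aspect term annotated in the training sentences."""
    terms = set()
    for _sentence, aspect_terms in training_data:
        terms.update(term.lower() for term in aspect_terms)
    return terms


def baseline_extract(sentence, known_terms):
    """Return the known aspect terms that occur in the sentence."""
    text = sentence.lower()
    return sorted(
        term
        for term in known_terms
        if re.search(r"\b" + re.escape(term) + r"\b", text)
    )


# Hypothetical annotated training sentences: (text, aspect terms).
training = [
    ("The pasta was delicious", ["pasta"]),
    ("Friendly staff and a great wine list", ["staff", "wine list"]),
]
known = collect_aspect_terms(training)

test_sentence = "The wine list is short, the staff is rude and the sushi is fresh"
found = baseline_extract(test_sentence, known)
```

The lookup finds 'staff' and 'wine list' but misses 'sushi', which never occurred in the training sentences. A larger training split enlarges the term list, which is consistent with the F-measure of the baseline rising as the training dataset grows.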
5. Conclusion and future research
In this article, we proposed a domain-independent approach for automatically extracting the opinionated features of a product from customer reviews. Two major criteria have been proposed to extract the GPFs. First, product features should be described positively or negatively by many customers to be considered GPFs; therefore, each candidate feature is weighted to determine the extent to which it has been the subject of opinions in the reviews. Second, the extracted features should be strongly related to the product domain. To verify this, we compute the degree of correlation between each candidate feature and the product name using the lexicographer files in WordNet. Based on these two criteria, we can select the most representative features of a product, thereby allowing potential customers to use these features as a guide for selecting the product type that best fits their preferences. Two types of comparisons have been conducted to test the performance of the proposed work. The first type of experiment compares our system against two popular unsupervised approaches (HAC, and Hu and Liu). In these experiments, we used the datasets of four electronic products introduced by Bing Liu. In the second type of experiment, we used the SemEval-2015 Task 12 Slot 2 dataset of restaurants to compare the performance of the proposed algorithm with a supervised baseline approach provided by the SemEval organisers. In these experiments, we divided the restaurant dataset into different splits of training and testing datasets to investigate the performance of both approaches with different dataset sizes. Compared with the SemEval baseline, the results of our approach depend mainly on the size of the dataset. In large datasets, the performance of our approach improves because a GPF is expected to be the subject of opinions from many customers in the reviews.
Overall, the results of our system with respect to both types of experiments show competitive and enhanced results in the task of extracting product features from customer reviews.
This work identified the explicit product features in customer reviews. In the future, we plan to extend the current research to extract implicit product features and thus provide customers with a comprehensive set of product features. Furthermore, the use of opinion lexicons in this work was limited to identifying opinion words. Subsequent research should also employ these lexicons for the sentiment classification of product features.
Footnotes
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship and/or publication of this article.
Funding
This research was supported by Universiti Sains Malaysia (USM) Research University Grant (1001/PKOMP/811335: Mining Unstructured Web Data For Tour Itineraries Construction), USM.
