Sentiment analysis of customer data

Abstract

This paper presents an application of sentiment analysis to customer feedback data in the area of heavy equipment repair services. Sentiment analysis is used as part of a framework for a text mining-based Customer Loyalty Improvement Recommender System (CLIRS). In order to provide business users of the system with accurate predictions of customer satisfaction metrics, the original opinion mining algorithm needed to be improved. The paper presents the background of the proposed approach, the current techniques used to mine text data, and existing applications of sentiment analysis. We propose an aspect-based, taxonomy-driven approach for customized sentiment analysis. The contribution of this paper is the implementation and evaluation of the proposed methods, which improve the accuracy and coverage of the opinion mining algorithm. The improvements are illustrated with examples covered by the algorithm in the customer dataset. The application of the proposed methods increased the algorithm’s accuracy from 92% to 96%, and its coverage from 36% to 48%. This research is an attempt to handle well-known issues in natural language processing that are currently not handled by text mining algorithms, such as ambiguity and context, opinionated verbs/nouns, and subject recognition from pronouns. This is significant because the proposed techniques are generalizable to any application that uses a sentiment analysis algorithm.

Keywords

Sentiment analysis, opinion mining, action rules, recommender systems, visualization

1. Problem description

Sentiment analysis (opinion mining) is the field of study that analyzes people’s opinions, sentiments, evaluations, attitudes, and emotions from written language. It has become one of the most active research areas in natural language processing (NLP) and is widely studied in data mining, web mining and text mining. Sentiment analysis systems are being applied in almost every business and social domain. Opinions are central to almost all human activities and are key influencers of our behaviors. As J. Ellen Foster said in 1893: “Sentiment is the mightiest force in civilization…”.

Fig. 1.

Web-based CLIRS system – step 1. Choosing a company to analyze recommendations for improving its customers’ loyalty. The data of a chosen entity is merged with the data of its semantic neighbors.

Sentiment-analysis technologies have many potential applications. In recent years sentiment analysis applications have spread to almost every possible domain, from consumer products, services, healthcare, and financial services to social events and political elections.

Sentiment analysis is usually part of a larger text analytics framework. Within our research, text mining and sentiment analysis are part of a recommender system for improving customer loyalty.

1.1. Data-driven customer loyalty improvement recommender system (CLIRS)

This research is conducted on a customer dataset shared with our research group by a consulting company that specializes in improving the customer satisfaction and loyalty of its clients. Its clients are companies that provide repair services for heavy equipment. The dataset contains 400,000 records, where each record represents a structured questionnaire collected from a telephone survey of a randomly chosen customer of the client company. The data heterogeneity reflects different types of surveys, such as service, parts, rentals, etc. Additionally, the dataset describes 34 different companies scattered geographically across the United States and Canada (Fig. 1). The records also contain information about the company, details of the service or product being assessed by the survey, information about the surveyed customer, and survey details. The dataset contains mostly numerical scores for each question (called “benchmarks”), but also free-form text comments from the customers. Each survey record is labeled with the Net Promoter Score (NPS) status of the surveyed customer (Promoter/Passive/Detractor). Promoters are loyal customers of a company, who are likely to recommend it to others. Detractors, on the other hand, are dissatisfied customers, who “detract” from the company’s reputation and are unlikely to recommend it. There is also a group of customers called Passives, who are satisfied but unenthusiastic about recommending the company to others. The business goal is to increase a metric called Net Promoter Score (NPS), which is correlated with the revenue growth of a company. The NPS¹ is today’s industry standard for measuring customer satisfaction of a company ([20]). It is calculated as %Promoters − %Detractors. Increasing the NPS metric is challenging, especially once it has reached a certain level, and requires a fine-grained analysis of customer feedback.

¹ NPS®, Net Promoter® and Net Promoter® Score are registered trademarks of Satmetrix Systems, Inc., Bain and Company and Fred Reichheld.
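The NPS formula (%Promoters − %Detractors) can be illustrated with a minimal Python sketch. The function name and the sample of ten survey labels are hypothetical and serve only to show the arithmetic; they are not taken from the CLIRS implementation or the customer dataset.

```python
from collections import Counter


def net_promoter_score(statuses):
    """Return NPS = %Promoters - %Detractors for a list of status labels."""
    if not statuses:
        raise ValueError("at least one survey record is required")
    counts = Counter(statuses)
    total = len(statuses)
    promoters = 100.0 * counts["Promoter"] / total
    detractors = 100.0 * counts["Detractor"] / total
    # Passives count towards the total but towards neither percentage.
    return promoters - detractors


# Hypothetical sample: 6 Promoters, 3 Passives, 1 Detractor.
sample = ["Promoter"] * 6 + ["Passive"] * 3 + ["Detractor"] * 1
print(net_promoter_score(sample))  # 60% - 10% = 50.0
```

Note that moving a Passive to Promoter raises the score by one customer's share, whereas moving a Detractor to Promoter raises it by twice that amount.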

The recommender system proposed in our research is built from the knowledge discovered through the data and text mining techniques ([39]). It utilizes data visualization techniques to improve the presentation of the recommendations and interaction with the system ([47]). Our unique approach quantifies the predicted NPS impact of the recommendations (Fig. 2). This quantification is based on the algorithm that calculates the NPS impact from the statistical characteristics of the extracted rules: confidence and support, and from the meta-actions that trigger these rules.

Fig. 2.

Web-based CLIRS system – step 2. The system allows for interaction with the user to adjust the feasibility of different business changes. In step 3 the system generates recommendations and ranks them according to the predicted impact on NPS and feasibility.

A new version of the recommender system is proposed, which is based entirely on the text data, the free-form comments of customers, which express their opinions. The rationale behind this work is that the business expects its customer feedback to be in the unstructured form primarily in the future. Our solution to this problem involves transforming the unstructured text data into a structured form using text mining techniques. Once we have transformed the text into a structured form, we will apply a previous numerical-based approach for recommendations. Therefore, the critical element of the method is the transformation procedure based on sentiment analysis.

1.2. The techniques for sentiment analysis

Sentiment analysis involves analyzing the subjective part of the text for its polarity, that is, whether it denotes a positive or a negative opinion. The opinion can be analyzed at three levels: (1) document-level – extracting the overall sentiment of an entire comment; (2) sentence-level – detecting the sentiment of each sentence of a comment; or (3) aspect-level – analyzing sentiment in reference to certain aspects or features of the product/service, such as price or staff ([14]).

1.2.1. Document-level sentiment analysis

Document-level sentiment analysis is mostly conducted with supervised learning techniques (classification), but there are also some unsupervised methods applied ([10]). Sentiment classification can be formulated as a classification problem with two decision classes: positive and negative. Existing supervised methods, such as Naive Bayes, Support Vector Machines (SVM), can be readily applied to sentiment classification. The earliest work of automatic sentiment classification at the document level classified movie reviews from IMDB into two classes, positive and negative ([35]). Features used to classify can be terms and their frequencies, part of speech tags, opinion words, syntactic dependency, and negation ([34]). Besides the binary prediction of sentiment, there were also models predicting the rating scores of reviews ([33]). In that case, the problem was formulated as a regression problem, as the rating scores are ordinal, and solved using SVM regression. There were also unsupervised methods proposed for sentiment classification: Turney et al. performed the classification based on some fixed syntactic patterns that are likely to be used to express opinions ([54]). The syntactic patterns were composed of part-of-speech tags.

1.2.2. Sentence-level sentiment analysis

At the sentence level, each sentence in the document is analyzed and classified as either positive or negative. The methods are similar to those used for document-level sentiment analysis. Sentence-level sentiment analysis can also use rules based on the clauses of a sentence ([26]). Sentence-level classification does not try to find the concrete features that were commented on. Therefore, its granularity of analysis is different from that of aspect-based sentiment analysis.

1.2.3. Aspect-based sentiment analysis

Opinions extracted at the document or the sentence level often do not provide the detail of the sentiment. These details are needed for some applications, which need opinions on certain aspects or features of the object (on what people exactly liked and did not like). Aspect-level sentiment analysis performs a finer-grained analysis. It is based on the idea that an opinion consists of sentiment (positive or negative) and a target of opinion. It helps to understand the sentiment problem better and to address mixed opinions, such as: “Although the service is not that great, I still love this restaurant”. This sentence has a positive tone, but in the service aspect, it is negative.

The major tasks in aspect-based sentiment analysis are: (1) aspect extraction (feature identification); (2) recognition of polarity towards a given aspect (positive/negative/neutral); (3) producing a structured summary of opinions about aspects, which can be further used for qualitative and quantitative analyses ([10]). The text mining process for web reviews involving aspect-based sentiment analysis and summarization is considered a pioneering work on feature-based opinion summarization ([19]). Three subtasks of generating feature-based summaries are defined: (1) identifying features of the product; (2) identifying opinionated review sentences; (3) producing summaries. The system crawls the web for customer reviews and stores them in a database. Then, it extracts the most frequent features on which people expressed their opinion, using part-of-speech tagging. Association rule mining, based on the Apriori algorithm, is used to extract frequent itemsets as explicit product features. Frequent itemsets are itemsets whose support is at least equal to the minimum support ([9]). Secondly, opinion words are extracted using the resulting frequent features. Semantic orientations of the opinion words are determined based on WordNet and a dictionary of positive/negative words. The opinion sentences are identified as those that contain one or more feature words, as well as opinion words describing these features. Lastly, the orientation of each opinion sentence is identified and a final summary is produced. An opinion aggregation function is applied to determine the final orientation of the opinion on each object feature in the sentence. Our approach also handles negations and ‘but-clauses’.
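The frequent-itemset step can be sketched as a small level-wise (Apriori-style) search. This is an illustrative sketch only: the function name, the toy transactions (sets of nouns per review sentence) and the 0.5 minimum support are assumptions, not the data or parameters of the cited system.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for all itemsets with support >= min_support.

    Support is the fraction of transactions that contain the itemset.
    """
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items}
    current = {c for c in current if support(c) >= min_support}

    result = {}
    k = 1
    while current:
        for c in current:
            result[c] = support(c)
        k += 1
        # Join step: unions of frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        current = {c for c in candidates if support(c) >= min_support}
    return result


# Hypothetical transactions: nouns found in four review sentences.
reviews = [
    {"battery", "screen"},
    {"battery", "price"},
    {"battery", "screen", "price"},
    {"screen"},
]
freq = frequent_itemsets(reviews, min_support=0.5)
for itemset, sup in sorted(freq.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), sup)
```

In this toy input, the pair {screen, price} occurs in only one of four sentences and is therefore discarded, which in turn prunes the three-item candidate before its support is ever counted.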

Aspect extraction. When domain knowledge is not available directly from experts to build a domain-specific aspect dictionary, aspects need to be automatically extracted as the first step. There are four main approaches to aspect extraction: (1) extraction based on frequent nouns and noun phrases; (2) extraction by exploiting opinion and target relations (syntactical relations); (3) supervised learning; (4) topic modeling/unsupervised learning. A variety of features is used for aspect extraction, such as word n-grams, bi-grams, word cluster, casting, POS tagging, parse dependencies, relations, and punctuation marks. Supervised learning techniques include Hidden Markov Models (HMM) and Conditional Random Fields (CRF). Topic modeling is an unsupervised learning method that assumes each document consists of a mixture of topics and each topic is a probability distribution over words. Two basic models were proposed: pLSA (Probabilistic Latent Semantic Analysis) and LDA (Latent Dirichlet Allocation) ([4,18]). In another study, an unsupervised information extraction system called OPINE was developed ([36]). OPINE first extracts noun phrases from reviews and retains those with a frequency greater than an experimentally set threshold. Another work proposed a term extraction technique based on heuristics and selection algorithms. A multi-knowledge based approach for movie review mining and summarization was proposed ([62]). The method used a keyword list and dependency relation templates together to mine explicit feature-opinion pairs ([59]). Another method, based on syntactical dependency relations, was presented for extracting product features and identifying the opinions associated with them in each sentence ([46]). Firstly, parsing and dependency analysis are performed as a pre-processing step. The reviews are parsed using the Stanford parser, resulting in a dependency tree ([11]). While parsing the sentence, noun phrases are identified as product feature candidates using linguistic patterns. Then, for each product feature candidate in every dependency parse tree, related opinion words are searched for among adjectives and verbs. A set of candidate feature-opinion pairs is generated, and then a probabilistic model based on maximum entropy is used to predict the relevance of each feature-opinion relation. Additionally, the authors proposed using a product ontology to resolve the problem of incompatible terminology, where different customers refer to the same product features using different terms. The ontology contains encoded semantic information and provides a source of shared and precisely defined terms.

After aspect extraction, an optional step is to group them into synonymous categories, where each category represents a unique aspect. For example, “call quality” and “voice quality” refer to the same aspect for phones. The first method to handle this topic was based on several similarity metrics, defined with string similarity, synonyms, and lexical distances measured using WordNet ([6]). A more sophisticated approach used publicly available hierarchies/taxonomies of products and the actual product reviews to generate the ultimate aspect hierarchies ([60]). In another work, a semi-supervised learning method was proposed to group aspect expressions into user-specified aspect categories ([61]).

The sentiment strength. The most important indicators of sentiment are sentiment words (opinion words). A list of such words is called a sentiment lexicon (or opinion lexicon). Opinion lexicons are resources that associate words with a sentiment orientation. Over the years, researchers have designed numerous algorithms to compile such lexicons. These lexicons can be used not only for polarity detection but also for further supervised expansions of lexicons ([5]). Hu and Liu used a semantically labeled list of adjectives ([19]). This list was expanded with some nouns by Liu et al. ([24]). It consists of two lists: one has positive entries (2,003 entries in total) and the other contains negative words (4,782 entries in total). The most popular sentiment dictionary is SentiWordNet, built on WordNet, a lexical database for the English language ([15]). In SentiWordNet, each sense of a word is assigned a pair of positive and negative polarity scores ([2]). Each entry in SentiWordNet comprises all possible parts of speech the word can appear in, all the meanings (senses) corresponding to each part of speech, and a pair of polarity scores associated with each sense. There are 28,431 sentiment-bearing entries (out of a total of 86,994 WordNet terms). The polarity scores are on a continuous scale ranging between 0 and 1. The default algorithm, described on the SentiWordNet website, calculates an overall polarity for each sense of a word as the positive score minus the negative score. In the next step, it calculates a weighted sum of the overall polarities for all senses of the word, with the weights defined by the ranks of the senses. The polarity scores in SentiWordNet were generated automatically using a semi-supervised method ([13]). AFINN is a strength-oriented lexicon with positive words (564 in total) scored from 1 to 5 and negative words (964) scored from $-1$ to $-5$ ([31]). It also includes slang, obscene words, acronyms, and web jargon. The MPQA subjectivity lexicon was created by Wilson et al. as a part of their system, called Opinion-Finder ([58]). The lexicon consists of positive (2,295), negative (4,148) and neutral words (424). The NRC emotion lexicon is an emotion-oriented lexicon created by conducting a tagging process on the Amazon Mechanical Turk crowdsourcing platform ([28]). The words are annotated with eight emotions: joy, trust, sadness, anger, surprise, fear, anticipation, and disgust. They are also classified into two polarity classes: positive (2,312 in total) and negative (3,324 in total). There are also words not associated with any emotional state and tagged as neutral (7,714).
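The SentiWordNet aggregation described above (per-sense polarity as positive minus negative score, followed by a rank-weighted combination over senses) can be sketched as follows. The exact weighting is an assumption here: the sketch uses weights of 1/rank, normalized by their sum, so that the first (most common) sense counts most. The example scores are invented and are not real SentiWordNet entries.

```python
def word_polarity(senses):
    """Combine per-sense scores into one polarity value for a word.

    senses: list of (positive_score, negative_score) pairs, ordered by
    sense rank (rank 1 first). Each score lies in [0, 1].
    """
    if not senses:
        return 0.0
    weighted_sum = 0.0
    weight_total = 0.0
    for rank, (pos, neg) in enumerate(senses, start=1):
        weight = 1.0 / rank            # assumption: weight decays with sense rank
        weighted_sum += weight * (pos - neg)   # overall polarity of this sense
        weight_total += weight
    return weighted_sum / weight_total


# Hypothetical word with two senses: a clearly positive first sense
# and a balanced second sense.
print(word_polarity([(0.75, 0.0), (0.25, 0.25)]))  # (0.75 + 0.5 * 0.0) / 1.5 = 0.5
```

With this normalization the result stays in the range from −1 to 1, which makes it directly comparable across words with different numbers of senses.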

Summary generation. In most sentiment analysis applications, there is a need to study the opinions of many people due to the subjective nature of opinions, so some form of summary is needed. Summarization is therefore usually the next step after detecting aspect/opinion words and calculating polarity. It involves two tasks: (1) for each feature, associated opinion sentences are divided into positive and negative “buckets”, based on the calculated polarity. Optionally, the number of positive/negative comments about a particular feature can be displayed; (2) features are ranked according to the frequency of their appearances in the reviews ([19]). For the purposes of review summarization, a variety of visualization methods are often deployed. In its very basic form, for example, Amazon displays an average rating and several reviews next to it. Mousing over the stars brings up a histogram of reviewer ratings annotated with counts for the 5-star reviews, 4-star reviews, etc. A variety of other techniques for sentiment visualization were proposed, such as bar charts, rose plots, box plots, and two-dimensional visualizations ([17,24,29]).
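The two summarization tasks (bucketing opinion sentences by polarity and ranking features by frequency) can be sketched in a few lines. The input format, a list of (feature, polarity, sentence) triples, and the example sentences are assumptions made for illustration.

```python
from collections import defaultdict


def summarize(opinions):
    """Group opinion sentences per feature and rank features by mention count.

    opinions: iterable of (feature, polarity, sentence), with polarity != 0.
    Returns a list of (feature, n_positive, n_negative), most frequent first.
    """
    buckets = defaultdict(lambda: {"positive": [], "negative": []})
    for feature, polarity, sentence in opinions:
        key = "positive" if polarity > 0 else "negative"
        buckets[feature][key].append(sentence)
    # Rank features by how often they were commented on.
    ranked = sorted(
        buckets.items(),
        key=lambda kv: len(kv[1]["positive"]) + len(kv[1]["negative"]),
        reverse=True,
    )
    return [(f, len(b["positive"]), len(b["negative"])) for f, b in ranked]


# Hypothetical extracted opinions.
opinions = [
    ("price", -1, "The charge was too high."),
    ("service", 1, "Service was quick."),
    ("service", 1, "Great service as always."),
    ("service", -1, "The service took too long."),
]
print(summarize(opinions))  # [('service', 2, 1), ('price', 0, 1)]
```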

1.3. The tools for sentiment analysis

Edward Abbey once said: “Sentiment without action is the ruin of the soul”. Indeed, it is important to discuss real-world applications of opinion mining/sentiment analysis. Growing demand for text analytics tools has raised the profile of specialized vendors such as Attensity, OdinText, Clarabridge, and Kana, which perform trend analysis and basic root-cause analysis of customers’ comments ([8,32,51]). SAS Institute, IBM SPSS, SAP (Insight) and Tibco (Insightful) offer tools for analyzing text for predictive insights. Lexalytics, Nstein and Teragram, a division of SAS, offer text mining specialized for sentiment analysis ([43,49,50]). There are also solutions that attempt to recognize the importance of issues based on voice audio recordings and volume analysis, such as Verint Systems ([55]). Rosetta Stone uses IBM SPSS text analytics software to analyze answers to open-ended questions in surveys of current and potential customers ([37]). It uses the resulting insights to drive decisions on advertising, marketing, product development and strategic planning, as well as to identify the strengths and weaknesses of products. Choice Hotels and Gaylord Hotels both applied text analytics software from Clarabridge to quickly extract sentiment from the thousands of customer satisfaction surveys gathered each day ([8]). The software recognizes positive and negative comments and associates them with specific hotel locations, facilities, services, rooms, and employee shifts. The feedback results in an immediate customer service response (through calls or letters) to acknowledge and apologize for the problems. More importantly, the system allows chain and facility managers to track trends in order to spot problems and best practices.

Besides business, opinions are of substantial significance in politics. An important application is understanding what voters are thinking ([22,30]).

Sentiment-analysis and opinion-mining systems also have an important role as enabling technologies for other systems. One such application is an augmentation of recommendation systems. Question answering is another area where sentiment proves to be useful.

1.4. Quantifying the economic impact of sentiment

Reviews influence both the purchasing decisions of other customers, who read the reviews, as well as decisions of product manufacturers regarding product-development, marketing, and advertising. However, the subjective perception of “the influenced” and the reality might differ. Therefore, a key element is to understand the real economic impact of sentiment expressed in surveys and reviews. The results of such analysis can be used by companies to estimate how much effort and resources should be allocated to address the issues.

There have already been economics studies conducted to find out whether the polarity has a measurable, significant influence on customers ([44,45]). The most common approach is to use hedonic regression to analyze the value and the significance of different item features to a function, such as a measure of utility to the customer, using historical data ([42]). Specific economic functions under examination include revenue, revenue growth, stock trading volume, etc. Another approach attempts to assign a “dollar value” to various adjective-noun pairs, adverb-verb pairs, and similar lexical configurations ([16]). It is important to note that different subsegments of the consumer population may react differently. Additionally, in some studies, positive ratings have an effect but negative ones do not, and in other studies, the opposite effect is seen. However, in most studies, a positive correlation effect is observed between survey polarity and economic effect, and the correlation is statistically significant ([1,3,7,12,25,27,53]).

2. Methodology

This section presents our approach for customer satisfaction improvement, which is based on mining customer data for actionable knowledge. We also propose a method for transforming the text data into a structured form of a data table, using opinion mining. This approach illustrates how text data can be mined for actionable knowledge, using the sentiment analysis approach. We also propose quantifying the effects of changes in the sentiment on the NPS metric.

2.1. Actionable knowledge discovery

The customer dataset is mined for actionable knowledge using algorithms for action rule discovery. The action rule concept was first proposed by Ras and Wieczorkowska in 2000, and has since been investigated further in application areas such as business, healthcare, and automatic music indexing and retrieval ([39–41,48,57]). Action rules offer a new approach in machine learning, which solves problems that traditional methods, such as classification or association rules, cannot handle. The purpose is to analyze data to improve understanding of it and to seek specific actions (recommendations) that enhance the decision-making process. An action shows a way of controlling or changing some of the values of the attributes for a given set of objects to achieve desired results.

2.1.1. Action rules

An action rule is defined as a rule that describes a transition that may occur within objects from one state to another, with respect to the decision attribute, as defined by the user ([40]). Decision attribute is a distinguished attribute, while the rest of the attributes are partitioned into stable and flexible attributes.

In logic nomenclature, action rule is defined as a term: $[(ω) \land (α \to β) ⟹ (Φ \to Ψ)]$ , where ω denotes conjunction of fixed condition attributes often called the header of the rule, $(α \to β)$ are proposed changes in values of flexible features, and $(Φ \to Ψ)$ is an expected change of a decision attribute value (action effect). When applying an action rule definition to the domain of NPS improvement, the decision attribute would be Promoter Status, with values Promoter, Passive, Detractor. Let us assume that Φ means ‘Detractors’ and Ψ means ‘Promoters’. The discovered knowledge would indicate how the values of flexible attributes need to be changed, under the condition specified by stable attributes, so the customers classified as Detractors will become Promoters. The rationale behind applying an action rule discovery to customer data is suggesting a change in flexible attribute values to help “reclassify” or “transit” an object (customer) to a different category (Passive or Promoter) and consequently, attain better overall customer satisfaction.

Formally defined, an action rule is built from atomic action terms. An atomic action term is an expression $(a, a_{1} \to a_{2})$ , where a is an attribute, and $a_{1}, a_{2} \in V_{a}$ , where $V_{a}$ is the domain of attribute a. If $a_{1} = a_{2}$ then attribute a is called stable on $a_{1}$ .

By action sets, we mean the smallest collection of sets such that:

(1) If t is an atomic action term, then t is an action set.

(2) If $t_{1}$ , $t_{2}$ are action sets, then $t_{1} \land t_{2}$ is a candidate action set.

(3) If t is a candidate action set and for any two atomic actions $(a, a_{1} \to a_{2})$ , $(b, b_{1} \to b_{2})$ , contained in t, we have $a \neq b$ , then t is an action set. Here b is another attribute $(b \in A)$ , and $b_{1}, b_{2} \in V_{b}$ .

By an action rule, we mean any expression $r = [t_{1} \Rightarrow t_{2}]$ , where $t_{1}$ and $t_{2}$ are action sets. The interpretation of the action rule r is that by applying the action set $t_{1}$ , we would get, as a result, the changes of states in action set $t_{2}$ . So, the action rule suggests the smallest set of necessary actions needed for switching from the current state to another within the states of the decision attribute. It is necessary to extract these kinds of actions to build an effective recommender system for business users, which will provide them with actionable suggestions for improving the company’s growth. The recommender system for improving NPS, that was developed in previous work, is driven by action rules and meta-actions to suggest proper business changes that are expected to improve the revenue of companies ([39]). Action rules, applied in this domain, show minimum business changes a company needs to implement to improve its NPS. A single action rule shows how to change a customer’s NPS status from Detractor/Passive to Promoter.
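The definitions of atomic action terms and action sets can be made concrete with a small data-structure sketch. The class and function names are hypothetical, chosen for illustration; the only logic encoded is what the definitions state: a term is stable when its two values coincide, and a conjunction of terms is an action set only if no attribute occurs twice.

```python
from typing import NamedTuple


class AtomicAction(NamedTuple):
    """Atomic action term (a, a1 -> a2): attribute a changes from a1 to a2."""
    attribute: str
    before: str
    after: str

    @property
    def is_stable(self):
        # The attribute is stable on a1 when a1 == a2.
        return self.before == self.after


def is_action_set(terms):
    """Check that a conjunction of atomic action terms is a valid action set.

    A candidate action set is an action set only if any two atomic
    actions in it refer to different attributes.
    """
    attributes = [t.attribute for t in terms]
    return len(terms) > 0 and len(attributes) == len(set(attributes))


# Hypothetical terms over attributes a and b.
keep_a = AtomicAction("a", "a2", "a2")      # stable term
change_b = AtomicAction("b", "b1", "b2")    # flexible term
other_b = AtomicAction("b", "b2", "b3")

print(keep_a.is_stable, change_b.is_stable)   # True False
print(is_action_set([keep_a, change_b]))      # True
print(is_action_set([change_b, other_b]))     # False: attribute b appears twice
```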

2.1.2. Meta actions

Our approach uses the concept of meta-actions. Meta-actions are the triggers used for activating action rules and making them effective. The concept of meta-action was initially proposed in Wang et al. and later defined in Ras et al. ([38,52,56]). Meta-actions are understood as higher-level actions. While an action rule is understood as a set of atomic actions that need to be made for achieving the expected result, meta-actions are the actions that need to be executed in order to trigger the corresponding atomic actions. The relations between meta-actions and the changes of attribute values they trigger can be modeled using either an influence matrix or an ontology. In our domain, we assume that one atomic action can be triggered by more than one meta-action. A set of meta-actions triggers an action rule that consists of atomic actions covered by these meta-actions. Also, some action rules can be invoked by more than one set of meta-actions. The goal is to select a set of meta-actions (M) that triggers a larger number of actions and thereby brings a greater effect in terms of NPS improvement. The effect of applying M is defined as the product of its support and confidence: $sup(M) \cdot conf(M)$ , which is a basis for calculating the NPS impact. Triggers aiming at different action rules are extracted from the customer comments that correspond to the action rules triggered by M. For example, for a rule described by: $r = [(a, a_{2}) \land (b, b_{1} \to b_{2})] \Rightarrow (d, d_{1} \to d_{2})$ , where a is a stable attribute, and b is a flexible attribute, the clues for generating meta-actions are in the comments of records matching the description: $[(a, a_{2}) \land (b, b_{1}) \land (d, d_{1})] \lor [(a, a_{2}) \land (b, b_{2}) \land (d, d_{2})]$ .
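The selection of comments that carry clues for meta-actions can be sketched as a simple record filter. The record layout (dictionaries with a "comment" key), the function names and the toy records below are assumptions; only the matching condition and the support-times-confidence effect follow the text.

```python
def matching_comments(records, stable, flexible, decision):
    """Return comments of records matching either side of an action rule.

    stable:   {attribute: value}            e.g. {"a": "a2"}
    flexible: {attribute: (before, after)}  e.g. {"b": ("b1", "b2")}
    decision: (attribute, before, after)    e.g. ("d", "d1", "d2")
    """
    d_attr, d_before, d_after = decision
    # Description of the state before the change ...
    left = {**stable, **{a: v[0] for a, v in flexible.items()}, d_attr: d_before}
    # ... and of the state after the change.
    right = {**stable, **{a: v[1] for a, v in flexible.items()}, d_attr: d_after}

    def matches(record, description):
        return all(record.get(k) == v for k, v in description.items())

    return [r["comment"] for r in records if matches(r, left) or matches(r, right)]


def meta_action_effect(support, confidence):
    """Effect of applying a set of meta-actions M: sup(M) * conf(M)."""
    return support * confidence


# Hypothetical records for the rule [(a, a2) AND (b, b1 -> b2)] => (d, d1 -> d2).
records = [
    {"a": "a2", "b": "b1", "d": "d1", "comment": "c1"},  # matches the 'before' side
    {"a": "a2", "b": "b2", "d": "d2", "comment": "c2"},  # matches the 'after' side
    {"a": "a2", "b": "b1", "d": "d2", "comment": "c3"},  # matches neither side
    {"a": "a1", "b": "b2", "d": "d2", "comment": "c4"},  # stable attribute differs
]
print(matching_comments(records, {"a": "a2"}, {"b": ("b1", "b2")}, ("d", "d1", "d2")))
# ['c1', 'c2']
```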

2.1.3. Action rules for sentiment

The first task in applying action rules to sentiment analysis is transforming the text data into a structured form. A data table is built by applying the transformation algorithm based on text mining. In such a table, each row represents a comment given in a survey. Each column represents an aspect of the service/product that is relevant for sentiment analysis. These aspects were defined by the business domain experts and encoded into a domain dictionary. Examples of such aspects in the considered domain of equipment repair are “Service Quality”, “Technician Knowledge”, “Staff Attitude”, etc. A value in a given row and column represents the sentiment value extracted for the aspect given by the column from the text comment given by the row. In other words, each survey is represented as a vector of numbers, where each number represents the polarity of the sentiment towards the aspect given by its position in the vector. The sentiment value can be positive or negative. Additionally, the sentiment can be neutral, strong or very strong. Therefore, the values in the table are on a discrete scale in the range $\langle -2; 2 \rangle$ . The values result from applying an aspect-based sentiment analysis algorithm to a comment. In the second step of our method, action rules are extracted from the opinion table. The mined actionable patterns have the same structure as the example presented in Listing 1.
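The transformation of a comment into one row of the opinion table can be sketched as follows. The aspect list reuses the three examples named in the text; the function name and the choice to encode an aspect that is not mentioned in a comment as 0 are assumptions made for this illustration.

```python
ASPECTS = ["Service Quality", "Technician Knowledge", "Staff Attitude"]


def to_row(extracted, aspects=ASPECTS):
    """Turn extracted aspect sentiments into a fixed-length vector.

    extracted: {aspect: polarity} produced by an aspect-based sentiment
    analysis step for a single comment.
    Values are clipped to the discrete scale -2..2; aspects that were not
    mentioned are encoded as 0 (an assumption of this sketch).
    """
    row = []
    for aspect in aspects:
        value = int(extracted.get(aspect, 0))
        row.append(max(-2, min(2, value)))
    return row


# Hypothetical output of the sentiment step for two comments.
table = [
    to_row({"Service Quality": -2, "Staff Attitude": 1}),
    to_row({"Technician Knowledge": 2}),
]
print(table)  # [[-2, 0, 1], [0, 2, 0]]
```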

Listing 1.

Sample action rule for the aspect-based sentiment

The attributes of atomic actions in the rule represent aspects in sentiment analysis. The interpretation of the rule is that a certain action in sentiment towards a precise aspect needs to be undertaken, in order to change the customer from the detractor to the promoter status. In the example in Listing 1, if the sentiment towards service quality changes from very negative to negative, towards technician knowledge – from negative to positive, and towards price competitiveness – from negative to very positive, then a customer changes from being a detractor to being a promoter of a company.
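The rule described in this paragraph can be written down as a small data structure and checked against a row of the opinion table. The numeric encoding (very negative = −2, negative = −1, positive = 1, very positive = 2), the dictionary layout and the function names are assumptions of this sketch and do not reproduce the format of Listing 1.

```python
# Rule from the description: sentiment changes per aspect => status change.
RULE = {
    "changes": {
        "Service Quality": (-2, -1),        # very negative -> negative
        "Technician Knowledge": (-1, 1),    # negative -> positive
        "Price Competitiveness": (-1, 2),   # negative -> very positive
    },
    "decision": ("Detractor", "Promoter"),
}


def supports_rule(row, rule=RULE):
    """True if the row matches the 'before' state of every atomic action."""
    before_ok = all(row.get(a) == b for a, (b, _) in rule["changes"].items())
    return before_ok and row.get("Promoter Status") == rule["decision"][0]


def apply_rule(row, rule=RULE):
    """Return the state the rule predicts after the changes are made."""
    if not supports_rule(row, rule):
        return dict(row)
    updated = dict(row)
    for aspect, (_, after) in rule["changes"].items():
        updated[aspect] = after
    updated["Promoter Status"] = rule["decision"][1]
    return updated


customer = {
    "Service Quality": -2,
    "Technician Knowledge": -1,
    "Price Competitiveness": -1,
    "Promoter Status": "Detractor",
}
print(apply_rule(customer)["Promoter Status"])  # Promoter
```

The returned row is the predicted target state; how reliably the change occurs in practice is what the rule's confidence expresses.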

In the third step of our method, the extracted patterns are incorporated into the knowledge base of the recommender system. Having incorporated the rules, the recommender system will quantify the expected impact of the recommendations on NPS.

2.2. The algorithm for sentiment mining

We adopted the following procedure for sentiment analysis and text summarization ([21]):

(1) Identifying opinion sentences and their orientation with localization.

(2) Summarizing each opinion sentence using discovered dependency templates.

(3) Opinion summarizations based on identified feature words.

(4) Generating meta-actions with regard to given suggestions.

The process of mining customers’ comments uses sentiment analysis, text summarization and feature identification based on guided folksonomy. The domain-specific dictionaries were built with the business domain experts. Our approach results in generating suggestions (sets of meta-actions), which are the basis of the built recommender system. Our approach for aspect-based sentiment mining follows the schema described earlier ([19]). We use aspect-based sentiment analysis, where we extract an opinion that consists of sentiment (positive or negative) and a target of the opinion, that is, a specific aspect or feature of the object. Therefore, our approach offers a more detailed analysis than the most adopted document-level or sentence-level approaches for sentiment analysis.

2.2.1. Opinion identification

The first step in our algorithm is identifying an opinion sentence, based on the occurrence of an opinion word. We use a dictionary (list) of positive and negative words (adjectives). Context (localization) is also considered in the algorithm, by using additional context dictionaries, specific to our domain. For example, in the comment “the charge was too high”, “high” is recognized, according to the standard adjective lists, as neither positive nor negative. However, this comment still presents an insightful opinion about discontent about pricing. Therefore, “high” is added to the context list of pricing as a negative.
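The lookup described above can be illustrated with a minimal sketch. It assumes tiny illustrative word lists and a plain whitespace tokenizer; the names `GENERAL_POLARITY`, `CONTEXT_POLARITY`, `CONTEXT_SEEDS` and `find_opinion_words` are hypothetical and do not come from the original system.

```python
# Illustrative word lists only; the real dictionaries are much larger.
GENERAL_POLARITY = {"good": 1, "friendly": 1, "poor": -1, "rude": -1}

# Context (localization) lists: polarity of a word inside one feature domain.
CONTEXT_POLARITY = {"pricing": {"high": -1, "low": 1}}

# Hypothetical seed words that signal which context applies to a sentence.
CONTEXT_SEEDS = {"pricing": {"charge", "price", "cost", "bill"}}


def find_opinion_words(sentence):
    """Return (word, polarity) pairs found in one sentence."""
    tokens = [t.strip('.,!?"').lower() for t in sentence.split()]
    # A context is active when one of its seed words occurs in the sentence.
    active = [ctx for ctx, seeds in CONTEXT_SEEDS.items() if seeds & set(tokens)]
    found = []
    for token in tokens:
        if token in GENERAL_POLARITY:
            found.append((token, GENERAL_POLARITY[token]))
            continue
        # Fall back to the context lists of the active contexts.
        for ctx in active:
            if token in CONTEXT_POLARITY[ctx]:
                found.append((token, CONTEXT_POLARITY[ctx][token]))
                break
    return found
```

In this sketch, “high” is treated as an opinion word only when a pricing seed word such as “charge” occurs in the same sentence, which mirrors the example comment discussed above.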

Fig. 3.

Web-based CLIRS system – step 3. The user can analyze fine-grained comments that were used by the system to generate the recommendations. The comments are summarized with regard to the aspect (e.g. “service done correctly”) and divided into positive and negative.

2.2.2. Segmentation using syntactical dependency

In the next step, the sentences, identified as opinionated in the first step, are aggregated into segments. Feature-opinion pairs are generated based on grammatical dependencies between features and opinion words. We use the Stanford NLP library for the recognition of the grammatical dependencies ([11]). A dependency relationship describes a grammatical relation between a governor word and a dependent word in a sentence. There is wide coverage of different dependency templates (about 50 defined dependencies). Therefore, we are able to detect the most occurring syntactical relations associated with opinion words. Additionally, we added code that detects negation and ‘but’-clauses to our algorithm.

2.2.3. Aspect identification

Having extracted segments in the previous step, aspects’ feature words are identified using the supervised pattern mining method ([23]). The parts-of-speech tags (POS) in the used NLP library and aspect dictionaries are used in the aspect identification step.

2.2.4. Segment clustering

Opinion summarizations are used in many sentiment analysis works to generate a final review summary of the discovered features and opinions and to rank them according to their appearances in the reviews ([19]). In our approach, we remove the redundancy of extracted segments and cluster segments into different classes. The feature clustering is based on the domain-specific dictionary of seed words/phrases. To cluster a segment into the corresponding class, we check whether the list of seed words contains the feature word or its base form.
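A minimal sketch of the seed-word clustering is given below. The seed lists, the naive `base_form` helper (a stand-in for a real lemmatizer) and the function names are assumptions made for illustration only.

```python
# Hypothetical seed words per feature class (illustrative only).
SEED_WORDS = {
    "technician": {"technician", "mechanic"},
    "service": {"service", "repair"},
}


def base_form(word):
    """Naive stand-in for a lemmatizer: strips a plural 's'."""
    return word[:-1] if word.endswith("s") and len(word) > 3 else word


def cluster_segment(feature_word):
    """Return the feature class whose seed list contains the word or its base form."""
    word = feature_word.lower()
    for feature_class, seeds in SEED_WORDS.items():
        if word in seeds or base_form(word) in seeds:
            return feature_class
    return None


def cluster_segments(segments):
    """Group (feature_word, opinion) segments by class, dropping exact duplicates."""
    clusters = {}
    for feature_word, opinion in dict.fromkeys(segments):  # keeps first occurrence
        feature_class = cluster_segment(feature_word)
        if feature_class is not None:
            clusters.setdefault(feature_class, []).append(opinion)
    return clusters
```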

2.2.5. Generating the sets of meta actions

To generate meta-actions, each feature class is divided into several subclasses. Each subclass is related to the specific aspect of that feature. These aspects have been defined in the domain-specific dictionaries. Correspondingly, the last step of the algorithm is generating sets of meta-actions. These are the actual output recommendations provided to the business users. In our approach, we also display comments that correspond to the meta actions (Fig. 3). It allows for a fine-grained analysis of the problem. Additionally, we use visualization techniques in the form of an expandable table to display positive and negative opinions per each aspect. Each comment is additionally annotated by its survey ID. Therefore the comment can be tracked in its reference to the specific survey and its context. The negative opinions displayed in our visualization provide insight into issues that need to be addressed, and the positive provide insight into issues that should be reinforced.

2.3. The improvements in the sentiment mining algorithm

The goal of improving sentiment mining is to provide a better performance of the text-mining based recommender system. This subsection describes, in more detail, the proposed methods and the changes introduced to the original algorithm of opinion mining. The following strategies are proposed in our research to improve the sentiment mining algorithm in terms of accuracy and coverage:

2.3.1. Adding opinion dictionaries

The original algorithm was based on the adjective list only. We added new dictionaries for sentiment: SentiWordNet and AFINN, described in the previous section. By using additional dictionaries we expect to increase coverage of the algorithm, in particular, the coverage of opinion words.

2.3.2. Using nouns and verbs as opinion words

Traditionally, in most opinion mining algorithms, adjectives and adverbs are used as opinion words. However, there are also examples of comments, where verbs and nouns are the indicators of the sentiment. We added new dictionaries – SentiWordNet and AFINN, which contain not only opinionated adjectives but also verbs and nouns with an assigned polarity score. We added code to the algorithm that handles nouns and verbs as opinion words. One potential problem in this approach is that if a word is first detected as an opinion word, it will not be considered further in the algorithm as an aspect word. The alternative strategy is to change the sequence of steps in the algorithm to detect the feature word first and the opinion secondly.

2.3.3. Increasing the sentiment polarity scale

As in the original algorithm, we adopt a dictionary-based sentiment analysis for both opinion and aspect recognition. However, the new algorithm needs to detect not only the polarity of sentiment (negative or positive) but also its strength. Expanding the sentiment scale from $\{-1, 0, 1\}$ to $\{-2, -1, 0, 1, 2\}$ allows strongly opinionated comments to be differentiated, which results in more precise actionable knowledge. For example, the action rules can indicate a change of sentiment from strongly negative ($-2$) to negative ($-1$). The recognition of polarity strength is enabled by the new dictionaries, which use a scale for polarity strength instead of just assigning a negative/positive label. Additionally, we added code that detects words indicating a strong opinion, for example, “really” or “very”.

We assume that if an aspect was not mentioned in a comment, the customer is neutral towards this aspect, and we assign the default value “0”, which denotes neutral sentiment. We assign a NULL value to an aspect that is not relevant for a particular type of survey. For example, “Service Completeness” is not relevant to a survey that asks about parts.

We propose a new strategy for polarity detection, which incorporates the new sentiment dictionaries and the code that handles nouns and verbs as opinion words. It is a three-step opinion detection: (1) check an adjective list; (2) check AFINN; (3) check SentiWordNet for an opinion word. Combining the three dictionaries increases the coverage of opinions in the text. SentiWordNet and AFINN additionally assign a polarity score to a word. Each synset (a word in a context) in SentiWordNet is assigned a positivity score (PosS) and a negativity score (NegS). The polarity score for a word in a context is calculated as PosS − NegS. The polarity score for a word in the thesaurus is calculated as a weighted average of the scores of its synsets. This score takes a continuous value from the range $⟨ - 1; 1 ⟩$. Since our approach uses the discretized range $⟨ - 2; 2 ⟩$, we use the following mapping from the continuous to the discretized values:

$⟨ - 1; - 0.5) \to - 2$

$⟨ - 0.5; - 0.05) \to - 1$

$⟨ - 0.05; 0.05 ⟩ \to 0$

$(0.05; 0.5 ⟩ \to 1$

$(0.5; 1 ⟩ \to 2$
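The mapping above can be written as a small function. This is a sketch, not the original implementation: the function names are hypothetical, and the handling of the boundary values is an assumption (here 0.5 is mapped to 1 and 1 is mapped to 2).

```python
def synset_polarity(pos_score, neg_score):
    """Polarity of a word in a context: PosS - NegS."""
    return pos_score - neg_score


def map_sentiwordnet(score):
    """Map a continuous score in [-1, 1] to the discrete scale {-2, ..., 2}."""
    if not -1.0 <= score <= 1.0:
        raise ValueError("score must lie in [-1, 1]")
    if score < -0.5:
        return -2
    if score < -0.05:
        return -1
    if score <= 0.05:
        return 0
    if score <= 0.5:  # assumption: the boundary value 0.5 maps to 1
        return 1
    return 2  # assumption: the boundary value 1 maps to 2
```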

AFINN contains 564 positive and 964 negative words ([31]). Each word in the AFINN list is assigned a discrete value in the range $⟨ - 5; 5 ⟩$. Since the polarity scale used in our approach is $⟨ - 2; 2 ⟩$, the following mapping is proposed:

$⟨ - 5; - 4 ⟩ \to - 2$

$⟨ - 3; - 1 ⟩ \to - 1$

$0 \to 0$

$⟨ 1; 3 ⟩ \to 1$

$⟨ 4; 5 ⟩ \to 2$
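The AFINN mapping translates directly into code. The following sketch uses a hypothetical function name, `map_afinn`, and assumes integer input scores.

```python
def map_afinn(score):
    """Map an integer AFINN score in [-5, 5] to the discrete scale {-2, ..., 2}."""
    if not isinstance(score, int) or not -5 <= score <= 5:
        raise ValueError("AFINN scores are integers in [-5, 5]")
    if score <= -4:
        return -2
    if score <= -1:
        return -1
    if score == 0:
        return 0
    if score <= 3:
        return 1
    return 2
```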

3. Experimental results

3.1. Test data

The experiments were conducted on the subset of data from 2016 with about 80,000 records. In the built recommender system we use the column Notes for Promoter Score as a primary source of text data. An alternative strategy to maximize the text content available is to concatenate all text data available for each record. The text from all the “Notes” columns can be merged for that purpose – Interviewer Notes, Resolution Notes, General Notes and Notes Benchmark for each benchmark. Even then, about 6,000 records (7.5%) have no associated text comments. To compare the machine versus human sentiment recognition we have identified a representative subset of 70 text comments. We have chosen comments from 35 customers identified as Promoters and 35 comments from Detractors of one company. Each record was manually annotated in terms of a sentiment value towards each aspect that was mentioned in a column.

3.2. The tested algorithm

The procedure that tests the modified algorithm for opinion mining in the experimental setup consists of the following steps:

Preparing the file with text comments (XLS or XLSX). Each row represents a text comment from the “Notes for Promoter Score” column of the original dataset. Alternatively, the text can be a concatenation of all the “Notes” columns in a survey.

The file with text is preprocessed – file reader iterates through rows in the spreadsheet.

Processing the current comment, which may contain many sentences.

Processing the current sentence using Stanford Parser Treebank Language Pack ([11]).

Tagging the words in the current sentence with the Part-of-Speech labels using Stanford POS tagger ([11]).

Creating dependency list – grammatical dependency relations based on predefined templates in Stanford package, GrammaticalStructureFactory ([11]).

Identifying opinion words in a sentence, using: opinion word lists (Hu&Liu / AFINN/ SentiWordNet), negations list (“not”, “neither”, etc.), conjunctive words lists (“and”, “but”, “therefore”, etc.), strong words list (“really”, “very”, “much”, etc.), strong positive (“best”, “great”, “excellent”, etc.) and strong negative words lists (“worst”):

Check if the current word is a negation (if yes, set index for negation word).

Check if the current word is conjunction (if yes, set index for conjunction word).

Check if the current word is a strong opinion word (if yes, set index for a strong word).

Check if the word is in the strong positive/strong negative list; if so, set the polarity to 2 or $- 2$ accordingly, and adjust it for the cases of negation or conjunction.

Check if the current word is present in positive/negative adjective list:

Set polarity accordingly to the list: $+ 1$ if the word found in the positive adjective list or $- 1$ if in the negative adjective list.

Consider cases of negation, comparative forms of adjectives and conjunction to change the original polarity.

If a valid strong opinion word is found in relation to the adjective, increase polarity strength to $+ 2$ or $- 2$ .

If a word is not found in adjective lists, look for its presence in AFINN dictionary:

If a word is found in AFINN, retrieve its polarity and use mapping to $⟨ - 2; 2 ⟩$ .

Consider cases of negation, comparative forms of adjectives and the conjunction to change the original polarity.

If a valid strong opinion word is found in relation to the adjective, increase polarity strength to $+ 2$ or $- 2$ (if the previous polarity was $+ 1$ or $- 1$ ).

If a word is not found in adjective lists nor in AFINN, look for it in SentiWordNet dictionary:

If a word is found in SentiWordNet, the case for adjectives and adverbs:

Retrieve the polarity from the SentiWordNet dictionary considering the word POS tag, and use mapping function to convert the continuous numbers from $⟨ - 1; 1 ⟩$ to $⟨ - 2; 2 ⟩$ .

Consider cases of negation, comparative forms of adjectives and adverbs and conjunction to change the original polarity.

If a valid strong opinion word is found in relation to the adjective, increase polarity strength to $+ 2$ or $- 2$ (if the previous polarity was $+ 1$ or $- 1$ ).

The case for verbs and nouns as opinion words: retrieve the polarity from the SentiWordNet using the POS tag and map to the scale. For verbs, find its base form – and look in the dictionary for all its possible base forms. Consider cases of negation, conjunction and strength words to change the original polarity.

Opinion sentence summarization – finding words related to the found opinion word, using dependency relations.

Finding a feature keyword related to the opinion word based on the previous step’s results.

Feature aggregation – assigning the segment to the feature category.

Summarizing results – grouping segments by orientation, feature classes.

Generating meta-actions from oriented segments.
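The dictionary lookup order used in the steps above (adjective lists, then AFINN, then SentiWordNet) can be sketched as follows. All dictionary entries and scores below are illustrative placeholders rather than real dictionary values, the function names are hypothetical, and the sketch omits negation, conjunctions and strength words.

```python
# Illustrative placeholder entries; not taken from the real dictionaries.
ADJECTIVES = {"good": 1, "friendly": 1, "poor": -1}          # polarity -1 / +1
AFINN = {"outstanding": 5, "trouble": -2}                    # integer score -5..5
SENTIWORDNET = {("lack", "n"): -0.375, ("timeliness", "n"): 0.25}  # (word, POS) -> -1..1


def map_afinn(score):
    """Discretize an AFINN score to -2..2."""
    if score <= -4:
        return -2
    if score <= -1:
        return -1
    if score == 0:
        return 0
    return 1 if score <= 3 else 2


def map_sentiwordnet(score):
    """Discretize a SentiWordNet score to -2..2 (boundary handling assumed)."""
    if score < -0.5:
        return -2
    if score < -0.05:
        return -1
    if score <= 0.05:
        return 0
    return 1 if score <= 0.5 else 2


def lookup_polarity(word, pos):
    """Return (polarity, source); the first dictionary containing the word wins."""
    if word in ADJECTIVES:
        return ADJECTIVES[word], "adjective list"
    if word in AFINN:
        return map_afinn(AFINN[word]), "AFINN"
    if (word, pos) in SENTIWORDNET:
        return map_sentiwordnet(SENTIWORDNET[(word, pos)]), "SentiWordNet"
    return 0, None
```

Checking the dictionaries in a fixed order means that a later dictionary is consulted only for words that the earlier ones do not cover.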

3.3. Evaluation metrics

The metrics of our primary interest with regard to evaluating the sentiment analysis algorithm are:

Accuracy – the number of correctly recognized and classified opinions divided by the number of all opinions extracted.

Coverage – the number of opinions extracted divided by the total number of comments (in this experiment, 70 comments).

Weighted measure – a measure combining two previous measures, calculated as 0.5 ∗ Accuracy + 0.5 ∗ Coverage. The assumption is that these two metrics are equally important in the overall assessment of the sentiment analysis algorithm.

We assume that a human can recognize sentiment with 100% accuracy and can recognize each existing sentiment (maximal coverage). The maximal coverage for the dataset is not necessarily 100%, as not all the comments are opinionated or contain an opinion about any of the aspects. A comment is assumed to be covered if at least one opinionated aspect in the comment was recognized (coverage per comment). In the second test case, coverage is measured by dividing by the actual number of opinionated aspects (coverage per opinion).
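The three metrics can be expressed as a short sketch. The function names are hypothetical; the counts in the test values (14 covered comments out of 70, 10 of them correct) are those reported for the original algorithm in the Results subsection.

```python
def accuracy(correct, extracted):
    """Correctly classified opinions divided by all opinions extracted."""
    return correct / extracted


def coverage(extracted, total):
    """Opinions extracted divided by the total number of comments (or opinions)."""
    return extracted / total


def weighted_measure(acc, cov, weight=0.5):
    """Equal-weight combination: 0.5 * Accuracy + 0.5 * Coverage by default."""
    return weight * acc + (1 - weight) * cov
```

For example, `coverage(14, 70)` gives 0.2 and `accuracy(10, 14)` gives about 0.71, matching the 20% coverage and 71% accuracy reported for the original algorithm.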

3.4. Test cases

The base case is the human sentiment recognition (Hum). The original algorithm for sentiment analysis is based on adjective lists only (Adj). The third case tested is based on the SentiWordNet dictionary only. This test case is further divided into subcases with/without using nouns and verbs as opinion words (correspondingly S+V/NN/S−V/NN). The fourth test case is based on AFINN dictionary only (AFINN). Each case was tested in isolation, using each dictionary separately. Then, a combined strategy involving all three dictionaries was designed and tested (All). The combined strategy is described in more detail in the previous subsection.

3.5. Results

In our test data, 62 out of 70 comments were opinionated. Therefore, the coverage of the base case (Hum) is 89% (see Table 1). In Hum the sentiment is recognized with 100% accuracy. The original algorithm (Adj) covered only 20% of the comments (Cov -b). It failed to recognize the sentiment in 48 out of 62 comments that were opinionated. The accuracy of the original algorithm was 71% (Acc -b), recognizing correctly 10 out of 14 comments it covered.

Table 1 compares the results for different approaches to sentiment mining. It shows metrics before (-b) and after (-a) introducing additional modifications to the algorithm. The changes in the algorithm and dictionaries resulted in covering twice as many comments as before (Cov -a). The accuracy improved from 71% to 93% for Adj (Acc -a). When comparing different dictionaries, AFINN proved to have the worst coverage, but it was very accurate. Adj and S+V/NN have similar coverage (39%), but the latter has worse accuracy. Therefore, its weighted metric is 60% vs 66% for Adj. S−V/NN covered less (31%) but was more accurate than the version with verbs and nouns as opinion words (95% versus 81%). Based on these results a final strategy was adopted for the combined approach. The current word is checked first in adjective lists, secondly in AFINN, and thirdly in SentiWordNet. Lastly, the strategy checks for verbs and nouns as opinion words (see subsection “The tested algorithm” for the detailed description). The combined strategy resulted in 43% coverage and 83% accuracy. The coverage increased but at the cost of accuracy.

Table 2 presents results comparing different approaches, with metrics calculated based on the number of opinionated aspects, rather than on the comment level. The combined approach (All) resulted in 38% coverage and 95% accuracy in recognizing all opinionated aspects across all comments. As previously, the approach based on the adjective list resulted in the highest coverage (36%) and accuracy (92%). The second best, in terms of coverage, was S+V/NN (33%). At the same time, this approach was worse in accuracy than Adj (88% vs 92%), S−V/NN (96%) and AFINN (95%).

In the third experimental setup, additional adjustments were introduced to both the dictionary and the algorithm. Table 3 presents the final results and compares the accomplished machine sentiment recognition with human recognition (Hum). By introducing modifications, the coverage increased to 57%, as calculated per comment, and to 48%, as calculated per opinion. However, the accuracy decreased to 88%, as calculated per comment. On the other hand, when calculating per opinion, it increased to 96%. Overall, the algorithm was improved significantly. The weighted metric increased from 63% to 72%, when measuring per comment, and from 67% to 72%, when measuring per opinion. On the other hand, there is still a gap versus human recognition. Therefore, there is room for further improvements (see the next section “Discussion”). Another conclusion from the experiments is that it is quite challenging to improve both coverage and accuracy at the same time. Usually, when improvements bring higher coverage, this is at the cost of accuracy, and vice versa. In general, we observed improvement in relation to the initial algorithm: the accuracy was improved from 92% to 96% and coverage from 36% to 48%.

Table 1
Comparing the accuracy and coverage for the sentiment analysis using different approaches – metrics calculated per comment

Metric	Hum	Adj	S−V/NN	S+V/NN	AFINN	All
Cov -b	89%	20%	16%	19%	11%
Cov -a	89%	39%	31%	39%	30%	43%
Acc -b	100%	71%	73%	46%	63%
Acc -a	100%	93%	95%	81%	100%	83%
Weight.	95%	66%	63%	60%	65%	63%

Table 2

Comparing the accuracy and coverage for the sentiment analysis using different approaches – metrics calculated per opinion

Metric	Hum	Adj	S−V/NN	S+V/NN	AFINN	All
Cover.	99%	36%	28%	33%	24%	38%
Accur.	100%	92%	96%	88%	100%	95%

Table 3

The results of the final combined strategy to sentiment analysis, compared to human recognition

Metric	Human	All – per comment	All – per opinion
Coverage	89%/99%	57%	48%
Accuracy	100%	88%	96%

3.5.1. Impact of improved sentiment on recommendations

Other experiments evaluating the improvements were related to (1) measuring the sparsity of the data table built from the opinion mining (Table 4 and Table 5); (2) action rule mining (Table 6). These two aspects are important with regard to the built text-based recommender system. The changes introduced to the sentiment algorithm primarily aimed at decreasing the sparsity of opinion table, which results from the transformation based on sentiment mining. This data table is further used for action rule mining, which is consequently used to generate recommendations for customer loyalty improvement. Therefore, higher accuracy and coverage of the opinion mining results in a greater predicted impact of the recommendations on NPS.

Table 4
The sparsity of opinion table before and after modifications of the opinion mining algorithm – the case for the dataset of company 16

Sentiment	Sp-b	Sp-a	SpT-b	SpT-a
−2	0%	4%	0%	0%
−1	0.9%	4%	0.1%	0.3%
1	12.3%	42.5%	0.8%	2.8%
2	0%	4%	0%	0%
All	13.2%	46.9%	0.9%	3.1%

The sparsity of the opinion table is calculated as the number of cells with non-NULL values divided by the total number of cells. A non-NULL value means that sentiment towards an aspect (the column of the cell) was recognized by the algorithm applied to the comment given in a survey (the row of the cell). Initially, the resulting opinion table was very sparse, with about 1% of values present. After modifications to the sentiment analysis algorithm, the sparsity was reduced, with about 3% of values present. Tables 4 and 5 present the details on the comparison of sparsity before (-b) and after modifications (-a), for the client companies 16 and 3, respectively. The actual names of the companies are confidential. Sparsity (Sp) is calculated by dividing by the number of rows in the table. The total sparsity (SpT) is calculated by dividing by the total number of cells in the table. The percentage value in the tables is interpreted as the relative occurrence of the extracted opinions. The tables present details for each sentiment value (in the range $⟨ - 2; 2 ⟩$ ) and in general for all sentiment values (All).
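The two measures can be sketched as follows. The table layout (a list of rows, one cell per aspect, with `None` for a cell where no sentiment was recognized) and the function name are assumptions made for illustration.

```python
def sparsity(table, sentiment=None):
    """Return (Sp, SpT) for an opinion table.

    table: list of rows (surveys); each row is a list of cells (aspects) holding
           a recognized sentiment value or None.
    sentiment: count only cells with this value; None counts every recognized cell.
    Sp divides the count by the number of rows, SpT by the number of cells.
    """
    rows = len(table)
    cells = sum(len(row) for row in table)
    count = sum(
        1
        for row in table
        for value in row
        if value is not None and (sentiment is None or value == sentiment)
    )
    return count / rows, count / cells
```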

The results show that the sparsity of opinion tables for both cases (company 3 and company 16) was reduced significantly, about three times versus the initial sparsity.

Table 5

Sparsity of opinion table before and after modifications of the opinion mining algorithm – the case for the dataset of company 3

Sentiment	Sp-b	Sp-a	SpT-b	SpT-a
−2	0%	0.1%	0%	0.01%
−1	0.9%	2.5%	0.06%	0.17%
1	11.2%	44.1%	0.7%	2.9%
2	0%	4%	0%	0.01%
All	12.0%	46.8%	0.8%	3.1%

Table 6

Coverage of action rules extracted in the previous approach (-ben) and in the new approach from the opinion table, before and after introducing the modifications to the sentiment mining algorithm. Coverage, times of rule extraction, the total number of rules were measured for Client3 and Client16 datasets

Test case	Time	Nr rules	Cover	Cover %
Client3-ben	50 min	14,769	98/115	85.2%
Client3-before-no0	12 sec	26	9/115	7.83%
Client3-after-no0	19 sec	155	17/115	14.78%
Client3-before-with0	1.5 min	348	12/115	10.43%
Client3-after-with0	6 min	2,123	18/115	15.65%
Client16-ben	4 min	5,153	50/61	85.2%
Client16-before-no0	4 sec	113	3/61	4.92%
Client16-after-no0	6 sec	288	8/61	13.11%
Client16-before-with0	23 sec	1,181	5/61	8.2%
Client16-after-with0	1.5 min	5,596	10/61	16.39%

3.5.2. Action rules coverage

To test the quality of the actionable knowledge extracted from the opinion tables, the coverage of the extracted patterns was measured. The coverage of action rules is understood here as the number of distinct customers in the dataset matched with the extracted rules. Several scenarios were tested (Table 6). The main goal of this experiment was to compare the coverage of rules when working with numerical data tables with question scores (benchmark tables -ben), with the new approach of using a data table that results from the transformation of the unstructured data. The second goal was to compare the coverage before (-before) and after (-after) introducing the proposed changes to the sentiment mining algorithm. The test cases also differentiate between using and not using the default (0) sentiment value (-with0/-no0) in the opinion table. When counting the default neutral sentiment values (-with0), the initial coverage of action rules was 10.43% and 8.2%, for company 3 and 16, respectively. Without the default sentiment value, the initial coverage of the rules was 7.83% and 4.92%, respectively for these companies. These numbers were significantly lower than in the previous approach (-ben), where action rules covered 85.2% of customers for both companies. After introducing modifications to the opinion mining algorithm, the coverage of the action rules increased to 14.8% (-no0)/15.7% (-with0) and 13.1%/16.4% for company 3 and company 16, respectively. This is considered a significant improvement, although the coverage is still much lower than in the case of benchmark tables.
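The coverage measure can be illustrated with a small sketch. The rule representation used here (a dictionary of aspect conditions that a customer must satisfy) is a simplifying assumption, and the function name is hypothetical.

```python
def rule_coverage(customers, rules):
    """Count distinct customers matched by at least one rule.

    customers: dict customer_id -> dict aspect -> sentiment value
    rules: list of dicts aspect -> required sentiment value (assumed rule format)
    Returns (number of covered customers, fraction of all customers).
    """
    covered = {
        customer_id
        for customer_id, aspects in customers.items()
        if any(
            all(aspects.get(aspect) == value for aspect, value in rule.items())
            for rule in rules
        )
    }
    return len(covered), len(covered) / len(customers)
```

With this definition, 98 covered customers out of 115 correspond to the 85.2% reported for the Client3 benchmark table.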

4. Discussion

This section discusses the examples of text covered by our approach, identifies remaining challenges and concludes with the main results within this study and future work. First, the enhancements introduced to the sentiment mining algorithm are discussed with examples of text from the tested dataset.

4.1. Verbs and nouns as opinion words

Using verbs and nouns as opinion words, based on polarity calculated in SentiWordNet, resulted in covering more comments and patterns. For example, the following comment was covered: “He stated the timeliness of the service and the friendly personnel. He worked with Rick from the $⟨ confidential ⟩$ , MO location”. A frequent pattern in the comments is “He stated + $⟨ noun ⟩$ ”, as an answer to the telephone question “What made you Promoter?”. Therefore, using nouns as opinion words brings significant improvement in the overall coverage. Another common phrasing is “lack of $⟨ aspect ⟩$ ”. Such a pattern should be handled to cover negative opinions, as “lack” is a noun with assigned negative polarity. One potential problem with the approach using verbs and nouns as opinion words is that, in the sentiment dictionaries, many nouns and verbs are labeled as neutral (they are assigned a polarity of “0”). Indeed, most verbs and nouns do not bear any opinion and simply state the subject or the action in the sentence. Another problem is that, with the currently implemented text mining algorithm, when a word is recognized as an opinion word, it will not be considered as an aspect word. This may actually worsen the algorithm’s coverage and accuracy. One solution to the problem would be changing the algorithm to detect the feature word first, and the opinion word related to it second. Another problem was that SentiWordNet does not contain entries for verbs in different forms. For example, in the comment: “…that they are not completing everything that was needed for the service”, the word “completing” is not recognized (although “complete” has an assigned positive polarity). The implemented solution to this problem was to add a pre-processing step of extracting the base form of the verb.
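The base-form pre-processing step can be approximated with a naive sketch that generates candidate base forms and looks each of them up in the dictionary. This is a simple suffix-stripping heuristic, not the lemmatizer used in the original system; the function names and the polarity value in the usage example are illustrative assumptions.

```python
def candidate_base_forms(verb):
    """Generate possible base forms of an inflected verb (naive heuristic)."""
    verb = verb.lower()
    candidates = [verb]
    for suffix in ("ing", "ed", "es", "s"):
        if verb.endswith(suffix) and len(verb) > len(suffix) + 2:
            stem = verb[: -len(suffix)]
            candidates.append(stem)        # "needed" -> "need"
            candidates.append(stem + "e")  # "completing" -> "complete"
            if len(stem) > 2 and stem[-1] == stem[-2]:
                candidates.append(stem[:-1])  # "stopping" -> "stop"
    return list(dict.fromkeys(candidates))  # drop duplicates, keep order


def lookup_verb(verb, dictionary):
    """Return (base form, polarity) for the first candidate found, else None."""
    for form in candidate_base_forms(verb):
        if form in dictionary:
            return form, dictionary[form]
    return None
```

For instance, with a dictionary that contains only “complete”, the word “completing” from the comment above is resolved through the candidate “complete”.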

4.2. The sentiment strength

Another important change was adding the detection of the strength of the sentiment. It was implemented with: (1) introducing dictionaries that assign the scale of opinion polarity to words; (2) detecting words that “strengthen” the polarity of words – for example “very”, “really”, etc. Examples of the comments with strong sentiment recognized by our algorithm are printed below:

“Chuck stated that he wasn’t there when they did it, but the mechanic who was stated that the technician did a very good job.” → did very good job: 2

“Mike stated the field mechanic was very knowledgable and a great guy.” → great guy: 2

“He stated that the technician was very friendly and handled the difficult situation with the location of the equipment repair very well and did a great job.”

→ technician very friendly: 2

→ did great job: 2

→ handled difficult situation: $- 2$

“Paul said that the service man, Chris, was very friendly, knowledgeable and “on top of it”.” → very friendly: 2

“He stated Charlie is a really good mechanic.” → really good mechanic: 2
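The strength detection illustrated by these examples can be sketched as follows. The sketch assumes that a strength word immediately precedes the opinion word, whereas the described algorithm relates the two through dependency relations; the word lists are short illustrative excerpts and the function name is hypothetical.

```python
STRONG_WORDS = {"really", "very", "much"}
STRONG_POSITIVE = {"best", "great", "excellent"}
STRONG_NEGATIVE = {"worst"}
OPINION_WORDS = {"good": 1, "friendly": 1, "bad": -1}  # illustrative entries


def score_tokens(tokens):
    """Return (word, polarity) pairs with polarity on the scale -2..2."""
    results = []
    for i, token in enumerate(tokens):
        word = token.lower()
        if word in STRONG_POSITIVE:
            polarity = 2
        elif word in STRONG_NEGATIVE:
            polarity = -2
        elif word in OPINION_WORDS:
            polarity = OPINION_WORDS[word]
            # Assumption: the strength word directly precedes the opinion word.
            if i > 0 and tokens[i - 1].lower() in STRONG_WORDS:
                polarity *= 2
        else:
            continue
        results.append((word, polarity))
    return results
```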

4.3. Dealing with ambiguity and context

Some words cause ambiguity and depend heavily on the context. For example, initially, the word “good” was assigned to the category “Technician Attitude”, in the context of the “Technician” feature (e.g. “good mechanics”). However, after a more careful semantic analysis of the comment, it was concluded that the opinion holder meant rather “Technician Knowledge”. Therefore, the word “good” was reassigned to this category in the context of “technician”. Another example from the analyzed test dataset was: “He stated the guys are knowledgeable and the work was good”. The algorithm detected the word “guys” as the keyword for staff, and the generated meta-action was: “staff knowledgeability”=1. The same happened with the example: “Mike stated the field mechanic was very knowledgable and a great guy .”, which resulted in the opinionated segment: “great guy”=1. Again, the initial assignment of the word “guys” was to the category of staff instead of technicians. However, taking into account colloquialisms used by customers, most likely they meant technicians, by saying “guys”. This initial assignment resulted in lower accuracy when comparing with the human labeling of opinions. Therefore, the “guys” word was reassigned to the category “technician” to improve the accuracy of the algorithm.

4.4. Two opinion words for one aspect

The experiments revealed cases where two opinion words were used to describe one feature, e.g. “prompt and courteous service”. Initially, the algorithm considered a word to be an opinion word only if the two previous words were not opinion words. However, as in the case described above, two opinion words connected by “and” can describe two different aspects of one feature. In the example given above, the first word relates to “Service Timeliness”, while the second opinion word relates to “Service Quality”/“Staff Attitude”. Consequently, the algorithm was modified to drop this initial constraint, which resulted in increased coverage.

4.5. Expanding dictionaries for aspects

In the proposed approach, dictionary-based feature recognition is used. It means that the detection of feature words related to the discovered opinion words is based on the predefined libraries of seed words for features and more fine-grained aspects of the features. These were developed manually by looking through a large sample of comments. However, they are scalable and can be expanded as uncovered examples are identified. For example, a common word occurring in comments is “experience”: “He stated this past experience was not great”. The opinion was initially not recognized, because “experience” was not a seed word for any of the features/aspects. However, it was observed that it was a commonly used word to express an opinion about general service. Therefore, the word “experience” was added to the “Service General” feature category, which corresponds to the “Service Quality General” aspect. Another common phrase is “do a good job”, which was not initially covered as an opinion. Therefore, it was assigned to the aspect category “Service General”. As mentioned above, the dictionaries for the feature and aspect seed words can be dynamically updated. The same applies to the categories themselves: they can be added or redefined. For example, one of the changes resulting from the conducted empirical research was adding the following aspects for the “Service General” feature: “Service Quality General” and “Service Timeliness” (“Service Completed Timely”).

4.6. Dealing with pronouns

Another problem is that the opinion is often expressed in relation to pronouns: “they”, “he”, etc. For example, the sentence “They work efficiently” is not covered, because the word “they” is not assigned to any of the feature classes. Similarly, opinions are often expressed in relation to words denoting the names of the companies, “dealer”, etc. To solve this problem and increase the coverage of the algorithm, such words were added to the “Service” category. “Personnel” was added to the “Staff” category. However, this created a new ambiguity. For example, in the comment “Dave states that they are friendly, helpful and knowledgeable”, “they” relates rather to the “staff” or the “technician” category. The same applies to the comment: “John shared that they do fine and there is no question; they all treat him well”. Therefore, although this solution increased the algorithm’s coverage, in some cases it made the algorithm less accurate.

4.7. Consistency of aspect dictionaries

Other flaws of the algorithm were identified and resolved. For example, each keyword in the feature category must correspond to an aspect category; otherwise, not all opinionated segments will result in generated recommendations. For example, for the segment printed below, no corresponding meta-actions were generated:

subSegments features after aggregation: (stated machine new=machine,service, having condition fixed=„ they already having=having„ fixed on unit=fixed,service)

segment orientation:stated machine new: 1.

The reason was that the seed word “machine” for the “service” feature was not defined in any aspect of the service. The same was true for the segment “Stated that they provide fast service”: it results in the opinionated segment “provide fast service: 1”, but not in any meta-action, as “fast” was not defined as a seed word for the category “Service Timeliness”. Last but not least, it was observed that there are opinions about parts in the service surveys, for example: “He stated parts availability was the problem”, “He stated this location needs more parts available on the shelf”, “Parts purchase was trouble free”. However, there are separate dictionaries defined for different categories of surveys: “Service” and “Parts”. The former does not contain feature and aspect libraries for parts. Therefore, although these comments contain opinions, they are not related to any of the feature words, as the comments were given in the service surveys. Also, in the last example, “trouble free” was not correctly classified in terms of polarity: “trouble” was detected as a negative opinion word. An alternative strategy is to handle phrases of the form “adjective + free” as negation.
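The consistency requirement from this section (every feature seed word must also be defined in some aspect of that feature) can be checked mechanically. The following is a minimal sketch under assumed data structures; the function name `uncovered_seed_words` and the sample dictionaries are hypothetical and only mirror the “machine” example from the text.

```python
def uncovered_seed_words(feature_seeds, aspect_seeds):
    """Return, per feature, the seed words that no aspect of that feature defines."""
    gaps = {}
    for feature, seeds in feature_seeds.items():
        covered = set()
        for (aspect_feature, _aspect), words in aspect_seeds.items():
            if aspect_feature == feature:
                covered |= words
        missing = seeds - covered
        if missing:
            gaps[feature] = missing
    return gaps


# Illustrative data only: "machine" is a feature seed word with no aspect.
feature_seeds = {"service": {"service", "machine", "fixed"}}
aspect_seeds = {
    ("service", "Service Quality General"): {"service"},
    ("service", "Service Completeness"): {"fixed"},
}
gaps = uncovered_seed_words(feature_seeds, aspect_seeds)
```

Running such a check whenever the dictionaries are edited would flag seed words that can produce an opinionated segment but never a meta-action.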

4.8. Adjusting sentiment dictionaries

Experiments revealed that sentiment scores are not accurately calculated in all cases. For example, the SentiWordNet dictionary uses a semi-supervised method to label words with sentiment. The word “good” has a calculated weighted polarity score of 0.63 (which is mapped to 2 in our scale), while the word “great” has a score of 0.25, which after mapping corresponds to 1. As a result, “good” is treated as more strongly positive than “great”. Such inconsistencies are solved by adjusting the sentiment dictionary manually. In our system, we changed the entry in the dictionary for the word “good”: the PosScore was changed from 0.75 to 0.25. A similar case was the adverb “well” (its mapped sentiment based on SentiWordNet was 2).
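One way to picture such manual adjustments is an override table consulted before the base dictionary. The sketch below is an assumption-laden illustration: the 0.5 cut-off in `map_to_scale` is chosen only because it reproduces the two mappings quoted above (0.63 to 2 and 0.25 to 1), the actual thresholds are not stated in the text, and the override value simply stands in for the corrected entry.

```python
def map_to_scale(score):
    """Map a polarity score in [-1, 1] to the {-2, -1, 0, 1, 2} scale.

    The 0.5 cut-off is an assumption chosen only so that 0.63 -> 2 and
    0.25 -> 1, as quoted in the text; the real thresholds are not given there.
    """
    if score == 0:
        return 0
    strength = 2 if abs(score) >= 0.5 else 1
    return strength if score > 0 else -strength


# Base scores as quoted in the text; the override value is hypothetical.
base_scores = {"good": 0.63, "great": 0.25}
manual_overrides = {"good": 0.25}


def polarity(word):
    """Look up a word, letting manual corrections take precedence."""
    score = manual_overrides.get(word, base_scores.get(word, 0.0))
    return map_to_scale(score)
```

Keeping the corrections in a separate table, rather than editing the base dictionary in place, makes it easy to review which entries deviate from the original lexicon.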

4.9. Remaining challenges

Although sentiment words are important for the opinion mining algorithms, building systems based only on them will not give good results. Natural language recognition is much more complex, and involves the following issues:

Words can have opposite orientations in a different application domain.

A sentiment word might not express an opinion in interrogative and conditional sentences (e.g. “Can you tell me which Sony camera is good?”, “If I can find a good camera in the shop, I will buy it”).

Sarcastic sentences are hard to deal with.

Sentences without sentiment words can also imply opinions (e.g. “This dishwasher uses a lot of water”).

Sentiment analysis, like any NLP task, must handle coreference, negation, disambiguation and comparative sentences. Despite the introduced changes and improvements, the coverage is still unsatisfactory. Correspondingly, the coverage of the action rules extracted from the table built on opinion mining still leaves room for improvement. This section discusses the identified challenges, illustrated with examples from the customer dataset that are poorly handled by the algorithm.

Non-standard opinion patterns. Generally, opinions that do not follow the “opinion word + feature aspect” pattern are not detected. This is a significant group of comments that were not recognized by the algorithm. Examples of such comments, currently not handled by the algorithm, are:

Expressing opinions by describing the situation/incident (“storytelling”) without using actual opinion words – humans can infer the sentiment from the story.

Implicit opinions – similar to the case above, but more general. For example, humans can use objective words like numbers to express implicit opinions.

Complex and comparative sentences – resulting from the limitations of the syntactical dependencies recognition.

An opinion and an aspect in one word, for example, “repaired”.

Using opinion words for the purpose other than expressing opinions, for example for expressing the desired state or expectations.

The examples of comments from our dataset that proved problematic for the algorithm are discussed below.

Complex and comparative sentences. Sentiment recognition is limited to the syntactical dependencies that can be recognized. The experiments revealed that some syntactical relations are not recognized, especially in complex and comparative sentences. For example, in the sentence “…he had an issue with the time that was charged on his invoice”, “issue” is not recognized together with the keyword “invoice”. This is because of the complexity of the sentence, where the opinion is expressed in a different part of the sentence than the subject of the opinion. Here, the algorithm makes a mistake and associates “issue” with “time”. The phrase “have an issue with…” is an example not handled by the algorithm. Customers often use expressions such as “the issue is…” or “the problem is…”: “Doug said the biggest issue he has is their sense of urgency”, “The invoicing and prices were an issue as well”, “Bill stated that there have been a lot of communication issues”, “He stated parts availability was the problem”. One way to cover such opinions would be to add the words “issue” and “problem” as negative opinion words in the context of the feature they refer to. Also, comparative sentences, such as “Matt stated that he was invoiced for more than the price he was quoted for this service”, proved to be quite problematic to detect algorithmically.

Implicit opinions. Implicit opinions can be expressed without using explicit opinion words. Examples of implicit opinions in the dataset are: “He stated that they had to make two trips and they charged him $500, Mike said that it is four hundred dollars for ⟨confidential⟩ just to come there to him, before they do anything, …”. Here, charging a particular amount of money implies negative sentiment. Another example: “It should have not taken an hour and a half to make a hose and put it on. Nick said he has seen them do it in 30 minutes”. Here, numbers are used to compare the expected and actual time of service and thereby express dissatisfaction with “Service Timeliness”. This comment also uses specific words to describe the course of actions (“to make a hose and put it on”). Very specific technical terms describing the service are difficult to recognize as a feature (opinion subject). Other similar examples include: “Tony stated that the mechanics replaced a cylindar on there and there was parts that they could of used but they left part of it and took part of it so they could not do much with that”. Here, by describing the situation, the customer expresses an implicit negative opinion about “Technician Knowledge”. Another such example, “He stated his complaint would be, why he realizes the technician has to have a work phone and he was answering it for work, it was ringing off the hook”, speaks implicitly about “Technician Attitude” in a negative way; however, this can only be inferred by a human analyzing the situation and context. “Willi stated that he simply calls ⟨confidential⟩, they arrive, …”, “…a local field guy that is usually there the same day”: in such comments, it is difficult to infer algorithmically that they are positive opinions about “Service Timeliness”.

Feature and opinion in one word. Currently, the algorithm does not handle situations in which the same word describes both the feature and the opinion. As soon as a word is flagged as an opinion word, other words are searched for dependencies with this word. Therefore, sample comments such as “He said they found a number of issues and repaired them.” and “This then allows all ⟨confidential⟩ to charge more than what is expected” are not currently recognized by the algorithm. “Repaired” and “charge” both relate to feature aspects (“Service Completeness” and “Price Competitiveness”), but also denote sentiment in these contexts (positive and negative, respectively).

Another problem is that opinions are sometimes expressed with a single noun (without using opinion words), as an answer to the question asked to customers to elicit additional comments: “What made you Detractor?” or “What could have been improved?”. As a result, a frequent comment pattern is a direct answer: “Because of + ⟨noun⟩” or just “⟨noun⟩”. An example in the tested dataset is: “Jeff stated that their communicatlion and that fact that they kept him in the loop of things”. A possible solution would be to assign a “default” polarity based on the promoter status of the customer who provided the comment. For example, if no opinion word was found in the sample comment above, “Communication Quality” would be assigned a positive default polarity score, because it was recognized as a feature word and was expressed by a customer labeled as Promoter. This is expected to significantly decrease the sparsity of the opinion table.
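The proposed default-polarity fallback could look roughly as follows. This is a sketch of a future solution, not of the evaluated algorithm: the function names, the default scores of +1 and −1, and the neutral treatment of any other customer status are all assumptions.

```python
def default_polarity(customer_status):
    """Fallback polarity for a feature mentioned without any opinion word.

    The scores (+1 for Promoter, -1 for Detractor, 0 otherwise) are assumed
    values, not taken from the evaluated system.
    """
    return {"Promoter": 1, "Detractor": -1}.get(customer_status, 0)


def score_segment(aspect, opinion_score, customer_status):
    """Use the detected opinion score if present, else the status-based default."""
    if opinion_score is not None:
        return aspect, opinion_score
    return aspect, default_polarity(customer_status)
```

For the sample comment above, `score_segment("Communication Quality", None, "Promoter")` would yield a positive entry for “Communication Quality”, while a detected opinion score always takes precedence over the default.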

Opinion words in a different context. Opinion words can be used for purposes other than expressing a sentiment towards certain aspects. For example, in “Bob said at $117 an hour for service he expects a qualified technician to be working on his equipment and know the problem to repair it”, the phrases “qualified technician” and “know the problem” describe a desired state or expectations. Recognizing such phrases as opinions lowers the accuracy of the algorithm (“false positives”). Other examples include using opinion words to express needs: “Doug said his time frame with a River boat down is he needs the service fast”.

Ambiguity. In some cases, the algorithm handles ambiguity between different aspects poorly. For example, in “Kurt again stated it was the not being kept in the loop, and the very hefty invoice he received at the end of this service”, it is difficult to determine whether the opinion relates to “Invoice Expectations” or “Price Competitiveness”. Also, sometimes the wrong word is chosen as a feature, for example in “He stated that the charges for the travel time were very expensive”, where “time” is recognized as a keyword for “Service Timeliness”. One solution would be to associate certain opinion words with specific features only, for example, associating “expensive” with “Price”. In another comment, “He stated they have prompt and courteous service”, “courteous” relates to the “staff” feature rather than to “service”, as the keyword would suggest.
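The idea of tying certain opinion words to specific features can be sketched as a filter over candidate feature keywords. The table `ALLOWED_FEATURES`, the function `pick_feature`, the assumption that candidates arrive in the parser's order of preference, and the treatment of “charges” as a keyword for “Price” are all hypothetical.

```python
# Hypothetical restriction table: opinion word -> features it may describe.
ALLOWED_FEATURES = {"expensive": {"Price"}}


def pick_feature(opinion_word, candidates):
    """Choose a feature for an opinion word from (keyword, feature) candidates.

    Candidates are assumed to be ordered by the parser's preference. If the
    opinion word is restricted, only an allowed feature may be returned.
    """
    allowed = ALLOWED_FEATURES.get(opinion_word)
    if allowed is None:
        return candidates[0][1] if candidates else None
    for _keyword, feature in candidates:
        if feature in allowed:
            return feature
    return None


# Candidate keywords for "the charges for the travel time were very expensive".
candidates = [("time", "Service Timeliness"), ("charges", "Price")]
chosen = pick_feature("expensive", candidates)
```

Returning `None` when no allowed feature is present trades some coverage for accuracy, which is the same trade-off observed for pronouns in Section 4.6.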

Handling misspellings. As the text comments were transcribed from telephone conversations with the customers, misspellings are quite common in the dataset: about 7% of the comments contained misspellings. This is especially problematic when a feature/aspect or opinion word is misspelled. Due to the dictionary-based approach, comments such as the following are not recognized as opinionated, because of misspellings in the opinion words:

“Mike stated the field mechanic was very knowledgable and a great guy.”

“Kevin said they know what they are doing and they are promt.”

“Eric stated that he would like better comminication.”

“AJ stated that they are curtious and quick to respond when service is needed.”

“Jeff stated that their communicatlion and that fact that they kept him in the loop of things.”

This results in lower coverage than would be achieved with correctly spelled opinion words. The proposed method is to add a pre-processing step that handles misspellings. Such an approach can be based on dictionaries of commonly misspelled words.
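A pre-processing step based on a dictionary of commonly misspelled words might be sketched as follows. The `CORRECTIONS` table is hypothetical and is populated here only with the misspellings quoted above; the function name `correct_spelling` is likewise an assumption.

```python
import re

# Hypothetical correction dictionary built from misspellings seen in the comments.
CORRECTIONS = {
    "knowledgable": "knowledgeable",
    "promt": "prompt",
    "comminication": "communication",
    "curtious": "courteous",
    "communicatlion": "communication",
}


def correct_spelling(text):
    """Replace known misspellings word by word, leaving other text untouched."""
    def fix(match):
        word = match.group(0)
        fixed = CORRECTIONS.get(word.lower())
        if fixed is None:
            return word
        # Preserve an initial capital letter if the original had one.
        return fixed.capitalize() if word[0].isupper() else fixed
    return re.sub(r"[A-Za-z]+", fix, text)
```

A lookup table like this only repairs misspellings that have already been observed; approximate string matching against the seed dictionaries would be a different technique with a higher risk of false corrections.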

Phrases, idioms and phrasal verbs. Another category of cases poorly handled by the algorithm is text containing set phrases, common slang expressions, idiomatic expressions and phrasal verbs. Examples from the tested dataset that are not handled by the algorithm in opinion detection include:

“He stated they were right there on the spot.”

“Rob stated they got the machine working.”

“Jeff stated that they were able to get out in a timely manner.”

“Chris stated did work in a timely fashion.”

Dealing with proper nouns and entity recognition. As mentioned previously, pronouns are often used in relation to opinion words. Besides pronouns, names of technicians are also used, for example: “Paul said that the service man, Chris, was very friendly, knowledgeable and ‘on top of it’”. Here, the positive opinion words about the technician were detected; however, they were not associated with any of the aspects. Currently, our algorithm does not handle cases with proper nouns. A future solution for entity recognition will involve adding the most common proper nouns to the dictionaries. Automatic techniques can be employed to extract such nouns from the text corpus. The same applies to the names of companies (dealers), which can be encoded in a dictionary. Another solution is to implement default aspect association, e.g. the opinion word “knowledgeable” can be associated by default with the “Technician” aspect.
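Default aspect association can be pictured as a fallback used only when dependency parsing links the opinion word to no feature word. The sketch below assumes a hypothetical `DEFAULT_ASPECT` table containing just the example from the text, and a hypothetical function `resolve_aspect`.

```python
# Hypothetical defaults: opinion word -> aspect used when no feature word is linked.
DEFAULT_ASPECT = {"knowledgeable": "Technician"}


def resolve_aspect(opinion_word, linked_aspect=None):
    """Return the aspect found by dependency parsing, else the word's default."""
    if linked_aspect is not None:
        return linked_aspect
    return DEFAULT_ASPECT.get(opinion_word.lower())
```

In the comment about the service man Chris, `resolve_aspect("knowledgeable")` would attach the opinion to “Technician” even though the proper noun itself is not in any dictionary.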

4.10. Conclusions

This paper presented a number of methods introduced to improve the accuracy and coverage of sentiment analysis. The proposed methods have been evaluated and illustrated with examples of real comments covered by the algorithm. This study improved the accuracy of the sentiment algorithm from 92% to 96%, and its coverage from 36% to 48%. In the future, we plan to conduct experiments measuring other metrics of interest, such as Mean Absolute Error (MAE) or F-measure. Most importantly, the introduced methods brought significant improvement to the text mining-based recommender system CLIRS. After comparing and analyzing results with different sentiment dictionaries, the final strategy for the sentiment analysis algorithm was incorporated into the CLIRS framework. The improvements were measured with the sparsity of the sentiment table and the coverage of action rules extracted from this table. The sparsity was reduced about three times, and the coverage of action rules increased about twice. Correspondingly, the quality of the recommendations of the built text mining-based system improved. However, there is still room for improvement compared to human capabilities of sentiment recognition. The remaining challenges in NLP were described. New methods for improvement were identified in the discussion and will provide a basis for future work within this research.



¹ NPS®, Net Promoter® and Net Promoter® Score are registered trademarks of Satmetrix Systems, Inc., Bain and Company and Fred Reichheld.

Table 4
The sparsity of opinion table before and after modifications of the opinion mining algorithm – the case for the dataset of company 16

Sentiment   Sp-b    Sp-a    SpT-b   SpT-a
−2          0%      4%      0%      0%
−1          0.9%    4%      0.1%    0.3%
1           12.3%   42.5%   0.8%    2.8%
2           0%      4%      0%      0%
All         13.2%   46.9%   0.9%    3.1%