A review of feature selection techniques in sentiment analysis

Abstract

The rapid growth of the web has transformed today’s communication. Combining features with their corresponding sentiment words (SWs) can help produce accurate and meaningful sentiment analysis (SA) results. Four basic elements of SA must be understood: the objects or entities under discussion, the features of those objects, the SWs, and the connection between the features and the SWs. Failure to identify these elements can reduce the accuracy and meaning of the SA results. The main objective of this review is to offer an overview of the role and techniques of feature selection (FS), SW detection, and the identification of the relationship between features and SWs. Its main contribution is a detailed categorisation of a large number of recent articles on FS techniques and SW detection. It also highlights recent trends in SA research. Finally, it examines the metaheuristic approach as an FS technique in SA, identifies the strengths and weaknesses of existing FS techniques, and analyses the potential of the metaheuristic approach for solving FS problems in SA.

Keywords

Sentiment analysis; feature selection; sentiment word; ant colony optimization; metaheuristic algorithm

1. Introduction

Analysing consumer comments on websites helps consumers find information and helps companies reformulate their product marketing. Nowadays, most consumers buy groceries and daily necessities through online shopping portals. Consumers also express their satisfaction or dissatisfaction with purchased products on sites such as Facebook and Twitter, or in forum discussions and blogs. The result is a steadily growing collection of consumer opinions. A method is needed to collect and summarise these opinions so that the essence of a product, together with the advantages and disadvantages expressed by consumers, can be identified. In this way, manufacturers can learn the strengths and shortcomings of their products and improve their quality. In the past, an individual who was curious about the advantages or disadvantages of a product had to ask friends or family who had purchased it, and manufacturers who wanted consumer feedback had to conduct surveys. This has changed with the development of information technology. Social media, such as Facebook, blogs, Twitter, and forum discussions, now provide information to both consumers and manufacturers. Thus, there is a need for a method that can analyse the information available on social media. However, the large volume of information makes it difficult to gather accurate information that can assist decision making. Sentiment analysis (SA) attempts to address this need. SA is a type of text analysis that falls under the broad heading of text mining, which involves natural language processing (NLP) and computational intelligence. Three fundamental problems must be considered when developing an effective SA method, namely, feature selection (FS), identification of sentiment words (SWs), and sentiment classification [1, 2].

Therefore, the purpose of this review is to present recent studies in this field that would be useful for researchers in developing effective methods for SA. This review is important for the following reasons:

1. This review provides a detailed categorisation of a large number of recent articles based on FS techniques, the identification of SWs, and the relationships between the features and the SWs. These categories can help researchers who are familiar with certain SA techniques to choose the most applicable technique for a given application.
2. The various SA techniques are categorised with brief details of the algorithms and their originating references. This review offers researchers who are new to the SA field a panoramic view of the current state of play in this entire field of research.
3. This review highlights the role of different FS methods in identifying SWs, as well as the advantages and disadvantages of each technique.
4. It identifies the limitations in the existing FS techniques. Finally, this review discusses the challenges in identifying features and SWs in SA, and the more specific challenges that are encountered in SA research.

The overall aim of this study is therefore to provide information and guidelines on the components of SA, namely, FS, identification of SWs, and analysis of the relationships between words, derived from previous research, including the latest work in this field.

This research field has grown, as shown by the publication of several survey papers in the past few years. In this review, these papers have been categorised according to the sub-tasks in SA. These categories include FS, identifying SWs and the relationships between them, and the challenges faced in identifying features and SWs in SA. Each section of this article presents the results of a comprehensive literature review on each sub-task.

To achieve these goals, the methods selected by previous researchers, their evaluation methods, and the advantages and limitations of each technique used for FS in SA were analysed based on the following research questions (RQs):

RQ 1: What are the main goals of the research being reviewed?
RQ 2: What is the proposed and appropriate approach for FS in SA?
RQ 3: What are the proposed and appropriate approaches for identifying SWs?
RQ 4: What are the proposed approaches to identify the pairing between a feature and an SW in sentences?

The remainder of this article is organised as follows: Section 2 discusses the previous studies on FS in SA. In Section 3, the definition and components of SA are explained, and several related articles on FS techniques in SA are discussed. In Section 4, the identification of SWs is defined and the methods used to identify them are explained. A discussion on the challenges in identifying features and SWs is also presented in this section. Section 5 discusses several suggestions to improve the limitations of the current method. Finally, Section 6 concludes this review and offers several directions for future work.

2. Related work

Previous surveys have systematically examined the general observations and challenges of SA. An overview of the fundamental features of SA was reported in [3], covering the techniques and approaches used for classification, extraction, and summarisation. The authors also assessed issues of privacy, manipulation, and the economic impact of developing an information service based on customer reviews. Additionally, they offered guidance on experimental datasets, evaluation campaigns, lexical resources, tutorials, and bibliographies.

In [4], the authors gave a general discussion of selection techniques, the semantic orientation of texts, sentiment classification, and common issues in SA. The study in [5] presented a more elaborate discussion of sentiment classification techniques, FS techniques, and sentiment identification tools for social networking sites. It also analysed the function of negative words in customer reviews and reviewed the effectiveness of the suggested techniques according to the source of the dataset. The authors concluded that a combined classification approach could overcome the limitations of individual approaches in sentiment classification, and that longitudinal research is necessary. In [6], the components of SA were explained by highlighting subjectivity and polarity classification, opinion target extraction, opinion source identification, opinion summarisation, and the issues and challenges encountered in SA. Extensive explanations of the techniques used in each component were also presented. For data collection, the authors gathered the personal opinions of social media and web blog users. The data were used to test the validity of the opinion representation model, to develop new models and components of SA, and to identify problems related to SA.

A survey of relevant literature published from 2010 to 2013 was conducted by [7] to obtain an overall view of the techniques used in SA. These techniques included FS techniques, sentiment classification techniques (both machine learning and lexicon-based approaches), and other relevant techniques. The paper also included a narrative review of fields related to SA, such as emotion detection, building resources, and transfer learning. An analysis of the different algorithmic patterns, dataset representations, and analytical methods used in SA was also conducted, covering classification techniques, polarity strength calculations, and other analytical methodologies. Several graphs were produced based on certain criteria, such as the categorisation of articles by algorithm, the numerical analysis of sentiments in different domains (i.e., customer reviews, web blogs, and news), and the numerical analysis of sentiments in different languages. The authors also considered the research gaps in previous SA studies, which included errors in datasets, language barriers, and inappropriate tools for natural language processing (NLP).

The main purpose of the current review is to determine the prospects of using metaheuristic approaches as an FS technique. Additionally, this review aims to identify the relationship between features and SWs, together with its related problems. Published studies from 2003 to 2016 were therefore reviewed, and the scope of this review can be summarised as follows:

1. This paper presents an extensive discussion of the FS techniques in SA studied by previous scholars. The fundamental objective is to identify the limitations, advantages, problems, and gaps of each technique. In addition, the frequency with which a metaheuristic approach has been used for this topic is determined. The appropriateness of the metaheuristic approach as an FS technique in SA is also reviewed, and a comparative analysis with traditional text classification is conducted. Note that both sentiment classification and traditional text classification operate on text datasets. Detailed explanations are provided in Sections 3.4 and 3.5.
2. This paper also discusses different techniques for identifying the relationships between features and SWs, together with the advantages and limitations of each technique. Additionally, the relationship between features and SWs in customer reviews is thoroughly examined. This could provide clearer directions for future research on what should be given attention when implementing the process of feature and SW matching.
3. This review focuses only on opinion sentences (review sentences that contain opinions on product features).

3. Sentiment analysis (SA)

3.1 Definition of sentiment analysis

SA is an increasingly important technology for analysing consumers’ opinions and for producing simple information that can represent these opinions as a whole. According to [8], SA is also known as opinion mining, which is defined as a type of research that analyses opinions, comments, thoughts, attitudes, and assessments of human emotions towards entities, such as products, services, organisations, politics, current issues, events and people, and the features that exist in these entities.

3.2 Components of sentiment analysis

Based on previous studies [1, 8, 9, 10, 11, 12], SA is composed of the following six main components (see Fig. 1):

Figure 1.

Components of SA.

Data preprocessing removes words that carry no meaning or are not required, which improves the accuracy of the search for the important word(s) in each sentence. Common text preprocessing steps include stop-word removal, tokenization, sentence splitting, word stemming, correction of misspelled words, and part-of-speech tagging (POST).

Data transformation converts text data into feature vectors. Because SA works on text documents, which a classifier cannot interpret directly, the documents must be transformed into a format that the computer can process. The vector space model is the favoured method: it transforms a document into a multi-dimensional vector whose dimensions are the features selected from the dataset. Feature vectors represent the data objects in the feature space.

FS is a process of identifying and eliminating redundant and irrelevant features from a list of features to reduce feature dimensions and to improve classification accuracy. FS is important in SA because the selected feature subsets need to accurately represent the features of the objects commented on by consumers.

Sentiment word identification identifies the SWs that can be linked to a feature in a sentence. Based on several studies [1, 11, 12, 13, 14, 15], a word tagged JJ/JJR/JJS is an adjective, which usually represents an SW.

Determining the feature-sentiment word relationship is an important process because the relationship can be rather complex when a sentence contains more than one feature or SW.

Sentiment classification operates at the document, sentence, or feature level to determine which documents, sentences, or features express a positive, negative, or neutral sentiment.

Testing and evaluation assesses the accuracy of the relationship between the feature and the SW, as well as the accuracy of the sentiment classes assigned to comments that are either positive or negative.
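As an illustration of the data transformation component described above, the following is a minimal sketch of a vector space representation, assuming whitespace tokenization and raw term counts as the vector entries. The function names and the two toy reviews are hypothetical and are not taken from the reviewed studies.

```python
from collections import Counter


def build_vocabulary(documents):
    """Collect the sorted set of tokens seen across all documents."""
    return sorted({token for doc in documents for token in doc.lower().split()})


def to_feature_vector(document, vocabulary):
    """Map one document to a term-count vector with one dimension per term."""
    counts = Counter(document.lower().split())
    return [counts[term] for term in vocabulary]


# Toy reviews (illustrative only).
reviews = ["good camera good price", "bad battery"]
vocabulary = build_vocabulary(reviews)
vectors = [to_feature_vector(review, vocabulary) for review in reviews]

for review, vector in zip(reviews, vectors):
    print(review, "->", vector)
```

Each dimension of a vector corresponds to one vocabulary term, so removing a term from the vocabulary (the task of FS) directly reduces the dimensionality of every document vector.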

3.3 Definition of a feature

A feature is an aspect that users comment about in relation to products, politics, services, organisations, events, and/or individuals. SA deals with information in the form of documents that contain a group of sentences in text format. Data in text format are usually represented by a feature vector, which is a group of words; this is also known as the set-of-words approach [16]. Features are categorised into two main types [17]:

1. A list of words in a document, which can be in the form of a unigram, bi-gram, or tri-gram.
2. POS tags, which are used to identify each word in the document, i.e., whether it is a noun, adjective, verb, adverb, preposition, or determiner.

The type of feature commonly used in SA is the n-gram [18, 19]. According to [19], the large size of the n-gram feature set requires a suitable FS technique, and the authors suggested two categories of n-gram feature, namely, fixed and variable. A fixed n-gram is a sequence occurring at the character or token level, whereas a variable n-gram is an extraction pattern that represents a more advanced linguistic phenomenon. Various types of fixed n-grams can be used in SA, such as word, POS, character, legomena, syntactic, and semantic n-grams. Word n-grams include the bag-of-words (BOW), bi-gram, and tri-gram. A character n-gram is a sequence of letters. For example, the word “like” can be represented by its two-letter and three-letter sequences: “li”, “ik”, “ke”, “lik”, and “ike”. Earlier studies have used character n-grams to classify emotions in texts [20]. Legomena n-grams are collocations in which words that occur only once in a corpus are replaced by a placeholder. For example, the tri-gram “I hate JIM” can be replaced with “I hate HAPAX”, provided that “JIM” occurs only once in the corpus [21]. There are two ways of handling a feature in SA, namely, feature extraction and FS.
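The character n-gram and legomena examples above can be reproduced with a short sketch. The helper names and the toy corpus are assumptions made for illustration; the placeholder string “HAPAX” follows the example in the text.

```python
from collections import Counter


def char_ngrams(word, n):
    """Return all contiguous letter sequences of length n in a word."""
    return [word[i:i + n] for i in range(len(word) - n + 1)]


def word_ngrams(tokens, n):
    """Return all contiguous token sequences of length n."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def replace_hapax(sentences, placeholder="HAPAX"):
    """Replace every token that occurs exactly once in the corpus."""
    counts = Counter(token for sentence in sentences for token in sentence)
    return [
        [token if counts[token] > 1 else placeholder for token in sentence]
        for sentence in sentences
    ]


print(char_ngrams("like", 2) + char_ngrams("like", 3))

# Toy corpus (illustrative only): "JIM" and "rains" each occur once.
corpus = [["I", "hate", "JIM"], ["I", "hate", "it"], ["it", "rains"]]
legomena = replace_hapax(corpus)
print(word_ngrams(legomena[0], 3))
```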

A. Feature extraction

Studies by [8, 11, 12, 22, 23, 24] have extensively used feature extraction. According to [25, 26], feature extraction is the process of identifying the features of a product that users comment on. Meanwhile, [27] stated that feature extraction functions by selecting candidate features from noun phrases in sentences and extracting relevant opinions as information. Additionally, [8] argued that feature extraction can be seen as an information extraction task.

B. Feature selection (FS)

FS has been defined differently to fit the perspectives of the authors concerned. The problem of selecting the right features is often discussed in relation to supervised and unsupervised machine learning tasks, such as classification, clustering, regression, and time-series prediction. The definition by [28] stressed that the aim of an FS technique is to improve the accuracy of the prediction, or to reduce the size of the structure by selecting a feature subset without significantly reducing the prediction accuracy of the classifier built using the selected features. According to [29], FS operates on a list of features from which the classification system can select a subset of the best features. Meanwhile, [30] described the FS problem as selecting a subset of size $m$ from a set of $d$ features such that the classification error is small. Most approaches to this problem would require:

1. Consideration of all possible subsets of size $m$; and
2. Selection of the subset with the largest value for the classification step.
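The two requirements above amount to an exhaustive search, sketched below under stated assumptions: the `score` function stands in for whatever criterion the classification step provides, and the feature names and relevance values are hypothetical. The number of candidate subsets is the binomial coefficient C(d, m), which is why exhaustive search becomes impractical for large feature sets.

```python
from itertools import combinations
from math import comb


def best_subset(features, m, score):
    """Evaluate every subset of size m and return the highest-scoring one."""
    return max(combinations(features, m), key=score)


# Hypothetical per-feature relevance values (illustrative only).
relevance = {"battery": 0.9, "price": 0.8, "the": 0.1, "camera": 0.7}


def score(subset):
    """Stand-in criterion: sum of the relevance of the selected features."""
    return sum(relevance[feature] for feature in subset)


features = list(relevance)
selected = best_subset(features, 2, score)
print(selected, "out of", comb(len(features), 2), "candidate subsets")
print("subsets of size 50 from 1000 features:", comb(1000, 50))
```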

In contrast, [15] defined FS as a process of selecting a minimum feature subset from a list of original features, based on a number of specific selection criteria. Similarly, according to [31], FS can be done by scoring features with a specific metric and removing irrelevant features based on a specified threshold value. The reduced number of features might increase the effectiveness of the training and assessment procedures and enhance the classification performance. In [32], the authors stated that FS involves selecting a suitable feature subset, which could help create a good prediction model. Several studies [33, 34, 35] have defined FS as the removal of irrelevant and redundant features. They also found that the best FS technique would be to apply an algorithm that works on large-sized data, including irrelevant features, to create a subset of relevant features with a suitable target class. Meanwhile, according to [35], FS is a process of identifying and eliminating excessive and irrelevant features that could be present in the feature list.

The definitions by [17, 36] suggested that FS could reduce the dimension of the feature space. According to [36], FS can be defined as a method of reducing a large-sized feature space, for example, by eliminating the less relevant features to create a set of suitable features. According to [17], FS is a way of selecting important features by removing irrelevant features. Only a few published studies [17, 19, 37] have examined how removing irrelevant features from feature vectors could speed up the calculation process and improve classification accuracy.

To avoid confusion and to standardise the terminology used in this paper, the term “feature selection” is also used to cover the techniques of feature extraction. Additionally, the term “feature selection” is understood as it was used in previous studies [31, 38, 39] with regard to SA. Thus, in the context of this review, FS is a process of identifying and eliminating excessive and irrelevant features from the feature list to reduce the size of the feature dimension space and help improve the accuracy of sentiment classification.

3.4 Feature selection in sentiment analysis

This study was focused on two types of FS techniques. The first type was FS techniques using the NLP approach that refers to feature extraction. The second was FS techniques that use combinations of NLP and non-NLP approaches (which will eventually use machine learning algorithms). NLP and text mining are investigative approaches that aim to gather rich knowledge resources through the abstraction and retrieval of thoughts or features from unstructured text. Figure 2 shows the FS categories in SA.

Figure 2.

FS categories in SA.

3.4.1 FS techniques using NLP approach

The following are some examples of FS techniques using the NLP approach:

1. Parts of speech tagging (POST): Also known as grammatical tagging or word-category disambiguation, POST is the process of marking up each word in a text (corpus) as corresponding to a specific part of speech. In POST, each term in a document or sentence is allocated a tag or label that represents its role in the grammatical context. For example, “This camera is very good and perfect,” becomes, “This/DT camera/NN is/VBZ very/RB good/JJ and/CC perfect/JJ,” after POST. The nouns and adjectives in this sentence can be identified from this tagging process, and these can be used for selecting features and as sentiment indicators.
2. Opinion words and target relations: The relationship between the feature and the opinion words can be exploited to extract the feature. This relationship can be shown by a dependency parser, which is used to identify the dependency relations for selecting features.
3. Topic modelling: In this method, a generative probability model that uses a distribution over the vocabulary is applied to discover topics from a large collection of text documents [40]. Topic modelling is an unsupervised learning method and its function is to identify the mixture of topics in each document [8]. The output from this method is a group of words. For example, in a collection of documents on comments about a camera, the relevant topics would include memory card, battery, megapixels, viewfinder, weight, and price.
4. Negation word: This is also an important feature to take into consideration because it can change the sentiment’s orientation. For example, “not bad” corresponds to “good”.
5. Rules of opinions: These are terminologies or language structures that can be used to describe sentiments and opinions.
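The tagged camera sentence from the POST item above can be processed with a few lines of code. This is a minimal sketch that assumes the sentence has already been tagged in the word/TAG format shown in the text; it does not perform the tagging itself, and the helper name is hypothetical.

```python
# Penn Treebank tag groups used in the text for nouns and adjectives.
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}
ADJECTIVE_TAGS = {"JJ", "JJR", "JJS"}


def parse_tagged(sentence):
    """Split a 'word/TAG word/TAG ...' string into (word, tag) pairs."""
    pairs = []
    for item in sentence.split():
        # rpartition splits on the last slash, so words containing '/' survive.
        word, _, tag = item.rpartition("/")
        pairs.append((word, tag))
    return pairs


tagged = parse_tagged("This/DT camera/NN is/VBZ very/RB good/JJ and/CC perfect/JJ")
candidate_features = [word for word, tag in tagged if tag in NOUN_TAGS]
sentiment_indicators = [word for word, tag in tagged if tag in ADJECTIVE_TAGS]

print("candidate features:", candidate_features)
print("sentiment indicators:", sentiment_indicators)
```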

In [12, 14], the NLP approach was applied to identify the features in comment sentences. POST was applied to each sentence, and the words tagged NN/NNS/NNP/NNPS were identified as nouns or noun phrases, namely, features. Association rule mining [41] was then used to determine the frequent features, which are sets of commonly used words or phrases. In [12], features were assumed to be of two types: frequent and infrequent. The authors argued that infrequent features could cause problems in identifying SWs because they might include nouns or noun phrases that have no connection to the product, and that infrequent features had only a minor impact of 15%–20% on their proposed system. Therefore, only the frequent features were considered in their studies. SWs were identified using the concept of the “nearby adjective”: a word tagged JJ/JJR/JJS is an adjective, and the adjective nearest to a frequent feature is taken as its SW. In an investigation into FS, [15] used POST to generate the POS tag of each word (to identify nouns, verbs, adverbs, and adjectives), and n-grams were used to produce shorter segments from long ones.
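The frequent-feature and nearby-adjective ideas can be sketched as follows. This is a simplification and not the method of [12, 14]: a plain sentence-level frequency count replaces association rule mining, distance is measured in tokens, and the tagged sentences, function names, and `min_support` parameter are hypothetical.

```python
from collections import Counter

NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}
ADJECTIVE_TAGS = {"JJ", "JJR", "JJS"}


def frequent_features(tagged_sentences, min_support):
    """Keep nouns that occur in at least min_support (a fraction) of sentences."""
    counts = Counter()
    for sentence in tagged_sentences:
        counts.update({word.lower() for word, tag in sentence if tag in NOUN_TAGS})
    threshold = min_support * len(tagged_sentences)
    return {word for word, count in counts.items() if count >= threshold}


def nearest_adjective(sentence, feature):
    """Return the adjective closest (in tokens) to the feature, or None."""
    positions = [i for i, (word, _) in enumerate(sentence) if word.lower() == feature]
    adjectives = [(i, word) for i, (word, tag) in enumerate(sentence) if tag in ADJECTIVE_TAGS]
    if not positions or not adjectives:
        return None
    return min(adjectives, key=lambda adj: min(abs(adj[0] - p) for p in positions))[1]


# Toy pre-tagged sentences (illustrative only).
sentences = [
    [("The", "DT"), ("battery", "NN"), ("is", "VBZ"), ("great", "JJ")],
    [("Poor", "JJ"), ("battery", "NN"), ("but", "CC"), ("nice", "JJ"), ("screen", "NN")],
    [("My", "PRP$"), ("cousin", "NN"), ("likes", "VBZ"), ("the", "DT"), ("screen", "NN")],
]

features = frequent_features(sentences, min_support=0.5)
for feature in sorted(features):
    print(feature, "->", nearest_adjective(sentences[1], feature))
```

In this toy data, “cousin” appears in only one sentence and is dropped as an infrequent feature, which mirrors the argument in the text that infrequent nouns may have no connection to the product.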

In a more recent study, [23] used POST to parse each sentence and generate a POS tag for each word in the sentence. After the POS tagging process was completed, they extracted and identified the nouns and noun phrases in the sentences using the concept of “pattern knowledge”. Eight knowledge patterns were generated to identify features in sentences. The extracted SWs were the words tagged as adjectives (JJ) or adverbs (RB) that were closest to the extracted feature. According to [24], there are three steps in the process of extracting features from sentences. In the first step, input documents are converted into tagged documents using POS tagging software. In the second step, evaluative expressions are extracted using the combined pattern-based noun phrases (cBNP) technique. Lastly, the algorithm extracts product features, which are usually represented by noun phrases. In their study, [24] used a list of opinion lexicons for identifying sentiment-hood.

Meanwhile, [11] proposed the “Know-It-All” system to extract explicit features from parsed review data. This system is known as OPINE and works recursively to identify the “parts” and “properties” of product classes. The process stops when there is no further “candidate” to be identified. The system searches for a related concept and extracts its “parts” and “properties”. The system also extracts noun phrases and retains those whose frequency is greater than a specified threshold value. The acquired noun phrases are evaluated using a Pointwise Mutual Information (PMI) score between a phrase and a meronymy discriminator that is associated with the product class. The process of identifying SWs uses the explicit features, but is based on syntactic dependencies identified by the MINIPAR parser. Based on the extracted rules, OPINE searches the lexical heads for potential SW phrases. The system identifies the semantic orientation of each lexical head and retains those with a positive or negative orientation as opinion phrases.
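For reference, the general form of the PMI score mentioned above is PMI(x, y) = log2(p(x, y) / (p(x) p(y))). The sketch below computes it from co-occurrence counts; the counts are hypothetical, and the code illustrates the general measure only, not the exact scoring procedure used by OPINE.

```python
import math


def pmi(count_xy, count_x, count_y, total):
    """Pointwise mutual information, log2(p(x, y) / (p(x) * p(y))).

    With p(.) = count / total, the ratio simplifies to
    count_xy * total / (count_x * count_y).
    """
    if count_xy == 0:
        return float("-inf")  # the pair never co-occurs
    return math.log2(count_xy * total / (count_x * count_y))


# Hypothetical counts: a candidate phrase (x) and a discriminator (y).
total_documents = 1000
phrase_count = 100
discriminator_count = 50

# Co-occurring 20 times is four times more often than chance (5 expected).
print(pmi(20, phrase_count, discriminator_count, total_documents))
# Co-occurring exactly as often as chance gives a score of zero.
print(pmi(5, phrase_count, discriminator_count, total_documents))
```

A candidate phrase with a high PMI against the discriminator co-occurs with it more often than independence would predict, which is the intuition behind using the score to assess extracted noun phrases.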

In their study, [18] successfully used term presence rather than term frequency in their sentiment classification of movie reviews. They applied unigrams (BOW) as the feature type to classify movie reviews into two classes, namely, positive and negative. In [42], the authors investigated the differential impact of classifying product reviews using unigrams, bi-grams, and tri-grams. In their research, they obtained good classification performance using bi-grams and tri-grams.

In [9], “linguistic filtering patterns” and a “general inquirer dictionary” were used for extracting a list of product features. Their process of identifying SWs was based on dependency analysis, which was divided into a dependency tree and a dependency path. Dependency analysis involves identifying an asymmetric binary relationship between two words, one known as the “head” or “governor” and the other as the “modifier” or “dependent”. The authors introduced and adopted six types of syntactic relationships to identify the connection between features and SWs.

In [1], it was pointed out that an unsupervised, language-independent model can identify explicit and implicit features from reviews. An unsupervised model does not require a set of pre-labelled training data, which is an advantage because the model can easily be transferred between domains or languages. The authors used a graph-based approach to identify implicit features in comments. The graph was built using a polarity lexicon and a list of specified features. SWs from the polarity lexicon were used as nodes connected to each related feature. The authors gave an initial weight to each connection and, to add more precision to the identification of implicit features, defined a function to measure the strength of the connection between the features and SWs. Table 1 provides a summary of the FS and SW identification techniques.

Table 1
FS (NLP approach) and SWs techniques

Author and publication	Feature selection (NLP)	Sentiment word identification
[12, 14]	POS tagging	Nearby adjective
[11]	“Know-It-All” system	Dependency relationship
[9]	Linguistic filtering patterns and general inquirer dictionary	Dependency relationship
[23]	POS tagging and generation of pattern knowledge to identify feature	Concept of nearest adjective with product feature in a sentence
[24]	POS tagging and combined pattern-based noun phrases (cBNP)	Opinion lexicon
[1]	Unsupervised model and language independent model	None

3.4.2 FS techniques using a combination of NLP approach and non-NLP approach

Non-NLP is an approach that is closely related to the machine learning approach. Previous studies have reported using a hybrid FS technique, which is a combination of an NLP approach with a non-NLP approach. Non-NLP approaches include the genetic algorithm (GA), information gain (IG), rough set theory (RST), decision tree, and minimum redundancy maximum relevance (mRMR). FS techniques can be generally categorised as either univariate methods or multivariate methods based on the conditions for evaluating the features. Univariate methods analyse a single variable at a time, while multivariate methods analyse more than one variable at a time. Wrapper and hybrid techniques are included in the multivariate category. There are generally three FS techniques for machine learning tasks, namely, filter, wrapper, and hybrid [34, 43, 44], which are briefly described as follows:

1. The filter technique is independent of any machine learning algorithm. During the filtering process, the relevant score for each feature is calculated, and the features with low scores are removed. The resulting feature subset becomes the input for the classification algorithm [45].
2. The wrapper technique needs a machine learning algorithm and uses it as part of the evaluation function.
3. The hybrid technique, as the name implies, is a combination of the filter and the wrapper techniques.
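The filter technique described in the first item can be sketched as a score-then-threshold procedure. The example below assumes the chi-squared statistic for a two-by-two term/class contingency table as the scoring metric; the documents, labels, function names, and threshold are hypothetical. No classifier is involved at any point, which is what makes this a filter rather than a wrapper.

```python
def chi_squared(docs, labels, term, positive_label):
    """Chi-squared statistic of a term against one class (2x2 table)."""
    a = b = c = d = 0
    for tokens, label in zip(docs, labels):
        has_term = term in tokens
        in_class = label == positive_label
        if has_term and in_class:
            a += 1  # term present, document in class
        elif has_term:
            b += 1  # term present, document not in class
        elif in_class:
            c += 1  # term absent, document in class
        else:
            d += 1  # term absent, document not in class
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denominator == 0 else n * (a * d - b * c) ** 2 / denominator


def filter_select(docs, labels, positive_label, threshold):
    """Score every term independently and drop those below the threshold."""
    vocabulary = sorted({term for tokens in docs for term in tokens})
    scores = {t: chi_squared(docs, labels, t, positive_label) for t in vocabulary}
    return [t for t in vocabulary if scores[t] >= threshold], scores


# Toy labelled documents (illustrative only).
docs = [{"good", "camera"}, {"good", "lens"}, {"bad", "camera"}, {"bad", "lens"}]
labels = ["pos", "pos", "neg", "neg"]

selected, scores = filter_select(docs, labels, "pos", threshold=1.0)
print(selected)
print(scores)
```

Because each term is scored on its own, two terms that carry the same information both survive the filter, which is the limitation of univariate methods noted in the literature reviewed here.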

A. Filtering techniques

A few conclusions can be drawn from the review of previous studies. The advantage of a univariate filtering technique is that it is simple and fast, with a short processing time. However, the technique also has limitations: it ignores interactions and dependencies between features, and it does not interact with the machine learning algorithm. Features are evaluated individually and are usually ranked by their predictive power, so the dependency between features is ignored. Table 2 provides a summary of the advantages and limitations identified in previous studies that used the univariate and multivariate methods.

Table 2
Summary of advantages and limitations of the univariate and multivariate methods

Author and publication	Technique	Advantage (A)/Limitation (L)
Univariate techniques
[10, 17, 37, 46, 47]	Information gain	A: Simple to implement and fast. L: No interaction between the attributes, does not consider overlapping features, and threshold value needs to be set in advance.
[48, 49, 50]	Log likelihood ratio	A: Simple to implement and fast. L: Complicated when involving more than one feature.
[47, 51]	Chi-squared	A: Simple to implement and fast. L: Does not offer much information about the relationship between features.
[47]	Gain ratio	A: Simple to implement and fast. L: A low feature count could affect the accuracy of the classification and make it unsatisfactory.
[47]	Relief-F	A: Simple to implement and fast. L: The accuracy value for sentiment classification is unsatisfactory.
Multivariate methods
[52]	Feature subsumption hierarchy	A: Improves classification accuracy. L: The subsumption process responds to the negative result of adding more complex features.

B. Hybrid techniques

The hybrid technique, a combination of the filter and wrapper techniques, was used by [15, 53]. It can be used to manage large datasets [44, 53, 54]. Combining the two techniques can produce an optimum feature subset by using standard optimisation techniques or a metaheuristic approach, such as the GA, particle swarm optimization (PSO), and ant colony optimization (ACO) [55]. Table 3 provides a summary of the advantages and limitations of using hybrid methods for SA, as identified from the literature review.

Table 3
Summary of the advantages and limitations of hybrid techniques

Author and literature	Technique	Advantage (A)/Limitation (L)
[10, 56]	IG + Entropy Weighted Genetic Algorithm (EWGA)	A: Increases the classification accuracy value and is able to identify the main features in the sentiment class.
		L: Less efficient in determining the ranking of features.

The FS technique and SWs are the main determinants of the accuracy of sentiment classification. SA is based on the machine learning approach, and the feature space is normally large. A number of studies on SA have combined filtering and wrapper techniques to overcome the limitations of each individual technique. For example, [10, 37] used the IG technique to identify important features for sentiment classification. According to [37], IG measures the reduction in uncertainty about the class of a document once the value of a feature is known. The most important features were selected to reduce the size of the feature vector and achieve a better classification. They also reported that IG is the best filtering technique because it is able to determine the importance of the features in a document. However, IG has several limitations: the threshold value has to be set in advance, the method does not take into consideration the problem of excessive features, and there is no communication among the features [37, 45].
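As an illustration of how IG scores a feature, the following minimal sketch uses the standard definition for a binary (present/absent) term: the entropy of the class labels minus the expected entropy after splitting the documents on the term. The four toy reviews and the threshold of 0.5 are assumptions made for the example and are not taken from the cited studies.

```python
import math
from collections import Counter
from typing import List, Sequence, Set, Tuple

Doc = Tuple[Set[str], str]  # (set of words in the document, sentiment label)


def entropy(labels: Sequence[str]) -> float:
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(docs: List[Doc], term: str) -> float:
    """Reduction in class uncertainty once the presence/absence of `term` is known."""
    labels = [label for _, label in docs]
    present = [label for words, label in docs if term in words]
    absent = [label for words, label in docs if term not in words]
    n = len(docs)
    conditional = (len(present) / n) * entropy(present) + (len(absent) / n) * entropy(absent)
    return entropy(labels) - conditional


def select_by_ig(docs: List[Doc], threshold: float) -> List[str]:
    """Keep the terms whose IG meets a threshold that must be chosen in advance."""
    vocabulary = set().union(*(words for words, _ in docs))
    return sorted(t for t in vocabulary if information_gain(docs, t) >= threshold)


# Hypothetical toy corpus.
DOCS: List[Doc] = [
    ({"good", "camera"}, "pos"),
    ({"good", "battery"}, "pos"),
    ({"bad", "camera"}, "neg"),
    ({"bad", "battery"}, "neg"),
]

selected = select_by_ig(DOCS, threshold=0.5)
```

Here "good" and "bad" each remove all uncertainty about the class, whereas "camera" and "battery" remove none. Each term is scored on its own, which is the univariate behaviour described above: redundancy between "good" and "bad" is not detected, and the threshold has to be supplied by the user.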

In [10], the IG filtering technique was combined with the Entropy Weighted Genetic Algorithm (EWGA) metaheuristic technique, which successfully increased sentiment classification accuracy and produced an optimum feature subset. However, the study was limited to document-level data, which considers the sentiment of the whole document without examining its content in detail. In [57], the authors also found that SA at the feature level would thoroughly consider the features and SWs that exist in a document. The initial stage involved extracting the unigram features of the document. The next stage was FS using the IG technique to determine the importance of the features in the document. However, one weakness of the IG technique is that the threshold value must be determined in advance and it does not consider features that overlap [37].

To overcome this problem, [37] combined IG with RST. This technique can reduce the number of noisy and irrelevant features. The advantage of this hybrid approach is that it can consider overlapping features, as well as obtain minimum feature sets that can reduce time complexity during sentiment classification. However, [10, 37] did not clearly state the methods they used to identify SWs. For sentiment classification, [10] used the support vector machine (SVM) technique, while [37] used the SVM and naive Bayes (NB) to identify positive and negative sentiments in the documents. In [56], the authors proposed using a GA to extract feature collections from semantic features of emotional collections. In their work, a conditional random field (CRF) was developed for labelling web-page classifications for different types of comments, such as positive comments, negative comments, and objective comments. They did not mention the identification of SWs or the relationship between SWs and features.

Table 4 provides an overview of the available techniques for FS and SW identification.

Table 4
Techniques for FS (NLP + non-NLP) and SW identification

Author and publication	Feature selection (NLP + non-NLP)	Sentiment word identification
[10]	NLP approach + IG + GA	Not mentioned
[56]	GA + CRF	Not mentioned
[37]	NLP approach + IG + RST	Not mentioned

C. Other techniques

i. Rough set theory with filtering technique

In [37], the IG technique was combined with the RST technique. The RST was used to reduce the number of irrelevant, excessive, and noisy features. The advantage of using RST is that it can take into account the combined dependency among the features [58]. However, the RST has two limitations. The first is the difficulty of obtaining an optimum reduction of the feature subset. This is a non-deterministic polynomial-time hard (NP-hard) problem, and thus a heuristic algorithm was suggested to resolve it [37]. The second limitation is its long processing time [37, 59, 60]. To date, only a few studies have explored the metaheuristic approach for FS in SA research. Table 5 provides a summary of the advantages and limitations of using RST combined with a univariate filtering technique.

Table 5
Summary of advantages and limitations of the RST method, combined with a univariate filtering technique

Author and publication	Technique	Advantage (A)/Limitation (L)
[37]	IG + RST	IG
		A: Determines the importance of a feature in a document.
		L: Requires a threshold value at the initial stage and does not take into account overlapping features.
		RST
		A: Could reduce irrelevant features, noise, and takes into account overlapping features.
		L: Difficult to obtain an optimum subset feature and requires more processing time.

ii. Neural network

In [61], the authors compared three types of neural networks, namely, the probabilistic neural network (PNN), the back propagation neural network (BPN), and the homogeneous ensemble of PNN (HEN). The comparison was performed using varying levels of word granularity as features for feature-level sentiment classification, based on a dataset of product reviews collected from the Amazon review website. The hybrid sentiment classification method combined the PNN with principal component analysis (PCA), which served as a feature reduction step to reduce training time and increase the performance of the classification process. These neural networks were developed to effectively incorporate the supervision from sentiment polarity of text.

The study in [62] proposed a method to learn sentiment-specific word embedding (SSWE) by integrating sentiment information into the loss functions of three neural networks. Large-scale training corpora were acquired by learning SSWE from massive distant-supervised tweets collected using positive and negative emoticons. To evaluate the effectiveness of SSWE, a dataset from SemEval 2013 was used as the benchmark. Verification was done by computing word similarities in the embedding space for sentiment lexicons. The results indicated that SSWE, which integrates the sentiment information of sentences, achieved good performance in the experiments.

Meanwhile, [63] proposed the use of recursive neural tensor networks (RNTN) to calculate compositional vector representations for phrases of variable lengths and syntactic types. They also developed the Stanford Sentiment Treebank corpus, which contains labelled parse trees. The labelled parse trees allow a complete analysis of the compositional effects of sentiment in language. Their experiment showed that the RNTN outperformed previous models, achieving 80.7% accuracy on fine-grained sentiment prediction across all phrases. RNTN can also capture negations of different sentiments.

iii. Metaheuristic

Several metaheuristic approaches were also applied as FS methods in SA. Table 6 displays a summary of metaheuristic approaches that can be used for selecting features in SA, based on the dataset and performance evaluation (accuracy, precision (P), recall (R), and F1 score).

In [64], a two-stage prediction algorithm was presented. In the first stage, the classifier learned the conditional dependencies among words and encoded them into a Markov Blanket Directed Acyclic Graph for the sentiment variable. In the second stage, a metaheuristic strategy was used to fine-tune the algorithm to yield a higher cross-validated accuracy. Two collections of online movie reviews from IMDB and three collections of online news were used, and the algorithm was compared with SVM, NB, and maximum entropy (ME) on these datasets. In comparison to the other methods, this method was able to identify a parsimonious set of predictive features and obtained better prediction results on sentiment orientations. Results from the experiments suggested that sentiments were generally captured by conditional dependencies among words, keywords, or high-frequency words.

In another context, [65] implemented genetic-based machine learning (GBML) for the purpose of subjectivity detection. In their study, GBML was tested with both English and Bengali news, movie reviews, and blog domains. The precision values were 90.22% and 93.00% for the English news and movie review corpora, respectively, and 87.65% and 90.6% for the Bengali news and blog corpora, respectively. These experiments indicated that GBML can automatically identify the best feature set based on the principle of natural selection and survival of the fittest. In a comparative study by [66], information gain and genetic algorithm were proposed for feature reduction. In their experiment, multidomain and movie review datasets were used for opinion mining. Five classification algorithms, which consisted of naive Bayes (NB), logistic regression (LR), support vector machine (SVM), and two ensemble methods (bagged SVM (BSVM) and bayesian NB (BNB)), were compared with the proposed hybrid method. Additionally, McNemar’s test was used to assess whether the differences between the classifiers were statistically significant. The results showed that the hybrid method outperformed the other approaches.
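For readers unfamiliar with McNemar’s test, the sketch below computes its usual continuity-corrected statistic from the two disagreement counts of a pair of classifiers evaluated on the same test set. The counts used here are hypothetical and are not taken from [66].

```python
def mcnemar_statistic(b: int, c: int) -> float:
    """Continuity-corrected McNemar statistic.

    b: test items that only classifier A labels correctly.
    c: test items that only classifier B labels correctly.
    Items on which both classifiers agree do not enter the statistic.
    """
    if b + c == 0:
        return 0.0
    return (abs(b - c) - 1) ** 2 / (b + c)


# Hypothetical disagreement counts for two sentiment classifiers.
statistic = mcnemar_statistic(b=15, c=5)

# Critical value of the chi-squared distribution with 1 degree of freedom at the 0.05 level.
CRITICAL_VALUE_0_05 = 3.841
significant = statistic > CRITICAL_VALUE_0_05
```

With these counts the statistic is (|15 - 5| - 1)^2 / 20 = 4.05, which exceeds 3.841, so the difference between the two classifiers would be considered significant at the 5% level.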

The authors in [67] conducted a comparative study on different FS techniques in SA. They found that metaheuristic techniques have been widely used for selecting features in text classification problems. Based on their review, metaheuristic techniques were able to obtain optimum feature subsets and increase the performance of sentiment classification. Therefore, they believed that metaheuristics can potentially be used as an FS technique in SA. Additionally, [68] proposed a two-step technique for sentiment text classification. In the first step, co-clustering was applied to words and documents to reduce the dimensionality of the SW space. In the second step, a genetic algorithm was used to calculate the weights of the SWs in the text. Reviews of movies, books, and cameras from the ROMIP_2011 seminar were used as datasets. The findings from these two studies indicate that the proposed methods perform better than other classifiers, such as SVM and lexical classifiers.

In [69], a novel multi-swarm particle swarm optimization (MPSO) algorithm was proposed as an FS technique for selecting emotional features in course reviews. The researchers used a sentiment recognition concept to understand the emotions and feelings of learners. The effectiveness of the MPSO algorithm was evaluated against baseline algorithms, such as IG, mutual information (MI), Chi-square statistic, GA, and single swarm-particle swarm optimization (SSPO). From the experimental results, the MPSO algorithm was found to be effective in reducing the redundancy of text features and in recognising discriminative features. The MPSO algorithm achieved a micro F-measure of over 88% and was able to reduce the initial feature space from 10,000 to 3,000 dimensions.

Table 6
FS in SA using the metaheuristic approach

Author	Metaheuristic approach	Datasets	Accuracy	Precision (P), Recall (R), F1 Score
[10]	Entropy weight genetic algorithm	Movie review, U.S. forum, and Middle Eastern forum.	Movie review (91.70%) U.S. Forum (92.80%) Middle Eastern Forum (93.60%)	–
[56]	Genetic algorithm	Product review and People’s Daily of 1998.	95.1%	–
[65]	Genetic-based machine learning	English news, movie review, Bengali news, and Bengali blog domains.	–	English news, P (90.22%), R (96.01%); Movie review P (93%), R (98.55%) Bengali news, P (87.65%), R (89.06%), Bengali blogs P (90.6%), R (92.4%)
[64]	Markov blanket classifier tabu search	Two collections of online movie review from IMDB, and three collections of online news provided by Infonic Ltd.	87.52% (5-fold cross validation) 92.70% (unigrams, bigrams, semantic features)	–
[66]	Information gain and genetic algorithm	Multi domain product and movie review.	SVM (classifier) – (77%)	–
[68]	Word and document co-clustering genetic algorithm	Russian information retrieval evaluation seminar (ROMIP).	–	Movies, P (67.78%), R (65.02%), F1 Score (66.10%), Books, P (67.26%), R (65.54%), F1 Score (66.17%), Cameras, P (70.56%), R (67.91%), F1 Score (68.61%)
[69]	Multi-swarm particle swarm optimization	Massive open online course.

3.5 Text FS based on traditional methods using metaheuristic approach

As previously mentioned, FS is an NP-hard problem, so it requires an efficient algorithm to solve it, such as a metaheuristic algorithm [43, 70, 71, 72]. Research on SA requires a realistic application for effective and accurate analysis, as SA is a field that analyses various opinions or feelings about a variety of topics, and these can be derived from different types of information sources. An optimum solution would therefore be impossible to find unless the solution space was searched exhaustively. The metaheuristic approach can be used to find an excellent solution without having to explore the whole solution space. As in any approach, the quality of the solution depends on the methods used. Metaheuristic-based methods have already been shown to be able to solve optimization problems. In practice, however, an application only needs to find a good solution within an appropriate time frame rather than the optimum solution.
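To make this idea concrete, the sketch below shows a generic genetic algorithm that searches over binary feature masks. It is an illustrative outline only and does not reproduce the EWGA of [10] or any other cited method; the population size, mutation rate, selection scheme, and the toy fitness function (which would normally be the validated accuracy of a classifier trained on the selected features) are all assumptions.

```python
import random
from typing import Callable, List, Tuple

Mask = List[int]  # 1 = feature selected, 0 = feature dropped


def genetic_feature_selection(n_features: int,
                              fitness: Callable[[Mask], float],
                              pop_size: int = 20,
                              generations: int = 30,
                              mutation_rate: float = 0.05,
                              seed: int = 0) -> Tuple[Mask, List[float]]:
    """Return the best mask found and the best fitness seen in each generation.

    Requires n_features >= 2 so that a crossover point exists.
    """
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]
    history: List[float] = []
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        history.append(fitness(ranked[0]))
        parents = ranked[: pop_size // 2]          # truncation selection
        children = [ranked[0][:]]                  # elitism: keep the current best
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randint(1, n_features - 1)   # one-point crossover
            child = a[:cut] + b[cut:]
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        population = children
    return max(population, key=fitness), history


# Hypothetical fitness: features 0 and 2 are informative, the others add noise.
INFORMATIVE = {0, 2}


def toy_fitness(mask: Mask) -> float:
    return sum(1.0 if i in INFORMATIVE else -0.1 for i, bit in enumerate(mask) if bit)


best_mask, history = genetic_feature_selection(n_features=6, fitness=toy_fitness)
```

Only a small number of masks are evaluated in each generation, so the search never enumerates the full space of 2^n subsets. Because the best mask is carried over unchanged, the best fitness cannot decrease from one generation to the next, but there is no guarantee that the global optimum is reached.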

According to [8, 37], sentiment classification is also a text classification problem. Traditional text classification involves classifying documents into various topics, such as politics, science, and sports. It relies on topic-related features, in which the words related to each topic are important. In contrast, classification in SA is conducted based on SWs in the document or sentences. Examples of SWs are “good”, “bad”, “excellent”, and “amazing”. These SWs denote the user’s opinion about certain objects, such as products and services, or other matters and show whether the user has made a positive or a negative comment. In this review, the use of FS techniques based on metaheuristics in previous SA studies was compared with the use of metaheuristics in normal text classification studies. This comparison was made because both SA and normal text classification deal with the same data format, namely, text data. Thus, the advantages and limitations of each metaheuristic technique for resolving the FS problem were identified in this review.

The metaheuristic approach is a high-level strategy and an iterative generation process, which guides the exploration of the search space by using different techniques, such as ACO, GA, and PSO [73, 74]. Table 7 provides a summary of several studies that applied metaheuristic-based FS methods to classify traditional text and sentiments. The advantages and disadvantages of these methods are summarised in Table 8.

Table 7
FS techniques in text classification (traditional) using the metaheuristic approach

Author and publication	Method
[75, 76, 77, 78, 79, 80, 81, 82]	Ant Colony Optimization (ACO), Genetic Algorithm (GA)
[83, 84, 85]	Particle Swarm Optimization (PSO)
[82]	Chaos Optimization (CO)

Table 8

The advantages and disadvantages of metaheuristic approaches

Method	Advantage	Disadvantage
Genetic Algorithm (GA)	Can solve optimization problems based on chromosome approach [86].	The large feature size affects the ability to obtain the optimum solution [77].
	Less complex and more straightforward [86].	GA does not continuously derive global optimum, especially when the overall solution has several populations [86].
	Has the capability to generate efficient solutions for complex and mathematically sophisticated environments [86].	GA is a complex algorithm [86].
Ant Colony Optimization (ACO)	Faster in searching for the optimum solution, even with tens of iterations. In a study by [77], the optimum solution was found at the 100th iteration, even though the dataset size was 21,578, with tens of billions of data subsets that needed to be processed.	Processing time might be affected by the dimension problem (total features) and data size [77].
	Has a powerful exploring capability; a gradual search process for approaching the optimum solution [77].
	Is efficient in getting the minimum feature subset.
	The ACO concept is very simple and could be executed with just a few lines of coding [77].
	The ACO needs a primitive mathematical operator [77].
	Not expensive in terms of memory usage and speed [77].
	Has a memory [77].
Particle Swarm Optimization (PSO)	Has a powerful exploring capability as it is a gradual search process for approaching the optimum solution [83].	Is suitable for processes involving a coordinate system [87].
		Could easily reach a partial optimum, which could affect speed.

Metaheuristic techniques, such as ACO and GA have been used as the FS techniques in traditional text classification process [76, 77, 88]. Particle swarm optimization has only been used for traditional text classification in the Chinese [84] and Arabic [85] languages. Research by [77] found that only ACO was capable of obtaining the optimum feature subset. The advantages of ACO are as follows [77]:

Speed of convergence,

Good search capability in the problem space, and

Efficient in searching for the minimum feature subset.

However, the ACO’s processing time could be affected by the dimension problem (total number of features) and data size [77]. This problem can be resolved by combining ACO and RST [89, 90]. RST is a technique that can reduce the size of the feature set by eliminating all overlapping features [58]. In the SA study by [37], in which RST was combined with IG, one disadvantage of RST was that it struggled to obtain the optimum feature subset. Therefore, it was suggested that RST should be combined with a metaheuristic algorithm to obtain the optimum feature reduction [59, 60]. As such, the metaheuristic algorithm is seen as a potential FS technique in SA to obtain an optimum feature set. Tables 9 and 10 show the results of the various FS methods described in this review. The summary is based on the different datasets and performance criteria that were used in the previous studies.

Based on the discussions in Section 3.4 and the comparisons listed in Table 6, GA has been widely used as an FS technique in SA. Conversely, the study by [77] and the comparative analysis in Table 8 showed that ACO has more potential than GA in producing an optimal subset of features and can improve the performance of sentiment classification.

Table 9

Different classifiers for FS in SA (% accuracy) adapted from Sharma and Dey [47]

Feature selection	Classifier
	Naïve bayes	SVM	Maximum entropy	Decision tree	K Nearest Neighbour	Winnow	Adaboost
Document frequency	82.85	85.15	72.65	68.8	63.15	66.05	66.35
Information gain	88.85	89.20	87.35	74.45	71.15	71.15	65.20
Gain ratio	90.90	90.15	88.85	75.35	75.15	73.50	65.70
CHI	88.40	89.45	87.10	73.10	70.30	69.50	65.20
Relief-F	82.50	83.40	77.95	88.30	65.25	64.85	66.90

3.6 Research challenges in SA

Based on this review, SA technology has become a necessity because it provides many benefits to the public. SA has also become an interesting and challenging research field in this era. However, SA technology has some challenges to overcome due to the diverse information that exists on websites. Some of the factors that could be considered challenges in the field of SA are described as follows [6, 40, 57].

Accuracy: Numerous SA studies were conducted in relation to products, politics, and social media. However, SA services that are commercially available for simplifying consumers’ comments are still limited because such services require accurate and correct information to be channelled to the consumers. Failure to provide accurate information could lead to impaired quality of service. Most of the tasks in SA are done manually (i.e., with the help of humans). Companies that provide commercial services related to SA are concerned about having highly accurate information. However, there are difficulties in understanding the language used in the comments. The use of complex sentences and expressions often causes SA to generate inaccurate information [40]. This problem is closely associated with the need for a more detailed and clearer understanding of NLP with regard to context dependency, semantic relatedness, and ambiguity [6]. To overcome the challenges in SA, each step of the process should be refined and improved [40]. A clear understanding of the process may help to improve the accuracy.

Scalability: SA technology is mostly implemented in the form of web applications. The main purpose of SA technology is to develop a fast, accurate, and efficient online information search engine that can provide sentiment summaries of the information provided by the consumers regarding products and services. It should be possible to access the search engine anytime and anywhere. Nevertheless, the huge size of data on websites, the increasing volume of information, the limited speed of the Internet, and the high data dimensionality are all challenges that need to be addressed to improve this technology. These can be overcome by applying more advanced NLP techniques. In [40], the authors proposed developing parallel algorithms so that the workload within SA is balanced and text processing becomes faster. They also suggested that cloud or grid computing should be adopted for SA web services because of their technological advantages in terms of scalability.

Standards for dataset and evaluation criteria: The use of datasets in SA should follow a certain predetermined standard to control its output quality. Some researchers have used their own datasets [12, 13, 14]. However, there are currently no proper specifications to outline the standards of datasets that should be used in this field of study [40], because different studies apply different standards for their assessment measurements and datasets. This often complicates the decision as to which methodology is best. Previous studies [12, 49, 91, 92] used different datasets and measurements, which makes it difficult to judge which technique is the best. A standard specification for datasets and evaluation measurement should therefore be introduced to enable a fair comparison between the different methods [40].

Table 10
Summary of survey on various FS methods

Survey	Feature selection	Datasets	Language	Accuracy	Precision	Recall	F1 score
[48]	Log likelihood ratio	Digital camera and music review articles	English	85.6%	87%	56%	–
[12]	Part-of-Speech tagging	Customer review	English	–	72%	80%	75.8%
[49]	Log likelihood ratio	Feedback item from Global Support Services survey and Knowledge Base survey.	English	85.47%	–	–	–
[50]	Weighted log-likelihood ratio	Movie review	English	89.53% (average)	–	–	–
[46]	Information gain	Corpus of English blogs and corpus of French review.	English, French	32%	–	–	–
[51]	Part-of-Speech tagging, Chi-square	Movie review	English	94.9%	–	–	–
[52]	Feature subsumption hierarchy	OP, Polarity, and MPQA	English
[9]	Dependency relation	Amazon reviews	English	–	72.6%	78.7%	75.4%
[23]	Part-of-Speech tagging	Customer review	English		73.3%	85.7%	79%
[1]	Part-of-Speech tagging	Customer review	English	–	82%	65.1%	–
	Multi-word aspect
	Heuristic rule
[24]	Combined pattern-based	Customer review	English	–	78.98%	71.77%	75.19%
	Noun phrases
[17]*	Minimum redundancy maximum relevance, Information gain	Movie review (M), Book (B), DVD (D), Electronics (E)	English	–	–	–	Multinomial Naïve Bayes (%): M 91.1, B 92.5, D 91.5, E 91.8
[37]	Rough set theory and Information gain	Movie review, product (book, DVD, and electronics) review	English	–	–	–	SVM (classifier): Movie (87.7%), Book (80.2%), DVD (83.2%), Electronics (83.5%); NB (classifier): Movie (80.9%), Book (79.1%), DVD (78.1%), Electronics (78.1%)
* Choose the best classifier
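Part of the difficulty in comparing studies is that they report different measures. Where both precision and recall are given, the F1 score can at least be recomputed as their harmonic mean, which offers a quick consistency check on reported values. The short sketch below does this for the precision and recall values listed for [12] and [23] in Table 10; the function name is chosen for this example only.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both given on the same scale)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall (in %) as listed in Table 10.
f1_ref_12 = round(f1_score(72.0, 80.0), 1)   # Table 10 lists 75.8%
f1_ref_23 = round(f1_score(73.3, 85.7), 1)   # Table 10 lists 79%
```

Both recomputed values agree with the F1 scores reported in the table. Accuracy cannot be recovered from precision and recall alone, so studies that report only accuracy cannot be compared directly with those that report only precision, recall, and F1.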

Quality of review data: Most of the data required in SA are obtained from various sources, such as the web, social media, forums, and Twitter. Most of these sources can be accessed by anyone. The information contained in these sources may also be unreliable because consumers might deliberately give misleading opinions. Thus, the quality of the comments or opinions about products can be questionable. In addition, there may be spamming problems with the information available on the website. This can affect the quality of the output generated by SA. Therefore, it is necessary to identify spam in customers’ reviews [93, 94, 95]. A solution that can evaluate the quality of the reviews that appear in social media must be developed to ensure that the results generated from SA are reliable, highly accurate, and of good quality.

Short form words: Comments on websites may also contain words in their short form, such as “pics” for pictures, and “res” for resolution. This can make it difficult to interpret the actual meaning the user wished to convey. It could also have an adverse effect on feature identification and sentiment type, and cause a problem in the classification process. This could be resolved with the help of a linguistic expert, who could advise on the correct and precise language used.

Object identification: Identifying the object in each review is very important in SA. Failure to recognise the intended object in a user’s comment would make the comment useless. Researchers should be able to distinguish between related and unrelated objects in a user’s comment.

Feature extraction and FS: Based on previous studies, various approaches to feature extraction have been introduced in SA. Researchers need to understand the difference between feature extraction in SA and FS in machine learning. Lately, more studies have combined feature extraction and FS to obtain good feature subsets and to improve the performance of sentiment classification.

Synonym words: Researchers should also consider the functions of synonym words in SA. In [96], it was argued that synonym words need to be identified and grouped together. Additionally, [97] extensively discussed the function of synonym words. They identified that many distinctive features contain synonyms. Consequently, they concluded that the issue of synonym words in SA is a complex one that requires more detailed research. The author in [8] also argued that the process of grouping synonym words in SA is challenging. Meanwhile, [15] claimed that many synonyms are domain dependent. Different synonyms are used in different senses, and thus, not all synonym words can be generalised, since this practice would introduce more errors [15]. For example, for the domain “movie”, the words “movie” and “picture” are synonymous. However, for the domain “camera”, the word “picture” is synonymous with the word “photo”, and the word “movie” is more likely to be synonymous with the word “video”. Most feature expressions are multi-word phrases, and thus, they cannot be easily selected from dictionaries [8]. The author also argued that expressions describing the same feature are not necessarily synonyms, even within the same domain. For example, “expensive” and “cheap” can both indicate the feature “price”, but they are not synonymous (the two are antonyms).

The previously discussed challenges still prevail in the SA field. However, this study has critically reviewed and discussed findings from previous studies to identify the advantages and limitations of the various FS techniques applied in SA. The advantages and limitations of FS techniques for normal text classification that apply metaheuristic approaches were also reviewed. The potential of using this type of approach as an FS technique in SA was then considered, and the limitations that could be overcome by using a metaheuristic approach were identified.

Classification of the cross-domain sentiment: The cross-domain problem is equally challenging and needs to be taken into consideration in the sentiment classification process. The same words may have different sentiment polarities in different domains. For example, the word “hot” in “The room is very hot” conveys a negative sentiment in the domain “house”. However, in the sentence “The shower had great hot water,” “hot” has a positive sentiment in the domain “hotel”. The cross-domain problem can affect the performance of sentiment classification if it is not rectified. According to [98], the four main cross-domain problems are:

Sparsity – occurs when words or phrases in the target domain do not exist in the source domain.

Polysemy – the meaning of a word changes depending on whether it is in the target domain or in the source domain. This condition makes it difficult to test the accuracy of feature representation.

Feature divergence or feature mismatch – the mismatch between the domain-specific words/features of the source and target domains [98].

Polarity divergence – a feature that has different sentiment polarity in different domains.
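
Two of the problems listed above, sparsity and polarity divergence, can be illustrated with a minimal sketch. The two toy lexicons below are hypothetical and hand-written for illustration (only the “hot” example comes from the text); the sketch does not reproduce the method of [98].

```python
# Hypothetical toy lexicons: word -> polarity (+1 positive, -1 negative).
source_lexicon = {"hot": -1, "spacious": 1, "noisy": -1}      # e.g. "house" reviews
target_lexicon = {"hot": 1, "noisy": -1, "complimentary": 1}  # e.g. "hotel" reviews


def sparse_words(source, target):
    """Words seen in the target domain but absent from the source domain."""
    return sorted(set(target) - set(source))


def polarity_divergent_words(source, target):
    """Words present in both domains whose polarity differs."""
    return sorted(w for w in set(source) & set(target) if source[w] != target[w])


print(sparse_words(source_lexicon, target_lexicon))              # ['complimentary']
print(polarity_divergent_words(source_lexicon, target_lexicon))  # ['hot']
```

A classifier trained only on the source lexicon would have no information about “complimentary” and would assign the wrong polarity to “hot” in the target domain.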

In [98], a combination of the deep learning method and word embedding was suggested to overcome the cross-domain problems. In their study, [99] developed a sentiment-related index that could determine the association between different lexical elements in specific domains. The SentiRelated algorithm was then developed based on this index. This algorithm functions by adding features from the target domain to feature vectors that were extracted from the source domain, and it was capable of reducing the difference between the target domain and the source domain. Although previous studies have proposed numerous solutions to the cross-domain problems, several factors still need to be taken into consideration, such as the different languages used, the cultural factor, linguistic variations, and the different contexts and noise in the data.

4. Sentiment words (SWs)

4.1 Definition of a SW

The most important elements to consider in SA are “sentiment words”, which can act as indicators of sentiment. SWs are also known as “opinion words”. These words can be used to express positive or negative sentiments. For example, words that convey a positive sentiment may include “good”, “great”, and “excellent”, while negative SWs may include “bad”, “annoying”, and “angry”. A SW can also be represented by phrases and idioms, e.g., “It costs me an arm and a leg”. The identification of SWs and phrases is crucial for the success of SA. A list of such words and phrases is called a “sentiment lexicon” or “opinion lexicon” [8].

4.2 The importance of SW in SA

SW is an integral element in sentiment classification because it enables words, sentences, or documents to be categorised into positive or negative sentiments. In [18], it was stated that at the document level, the entire content of a document is identified with either a positive or a negative sentiment. Meanwhile, at the sentence level, the first step is to classify each sentence as either subjective or objective. An objective sentence expresses factual information, while a subjective sentence expresses subjective views and opinions [8]. To note, neither level of analysis (document or sentence) indicates users’ preferences. The author further argued that a finer-grained analysis can only be achieved through an analysis at the feature level. In a feature-level analysis, object identification, identification of the features of an object, and identification of the sentiments that users express about those features are crucially important. Additionally, SWs are important in determining the sentiments that users express regarding their level of satisfaction.

SWs can also help users assess an object, or a product for sale, by skimming through the results of the sentiment classification analysis from social media. Nonetheless, the amount of information in social media is overwhelming, which makes it slow and difficult for users to assess a product. Therefore, SA technology can assist users by automatically processing the SWs contained in customer review datasets. Among the steps in SA are text preprocessing, FS, the feature and SW matching process, and sentiment classification. Lastly, the outputs of the sentiment classification process can give an overall view of each product feature, namely, whether it falls under the positive or negative category.

4.3 Methods to identify the relationship between feature and SW

A. Nearby adjective

The concept of the nearby adjective was used by [14] to identify SWs or opinion words. The authors used adjectives as opinion words. A nearby adjective is an adjective that is adjacent to, and modifies, a noun or noun phrase that is a frequent feature. This method was applied when the distance between the feature and the SW was small. But what if they are distant from each other? What if there is more than one feature and sentiment word in the same sentence? How should the matching process between a feature and multiple sentiment words be performed? These conditions make it difficult to produce an exact match.
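
The nearby-adjective idea can be sketched as follows. This is a minimal illustration and not the implementation of [14]: the function name `nearest_adjective`, the `max_distance` window, and the hand-supplied Penn Treebank tags are all assumptions made for the example.

```python
def nearest_adjective(tagged, feature_index, max_distance=3):
    """Return the adjective closest to the feature token, or None.

    tagged: list of (token, pos_tag) pairs using Penn Treebank tags.
    max_distance: hypothetical window size, in tokens, around the feature.
    """
    best = None
    for i, (token, tag) in enumerate(tagged):
        if not tag.startswith("JJ"):
            continue
        distance = abs(i - feature_index)
        if distance <= max_distance and (best is None or distance < best[0]):
            best = (distance, token)
    return best[1] if best else None


# Hand-tagged example (tags are supplied manually, not produced by a tagger).
sentence = [("The", "DT"), ("battery", "NN"), ("is", "VBZ"), ("great", "JJ"),
            ("but", "CC"), ("the", "DT"), ("screen", "NN"), ("is", "VBZ"),
            ("dim", "JJ")]

print(nearest_adjective(sentence, 1))  # great
print(nearest_adjective(sentence, 6))  # dim
```

For “screen”, both “great” and “dim” fall inside the window and only the smaller distance decides the match, which shows how fragile this heuristic becomes when a sentence contains several features and SWs.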

B. Pattern knowledge

Pattern knowledge was used by [23] to extract features and to find the nearest SWs (adjectives or adverbs). Table 11 shows the list of patterns of extracted phrases. The authors used the Stanford POS tagger to tag each sentence and identify the noun, noun phrase, or verb group for feature extraction. They collected a set of SWs (adjectives) to determine the opinion orientation (negative or positive) of each sentence. If an adjective appeared close to a product feature in a sentence, it was considered an opinion word.

Table 11
Patterns of extracted phrases (adopted from [23])

Pattern	The first word	The second word	The third word
Pattern 1	JJ	NN/NNS	–
Pattern 2	JJ	NN/NNS	NN/NNS
Pattern 3	RB/RBR/RBS	JJ	–
Pattern 4	RB/RBR/RBS	JJ/RB/RBR/RBS	NN/NNS
Pattern 5	RB/RBR/RBS	VBN/VBD	–
Pattern 6	RB/RBR/RBS	RB/RBR/RBS	JJ
Pattern 7	VBN/VBD	NN/NNS	–
Pattern 8	VBN/VBD	RB/RBR/RBS	–

The following are some examples of sentences to illustrate how the pattern knowledge process works:

Sentence 1: This camera is perfect for an enthusiastic amateur photographer. Sentence 2: It is light enough to carry around all day without being cumbersome.

In Sentence 1, the feature “camera” is close to the SW “perfect”. In this sentence, the nearby adjective can be extracted as the SW because this sentence contains the feature. However, Sentence 2 does not contain a feature (camera), but it does contain the SW “light”. Thus, the feature in this sentence is known as an implicit feature. The weakness of this pattern knowledge process is that it cannot extract the SW in Sentence 2 because the sentence contains an implicit feature.
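
The patterns in Table 11 can be matched against a POS-tagged sentence with a simple sliding window. The sketch below is an assumption-laden illustration rather than the procedure of [23]: the names `PATTERNS` and `match_patterns` are hypothetical, and the input tags are written by hand.

```python
ADVERB = {"RB", "RBR", "RBS"}
NOUN = {"NN", "NNS"}
VERB = {"VBN", "VBD"}

# Table 11: one tuple of allowed-tag sets per pattern (two or three words).
PATTERNS = {
    1: ({"JJ"}, NOUN),
    2: ({"JJ"}, NOUN, NOUN),
    3: (ADVERB, {"JJ"}),
    4: (ADVERB, {"JJ"} | ADVERB, NOUN),
    5: (ADVERB, VERB),
    6: (ADVERB, ADVERB, {"JJ"}),
    7: (VERB, NOUN),
    8: (VERB, ADVERB),
}


def match_patterns(tagged):
    """Return (pattern_id, phrase) for every window that matches a pattern."""
    matches = []
    for start in range(len(tagged)):
        for pattern_id, pattern in PATTERNS.items():
            window = tagged[start:start + len(pattern)]
            if len(window) == len(pattern) and all(
                tag in allowed for (_, tag), allowed in zip(window, pattern)
            ):
                matches.append((pattern_id, " ".join(tok for tok, _ in window)))
    return matches


# Hand-tagged phrase used only for illustration.
tagged = [("very", "RB"), ("good", "JJ"), ("picture", "NN"), ("quality", "NN")]
print(match_patterns(tagged))
```

Overlapping patterns (for example, Patterns 1 and 2) both fire on the same words, so a real system would need a rule for choosing among them; this sketch simply reports every match.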

C. Dependency relationship

The authors in [9] proposed the concept of dependency analysis to extract product features and to identify SWs related to these features. The authors used the Stanford typed dependencies to represent a simple description of the grammatical relations in a sentence. These typed dependencies contained 50 grammatical relations [100, 101]. The grammatical relations were arranged in a hierarchy, and each relation linked a head word to a dependent word. Figure 3 shows an example of the dependency relationship for a sentence.

They used the Stanford lexicalised parser to compute the syntactic parse tree. There is a huge selection of linguistic structures that could express the relationships between features and SWs. The shortest dependency path and syntactic relationship were combined to develop six syntactic relationships between the product feature and the SW, as shown in Table 12.

Table 12

Six syntactic relationships (adopted from [9])

Type of relationship	Description
Parent	Opinion depends on the product feature.
Child	Product feature depends on the opinion.
Sibling	Both the opinion and the product feature depend on the same word.
Grandparent	Opinion depends on the word, which depends on the product feature.
Grandchild	Product feature depends on the word, which depends on the opinion.
Indirect	None of the above relations.

Figure 3.

Example of a dependency relationship in a sentence.
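
The six relationship types in Table 12 can be expressed as checks on a head map. The sketch below is illustrative only and is not the implementation of [9]: `relation_type` is a hypothetical helper, the parses are written by hand rather than produced by the Stanford parser, and words are used as dictionary keys on the assumption that each word occurs once in the sentence.

```python
def relation_type(heads, opinion, feature):
    """Classify the opinion-feature relation using the categories of Table 12.

    heads maps each word to the word it depends on (None for the root).
    """
    opinion_head = heads.get(opinion)
    feature_head = heads.get(feature)
    if opinion_head == feature:
        return "Parent"        # opinion depends on the feature
    if feature_head == opinion:
        return "Child"         # feature depends on the opinion
    if opinion_head is not None and opinion_head == feature_head:
        return "Sibling"       # both depend on the same word
    if opinion_head is not None and heads.get(opinion_head) == feature:
        return "Grandparent"   # opinion -> word -> feature
    if feature_head is not None and heads.get(feature_head) == opinion:
        return "Grandchild"    # feature -> word -> opinion
    return "Indirect"


# Hand-written parse of "The camera takes great pictures" (assumed, not parser output).
heads = {"The": "camera", "camera": "takes", "takes": None,
         "great": "pictures", "pictures": "takes"}
print(relation_type(heads, "great", "pictures"))  # Parent
print(relation_type(heads, "great", "camera"))    # Indirect

# Hand-written parse of "The lens is sharp", with the adjective as the root.
heads_copula = {"The": "lens", "lens": "sharp", "is": "sharp", "sharp": None}
print(relation_type(heads_copula, "sharp", "lens"))  # Child
```

A production version would index tokens by position instead of by word so that repeated words do not collide.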

D. Typed dependency relations

In [102], the author used three typed dependency relation (TDR) parameters, namely, ACOMP, XCOMP, and ADVMOD, to identify sentiment sentences and their relations with related features. The author also attempted to draw the relationship between features and SWs in a customer review dataset and showed that more than three TDR relations were involved, which exceeds the number typically used. According to [101], TDR in the Stanford Parser has approximately 50 grammatical relations.

TDR is clear because it involves a direct link between one word and another in the same sentence. Apart from that, TDR has a closed link, with simple and easy to understand semantic relationships, which facilitate the next interpretation process [100, 103, 104]. TDR also uses Standard English grammatical relationships [103]. They ultimately suggested that this approach is capable of identifying features and SWs in longer and more complex sentences found in customer reviews. Thus, this approach should be explored further.

E. Dependency relations

Similarly, [105] used typed dependency relations in their study to identify the relationship between features and SWs. They identified two stages in the process of identifying the relationships between features and SWs, known as the word stage and the phrase stage. In this context, fuzzy measurements were used to calculate sentiment phrasal words and opinion degree intensifiers. Each SW has a weightage based on a fuzzy measurement, while the weightage for a customer review is obtained with a fuzzy operation. The frequency with which a feature SW occurs in each review and the fuzzy weightage value for each sentiment are the two main aspects in determining the weightage value for each customer review. Datasets from different products, namely, a Canon camera, a Casio watch, and Nike shoes, were used to test the proposed algorithm. Based on these datasets, a test that consisted of five main stages (sentiment classification, orientation evaluation and prediction, FS, sentiment word extraction, and the extraction of feature and SW relations) was carried out to validate the method. The proposed method performed well compared with equivalent algorithms. However, according to [105], this method has a few drawbacks, listed as follows:

The proposed algorithm could not generate dependency relation for long sentences and was unable to extract feature and sentiment words.

No strategies to accurately calculate the polarity value of ironic and subjunctive expressions in customer reviews.

The use of WordNet dictionary was the most appropriate solution to externally correct wrongly spelled words.

F. Walk and Learn

In [106], a novel two-stage method, named Walk and Learn, was proposed. In the first stage, the authors proposed the Sentiment Graph Walking algorithm to cope with the problem of false opinion relations. The sentiment graph was combined with random walking to estimate the confidence of each pattern. Terms that were extracted using low-confidence patterns were, in turn, given low confidence, which could improve the accuracy of the extraction. Based on the results from the first stage, the following problems had to be taken into consideration:

False opinion target – there are expressions of opinions that contain non-target terms, such as “good thing”, “nice people” in the review.

Low degree of long-tail opinion targets in a Sentiment Graph.

Hence, they used self-learning strategies during the second stage to filter false opinion targets and to extract the long-tail opinion targets missed in the first stage.

G. A word vector and matrix factorisation

In [107], the author argued that most existing extraction methods use dictionaries. The main weakness of using dictionaries is the difficulty in identifying domain-dependent sentiment words. Meanwhile, a corpus-based method is dependent on sentiment seed words, but it has limited sentiment information and does not take context information into account. To address this problem, they developed a word vector and matrix factorisation-based method for extracting an opinion lexicon. The results showed that the proposed method achieved the highest accuracy in identifying the sentiment polarities of opinion words.

4.4 The challenges in identifying features and SWs in SA

The three main characteristics that need to be identified from consumers’ comments are entities or objects, characteristics or feature of an object, and SWs that are associated with each object feature. The process of identifying the actual feature in a document or sentence is very important because it plays a major role in determining the actual objects being reviewed or commented on by the consumers. Once the feature is identified in a sentence, the next step is to identify the SW that is associated with the feature of the sentence. However, if the identification of the actual feature and SW fails, this could result in an inaccurate SA output. Inaccurate analysis output could have a poor impact, and the results of the analysis may not help consumers to obtain the actual information regarding matters that concern them, such as products, services, and politics. The relationship between feature and SWs should be accurate so that the SA can generate accurate results. Nonetheless, problems may arise if there is more than one feature and SW in the same sentence. Appropriate methodologies are required to generate a corresponding relationship between feature and SW.

According to [8, 96, 105], identifying features, the SWs that express sentiment about them, and the relationships between features and sentiment expressions in a sentence is very challenging. The challenges discussed in the following subsections must be taken into account when developing a system related to SA.

A. SW has different meanings

A SW can carry different meanings, polarity, and orientation depending on the context in which it is used. The following sentences both use the same SW, “small”:

Sentence 1: The camera has a small size; (size $\rightarrow$ feature; small $\rightarrow$ sentiment word). Sentence 2: The camera has a small battery life; (battery life $\rightarrow$ feature; small $\rightarrow$ sentiment word).

Sentence 1 has a positive orientation because it describes a small-sized camera that can be carried anywhere easily. However, Sentence 2 has a negative orientation because it describes the camera as having a short battery life, thus, the battery needs to be charged or replaced regularly. In addition, the use of negation words, such as “not” and “but” could also affect the orientation and polarity of a sentence, for example:

Sentence 1: The sound is clear. Sentence 2: The sound is not clear.

Sentence 1 illustrates a positive orientation, whereas Sentence 2 has a negative orientation. This difference is due to the use of the negation word, “not”, in Sentence 2. A negation word can change the orientation of a sentence from positive to negative and vice versa. Several studies have discussed the use of negation words. In [108], the roles of negation words in SA and the various approaches to handle negation words were presented. Additionally, [3, 108] stated that negation words in ironic and sarcastic sentences are difficult to identify. Negation is complicated because it is not confined to common negation words, such as “not”, “never”, and “no”; it also encompasses lexical units, such as phrasal verbs. Here, “lexical” means relating to the vocabulary of a language. Consequently, to handle negation words, researchers must be well-versed in the types of sentences in the datasets, grammar structures, lexical knowledge, semantic relations, and syntactic patterns [109]. Additionally, the reliable identification of genuine polar expressions in specific contexts must be considered in SA [108]. Thus, these conditions must be taken into account in order to generate a detailed and meaningful output.
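
The two effects described in this subsection, context-dependent polarity and negation, can be sketched with a small lookup. Everything in the block is an illustrative assumption: the `PAIR_POLARITY` table is hand-written from the example sentences above, and `pair_orientation` only checks for a common negation word directly before the SW, so it does not handle irony, sarcasm, or phrasal negation.

```python
# Hypothetical polarity table keyed on (feature, sentiment word): +1 or -1.
PAIR_POLARITY = {
    ("size", "small"): 1,
    ("battery life", "small"): -1,
    ("sound", "clear"): 1,
}
NEGATIONS = {"not", "never", "no"}


def pair_orientation(feature, sentiment_word, tokens):
    """Look up the pair polarity; flip it if a negation directly precedes the SW."""
    polarity = PAIR_POLARITY.get((feature, sentiment_word), 0)
    position = tokens.index(sentiment_word)
    if position > 0 and tokens[position - 1] in NEGATIONS:
        polarity = -polarity
    return polarity


print(pair_orientation("size", "small", "the camera has a small size".split()))  # 1
print(pair_orientation("battery life", "small",
                       "the camera has a small battery life".split()))           # -1
print(pair_orientation("sound", "clear", "the sound is clear".split()))          # 1
print(pair_orientation("sound", "clear", "the sound is not clear".split()))      # -1
```

Keying the table on the (feature, SW) pair rather than on the SW alone is what lets “small” be positive for “size” and negative for “battery life”.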

B. Implicit SW

SWs can also be categorised as explicit or implicit. The sentence “Orange tastes great” contains the explicit SW “great”. In comparison, the sentence “I bought the mattress a week ago, and a valley has formed” expresses its sentiment implicitly. According to [8], it is difficult to identify an implicit SW, and most previous studies have focused only on sentences that have explicit SWs.

C. Different words to describe a feature

According to [110], consumers use different words to describe the features of a product. For example, both “picture” and “photo” refer to the same feature of a camera. A synonym is a word that has a similar meaning to another word. Synonyms can be found in some dictionaries. Researchers in the field of SA use a lexicon, which is similar to a dictionary. Examples of lexicons are WordNet and SentiWordNet, which have been used in various studies [14, 97, 111, 112] to check for synonyms that are used in the dataset. There are several constraints in the process of identifying a feature when synonyms are involved:

1. Many words that are not synonyms in the lexicon refer to the same feature in the application domain. For example, “appearance” and “design” are not synonymous, but both refer to the feature “design” [110].
2. Many synonyms are dependent on the domain. For example, “movie” and “picture” are synonyms in a film review. However, these words are used differently in a camera review, where “picture” is more synonymous with “photo”, whilst “movie” is more synonymous with “video”.

Therefore, according to [110], there is a need for SA to have a group of features that contains a list of the same features of an object or product. In addition, they also recommended the use of the Expectation-Maximisation (EM) algorithm as a semi-supervised learning method to classify all features into several topics according to consumers’ opinions. To meet the needs of consumers, they labelled the data for each topic. Then, the system functioned according to the semi-supervised learning method for categorizing the feature list. In [113], EM was used and two assumptions were established for producing good results:

1. The lists of features that share words with similar terminologies are gathered in the same group, for example, “battery life” and “battery power”.
2. The lists of features that have similar synonyms in the dictionary are considered as originating from the same group, for example, “movie” and “picture”.

These assumptions could facilitate the classification process when the EM technique is used. In contrast, [2] used the concept of constrained-latent Dirichlet allocation (LDA) to categorise the features. In this technique, two types of constraints were used to group the feature expressions found in sentences:

1. If two expressions of a feature share one or more words, they form a “Must-Link”, and the expressions should appear in the same topic, for example, “battery power” and “battery life”.
2. If two expressions of a feature appear in one sentence, they form a “Cannot-Link”. This is because the same feature is unlikely to be repeated within a sentence, for example, “I like the picture quality, battery life, and zoom of this camera”.
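
The two constraint types can be derived from a list of feature expressions with a few lines of code. The sketch below only extracts the constraints; it does not implement constrained-LDA itself, and the function names `must_links` and `cannot_links` are hypothetical.

```python
from itertools import combinations


def must_links(expressions):
    """Pairs of feature expressions that share at least one word."""
    return {
        (a, b)
        for a, b in combinations(sorted(expressions), 2)
        if set(a.split()) & set(b.split())
    }


def cannot_links(sentence_expressions):
    """Pairs of feature expressions that co-occur in the same sentence."""
    links = set()
    for expressions in sentence_expressions:
        links.update(combinations(sorted(set(expressions)), 2))
    return links


expressions = ["battery life", "battery power", "picture quality", "zoom"]
# Feature expressions found in "I like the picture quality, battery life, and zoom of this camera".
sentences = [["picture quality", "battery life", "zoom"]]

print(must_links(expressions))
print(cannot_links(sentences))
```

Here the only Must-Link pair is (“battery life”, “battery power”), while the example sentence yields three Cannot-Link pairs among “battery life”, “picture quality”, and “zoom”.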

D. Implicit feature

According to [14], features can be divided into two categories, namely, explicit and implicit features. An explicit feature is stated directly in a sentence, for example, “The movie quality of this video is great”. This sentence clearly describes “movie quality” (noun/noun phrase) as the feature and “great” as the SW. However, a sentence that contains an implicit feature does not have any noun or noun phrase describing the feature of an object, such as a camera. An example of a sentence that contains an implicit feature is “This camera is light”. This sentence only indirectly describes the weight, which is the feature of the camera: only the adjective “light” is used to represent the feature of the object, the camera. In addition, an adverb, verb, or verb phrase can be used to represent an implicit feature [8]. Because an implicit feature is hidden in the sentence, extra care is needed to identify the actual feature commented on by consumers.
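
One simple way to recover an implicit feature is to map cue words to the feature they usually describe. The sketch below is a hypothetical illustration, not a method taken from the reviewed studies: the `IMPLICIT_FEATURE_CUES` table is hand-written (“light” and “weight” come from the example above; the other entries are assumptions), and such a table would be domain specific in practice.

```python
# Hypothetical cue table: adjective -> the feature it usually implies.
IMPLICIT_FEATURE_CUES = {
    "light": "weight",
    "heavy": "weight",
    "expensive": "price",
    "cheap": "price",
}


def implicit_features(tokens, explicit_features):
    """Return features implied by cue words when no explicit feature is present."""
    if any(token in explicit_features for token in tokens):
        return []
    return [IMPLICIT_FEATURE_CUES[t] for t in tokens if t in IMPLICIT_FEATURE_CUES]


known_features = {"weight", "price", "battery"}
print(implicit_features("this camera is light".split(), known_features))  # ['weight']
print(implicit_features("the price is cheap".split(), known_features))    # []
```

The second call returns an empty list because “price” already appears explicitly, so no implicit feature needs to be inferred.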

E. Pronouns

A word that replaces the noun in a sentence is called a pronoun. The use of pronouns makes it challenging to identify the actual feature that represents the noun in the sentences of users’ comments. In [114], the authors studied and analysed the importance of pronouns in sentences and their effect on sentiment classification. A mechanism or approach is needed to identify the feature represented by the pronouns in sentences. For example, “This one was rated very highly by several people, who checked out this site and epinions.com” has the word “one”, which was used as a pronoun to replace the noun “camera” as the feature.

5. Discussion

This review was conducted to determine the prospects of applying the metaheuristic approach as a FS technique in SA. To achieve this objective, research articles on FS techniques from 2003 to 2016 were studied. This review has concluded that FS techniques in sentiment classification can be divided into two categories: i) based on natural language processing; and ii) based on a combination of NLP and machine learning techniques. This review has also identified the research gaps in previous literature based on the positive and negative criteria of each FS technique.

Additionally, the differences between domains were reviewed with respect to implementing the metaheuristic approach as a FS technique. As explained in Section 3.5, traditional text classification classifies documents by topic, whereas sentiment classification classifies features, sentences, or documents into positive or negative sentiments. The similarity between these domains is that the datasets are presented in the form of textual data. It was observed that metaheuristic approaches have been extensively used as FS techniques in traditional text classification. Thus, a review of previous literature was conducted to identify the weaknesses and limitations of metaheuristic approaches. Based on Table 6, the genetic algorithm is one of the most widely used metaheuristic approaches for FS in SA. However, findings from the literature in Sections 3.4 and 3.5 have shown that ACO possesses a better exploration capability, a stronger search capability in the problem space, and higher efficiency in producing minimal feature subsets compared to the GA approach [77]. Accordingly, this review holds that metaheuristic approaches have the potential to be used as FS techniques in SA. Although the domains of sentiment classification and traditional text classification are different, both remain closely linked to text classification.

The technique for detecting the relationships between features and SWs is an important aspect of SA. Failure to accurately detect the feature, the SW, and the relationship between the two could adversely affect the sentiment classification process. Consequently, the efficiency of the classification process would greatly decrease. The following are a few suggestions to address the limitations of current methods:

1. A more systematic FS technique that is able to perform search processes more efficiently should be used, for example, the metaheuristic approach. In [77], the authors showed that the metaheuristic approach is able to produce an optimal and high-quality feature subset.
2. In the initial stage, customer review datasets have to undergo a data cleaning process to fix spelling and grammatical errors. The use of word processing software, such as Microsoft Word, could help in ensuring that the words are grammatical and correctly spelled [115]. This process is important to ensure that part-of-speech tagging and the building of typed dependency relations, which come in later stages, are not affected. Failure to execute this process might affect the performance of sentiment classification.
3. The feature-SW relation algorithm could interact directly with the Stanford API to generate POS tags for each word and to formulate typed dependency relations between the words in a sentence. This could help to create dependencies for long sentences and automate the process of labelling the grammar in sentences. This process could also help to identify features and sentiments that are present in sentences.
4. Features represented by pronouns and implicit words would need a NLP approach to be correctly identified.

6. Conclusion

This paper has critically reviewed the functions of FS and SWs, as well as the relationship between FS techniques and SW in SA. It has also looked at various FS techniques that have been implemented to select the features in SA. The functions of the feature and SW in SA are very important, as mistakes made in selecting features or the actual SWs expressed in user comments could lead to wrong interpretations when the sentiments in the sentence are analysed. This review has also identified a number of studies that have combined NLP and non-NLP techniques, where this combination was able to increase the accuracy of sentiment classification. The selection of suitable NLP or non-NLP techniques should be carefully considered and matched with the problems at hand so that the solution can improve the classification result. This review of the literature in the SA domain has also identified the metaheuristic approach as a potential FS technique for SA, but a limited number of studies have utilised it in their work. This finding was revealed by comparing metaheuristic approaches that were implemented for selecting features in SA, and with those that were used in normal text classification. After analysing numerous articles, this review found that there is clearly a lot of room for research on the FS implementation process and on improving the metaheuristic approach for utilisation in SA.

Acknowledgments

The authors gratefully acknowledge Universiti Pertahanan Nasional Malaysia, the Ministry of Education Malaysia, and the Fundamental Research Grant Scheme for supporting this research project through grant no. FRGS/1/2016/ICT02/UKM/01/2.

References

Bagheri

Saraee

and de Jong

, Care more about customers: Unsupervised domain-independent aspect detection for sentiment analysis of customer reviews, Knowledge-Based Systems 52 (2013), 201–213.

Zhai

Liu

and Peifa

, Constrained LDA for Grouping Product Features in Opinion Mining, in: Huang

Cao

Srivastava

(Eds.), Proceedings of the 15th Pacific-Asia Conference on Advances in Knowledge Ddscovery and Data Mining, Springer, 2011, pp. 448–459. http://www.springerlink.com/index/Q73120K14T206727.pdf.

Pang

and Lee

, Opinion mining and sentiment analysis, Foundations and Trends® in Information Retrieval 2 (2008), 1–135. doi: 10.1561/1500000001.

Khan

Baharudin

B.B.

Khan

and e-Malik

, Mining opinion from text documents: A survey, in: 2009 3rd IEEE International Conference on Digital Ecosystems and Technologies, 2009.

Vinodhini

and Chandrasekaran

, Sentiment analysis and opinion mining: A survey, International Journal of Advanced Research in Computer Science and Software Engineering 2 (2012), 282–292.

Khan

Baharudin

Khan

and Ullah

, Mining opinion components from unstructured reviews: A review, Journal of King Saud University – Computer and Information Sciences 26 (2014), 258–275. https://www-sciencedirect-com.web.bisu.edu.cn/science/article/pii/S131915781400010X%5Cnhttps://dx-doi-org.web.bisu.edu.cn/10.1016/j.jksuci.2014.03.009.

Medhat

Hassan

and Korashy

, Sentiment analysis algorithms and applications: A survey, Ain Sham Engineering Journal 5 (2014), 1093–1113.

Liu

, Sentiment Analysis and Opinion Mining, Morgan & Claypool Publishers, 2012.

Somprasertsri

and Lalitrojwong

, Mining features-opinion in online customer reviews for opinion summarization, Journal of Universal Computer Science 16 (2010), 938–955.

10.

Abbasi

Chen

and Salem

, Sentiment analysis in multiple languages: Feature selection for opinion classification in web forums, ACM Transactions on Information Systems 26 (2008), 1–34. doi: 10.1145/1361684.1361685.

11.

Popescu

A.-M.

and Etzioni

, Extracting Product Features and Opinions from Reviews, in: Proceedings of the Conference on Human Language Technology and Empirical Methods in Natural Language Processing HLT 05, Association for Computational Linguistics, 2005, pp. 339–346. doi: 10.3115/1220575.1220618.

12.

and Liu

, Mining Opinion Features in Customer Reviews, in: Proceeding AAAI’04 Proceedings of the 19th National Conference on Artifical Intelligence, AAAI Press, San Jose, California, 2004, pp. 755–760.

13.

and Liu

14.

and Liu

, Mining and Summarizing Customer Reviews, in: Proceedings of the 2004 ACM SIGKDD International Conference on Knowledge Discovery and Data Mining KDD 04, 2004, pp. 168–177. doi: 10.1145/1014052.1014073.

15.

Liu

and Yu

, Toward integrating feature selection algorithms for classification and clustering, IEEE Transactions on Knowledge and Data Engineering 17 (2005), 491–502. doi: 10.1109/TKDE.2005.66.

16.

Fürnkranz

, A study using n-gram features for text categorization, Austrian Research Institute for Artifical Intelligence, 1998, 1–10. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.49.133&rep=rep1&type=pdf.

17.

Agarwal

and Mittal

, Optimal Feature Selection for Sentiment Analysis, in: Gelbukh

(Ed.), Computational Linguistics and Intelligent Text Processing, Springer Berlin Heidelberg, 2013, pp. 13–24.

18.

Pang

Lee

and Vaithyanathan

, Thumbs up? Sentiment Classification using Machine Learning Techniques, in: Proceedings of the 2002 Conference on Empirical Methods in Natural Language Processing (EMNLP), Association for Computational Linguistics, 2002, pp. 79–86. doi: 10.3115/1118693.1118704.

19.

Abbasi

France

Zhang

and Chen

, Selecting attributes for sentiment classification using feature relation networks, IEEE Transactions on Knowledge and Data Engineering 23 (2011), 447–462. doi: 10.1109/TKDE.2010.110.

20.

Abbasi

Chen

Thoms

and Fu

T.F.T.

, Affect analysis of web forums and blogs using correlation ensembles, IEEE Transactions on Knowledge and Data Engineering 20 (2008). doi: 10.1109/TKDE.2008.51.

21.

Wiebe

Wilson

Bruce

Bell

and Martin

, Learning subjective language, Journal Computational Linguistic 30 (2004), 277–308.

22.

Carenini

R.T.

and Zwart

, Extracting knowledge from evaluative text, in: KCAP 05 Proceedings of the 3rd International Conference on Knowledge Capture, ACM Press, 2005, pp. 11–18. doi: 10.1145/1088622.1088626.

23.

Htay

S.S.

and Lynn

K.T.

, Extracting product features and opinion words using pattern knowledge in customer reviews, The Scientific World Journal 2013 (2013).

24.

Khan

Baharudin

and Khan

, Identifying product features from customer reviews using hybrid patterns, International Arab Journal of Information Technology 11 (2014), 281–285.

25.

Siqueira

and Barros

, A Feature Extraction Process of Sentiment Analysis of Opinions on Services, in: Proceedings of International Workshop on Web and Text Intelligence, 2010.

26.

Asghar

M.Z.

Khan

Ahmad

and Kundi

F.M.

, A review of feature extraction in sentiment analysis, J. Basic. Appl. Sci. Res. 4 (2014), 181–186.

27.

Jeong

Shin

and Choi

, FEROM: Feature extraction and refinement for opinion mining, ETRI Journal 33 (2011), 720–730.

28. John G.H., Kohavi and Pfleger, Irrelevant Features and the Subset Selection Problem, in: Machine Learning: Proceedings of the Eleventh International Conference, Morgan Kaufmann Publishers, 1994, pp. 121–129. http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.30.3875.

29. Jain and Zongker, Feature selection: Evaluation, application, and small sample performance, IEEE Transactions on Pattern Analysis and Machine Intelligence 19 (1997). doi: 10.1109/34.574797.

30. Jain A.K., Duin R.P.W. and Mao, Statistical pattern recognition: A review, IEEE Transactions on Pattern Analysis and Machine Intelligence 22 (2000), 4–37. doi: 10.1109/34.824819.

31. Nicholls and Song, Comparison of Feature Selection Methods for Sentiment Analysis, in: AI’10: Proceedings of the 23rd Canadian Conference on Advances in Artificial Intelligence, Springer-Verlag, Berlin, Heidelberg, 2010, pp. 286–289.

32. Guyon and Elisseeff, An introduction to variable and feature selection, Journal of Machine Learning Research 3 (2003), 1157–1182. doi: 10.1162/153244303322753616.

33. Blum A.L. and Langley, Selection of relevant features and examples in machine learning, Artificial Intelligence 97 (1997), 245–271. doi: 10.1016/S0004-3702(97)00063-5.

34. Dash and Liu, Feature selection for classification, Intelligent Data Analysis 1 (1997), 131–156. doi: 10.1016/S1088-467X(97)00008-5.

35. Yu and Liu, Efficient feature selection via analysis of relevance and redundancy, Journal of Machine Learning Research 5 (2004), 1205–1224. doi: 10.1145/1014052.1014149.

36. Koncz and Paralic, An Approach to Feature Selection for Sentiment Analysis, in: International Conference on Intelligent Engineering System (INES 2011), IEEE, 2011, pp. 357–362. doi: 10.1109/INES.2011.5954773.

37. Agarwal and Mittal, Sentiment Classification using Rough Set based Hybrid Feature Selection, in: Proceedings of the 4th Workshop on Computational Approaches to Subjectivity, Sentiment & Social Media Analysis (WASSA 2013), Association for Computational Linguistics, 2013, pp. 115–119.

38. Adnan and Song, Feature Selection for Sentiment Analysis Based on Content and Syntax Models, in: Proceedings of the 2nd Workshop on Computational Approaches to Subjectivity and Sentiment Analysis, ACL-HLT 2011, Association for Computational Linguistics, 2011, pp. 96–103.

39. Mejova and Srinivasan, Exploring Feature Definition and Selection for Sentiment Classifiers, in: Fifth International AAAI Conference on Weblogs and Social Media, 2011, pp. 546–549.

40. Kim H.D., Ganesan, Sondhi and Zhai, Comprehensive Review of Opinion Summarization, Illinois Environment, 2011, 1–30.

41. Agrawal and Srikant, Fast Algorithms for Mining Association Rules in Large Databases, in: Bocca J.B., Jarke and Zaniolo (Eds.), Journal of Computer Science and Technology, Morgan Kaufmann Publishers Inc., 1994, pp. 487–499. doi: 10.1007/BF02948845.

42. Dave, Lawrence and Pennock D.M., Mining the peanut gallery: Opinion extraction and semantic classification of product reviews, in: Proceedings of the 12th International Conference on World Wide Web, 2003, pp. 519–528. doi: 10.1145/775152.775226.

43. Kohavi and John G.H., Wrappers for Feature Subset Selection, Artificial Intelligence 97 (1997), 273–324. doi: 10.1016/S0004-3702(97)00043-X.

44. Das, Filters, wrappers and a boosting-based hybrid for feature selection, Engineering, 2001, 74–81. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.124.5264&rep=rep1&type=pdf.

45. Saeys, Inza and Larrañaga, A review of feature selection techniques in bioinformatics, Bioinformatics (Oxford, England) 23 (2007), 2507–2517. doi: 10.1093/bioinformatics/btm344.

46. Généreux and Santini, Exploring the use of linguistic features in sentiment analysis, in: Proc. Corpus Linguistics Conference, 2007, pp. 27–30.

47. Sharma and Dey, A Comparative Study of Feature Selection and Machine Learning Techniques for Sentiment Analysis, in: RACS ’12: Proceedings of the 2012 ACM Research in Applied Computation Symposium, ACM, New York, NY, USA, 2012, pp. 1–7.

48. Yi, Nasukawa, Bunescu and Niblack, Sentiment analyzer: Extracting sentiments about a given topic using natural language processing techniques, in: Proceedings of the Third IEEE International Conference on Data Mining, 2003, p. 427.

49. Gamon, Sentiment classification on customer feedback data: noisy data, large feature vectors, and the role of linguistic analysis, in: Proceedings of the 20th International Conference on Computational Linguistics COLING, Association for Computational Linguistics, 2004, pp. 611–617. doi: 10.3115/1220355.1220476.

50. Ng, Dasgupta and Arifin S.M.N., Examining the Role of Linguistic Knowledge Sources in the Automatic Identification and Classification of Reviews, in: Proceedings of the COLING/ACL on Main Conference Poster Sessions, 2006, pp. 611–618. doi: 10.3115/1273073.1273152.

51. Tsutsumi, Shimada and Endo, Movie Review Classification Based on Multiple Classifier, in: Proc. 21st Pacific Asia Conf. Language Information and Computation, 2007, pp. 481–488.

52. Riloff, Patwardhan and Wiebe, Feature subsumption for opinion analysis, in: Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing – EMNLP ’06, 2006, p. 440. doi: 10.3115/1610075.1610137.

53. Nguyen, Franke and Petrović, Optimizing a Class of Feature Selection Measures, in: Proceedings of the NIPS 2009 Workshop on Discrete Optimization in Machine Learning: Submodularity, Sparsity & Polyhedra (DISCML), 2009.

54. Yu and Liu, Feature Selection for High-Dimensional Data: A Fast Correlation-Based Filter Solution, in: International Conference on Machine Learning (ICML), 2003, pp. 1–8.

55. Rahman S.A., Multivariate Filter with Particle Swarm Optimization Variants for Feature Selection in Complex Datasets, 2011.

56. Zhu, Wang and Mao J.T., Sentiment Classification using Genetic Algorithm and Conditional Random Field, in: Information Management and Engineering (ICIME), 2010 The 2nd IEEE International Conference on, 2010, pp. 193–196.

57. Liu, Sentiment analysis: A multi-faceted problem, IEEE Intelligent Systems 25 (2010), 76–80.

58. Jensen and Shen, Fuzzy-rough sets assisted attribute selection, IEEE Transactions on Fuzzy Systems 15 (2007). doi: 10.1109/TFUZZ.2006.889761.

59. Jensen and Shen, Fuzzy-rough attribute reduction with application to web categorization, Fuzzy Sets and Systems 141 (2004), 469–485.

60. Jensen and Shen, New approaches to fuzzy-rough feature selection, IEEE Transactions on Fuzzy Systems 17 (2009). doi: 10.1109/TFUZZ.2008.924209.

61. Vinodhini and Chandrasekaran R.M., A comparative performance evaluation of neural network based approach for sentiment classification of online reviews, Journal of King Saud University – Computer and Information Sciences 28 (2016), 2–12.

62. Tang, Wei, Yang, Zhou, Liu and Qin, Learning sentiment-specific word embedding for twitter sentiment classification, Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, 2014, 1555–1565.

63. Socher, Perelygin, Wu J.Y., Chuang, Manning C.D., Ng A.Y., et al., Recursive deep models for semantic compositionality over a sentiment treebank, Proceedings of the … 2013, 1631–1642. http://nlp.stanford.edu/~socherr/EMNLP2013_RNTN.pdf, http://www.aclweb.org/anthology/D13-1170, http://aclweb.org/supplementals/D/D13/D13-1170.Attachment.pdf, http://oldsite.aclweb.org/anthology-new/D/D13/D13-1170.pdf.

64. Bai, Predicting consumer sentiments from online text, Decision Support Systems 50 (2011), 732–742.

65. Das and Bandyopadhyay, Subjectivity Detection using Genetic Algorithm, in: Proceedings of the 1st Workshop on Computational Approaches to Subjectivity and Sentiment Analysis, 2010, p. 14. http://www.amitavadas.com/Pub/GA.pdf.

66. Kalaivani and Shunmuganathan K.L., Feature Reduction Based on Genetic Algorithm and Hybrid Model for Opinion Mining, Scientific Programming, 2015, 15. doi: 10.1155/2015/961454.

67. Ahmad S.R., Abu Bakar and Yaakub M.R., Metaheuristic Algorithms for Feature Selection in Sentiment Analysis: A Review, in: Science and Information Conference (SAI), 2015, pp. 222–226.

68. Kotelnikov E.V. and Pletneva M.V., Text sentiment classification based on a genetic algorithm and word and document co-clustering, Journal of Computer and Systems Sciences International 55 (2016), 106–114. doi: 10.1134/S1064230715060106.

69. Liu, Sun, Peng and Wang, Sentiment recognition of online course reviews using multi-swarm optimization-based selected features, Neurocomputing 185 (2016), 11–20.

70. Yusta S.C., Different metaheuristic strategies to solve the feature selection problem, Pattern Recognition Letters 30 (2009), 525–534. doi: 10.1016/j.patrec.2008.11.012.

71. Unler and Murat, A discrete particle swarm optimization method for feature selection in binary classification problems, European Journal of Operational Research 206 (2010), 528–539. doi: 10.1016/j.ejor.2010.02.032.

72. Mafarja and Eleyan, Ant colony optimization based feature selection in rough set theory, International Journal of Computer Science and Electronics Engineering (IJCSEE) 1 (2013), 244–247.

73. Osman I.H. and Laporte G., Metaheuristics: A bibliography, Annals of Operations Research 63 (1996), 511–623.

74. Blum and Roli, Metaheuristics in combinatorial optimization: Overview and conceptual comparison, ACM Computing Surveys 35 (2003), 268–308. doi: 10.1145/937503.937505.

75. Patel and Gandhi, A Detailed Study on Text Mining using Genetic Algorithm, International Journal of Engineering Development and Research, 2004, 101–105.

76. Aghdam M.H., Ghasem-Aghaee and Basiri M.E., Application of Ant Colony Optimization for Feature Selection in Text Categorization, in: Evolutionary Computation, 2008. CEC 2008. (IEEE World Congress on Computational Intelligence). IEEE Congress on, 2008, pp. 2867–2873.

77. Aghdam M.H., Ghasem-Aghaee and Basiri M.E., Text feature selection using ant colony optimization, Expert Systems with Applications 36 (2009), 6843–6853.

78. Basiri M.E., Ghasem-Aghaee and Aghdam M.H., Using Ant Colony Optimization-Based Selected Features for Predicting Post-synaptic Activity in Proteins, in: Evolutionary Computation, Machine Learning and Data Mining in Bioinformatics, Lecture Notes in Computer Science, Springer Berlin Heidelberg, 2008, pp. 12–23.

79. Alghamdi H.S., Tang H.L. and Alshomrani, Hybrid ACO and TOFA Feature Selection Approach for Text Classification, in: WCCI 2012 IEEE World Congress on Computational Intelligence, 2012, pp. 1–6.

80. Uğuz, A two-stage feature selection method for text categorization by using information gain, principal component analysis and genetic algorithm, Knowledge-Based Systems 24 (2011), 1024–1032. doi: 10.1016/j.knosys.2011.04.014.

81. Zhao and Wang, An Improved Genetic Algorithm For Text Feature Selection, in: International Conference on Intelligent Computing and Cognitive Informatics, 2010, pp. 7–10.

82. Chen, Jiang and Li, A heuristic feature selection approach for text categorization by using chaos optimization and genetic algorithm, Mathematical Problems in Engineering 2013 (2013), 1–6.

83. Zahran B.M. and Kanaan, Text Feature Selection using Particle Swarm Optimization Algorithm, World Applied Sciences Journal 7 (Special Issue on Computer & IT), 2009, 69–74.

84. Jin, Xiong and Wang, Feature Selection for Chinese Text Categorization Based on Improved Particle Swarm Optimization, in: Natural Language Processing and Knowledge Engineering (NLP-KE), 2010, pp. 1–6.

85. Chantar H.K. and Corne D.W., Feature Subset Selection for Arabic Document Categorization using BPSO-KNN, in: Nature and Biologically Inspired Computing (NaBIC), 2011 Third World Congress, 2011, pp. 546–551.

86. Tabassum and Mathew, A genetic algorithm analysis towards optimization solutions, International Journal of Digital Information and Wireless Communications (IJDIWC) 4 (2014), 124–142. http://sdiwc.net/digital-library/a-genetic-algorithm-analysis-towards-optimization-solutions.html.

87. Selvi and Umarani, Comparative analysis of ant colony and particle swarm optimization techniques, International Journal of Computer Applications 5 (2010), 1–6.

88. Basiri M.E. and Nemati, A Novel Hybrid ACO-GA Algorithm for Text Feature Selection, 2009 IEEE Congress on Evolutionary Computation, 2009. doi: 10.1109/CEC.2009.4983263.

89. Jensen and Shen, Finding rough set reducts with ant colony optimization, Proceedings of the 2003 UK Workshop on 1 (2003), 15–22. http://users.aber.ac.uk/rkj/pubs/papers/antRoughSets.pdf.

90. Jensen and Shen, Webpage Classification with ACO-Enhanced Fuzzy-Rough Feature Selection, in: Proceedings of the 5th International Conference on Rough Sets and Current Trends in Computing (RSCTC 2006), 2006, pp. 147–156.

91. Melville, Gryc and Lawrence R.D., Sentiment analysis of blogs by combining lexical knowledge with text classification, in: Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining – KDD ’09, Vol. 23, 2009, p. 1275. doi: 10.1145/1557019.1557156.

92. Zong and Xia, Ensemble of feature sets and classification algorithms for sentiment classification, Information Sciences 181 (2011), 1138–1152. doi: 10.1016/j.ins.2010.11.023.

93. Jindal and Liu, Analyzing and Detecting Review Spam, in: Data Mining, 2007. ICDM 2007. Seventh IEEE International Conference, 2007, pp. 547–552.

94. Jindal and Liu, Opinion spam and analysis, in: Proceedings of the International Conference on Web Search and Web Data Mining WSDM 08, 2008, p. 219. doi: 10.1145/1341531.1341560.

95. Lim E.-P., Nguyen V.-A., Jindal, Liu and Lauw H.W., Detecting product review spammers using rating behaviors, Review Literature And Arts Of The Americas 5 (2010), 939–948. doi: 10.1145/1871437.1871557.

96. Seerat and Azam, Opinion mining: Issues and challenges (A Survey), International Journal of Computer Applications 49 (2012). doi: 10.5120/7658-0762.

97. Ding, Liu and Yu P.S., A Holistic Lexicon-based Approach to Opinion Mining, in: Proceedings of the International Conference on Web Search and Web Data Mining, ACM, 2008, pp. 231–240. http://portal.acm.org/citation.cfm?id=1341531.1341561.

98. Al-Moslmi, Omar, Abdullah and Albared, Approaches to Cross-Domain Sentiment Analysis: A Systematic Literature Review, IEEE Access, 2017. doi: 10.1109/ACCESS.2017.2690342.

99. Wang, Niu, Song and Atiquzzaman, SentiRelated: A cross-domain sentiment classification algorithm for short texts through sentiment related index, Journal of Network and Computer Applications, 2018. doi: 10.1016/j.jnca.2017.11.001.

100. De Marneffe M.-C. and Manning C.D., The Stanford typed dependencies representation, in: Coling 2008: Proceedings of the Workshop on Cross-Framework and Cross-Domain Parser Evaluation, Association for Computational Linguistics, 2008, pp. 1–8. http://dl.acm.org/citation.cfm?id=1608858.1608859.

101. De Marneffe M.-C. and Manning C.D., Stanford typed dependencies manual, 2010, 1–22. http://nlp.stanford.edu/downloads/dependencies_manual.pdf.

102. Qadir, Detecting opinion sentences specific to product features in customer reviews using typed dependency relations, in: eETTs ’09: Proceedings of the Workshop on Events in Emerging Text Types, 2009.

103. De Marneffe M.-C., Connor, Silveira, Bowman S.R., Dozat and Manning C.D., More Constructions, More Genres: Extending Stanford Dependencies, in: Proceedings of the Second International Conference on Dependency Linguistics (DepLing 2013), Charles University in Prague, Matfyzpress, Prague, Czech Republic, 2013, pp. 187–196. http://www.aclweb.org/anthology/W13-3721.

104. Covington M.A., A Fundamental Algorithm for Dependency Parsing, in: Proceedings of the 39th Annual ACM Southeast Conference, 2001, pp. 95–102.

105. Zhang, Sekhari, Ouzrout and Bouras, Jointly identifying opinion mining elements and fuzzy measurement of opinion intensity to analyze product features, Engineering Applications of Artificial Intelligence 47 (2016), 122–139.

106. Xu, Liu, Lai, Chen and Zhao, Walk and Learn: A Two-stage Approach for Opinion Words and Opinion Targets Co-extraction, in: Proceedings of the 22nd International Conference on World Wide Web Companion, 2013.

107. Lin, Wang, Jin, Liang and Meng, A Word Vector and Matrix Factorization Based Method for Opinion Lexicon Extraction, in: Proceedings of the 24th International Conference on World Wide Web, 2015. doi: 10.1145/2740908.2742713.

108. Wiegand, Balahur, Roth, Klakow and Montoyo, A Survey on the Role of Negation in Sentiment Analysis, in: Proceedings of the Workshop on Negation and Speculation in Natural Language Processing, 2010, pp. 60–68. http://dl.acm.org/citation.cfm?id=1858959.1858970.

109. Blanco and Moldovan, Some Issues on Detecting Negation from Text, in: Twenty-Fourth International FLAIRS Conference, 2011, pp. 228–233. http://www.aaai.org/ocs/index.php/FLAIRS/FLAIRS11/paper/viewFile/2629/3031.

110. Zhai, Liu and Peifa, Grouping Product Features Using Semi-Supervised Learning with Soft-Constraints, in: Proceedings of the 23rd International Conference on Computational Linguistics (COLING-2010), 2010.

111. O’Keefe and Koprinska, Feature Selection and Weighting Methods in Sentiment Analysis, in: Proceedings of the Fourteenth Australasian Document Computing Symposium, 2009, pp. 67–81. doi: 10.1.1.184.3559.

112. Singh V.K., Piryani, Uddin and Waila, Sentiment Analysis of Movie Reviews: A new Feature-based Heuristic for Aspect-Level Sentiment Classification, in: 2013 International Multi-Conference on Automation, Computing, Communication, Control and Compressed Sensing (iMac4s), IEEE, 2013, pp. 712–717. doi: 10.1109/iMac4s.2013.6526500.

113. Nigam, McCallum A.K., Thrun and Mitchell, Text classification from labeled and unlabeled documents using EM, Machine Learning 39 (2000), 103–134.

114. Ofek, Rokach, Caragea and Yen, The Importance of Pronouns to Sentiment Analysis: Online Cancer Survivor Network Case Study, in: Proceedings of the 24th International Conference on World Wide Web Companion, 2015, pp. 83–84.

115. Dalal M.K. and Zaveri M.A., Opinion mining from online user reviews using fuzzy linguistic hedges, Applied Computational Intelligence and Soft Computing 2014 (2014).

Author and publication	Technique	Advantage (A)/Limitation (L)
[37]	IG + RST	IG
		A: Determines the importance of a feature in a document.
		L: Requires a threshold value at the initial stage and does not take into account overlapping features.
		RST
		A: Can reduce irrelevant features and noise, and takes into account overlapping features.
		L: Difficult to obtain an optimal feature subset and requires more processing time.


Table 7
FS techniques in text classification (traditional) using the metaheuristic approach

Table 10
Summary of survey on various FS methods

Table 11
Patterns of extracted phrases (adopted from [23])