Abstract
In this work, we evaluate the impact of changing the semantic text representation on the performance of the AR-SVS (extended association rules in semantic vector spaces) algorithm on the sentiment polarity classification task on a paper reviews dataset. To do this, we use natural language processing techniques in conjunction with machine learning classifiers. In particular, we report the classification performance using the
Keywords
Introduction
In recent years, research carried out on natural language processing has given rise to numerous applications [50, 53, 23, 43, 37] that have allowed advances in human-computer interaction. For example, current technology allows a person to speak to a mobile device and have it perform specific actions, thus understanding instructions at a purely objective level [20, 4]. However, when it comes to the problem of natural language processing, there is a lot of content in uttered speech acts that is beyond the merely objective [32, 35, 34]. Specifically, communication between people involves delivering information that is not directly encoded in the message in question, which implies a capacity for inference on the part of the receiver (i.e., a machine in the case of computing).
The area that deals with the understanding of the subjective aspects of texts by machines is called sentiment analysis [8, 50]. In particular, sentiment analysis aims to determine the sentiment polarity, opinion, or points of view, which are latent in text sources, which may correspond to tweets [5, 51], product reviews [12, 57], opinions about government entities [60], among others.
In the literature, several algorithms have been proposed to perform sentiment classification based on supervised machine learning. One of these algorithms is AR-SVS, which stands for extended Association Rules in Semantic Vector Spaces [29]. The AR-SVS algorithm makes use of the concept of extended association rules, which come from the classic association rules of data mining [3, 2, 31], and of semantic vector spaces [56, 18, 29, 17], which we will refer to as SVS. SVS represent words in a vector space in such a way that words with common semantics are close to each other, forming clusters that can represent concepts in a more general way.
The AR-SVS algorithm proved to be competitive in the sentiment classification task [28, 29] with respect to the state of the art. However, the performance of the algorithm was measured using only one text representation: word2vec. Thus, its performance using text representations other than word2vec is unknown. Therefore, the evaluation of the algorithm using different text representations contributes to the knowledge of a scientifically validated algorithm in an area of research that has experienced a significant increase in interest in recent years.
If the algorithm is significantly sensitive to its text representation, this would mean that it has a high degree of dependency between its performance and the underlying semantic representation. Thus, future efforts should focus on, for example, making use of the text representation that provides the best performance for the algorithm. On the other hand, if the algorithm does not present significant variations when using different text representations, we may assume that it is a robust algorithm with respect to its underlying semantic representation. Thus, later studies could focus on, for example, finding the representation that makes using this method computationally cheaper. In any case, this work presents an exploratory analysis as well as a replication of the original method and results.
Background and related work
Sentiment analysis
Sentiment analysis, also called opinion mining, is a subfield of natural language processing. Sentiment analysis comprises the tasks of detecting, extracting, and classifying opinions, sentiments, and attitudes concerning different topics, expressed as text [31, 50]. Sentiment analysis contributes to the observation of a population’s attitudes towards political movements, market intelligence, the level of consumer satisfaction with a product or service, box office prediction for feature films, among others [50]. The availability of opinions and evaluations, in general, has seen an increase due to e-commerce and social networks [16, 36]. Currently, e-commerce consumers rely primarily on reviews of products posted by previous consumers to buy, while producers and service providers improve their market propositions by obtaining consumer feedback. For example, analysis of product reviews published on Amazon can influence the decision to purchase these [10, 19, 6].
There are numerous challenges that sentiment analysis must face [50, 8]. For example, the same word can have negative connotations in some contexts and positive ones in others [39]. In addition, people express their opinions in a great variety of ways, so small changes in a message can cause an important difference in the underlying opinion; this can be seen in the phrases “the movie was good” and “the movie was not good” [22]. Furthermore, an opinion does not necessarily consist of a single type of assessment, since some of its sentences may show a positive opinion on the subject and others a negative one. Altogether, these conditions reveal the inherent difficulty of the sentiment analysis task, which is challenging even for human beings [33].
More specifically, the polarity classification task [50] seeks to determine the sentiment polarity of a document (e.g., positive or negative sentiment). Different sources of texts have been used by studies to determine the polarity of texts, such as tweets, medical texts, and news articles [13, 44, 41, 40, 7]. In essence, polarity classification deals with establishing a positive or negative sentiment orientation. However, there might be different levels of intensity with which sentiments are expressed in a text. For example, the sentences “I don’t like you” and “I hate you” share the same negative semantics, but the latter should be assigned a higher intensity than the former [9]. Several classification methods have been used to carry out the task; however, support vector machines have presented better performance than other classifiers, including decision trees, Bayesian classifiers, and others [50].
Common approaches to solve the polarity classification problem rely on supervised machine learning algorithms to label documents. In particular, support vector machines and simple Bayesian classifiers are both popular approaches [50]. Other supervised machine learning classification methods used in this task are decision trees, neural networks, and maximum entropy classifiers [47].
Semantic vector spaces
The main idea behind semantic vector spaces [18, 17, 30] is to represent each document in a collection as a point in space, which is equivalent to a vector in a vector space. Thus, the points that are close to each other in this space share similar semantics, and those that are far away tend to be semantically different. The term “semantics” refers to the meaning of a word, a phrase or sentence, or any text in human language, in addition to the study of such meaning [56]. Once the semantic vector space has been built, a recurring task is to check how semantically similar specific documents are to a given query, also called a “pseudo-document”.
Vectors are common structures in the study of artificial intelligence and cognitive science. Accordingly, the most basic form of semantic vector spaces uses the frequencies of word occurrences in a body of text to obtain information about its semantics [56]. Thus, the first step to determine the semantic vector space is to scan the corpus, count the occurrences of some object (a word or pair of words) in a certain situation (document, context, or pattern), and store the result in the corresponding entry of the occurrence matrix, as in the TF-IDF representation [14]. However, recent advances have created context-based semantic representations such as word2vec [38] and GloVe [49], among others.
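To make the count-based construction concrete, the following is a minimal sketch of a TF-IDF occurrence matrix built from tokenized documents. The weighting used here (raw term frequency multiplied by the logarithm of the inverse document frequency) is one common variant and is an assumption on our part; the toy documents and the function name tf_idf are hypothetical.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return the vocabulary and one TF-IDF vector per tokenized document."""
    vocab = sorted({w for doc in docs for w in doc})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each word.
    df = Counter(w for doc in docs for w in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        # Raw term frequency times log inverse document frequency (assumed variant).
        vectors.append([counts[w] * math.log(n_docs / df[w]) for w in vocab])
    return vocab, vectors

# Hypothetical toy corpus of three tokenized documents.
docs = [["good", "paper"], ["bad", "paper"], ["good", "results"]]
vocab, vectors = tf_idf(docs)
```

Each row of vectors is a point in a space with one dimension per vocabulary word, so documents that share informative words end up close to each other.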
Association rules
Association rules are an important type of pattern that can be found within data. The goal of association rule mining is to find all the co-occurrence relationships, called associations, between the various objects. The following is the problem of mining association rules initially proposed by [3].
We consider the problem of analyzing a data set made up of transactions. Let
where
We say that transaction
The support
The confidence
Taking support and confidence into consideration, the task of mining association rules in a set of
It is worth mentioning that both
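As an illustration of the two measures named above, the following minimal sketch computes support and confidence under their standard definitions: the support of a rule is the fraction of transactions that contain all items of its left-hand and right-hand sides, and its confidence is that support divided by the support of the left-hand side alone. The transactions, item names, and function names are hypothetical, and the class label is simply treated as one more item.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(lhs, rhs, transactions):
    """Support of LHS and RHS together, divided by the support of LHS alone."""
    lhs_support = support(lhs, transactions)
    if lhs_support == 0:
        return 0.0
    return support(set(lhs) | set(rhs), transactions) / lhs_support

# Hypothetical transactions: the words of a review plus its class label.
transactions = [
    frozenset({"clear", "method", "POS"}),
    frozenset({"clear", "results", "POS"}),
    frozenset({"problem", "method", "NEG"}),
    frozenset({"clear", "problem", "NEG"}),
]
rule_support = support({"clear", "POS"}, transactions)
rule_confidence = confidence({"clear"}, {"POS"}, transactions)
```

A rule mining algorithm such as Apriori would keep only the rules whose support and confidence reach user-chosen minimum thresholds.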
Extended association rules: AR-SVS algorithm
The AR-SVS algorithm (extended Association Rules – Semantic Vector Space) [29, 28], exploits the semantic similarities of words represented in semantic vector spaces to generate extended association rules that are used by classifiers in sentiment polarity classification tasks. One of the motivations of this algorithm is to represent associations between regions of semantic space, and not just between specific words. Figure 1 presents the intuition behind the AR-SVS algorithm. Note that Fig. 1 uses two dimensions for the semantic vector space as an example, but in practice, semantic vector spaces usually have hundreds of dimensions [56, 18, 17].
Association rules in semantic vector spaces (based on Fig. 1 from [29]). The left side shows how basic association rules would be represented in a semantic vector space as simple point (
Although an association rule of this type can have multiple words in both its LHS and RHS, in the case of using association rules for classification, the RHS only has the class label, so the construction of the semantically similar regions is carried out considering the terms of the LHS. To determine the similarity of the terms, we use cosine similarity once the vectors have been normalized. This method allows capturing semantic associations in the text as well as allowing inferences to be made about what each document really means. Intuitively, the neighboring terms of terms that indicate a positive (or negative) polarity in the semantic vector space, have a similar orientation to these.
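The neighborhood construction described above can be sketched as follows. This is an illustrative approximation and not the AR-SVS pseudocode itself: it assumes that each LHS term is replaced, one at a time, by each of its k nearest neighbors under cosine similarity, and that the class label is carried over unchanged. The embeddings, the value of k, and the function names are hypothetical.

```python
import math

def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def nearest_terms(term, embeddings, k):
    """Return the k terms most cosine-similar to `term`, excluding itself."""
    unit = {w: normalize(v) for w, v in embeddings.items()}
    query = unit[term]
    # For unit vectors, cosine similarity reduces to the dot product.
    sims = [(sum(a * b for a, b in zip(query, vec)), w)
            for w, vec in unit.items() if w != term]
    sims.sort(reverse=True)
    return [w for _, w in sims[:k]]

def extend_rule(lhs, label, embeddings, k):
    """Build one extended rule per neighbor of each LHS term (assumed scheme)."""
    extended = []
    for term in sorted(lhs):
        for neighbor in nearest_terms(term, embeddings, k):
            new_lhs = frozenset((lhs - {term}) | {neighbor})
            extended.append((new_lhs, label))
    return extended

# Hypothetical two-dimensional embeddings; real spaces have hundreds of dimensions.
embeddings = {
    "clear": [1.0, 0.1],
    "clarity": [0.9, 0.2],
    "clarify": [0.8, 0.3],
    "tools": [0.0, 1.0],
}
rules = extend_rule(frozenset({"clear"}), "NEG", embeddings, k=2)
```

Normalizing once and then taking dot products avoids recomputing vector norms for every pairwise comparison.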
Algorithm 1: AR-SVS
Algorithm 1 shows the AR-SVS pseudocode. The functions
It should be noted that the extended association rules generated by the AR-SVS algorithm are used to determine the semantic orientation of a document through a classifier. For this, the classifier must be provided with a representation of each document, which is generated by a scoring function called RBS (Rule Based Scoring) that assigns each document one score for the positive polarity class and another for the negative one.
In summary, once the association rules are generated using the Apriori algorithm, they are extended using the AR-SVS algorithm. Then, using the extended association rules, a useful representation for a classifier is generated for each document through the RBS function. The pseudocode for this function is presented in Algorithm 2.
Algorithm 2: RBS (Rule Based Scoring)
Problem definition and objectives
Having provided an overview of the related work and preliminary concepts, we now state the problem and objective of this work. The general objective of this work is to evaluate the performance of the AR-SVS algorithm [29] in the task of sentiment polarity classification. In particular, we focus on the domain of scientific paper reviews [26, 25, 1, 27]. As our work is an extension of the original works in which the AR-SVS algorithm was presented [28, 29], we also use the same dataset for comparison purposes.
The core idea of the AR-SVS method is that the itemsets of an association rule can be modified by replacing elements with other similar elements from the underlying vector space without changing the validity of the rule, rather than simply applying set operations to extend the itemsets. Our work seeks to provide an in-depth evaluation of the existing AR-SVS model with respect to the SVS component of this algorithm. A similar approach has been explored by Ozaki [45], who also uses vectorial representations to extend association rules. However, being a relatively new algorithm and concept, there are still multiple aspects of extended association rules based on vectorial representations that must be explored, such as how influential the underlying vector representations are to the final performance of the methods. Thus, we sought to evaluate the effects of the choice of semantic representation of text on classification performance.
The work of [29] presented the AR-SVS algorithm, which proposed the extension of association rules for the task of sentiment analysis, and more precisely, sentiment classification. The extension of the association rules is conducted through the construction of regions in a vector space containing word representations as points, using some vector similarity measure for the determination of the neighborhoods, and a natural number that establishes the number of close vectors to be considered as part of the same region. Once the extended association rules are obtained, they are used to generate a score vector for each document, using a separate scoring algorithm. Subsequently, these score vectors and their respective class labels are fed to supervised machine learning classifiers, such as Naïve Bayes [52] or Support Vector Machines [24], to perform sentiment classification over unseen documents.
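To illustrate how a set of rules can be turned into a score vector for a classifier, the sketch below uses a simple stand-in scoring scheme. It is not the RBS function of the original work, whose details are not reproduced here: we assume, purely for illustration, that a rule fires when all of its LHS terms appear in the document and that the score of a class is the sum of the confidences of the fired rules carrying that label. The rules, confidences, and names are hypothetical.

```python
def score_document(tokens, rules):
    """Return (positive_score, negative_score) for one tokenized document."""
    words = set(tokens)
    scores = {"POS": 0.0, "NEG": 0.0}
    for lhs, label, conf in rules:
        # A rule fires when every LHS term occurs in the document (assumption).
        if lhs <= words:
            scores[label] += conf
    return scores["POS"], scores["NEG"]

# Hypothetical rules: (LHS terms, class label, confidence).
rules = [
    (frozenset({"clear"}), "NEG", 0.7),
    (frozenset({"tools", "use"}), "POS", 0.8),
    (frozenset({"problem"}), "NEG", 0.6),
]
document = ["the", "use", "of", "tools", "is", "clear"]
score_vector = score_document(document, rules)
```

The resulting pair of scores, together with the document's class label, is the kind of input that would then be passed to a classifier such as Naïve Bayes or a Support Vector Machine.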
As the classification procedure relies on the word representations to determine the regions that will generate the extended association rules, the technique used to generate these word representations (i.e., word embeddings) is crucial. The work of [29] generated the word embeddings using word2vec [38], and the impact of that specific representation on the performance of the classification task was unknown. In this line, the purpose of our work is to determine how the adoption of a specific model to generate word embeddings (i.e., how the word vectors are distributed in the vector space) affects the global performance of the classification task. It is known that word embeddings capture both semantic and syntactic properties of words within a document [11], so the distribution of words in the semantic vector space changes according to the way a model sets the vectors, with each model capturing these patterns in a different fashion.
To carry out this analysis, we selected three models to generate word embeddings, in addition to the original word2vec, from a state-of-the-art study on this subject [17], and performed the classification task, reporting the classification metrics. The AR-SVS algorithm and the word embeddings used in the experiments are detailed in later sections.
Paper reviews data set
Various data sets have been used in the task of classifying sentiments, such as tweets [46], movie reviews [47], and consumer feedback [15].
In this work, we perform a sentiment classification task using the data set presented in [25], which consists of peer reviews of scientific articles, mostly in Spanish, together with the latent appreciation they possess, which can range from very negative to very positive. The data set contains a total of 405 reviews. Of these, 17 were excluded for being written in English and 6 for being empty, leaving a total of 382. It should be noted that this data set is identical to the one used in the original AR-SVS article [28].
Each review has the attributes Evaluation and Orientation. Evaluation refers to the actual evaluation of a reviewer against a scientific article. Orientation represents the subjective perception of each review by the authors of the original work; that is, how negative or positive the review is perceived when it is read by someone else. Both attributes are measured on an integer scale ranging from
Highlighted articles from the systematic literature review of [17]. We use these articles to select the semantic vector space representations used in our experiments
Based on the systematic literature review done by [17], Table 1 shows the word embedding representations used in our evaluation. In general, dense vector representations of words have been widely adopted, with satisfactory results in natural language processing tasks and in other domains. In particular, word embeddings based on neural models have been reported as superior to those generated by matrix factorization [17]. Moreover, multiple embedding methods are variants of a specific group of neural models: word2vec. Studies have also been conducted on the impact of combining other techniques with dense vector representations to perform natural language processing tasks, reporting good results in general. Despite their good performance, dense vector representations also have some drawbacks, such as the lack of interpretability of the real values that make up the embedded vectors. For the purposes of this work, the text representations GloVe [49], FastText [21], and LDA2vec [42] have been selected.
These representations were selected due to the advantages detailed in their respective articles: FastText offers the most efficient training among the reviewed semantic representations; GloVe incorporates global word information in a text; and LDA2vec includes additional information beyond the occurrence of words when generating the representations.
Parametrization
Text representation
The parameterizations of the text representations were made as follows.
For the generation of base association rules and their extension in the semantic vector space, the following parameters were used:
Generally, sentiment classification performance is evaluated using four metrics: accuracy, precision, recall, and
Accuracy is the ratio of correctly predicted instances to all predicted instances. An accuracy of 100% indicates that the predicted labels are the same as the actual labels. Precision is the ratio of true positives to all positively predicted instances. Recall is the ratio of true positives to all actual positive instances. The
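The metric definitions above can be computed directly from the counts of true and false positives and negatives, as in the following minimal sketch. We assume here that the fourth metric is the F1 score, that is, the harmonic mean of precision and recall; the label names and the function name are hypothetical.

```python
def classification_metrics(y_true, y_pred, positive="POS"):
    """Compute accuracy, precision, recall, and F1 for a binary task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is assumed to be the fourth metric: harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Hypothetical labels for five reviews.
y_true = ["POS", "POS", "NEG", "NEG", "POS"]
y_pred = ["POS", "NEG", "NEG", "POS", "POS"]
metrics = classification_metrics(y_true, y_pred)
```

Returning zero when a denominator is empty is a convention chosen for this sketch; other implementations may raise a warning or report the value as undefined.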
Classifiers and evaluation
Both Naïve Bayes and Support Vector Machines were parameterized with the values determined in the original AR-SVS work [28], in order to replicate the baseline classification results as closely as possible.
For each of the four text representations (the original word2vec, GloVe [49], LDA2vec [42], and FastText [21]), we performed ten experimental runs, which consisted of dividing the data set using the holdout technique at a ratio of 70:30 for the training set and test set, respectively. Then two classifiers (SVM and NB) were trained, using the training set, to later report the metrics
The final report consists of the average of the averages mentioned above, in addition to their standard deviation, for each possible combination of classifier type, metric, text representation, and set of association rules used in the classification: the base rules (RBS-B), the extended rules (RBS-X), or the union of both (RBS-BX).
We executed ten experiments to take advantage of the variability of the semantic vector spaces generated by those representations that could feasibly be rebuilt in each execution. Despite this, the analysis of the results suggests that the variability in the learning of the semantic vector space does not significantly affect the performance of the AR-SVS method in the sentiment polarity classification task.
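The repeated holdout protocol can be sketched as follows, under some simplifying assumptions: a single score is collected per run, the sample standard deviation is reported, and a majority-class baseline stands in for the actual SVM and NB classifiers. The function names, the seed, and the toy samples are hypothetical.

```python
import random
import statistics

def repeated_holdout(samples, evaluate, runs=10, train_ratio=0.7, seed=0):
    """Call `evaluate(train, test)` on `runs` random splits; return mean and std."""
    rng = random.Random(seed)
    scores = []
    for _ in range(runs):
        shuffled = samples[:]
        rng.shuffle(shuffled)
        cut = round(len(shuffled) * train_ratio)
        scores.append(evaluate(shuffled[:cut], shuffled[cut:]))
    # Sample standard deviation (an assumption about how the spread is reported).
    return statistics.mean(scores), statistics.stdev(scores)

def majority_baseline(train, test):
    """Placeholder evaluator: accuracy of predicting the most common training label."""
    labels = [label for _, label in train]
    majority = max(set(labels), key=labels.count)
    return sum(1 for _, label in test if label == majority) / len(test)

# Hypothetical data set of 30 (identifier, label) pairs, two thirds positive.
samples = [(i, "POS" if i % 3 else "NEG") for i in range(30)]
mean_acc, std_acc = repeated_holdout(samples, majority_baseline)
```

Fixing the seed makes the ten splits reproducible, which helps when the same splits must be reused across text representations.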
Results and discussion
In this section, the final results of the work and their respective discussion are presented.
Quantitative results
Table 2 shows the performance metrics of the NB classifier under the different configurations of the AR-SVS algorithm. In general, the classification results align with previous research with this data set [26] and with previous evaluations of the AR-SVS approach [28, 29]. We also note that due to the complexities of the paper reviews data set that we used, the reported accuracy values are not as high as in other sentiment classification tasks which are usually simpler [26].
Regarding accuracy, both FastText and GloVe report a significant drop in performance using RBS-X, while LDA2vec does not show considerable differences within the same category of rules. The accuracy values when applying RBS-BX are maintained with differences that are within the standard deviations of the values (i.e., marginal differences), except in the case of FastText, where there is a decrease that escapes the 0.01 range of the standard deviation.
As with accuracy, the results with
Table showing the results obtained with the NB classifier. The first three rows show accuracy results for the set of base rules, the set of extended rules, and the union of both sets. The last three rows show the results in terms of the
metric for each set of rules
Table 3 shows the performance metrics obtained with the SVM classifier. Regarding accuracy, a stable classification performance is observed, with only marginal differences in RBS-X and RBS-BX using FastText, GloVe, and LDA2vec.
The reported values of
Table showing the results obtained with the SVM classifier. The first three rows show accuracy results for the set of base rules, the set of extended rules, and the union of both sets. The last three rows show the results in terms of the
We now discuss the main results, considering that the purpose of this work is to evaluate the performance of the classifiers using the text representations other than word2vec and the association rules in the semantic vector space.
In particular, Support Vector Machines maintain an accuracy superior to Naïve Bayes and closer to the original baseline with word2vec. When looking at the case of
It should be noted that all the metrics reported for the RBS-B set, both for Naïve Bayes and for Support Vector Machines, are generated by performing classification on a set of association rules produced by the Apriori algorithm, without the use of the AR-SVS algorithm, and therefore without the construction of a semantic vector space. The differences in these values stem from the variability of the association rules produced by the Apriori algorithm, caused by the different training sets used.
For all text representations, the trend of the original work [28] across the different sets of association rules is maintained: there is a general drop in the metrics with the RBS-X set and an increase when using the RBS-BX set, although without exceeding the RBS-B values.
Sample of extended association rules obtained with word2vec. The text in these rules was obtained by doing literal translations of the original text in Spanish
Sample of extended association rules obtained with FastText. The text in these rules was obtained by doing literal translations of the original text in Spanish
Along with the quantitative results and analyses, qualitative results and analyses are also presented. Table 4, Table 5, Table 6, and Table 7 exhibit examples of extended association rules using semantic vector spaces generated by each text representation, in addition to the association rule that originates them. Figure 2 illustrates a directed graph with the relationships found between the terms of the sample association rules for the text representations; for compatibility reasons, the accents have been omitted. Each node of the graph corresponds to a term present in the antecedent of some association rule, while the arcs indicate the text representation whose semantic vector space generated the relationship between the terms. The arc labels were set as FT for FastText, GV for GloVe, L2V for LDA2vec, and W2V for word2vec.
Note that the effectiveness of the general AR-SVS model was evaluated in [28, 29] with the same data set. Thus, we do not seek to evaluate the effectiveness of the model, but rather focus on the qualitative differences that changing the semantic vector space produces on the extended rules.
Next, an analysis and comparison between the extended association rules and their root, according to text representation, is presented. The terms were rebuilt from their roots. In general, domain knowledge is required to properly analyze and understand the relationships described by the graph of Fig. 2. Furthermore, domain knowledge could be used to interpret the rules obtained by the different text representations with the AR-SVS algorithm.
Sample of extended association rules obtained with GloVe. The text in these rules was obtained by doing literal translations of the original text in Spanish
Sample of extended association rules obtained with LDA2vec. The text in these rules was obtained by doing literal translations of the original text in Spanish
Graph showing the relationships between words obtained from the association rules with different semantic representations. A directed edge between node 
In the case of the WV1 rule, a negative class label is set to the term “Development”, with association rules extended to the terms “Lack” and “Model”. This may be due to the semantic characteristics that word2vec can extract, since “Development” and “Lack”, due to their semantics and the possible contexts of the documents, could indicate a sustained tendency of reviewers to target development of the articles as deficient, while the term “Model” can also be considered within the development of a work. Comparing WV1 with LV2, the latter determines the term “Development” with a positive class label, extending this rule to the term “Work”. The difference in polarity between the WV1 and LV2 rules could be due to the consideration of additional information by the LDA2vec model, which also includes information about the documents at a global level through the LDA component of the model.
The FT1 rule, generated with FastText, sets a negative class label for the term “Clear”, with association rules extended to “Clarity” and “To Clarify”. This can be interpreted in the same sense as WV1, where the adjective “Clear” and its variants are used to indicate a lack of clarity of the articles to make themselves understood to the readers. The WV2 rule has the same negative label and extends the relationship to the term “To Evaluate”, which could be for the same reasons discussed in this paragraph. On the other hand, the LV1 rule has the term “Clear” with a positive polarity, extending to the term “To Use”. Similar to the case of LV2, this can be attributed to the consideration of the totality of documents to look for relationships between individual terms. The term “To Use” has a positive class label, and is extended to “Tools” with the GV1 rule. The interpretation of this rule is more evident, since a good use of tools is associated with a positive aspect.
The LV3 rule has a positive class label, and relates the term “Apply” by extending it to “Result”. This can be explained by the positive appreciation of the reviewers for articles that show an important application of their results. The term “Result” is extended to “Understanding” and “Resolved” with a positive class label via the FT2 rule; on the other hand, it is also extended to “Article” by the WV4 rule with a positive polarity label. The polarity agreement between all the rules that include the term “Result” could be due to the fact that the text representations, despite being generated differently, rescue the same semantic values when the uses of certain terms appear in specific contexts, which is supported by the validity hypotheses of a semantic vector space.
The FT3 rule has a negative class label and extends the term “Problem” to the terms “Problematic” (as a noun) and “To Deepen”, which could evidence recurring indications by reviewers about a lack of depth in the problems or justification of the articles. The GV3 rule extends the term “Problem” to the term “Normal”, keeping the negative label; this can be explained by the recurrent appearance of problems in scientific articles, which implies that it is normal for problems to arise.
The WV3 rule relates the term “Study” by extending it to the terms “To Present”, and “Conclusion” with a negative class label. Keeping the same polarity, the GV4 rule extends “Study” to “Case”. For both WV3 and GV4, the polarity of the rules can be justified with the term “Study”, with the lack of rigor of the studies conducted in some articles, as well as some incomplete or irrelevant case studies for the central themes of each article, all this according to the judgment of the reviewers. The term “Conclusion” is extended by all the new representations used in conjunction with the AR-SVS algorithm; the FT4 rule extends “Conclusion” to “Confusion” and “Inclusion”, the GV2 rule to “Irrelevant”, and the LV4 rule to “Example”, all with a negative class label. This could be explained by the fact that, generally, the observations concerning the conclusions are made most of the time to point out their defects. It should be mentioned that this is a pattern found for all text representations.
The main purpose of this work was to carry out an exploratory analysis of the sentiment polarity classification performance of association rules produced by the AR-SVS algorithm with different semantic vector spaces, generated by different text representations (word2vec, FastText, LDA2vec, and GloVe).
In terms of quantitative results, the use of the alternative text representations LDA2vec, FastText, and GloVe does not provide significant improvements in any of the relevant classification metrics compared to the baseline provided by word2vec. In qualitative terms, the semantic vector spaces created by the different text representations are useful to generate extended association rules that share close semantic relationships between the terms that involve them. These extended association rules are even similar between different text representations, which is consistent with the fact that the quantitative results do not show significant differences between them.
Finally, regarding limitations of this work and potential future work, we highlight hyperparameter optimization and the use of a different rule mining algorithm. Further hyperparameter optimization would be possible for the classification algorithms. Moreover, the hyperparameter tuning task could also be extended to the embedding models themselves. Furthermore, the generation of the non-extended association rules was carried out using the Apriori algorithm. The use of a different rule mining algorithm to generate the base association rules and its impact on the subsequent generation of extended association rules is proposed as another potential line of research.
Acknowledgments
This work was partially funded by the National Agency for Research and Development (ANID)/ Scholarship Program/DOCTORADO EXTRANJERO BECAS CHILE/2019 – 72200105.
