Abstract
The extraction of relevant and debated opinions from online social media and commercial websites is an emerging task in the opinion mining research field. Its growing relevance is mainly due to the impact of such techniques in application domains ranging from social science analysis to personalized advertising. In this paper, we present SMACk, our opinion summary system built on top of an argumentation framework with the aim to exchange, communicate and resolve possibly conflicting viewpoints. SMACk allows the user to extract debated opinions from a set of documents containing user-generated content from online commercial websites, and to automatically identify the most debated positive aspects of the issue under debate, as well as the most debated negative ones. The key advantage of such a framework is the combination of different methods, i.e., formal argumentation theory and natural language processing, to support users in making more informed decisions, e.g., in the context of online purchases.
Introduction
Argumentation theory is a reasoning model based on the construction and evaluation of information pieces called arguments. Arguments are supposed to support, contradict, and explain statements, and they are used to support decision making [57]. Argumentation theory involves different ways for analyzing arguments and their relationships. A famous framework is the one called abstract argumentation proposed by Dung [22], which views each argument as an abstract entity and in which arguments are related to each other by means of attack relations. What distinguishes argumentation-based discussions from other approaches is that proposals can be supported by the arguments that justify, or oppose, them. This permits a greater flexibility than in other decision-making and communication schemes since, for instance, it makes it possible to persuade the other actors involved in the discussion to change their view of (or opinion about) a claim by identifying information or knowledge that is not being considered, or by introducing a new relevant factor in the middle of a negotiation or to resolve an impasse. Indeed, in the process of making a decision (or in a debate), we have to consider the opinion of all the people involved in the process. Such opinions are subjective statements that represent people’s sentiments, emotions, perceptions, and mood about a particular object or subject.
Opinion mining [38] aims at analyzing texts about a given subject or topic, written in a natural language, to classify them with respect to the opinion they are supposed to convey or more specifically, with respect to the polarity (negative, positive or neutral) or emotions of the authors of such texts. Such natural language documents can be analyzed at three different levels: (i) at a sentence level – in this case we are interested in extracting the opinion or sentiment behind one single sentence; (ii) at a document level – in this case we are interested in the overall opinion or sentiment behind the entire document, and (iii) at an aspect level – in this case we are interested in a more fine-grained opinion or sentiment associated to the most relevant aspects or features in the document.
Concerning the particular case of sentiment analysis [38], the analysis at document and sentence levels allows us to extract a global feeling behind the document or the sentence. However, as pointed out by Chinsha and Joseph [11] for example, the authors of reviews often convey more than one single feeling in a single review by describing their sentiment about different aspects of the same subject. Let us consider the following text from a restaurant review: “Good food but the restaurant’s location is too far from the city center”. In this example, we can distinguish two aspects, i.e., “food” and “location”, with a positive sentiment with respect to the food, and a negative sentiment with respect to the location of the restaurant. If we contextualize this sentence in a debate, the final overall sentiment of the author about the restaurant will depend on how she/he will be convinced by others about this subject, i.e., it will depend on the strength of the arguments for and against the two considered aspects. However, a single system combining all these components is currently lacking.
More precisely, a framework aiming at combining aspect-based opinion mining methods to deal with the extraction of the aspects and their polarities in natural language documents together with formal argumentation theory to deal with the efficient computation of the arguments’ strengths is needed. The advantage of this combined framework is twofold. On the one hand, we want to build an argumentation graph by extracting, from opinion content, triples structured as
In this paper, we present SMACk 1.0, our argumentation-based opinion mining framework which is focused on the analysis of online user-generated content. More precisely, such a framework is based on the use of abstract argumentation theory [22], and it supports the detection and extraction of relevant opinions from a set of textual documents.
The paper is organized as follows: Section 2 discusses the related work and compares it to the proposed approach; in Section 3 some basics about abstract argumentation theory and fuzzy set theory are provided together with the description of the linguistic resources we used; Section 4 introduces the combined SMACk framework; Section 5 presents the fuzzy labeling algorithm for abstract argumentation introduced in [15] and its empirical evaluation, and in Section 6 we detail the approach implemented for extracting aspects from text. Then, Section 7 presents the user interface we implemented for supporting researchers and experts in data analysis. Finally, the framework is evaluated in Section 8. Conclusions end the paper.
Related work
The proposed contribution embraces two research fields, argumentation theory and aspect-based opinion mining. To the best of our knowledge, this is the first time that these disciplines are combined to improve the effectiveness of a real-world sentiment analysis system. In this section, we give the context of each research field by highlighting the most relevant works that have inspired this contribution.
Aspect-based opinion mining. Mining people’s opinions may be included in the class of problems whose solution requires the processing of textual information. However, while techniques like information retrieval, for example, aim at processing facts or objective statements in order to extract useful information, opinion mining aims at extracting the views, sentiments, emotions, etc., from people’s judgements.
Generally, given a text, the goal of opinion mining techniques is to infer the overall polarity of such a text by summarizing the user’s opinion [6,17,23,49,58]. Kumar and Ravi [34] survey the literature published from 2002 to 2015 in which different approaches to opinion mining and sentiment analysis are described. Among the surveyed approaches, two are especially relevant to our proposal: the aspect-based sentence segmentation model and the aspect-based review summarization model.
Recently, the need for a more fine-grained analysis of the text has emerged. Indeed, when we express our views about a restaurant, a movie, or the like, the reviews are not always global but express different sentiments about the different aspects we deem interesting enough to influence our global sentiment. This task, known as “aspect-based sentiment analysis” [1,60], aims at detecting each single entity feature mentioned in a document and at inferring the polarity of the text with respect to it. The importance of this type of analysis has grown significantly in recent years. Initiatives dedicated to the comparison of real-world systems have also emerged, such as SemEval [51]. The systems usually presented at SemEval implement supervised approaches trained with sample datasets provided to participants.
Different kinds of supervised and unsupervised learning approaches have been proposed for extracting opinion targets and the opinion or sentiment conveyed by texts: most are based on conditional random fields (CRF) [12,30,44,73]; others are based on hidden Markov models (HMM) [32], sequential rule mining [37], dependency tree kernels [70], graph propagation [20], fuzzy logic [20,21,50] and clustering [63].
Supervised systems suffer from the necessity of exploiting manually labeled training data for building their models. The consequence is the limitation of the system validation in a production environment where such systems have to be applied to several domains. A time-saving way for overcoming this issue is the adoption of unsupervised approaches. Unsupervised approaches are mainly based on topic modeling [5,35,43,45,46,64,69] and syntactic rules designed using dependency relations [55,68,70,72,76]. There are also word frequency-based methods [29,52,61,75], word alignment methods [40], and label propagation methods [74].
Contributions described above are focused on the analysis of single documents and do not support opinion aggregation. If we extend the application of opinion mining techniques to a repository of documents, open challenges are (i) the detection of the most relevant aspects mentioned by users, (ii) how such aspects are considered by them, and (iii) the implementation of a scalable approach for managing a huge amount of documents.
The extraction of relevant information from a text is an activity usually performed by text summarization systems [47]. Text summarization aims to extract key information from a text, which is then presented to users as summaries or, in some cases, as lists of relevant keywords. The extraction task can be done on a single document or on a set thereof [39]. The main advantage of applying text summarization techniques is the possibility to relieve users from reading entire documents containing irrelevant details with respect to the topic of the document and the final goal of users.
While text summarization is the most suitable solution when parts of the documents are not particularly informative from the user perspective, it is not a solution when the goal is to go beyond the extraction of interesting information from texts.
In the scenario proposed in this paper, text summarization approaches would not be the most suitable solution for the following reasons. First of all, the considered documents already contain only relevant information. Indeed, when a user writes a review, she generally does not spend much time and space contextualizing it, but reports only the most important aspects she wants to share with the community. Therefore, the task is not to summarize the content of the documents, but to aggregate the different opinions they express. Second, the desired output of the analysis of user-generated content is not only the extraction of the most relevant aspects, but also the detection of the most debated ones about a particular topic. The approach has to be able to extract all the aspects of the given topic and to rank them from several perspectives (user agreement, user disagreement, polarities, etc.).
Two real-world implementations of unsupervised systems are presented by Popescu and Etzioni [53], and Bjorkelund et al. [4]. The former presents OPINE, a review-mining system able to perform aspect-based sentiment analysis and opinion ranking in an unsupervised way using relaxation labeling. The latter exploits aspects extracted from hotel reviews to support user decisions and to suggest effective visualizations of customers’ reviews. This use of visualization is a notable point in common with our work; however, we go several steps further (i) by integrating argumentation theory in the aspect extraction task, (ii) by using external knowledge for clustering semantically similar aspects, (iii) by building a knowledge graph representing the connections between users and aspects extracted from a whole repository of reviews, and (iv) by integrating a labeling algorithm over the knowledge graph in order to detect the most relevant information for supporting users’ decisions.
Argumentation and opinion mining. Only a few approaches proposed in the literature couple argumentation theory and opinion mining. Argumentation is applied to opinion mining by Grosse and colleagues [26]. In this article, the authors extract a particular version of arguments, which they call “opinions”, based on incrementally generated queries. Given a query, they model an opinion supporting it as a set of aggregated tweets along with a prevailing sentiment, which can be attacked by alternative counter-opinions. As a final result, they obtain what they call a conflict tree, rooted in the original query, in a way that resembles dialectical trees in argumentation. Their final goal is to detect conflicting elements in an opinion tree to avoid potentially inconsistent information.
Argumentation mining [36,48] certainly shares some analogies with opinion mining and sentiment analysis, as highlighted also by Habernal et al. [27], even if the goals of the two differ: the goal of opinion mining is to understand what people think about something, while the aim of argumentation mining is to understand why, which implies looking for causes and reasons rather than just for opinions and sentiments. In the context of argument mining techniques, opinionated claim analysis has been addressed by Rosenthal and McKeown [59]. The authors released two datasets of 2,000 sentences each, with the purpose of extracting so-called opinionated claims. These consist of 285 LiveJournal blogposts and 51 Wikipedia discussion forums, respectively. They address a problem that is closer to sentiment analysis, and their aim is to detect assertions containing some belief, of the truth of which a user is attempting to convince an audience. Argumentative opinion analysis has been addressed by Saint-Dizier and colleagues [67], with the TextCoop platform, which basically constructs arguments from opinions and supportive elements such as illustrations and evaluative expressions, by using a set of handcrafted rules that explicitly describe rhetorical structures.
Both the goals and the adopted methodologies of these contributions are different from the one we present in this paper. Actually, the main difference is that we do not exploit opinion mining for argument mining tasks, e.g., argument or relation extraction, but we use abstract argumentation theory together with the fuzzy labeling algorithm, to organize the information extracted and classified through opinion mining methods, and support users in decision making.
Preliminaries
In this section, we provide some insights about abstract argumentation theory and fuzzy sets.
Abstract argumentation theory
We provide the basics of Dung-like abstract argumentation theory [22].
(Abstract argumentation framework).
An abstract argumentation framework is a pair
Dung [22] presents several acceptability semantics which produce zero, one, or several sets of accepted arguments. These semantics are grounded on two main concepts, called conflict-freeness and defence.
(Conflict-free, Defence).
Let
(Acceptability semantics).
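Let $C$ be a conflict-free set of arguments, and let $\mathcal{D}: 2^{\mathcal{A}} \to 2^{\mathcal{A}}$ be the function such that $\mathcal{D}(C) = \{A \mid C \text{ defends } A\}$, where $\mathcal{A}$ is the set of arguments of the framework. $C$ is admissible if and only if $C \subseteq \mathcal{D}(C)$. $C$ is a complete extension if and only if $C = \mathcal{D}(C)$. $C$ is a grounded extension if and only if it is the smallest (w.r.t. set inclusion) complete extension. $C$ is a preferred extension if and only if it is a maximal (w.r.t. set inclusion) complete extension. $C$ is a stable extension if and only if it is a preferred extension that attacks all arguments in $\mathcal{A} \setminus C$.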
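As an illustration of these semantics, the following minimal sketch computes the grounded extension of a finite framework. It assumes the standard characterization of the grounded extension as the least fixed point of the function mapping a set of arguments to the arguments it defends; the function name `grounded_extension` and the encoding of attacks as pairs `(attacker, target)` are our own illustrative choices, not part of SMACk.

```python
def grounded_extension(arguments, attacks):
    """Grounded extension of a finite abstract argumentation framework.

    arguments: iterable of hashable argument identifiers.
    attacks:   set of (attacker, target) pairs (illustrative encoding).
    """
    arguments = set(arguments)
    attackers = {a: {b for (b, t) in attacks if t == a} for a in arguments}

    def defends(candidate, a):
        # candidate defends a if every attacker of a is attacked
        # by at least one member of candidate.
        return all(any((d, b) in attacks for d in candidate)
                   for b in attackers[a])

    extension = set()
    while True:
        # Apply the "defended arguments" function until a fixed point is reached.
        defended = {a for a in arguments if defends(extension, a)}
        if defended == extension:
            return extension
        extension = defended


# a attacks b, b attacks c: a is unattacked and defends c.
print(grounded_extension({"a", "b", "c"}, {("a", "b"), ("b", "c")}))
```

Starting from the empty set, the iteration first accepts the unattacked arguments and then the arguments they defend; on a mutual attack between two arguments it returns the empty set, since neither argument is defended.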
The concepts of Dung’s semantics are originally stated in terms of sets of arguments. They can also be expressed using argument labeling [8,31,66]. In a reinstatement labeling [8], an argument is labeled “in” if all its attackers are labeled “out”, and it is labeled “out” if it has at least one attacker that is labeled “in”.
(AF-labeling [8]).
(Reinstatement labeling [8]).
A reinstatement labeling is called illegal if the above conditions do not hold. (Complete, grounded, preferred and stable labeling).
Fuzzy sets
Fuzzy sets [71] are a generalization of classical (crisp) sets obtained by replacing the characteristic function of a set A,
A fuzzy set is completely defined by its membership function. Therefore, it is useful to define a few terms describing various features of this function. Given a fuzzy set A, its core is the (conventional) set of all elements x such that
The usual set-theoretic operations of union, intersection, and complement can be defined as a generalization of their counterparts on classical sets by introducing two families of operators, called triangular norms and triangular co-norms. In practice, it is usual to employ the min norm for intersection and the max co-norm for union. Given two fuzzy sets $A$ and $B$, and an element $x$, we thus have $(A \cup B)(x) = \max\{A(x), B(x)\}$, $(A \cap B)(x) = \min\{A(x), B(x)\}$, and, for the complement, $\bar{A}(x) = 1 - A(x)$.
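The min/max operators can be illustrated with a short sketch. Here a fuzzy set over a finite universe is encoded as a dictionary from elements to membership degrees in [0, 1]; this encoding, the function names, and the use of the standard complement 1 − A(x) are assumptions made for illustration only.

```python
def fuzzy_intersection(a, b):
    """Intersection with the min norm; missing elements have membership 0."""
    return {x: min(a.get(x, 0.0), b.get(x, 0.0)) for x in set(a) | set(b)}


def fuzzy_union(a, b):
    """Union with the max co-norm; missing elements have membership 0."""
    return {x: max(a.get(x, 0.0), b.get(x, 0.0)) for x in set(a) | set(b)}


def fuzzy_complement(a, universe):
    """Standard complement 1 - A(x), computed over an explicit universe."""
    return {x: 1.0 - a.get(x, 0.0) for x in universe}


A = {"x": 0.25, "y": 0.75}
B = {"x": 0.5, "y": 0.5}
print(fuzzy_intersection(A, B))
print(fuzzy_union(A, B))
print(fuzzy_complement(A, {"x", "y"}))
```

When all membership degrees are 0 or 1, these operators reduce to the classical set operations.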
Sentiment lexicons
Sentiment lexicons are used for associating each term with a polarity value. Terms having such an association are called “opinion words” and they are used for estimating the polarity of a given sentence. Associating a polarity value with a specific word is a task that has been addressed from different perspectives. The result is the availability of different resources that can be easily integrated within real-world systems. In SMACk, we decided to aggregate polarity values coming from three freely available resources: SenticNet [7], the General Inquirer vocabulary [62], and the MPQA dictionary [16].
SenticNet is a publicly available resource for opinion mining that exploits both artificial intelligence and Semantic Web techniques to infer the polarities associated with common-sense concepts and to represent them in a semantic-aware format. In particular, SenticNet uses dimensionality reduction to calculate the affective valence of a set of Open Mind concepts and it represents them in a machine-accessible and machine-processable format. The development of SenticNet was inspired by SentiWordNet [2], a lexical resource in which each WordNet synset is associated with three numerical scores describing how objective, positive, and negative the terms contained in each synset are. The differences between SenticNet and SentiWordNet are basically three: (i) in SentiWordNet, each synset is associated with a three-valued representation (the objectivity of the synset, its positiveness, and its negativeness), while in SenticNet there is only one value belonging to the
The General Inquirer is an English-language dictionary containing almost 12,000 elements associated with their polarities in different contexts. This dictionary is the result of the integration between the “Harvard” and the “Lasswell” general-purpose dictionaries as well as a dictionary of categories defined by the dictionary creators. When necessary, for ambiguous words, a specific polarity for each sense is specified. For every word, a set of tags is provided in the dictionary. Only a subset of them is relevant to the opinion mining topic and, thus, exploited in this work:
Valence categories: the two well-known “positive” and “negative” classifications.
Semantic dimensions: these tags reflect semantic differential findings regarding basic language universals. These dimensions are: “hostile”, “strong”, “power”, “weak”, “submit”, “active”, and “passive”. A word may be tagged with more than one dimension, if appropriate.
Words of pleasure: these tags are usually also classified positive or negative, with virtue indicating strength and vice indicating weakness. They provide more focus than the categories in the previous two bullets. Such categories are “pleasure”, “pain”, “feel”, “arousal”, “emotion”, “virtue”, “vice”.
Words reflecting presence or lack of emotional expressiveness: these tags indicate the presence of overstatement and understatement; trivially, such tags are “overstated” and “understated”.
Other categories indicating ascriptive social tags rather than references to places have been considered out of the scope of the opinion mining topic and have not been considered in the implementation of the approach.
Finally, MPQA is a sentiment lexicon built for Multi-Perspective Question Answering purposes. The lexicon contains around 8,222 terms annotated with their polarity (“positive”, “negative”, and “neutral”) and with their intensity level (“strong” and “weak”) and a set of 10,000 sentences manually annotated through the proposed annotation scheme. Indeed, besides the classic association
The lists of terms contained in the resources presented above do not overlap completely. The strategy implemented within the SMACk platform considers as opinion words those having a non-zero polarity value in at least one of the integrated resources. For example, the word “third” is present neither in MPQA nor in SenticNet and has a polarity of 0 according to the General Inquirer. Consequently, it is not a valid opinion word. On the other hand, the word “huge” has a positive value of 0.069 in SenticNet, a negative value of
WordNet5
The preliminary textual analysis, which converts the raw input text into an annotated and structured representation, is performed through the Stanford Core Natural Language Processing Library. Stanford CoreNLP is an integrated framework providing a wide range of natural language analysis tools. Each functionality is provided by a specific module. Below, we describe the four modules of the CoreNLP library adopted within the SMACk system.
The POS Tagger (Part-Of-Speech Tagger) is a software module that assigns a part-of-speech tag (such as noun, verb, adjective, etc.) to every word of a given sentence [33]. The Coref Annotator (Co-reference Resolution Annotator) generates co-reference chain annotations representing groups of words that refer to the same entity [13]; these chains are used to resolve pronoun references. The Parse Annotator (Parser Annotator) [28] provides a full syntactic analysis of the sentence in the form of a tree structure. Finally, the Depparse Annotator (Dependency Parser Annotator) [10] represents the grammatical relations between the words of a sentence as a graph.
SMACk framework
The SMACk framework is composed of three main elements:
The Argument module (presented in Section 5) is in charge of detecting and extracting relevant sentences from raw natural language text. As “relevant sentences”, we mean sentences containing mentions of entity aspects or opinion words. This module also has to extract the timeline with which aspects are mentioned. This information is very important for building the argumentation graph.
The Sentiment module (presented in Section 6) is in charge of inferring the polarity associated with each aspect extracted by the Argument module. Such an inference task is performed by taking into account the timeline detected by the Argument module. This way, it is possible to continuously update the polarity of each aspect within the discourse.
The Visualization module (presented in Section 7) is in charge of showing in real time (i) how the extracted aspects are connected to each other, (ii) how interesting such aspects are from the perspective of the users’ community, and (iii) how the polarity of each aspect changes over time.
In the remainder of the paper, we describe how these modules have been integrated within the platform, exploiting a real-world use case to clarify the goal and the results of SMACk. In particular, we consider a set of product reviews belonging to one of the categories used within the Amazon website. We analyze a set of 50,000 reviews extracted from the Dranziera dataset [19].

Fig. 1. The information workflow implemented within the SMACk system.
Figure 1 provides a graphical representation of the workflow executed for the analysis of each document set. As document set, we mean a time-ordered set of opinions expressed by users about the same domain, where, for each domain, many aspects can be mentioned and not all aspects have to be mentioned in all documents. The adoption of the time-ordered constraint is mandatory for detecting the attacks and the supports with respect to a specific aspect. Such information is used for building the argumentation graph.
Construction of the argumentation graph. The first task is the construction of the argumentation graph given a set of user-generated contents about the same kind of product (for instance, “laptops”). For each text, the set of aspects and the qualities associated with them are extracted and stored as structured information in knowledge nodes. Each node contains the following information:
the average polarity expressed by users when they mention the specific aspect;
the number of mentions of the aspect in the whole document set;
the number of attacks to the opinion expressed with respect to a specific aspect; and,
the number of supports to the opinion expressed with respect to a specific aspect.
All such information is exploited also by the Visualization Module (described in Section 7) for properly showing the graph summarization to users.
The result of this task is an argumentation graph where we have a node for each aspect, an edge connecting two or more aspects if they are mentioned in the same context, and a set of attack/support edges built based on the discussions extracted from the document set.
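The content of a knowledge node can be sketched as a small record type. The class below is a hypothetical illustration, not the SMACk implementation: the names `AspectNode` and `add_mention` are ours, and we assume, for the sake of the example, that a mention counts as an attack when its polarity has the opposite sign of the previous mention of the same aspect, and as a support when it has the same sign.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AspectNode:
    """Hypothetical knowledge node holding the four quantities listed above."""
    aspect: str
    polarity_sum: float = 0.0
    mentions: int = 0
    attacks: int = 0
    supports: int = 0
    last_polarity: Optional[float] = None

    @property
    def average_polarity(self) -> float:
        # Average polarity over all mentions (0.0 when never mentioned).
        return self.polarity_sum / self.mentions if self.mentions else 0.0

    def add_mention(self, polarity: float) -> None:
        # Assumption: opposite sign w.r.t. the previous mention = attack,
        # same sign = support; a neutral polarity (0) counts as neither.
        if self.last_polarity is not None:
            product = polarity * self.last_polarity
            if product < 0:
                self.attacks += 1
            elif product > 0:
                self.supports += 1
        self.polarity_sum += polarity
        self.mentions += 1
        self.last_polarity = polarity


node = AspectNode("display")
for p in [0.5, -0.5, -1.0, 1.0]:  # time-ordered polarities (made-up values)
    node.add_mention(p)
print(node.mentions, node.attacks, node.supports, node.average_polarity)
```

Keeping the running sum rather than the average makes it possible to update a node incrementally as new documents of the time-ordered set are processed.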
Extracting most interesting aspects. After the construction of the argumentation graph, the second task is the application of the labeling algorithm to compute the most interesting aspects emerging from the argumentation graph we constructed. In particular, the integration of the fuzzy labeling algorithm helps in the identification of the most interesting aspects from the user perspective.
This kind of analysis enables the extraction of relevant information that can be used for different purposes. For example:
detecting which are the weaker aspects of a product (or a service) in order to drive future actions on possible product (or service) improvements;
exploiting weak and strong aspects to inspire the development of personalized advertisement tools, where information about the users’ viewpoints may help in making advertisement campaigns more effective;
supporting users in making more informed decisions without the need to dig into the huge amount of (possibly technical) reviews about a product.
Besides the case study provided in this paper, the SMACk framework can be applied to several contexts with different levels of complexity. The main examples are in the social science domain, where a huge amount of texts needs to be analyzed for detecting the mood of people with respect to different debated topics, or the analysis of online user-generated content about products or services.
In this section, we recall the main features of the fuzzy labeling algorithm for abstract argumentation [15]. For a complete description of the algorithm and its convergence theorem, as well as the comparison with related approaches, we refer the reader to [15]. Moreover, we report on the implementation we developed to test the performance of the algorithm [14].
In order to account for the fact that arguments may originate from sources that are trusted only to a certain degree, the (crisp) abstract argumentation structure described in Section 3 may be extended by allowing gradual membership of arguments in the set of arguments
Here, we suppose that the agent is optimistic. To represent a pessimistic behaviour, we should use the min operator, for example.
Let
Such an α may also be regarded as (the membership function of) the fuzzy set of acceptable arguments:
(Fuzzy Reinstatement Labeling).
Let α be a fuzzy AF-labeling. We say that α is a fuzzy reinstatement labeling iff, for all arguments A,
The above definition combines two intuitive postulates of fuzzy labeling: (1) the acceptability of an argument should not be greater than the degree to which the arguments attacking it are unacceptable and (2) an argument cannot be more acceptable than the degree to which its sources are trusted:
We can verify that the fuzzy reinstatement labeling is a generalization of the crisp reinstatement labeling of Definition 3.1, whose in and out labels are particular cases corresponding, respectively, to
In order to compute, given a fuzzy argumentation framework, its fuzzy reinstatement labeling, we cast this problem as a problem of finding a solution to a system of n non-linear equations, where
We denote by
Let
This defines a sequence
Let
An extensive evaluation of the performance and scalability of the fuzzy labeling algorithm with respect to a benchmark of abstract argumentation frameworks has been carried out in [14], to which we refer the interested reader. We first selected three existing datasets for abstract argumentation tasks used in the literature, namely the datasets created by Bistarelli et al. [3], by Cerutti et al. [9], and by Vallati et al. [65]. Moreover, we generated our own dataset of abstract argumentation frameworks by randomly combining some well-known graph patterns in argumentation theory into 20,000 bigger argumentation frameworks. Secondly, we studied the behaviour of the algorithm on the frameworks in the benchmark, to check whether its performance is satisfactory even on huge and complex networks such as those represented in the datasets, e.g., those presenting an increasing number of strongly connected components. The experimental results reported in [14] indicate that the fuzzy labeling algorithm scales well on all the considered datasets, and is thus a viable argumentation reasoning tool.
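To make the computation more concrete, the following sketch iterates a fuzzy labeling until it stabilizes. It is not the implementation evaluated in [14]. It assumes that the two postulates above are combined as α(A) = min{𝒜(A), 1 − max over the attackers B of A of α(B)}, where 𝒜(A) is the trust degree of argument A, and that each iteration averages the current label with the right-hand side of this condition, starting from the trust degrees; the exact update rule should be checked against [15]. The names `fuzzy_labeling`, `trust`, `tol` and `max_iter` are illustrative.

```python
def fuzzy_labeling(trust, attacks, tol=1e-9, max_iter=10_000):
    """Iteratively compute a fuzzy labeling (sketch under stated assumptions).

    trust:   dict mapping each argument to the trust degree of its source.
    attacks: set of (attacker, target) pairs.
    """
    attackers = {a: [b for (b, t) in attacks if t == a] for a in trust}
    alpha = dict(trust)  # initial labeling: the trust degrees themselves
    for _ in range(max_iter):
        new_alpha = {}
        for a in trust:
            # Acceptability of the strongest attacker (0 if unattacked).
            strongest = max((alpha[b] for b in attackers[a]), default=0.0)
            target = min(trust[a], 1.0 - strongest)
            # Assumed update: average of the current label and the target.
            new_alpha[a] = 0.5 * alpha[a] + 0.5 * target
        if all(abs(new_alpha[a] - alpha[a]) < tol for a in trust):
            return new_alpha
        alpha = new_alpha
    return alpha


# a attacks b, both fully trusted: a stays accepted, b tends to 0.
print(fuzzy_labeling({"a": 1.0, "b": 1.0}, {("a", "b")}))
```

With fully trusted sources, a mutual attack between two arguments stabilizes at 0.5 for both, which can be read as the fuzzy counterpart of an undecided status.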

Fig. 2. Example of argumentation graph showing the aspects extracted from four reviews about the topic display.
Let us now consider a real world example, based on the data exploited by the SMACk framework. We have the following 9 reviews published on a commercial website:
User1: The display on this computer is the best I have seen in a very long time.
User5: I hate the display screen and I have done everything I could do the change it.
User7: The display is beyond horrible.
User8: The display is awesome.
User10: The colors of this display are very brilliant. Overall it is an amazing monitor.
User11: This is the last time that I buy a display produced by them.
User14: I do not agree with most of the reviews. This display is very nice by considering also its cheap price.
User15: I like this display very much. I suggest it if you want to upgrade your hardware.
User18: This display is the best choice I’ve done.
From these reviews, we construct the argumentation graph visualized in Fig. 2, where the nodes represent the arguments (i.e., the aspects) and the edges represent attacks between arguments. Each node states whether a certain user expressed a positive or negative opinion about the product concerning a certain aspect, i.e., the display aspect of the computer. We assume that the reviews have been published in the temporal order in which they are listed above. The reader may observe that the review provided by User8 is in contrast also with the one provided by User5, but this attack is not reported in Fig. 2. The reason is that, in our approach, the argumentation graph is built by assuming that when a UserX provides a review, he/she attacks only the last review containing an opinion opposite to the one provided by UserX. The rationale behind this assumption is that our dataset does not contain details about the “topic-reply” relationship between messages. Thus, we decided to adopt this strategy to compensate for this lack of knowledge. Part of our future work will focus on building datasets that include this information.
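The attack-construction strategy just described can be sketched as follows. The function name `build_attacks` is illustrative, and the polarities (+1 for positive, −1 for negative) are assigned by hand from our reading of the nine reviews above rather than computed by the SMACk pipeline; the resulting edges may therefore differ from those drawn in Fig. 2.

```python
def build_attacks(reviews):
    """Each review attacks only the most recent review of opposite polarity.

    reviews: list of (user, polarity) pairs in temporal order,
             with polarity +1 (positive) or -1 (negative).
    """
    attacks = []
    last_with_polarity = {}  # polarity -> most recent user expressing it
    for user, polarity in reviews:
        target = last_with_polarity.get(-polarity)
        if target is not None:
            attacks.append((user, target))
        last_with_polarity[polarity] = user
    return attacks


# Hand-assigned polarities for the reviews about the display (assumption).
reviews = [
    ("User1", +1), ("User5", -1), ("User7", -1), ("User8", +1), ("User10", +1),
    ("User11", -1), ("User14", +1), ("User15", +1), ("User18", +1),
]
for attacker, target in build_attacks(reviews):
    print(f"{attacker} attacks {target}")
```

Under these assumptions, User8 attacks User7 only, and not User5, which matches the observation made above.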
We initialize each node in the argumentation graph to 1.0, meaning that at the beginning of the computation each user is considered to be trustworthy (as we do not have any prior knowledge about the trustworthiness degree of the users), and the same holds for the information she conveys. This starting assumption is then confirmed or revised, depending on the attacks against the proposed arguments (i.e., aspects). Figure 3 shows the transition steps required to get to the final labeling, where

Transition steps of the fuzzy labeling algorithm on the argumentation framework visualized in Fig. 2.
Here, we present the approach implemented for extracting aspects from text. The overall aspect extraction approach relies on the natural language processing (NLP) pipeline shown in the middle layer of Fig. 4.

The natural language processing pipeline implemented within the SMACk platform aiming to extract aspects and compute their polarity from the analyzed textual resources.
As shown in the bottom layer, the implemented pipeline exploits the three linguistic resources introduced in Section 3: a stopwords list,8
The stopwords list used is available at http://www.lextek.com/manuals/onix/stopwords1.html.
The pipeline is composed of five phases, described as follows:
Compound Noun Extraction. The first step consists in detecting compound nouns. This step is supported by the POS-Tagger module provided by the Stanford CoreNLP library and by WordNet (both introduced in Section 3). When two consecutive words are tagged as nouns by the POS-Tagger, their composition is looked up in the WordNet dictionary. If the compound expression is found, it is tagged as a compound noun and treated as a single token; otherwise, the two words are kept as separate tokens.
Co-reference Resolution. This step consists in associating pronouns with the related noun (or compound noun). This is necessary for detecting all associations between opinion words and aspects. This operation is completely supported by the Coref Annotator. Refinements of the adopted algorithm are outside the scope of this paper and are part of future work.
Stopwords Removal. Once compound nouns have been detected and pronouns have been replaced with the right terms, the pipeline removes all stopwords from the text by exploiting the list mentioned above.
Aspect Extraction. This step is the most important one, and it consists in detecting the correct aspects contained in the text together with the associated opinion words. Details about this step are provided below, where we introduce the extraction algorithm adopted for analyzing the provided text.
Polarity Computation. Finally, this step is responsible for computing the polarity associated with each aspect extracted during the previous step. The overall polarity of an aspect A is computed by aggregating (as explained below) the single polarities of the opinion words associated with A. Single polarity values are extracted from the sentiment lexicons described in Section 3.
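As an illustration of the first phase only, the following sketch merges two consecutive nouns into a single token when their composition appears in a dictionary. It is a simplified stand-in under stated assumptions: the input is a list of already POS-tagged tokens using Penn Treebank tags (noun tags start with "NN"), and a small hand-made set (`compound_lexicon`, hypothetical) replaces the WordNet lookup. The function name `merge_compound_nouns` is also introduced here for illustration.

```python
def merge_compound_nouns(tagged_tokens, compound_lexicon):
    """Merge adjacent noun pairs found in the lexicon into one token.

    tagged_tokens: list of (word, pos_tag) pairs, Penn Treebank style.
    compound_lexicon: set of lower-cased compounds (stand-in for WordNet).
    """
    merged = []
    i = 0
    while i < len(tagged_tokens):
        word, tag = tagged_tokens[i]
        if i + 1 < len(tagged_tokens):
            next_word, next_tag = tagged_tokens[i + 1]
            candidate = f"{word} {next_word}".lower()
            both_nouns = tag.startswith("NN") and next_tag.startswith("NN")
            if both_nouns and candidate in compound_lexicon:
                # Treat the compound as a single noun token and skip both words.
                merged.append((candidate, "NN"))
                i += 2
                continue
        merged.append((word, tag))
        i += 1
    return merged


# Hand-tagged toy sentence; the tags are written by hand, not tagger output.
tagged = [
    ("I", "PRP"), ("like", "VBP"), ("the", "DT"),
    ("battery", "NN"), ("life", "NN"),
    ("of", "IN"), ("this", "DT"), ("laptop", "NN"),
]
lexicon = {"battery life"}
result = merge_compound_nouns(tagged, lexicon)
print(result)
```

Pairs of nouns that are not in the lexicon are left as two separate tokens, which corresponds to the "otherwise not" branch of the description.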
The aspect-extraction approach implemented within the SMACk platform is based on the Grammar-Dependencies-based algorithm described in [24]. Here, we briefly report how the algorithm works. For further details about the adopted strategy, we refer to [24].
The aspect-extraction algorithm analyzes the structure of the grammar dependency graph generated by the CoreNLP library for extracting the connections between aspects and opinions. Each dependency extracted by the CoreNLP library can be expressed by a triple:
The meaning of each element of the triple, together with all the possible relation types, can be found in the official Stanford document available at http://nlp.stanford.edu/software/dependencies_manual.pdf.
Given a dependency node n, the algorithm checks whether one of the following rules applies:
Rule 1: If the relation type is an adjectival modifier (“amod”), a connection between an aspect and an opinion word is created if and only if the governor is an aspect and the dependant has a polarity value in at least one of the sentiment lexicons.
Rule 2: If the relation type is a nominal subject (“nsubj”), a connection between an aspect and an opinion word is created if and only if the governor has a polarity value in at least one of the sentiment lexicons and the dependant is an aspect.
Rule 3: If the relation type is a direct object (“dobj”), a connection between an aspect and an opinion word is created if and only if the governor has a polarity value in at least one of the sentiment lexicons and the dependant is an aspect.
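The three rules can be sketched as follows. This is an illustrative reading of the rules rather than the implementation described in [24]: dependencies are modeled as (relation, governor, dependant) records, the set of candidate aspects and the set of words having a polarity value are passed in explicitly, and the example triples are written by hand rather than produced by the CoreNLP parser. The names `Dependency`, `extract_associations`, `aspects`, and `polar_words` are hypothetical.

```python
from typing import NamedTuple


class Dependency(NamedTuple):
    relation: str
    governor: str
    dependant: str


def extract_associations(dependencies, aspects, polar_words):
    """Return (aspect, opinion word) pairs licensed by Rules 1-3."""
    associations = []
    for dep in dependencies:
        if dep.relation == "amod":
            # Rule 1: the governor is the aspect, the dependant the opinion word.
            aspect, opinion = dep.governor, dep.dependant
        elif dep.relation in ("nsubj", "dobj"):
            # Rules 2 and 3: the governor is the opinion word, the dependant the aspect.
            aspect, opinion = dep.dependant, dep.governor
        else:
            continue
        if aspect in aspects and opinion in polar_words:
            associations.append((aspect, opinion))
    return associations


# Hand-written dependencies for a toy sentence such as
# "The bright display impressed me, but I hate the keyboard."
dependencies = [
    Dependency("det", "display", "the"),
    Dependency("amod", "display", "bright"),
    Dependency("nsubj", "impressed", "display"),
    Dependency("dobj", "hate", "keyboard"),
]
aspects = {"display", "keyboard"}
polar_words = {"bright", "hate"}  # words assumed to appear in a sentiment lexicon
associations = extract_associations(dependencies, aspects, polar_words)
print(associations)
```

In this toy input the `nsubj` dependency produces no association because "impressed" is deliberately left out of `polar_words`, showing that a rule fires only when both of its conditions hold.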
By applying the three rules mentioned above, in sequence, to the sentence “I enjoyed the screen resolution, it’s amazing for such a cheap laptop”, we obtain the following list of “aspect-opinion” associations:
Afterwards, concerning the polarity computation task, given the associations above our system is able to infer, for both aspects “laptop” and “screen resolution”, a positive polarity.
The polarity computation phase considers polarities of opinion words represented by trapezoidal fuzzy numbers in the
The SMACk platform is equipped with a simple user interface showing the results of the document analysis. Through this interface, users are able to monitor the evolution of the argumentation graph and of the polarities associated with each extracted aspect. In the demo version presented in [18], we included three document sets related to three entities belonging to as many domains (“laptops”, “restaurant”, and “hotel”). The buttons for activating the demo on each document set are placed in the right part of the interface, as shown in Fig. 5.
This interface has been designed to support two kinds of users: “Researchers” and “Experts”. The Researcher profile can observe how the argumentation graph is created during the document analysis process. Figure 5 shows the construction of the argumentation graph during the analysis of the document stream. Once the algorithm detects a new argument, it adds to the graph a colored node (green or red) containing a label composed of (i) the identifier of the opinion holder, (ii) the aspect mentioned in the review, and (iii) the polarity associated with the mentioned aspect. The polarity value determines the background color of the node: green for positive polarity, and red for negative polarity. If, instead, the argument already exists, the algorithm evaluates whether the content of the current review supports or attacks the existing argument, and the graph is updated accordingly.

The construction of the argumentation graph during the documents analysis process.
The second profile supported by the tool is the “Expert” one. Experts can use an interface focused on aspect monitoring in order to obtain a summary of the information associated with each aspect. Figure 6 shows what happens while the system is running in this modality. On the right side of the graph area, two nodes are created: an “Agreement” node, colored in green, indicating that there are users agreeing with the opinions expressed for the connected aspects; and a “Disagreement” node, colored in red, indicating that there are disagreements between users about the opinions expressed for the connected aspects. Agreement or disagreement with an opinion is not directly connected with the polarity expressed for a particular aspect. For example, in the temporary situation shown in Fig. 6, four aspects have been detected so far: “food”, “sushi”, “place”, and “space”. For “food” and “space” there are both agreements and disagreements about the expressed opinions, while for “sushi” and “place” there is an overall agreement about what has been expressed by users.
At the end of the textual analysis, the full graph is re-organized and each node is changed to show (i) the level of interest of each aspect (i.e., how much users discussed a specific product feature) and (ii) the overall polarity expressed about it. Figure 7 shows an example of such a result. For each node, we can distinguish two graphical elements supporting the analysis of the results. The first one is the border thickness of each node: the thicker the border, the more interesting the aspect has been judged by the system. The second one is the background color of each node. We used an 11-gradient color scale to show the overall polarity of each aspect; this scale goes from green (positive polarity) to red (negative polarity).

The user interface during the analysis of a raw natural language text.

The results of the analysis of a bunch of documents talking about a specific hotel.
Through this interface, users are able to analyze opinion streams and to have a real-time view on each aspect connected to a particular topic.
In this section, we present the evaluation of the SMACk system. The system is evaluated from different perspectives, with the aim of showing the efficiency and effectiveness of the modules implemented within it. In [14], we already discussed the efficiency of the fuzzy labeling algorithm supporting the discovery of the most interesting “nodes” in the generated graph. Here, we focus our evaluation on two perspectives:
Aspect extraction. One of the tasks assigned to the Argumentation Module is the extraction of aspects from text. This task is important for defining, later in the analysis process, which aspects are the most significant ones. This evaluation task focused on measuring the effectiveness of the aspect-extraction approach.
Polarity detection. The computation of an aspect’s polarity enables the detection of which product features are strong or weak. The Sentiment Module is in charge of inferring the polarity of each aspect given the context in which the aspect appears. Here, we measured the capability of the SMACk system to infer the correct polarity.
Besides the effectiveness of the technological components, we also provide a discussion about the usability of the user interface and the direction we intend to follow for the evolution of the platform presented in this article.
Evaluation on aspect extraction
Table 1 reports the results obtained by our approach on the aspect extraction benchmark used in SemEval 2015 Task 12. The task consists in extracting from raw sentences the list of aspects towards which an opinion is expressed in the given text. For example, given the sentence “This laptop is extremely portable and easily connects to WiFi at the library and elsewhere.”, the term “laptop” has to be considered an aspect because the opinions expressed through the terms “portable” and “easily connects” are associated with it. On the contrary, the term “library” does not have to be extracted because no opinions are associated with it. Precision and recall are computed by using the standard formulas [54], where “true positives” are the extracted aspects that are contained in the gold standard, “false positives” are the extracted aspects that are not contained in the gold standard, and “false negatives” are the aspects contained in the gold standard that are not detected by the algorithm. The algorithm has been tested on the “Restaurant” and “Laptop” datasets. The overall performance is in line with the best systems participating in the evaluation campaign and, on the “Laptop” dataset, our aspect extraction approach recorded the best precision and F-measure. It is also important to highlight that all the systems we compared against apply supervised approaches for extracting aspects, while our approach implements an unsupervised technique. As a consequence, the system can be deployed in any environment without the need to train a new model.
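The definitions of true positives, false positives, and false negatives given above translate directly into the following sketch. It makes two simplifying assumptions that are not taken from the benchmark itself: aspects are compared as plain sets of strings, and the F-measure is the balanced harmonic mean of precision and recall (F1). The function name `precision_recall_f1` and the toy sets are hypothetical.

```python
def precision_recall_f1(extracted, gold):
    """Compute precision, recall, and balanced F-measure over aspect sets."""
    extracted, gold = set(extracted), set(gold)
    tp = len(extracted & gold)   # extracted aspects contained in the gold standard
    fp = len(extracted - gold)   # extracted aspects not in the gold standard
    fn = len(gold - extracted)   # gold aspects missed by the algorithm
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy example: one correct aspect, one spurious aspect, one missed aspect.
p, r, f = precision_recall_f1({"laptop", "library"}, {"laptop", "battery"})
print(p, r, f)
```

Under these definitions, a conservative extractor that emits fewer candidates tends to reduce false positives (raising precision) at the cost of more false negatives (lowering recall).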
Results obtained on the aspect extraction task, for the “Restaurant” and “Laptop” datasets, on the SemEval 2015 benchmark. For each dataset, we report Precision, Recall, and F-Measure. Acronyms refer to the systems that participated in the SemEval 2015 competition
Results obtained concerning the computation of polarities associated with single aspects on the SemEval 2015 benchmark. For each dataset, we report the accuracy obtained in computing polarities (“positive” or “negative”). Acronyms refer to the systems that participated in the SemEval 2015 competition
Concerning the “Restaurant” domain, the gap between our approach and the best ones is due to the conservative strategy implemented for extracting aspects. One of the most common issues in unsupervised aspect-based approaches is the extraction of false positive aspects [41]. The major consequence of this issue is the poor effectiveness of the modules exploiting the outcome of the aspect extraction component. Unfortunately, the adoption of a conservative strategy leads to lower recall values. However, we consider it the preferable solution given the extensive use of the extracted aspects in the other components of the SMACk platform.
Table 2 reports the results of the polarity computation technique. The approach has been evaluated on the two datasets mentioned above. Here, we measured the accuracy of the polarity detection algorithm: given the set of opinion words associated with an aspect, the polarity of the aspect is computed by aggregating the fuzzy polarities of the individual opinion words. The results demonstrate the effectiveness of the polarity detection techniques implemented within the SMACk system, which obtained the best performance on the “Laptop” dataset and the third best on the “Restaurant” dataset. After a detailed analysis of the results, we noticed that our approach performs better on the “Laptop” dataset because of the simple language used for describing product features. Indeed, in the “Restaurant” dataset opinions are expressed in a more articulated way, and sometimes the approach fails to detect the right polarity. Improvements in this direction will be part of future work.
Lessons learned
Earlier in this section, we demonstrated the suitability of the components integrated within the SMACk platform. In Section 5, we showed that the system is efficient in solving the labeling task for detecting the most important aspects contained in a set of user-generated documents. Furthermore, in the previous subsections, we validated the capability of the opinion mining module to extract the correct aspects from a document and to infer the right polarity associated with each aspect.
Besides these validation tasks, the development of the SMACk platform allowed us to learn some lessons that will guide the improvement of the whole system. In particular, what we learned from this experience concerns two main aspects: (i) efficient management of data streams, and (ii) understandability of the user interface.
SMACk efficiency. The scenario used in this first prototype focused on document sets having a limited number of items. By switching from a test environment to a more complex one, we noticed that the time needed for generating the argumentation graph increases significantly. This issue is related to the need to detect, for each aspect that has already been extracted, the presence of a support or attack edge. While a possible solution might be the parallelization of this task, some care is required. Indeed, the constraint of analyzing documents in the temporal order in which they were generated requires performing some checks based on the number of documents that we want to analyze at a given time. Thus, given, for example, a window of n documents that we want to process in parallel, a possible strategy is to verify whether there are conflicts between the aspects extracted from these documents. This way, we would be able to safely update the argumentation graph without losing potential edges.
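One possible reading of this windowing strategy is sketched below. It is not part of the current prototype, and it rests on an assumption made only for this illustration: two documents in the same window "conflict" when they mention at least one common aspect, since in that case their temporal order matters for the support and attack edges. The functions `window_has_conflicts` and `plan_batches` are hypothetical names.

```python
def window_has_conflicts(window):
    """Return True if two documents in the window share an aspect.

    window: list of sets of aspects, one set per document, in temporal order.
    """
    seen = set()
    for aspects in window:
        if seen & aspects:
            return True
        seen |= aspects
    return False


def plan_batches(documents, n):
    """Split the stream into windows of n documents and choose a mode.

    A conflict-free window can be processed in parallel; otherwise the
    documents of that window are processed sequentially to keep their order.
    """
    plan = []
    for start in range(0, len(documents), n):
        window = documents[start:start + n]
        mode = "sequential" if window_has_conflicts(window) else "parallel"
        plan.append((mode, window))
    return plan


# Toy stream: each document is represented by the set of aspects it mentions.
documents = [{"display"}, {"battery"}, {"display", "price"}, {"keyboard"}]
plan = plan_batches(documents, 3)
print([mode for mode, _ in plan])
```

With a window of three documents, the first window contains two documents about "display" and therefore falls back to sequential processing, whereas the last window has no shared aspect and could be parallelized.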
User interface improvement. The second lesson we learned from this work concerns the improvements that should be made to the user interface to make the platform more appealing from the user’s perspective. The feedback provided by the users interviewed to judge the tool can be summarized in the following two issues.
Contextual information in the argumentation graph: in this prototype we did not take into account the possibility of having different kinds of users, Basic and Advanced. While basic users can be satisfied with simple graphical information supporting the detection of the most interesting aspects, advanced users wanted to see detailed information associated with each aspect, e.g., the polarity value and a summary of the supports and attacks for each aspect. This functionality will be included in the next version of the platform.
Animate the evolution of a single aspect: the second issue raised by the users was related to the impossibility of observing how each aspect “evolves” during the analysis of the data stream. In particular, a desideratum is the possibility of focusing on a single aspect and observing how this aspect is attacked and/or supported over time. This feature has been considered a valuable support for relating peaks of attacks or supports to contextual events that cannot be tracked through the SMACk system.
The two points brought to light by the users’ feedback will be used as a starting point for improving the infrastructure of the SMACk platform. It will thus be possible to employ the platform in a larger-scale context, with the aim of increasing its technological readiness level.
Conclusions
In this paper, we have presented SMACk, a combined framework for argumentation-based opinion extraction from natural language documents. The goal of SMACk is to support users dealing with huge amounts of data, typically in the context of online purchases, in detecting the aspects of the product to be bought that are most important to them and, consequently, in easily identifying the main arguments raised in favor of or against this product with respect to the specific aspect of interest. In order to address this challenging task, we combined two components, namely aspect-based opinion mining and abstract argumentation theory. The overall framework is presented, and the evaluation results show the feasibility of the task, fostering further research in this direction.
Several open challenges have to be considered for future work. First, we plan to improve the user interface of the system in order to enhance usability. The idea is to develop SMACk as a web application that can be plugged into different websites, e.g., online shopping sites such as Amazon. Second, the current version of SMACk works in an unsupervised way. On the one hand, this is a strong point of the proposed approach with respect to the state of the art; on the other hand, given the availability of effective approaches for domain detection, we plan to exploit such approaches, combined with word-embedding techniques, for building linguistic models that can be used for refining the set of extracted aspects. Finally, the version of SMACk discussed in this paper works only on English texts. The next step will focus on the integration of multi-language support.
