Abstract
Social networks have grown exponentially in use and in their impact on society as a whole. In particular, microblogging platforms such as Twitter have become important tools for assessing public opinion on different issues. Several approaches for assessing Twitter messages have recently been developed, identifying sentiments associated with relevant keywords or hashtags. However, such approaches have an important limitation: they do not take into account the contradictory and potentially inconsistent information that might emerge from relevant messages. We contend that the information made available in Twitter can be used to extract a particular version of arguments (called “opinions” in our formalization) that emerge bottom-up from the social interaction associated with such messages. In this paper we present a novel framework for mining opinions from Twitter based on incrementally generated queries. As a result, we obtain an “opinion tree”, rooted in the first original query. Distinguished, conflicting elements in an opinion tree lead to so-called “conflict trees”, which resemble the dialectical trees traditionally used in defeasible argumentation.1
This paper extends preliminary work [11] presented at the First International Conference on Agreement Technologies (AT 2012), held in Dubrovnik, Croatia, in October 2012.
Introduction and motivations
Social networks have grown exponentially in use and in their impact on society as a whole, targeting different communities and providing differentiated services. In particular, microblogging has become a very popular communication tool among Internet users, with Twitter2 by far the most widespread microblogging platform. Twitter, created in 2006, enables its users to send and read text-based posts of up to 140 characters, known as “tweets”. It has grown into a technology that makes it possible to assess public opinion on different issues. For example, it is now common to read newspaper articles referring to the impact of political decisions as measured by their associated positive or negative comments on Twitter. Symmetrically, policy makers make many of their claims and opinions public, having an influence on the citizenry,3
For example, the current UK Prime Minister David Cameron, the current US President Barack Obama and Pope Francis can be followed on Twitter at
As pointed out in recent research (e.g., [18,23]), microblogging platforms (in particular Twitter) offer a number of advantages for opinion mining. On the one hand, Twitter is used by many different people to express their opinions on a wide range of topics, which makes it a valuable source of people’s opinions. Given the enormous number of text posts, the collected corpus can be arbitrarily large. On the other hand, Twitter’s audience ranges from regular users to celebrities, company representatives, politicians, and even national presidents. Therefore, it is possible to collect text posts from users in different social and interest groups. According to the Merriam-Webster online dictionary,4
A fundamental need for policy makers is to base their decisions and agreements on reasons or opinions provided by citizens. They might even argue with other policy makers about why a particular decision is advisable (e.g., “according to the last poll, 80% of the people are against the health system reform; therefore, the reform should not be carried out”). From this perspective, social networks like Twitter provide a rich knowledge base from which information can be collected and analyzed in order to enhance and partially automate decision-making processes. In particular, tweets have a rich structure, providing a number of record fields that make it possible to detect the provenance of a tweet (its author), its number of retweets, the author’s followers, etc. We contend that the information made available from such tweets can be useful for modeling opinions that emerge bottom-up from the social interaction existing in Twitter.
In this article we present a novel framework for mining opinions from Twitter based on incrementally generated queries. Given a query Q (corresponding to one or more keywords or hashtags), our approach collects those distinguished tweets referring to Q, according to an aggregation criterion (also provided as an input). This collection of tweets will be called a Twitter-based argument A for Q, associated with a prevailing sentiment (computed on the basis of the tweets involved).5
Several software tools for computing such an association have recently been developed, e.g., www.sentiment140.com or tweetsentiments.com.
The rest of the article is structured as follows. In Section 2 we present our proposal for characterizing Twitter-based arguments and their interrelationships. We formalize the notion of opinion tree, which can be constructed from user queries and makes it possible to assess alternative opinions associated with incrementally generated queries. A high-level algorithm for computing opinion trees is presented, along with a case study illustrating our proposal. Section 3 discusses the relationship emerging from opinions in conflict, modeled as a conflict tree. Section 4 generalizes previous results using superior lattices, for both opinion and conflict trees. Then, in Section 5, we present an example showing the practical use of our approach. The benefits of applying this mathematical approach are discussed in Section 6. Section 7 reviews related work, and finally Section 8 summarizes the conclusions.
In this section we will describe how different elements in Twitter can be captured under an argumentative perspective [3,19]. First we will characterize distinguished collections of tweets (obtained on the basis of a given query) as arguments with an associated prevailing sentiment. Such arguments will be called TB-arguments (Twitter-based arguments). Then, we will formalize interrelationships between TB-arguments, which lead to the notion of opinion tree.
Formalizing aggregation of Twitter messages
Twitter messages (tweets) are at most 140 characters long and come with a number of additional fields that help identify relevant information within a message (sender, number of retweets associated with the message, etc.). In particular, we will focus on the presence of descriptors, which are either hashtags (words or phrases prefixed with the symbol #, a form of metadata tag) or terms that tend to occur often in the context of a given topic. Hashtags are used within IRC networks to identify groups and topics, and in short messages on microblogging and social networking services such as Twitter, identi.ca or Google+ (messages may be tagged by including one or more hashtags, with multi-word hashtags written as concatenated words). Other good descriptors can be found dynamically by looking for terms that are frequently used in tweets related to the topic at hand. In the sequel, the term “descriptor” refers either to actual hashtags in Twitter or to relevant keywords found in tweets.
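The two kinds of descriptor described above can be illustrated with a minimal Python sketch. It assumes plain-text tweets, treats any `#word` token as a hashtag, and approximates “terms frequently used in tweets related to the topic” by counting in how many tweets a term occurs; the stop-word list and the `min_count` threshold are hypothetical choices, not part of the framework.

```python
import re
from collections import Counter

HASHTAG_RE = re.compile(r"#\w+")
# Hypothetical stop-word list; a real system would use a fuller one.
STOPWORDS = {"a", "an", "and", "are", "for", "in", "is", "it", "of", "on", "or", "the", "to"}

def hashtags(text):
    """Hashtags in a tweet, lower-cased and without the leading '#'."""
    return [tag[1:].lower() for tag in HASHTAG_RE.findall(text)]

def frequent_terms(tweets, min_count=2):
    """Non-hashtag terms that occur in at least `min_count` distinct tweets."""
    counts = Counter()
    for text in tweets:
        plain = HASHTAG_RE.sub(" ", text.lower())
        # A set per tweet, so each term is counted once per tweet.
        counts.update({w for w in re.findall(r"[a-z']+", plain) if w not in STOPWORDS})
    return {w for w, c in counts.items() if c >= min_count}

def descriptors(tweets, min_count=2):
    """Descriptors: actual hashtags plus frequently used keywords."""
    tags = {t for text in tweets for t in hashtags(text)}
    return tags | frequent_terms(tweets, min_count)
```

For instance, over the tweets “Taxes are too high #taxes”, “Pay your taxes #IRS” and “I love sunny days”, this sketch yields the descriptors `taxes` and `irs`.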
(Tweet. Twitter query).
We define a tweet T as a bag (or multiset) of terms
In the analysis that follows, we will assume that a tweet is just a bag of words, not taking into account the actual order of terms in the tweet. Additionally, we assume that the set of all currently existing tweets corresponds to a snapshot of Twitter messages at a given fixed time, as the Twitter database is highly dynamic. In our approach, a query Q is any set of descriptors used for filtering some relevant tweets from the set of existing tweets
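A minimal sketch of the bag-of-words view of a tweet and of query-based filtering over a fixed snapshot follows. Since the formal matching condition is not reproduced in this excerpt, the rule used here (a tweet is relevant to Q when every descriptor of Q occurs in it) is an assumption for illustration only; descriptors are assumed to be lower-case.

```python
from collections import Counter

def as_tweet(text):
    """A tweet as a bag (multiset) of terms: counts are kept, order is not."""
    return Counter(text.lower().split())

def matches(tweet, query):
    """Assumed matching rule: every descriptor in the query occurs in the tweet."""
    return all(tweet[d] > 0 for d in query)  # missing keys count as 0

def filter_tweets(snapshot, query):
    """Tweets from a fixed snapshot that are relevant to the query."""
    return [t for t in snapshot if matches(t, query)]
```

Because `Counter` ignores insertion order for equality, `as_tweet("b a") == as_tweet("a b")` holds, mirroring the decision to disregard term order.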
(Tweet set. Aggregation operator).
Let
The aggregation operator could be defined in several ways. For instance, suppose that
Note that for the same query Q, different alternative criteria
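To make concrete how alternative criteria for the same query Q can yield different arguments, the sketch below builds a TB-argument as a (query, support, sentiment) triple. Everything specific here is hypothetical: per-tweet sentiment scores in [-1, 1] are assumed to come from an external tool, the criterion keeps tweets with at least k retweets, and the prevailing sentiment is taken as the sign of the mean score with a small neutral band.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Tweet:
    text: str
    retweets: int
    sentiment: float  # assumed per-tweet score in [-1, 1] from an external tool

@dataclass(frozen=True)
class TBArgument:
    query: frozenset
    support: tuple    # tweets selected by the aggregation criterion
    sentiment: str

def min_retweets(k):
    """Hypothetical aggregation criterion: keep tweets retweeted at least k times."""
    return lambda tweets: [t for t in tweets if t.retweets >= k]

def prevailing_sentiment(tweets, neutral_band=0.1):
    """Assumed rule: sign of the mean score, 'neutral' inside the band."""
    if not tweets:
        return "neutral"
    mean = sum(t.sentiment for t in tweets) / len(tweets)
    if mean > neutral_band:
        return "positive"
    if mean < -neutral_band:
        return "negative"
    return "neutral"

def tb_argument(query, relevant_tweets, criterion):
    """Apply the criterion to the tweets relevant to the query, then aggregate."""
    support = tuple(criterion(relevant_tweets))
    return TBArgument(frozenset(query), support, prevailing_sentiment(support))
```

With tweets scored 0.8 (10 retweets), -0.9 (0 retweets) and 0.2 (5 retweets), `min_retweets(5)` gives a positive argument, whereas `min_retweets(0)` gives a neutral one for the same query.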
A possible range for
Let
The previous elements will allow us to characterize the notion of TB-framework and TB-argument as follows:
(TB-framework).
A Twitter-based argumentation framework (or TB-framework) is a 5-tuple
(TB-argument).
Given a TB-framework
Consider a TB-framework
Then
In the previous section we showed how to express arguments for queries associated with a given prevailing sentiment. Such arguments might be attacked by other arguments, which in turn might be attacked, too. In argumentation theory, this leads to the notion of dialectical analysis [19], which can be associated with a tree-like structure in which arguments, counter-arguments, counter-counter-arguments, and so on, are taken into account. Our approach is more generic, in the sense that for a given argument, the child nodes correspond to more specific arguments that are not necessarily in conflict with the parent argument. Next we formalize these notions.
A natural relation that arises between TB-arguments is derived from the inclusion relation between their associated queries. This is formalized by the following definition.
(Argument selectivity).
Consider a TB-framework
If two distinct queries
(Query equivalence).
Let
While it is clear that whenever
(Query subsumption).
Given a TB-framework
A query
Note that the subsumption relation is more general than the inclusion relation, since
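Since the formal definitions of query equivalence and subsumption are not reproduced in this excerpt, the following sketch adopts one plausible extensional reading as an explicit assumption: an extension q_ext of q is accepted when it retrieves no tweet that q does not retrieve, and two queries are equivalent when they retrieve the same tweets. Under the all-descriptors matching rule, inclusion always implies this relation, but not conversely, which illustrates why subsumption is the more general of the two.

```python
def retrieved(snapshot, query):
    """Indices of tweets (sets of terms) containing every descriptor of the query."""
    return {i for i, terms in enumerate(snapshot) if query <= terms}

def extends_by_inclusion(q, q_ext):
    """Syntactic relation: q_ext adds descriptors to q."""
    return q <= q_ext

def extends_by_subsumption(q, q_ext, snapshot):
    """Assumed extensional reading: q_ext retrieves no tweet that q misses."""
    return retrieved(snapshot, q_ext) <= retrieved(snapshot, q)

def equivalent(q1, q2, snapshot):
    """Assumed: two queries are equivalent if they retrieve the same tweets."""
    return retrieved(snapshot, q1) == retrieved(snapshot, q2)
```

In a snapshot where every tweet mentioning `michigan` also mentions `#abortion`, the query {michigan} is accepted as an extension of {#abortion} by subsumption even though it is not a superset of it.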
Consider a TB-framework
Suppose that a TB-argument supporting the query
We start with a TB-argument A obtained from the original query Q (i.e.,
Next, we compute within A all relevant descriptors that might be used to “extend” Q, by adding a new element (
Then, a new argument for
The high-level algorithm can be seen in Fig. 1. As stated before, our approach to opinion trees is more generic than the one used for dialectical trees in argumentation (as done, e.g., in [10]), in the sense that for a given argument, the child nodes correspond to more specific arguments that are not necessarily in conflict with the parent argument.
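The recursive construction (start from the argument for Q, find relevant descriptors within its support, and build an argument for each extended query) can be sketched in Python as follows. This is not the algorithm of Fig. 1 verbatim: the relevance test (a new term occurring in at least `min_support` supporting tweets), the depth bound, and the sentiment rule (sign of summed per-tweet polarities) are hypothetical stand-ins.

```python
from collections import Counter
from dataclasses import dataclass, field

@dataclass
class Node:
    query: frozenset
    support: list        # (terms, polarity) pairs retrieved for the query
    sentiment: str
    children: list = field(default_factory=list)

def prevailing(support):
    """Assumed rule: sign of the summed per-tweet polarities (+1, -1 or 0)."""
    total = sum(p for _, p in support)
    return "positive" if total > 0 else "negative" if total < 0 else "neutral"

def build_opinion_tree(query, tweets, min_support=2, max_depth=3):
    """Recursively extend the query with descriptors found in its own support."""
    query = frozenset(query)
    support = [(terms, p) for terms, p in tweets if query <= terms]
    node = Node(query, support, prevailing(support))
    if max_depth == 0:
        return node
    # Hypothetical relevance test: a new term occurring in >= min_support tweets.
    counts = Counter(t for terms, _ in support for t in terms - query)
    for term in sorted(d for d, c in counts.items() if c >= min_support):
        # Children only need the parent's support: query | {term} retrieves a subset.
        node.children.append(
            build_opinion_tree(query | {term}, support, min_support, max_depth - 1))
    return node
```

Passing the parent's support rather than the whole snapshot to each recursive call is a small design choice that keeps every step local to the argument being refined, in line with computing descriptors “within A”.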

High-level algorithm for computing an opinion tree
It is also easy to see that for any query Q, the algorithm
As discussed before, the algorithm shown in Fig. 1 makes it possible to obtain an opinion tree from a given query Q, a criterion C, and the set
Consider the query
This example was obtained from Twitter in December 2012, when the Michigan legislature was debating several regulations on abortion practices.

Opinion tree based on query
Figure 2 illustrates how the construction of an opinion tree for the query
Next we provide a formal definition of conflict between TB-arguments. Intuitively, a conflict arises whenever two arguments for similar queries lead to conflicting sentiments, provided that the queries involved are related to each other by the subsumption relation.
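The intuition above can be sketched as a simple predicate on (query, sentiment) pairs. Two simplifications are assumptions made here, not taken from the formal definition: strict query inclusion stands in for subsumption, and only opposite polarities count as conflicting (a neutral argument conflicts with nothing). The same predicate yields a notion of sentiment-shifting descriptor: one whose addition to Q changes the prevailing sentiment into a conflicting one.

```python
def conflicting(s1, s2):
    """Assumed: only opposite polarities conflict; 'neutral' conflicts with nothing."""
    return {s1, s2} == {"positive", "negative"}

def attacks(q1, s1, q2, s2):
    """Related queries (here: strict inclusion, standing in for subsumption)
    whose prevailing sentiments conflict."""
    related = q1 < q2 or q2 < q1
    return related and conflicting(s1, s2)

def sentiment_shifting(query, descriptor, sentiment_of):
    """True if adding the descriptor turns the sentiment into a conflicting one."""
    return conflicting(sentiment_of(query), sentiment_of(query | {descriptor}))
```

Here `sentiment_of` is a hypothetical lookup from a query to the prevailing sentiment of its TB-argument; in the usage check it is simply a dictionary's `get` method.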
(Argument attack).
Consider a TB-framework
Consider query
Note that in the previous situation, adding the descriptor
Let
Given a particular query Q, note that several alternative expansions (supersets of Q) can be identified. We are interested in identifying the smallest superset of Q that is associated with a sentiment-shifting argument. This gives rise to the following definition:
(Minimal-shift descriptor. Minimal-shifting relation).
Let
We define a minimal-shifting relation “
(Conflict tree).
Let If there is no Let
Intuitively, a conflict tree depicts all possible ways of extending the original query Q such that every extension (child node in the tree) corresponds to a sentiment change. Figure 2 illustrates how a conflict tree for the query
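Following this intuition, a conflict tree can be sketched by pruning an opinion tree: the root is kept, and the children of each kept node are the first descendants, along each downward path, whose sentiment differs from that node's (the minimal sentiment-shifting extensions). Two assumptions apply: the opinion tree is given as plain nodes carrying a query and a sentiment label, and any change of label counts as a shift.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    query: frozenset
    sentiment: str
    children: list = field(default_factory=list)

def first_shifts(node, sentiment):
    """Descendants where the sentiment first differs from `sentiment`
    along each downward path (assumed: any label change is a shift)."""
    found = []
    for child in node.children:
        if child.sentiment != sentiment:
            found.append(child)
        else:
            found.extend(first_shifts(child, sentiment))
    return found

def conflict_tree(node):
    """Keep the root; its children are the minimal shifts, expanded recursively."""
    return Node(node.query, node.sentiment,
                [conflict_tree(s) for s in first_shifts(node, node.sentiment)])
```

Nodes that merely preserve the parent's sentiment disappear from the result, while the extensions that reverse it are lifted to become direct children.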
Generalizing opinion and conflict trees as superior lattices
Next we will show a formal lattice-based characterization of our approach and propose an effective procedure to compute conflict superior lattices, which can be regarded as a generalization of conflict trees. Superior lattices will account for a more generic view of opinion and conflict trees, identifying relevant sublattices based on an equivalence relation between TB-arguments. First we will review some background definitions to make our presentation self-contained.
(Partial order. Partially ordered set).
A partial order is a binary relation “⪯” over a set A which is reflexive, antisymmetric, and transitive, i.e., for all a, b and c in A, we have that (1)
(Cover relation).
Given an ordered set
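The standard notions just reviewed can be checked mechanically on small finite examples. The sketch below tests reflexivity, antisymmetry and transitivity, and computes the cover relation (a is covered by b when a precedes b strictly and no element lies strictly between them). Ordering queries by set inclusion is used only as an illustration; it is not claimed to be the ordering the paper places on TB-arguments.

```python
from itertools import product

def is_partial_order(elements, leq):
    """Check reflexivity, antisymmetry and transitivity on a finite set."""
    reflexive = all(leq(a, a) for a in elements)
    antisymmetric = all(not (leq(a, b) and leq(b, a)) or a == b
                        for a, b in product(elements, repeat=2))
    transitive = all(not (leq(a, b) and leq(b, c)) or leq(a, c)
                     for a, b, c in product(elements, repeat=3))
    return reflexive and antisymmetric and transitive

def covers(elements, leq):
    """Pairs (a, b) with a strictly below b and nothing strictly in between."""
    def lt(x, y):
        return leq(x, y) and x != y
    return {(a, b) for a, b in product(elements, repeat=2)
            if lt(a, b) and not any(lt(a, c) and lt(c, b) for c in elements)}
```

For the four subsets of {a, b} ordered by inclusion, the cover relation has exactly four pairs; the pair (∅, {a, b}) is excluded because {a} lies strictly between them.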
(Tree order).
An ordered set
(Superior lattice. Inferior lattice. Lattice).
Let
(Join-homomorphism. Meet-homomorphism. Lattice homomorphism).
The mapping h from
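Assuming the usual reading of a superior lattice as an ordered set in which every pair of elements has a least upper bound (a join-semilattice), and of a join-homomorphism as a map h satisfying h(a ∨ b) = h(a) ∨ h(b), both properties can be checked on finite examples as sketched below. The exact definitions are not reproduced in this excerpt, so this is a hedged illustration of the standard notions only.

```python
from itertools import product

def join(elements, leq, a, b):
    """Least upper bound of a and b in a finite ordered set, or None."""
    ubs = [u for u in elements if leq(a, u) and leq(b, u)]
    least = [u for u in ubs if all(leq(u, v) for v in ubs)]
    return least[0] if least else None

def is_superior_lattice(elements, leq):
    """Every pair of elements has a join."""
    return all(join(elements, leq, a, b) is not None
               for a, b in product(elements, repeat=2))

def is_join_homomorphism(h, src, src_leq, dst, dst_leq):
    """Check h(a v b) == h(a) v h(b); both sets are assumed superior lattices."""
    return all(h(join(src, src_leq, a, b)) == join(dst, dst_leq, h(a), h(b))
               for a, b in product(src, repeat=2))
```

On the subsets of {a, b} (join = union) mapped into {0, 1, 2} (join = max), the map sending the empty set to 0 and every other subset to 1 preserves joins, whereas `len` does not, since |{a} ∪ {b}| = 2 but max(1, 1) = 1.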
In the rest of this section we will show that the above definitions provide a solid mathematical foundation for the study of TB-arguments. Note, in the first place, that the ordered set
Let Q be a query and let
In order to prove that
High-level algorithm for computing opinion (superior) lattices from opinion trees.
Opinion tree (left) and its corresponding opinion (superior) lattice (right).

To complete the proof we need to show that (1) there is a unique TB-argument
Once the opinion tree

High-level algorithm for computing conflict (superior) lattices from opinion (superior) lattices.

From an opinion superior lattice (left) to a conflict superior lattice (right).
We show on the right-hand side of Fig. 4 the quotient set resulting from the given opinion tree. Note that this new set is a superior lattice (see Definition 4.4). In general, any opinion tree induces a superior lattice
Let Q be a query and let
In order to prove that
Although an opinion (superior) lattice is typically more compact than an opinion tree, we might be interested in finding, in a computationally effective way, the minimal structure that reflects all existing conflicts between opinions for a given query Q. In other words, we want to build a minimal superior lattice
Figure 6 illustrates the transformation of an opinion (superior) lattice into a conflict (superior)
This results in the following classes:
As specified in Algorithm
In order to compute the 1-equivalent classes, we take each pair of 0-equivalent TB-arguments
To illustrate this, take for instance
If we attempt to compute the 2-equivalent classes, we find that they are identical to the 1-equivalent classes. Therefore the process terminates and the conflict (superior) lattice shown on the right-hand side of Fig. 6 is returned. Note, in addition, that it is possible to verify that the canonical mapping, which maps each element in
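The iterative computation of 0-, 1- and 2-equivalent classes, stopping as soon as two successive partitions coincide, follows the general pattern of partition refinement (as in Moore-style minimization of automata). The sketch below shows only that fixpoint pattern. The splitting criterion the paper applies to pairs of k-equivalent TB-arguments is not recoverable from this excerpt, so `signature` is a hypothetical parameter; in the usage check, two nodes stay together when they have the same sentiment and their children fall into the same classes.

```python
def refine(items, initial_key, signature):
    """Iterate k-equivalence until the partition is stable.

    initial_key(x) defines 0-equivalence; signature(x, cls) is a hypothetical
    criterion computed from the current class map `cls`."""
    cls = {x: initial_key(x) for x in items}
    while True:
        # (k+1)-equivalent: same k-class and same signature.
        new_keys = {x: (cls[x], signature(x, cls)) for x in items}
        ids = {}
        new_cls = {x: ids.setdefault(k, len(ids)) for x, k in new_keys.items()}
        # Each step refines the previous partition, so an unchanged
        # number of classes means the partition itself is unchanged.
        if len(ids) == len(set(cls.values())):
            return new_cls
        cls = new_cls
```

Termination is guaranteed because the number of classes never decreases and is bounded by the number of items; the returned map plays the role of the canonical mapping onto the quotient set.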
In this section, we show how our framework can be applied in a real-world situation by presenting a possible user scenario. The scenario shows how a policy maker could identify opinions emerging from mass deliberations of citizens expressing their views on “taxes”. Taxes are typically a trending topic on Twitter, in particular among United States citizens commenting on current tax legislation or on tax change proposals advocated by their government.
The topic of taxes can be analyzed from various perspectives. One possible perspective focuses on opinions that address the issue of property taxes. A second perspective could focus on the IRS scandal. A third perspective is provided by analyzing how the health care reform affects taxes. Yet another perspective emerges from analyzing the new tax law, which imposes higher taxes on families earning over $250,000 a year, without changing the situation for the middle class.

Opinion tree and conflict lattice for the query “taxes”. Results are simplified (4 nodes were left out). (Colors are visible in the online version of the article;
The proposed tool facilitates the exploration of these various perspectives by imposing a rich structure on a large set of unstructured tweets. In addition, it makes it easy to recognize the polarity of each group of opinions (TB-arguments) as well as the conflict relations between them. Figure 7 presents a conflict superior lattice for the query “taxes”. In this scenario, certain emerging TB-arguments could shed light on the general desires of citizens. For instance, the fact that the sentiment polarity of “taxes pay companies” is positive may indicate that the general public expects companies to pay higher corporate taxes. In addition, using the tool to identify current topics, such as those associated with the queries “taxes irs scandal” or “taxes health”, could greatly help decision and policy makers define priorities and better address citizens’ present-day concerns.
We have developed a Java prototype8
Available to download from http://cs.uns.edu.ar/~cic/decide2.0/twitter.zip.
Our mathematical characterization of opinion and conflict trees as superior lattices provides a natural foundation for the analysis of important concepts prevailing in argumentation theory. In particular, the use of conflict (superior) lattices to represent diverging arguments leads to the identification of the minimal structure that reflects the existing collective positions with respect to a topic of interest.
From the user viewpoint, conflict lattices are intended to provide the theoretical basis for developing an explorative tool in a decision making platform. Consider for example the opinion tree based on the query
Possible users could be, e.g., a journalist, a deputy analyzing a law proposal, etc.
Argument specificity is a key notion in argumentation theory, as it is the first purely syntactic preference criterion proposed to compare arguments. In our framework, specificity can be associated with the “⪯” relation identified in the resulting superior lattices. The use of minimal structures to represent conflicting views facilitates the identification of specificity relations as well as the recognition of relevant (or irrelevant) elements in the argumentation space, as formalized by the notions of sentiment-shifting descriptors (or sentiment-preserving descriptors). Similarly, the minimal-shift relation “
It is important to mention that our analysis was carried out for English only, since English is the lingua franca worldwide and is widely used on Twitter. In addition, most existing sentiment analysis tools assume English as the underlying language. We are currently developing a sentiment analysis tool for Spanish, which will extend the capabilities of the system. We will also investigate the benefits of using a stemming algorithm for Spanish.
Our approach is inspired by recent research on integrating argumentation and social networks (e.g., [13,22]). In recent years, there has been growing interest in assessing the meaning of streams of data from microblogging services such as Twitter, as well as some recent research on using argumentation for social networks.
To the best of our knowledge, Torroni and Toni [22] were the first to combine social networks and argumentation in a unified approach, coining the term bottom-up argumentation for the grass-roots approach to the problem of deploying computational argumentation in online systems. In this novel view, argumentation frameworks are obtained bottom-up, starting from the users’ comments, opinions and suggested links, with no top-down intervention or interpretation by “argumentation engineers”. As the authors point out, “topics emerge, bottom-up, during the underlying process, possibly serendipitously”. In contrast with that proposal, in this paper we generalize this view by identifying arguments automatically from Twitter messages and by establishing conflict relationships in terms of sentiment analysis (rather than specifying them at the meta-level using rules, as is the case in [22]). This proposal was recently extended (see [8]) towards so-called “microdebates”, which help organize and confront users’ opinions in an automated way. A microdebate is a stream of tweets in which users annotate their messages using some special tags. In contrast with this approach, our proposal does not explicitly search for debates containing arguments and counterarguments. Rather, different opinions emerge automatically from collecting tweets associated with a particular topic (TB-arguments), and interrelationships among opinions are obtained on the basis of sentiment-shifting/preserving descriptors.
In [1], Abbas and Sawamura formalize argument mining from the perspective of intelligent tutoring systems. In contrast with our approach, they rely on a relational database, and their aim is unrelated to identifying the arguments underlying social networks. In [13], Leite and Martins introduce a novel extension to Dung’s abstract argumentation model, called Social Abstract Argumentation. Their proposal aims at providing a formal framework for social networks and argumentation, incorporating social voting and defining a new class of semantics for the resulting frameworks. In contrast with our approach, they consider neither the automatic extraction of arguments from social network data nor the modeling of conflicts between arguments in terms of sentiment analysis. In [2], Amgoud and Serrurier propose a formal argumentation-based model for classification, which generalizes the well-known concept learning model based on version spaces [15]. Their framework shares some structural similarities with our approach (as a lattice-based characterization is also involved when contrasting hypotheses). However, the aims of the two approaches differ, as our proposal is not focused on solving classification tasks in the machine learning sense.
A related research area is formal concept analysis [9], which is a method for deriving conceptual structures out of data. As done in our approach, the theory of partial orders is used to formally characterize these structures. However, it differs from our proposal in dealing with concepts rather than opinions and in not attempting to associate sentiments with the elements of the partial order. In addition, it does not deal with notions such as arguments, conflict and attack.
It should be noted that the rise of social media such as blogs and social networks has fueled interest in sentiment analysis. With the proliferation of reviews, ratings, recommendations and other forms of online expression, online opinion has turned into a kind of virtual currency for businesses looking to market their products, identify new opportunities and manage their reputations. Several research teams in universities around the world currently focus on understanding the dynamics of sentiment in e-communities through sentiment analysis. The EU-funded Cyberemotions consortium10
In this paper we have presented a novel approach that integrates argumentation theory and microblogging technologies, with a particular focus on Twitter. To the best of our knowledge, no other approach has been developed in a similar direction. We have also presented a definition of a Twitter-based argument for a query Q that takes as its support the set of tweets associated with Q according to a given criterion. For such an argument, we also define a prevailing sentiment, obtained using sentiment analysis tools. This allowed us to characterize the notion of opinion tree, which can be built recursively by considering arguments associated with incrementally extended queries. We have implemented a prototype of our proposal as a proof of concept, which was used to compute the opinion tree for the case study presented in the paper.
We have also presented a theoretical setting for analyzing Twitter-based arguments, associating a superior lattice rooted in the initial argument for the first given query. Based on the notion of attack between arguments, we have also established a refined order relation between conflicting arguments. As a result, from every superior lattice associated with a given query Q, a conflict tree rooted in Q can be built, in which alternating opinions can be better contrasted. Given a node A (argument) associated with query
The prototype that we have implemented so far is intended to be used as a proof of concept. The development of a full-fledged software tool will require tackling several issues, mostly related to user-interface and usability aspects. We believe that before embarking on that stage, it is crucial to investigate and provide a full account of the functional capabilities of the proposed system, both through a theoretical study and by validating its behavior through simulations with realistic data. Part of our future work will focus on improving the existing prototype, aiming at the deployment of a software tool for real-world users. As a basis for such deployment, visual tools for displaying and analyzing dialectical trees have already been developed for Defeasible Logic Programming [17]. We expect to use the underlying algorithms from this tool in our framework. Additionally, we expect to perform different experiments with hashtags associated with relevant topics, assessing the applicability of our approach in a real-world context.
Another future research avenue would be to take advantage of existing semantic information sources, such as dictionaries, topic directories or ontologies, to better explore the query space, either by using synonyms of existing terms or other important terms for the domain under analysis. In addition, we anticipate that the proposed framework could be integrated with mechanisms for weighting TB-arguments based on different aspects, such as the provenance of the tweets, the number of associated tweets, opinion strength, etc. In particular, we are analyzing the possibility of extending the current framework to consider ontologies for sentiment analysis of Twitter posts (as proposed in [12]). Finally, we are currently working on extending the current Twitter-based model to a more generic setting, in which opinions are collected from other social networks (such as Facebook).11
Footnotes
Acknowledgements
The authors would like to thank the anonymous reviewers who helped improve the original version of this article. This research work was funded by Projects LACCIR R1211LAC004 (Microsoft Research, CONACyT and IDB), PIP 112-200801-02798, PIP 112-200901-00863 (CONICET, Argentina), PGI 24/ZN10, PGI 24/N006, PGI 24/N029 (SGCyT, UNS, Argentina) and Universidad Nacional del Sur.
