Integrating argumentation and sentiment analysis for mining opinions from Twitter

Abstract

Social networks have grown exponentially in use and in their impact on society as a whole. In particular, microblogging platforms such as Twitter have become important tools for assessing public opinion on different issues. Recently, some approaches for assessing Twitter messages have been developed, identifying sentiments associated with relevant keywords or hashtags. However, such approaches have an important limitation: they do not take into account contradictory and potentially inconsistent information which might emerge from relevant messages. We contend that the information made available in Twitter can be used to extract a particular version of arguments (called “opinions” in our formalization) which emerge bottom-up from the social interaction associated with such messages. In this paper we present a novel framework for mining opinions from Twitter based on incrementally generated queries. As a result, we obtain an “opinion tree”, rooted in the original query. Distinguished, conflicting elements in an opinion tree lead to so-called “conflict trees”, which resemble the dialectical trees traditionally used in defeasible argumentation.¹

¹ This paper extends preliminary work [11] presented at the First International Conference on Agreement Technologies (AT 2012), held in Dubrovnik, Croatia, in October 2012.

Keywords

Argumentation opinion mining social media

1. Introduction and motivations

Social networks have grown exponentially in use and in their impact on society as a whole, targeting different communities and providing differentiated services. In particular, microblogging has become a very popular communication tool among Internet users, with Twitter² being by far the most widespread microblogging platform. Twitter, created in 2006, enables its users to send and read text-based posts of up to 140 characters, known as “tweets”. It has grown into a technology which makes it possible to assess public opinion on different issues. Thus, for example, nowadays it is common to read newspaper articles referring to the impact of political decisions measured by their associated positive or negative comments on Twitter. Symmetrically, policy makers make many of their claims and opinions public, influencing the citizenry³ and prompting citizens to “tweet back” with further comments and opinions. As the audience of microblogging platforms and services grows every day, data from these sources can be used in opinion mining and sentiment analysis tasks. Indeed, the scientific study of emotions in opinions associated with a given topic has become relevant, consolidating a new area known as sentiment analysis [7,14], with applications in several real-world problems such as e-government [4,5], stock market analysis [16] and tracking breaking events [6], among others.

² www.twitter.com.

³ For example, the current UK Prime Minister David Cameron, the current US President Barack Obama and Pope Francis I can be followed on Twitter at @Number10gov, @BarackObama and @pontifex, respectively.

As pointed out in recent research (e.g., [18,23]), microblogging platforms (in particular Twitter) offer a number of advantages for opinion mining. On the one hand, Twitter is used by different people to express their opinions about different topics, and thus it is a valuable source of people’s opinions. Given the enormous number of text posts, the collected corpus can be arbitrarily large. On the other hand, Twitter’s audience varies from regular users to celebrities, company representatives, politicians, and even country presidents. Therefore, it is possible to collect text posts of users from different social and interest groups. According to the Merriam-Webster online dictionary,⁴ an opinion can be seen as: (a) a view, judgment, or appraisal formed in the mind about a particular matter; (b) belief stronger than impression and less strong than positive knowledge; a generally held view; (c) a formal expression of judgment or advice by an expert. Clearly, there is a natural link between opinion and argument. In many cases, opinions by themselves do not provide arguments, as they do not necessarily imply giving reasons or evidence for accepting a particular conclusion. However, from a meta-level perspective, policy makers devote much effort to analyzing the reasons underlying complex collections of opinions from the citizenry, as they indicate the willingness of the people to accept or reject some particular issue. A well-known example in this setting is the analysis of public opinion (e.g., through the quantitative measurement of opinion distributions through polls and the investigation of the internal relationships among the individual opinions that make up public opinion on an issue).

⁴ http://www.merriam-webster.com.

A fundamental need for policy makers is to back their decisions and agreements with reasons or opinions provided by citizens. They might even argue with other policy makers about why making a particular decision is advisable (e.g., “according to the last poll, 80% of the people are against the health system reform; therefore, the reform should not be carried out”). From this perspective, social networks like Twitter provide a rich knowledge base from which information could be collected and analyzed in order to enhance and partially automate decision making processes. In particular, tweets have a rich structure, providing a number of record fields which make it possible to detect the provenance of the tweet (author), the number of re-tweets, followers, etc. We contend that the information made available from such tweets can be useful for modeling opinions which emerge bottom-up from the social interaction existing in Twitter.

In this article we present a novel framework for mining opinions from Twitter based on incrementally generated queries. Given a query Q (corresponding to one or more keywords or hashtags), our approach collects those distinguished tweets referring to Q, according to an aggregation criterion (also provided as an input). This collection of tweets will be called a Twitter-based argument A for Q, associated with a prevailing sentiment (computed on the basis of the tweets involved).⁵ By expanding Q in different ways, we can obtain other, more specific arguments, which might be in conflict with A. These counter-arguments might in turn be in conflict with other, more specific arguments. This results in the characterization of an “opinion tree”, rooted in the original query. By considering distinguished nodes in an opinion tree we can define so-called “conflict trees”, which resemble the dialectical trees traditionally used in defeasible argumentation. We also provide theoretical results which account for a lattice-based characterization of our proposal, using equivalence classes to minimize the representation space to be analyzed when contrasting arguments.

⁵ Several software tools have recently been developed for such an association, such as www.sentiment140.com or tweetsentiments.com.

The rest of the article is structured as follows. In Section 2 we present our proposal for characterizing Twitter-based arguments and their interrelationships. We formalize the notion of opinion tree, which can be constructed from user queries and makes it possible to assess alternative opinions associated with incrementally generated queries. A high-level algorithm for computing opinion trees is presented, along with a case study to illustrate our proposal. Section 3 discusses the relationship emerging from opinions in conflict, modeled as a conflict tree. Section 4 generalizes the previous results using superior lattices, for both opinion and conflict trees. Then, in Section 5, we present an example showing the practical use of our approach. The benefits of applying this mathematical approach are discussed in Section 6. Section 7 reviews related work, and finally Section 8 summarizes the conclusions.

2. Twitter-based argumentation framework: Viewing aggregated tweets as arguments

In this section we will describe how different elements in Twitter can be captured under an argumentative perspective [3,19]. First we will characterize distinguished collections of tweets (obtained on the basis of a given query) as arguments with an associated prevailing sentiment. Such arguments will be called TB-arguments (Twitter-based arguments). Then, we will formalize interrelationships between TB-arguments, which lead to the notion of opinion tree.

2.1. Formalizing aggregation of Twitter messages

Twitter messages (tweets) are up to 140 characters long, with a number of additional fields which help identify relevant information within a message (sender, number of retweets associated with the message, etc.). In particular, we will focus on the presence of descriptors which are either hashtags (words or phrases prefixed with the symbol #, a form of metadata tag) or terms that tend to occur often in the context of a given topic. Hashtags are used within IRC networks to identify groups and topics, and in short messages on microblogging social networking services such as Twitter, identi.ca or Google+ (which may be tagged by including one or more hashtags with multiple words concatenated). Other good descriptors can be dynamically found by looking for terms that are frequently used in tweets related to the topic at hand. In the sequel we will assume that the term “descriptor” refers to either actual hashtags in Twitter or to relevant keywords found in tweets.

Definition 2.1 (Tweet. Twitter query).

We define a tweet T as a bag (or multiset) of terms $\{t_{1}, t_{2}, \dots, t_{k}\}$, where every $t_{i} \in T$ is a string. A Twitter query (or just query) is a non-empty set $Q = \{d_{1}, d_{2}, \dots, d_{k}\}$ of descriptors, where every $d_{i} \in Q$ is a string.

In the analysis that follows, we will assume that a tweet is just a bag of words, not taking into account the actual order of terms in the tweet. Additionally, we assume that the set of all currently existing tweets corresponds to a snapshot of Twitter messages at a given fixed time, as the Twitter database is highly dynamic. In our approach, a query Q is any set of descriptors used for filtering some relevant tweets from the set of existing tweets $\mathit{Tweets}$ based on a given criterion C. In order to abstract away how such selection is performed, we will define an aggregation operator $\mathit{Agg}_{\mathit{Tweets}}(Q, C)$. Formally:

Definition 2.2 (Tweet set. Aggregation operator).

Let $\mathit{Tweets}$ be the set of all currently existing tweets. We will write $2^{\mathit{Tweets}}$ to denote the set of all possible subsets of $\mathit{Tweets}$. Any element in $2^{\mathit{Tweets}}$ will be called a tweet set. Given a query Q and a criterion C, we will define an aggregation operator $\mathit{Agg}_{\mathit{Tweets}}(Q, C)$ which returns an element (tweet set) in $2^{\mathit{Tweets}}$ based on Q and C.

The aggregation operator could be defined in several ways. For instance, suppose that $C_{1}$ is a criterion that indicates that only tweets posted between time $\mathit{timestamp}_{1}$ and $\mathit{timestamp}_{2}$ are to be selected. Then $\mathit{Agg}_{\mathit{Tweets}}(Q, C_{1}) =_{def} \{T \in \mathit{Tweets} \mid Q \subseteq T \text{ and } T \text{ satisfies } C_{1}\}$ will be the set of tweets that contain all the terms of query Q and have been posted in the time period $[\mathit{timestamp}_{1}, \mathit{timestamp}_{2}]$. Other examples of criteria that can be naturally applied are, for instance, requiring that those tweets T were retweeted more than n times, requiring that every user that posted tweets T has at least m followers, etc.
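
As a minimal sketch of the time-window operator just described, the following Python fragment filters a small, made-up snapshot. The `Tweet` record, the helper `posted_between` and the sample texts are illustrative assumptions, not part of the original framework; a tweet's terms are obtained by naive whitespace splitting.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, FrozenSet, Iterable, Set


@dataclass(frozen=True)
class Tweet:
    """Hypothetical record: raw text plus metadata a criterion may inspect."""
    text: str
    timestamp: datetime
    retweets: int = 0

    def terms(self) -> Counter:
        # Bag (multiset) of lower-cased, whitespace-separated terms.
        return Counter(self.text.lower().split())


Criterion = Callable[[Tweet], bool]


def posted_between(start: datetime, end: datetime) -> Criterion:
    """Criterion C1: keep tweets posted in the closed interval [start, end]."""
    return lambda tweet: start <= tweet.timestamp <= end


def agg(tweets: Iterable[Tweet], query: FrozenSet[str], criterion: Criterion) -> Set[Tweet]:
    """Tweets that contain every descriptor of the query and satisfy the criterion."""
    return {t for t in tweets if all(d in t.terms() for d in query) and criterion(t)}


# Made-up snapshot, used only to exercise the operator.
TWEETS = [
    Tweet("abortion is murder", datetime(2012, 3, 1)),
    Tweet("abortion debate in the senate", datetime(2012, 5, 9)),
    Tweet("abortion murder claims disputed", datetime(2011, 7, 4)),
]
c1 = posted_between(datetime(2012, 1, 1), datetime(2012, 12, 31))
selected = agg(TWEETS, frozenset({"abortion", "murder"}), c1)
print(sorted(t.text for t in selected))
```

Passing the criterion as a predicate keeps the operator independent of how the selection is made, so a retweet-count or follower-count criterion could be supplied in the same way.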

Note that for the same query Q, different alternative criteria $(C_{1}, C_{2}, \dots, C_{k})$ can lead to different distinguished elements in $2^{\mathit{Tweets}}$. As explained before, tweet sets can be associated with different feelings or sentiments. Even if in real life there may be a lot of emotions in tweets (such as anger, happiness, and so on), we will assume here that there is a distinguished set $S$ of possible sentiments. Thus, given a query Q and a criterion C, we assume that the tweet set $\mathit{Agg}_{\mathit{Tweets}}(Q, C)$ is associated with a prevailing sentiment in $S$.⁶

⁶ A possible range for $S$ could be positive, negative and neutral (as done for example in the platform sentiment140.com). In this platform, prevailing sentiments associated with a tweet set are expressed by percentages.

We will consider that some sentiments might convey different, possibly conflicting feelings or emotions (e.g., anger and happiness; boredom and excitement, etc.). As before, we will abstract away which are potentially conflicting sentiments as follows.

Definition 2.3 ($sent$ and $conflict$ mappings).

Let $T \in 2^{T w e e t s}$ be a tweet set, and let $sent : 2^{T w e e t s} \to S$ and $conflict : S \to 2^{S}$ be mappings. The sentiment $sent (T)$ will be called the prevailing sentiment (or just sentiment) for T. For any sentiment $s \in S$ , we will define $conflict (s)$ as a subset of $S$ , such that: (a) $s \notin conflict (s)$ (a sentiment is not in conflict with itself); (b) for any $s^{'} \in conflict (s)$ , then $s \in conflict (s^{'})$ (the notion of conflict is symmetrical). Given two sentiments $s_{1}$ and $s_{2}$ , we will say that they are in conflict whenever $s_{2} \in conflict (s_{1})$ . For simplicity, given a sentiment $s \in S$ , we will write $\bar{s}$ to denote any $s^{'} \in conflict (s)$ .
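
Conditions (a) and (b) are easy to check mechanically for a finite sentiment set. The sketch below assumes the conflict mapping is stored as a Python dictionary from each sentiment to the set of sentiments it conflicts with; the emotion labels are illustrative only.

```python
from typing import Dict, Set

Conflict = Dict[str, Set[str]]


def is_valid_conflict(conflict: Conflict) -> bool:
    """Check (a) no sentiment conflicts with itself and (b) conflict is symmetric."""
    for s, rivals in conflict.items():
        if s in rivals:  # violates (a)
            return False
        if any(s not in conflict.get(r, set()) for r in rivals):  # violates (b)
            return False
    return True


def in_conflict(conflict: Conflict, s1: str, s2: str) -> bool:
    """True when s2 belongs to conflict(s1)."""
    return s2 in conflict[s1]


# Illustrative mapping over four emotions (an assumption, not taken from the paper).
EMOTIONS: Conflict = {
    "anger": {"happiness"},
    "happiness": {"anger"},
    "boredom": {"excitement"},
    "excitement": {"boredom"},
}
print(is_valid_conflict(EMOTIONS), in_conflict(EMOTIONS, "anger", "happiness"))
```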

The previous elements will allow us to characterize the notion of TB-framework and TB-argument as follows:

Definition 2.4 (TB-framework).

A Twitter-based argumentation framework (or TB-framework) is a 5-tuple $(\mathit{Tweets}, C, S, sent, conflict)$, where $\mathit{Tweets}$ is the set of available tweets, C is a selection criterion, $S$ is a non-empty set of possible sentiments, and $sent$ and $conflict$ are the prevailing-sentiment and conflict mappings, respectively.

Definition 2.5 (TB-argument).

Given a TB-framework $(\mathit{Tweets}, C, S, sent, conflict)$, a Twitter-based argument (or TB-argument) for a query Q is a 3-tuple $⟨ Arg, Q, Sent ⟩$, where $Arg$ is $\mathit{Agg}_{\mathit{Tweets}}(Q, C)$ and $Sent$ is $sent(\mathit{Agg}_{\mathit{Tweets}}(Q, C))$.

Example 2.1.
Consider a TB-framework $(\mathit{Tweets}, C, S, sent, conflict)$ and a query $Q = \{\text{“abortion”}, \text{“murder”}\}$, where C is defined as “all $T \in \mathit{Tweets}$ such that $timestamp(T) \geq$ 2012-01-01T00:00:00”, and $S = \{pos, neg, neutral\}$, such that:

$conflict(pos) =_{def} \{neg, neutral\}$,

$conflict(neg) =_{def} \{pos, neutral\}$ and

$conflict(neutral) =_{def} \{pos, neg\}$.

Then $Arg = \mathit{Agg}_{\mathit{Tweets}}(Q, C)$ is the set of all tweets containing both “abortion” and “murder” that have been published since January 1, 2012. Suppose that $sent(\mathit{Agg}_{\mathit{Tweets}}(Q, C)) = neg$. Then $⟨ Arg, \{\text{“abortion”}, \text{“murder”}\}, neg ⟩$ is a TB-argument.
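
To make the 3-tuple of Definition 2.5 concrete, the following sketch builds a TB-argument for the query of Example 2.1 over three made-up tweets. The word lists, the per-tweet labelling and the majority vote used as the prevailing-sentiment mapping are all assumptions chosen for illustration; the framework itself leaves that mapping abstract, and the selection criterion is omitted here (every tweet is accepted).

```python
from collections import Counter
from typing import FrozenSet, Iterable, NamedTuple, Tuple


class TBArgument(NamedTuple):
    arg: Tuple[str, ...]      # the selected tweets
    query: FrozenSet[str]     # the query Q
    sent: str                 # prevailing sentiment of the selected tweets


# Hypothetical polarity word lists, far too small for real use.
NEG_WORDS = {"murder", "wrong"}
POS_WORDS = {"right", "support"}


def label(tweet: str) -> str:
    """Toy per-tweet sentiment: compare counts of positive and negative words."""
    words = set(tweet.lower().split())
    score = len(words & POS_WORDS) - len(words & NEG_WORDS)
    return "pos" if score > 0 else "neg" if score < 0 else "neutral"


def sent(tweet_set: Iterable[str]) -> str:
    """Assumed prevailing sentiment: most common label, ties broken alphabetically."""
    counts = Counter(label(t) for t in tweet_set)
    return min(counts, key=lambda s: (-counts[s], s)) if counts else "neutral"


def tb_argument(tweets: Iterable[str], query: FrozenSet[str]) -> TBArgument:
    """Select tweets containing every descriptor, then attach the prevailing sentiment."""
    selected = tuple(t for t in tweets if query <= set(t.lower().split()))
    return TBArgument(selected, query, sent(selected))


TWEETS = ["abortion is murder", "abortion is murder and wrong", "abortion is a right"]
argument = tb_argument(TWEETS, frozenset({"abortion", "murder"}))
print(argument.sent, len(argument.arg))
```
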
2.2. Specificity in a TB-framework. Opinion trees

In the previous section we have shown how to express arguments for queries associated with a given prevailing sentiment. Such arguments might be attacked by other arguments, which on their turn might be attacked, too. In argumentation theory, this leads to the notion of dialectical analysis [19], which can be associated with a tree-like structure in which arguments, counter-arguments, counter-counter-arguments, and so on, are taken into account. Our approach will be more generic, in the sense that for a given argument, the children nodes will correspond to more specific arguments that are not necessarily in conflict with the parent argument. Next we will formalize these notions.

A natural relation that arises between TB-arguments is derived from the inclusion relation between their associated queries. This is formalized by the following definition.

Definition 2.6 (Argument selectivity).

Consider a TB-framework $(\mathit{Tweets}, C, S, sent, conflict)$ and let $⟨ {Arg}_{1}, Q_{1}, {Sent}_{1} ⟩$ and $⟨ {Arg}_{2}, Q_{2}, {Sent}_{2} ⟩$ be two TB-arguments. We say that $⟨ {Arg}_{2}, Q_{2}, {Sent}_{2} ⟩$ is more selective than $⟨ {Arg}_{1}, Q_{1}, {Sent}_{1} ⟩$, and we denote it $⟨ {Arg}_{2}, Q_{2}, {Sent}_{2} ⟩ ⪯_{Q} ⟨ {Arg}_{1}, Q_{1}, {Sent}_{1} ⟩$, if $Q_{1} \subseteq Q_{2}$.

If two distinct queries $Q_{1}$ and $Q_{2}$ result in the same set of retrieved tweets, it is useful to identify $Q_{1}$ and $Q_{2}$ as equivalent queries. This gives rise to the following definition.

Definition 2.7 (Query equivalence).

Let $(\mathit{Tweets}, C, S, sent, conflict)$ be a TB-framework. Given two queries $Q_{1}$ and $Q_{2}$, we will say that $Q_{1}$ is equivalent to $Q_{2}$ whenever $\mathit{Agg}_{\mathit{Tweets}}(Q_{2}, C) = \mathit{Agg}_{\mathit{Tweets}}(Q_{1}, C)$.

While it is clear that whenever $Q_{1} \subseteq Q_{2}$, it will hold that $\mathit{Agg}_{\mathit{Tweets}}(Q_{2}, C) \subseteq \mathit{Agg}_{\mathit{Tweets}}(Q_{1}, C)$, it may be the case that for certain queries $Q_{1}$ and $Q_{2}$, $\mathit{Agg}_{\mathit{Tweets}}(Q_{2}, C) \subseteq \mathit{Agg}_{\mathit{Tweets}}(Q_{1}, C)$ but $Q_{1} \nsubseteq Q_{2}$. In order to define a broader notion than query inclusion, we provide the following definition of query subsumption.
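
The first claim can be verified directly for the conjunctive aggregation operator given earlier as an example (tweets that contain every descriptor of the query and satisfy C); other choices of the operator would need their own argument. Assuming $Q_{1} \subseteq Q_{2}$:

```latex
\begin{align*}
T \in \mathit{Agg}_{\mathit{Tweets}}(Q_{2}, C)
  &\iff Q_{2} \subseteq T \ \text{and}\ T \text{ satisfies } C \\
  &\implies Q_{1} \subseteq T \ \text{and}\ T \text{ satisfies } C
     && \text{(since } Q_{1} \subseteq Q_{2} \subseteq T\text{)} \\
  &\iff T \in \mathit{Agg}_{\mathit{Tweets}}(Q_{1}, C).
\end{align*}
```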

Definition 2.8 (Query subsumption).

Given a TB-framework $(\mathit{Tweets}, C, S, sent, conflict)$ and two queries $Q_{1}$ and $Q_{2}$, we will say that $Q_{1}$ subsumes $Q_{2}$ whenever it holds that $\mathit{Agg}_{\mathit{Tweets}}(Q_{2}, C) \subset \mathit{Agg}_{\mathit{Tweets}}(Q_{1}, C)$.

Example 2.2.
A query $Q_{1}$ formed by $\{\text{“abortion”}\}$ subsumes the query $Q_{2}$ formed by $\{\text{“abortion”}, \text{“murder”}\}$, as all the tweets that are returned by $Q_{2}$ will be part of the tweets returned by $Q_{1}$, but not the other way around.

Note that the subsumption relation is more general than the inclusion relation: whenever $Q_{1} \subset Q_{2}$ we have $\mathit{Agg}_{\mathit{Tweets}}(Q_{2}, C) \subseteq \mathit{Agg}_{\mathit{Tweets}}(Q_{1}, C)$, so $Q_{1}$ subsumes $Q_{2}$ as soon as this inclusion is strict (otherwise the two queries are equivalent in the sense of Definition 2.7). However, it is possible that $Q_{1}$ subsumes $Q_{2}$ even when $Q_{1} \not\subset Q_{2}$.
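
Definitions 2.7 and 2.8 compare queries through the tweet sets they retrieve, which can be sketched as follows. Tweets are simplified here to frozen sets of terms and the criterion is dropped (every tweet is accepted); both are assumptions made to keep the example short, and the snapshot is made up.

```python
from typing import FrozenSet, Iterable

Tweet = FrozenSet[str]
Query = FrozenSet[str]


def agg(tweets: Iterable[Tweet], query: Query) -> FrozenSet[Tweet]:
    """Tweets that contain every descriptor of the query."""
    return frozenset(t for t in tweets if query <= t)


def equivalent(tweets: Iterable[Tweet], q1: Query, q2: Query) -> bool:
    """Definition 2.7: both queries retrieve exactly the same tweet set."""
    return agg(tweets, q1) == agg(tweets, q2)


def subsumes(tweets: Iterable[Tweet], q1: Query, q2: Query) -> bool:
    """Definition 2.8: q2 retrieves a proper subset of what q1 retrieves."""
    return agg(tweets, q2) < agg(tweets, q1)


SNAPSHOT = [
    frozenset({"abortion", "murder"}),
    frozenset({"abortion", "prolife", "michigan"}),
    frozenset({"abortion", "option"}),
]
q_abortion = frozenset({"abortion"})
q_prolife = frozenset({"prolife"})

# Subsumption without query inclusion: every "prolife" tweet also mentions "abortion".
print(subsumes(SNAPSHOT, q_abortion, q_prolife))
# Query inclusion without subsumption: both queries retrieve the same single tweet.
print(equivalent(SNAPSHOT, q_prolife, frozenset({"prolife", "michigan"})))
```
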
Definition 2.9 (Argument specificity).

Consider a TB-framework $(\mathit{Tweets}, C, S, sent, conflict)$ and let $⟨ {Arg}_{1}, Q_{1}, {Sent}_{1} ⟩$ and $⟨ {Arg}_{2}, Q_{2}, {Sent}_{2} ⟩$ be two TB-arguments. We say that $⟨ {Arg}_{2}, Q_{2}, {Sent}_{2} ⟩$ is strictly more specific than $⟨ {Arg}_{1}, Q_{1}, {Sent}_{1} ⟩$, and we denote it $⟨ {Arg}_{2}, Q_{2}, {Sent}_{2} ⟩ ≺ ⟨ {Arg}_{1}, Q_{1}, {Sent}_{1} ⟩$, if $Q_{1}$ subsumes $Q_{2}$. We will write $⟨ {Arg}_{2}, Q_{2}, {Sent}_{2} ⟩ ⪯ ⟨ {Arg}_{1}, Q_{1}, {Sent}_{1} ⟩$ when $Q_{1}$ subsumes $Q_{2}$ or $Q_{1}$ is equivalent to $Q_{2}$.

Suppose that a TB-argument supporting the query $\{\text{“abortion”}\}$ is obtained, with a prevailing sentiment $negative$. If the original query Q is extended in some way into a new query $Q^{'}$ that is more specific than Q (i.e., $Q^{'} = Q \cup \{d\}$), it could be the case that a TB-argument supporting $Q^{'}$ has a different (possibly conflicting) prevailing sentiment. For example, more specific opinions about abortion are related to other topics, such as ethics, social problems or programs, religious issues, etc. To explore all possible relationships associated with TB-arguments returned for a specified query Q and criterion C, we can define an algorithm to construct an “opinion tree” recursively as follows:

We start with a TB-argument A obtained from the original query Q (i.e., $⟨ Arg, Q, Sent ⟩$ ), which will be the root of the tree.

Next, we compute within A all relevant descriptors that might be used to “extend” Q, by adding a new element ($NewTerm$) to the query, obtaining $Q^{'} = Q \cup \{NewTerm\}$.

Then, a new argument for $Q^{'}$ is obtained, which will be associated with a subtree rooted in the original argument A.

The high-level algorithm can be seen in Fig. 1. As stated before, note that our approach to opinion trees is more generic than the one used for dialectical trees in argumentation (as done, e.g., in [10]), in the sense that for a given argument, the children nodes will correspond to more specific arguments that are not necessarily in conflict with the parent argument.

Fig. 1.

High-level algorithm for computing an opinion tree ${OT}_{Q}$ from Twitter.

It is also easy to see that for any query Q, the algorithm $BuildOT$ finishes in finite time: given that a tweet may not contain more than 140 characters, the number of contained descriptors is finite, and therefore the algorithm will eventually stop, providing an opinion tree as an output.
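
The following Python sketch is one possible reading of the three construction steps and of the stopping condition, written under stated assumptions rather than as a transcription of Fig. 1. Tweets are simplified to frozen sets of terms, the criterion is omitted, ties between equally frequent terms are broken alphabetically, and a child is only created when at least `threshold` tweets match its query. The names `Node`, `build_ot` and `toy_sent`, and the sample data, are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Callable, Dict, FrozenSet, List, Optional, Sequence, Set

Tweet = FrozenSet[str]


@dataclass
class Node:
    query: FrozenSet[str]
    sentiment: str
    children: List["Node"] = field(default_factory=list)


def build_ot(query: FrozenSet[str], tweets: Sequence[Tweet],
             sent: Callable[[Sequence[Tweet]], str], stopwords: Set[str],
             threshold: int, level: int = 0,
             seen: Optional[Dict[int, Set[str]]] = None) -> Node:
    """Recursively extend the query with the most frequent admissible descriptor."""
    seen = {} if seen is None else seen
    matching = [t for t in tweets if query <= t]
    node = Node(query, sent(matching))
    # Terms that may not extend this node: its own query, stopwords, and
    # descriptors of nodes built earlier (to the left) at the same level.
    blocked = set(seen.get(level, set())) | set(query) | set(stopwords)
    seen.setdefault(level, set()).update(query)
    while True:  # plays the role of the REPEAT loop over alternative extensions
        counts = Counter(term for t in matching for term in t if term not in blocked)
        if not counts:
            break
        d = min(counts, key=lambda w: (-counts[w], w))  # most frequent term
        if counts[d] < threshold:  # too few tweets would support the child
            break
        blocked.add(d)
        node.children.append(
            build_ot(query | {d}, matching, sent, stopwords, threshold, level + 1, seen))
    return node


def toy_sent(tweet_set: Sequence[Tweet]) -> str:
    """Hypothetical sentiment: 'pos' when most tweets mention 'option', else 'neg'."""
    hits = sum("option" in t for t in tweet_set)
    return "pos" if 2 * hits > len(tweet_set) else "neg"


SNAPSHOT = [
    frozenset({"abortion", "michigan", "senate"}),
    frozenset({"abortion", "michigan", "senate"}),
    frozenset({"abortion", "michigan", "bill"}),
    frozenset({"abortion", "option"}),
    frozenset({"abortion", "option"}),
    frozenset({"taxes"}),
]
root = build_ot(frozenset({"abortion"}), SNAPSHOT, toy_sent, {"the"}, threshold=2)
print([sorted(child.query) for child in root.children])
```

Each loop iteration blocks one more term and each recursive call adds one descriptor to the query, so the sketch terminates for any finite vocabulary, in line with the termination argument above.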

2.3. Case study

As discussed before, the algorithm shown in Fig. 1 makes it possible to obtain an opinion tree from a given query Q, a criterion C, and the set $\mathit{Tweets}$ of all possible tweets. An additional parameter R allows us to specify the set of tweets to be considered when searching for a new descriptor. Initially, $R = \mathit{Tweets}$. The cardinality of R determines the threshold associated with the depth of the tree.

Consider the query $Q = \{\text{“abortion”}\}$, and a criterion $C = \{T \in \mathit{Tweets} \mid T \text{ was posted less than 48 hours ago}\}$. A root TB-argument is computed for Q, C and $\mathit{Tweets}$, obtaining an associated prevailing sentiment ($negative$). If $|R|$ is above a given threshold value, the algorithm computes the most frequent word d in R that is not already present in $Q \cup Stopwords$. The underlying idea is that any new descriptor used to extend the query associated with a node of the opinion tree should not appear in previous nodes at the same level (from left to right) or in ancestors of previous nodes at the same level. The set $Stopwords$ will usually include terms such as $the$, $as$, $which$, etc. In our example, $d = \text{“michigan”}$.⁷ A new TB-argument can now be built for query $Q_{new} = \{\text{“abortion”}\} \cup \{\text{“michigan”}\}$, criterion C and the prevailing sentiment calculated for the new subset of tweets $\mathit{Tweets}_{Q_{new}}$. In the recursive call, the most frequent word is calculated for this subset (obtaining the result “senate”), so that a new TB-argument for the query $\{\text{“abortion”}, \text{“michigan”}, \text{“senate”}\}$ is obtained, with a new associated prevailing sentiment. Note that within a particular instance of the recursive call, the REPEAT loop takes care of alternative ways of “extending” Q. This is accomplished by selecting a particular descriptor d from the set of tweets in R. The process is repeated until the threshold has been reached. At the end, the resulting opinion tree ${OT}_{Q}$ is returned.

⁷ This example was obtained from Twitter in December 2012, when the Michigan legislature was debating several regulations on abortion practices.

Fig. 2.

Opinion tree based on query $\{\text{“abortion”}\}$. The associated conflict tree for the same query is shown in dotted lines. (Colors are visible in the online version of the article; https://dx-doi-org.web.bisu.edu.cn/10.3233/AIC-140627.)

Figure 2 illustrates what the construction of an opinion tree for the query $Q = \{\text{“abortion”}\}$ looks like. Distinguished symbols (“+”, “−”, “=”) are used to denote positive, negative and neutral sentiments, respectively. Note that the original query Q has cardinality 1, and further levels in the opinion tree refer to incrementally extended queries (e.g., $\{\text{“abortion”}, \text{“michigan”}\}$, or $\{\text{“abortion”}, \text{“murder”}\}$). Leaves correspond to arguments associated with a query Q which cannot be further expanded, as the associated number of tweets is too small for any possible query $Q \cup W$. Furthermore, we can identify some subtrees in ${OT}_{\{\text{“abortion”}\}}$ whose nodes all have the same sentiment. In other words, further expanding a query into more complex queries does not change the prevailing sentiment associated with the root node. In other cases, expanding some queries results in a sentiment change (e.g., from $\{\text{“abortion”}\}$ into $\{\text{“abortion”}, \text{“option”}\}$ or $\{\text{“abortion”}, \text{“wish”}\}$). This situation will allow us to characterize conflict trees, in which we take into account opinions that attack each other, as discussed in the next section.

3. Conflict trees

Next we will provide a formal definition of conflict between TB-arguments. Intuitively, a conflict will arise whenever two arguments for similar queries lead to conflicting sentiments assuming that the involved queries are related to each other by the subsumption relationship.

Definition 3.1 (Argument attack).

Consider a TB-framework $(\mathit{Tweets}, C, S, sent, conflict)$ and let $⟨ {Arg}_{1}, Q_{1}, {Sent}_{1} ⟩$ and $⟨ {Arg}_{2}, Q_{2}, {Sent}_{2} ⟩$ be two TB-arguments such that $Q_{1}$ subsumes $Q_{2}$. We say that $⟨ {Arg}_{2}, Q_{2}, {Sent}_{2} ⟩$ attacks $⟨ {Arg}_{1}, Q_{1}, {Sent}_{1} ⟩$ whenever ${Sent}_{1}$ and ${Sent}_{2}$ are in conflict.

Example 3.1.
Consider query $Q_{1} = \{\text{“abortion”}\}$ and query $Q_{2} = \{\text{“abortion”}, \text{“option”}\}$ with associated TB-arguments $⟨ {Arg}_{1}, Q_{1}, negative ⟩$ and $⟨ {Arg}_{2}, Q_{2}, neutral ⟩$. Then $⟨ {Arg}_{2}, Q_{2}, neutral ⟩$ attacks $⟨ {Arg}_{1}, Q_{1}, negative ⟩$, and vice versa.

Note that in the previous situation, adding the descriptor “option” to the original query $\{\text{“abortion”}\}$ involves a sentiment change. We will formalize this situation as follows:
Definition 3.2 (Sentiment-preserving descriptor. Sentiment-shifting descriptor).

Let $⟨ A_{1}, Q, {Sent}_{1} ⟩$ be a TB-argument. We say that a keyword or hashtag d is a sentiment-preserving (resp. sentiment-shifting) descriptor w.r.t. Q whenever there exists a TB-argument $⟨ A_{2}, Q \cup \{d\}, {Sent}_{2} ⟩$ such that ${Sent}_{1}$ and ${Sent}_{2}$ are non-conflicting (resp. conflicting). The TB-argument $⟨ A_{2}, Q \cup \{d\}, {Sent}_{2} ⟩$ will be called a sentiment-preserving (resp. sentiment-shifting) argument.
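
Definitions 3.1 and 3.2 can be sketched in a few lines. The fragment below assumes that each TB-argument stores the identifiers of its selected tweets, so that subsumption reduces to proper inclusion of those identifier sets, and it reuses the three-valued conflict mapping of Example 2.1 with spelled-out labels; the tweet identifiers are made up.

```python
from typing import Dict, FrozenSet, NamedTuple, Set


class TBArgument(NamedTuple):
    arg: FrozenSet[int]       # identifiers of the selected tweets
    query: FrozenSet[str]
    sent: str


CONFLICT: Dict[str, Set[str]] = {
    "positive": {"negative", "neutral"},
    "negative": {"positive", "neutral"},
    "neutral": {"positive", "negative"},
}


def attacks(attacker: TBArgument, target: TBArgument,
            conflict: Dict[str, Set[str]] = CONFLICT) -> bool:
    """Definition 3.1: the target's query subsumes the attacker's, sentiments conflict."""
    return attacker.arg < target.arg and attacker.sent in conflict[target.sent]


def classify_descriptor(base: TBArgument, extended: TBArgument,
                        conflict: Dict[str, Set[str]] = CONFLICT) -> str:
    """Definition 3.2, for an argument whose query adds exactly one descriptor."""
    if not base.query < extended.query or len(extended.query - base.query) != 1:
        raise ValueError("extended query must add exactly one descriptor")
    shifting = extended.sent in conflict[base.sent]
    return "sentiment-shifting" if shifting else "sentiment-preserving"


a1 = TBArgument(frozenset({1, 2, 3, 4}), frozenset({"abortion"}), "negative")
a2 = TBArgument(frozenset({3, 4}), frozenset({"abortion", "option"}), "neutral")
a3 = TBArgument(frozenset({1, 2}), frozenset({"abortion", "michigan"}), "negative")
print(attacks(a2, a1), classify_descriptor(a1, a2), classify_descriptor(a1, a3))
```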

Given a particular query Q, note that several alternative expansions (supersets of Q) can be identified. We are interested in identifying which is the smallest superset of Q which is associated with a sentiment-shifting argument. This gives rise to the following definition:

Definition 3.3 (Minimal-shift descriptor. Minimal-shifting relation).

Let $(\mathit{Tweets}, C, S, sent, conflict)$ be a TB-framework. Given two conflicting arguments $⟨ {Arg}_{1}, Q_{1}, Sent ⟩$ and $⟨ {Arg}_{2}, Q_{2}, \bar{Sent} ⟩$, we will say that $Q_{2}$ is a minimal-shift descriptor w.r.t. $Q_{1}$ iff $⟨ {Arg}_{2}, Q_{2}, \bar{Sent} ⟩$ is a sentiment-shifting argument w.r.t. $Q_{1}$ and $\nexists\, Q^{'} \subset Q_{2}$ such that $⟨ {Arg}^{'}, Q^{'}, \bar{Sent} ⟩$ is a sentiment-shifting argument w.r.t. $Q_{1}$.

We define a minimal-shifting relation “$⪯_{Q}^{\min}$” as follows: $⟨ {Arg}_{1}, Q_{1}, {Sent}_{1} ⟩ ⪯_{Q}^{\min} ⟨ {Arg}_{2}, Q_{2}, {Sent}_{2} ⟩$ iff $⟨ {Arg}_{2}, Q_{2}, {Sent}_{2} ⟩$ attacks $⟨ {Arg}_{1}, Q_{1}, {Sent}_{1} ⟩$ and $Q_{2}$ is a minimal-shift descriptor w.r.t. $Q_{1}$.

Definition 3.4 (Conflict tree).

Let $(\mathit{Tweets}, C, S, sent, conflict)$ be a TB-framework. Given a query Q and its associated argument $⟨ Arg, Q, Sent ⟩$, we will define a conflict tree for Q (denoted ${CT}_{Q}$) recursively as follows:

If there is no $⟨ {Arg}_{i}, Q_{i}, {Sent}_{i} ⟩$ such that $⟨ Arg, Q, Sent ⟩ ⪯_{Q}^{\min} ⟨ {Arg}_{i}, Q_{i}, {Sent}_{i} ⟩$ , then ${CT}_{Q}$ is a conflict tree consisting of a single node $⟨ Arg, Q, Sent ⟩$ .

Let $⟨ {Arg}_{1}, Q_{1}, {Sent}_{1} ⟩, ⟨ {Arg}_{2}, Q_{2}, {Sent}_{2} ⟩, \dots, ⟨ {Arg}_{k}, Q_{k}, {Sent}_{k} ⟩$ be those arguments in the TB-framework $(\mathit{Tweets}, C, S, sent, conflict)$ such that $⟨ Arg, Q, Sent ⟩ ⪯_{Q}^{\min} ⟨ {Arg}_{i}, Q_{i}, {Sent}_{i} ⟩$ (for $i = 1, \dots, k$). Then ${CT}_{Q}$ is a conflict tree consisting of $⟨ Arg, Q, Sent ⟩$ as the root node, with ${CT}_{Q_{1}}, \dots, {CT}_{Q_{k}}$ as its immediate subtrees.

Intuitively, a conflict tree depicts all possible ways of extending the original query Q such that every extension (child node in the tree) corresponds to a sentiment change. Figure 2 illustrates what a conflict tree for the query $Q = \{\text{“abortion”}\}$ looks like, depicting nodes and arcs with dotted lines. Every node in the tree (except the root) is associated with a TB-argument which is a sentiment-shifting argument w.r.t. its parent. Leaves correspond to nodes for which no further sentiment shift can be found.
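
One way to approximate this construction is to read the conflict tree off an already computed opinion tree, as in the sketch below. It takes, along each path below a node, the first descendant whose sentiment conflicts with that node's sentiment; supersets of the query that do not appear in the opinion tree are not examined, so minimality in the sense of Definition 3.3 is only approximated. The `Node` type, the tree and its sentiments are made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List, Set


@dataclass
class Node:
    query: FrozenSet[str]
    sentiment: str
    children: List["Node"] = field(default_factory=list)


CONFLICT: Dict[str, Set[str]] = {
    "pos": {"neg", "neutral"},
    "neg": {"pos", "neutral"},
    "neutral": {"pos", "neg"},
}


def first_shifts(node: Node, base_sent: str, conflict: Dict[str, Set[str]]) -> List[Node]:
    """Nearest descendants of `node` whose sentiment conflicts with `base_sent`."""
    found: List[Node] = []
    for child in node.children:
        if child.sentiment in conflict[base_sent]:
            found.append(child)
        else:  # sentiment preserved: keep looking further down this branch
            found.extend(first_shifts(child, base_sent, conflict))
    return found


def conflict_tree(node: Node, conflict: Dict[str, Set[str]]) -> Node:
    """Keep only sentiment-shifting descendants, recursively."""
    shifts = first_shifts(node, node.sentiment, conflict)
    return Node(node.query, node.sentiment, [conflict_tree(n, conflict) for n in shifts])


def q(*terms: str) -> FrozenSet[str]:
    return frozenset(terms)


# Hypothetical opinion tree with invented sentiments.
OPINION_TREE = Node(q("abortion"), "neg", [
    Node(q("abortion", "michigan"), "neg", [
        Node(q("abortion", "michigan", "senate"), "neutral")]),
    Node(q("abortion", "option"), "neutral", [
        Node(q("abortion", "option", "choice"), "pos")]),
    Node(q("abortion", "murder"), "neg"),
])
ct = conflict_tree(OPINION_TREE, CONFLICT)
print([sorted(child.query) for child in ct.children])
```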

4. Generalizing opinion and conflict trees as superior lattices

Next we will show a formal lattice-based characterization of our approach and propose an effective procedure to compute conflict superior lattices, which can be regarded as a generalization of conflict trees. Superior lattices will account for a more generic view of opinion and conflict trees, identifying relevant sublattices based on an equivalence relation between TB-arguments. First we will review some background definitions to make our presentation self-contained.

Definition 4.1 (Partial order. Partially ordered set).

A partial order is a binary relation “⪯” over a set A which is reflexive, antisymmetric, and transitive, i.e., for all a, b and c in A, we have that (1) $a ⪯ a$ (reflexivity); (2) if $a ⪯ b$ and $b ⪯ a$ then $a = b$ (antisymmetry); (3) if $a ⪯ b$ and $b ⪯ c$ then $a ⪯ c$ (transitivity). A set with a partial order is called a partially ordered set (or just an ordered set).

Definition 4.2 (Cover relation).

Given an ordered set $(A, ⪯)$ , for two elements $a, b \in A$ we use $a ≺ b$ to specify that $a ⪯ b$ and $a \neq b$ . Let $(A, ⪯)$ be an ordered set. Then for any $a, b \in A$ we say that a covers b if $b ≺ a$ and there is no $c \in A$ such that $b ≺ c ≺ a$ .

Definition 4.3 (Tree order).

An ordered set $(A, ⪯)$ is a tree if (1) there is a unique $a \in A$ such that $b ⪯ a$ for all $b \in A$ , and (2) for all $a, b, c \in A$ , if b covers a and c covers a, then $b = c$ .
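
Definitions 4.2 and 4.3 can be checked by brute force on a finite ordered set. The sketch below uses queries ordered by reverse inclusion (a query is below another when it contains it); the function names and the two sample sets are illustrative assumptions.

```python
from typing import Callable, FrozenSet, Sequence

Query = FrozenSet[str]
Leq = Callable[[Query, Query], bool]


def more_selective(q1: Query, q2: Query) -> bool:
    """q1 is below q2 when q1 contains every descriptor of q2."""
    return q1 >= q2


def covers(a: Query, b: Query, elements: Sequence[Query], leq: Leq) -> bool:
    """a covers b: b is strictly below a and nothing lies strictly between them."""
    def lt(x: Query, y: Query) -> bool:
        return x != y and leq(x, y)
    return lt(b, a) and not any(lt(b, c) and lt(c, a) for c in elements)


def is_tree_order(elements: Sequence[Query], leq: Leq) -> bool:
    """Unique top element, and every element is covered by at most one element."""
    tops = [a for a in elements if all(leq(b, a) for b in elements)]
    if len(tops) != 1:
        return False
    return all(sum(covers(a, b, elements, leq) for a in elements) <= 1
               for b in elements)


def q(*terms: str) -> Query:
    return frozenset(terms)


TREE = [q("abortion"), q("abortion", "michigan"), q("abortion", "option"),
        q("abortion", "michigan", "senate")]
# Adding {"abortion", "senate"} gives the three-descriptor query two covers.
NOT_A_TREE = TREE + [q("abortion", "senate")]
print(is_tree_order(TREE, more_selective), is_tree_order(NOT_A_TREE, more_selective))
```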

Definition 4.4 (Superior lattice. Inferior lattice. Lattice).

Let $(A, ⪯)$ be an ordered set. Then for any $a, b \in A$ , we will say that $c \in A$ is the least upper bound of a and b (also called the join of a and b), denoted $c = a \lor b$ , whenever (i) $a ⪯ c$ and $b ⪯ c$ ; (ii) if for $x \in A$ , it holds that $a ⪯ x$ and $b ⪯ x$ , then $c ⪯ x$ . An ordered set $(A, ⪯)$ is a superior lattice whenever for any pair of elements $a, b \in A$ there is a least upper bound element in A. The notions of greatest lower bound (or meet), denoted $c = a \land b$ , and inferior lattice are defined analogously as the duals of the notions of least upper bound and superior lattice. An ordered set $(A, ⪯)$ that is both a superior lattice and an inferior lattice is called a lattice.

Definition 4.5 (Join-homomorphism. Meet-homomorphism. Lattice homomorphism).

The mapping h from $(X, ⪯)$ to $(Y, ⪯)$ is a join-homomorphism provided that for any $a, b \in X$, $h(a \lor b) = h(a) \lor h(b)$. It is also said that “h preserves joins”. The notion of meet-homomorphism is defined analogously as the dual of the notion of join-homomorphism. The mapping h is a lattice homomorphism if it is both a join-homomorphism and a meet-homomorphism.

In the rest of this section we will show that the above definitions provide a solid mathematical foundation for the study of TB-arguments. Note, in the first place, that the ordered set $(2^{\mathit{Tweets}}, \subseteq)$ is a lattice, as is the case for any power set of a given set, ordered by inclusion. The join is given by the union and the meet by the intersection of the subsets. More interestingly, it can be shown that for any query Q, the resulting opinion tree ${OT}_{Q}$ defines a tree order (see Definition 4.3).
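
The lattice claim for the power set can be confirmed by brute force on a small universe. The sketch below computes least upper bounds and greatest lower bounds directly from Definition 4.4 and compares them with union and intersection; the three tweet identifiers are placeholders.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, List, Optional

TweetSet = FrozenSet[str]


def powerset(universe: Iterable[str]) -> List[TweetSet]:
    """All subsets of the universe, as frozen sets."""
    items = sorted(universe)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]


def least_upper_bound(a: TweetSet, b: TweetSet,
                      elements: List[TweetSet]) -> Optional[TweetSet]:
    """Join under inclusion: an upper bound that lies below every upper bound."""
    uppers = [x for x in elements if a <= x and b <= x]
    least = [c for c in uppers if all(c <= x for x in uppers)]
    return least[0] if least else None


def greatest_lower_bound(a: TweetSet, b: TweetSet,
                         elements: List[TweetSet]) -> Optional[TweetSet]:
    """Meet under inclusion: a lower bound that lies above every lower bound."""
    lowers = [x for x in elements if x <= a and x <= b]
    greatest = [c for c in lowers if all(x <= c for x in lowers)]
    return greatest[0] if greatest else None


TWEET_SETS = powerset({"t1", "t2", "t3"})
join_is_union = all(least_upper_bound(a, b, TWEET_SETS) == a | b
                    for a in TWEET_SETS for b in TWEET_SETS)
meet_is_intersection = all(greatest_lower_bound(a, b, TWEET_SETS) == a & b
                           for a in TWEET_SETS for b in TWEET_SETS)
print(join_is_union, meet_is_intersection)
```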

Lemma 4.1.
Let Q be a query and let ${OT}_{Q}$ be an opinion tree for Q in a TB-framework $(\mathit{Tweets}, C, S, sent, conflict)$. Then $({OT}_{Q}, ⪯_{Q})$ defines a tree order.
Proof.
In order to prove that $({OT}_{Q}, ⪯_{Q})$ is a tree order, we first need to prove that $({OT}_{Q}, ⪯_{Q})$ is an ordered set. This is straightforward since the “ $⪯_{Q}$ ” relation is defined in terms of the “⊇” relation as follows. $⟨ {Arg}_{1}, Q_{1}, {Sent}_{1} ⟩ ⪯_{Q} ⟨ {Arg}_{2}, Q_{2}, {Sent}_{2} ⟩$ , if and only if $Q_{1} \supseteq Q_{2}$ . Given that the “⊇” relation defines a partial order on the set of queries, it is clear that $({OT}_{Q}, ⪯_{Q})$ is an ordered set.
Fig. 3.
High-level algorithm for computing opinion (superior) lattices from opinion trees.

Fig. 4.
Opinion tree (left) and its corresponding opinion (superior) lattice (right).

To complete the proof we need to show that (1) there is a unique TB-argument $⟨ Arg, Q, Sent ⟩$ such that $⟨ {Arg}^{'}, Q^{'}, {Sent}^{'} ⟩ ⪯_{Q} ⟨ Arg, Q, Sent ⟩$ for all $⟨ {Arg}^{'}, Q^{'}, {Sent}^{'} ⟩$ in ${OT}_{Q}$, and (2) for all TB-arguments $⟨ {Arg}_{1}, Q_{1}, {Sent}_{1} ⟩$, $⟨ {Arg}_{2}, Q_{2}, {Sent}_{2} ⟩$, $⟨ {Arg}_{3}, Q_{3}, {Sent}_{3} ⟩$ in ${OT}_{Q}$, if $⟨ {Arg}_{2}, Q_{2}, {Sent}_{2} ⟩$ covers $⟨ {Arg}_{1}, Q_{1}, {Sent}_{1} ⟩$ and $⟨ {Arg}_{3}, Q_{3}, {Sent}_{3} ⟩$ covers $⟨ {Arg}_{1}, Q_{1}, {Sent}_{1} ⟩$, then $⟨ {Arg}_{2}, Q_{2}, {Sent}_{2} ⟩ = ⟨ {Arg}_{3}, Q_{3}, {Sent}_{3} ⟩$. Part (1) follows from the construction of the opinion tree in algorithm $BuildOT$ by taking $⟨ Arg, Q, Sent ⟩$ as ${Root}_{{OT}_{Q}}$. Then it is clear that ${Root}_{{OT}_{Q}}$ is the only TB-argument such that $⟨ {Arg}^{'}, Q^{'}, {Sent}^{'} ⟩ ⪯_{Q} {Root}_{{OT}_{Q}}$ for all $⟨ {Arg}^{'}, Q^{'}, {Sent}^{'} ⟩$ in ${OT}_{Q}$. In order to prove (2), assume $⟨ {Arg}_{2}, Q_{2}, {Sent}_{2} ⟩$ covers $⟨ {Arg}_{1}, Q_{1}, {Sent}_{1} ⟩$ and $⟨ {Arg}_{3}, Q_{3}, {Sent}_{3} ⟩$ covers $⟨ {Arg}_{1}, Q_{1}, {Sent}_{1} ⟩$. According to the $BuildOT$ algorithm, this means that $Q_{1} = Q_{2} \cup \{d\}$ and $Q_{1} = Q_{3} \cup \{d^{'}\}$, with $d \notin Q_{2}$ and $d^{'} \notin Q_{3}$. The descriptor selection mechanism for extending queries implemented in the algorithm guarantees that any query is extended by selecting a descriptor not appearing already as part of a query in a previous node at the same level (from left to right) or in ancestors of previous nodes at the same level. Assume, without loss of generality, that $Q_{2}$ is a predecessor of $Q_{3}$. Then, according to the restriction on the descriptor selection mechanism mentioned above, we can conclude that $d^{'} \notin Q_{2}$. Since $d^{'} \in Q_{1} = Q_{2} \cup \{d\}$ and $d^{'} \notin Q_{2}$, we have $d^{'} = d$. As a result it must be the case (from $Q_{1} = Q_{2} \cup \{d\}$, $Q_{1} = Q_{3} \cup \{d^{'}\}$, $d \notin Q_{2}$ and $d^{'} \notin Q_{3}$) that $Q_{2} = Q_{3}$ and therefore $⟨ {Arg}_{2}, Q_{2}, {Sent}_{2} ⟩ = ⟨ {Arg}_{3}, Q_{3}, {Sent}_{3} ⟩$. This concludes the proof. □
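The two conditions used in the proof (a unique top element, and at most one covering element per node) can be checked mechanically on small examples. The sketch below is only an illustration: it represents each node by its query as a set of descriptors, orders queries by reverse inclusion as in the proof, and uses hypothetical descriptors. It does not reproduce the $BuildOT$ algorithm.

```python
def below(q1, q2):
    """q1 is below q2 iff q1 is a superset of q2 (more specific is lower)."""
    return q1 >= q2

def covers(upper, lower, nodes):
    """upper covers lower: lower is strictly below upper, nothing in between."""
    if upper == lower or not below(lower, upper):
        return False
    return not any(z not in (upper, lower) and below(lower, z) and below(z, upper)
                   for z in nodes)

def is_tree_order(nodes):
    """Unique top element, and every node covered by at most one node."""
    tops = [r for r in nodes if all(below(q, r) for q in nodes)]
    unique_cover = all(sum(covers(u, q, nodes) for u in nodes) <= 1
                       for q in nodes)
    return len(tops) == 1 and unique_cover

# Hypothetical queries, written as sets of descriptors.
tree = [frozenset({"taxes"}),
        frozenset({"taxes", "irs"}),
        frozenset({"taxes", "health"}),
        frozenset({"taxes", "irs", "scandal"})]
print(is_tree_order(tree))

# A node reachable from two parents breaks the tree order; the descriptor
# selection restriction in the proof is what rules this situation out.
shared = frozenset({"taxes", "irs", "health"})
print(is_tree_order(tree + [shared]))
```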

Once the opinion tree ${OT}_{Q}$ is built, it is possible to identify and merge equivalent TB-arguments. The equivalence relation between TB-arguments is induced by the equivalence of the corresponding queries, i.e., $⟨ {Arg}_{i}, Q_{i}, {Sent}_{i} ⟩ \sim_{query} ⟨ {Arg}_{j}, Q_{j}, {Sent}_{j} ⟩$ if and only if $Q_{i}$ is equivalent to $Q_{j}$ (see Definition 2.7). The algorithm for merging equivalent TB-arguments is presented in Fig. 3. As we will see later, this algorithm returns an opinion superior lattice as a result. Figure 4 illustrates the application of this algorithm on an opinion tree. For the sake of simplicity, we use labels $Q_{1}, \dots, Q_{27}$ to represent $⟨ {Arg}_{1}, Q_{1}, {Sent}_{1} ⟩, \dots, ⟨ {Arg}_{27}, Q_{27}, {Sent}_{27} ⟩$. On the left-hand side of this figure we can see an opinion tree as a tree order $({OT}_{Q}, ⪯_{Q})$. Note that each element in ${OT}_{Q}$ is of the form $⟨ {Arg}_{i}, Q_{i}, {Sent}_{i} ⟩$, while the order relation “$⪯_{Q}$” is defined as $⟨ {Arg}_{1}, Q_{1}, {Sent}_{1} ⟩ ⪯_{Q} ⟨ {Arg}_{2}, Q_{2}, {Sent}_{2} ⟩$ if and only if $Q_{2} \subseteq Q_{1}$ (see Definition 2.6). In this figure we have indicated that some queries are equivalent. As a consequence, based on the given algorithm, we can identify a quotient set, where each member is an equivalence class $[⟨ {Arg}_{i}, Q_{i}, {Sent}_{i} ⟩]$ defined as the set $\{⟨ {Arg}_{j}, Q_{j}, {Sent}_{j} ⟩ \mid Q_{j} \text{ is equivalent to } Q_{i}\}$.
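The construction of the quotient set can be sketched as a simple grouping step. The snippet below is not the algorithm of Fig. 3; it only illustrates how equivalence classes arise. It assumes, for illustration, that two queries are equivalent when they retrieve the same set of tweets (Definition 2.7 is not reproduced in this section, so the criterion is passed in as a parameter). The `TBArgument` type and all data are hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple

class TBArgument(NamedTuple):
    tweets: frozenset     # supporting tweets (Arg)
    query: frozenset      # descriptors of the query (Q)
    sentiment: str        # prevailing sentiment (Sent)

def quotient(arguments, same_class_key):
    """Group arguments whose keys coincide; each group is one class."""
    classes = defaultdict(list)
    for argument in arguments:
        classes[same_class_key(argument)].append(argument)
    return list(classes.values())

# Hypothetical data: two different queries that retrieve the same tweets.
opinion_tree = [
    TBArgument(frozenset({"t1", "t2", "t3"}), frozenset({"taxes"}), "neg"),
    TBArgument(frozenset({"t1", "t2"}), frozenset({"taxes", "irs"}), "neg"),
    TBArgument(frozenset({"t1", "t2"}),
               frozenset({"taxes", "irs", "scandal"}), "neg"),
    TBArgument(frozenset({"t3"}), frozenset({"taxes", "health"}), "pos"),
]

# Assumption: queries are equivalent when they retrieve the same tweet set.
classes = quotient(opinion_tree, same_class_key=lambda arg: arg.tweets)
print(len(classes))
for members in classes:
    print([sorted(arg.query) for arg in members])
```

Passing the key as a function keeps the grouping independent of the particular equivalence criterion chosen.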

Fig. 5.
High-level algorithm for computing conflict (superior) lattices from opinion (superior) lattices.

Fig. 6.
From an opinion superior lattice (left) to a conflict superior lattice (right).

We show on the right-hand side of Fig. 4 the quotient set resulting from the given opinion tree. Note that this new set is a superior lattice (see Definition 4.4). In general, any opinion tree induces a superior lattice ${OL}_{Q}$ , which we will refer to as opinion (superior) lattice. This is formally stated in the following lemma:
Lemma 4.2.
Let Q be a query and let ${OL}_{Q}$ be the quotient set of ${OT}_{Q}$ by the query equivalence relation. Then $({OL}_{Q}, ⪯)$ is a superior lattice.
Proof.
In order to prove that $({OL}_{Q}, ⪯)$ is a superior lattice we need to prove that for any pair of TB-argument classes $[⟨ {Arg}_{1}, Q_{1}, {Sent}_{1} ⟩]$ and $[⟨ {Arg}_{2}, Q_{2}, {Sent}_{2} ⟩]$ in ${OL}_{Q}$, $[⟨ {Arg}_{1}, Q_{1}, {Sent}_{1} ⟩] \lor [⟨ {Arg}_{2}, Q_{2}, {Sent}_{2} ⟩]$ is a TB-argument class in ${OL}_{Q}$. Since ${OL}_{Q}$ is the quotient set of ${OT}_{Q}$ by the query equivalence relation, and by Lemma 4.1 ${OT}_{Q}$ is a tree order, we have that $⟨ {Arg}_{1}, Q_{1}, {Sent}_{1} ⟩ \lor ⟨ {Arg}_{2}, Q_{2}, {Sent}_{2} ⟩$ is the most specific common ancestor of $⟨ {Arg}_{1}, Q_{1}, {Sent}_{1} ⟩$ and $⟨ {Arg}_{2}, Q_{2}, {Sent}_{2} ⟩$. Then, for the given pair of TB-argument classes, we take $[⟨ {Arg}_{1}, Q_{1}, {Sent}_{1} ⟩] \lor [⟨ {Arg}_{2}, Q_{2}, {Sent}_{2} ⟩] = [⟨ {Arg}_{1}, Q_{1}, {Sent}_{1} ⟩ \lor ⟨ {Arg}_{2}, Q_{2}, {Sent}_{2} ⟩]$. This concludes the proof. □
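The proof relies on the fact that, in a tree order, the join of two nodes is their most specific common ancestor (a node counts as its own ancestor here). A minimal sketch of that computation, assuming the tree is given as a hypothetical child-to-parent map whose labels are not those of Fig. 4:

```python
def ancestors_or_self(node, parent):
    """The node followed by its ancestors, ending at the root."""
    chain = [node]
    while node in parent:
        node = parent[node]
        chain.append(node)
    return chain

def tree_join(a, b, parent):
    """Most specific common ancestor (or self) of a and b."""
    above_a = set(ancestors_or_self(a, parent))
    for node in ancestors_or_self(b, parent):
        if node in above_a:
            return node
    raise ValueError("nodes do not belong to the same tree")

# Hypothetical tree: Q1 is the root, Q2 and Q3 its children,
# Q4 and Q5 the children of Q2.
parent = {"Q2": "Q1", "Q3": "Q1", "Q4": "Q2", "Q5": "Q2"}

print(tree_join("Q4", "Q5", parent))
print(tree_join("Q4", "Q3", parent))
print(tree_join("Q4", "Q2", parent))
```

Lemma 4.2 then takes the join of two classes to be the class of the join of their representatives.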

Although an opinion (superior) lattice is typically more compact than an opinion tree, we might be interested in finding in a computationally effective way the minimal structure that reflects all existing conflicts between opinions for a given query Q. In other words, we want to build a minimal superior lattice $({CL}_{Q}, ⪯)$ such that it is possible to define a join-homomorphism h (see Definition 4.5) from $({OL}_{Q}, ⪯)$ to $({CL}_{Q}, ⪯)$ . In addition, we will require that if $h (⟨ {Arg}_{i}, Q_{i}, {Sent}_{i} ⟩) = ⟨ {Arg}_{j}, Q_{j}, {Sent}_{j} ⟩$ then ${Sent}_{i}$ and ${Sent}_{j}$ are non-conflicting. We will call ${CL}_{Q}$ the conflict (superior) lattice for Q. By applying a partitioning algorithm it is possible to obtain a conflict (superior) lattice from any opinion (superior) lattice. The algorithm for computing conflict (superior) lattices for a given opinion (superior) lattice is presented in Fig. 5.

Figure 6 illustrates the transformation of an opinion (superior) lattice into a conflict (superior) lattice. As before, the labels $Q_{i}$ stand for the TB-arguments $⟨ {Arg}_{1}, Q_{1}, {Sent}_{1} ⟩, \dots, ⟨ {Arg}_{27}, Q_{27}, {Sent}_{27} ⟩$. Initially, the 0-equivalent classes are computed based on the polarity of the sentiment associated with each TB-argument. That is, $⟨ {Arg}_{i}, Q_{i}, {Sent}_{i} ⟩$ and $⟨ {Arg}_{j}, Q_{j}, {Sent}_{j} ⟩$ are in the same 0-equivalent class if and only if ${Sent}_{i} = {Sent}_{j}$.

This results in the following classes: $\begin{array}{l} \text{0-equivalent classes:} \\ \{Q_{1}, Q_{2}, Q_{3}, Q_{5}, Q_{8}, Q_{10}, Q_{12}, Q_{13}, Q_{14}, Q_{18}, Q_{19}, Q_{20}\}, \\ \{Q_{4}, Q_{6}, Q_{7}, Q_{9}, Q_{15}, Q_{16}, Q_{17}\}, \\ \{Q_{21}, Q_{22}\}. \end{array}$

As specified in Algorithm $BuildCL$ , the n-equivalent classes are computed as a refinement of the $(n - 1)$ -equivalent classes. This will be characterized as follows: we will say that two TB-arguments $⟨ {Arg}_{i}, Q_{i}, {Sent}_{i} ⟩$ and $⟨ {Arg}_{j}, Q_{j}, {Sent}_{j} ⟩$ are n-equivalent if and only if (1) $⟨ {Arg}_{i}, Q_{i}, {Sent}_{i} ⟩$ and $⟨ {Arg}_{j}, Q_{j}, {Sent}_{j} ⟩$ are $(n - 1)$ -equivalent, and (2) for every $⟨ {Arg}_{k}, Q_{k}, {Sent}_{k} ⟩$ in ${OL}_{Q}$ it holds that $⟨ {Arg}_{i}, Q_{i}, {Sent}_{i} ⟩ \lor ⟨ {Arg}_{k}, Q_{k}, {Sent}_{k} ⟩$ and $⟨ {Arg}_{j}, Q_{j}, {Sent}_{j} ⟩ \lor ⟨ {Arg}_{k}, Q_{k}, {Sent}_{k} ⟩$ are $(n - 1)$ -equivalent.
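The refinement just described can be sketched as an iterative partition-refinement loop. This is not a transcription of $BuildCL$ from Fig. 5; it is a minimal illustration of the n-equivalence test on a hypothetical five-node tree, where the join is taken to be the most specific common ancestor and the sentiment labels are invented.

```python
def refine_by_joins(elements, join, sentiment):
    """Split sentiment classes until joins no longer tell members apart."""
    block = {e: sentiment[e] for e in elements}          # 0-equivalence
    while True:
        # Two elements stay together iff they share a block and, for every
        # k, their joins with k fall in the same block (n-equivalence test).
        signature = {e: (block[e],
                         tuple(block[join(e, k)] for k in elements))
                     for e in elements}
        ids = {}
        refined = {e: ids.setdefault(signature[e], len(ids))
                   for e in elements}
        if len(set(refined.values())) == len(set(block.values())):
            break                                        # no class was split
        block = refined
    classes = {}
    for e in elements:
        classes.setdefault(block[e], set()).add(e)
    return list(classes.values())

# Hypothetical tree: r is the root, a and b its children,
# c and d the children of a.
parent = {"a": "r", "b": "r", "c": "a", "d": "a"}

def chain(node):
    """Node followed by its ancestors up to the root."""
    out = [node]
    while out[-1] in parent:
        out.append(parent[out[-1]])
    return out

def join(x, y):
    """Most specific common ancestor (or self) of x and y."""
    above_x = set(chain(x))
    return next(n for n in chain(y) if n in above_x)

sentiment = {"r": "pos", "a": "pos", "b": "neg", "c": "pos", "d": "neg"}
classes = refine_by_joins(["r", "a", "b", "c", "d"], join, sentiment)
print(sorted(sorted(c) for c in classes))
```

In this toy example the two negative nodes b and d start in the same 0-equivalent class but are separated in the first round, because the join of b with itself is b (negative) while the join of d with b is the root r (positive).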

In order to compute the 1-equivalent classes, we take each pair of 0-equivalent TB-arguments $Q_{i}$ and $Q_{j}$ and verify whether $Q_{i} \lor Q_{k}$ is 0-equivalent to $Q_{j} \lor Q_{k}$ for all $Q_{k}$. If this is the case, then $Q_{i}$ and $Q_{j}$ remain in the same 1-equivalent class. Otherwise, these two TB-arguments are distinguishable and each one is included in a different 1-equivalent class.

To illustrate this, take for instance $Q_{3}$ and $Q_{20}$. Since for all $Q_{k}$, $Q_{3} \lor Q_{k}$ is 0-equivalent to $Q_{20} \lor Q_{k}$, we can assert that $Q_{3}$ and $Q_{20}$ are 1-equivalent (i.e., they are not distinguishable at this stage). On the other hand, take the pair $Q_{1}$ and $Q_{5}$. We have that $Q_{1} \lor Q_{6} = Q_{1}$ and $Q_{5} \lor Q_{6} = Q_{4}$. Furthermore, we know that $Q_{1}$ is not 0-equivalent to $Q_{4}$. Since there is at least one $Q_{k}$ such that $Q_{1} \lor Q_{k}$ is not 0-equivalent to $Q_{5} \lor Q_{k}$, $Q_{1}$ and $Q_{5}$ are included in separate 1-equivalent classes (i.e., they are distinguishable at this stage). After performing a similar analysis with the remaining pairs of TB-arguments, we obtain the following classes: $\begin{array}{l} \text{1-equivalent classes:} \\ \{Q_{1}, Q_{2}, Q_{3}, Q_{8}, Q_{10}, Q_{12}, Q_{13}, Q_{14}, Q_{19}, Q_{20}\}, \\ \{Q_{5}\}, \\ \{Q_{18}\}, \\ \{Q_{4}, Q_{6}, Q_{7}\}, \\ \{Q_{9}\}, \\ \{Q_{15}, Q_{16}, Q_{17}\}, \\ \{Q_{21}, Q_{22}\}. \end{array}$

If we attempt to compute the 2-equivalent classes, we find that they are identical to the 1-equivalent classes. Therefore the process terminates and the conflict (superior) lattice shown on the right-hand side of Fig. 6 is returned. Note, in addition, that the canonical mapping, which maps each element in ${OL}_{Q}$ to its equivalence class in ${CL}_{Q}$, can be verified to define a join-homomorphism.
5. Application

In this section, we show how our framework can be applied in a real-world situation by presenting a possible user scenario. The described scenario shows ways in which a policy maker could recognize opinions from mass deliberations of citizens expressing their views on “taxes”. The topic of taxes is typically a trending one in Twitter, in particular among United States citizens commenting on current tax legislation or on tax changes proposed by their government.

The topic of taxes can be analyzed from various perspectives. One perspective looks at opinions that address the issue of taxes on property. A second perspective could focus on the topic of the IRS scandal. A third perspective is provided by analyzing how the health care reform affects taxes. Yet another perspective emerges from analyzing the new tax law, which imposes higher taxes on families earning over $250,000 a year, without changing the situation for the middle class.

Fig. 7.

Opinion tree and conflict lattice for the query “taxes”. Results are simplified (4 nodes were left out). (Colors are visible in the online version of the article; https://dx-doi-org.web.bisu.edu.cn/10.3233/AIC-140627.)

The proposed tool facilitates the exploration of these various perspectives by imposing a rich structure on a large set of unstructured tweets. In addition, it makes it easy to recognize the polarity of each group of opinions (TB-arguments) as well as the conflict relations between them. Figure 7 presents a conflict superior lattice for the query “taxes”. In this scenario, certain emerging TB-arguments could shed light on the general desires of citizens. For instance, the fact that the sentiment polarity of “taxes pay companies” is positive may indicate that the general public expects companies to pay higher corporate taxes. In addition, the use of the tool to identify current topics, such as those associated with the queries “taxes irs scandal” or “taxes health”, could greatly help decision and policy makers define priorities and better address citizens’ present-day concerns.

We have developed a Java prototype⁸ as a beta version of a software tool for mining opinions from Twitter. This prototype was used for the analysis of the abortion case (Section 2.3) and the previous tax example.

⁸ Available to download from http://cs.uns.edu.ar/~cic/decide2.0/twitter.zip.

6. Discussion

Our mathematical characterization of opinion and conflict trees as superior lattices provides a natural foundation for the analysis of important concepts prevailing in argumentation theory. In particular, the use of conflict (superior) lattices to represent diverging arguments leads to the identification of the minimal structure that reflects the existing collective positions with respect to a topic of interest.

From the user viewpoint, conflict lattices are intended to provide the theoretical basis for developing an explorative tool in a decision making platform. Consider for example the opinion tree based on the query “abortion”. By having the conflict tree at hand, an analyst⁹ would be able to easily identify the terms or keywords that induce a sentiment shift when considering different tweets. For the case in Fig. 2, it can be noted that we get a conflict tree (which can be considered as a particular case of conflict lattice). In a more general situation, conflict superior lattices provide a suitable mathematical structure for avoiding redundancies when considering attacks in conflict trees. For example, in Fig. 6 (right), the analyst will be able to identify a single argument ($Q_{18}$) which simultaneously attacks two other arguments ($Q_{15}$ and $Q_{21}$).

⁹ Possible users could be, e.g., a journalist, a deputy analyzing a law proposal, etc.

Argument specificity is a key notion in argumentation theory, as it is the first purely syntactic preference criterion proposed to compare arguments. In our framework, specificity can be associated with the “⪯” relation identified in the resulting superior lattices. The use of minimal structures to represent conflicting views facilitates the identification of specificity relations as well as the recognition of relevant (or irrelevant) elements in the argumentation space, as it is formalized by the notions of sentiment-shifting descriptors (or sentiment-preserving descriptors). Similarly, the minimal-shift relation “ $⪯_{Q}^{\min}$ ” can be intuitively studied in the light of the resulting mathematical structures. It must be remarked that our dialectical analysis of TB-arguments aims at modeling the possible space of alternatives associated with different (incrementally more specific) queries. In contrast, the dialectical analysis in standard argumentation frameworks [3,19] aims at determining the ultimate status of a given argument at issue (in terms of some acceptability semantics).

It is important to mention that our analysis was done for the English language only. This is due to the fact that English is the lingua franca worldwide, being widely used in Twitter. In addition, most existing sentiment analysis tools assume English as the underlying language. We are currently developing a sentiment analysis tool for the Spanish language, which will allow us to extend the capabilities of the system. We will also investigate the benefits of using a stemming algorithm for Spanish.

7. Related work

Our approach is inspired by recent research in integrating argumentation and social networks (e.g., [13,22]). In recent years, there has been growing interest in assigning meaning to streams of data from microblogging services such as Twitter, as well as some recent research on using argumentation for social networks.

To the best of our knowledge, Toni and Torroni [22] were the first to combine social networks and argumentation in a unified approach, coining the term bottom-up argumentation for the grass-root approach to the problem of deploying computational argumentation in online systems. In this novel view, argumentation frameworks are obtained bottom-up starting from the users’ comments, opinions and suggested links, with no top-down intervention of or interpretation by “argumentation engineers”. As the authors point out, “topics emerge, bottom-up, during the underlying process, possibly serendipitously”. In contrast with that proposal, in this paper we generalize this view by identifying arguments automatically from Twitter messages, establishing as well conflict relationships in terms of sentiment analysis (and not specified at the meta-level using rules, as is the case in [22]). This proposal was recently extended (see [8]), leading towards so-called “microdebates” to help organize and confront users’ opinions in an automated way. A microdebate is a stream of tweets where users annotate their messages by using some special tags. In contrast with this approach, in our proposal we are not explicitly searching for debates containing arguments and counterarguments. Rather, different opinions emerge automatically based on collecting tweets associated with a particular topic (TB-arguments), and interrelationships among opinions are obtained on the basis of sentiment shifting/preserving descriptors.

In [1], Abbas and Sawamura formalize argument mining from the perspective of intelligent tutoring systems. In contrast with our approach, they rely on a relational database, and their aim is not related to identifying arguments underlying social networks as done in this paper. In [13], Leite and Martins introduce a novel extension to Dung’s abstract argumentation model, called Social Abstract Argumentation. Their proposal aims at providing a formal framework for social networks and argumentation, incorporating social voting and defining a new class of semantics for the resulting frameworks. In contrast with our approach, the automatic extraction of arguments from social network data is not considered (as done in this paper), nor is the modeling of conflicts between arguments in terms of sentiment analysis. In [2], Amgoud and Serrurier propose a formal argumentation-based model for classification, which generalizes the well-known concept learning model based on version spaces [15]. The framework shares some structural similarities with our approach (as a lattice-based characterization is also involved when contrasting hypotheses). However, the aims of the two approaches are different, as our proposal is not focused on solving classification tasks in a machine learning sense.

A related research area is formal concept analysis [9], which is a method for deriving conceptual structures out of data. As done in our approach, the theory of partial orders is used to formally characterize these structures. However, it differs from our proposal in dealing with concepts rather than opinions and in not attempting to associate sentiments with the elements of the partial order. In addition, it does not deal with notions such as arguments, conflict and attack.

It must be remarked that the rise of social media such as blogs and social networks has fueled interest in sentiment analysis. With the proliferation of reviews, ratings, recommendations and other forms of online expression, online opinion has turned into a kind of virtual currency for businesses looking to market their products, identify new opportunities and manage their reputations. Several research teams in universities around the world currently focus on understanding the dynamics of sentiment in e-communities through sentiment analysis. The EU funded Cyberemotions consortium¹⁰ was created in 2009 to better understand collective emotional phenomena in cyberspace, with the help of knowledge and methods from natural, social, and engineering sciences. Within this project, Thelwall et al. [20,21] carried out a number of experiments to assess the feasibility of sentiment analysis within social networks, with a particular focus on Twitter. In contrast with our approach, no opinion mining was considered in this context, nor the analysis of alternative opinions (as modeled by conflict trees in our proposal).

¹⁰ http://www.cyberemotions.eu/.

8. Conclusions and future work

In this paper we have presented a novel approach which integrates argumentation theory and microblogging technologies, with a particular focus on Twitter. To the best of our knowledge, no other approach has been developed in a similar direction. We have also presented a definition of a Twitter-based argument for a query Q that considers as its support the set of tweets which are associated with Q according to a given criterion. For such an argument, we also define a prevailing sentiment, obtained in terms of sentiment analysis tools. This allowed us to characterize the notion of opinion tree, which can be recursively built by considering arguments associated with incrementally extended queries. We have implemented a prototype of our proposal as a proof of concept, which was used to compute the opinion tree for the case study presented in the paper.

We have also presented a theoretical setting for analyzing Twitter-based arguments, associating a superior lattice rooted in the initial argument for the first given query. Based on the notion of attack between arguments, we have established as well a refined order relationship between conflicting arguments. As a result, from every superior lattice associated with a given query Q, a conflict tree rooted in Q can be built, in which alternating opinions can be better contrasted. Given a node A (argument) associated with query $Q^{'}$ with a prevailing sentiment s, every child node of A in a conflict tree corresponds to an argument for a more specific query $Q^{''}$, which is in conflict with A as it is associated with a sentiment shift. Conflict trees allow us to explore the space of possible confronting opinions associated with a given opinion, using the specificity principle as traditionally used in argumentation for preferring arguments.

The prototype that we have implemented so far is intended to be used as a proof of concept. The development of a full-fledged software tool will require tackling several issues, mostly related to user-interface and usability aspects. We believe that before embarking on that stage, it is crucial to investigate and provide a full account of the functional capabilities of the proposed system, both based on a theoretical study and by validating its behavior through simulations with realistic data. Part of our future work will focus on improving the existing prototype, aiming at the deployment of a software tool for real-world users. As a basis for such deployment, visual tools for displaying and analyzing dialectical trees have already been developed for Defeasible Logic Programming [17]. We expect to use the underlying algorithms from this tool in our framework. Additionally, we expect to perform different experiments with hashtags associated with relevant topics, assessing the applicability of our approach in a real-world context.

Another future research avenue would be to take advantage of existing semantic information sources, such as dictionaries, topic directories or ontologies, to better explore query space, either by using synonyms of existing terms or other important terms for the domain under analysis. In addition, we anticipate that the proposed framework could be integrated with mechanisms that allow weighting TB-arguments based on different aspects, such as provenance of the tweets, number of associated tweets, opinion strength, etc. In particular, we are analysing the possibility of extending the current framework in order to consider ontologies for sentiment analysis of Twitter posts (as proposed in [12]). Finally, we are currently working on extending the current Twitter-based model to a more generic setting, in which opinions are collected from other social networks (such as Facebook).¹¹ Research in this direction is currently being pursued.

¹¹ http://www.facebook.com.

Acknowledgements

The authors would like to thank the anonymous reviewers who helped improve the original version of this article. This research work was funded by Projects LACCIR R1211LAC004 (Microsoft Research, CONACyT and IDB), PIP 112-200801-02798, PIP 112-200901-00863 (CONICET, Argentina), PGI 24/ZN10, PGI 24/N006, PGI 24/N029 (SGCyT, UNS, Argentina) and Universidad Nacional del Sur.

References

[1] Abbas and Sawamura, Argument mining based on a structured database and its usage in an intelligent tutoring environment, Knowl. Inf. Syst. 30(1) (2012), 213–246.

[2] Amgoud and Serrurier, Agents that argue and explain classifications, Autonomous Agents and Multi-Agent Systems 16(2) (2008), 187–209.

[3] Besnard and Hunter, The Elements of Argumentation, MIT Press, London, 2008.

[4] Cao, M.A. Thompson and Yu, Sentiment analysis in decision sciences research: an illustration to IT governance, Decision Support Systems 54(2) (2013), 1010–1015.

[5] C.I. Chesñevar, A.G. Maguitman, Estevez and R.F. Brena, Integrating argumentation technologies and context-based search for intelligent processing of citizens’ opinion in social media, in: ICEGOV, Ferriero, T.A. Pardo and Qian, eds, ACM, New York, 2012, pp. 166–170.

[6] Choi and Kim, Sentiment analysis for tracking breaking events: A case study on Twitter, in: ACIIDS (2), Selamat, N.T. Nguyen and Haron, eds, Lecture Notes in Computer Science, Vol. 7803, Springer, Berlin, 2013, pp. 285–294.

[7] Feldman, Techniques and applications for sentiment analysis, Commun. ACM 56(4) (2013), 82–89.

[8] Gabbriellini and Torroni, Large scale agreements via microdebates, in: Proceedings of the First International Conference on Agreement Technologies, AT 2012, Dubrovnik, Croatia, 15–16 October 2012, Ossowski, Toni and G.A. Vouros, eds, CEUR Workshop Proceedings, Vol. 918, 2012, pp. 366–377, available at: ceur-ws.org.

[9] Ganter and Wille, Formal Concept Analysis – Mathematical Foundations, Springer, 1999.

[10] A.J. García and G.R. Simari, Defeasible logic programming: an argumentative approach, Theory and Practice of Logic Programming 4(1,2) (2004), 95–138.

[11] Grosse, C.I. Chesñevar and A.G. Maguitman, An argument-based approach to mining opinions from Twitter, in: Proceedings of the First International Conference on Agreement Technologies, AT 2012, Dubrovnik, Croatia, 15–16 October 2012, Ossowski, Toni and G.A. Vouros, eds, CEUR Workshop Proceedings, Vol. 918, 2012, pp. 408–422, available at: ceur-ws.org.

[12] Kontopoulos, Berberidis, Dergiades and Bassiliades, Ontology-based sentiment analysis of Twitter posts, Expert Syst. Appl. 40(10) (2013), 4065–4074.

[13] Leite and Martins, Social abstract argumentation, in: IJCAI, Walsh, ed., IJCAI/AAAI, 2011, pp. 2287–2292.

[14] Martineau, Identifying and isolating text classification signals from domain and genre noise for sentiment analysis, PhD thesis, Univ. Maryland, Baltimore County, USA, 2011.

[15] T.M. Mitchell, Generalization as search, Artif. Intell. 18(2) (1982), 203–226.

[16] Mizumoto, Yanagimoto and Yoshioka, Sentiment analysis of stock market news with semi-supervised learning, in: ACIS-ICIS, Miao, R.Y. Lee, Zeng and Baik, eds, IEEE, 2012, pp. 325–328.

[17] Modgil, Toni, Bex, Bratko, Chesñevar, Dvořák, M.A. Falappa, S.A. Gaggl, A.J. García, M.P. Gonzalez, T.F. Gordon, Leite, Mozina, Reed, G.R. Simari, Szeider, Torroni and Woltran, The added value of argumentation: Examples and challenges, in: Handbook of Agreement Technologies, Springer, Berlin, 2013, pp. 357–404.

[18] Pak and Paroubek, Twitter as a corpus for sentiment analysis and opinion mining, in: LREC, Calzolari, Choukri, Maegaard, Mariani, Odijk, Piperidis, Rosner and Tapias, eds, European Language Resources Association, 2010.

[19] Rahwan and Simari, Argumentation in Artificial Intelligence, Springer, Berlin, 2009.

[20] Thelwall, Buckley and Paltoglou, Sentiment in Twitter events, Journal of the Association for Information Science and Technology 62(2) (2011), 406–418.

[21] Thelwall, Buckley and Paltoglou, Sentiment strength detection for the social web, Journal of the Association for Information Science and Technology 63(1) (2012), 163–173.

[22] Toni and Torroni, Bottom-up argumentation, in: TAFA, Modgil, Oren and Toni, eds, Lecture Notes in Computer Science, Vol. 7132, Springer, Berlin, 2011, pp. 249–262.

[23] Wagner, Singer, Posch and Strohmaier, The wisdom of the audience: An empirical study of social semantics in Twitter streams, in: ESWC, Cimiano, Ó. Corcho, Presutti, Hollink and Rudolph, eds, Lecture Notes in Computer Science, Vol. 7882, Springer, Berlin, 2013, pp. 502–516.
