Abstract
Argumentative debates are a powerful tool for reaching agreements in open environments. However, in large-scale settings such as social networks, making sense of ongoing debates can be a daunting task, and debates risk losing their effectiveness. We thus propose “microdebates” to help organize and compare users’ opinions in an automated way.
Introduction
According to Mercier and Sperber [45], the need to create and use arguments to convince others is the main driver behind the evolution of human reasoning. Supporting evidence is that people reason much better when they communicate in an argumentative context than in an abstract setting. Moreover, arguments are used to convince others especially in the absence of trust.

Network graph representation of the arguments about Lights for sensors in the Evidence Hub for Energy Awareness in KMi. Retrieved on November 22, 2013 from
Given the prominent role of arguments in human reasoning, it comes as no surprise that, in the context of social media too, people have become accustomed to arguing online. Online debates are usually organized in threaded sequences of posts and, unlike debates requiring physical presence, they can be very long-lived and involve many active participants and an even larger number of observers and bystanders.
However, this “freedom of expression” comes at a cost. When many individuals participate in a discussion, making sense of opinions emerging from these streams of unstructured text may be challenging even for the participants themselves: there is simply too much “noise”. One way of coping with such noise is to restrict one’s focus to the general sentiment emerging from an ongoing discussion, ignoring any specific claims or opinions that may be there. State-of-the-art opinion mining/sentiment analysis techniques and tools classify the sentiment orientation of opinions by defining positive/negative scales of values for every specific domain. Usually these algorithms need training from large corpora of sentences expressing opinions, such as online reviews about some product, brand, or service [43,49].
Such an approach is especially effective if the domain is well-defined, and if a large enough training set is available. In domains such as customer reviews [36], where the concepts involved can be defined using specialized ontologies, and the jargon is relatively narrow and well-defined, the classification accuracy of existing sentiment analysis algorithms is more than acceptable. In other domains instead, such as in political debate, this is not the case [61]. Importantly, sentiment analysis does not explicitly tell why certain opinions are in place and how they relate to other opinions.
A second approach is to force users to structure their contributions using dedicated tools. Debate-friendly tools that should help users visualize and understand the outcome of a discussion are now becoming popular. Among them there are (a) visualization tools, such as DebateGraph;2
An example of the last category is Debatepedia,5
Applications of such debate-friendly tools include e-gov, e-participation and policy-making. For these specific purposes, diverse platforms are being developed within a number of EU cooperation projects. The present work is part of the ePolicy7
One of ePolicy’s aims is to develop methods for deriving social impacts through opinion mining on e-participation data extracted from forums and blogs. Other related projects are IMPACT,8
Grosse et al. [29] are also oriented towards the deployment of an e-gov framework in which argumentation and debates are used. That is particularly related to our work, because it uses a Twitter-based perspective. However, the focus is on mining opinions from Twitter, by taking Twitter messages into consideration for analyzing underlying arguments, rather than on the definition of a Twitter dialect and on the identification of the emerging computational argumentation framework, which is the subject of the present article. On the other hand, the idea behind DECIDE 2.0, by Chesñevar et al. [14] – a framework integrating argumentation technologies and context-based search [44] for processing citizens’ opinions in social media – is to employ context-based search and knowledge engineers with expertise on defeasible logic programming [28] who will be in charge of maintaining a knowledge base. This is a major difference from the bottom-up argumentation approach which, as we will see shortly, envisages a self-regulating discussion without expert intervention.
More recently, a cluster of projects has been funded in the CAPS (Collective Awareness Platforms for Sustainability and Social Innovation) initiative of the EC under FP7.10
The WeGov toolbox, like DECIDE 2.0, relies on sentiment analysis algorithms. IMPACT instead has developed a language-independent approach which aims at modeling policy issues at a conceptual level. The project delivered an argumentation toolbox for supporting open, inclusive and transparent deliberations. The IMPACT toolbox supports argument reconstruction from online text data, and visualization of arguments about policy issues. The argumentation toolbox is pluggable into existing content management systems in order to be used with a variety of e-participation platforms.
These activities show a general interest in argument retrieval for understanding how people argue about policy issues. In such settings it is reasonable to expect users to publish not only an opinion (as in writing a review), but also to expand on their opinions by laying out arguments, to convince others and reply to criticisms. Following this stream of research, in the scope of ePolicy we have been working along two directions: extracting arguments from text [41], and fostering a bottom-up approach to argument production. The present paper introduces and discusses this second approach.
The idea of bottom-up argumentation [55] is that traditional opinion gathering methods, such as questionnaires and polls, pose severe limitations in that they mainly force those interviewed to express preferences upon some predetermined options. Social media may overcome such limitations, enabling online debates between (informed) citizens, who can come up with new ideas and perspectives. The problem is then how to keep conversations manageable; debate-friendly and effective tools are therefore needed.
Current online debating tools, such as the aforementioned ones, build on and extend the traditional forum-like structure, where users can reply to or quote other users, by introducing debate-oriented concepts. They are not very different from a standard discussion forum with reputation, moderators and recommendation features. Moreover, they require the user to comply with and adapt to the abstractions they are built around, and not vice-versa.
We believe that introducing built-in debate-oriented concepts is an important step in understanding people’s opinions and facilitating online debate. However, it is hard for such tools to achieve the huge amount of traffic of the main social networks. This is not necessarily related to the size of marketing (i.e. monetary) investments, but could well be explained instead by the concept of tipping point. Even big companies, like Apple or Google, tried to launch a service, like Ping or Orkut, and then were forced to shut it down because it did not reach the threshold of user engagement (the so-called hype moment) needed to spread a new online behavior.
Other companies, like Facebook, YouTube and Twitter, had instead a huge success. Would it be possible to exploit the massive online presence of such services to gather structured debates?
The WeGov project is somewhat moving in this direction by developing a toolbox that detects, tracks and mines opinions and discussions on policy oriented topics among existing and well established social networking sites, like the aforementioned and other ones such as Wordpress and Baboo. The key idea is to mine opinions and engage with users by means of channels citizens are already familiar with. However, the project mainly focuses on describing the key features of a discussion, such as: which post is most relevant, how active or important a discussion is, or which user is perceived as most influential, rather than on argument retrieval and on the reasons why users make such and such claims.
Our aim is to encourage free, unconstrained online debates and to provide policy-makers and users with tools to automatically make sense of possibly very lengthy debates. Such tools should not only show the general sentiment around a specific topic, as current sentiment analysis tools do. Instead, they should also identify specific opinions, as well as the relations among them. At the same time, the tools should gather structured debates from well-established social networks, in order to take advantage of the high level of participation and rich discussions that already take place in such contexts. But how to retrieve structured debate from online social networking sites? Following the bottom-up argumentation approach, we identified a possible solution, summarized in the phrase: structuring debates without a structuring tool.
We rely on a well-established convention among online users, that is the ability to tag their own messages [1]. We selected Twitter to start our experimentation, because Twitter users are already accustomed to annotating their messages with

A fragment of a Twitter stream, showing an example of microdebates. Twitter organizes its entries top to bottom from newest to oldest.
In this article, we describe microdebates for Twitter. Figure 2 shows an example of them. The microdebate language is a simple yet powerful Twitter dialect. It allows users to debate about an issue, by explicitly referring to emerging arguments, either in support or in opposition, with no need to learn new interfaces or move to new social networks. We identify computational argumentation, and in particular abstract argumentation [21], as the conceptual and computational framework to model the retrieved arguments and reason from them automatically.
This paper offers a comprehensive description and an empirical evaluation of microdebates and their enabling software tools. It is structured as follows. In Section 2, we show that the most popular organization of online conversation is not well suited to argumentative debates, and motivate the introduction of new solutions. In Section 3 we introduce Twitter and micro-blogs. We then define microdebates and their syntax, and provide examples of bottom-up argumentation via microdebates. Enabling software tools are the subject of Section 4. In Section 5 we introduce the notion of group support of arguments, and discuss the role of weights in microdebates. Section 6 presents some experimentation done with different user groups and elaborates on the observed results. We conclude with Section 7, where we discuss areas of application, criticalities, and future perspectives.
Threading is a popular way to structure online conversations, common to diverse media such as email, blogs, discussion forums, and online platforms in general. User messages (comments, posts, tweets, emails, etc.) are grouped in a hierarchy by topic, with any replies to a message arranged visually near to the original message. A set of messages grouped in this way is called a thread. Figure 3 shows an example of a threaded discussion on Facebook.

Excerpt of one of the Syria threads on the New York Times’ Facebook home page, retrieved on September 27, 2013. These are only a small fraction of over 200 comments. (Colors are visible in the online version of the article;
Organizing messages via threads has many advantages. For example, in an educational environment, online threaded discussions may offer a forum for quiet students to develop and verbalize ideas; promote in-depth response and reflection; encourage peer affirmation; and provide opportunities for more teacher–student and student–student interaction [24].
Threads are very useful for sorting mail and other types of online content originated by a reply-to mechanism. However, threading does not correspond to the way natural human conversations take place in the real world. Threaded comments risk breaking up the dialogue into a number of private conversations instead of an ongoing, open discussion, which leads to a more confrontational debating style [11,33].
In computational argumentation, and in abstract argumentation in particular [21], arguments are unordered objects, connected with one another by binary attack relations. This results in general in a directed argument graph, as opposed to an argument thread. A thread is a special case of a directed graph. If we were to map comments to arguments, and threaded discussions to argument graphs, we would obtain a tree of arguments, which is in general less expressive than an argument graph.
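Arguments and their attacks form argumentation frameworks. A (Dung-style) argumentation framework (AF) is defined as a pair $\langle A, R \rangle$, where $A$ is a set of arguments and $R \subseteq A \times A$ is a binary attack relation. Given a set of arguments $S \subseteq A$:
S is conflict-free if there are no $a, b \in S$ such that $(a, b) \in R$;
an argument $a \in A$ is acceptable w.r.t. S if every argument $b$ that attacks $a$ is in turn attacked by some argument $c \in S$;
S is an admissible extension if S is conflict-free and all its arguments are acceptable w.r.t. S;
S is a preferred extension if it is a maximal admissible set, w.r.t. set inclusion;
S is a complete extension if S is admissible and every argument that is acceptable w.r.t. S belongs to S.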
Other extension-based semantics have been defined, such as the grounded, stable, semistable, and ideal semantics [2].
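To make the extension-based semantics concrete, the following is a minimal brute-force sketch in Python. It assumes the standard Dung definitions of conflict-freeness, acceptability, admissibility, completeness and preferredness; all function and variable names are ours, chosen for illustration only. It enumerates every subset of arguments, so it is only meant for toy frameworks and says nothing about how dedicated solvers work internally.

```python
from itertools import combinations

def conflict_free(S, attacks):
    # No member of S attacks another member of S (or itself).
    return not any((a, b) in attacks for a in S for b in S)

def acceptable(arg, S, args, attacks):
    # Every attacker of arg is counter-attacked by some member of S.
    return all(any((c, b) in attacks for c in S)
               for b in args if (b, arg) in attacks)

def admissible(S, args, attacks):
    return conflict_free(S, attacks) and all(
        acceptable(a, S, args, attacks) for a in S)

def complete(S, args, attacks):
    # Admissible, and contains every argument it defends.
    return admissible(S, args, attacks) and all(
        a in S for a in args if acceptable(a, S, args, attacks))

def powerset(args):
    items = sorted(args)
    for size in range(len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)

def extensions(args, attacks, predicate):
    # All subsets of args satisfying the given semantics predicate.
    return [S for S in powerset(args) if predicate(S, args, attacks)]

def preferred(args, attacks):
    # Admissible sets that are maximal w.r.t. set inclusion.
    adm = extensions(args, attacks, admissible)
    return [S for S in adm if not any(S < T for T in adm)]

# Hypothetical toy framework: a attacks b, and b attacks c.
ARGS = {"a", "b", "c"}
ATTACKS = {("a", "b"), ("b", "c")}

if __name__ == "__main__":
    print(extensions(ARGS, ATTACKS, admissible))
    print(extensions(ARGS, ATTACKS, complete))
    print(preferred(ARGS, ATTACKS))
```

In the toy framework, a is unattacked and defends c against b, so the only complete (and preferred) extension is the set containing a and c.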
From a practical viewpoint, nowadays there are many efficient implementations of abstract argumentation semantics [13], which can be used to support social applications with an underlying Dung-style knowledge representation. Figure 4 shows the output produced by the web interface of one such tool, ASPARTIX,12

Argumentation framework emerging from the Syria microdebate: ASPARTIX interface showing a complete extension. (Colors are visible in the online version of the article;
If argument graphs are in fact argument threads, the last arguments in the thread are by construction unchallenged, and therefore acceptable. In a sense, this reflects the way threaded discussions evolve: the most recent post has, typically, an advantage over its predecessors, as it is often displayed on top of them, and – at least for a while – it stays unchallenged. On the other hand, we think that graphs represent a better way to organize arguments in a debate.
Our work is an attempt to use the existing technology to host debates where arguments and their attack relations are clearly identified, and mutually compatible arguments and opinions can be clustered and visualized together.
Our proposal is not to do away with thread-based discussions altogether, but to enhance them. We enable authors to mark the arguments in their posts, and let arguments be expressed in several posts, possibly by several users, in a collaborative fashion, as opposed to mapping individual posts to arguments one-to-one. In this way, we make it possible for “older” arguments to attack “newer” arguments. This also allows posts to make explicit reference to the arguments they attack.
Micro-blogging is a form of communication whereby users can describe their current status in short posts distributed by instant messages, mobile phones, email or the Web [20]. A very popular platform for micro-blogging is Twitter, where people talk about their daily activities and seek or share information [34] by broadcasting brief textual messages (tweets) to their followers [32].
Users can also add tags to their messages. Such tags include the now famous hashtag, i.e., the
Figure 5 shows a tweet broadcasted on January 22, 2013 by user

A sample tweet. (Colors are visible in the online version of the article;
The tweet indicates a reposting (
The Twitter jargon may seem cryptic to the novice. Nevertheless, it is interesting to notice its adoption by government officials. For instance, as of January 2013, all US senators as well as 90% of US representatives were on Twitter, as officially announced by the company.
Recently, Twitter has also endorsed the
The interesting fact about the hashtag (from a sociological perspective) is that users invented it. Twitter users started adding hashtags to their messages sometime around February 2008 [8]. In a short while, hashtags became very widespread. Twitter simply accommodated its users’ behavior by highlighting the hashtags in the tweet and by facilitating their retrieval.
Tagging behaviors in Twitter are interesting not only for their bottom-up nature, but also because they are distinct from those in other social media. Twitter users are less likely to index messages for later retrieval [51]. This reflects the fact that tagging patterns in Twitter have a conversational rather than organizational nature [31], i.e., users follow what people are saying about a topic by following the related tag.
Users can also reply to tweets, which results in threaded discussions. As we pointed out in Section 2, threaded discussions are not expressive enough to represent argument graphs. Microdebates address exactly this shortcoming.
A microdebate is a set of tweets, each contributing to a debate. The contribution may be for instance a statement expressing an opinion, providing some evidence, or defining a fully-fledged argument. Such tweets may contain explicit references to ideas expressed in other tweets in the same debate. Such reference is made via short combinations of characters that express positive or negative relations.
In this way, all that is asked of the user is to use certain combinations of characters in order to put their opinion in the context of other opinions. In exchange, debates will be easier to parse, and a number of visualization tools could be deployed to facilitate browsing, participation and focus. Indeed, microdebates can be processed by automatic reasoners, such as argumentation-based reasoning tools [7,23] and the output can be visualized graphically as clusters of coherent opinions, where different clusters may attack each other. This could foster awareness of different opinions on a topic. In some cases it may encourage arguers to reach an agreement.
Microdebates are inspired by Twitter’s micro-blogging nature. They consist of a stream of tweets annotated with all the available tags, plus two new tags to mark opinions and conflicts between opinions. Introducing these special tags to a tweet enables us to convert a stream of tweets into a microdebate – where the prefix “micro” reflects the micro-blogging nature of tweets.
Let us summarize the meaning of tags in the microdebate Twitter dialect:
There is no special syntax for tweets belonging to a microdebate, other than the usual Twitter syntax which imposes a 140 character limit for a tweet, and space-free tags. However, tweets belonging to a microdebate should at least contain a discussion identifier (hashtag), and an argument identifier (double-cashtag). There are no other restrictions on the number and type of tags a tweet can contain.
This is how microdebates work in practice:
content elements are tweets with a suitable
users annotate their tweets using
users can attack (counter) opinions using the
a single tweet may well fit more than one argument; in that case, it may include more than one double-cashtag, indicating support for a set of opinions. Similarly, a single tweet may explicitly attack more than one argument: in that case, it will include more than one bang-cashtag;
if a user adds a tweet with a new
Our interpretation of this set of tags allows us to distinguish between different microdebates, because each of them will have a different hashtag, and between supported (
To demonstrate how microdebates unfold, let us consider again the stream of tweets in Fig. 2, between three fictitious Twitter users: Angel Eyes (
In this example, a microdebate consisting of around 30 tweets eventually produces around 20 focal arguments. Such arguments are not statically defined. They do not even exist before the debate starts. We can see instead how arguments take shape bottom-up [55] as time goes by.
From a Twitter user’s perspective, especially for those interested in policy issues, the motivation to use microdebates is that opinions can be named, and thus identified and made to gain prominence, which is not possible in standard Twitter exchanges. This is thanks to a number of visualization tools that we are currently developing (see Section 4).
In the context of ePolicy, where we are interested in gathering and analyzing e-participation data extracted from forums and blogs, and where we aim to encourage citizens to participate and get engaged in the policy-making life cycle, this approach seems to offer a promising avenue.
In the next section, we discuss some prototypes we have implemented so far.

A snapshot of a discussion labeled
The first microdebate analysis tool prototype was implemented as a NetLogo model [59]. Figure 6 shows a snapshot of its user interface. In this model, each NetLogo agent (or turtle) represents an argument used in the microdebate. Attacks between arguments are represented by directed links from one agent to another. Notice that in NetLogo turtles are primitive types, and they are essential elements in creating and reasoning about graphs. Therefore, unintuitive as it may seem to those not familiar with the language, using turtles and directed links to model graphs, such as AFs, is common practice in the NetLogo community. We chose NetLogo for this first prototype because it is simple and widely known among potential audiences, such as computational sociologists.
NetLogo does not provide native methods for computing the semantics of abstract AFs. Instead, we bundle together, using the NetLogo API, an extension called semconarg. That includes ConArg [7], a Java tool that uses constraint programming to model and solve different reasoning problems related to abstract AFs. In particular, we rely on ConArg to compute admissible, complete, and stable semantics with and without weights (see Section 5), and to solve problems such as enumerating and counting all the extensions and deciding if an argument is credulously or skeptically accepted. The outcome of such reasoning is reflected in the visualization.
A concise summary of reasoning problems in AFs is provided by Charwat et al. [13], where the authors also report known results about the computational complexity of reasoning in AFs. An experimental assessment of ConArg is given in [6]. Due to the lack of benchmarks for abstract argumentation systems [46], but still wanting to study the scalability of their algorithm on realistic cases, the authors ran experiments on particularly complex networks of different topologies, such as those known in the literature as Barabási, Kleinberg, Erdős–Rényi and Watts–Strogatz networks. The experiments show that in such worst-case AFs ConArg can find all the stable and complete extensions of networks of 40 to 60 nodes in a matter of seconds on an Intel® Core™ i7 2.4 GHz processor with 16 GB of RAM. In practice, our initial experiments indicate that AFs derived from online debates are much sparser graphs; we therefore expect ConArg to handle even larger AFs without problems. According to our measurements, most of the computation time is instead devoted to the visualization tasks.

3D view of the
To use the NetLogo model, as a first step, one should enter a hashtag identifying a microdebate in the GUI’s debate text box. For instance, the debate identifier in Fig. 6 is
a new argument for each new double cashtag encountered;
a new attack for each pair (double cashtag, bang cashtag) encountered.
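The two rules above can be sketched in a few lines of Python. This is an illustrative sketch, not the NetLogo tool's actual code. In particular, the concrete tag spellings are an assumption: we write a double-cashtag as `$$name` and a bang-cashtag as `!$name`, inferred from the tag names only, and the function `build_framework` and the sample tweets are hypothetical. As a further assumption, an argument that is attacked but never supported is still added as a node, with an occurrence count of zero, so that every attack points to an existing argument.

```python
import re
from collections import Counter

# Assumed tag spellings (not confirmed by the text): "#debate" identifies the
# discussion, "$$name" supports an argument, "!$name" attacks an argument.
HASHTAG = re.compile(r"#(\w+)")
SUPPORT = re.compile(r"\$\$(\w+)")
ATTACK = re.compile(r"!\$(\w+)")

def build_framework(tweets, debate):
    """Return (argument counts, attack counts) for one microdebate."""
    arguments = Counter()
    attacks = Counter()
    for text in tweets:
        tags = [h.lower() for h in HASHTAG.findall(text)]
        if debate.lower() not in tags:
            continue  # tweet belongs to another discussion
        supported = SUPPORT.findall(text)
        attacked = ATTACK.findall(text)
        for name in supported:
            arguments[name] += 1          # one more tweet backing this argument
        for name in attacked:
            arguments.setdefault(name, 0)  # make sure the target is a node
        for source in supported:
            for target in attacked:
                attacks[(source, target)] += 1  # one pair, one attack
    return arguments, attacks

# Hypothetical tweets, invented for this example.
TWEETS = [
    "Cycling lanes cut traffic #demo $$bikelanes",
    "They cost too much #demo $$toocostly !$bikelanes",
    "Agreed, far too expensive #demo $$toocostly !$bikelanes",
    "Nothing to do with it #other $$ignored",
]

if __name__ == "__main__":
    args, atts = build_framework(TWEETS, "demo")
    print(dict(args))
    print(dict(atts))
```

The counts returned alongside the arguments and attacks are the kind of information a visualization could use to scale node radius and edge thickness.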
The NetLogo tool analyzes the content of the tweets and then visualizes the outcome. The user can interact with the graph in 2D (see Fig. 6) or by moving, orbiting, and zooming a 3D representation of the underlying AF, such as the one in Fig. 7. Nodes and edges have different radius and thickness, depending on the number of times an argument or an attack is found in the given set of tweets. It is possible to inspect edges and nodes to find which tweets are associated with them.
Our prototype supports all semantics supported by ConArg, which include, among others, the admissible, complete, and stable semantics. The interface lets the user choose a specific microdebate by selecting a debate ID (buttons debate and Get Debate), indicate which semantics to apply (buttons semantic and Get Extension), and finally choose one among possibly multiple extensions to visualize (buttons available-extensions and Apply). The user can also specify an alpha value, whose meaning will become clear in the next section.
Figure 6 shows all argument tags of the selected debate. A complete extension – actually, the only complete extension – in this example contains two arguments, represented by larger circles on the right-hand side of the figure:
Alongside developing analytic tools for the policy maker, we have been developing user-oriented tools. In particular, we have implemented a web service and interface for storing and browsing microdebates, and an Android application. One can therefore take part in microdebates either via a Twitter client (such as the Twitter web site or any mobile App for Twitter), in which case the microdebate will not be visualized any differently than any other Twitter stream; or one can use the Microdebates web site (

Some tagclouds generated from the Syria discussion. (Colors are visible in the online version of the article;
Figure 8 shows automatically generated tag clouds for some arguments in the Syria microdebate. Each tag cloud shows a group of keywords that describe, somehow, the content of the tweets in support of that argument. The font color indicates the status of the argument in the debate, which can either be skeptically accepted (present in all extensions), credulously accepted (present in at least one extension) or defeated (absent from all extensions). The font size is indicative of the prominence of a certain keyword in the set of tweets supporting the central argument. When the user touches an argument’s tag cloud, the App shows the tweets that support or attack such an argument. Arguments are ranked based on the support they receive (see Section 5) and displayed top to bottom based on their ranking.
In this way, each tweet can potentially affect the visual representation of a discussion: some keywords can emerge, gain emphasis, or disappear as new tweets about a given argument are broadcast. By giving users the possibility to contribute visually to a discussion as well, we can improve an ongoing discussion in terms of focus, accessibility, and impact.
Because tag clouds rely on linguistic features, the Microdebates visualization in the web site and Android App is language-dependent. Currently, English and Italian are supported. The language is automatically detected by the server routines.
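The three argument statuses used to color the tag clouds follow directly from the set of extensions computed under the chosen semantics. A minimal Python sketch is given below; the function name `status` and the sample extensions are hypothetical. As an explicit assumption, when the chosen semantics yields no extension at all, every argument is reported as defeated rather than vacuously accepted.

```python
def status(argument, extensions):
    """Classify an argument given a list of extensions (sets of arguments)."""
    # Assumption: with no extension at all, nothing counts as accepted.
    if extensions and all(argument in ext for ext in extensions):
        return "skeptically accepted"   # present in all extensions
    if any(argument in ext for ext in extensions):
        return "credulously accepted"   # present in at least one extension
    return "defeated"                   # absent from all extensions

# Hypothetical extensions, invented for this example.
EXTENSIONS = [{"a"}, {"a", "c"}]

if __name__ == "__main__":
    for name in ("a", "b", "c"):
        print(name, status(name, EXTENSIONS))
```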
As Dung observes [21], the way humans argue is based on a very simple principle which is summarized succinctly by an old saying: The one who has the last word laughs best.
Indeed, considering all attacks equally important may give rise to counter-intuitive outcomes and make the whole framework unstable. It may happen that many users believe in an argument, or in a specific attack, and what we would expect is that such an argument or attack is, somehow, acceptable. However, a new attack on a single argument posted by an individual user may very well suffice to defeat everything everybody else agrees upon. Although this is not a logical problem in the abstract world, it does create a problem in concrete applications of microdebates. We do want to let mainstream arguments be challenged and possibly defeated by new arguments, but intuitively, this should happen only if there is enough consensus on such new arguments.
Notice that this is not only a problem of microdebates. Online community discussions are subject to this and all other kinds of “bad behaviour”, whose effects can be limited with the help of various techniques [35].
For microdebates, we address the issue by relying on weighted argumentation frameworks [22]. Thus we consider not only the content of the tweets, but also the number of tweets containing the same arguments and attacks, and how many times these are re-tweeted.
Weighted Argument Frameworks (WAFs) are a natural extension of Dung’s AFs. The idea is simple. Each attack between two arguments has a weight that specifies its intensity, thus a weighted argument system is a triple
A possible semantics of WAFs is described by Dunne et al. using the idea of inconsistency budgets [22]. An inconsistency budget β defines how much inconsistency we are prepared to tolerate within an argumentative framework. We can then disregard attacks up to a total weight of β in order to find extensions in a Dung-style AF.
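The inconsistency budget idea can be illustrated with a small Python sketch. It enumerates every set of attacks whose total weight does not exceed β, removes those attacks, and evaluates the remaining Dung-style AF. For brevity the sketch uses the grounded semantics (the least fixed point of the defense function), which is our choice for illustration and not necessarily the semantics used in the prototype; the names `grounded` and `budget_extensions` and the sample weights are hypothetical. The enumeration is exponential in the number of attacks and is only meant for toy examples.

```python
from itertools import combinations

def grounded(args, attacks):
    """Grounded extension: iterate the defense function from the empty set."""
    current = set()
    while True:
        defended = {a for a in args
                    if all(any((c, b) in attacks for c in current)
                           for b in args if (b, a) in attacks)}
        if defended == current:
            return frozenset(current)
        current = defended

def budget_extensions(args, weights, beta):
    """Grounded extensions obtainable by ignoring attacks of total weight <= beta.

    weights maps each attack (attacker, target) to a positive weight.
    """
    edges = sorted(weights)
    results = set()
    for size in range(len(edges) + 1):
        for dropped in combinations(edges, size):
            if sum(weights[e] for e in dropped) <= beta:
                kept = set(edges) - set(dropped)
                results.add(grounded(args, kept))
    return results

# Hypothetical weighted framework: a and b attack each other.
ARGS_W = {"a", "b"}
WEIGHTS = {("a", "b"): 3, ("b", "a"): 1}

if __name__ == "__main__":
    for beta in (0, 1, 3):
        print(beta, budget_extensions(ARGS_W, WEIGHTS, beta))
```

With a budget of 0 nothing can be ignored and the mutual attack leaves the grounded extension empty; with a budget of 1 the lighter attack from b to a can be disregarded, so a becomes acceptable.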

Another snapshot of the NetLogo-based analysis tool, this time considering weights. (Colors are visible in the online version of the article;
Some generalizations of this approach emerged in recent literature. In particular, Coste-Marquis et al. [16] focus on generalizing the WAF setting by considering other ways to aggregate weights than using summation and to show how weights can be exploited to define new notions of extensions. Leite and Martins [39] propose a class of semantics for “social abstract argumentation”, extending Dung-style frameworks with positive (pro) and negative (con) votes. Efficient implementations of these semantics have been presented in [15].
Bistarelli and Santini [7] propose a solution that captures the semantics of the different metrics used in literature by independent models with a single parametric semiring-based framework. That leads to a unifying modeling framework, supported by Soft Constraint Programming techniques. Moreover, they focus on small-world networks and are strongly oriented towards application in real-world contexts, like discussion fora or online social networks where arguments may be rated by users leading to the definition of WAFs. These algorithms are efficiently implemented in the aforementioned ConArg Java tool (see discussion in Section 4), which suffices for our purposes. For these reasons, we decided to adopt Bistarelli and Santini’s solution, and embed their software in semconarg.
Inconsistency budgets come in handy when we try to incorporate the wisdom of the crowd into argumentative systems. In a social networked environment it does matter how many people agree with a certain attack, and that should affect the outcome of the debate, in terms of the extensions we are prepared to accept.
In our setting, weights are related to how many users reiterate the same attack. In particular, every new attack has an initial weight of 1. For each new tweet that expresses an attack already present in the framework, if the tweet comes from a user that has not expressed that attack yet, the weight associated to that attack is increased by one. This also applies to re-tweets.
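Under this rule, the weight of an attack equals the number of distinct users who have expressed it, whether by tweeting or re-tweeting. A minimal Python sketch follows; the event format (user, attacker, target) and the function name `attack_weights` are assumptions made for illustration.

```python
from collections import defaultdict

def attack_weights(events):
    """Weight each attack by the number of distinct users who expressed it.

    events is an iterable of (user, attacker, target) triples, one per tweet
    or re-tweet that expresses the attack attacker -> target.
    """
    users_per_attack = defaultdict(set)
    for user, attacker, target in events:
        # Repeated tweets by the same user do not add weight.
        users_per_attack[(attacker, target)].add(user)
    return {attack: len(users) for attack, users in users_per_attack.items()}

# Hypothetical events, invented for this example.
EVENTS = [
    ("u1", "x", "y"),
    ("u1", "x", "y"),   # same user repeating the attack: ignored
    ("u2", "x", "y"),   # a second user: weight goes up by one
    ("u3", "y", "x"),
]

if __name__ == "__main__":
    print(attack_weights(EVENTS))
```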
ConArg proposes a new approach to weighted extensions, called α-extensions, which is particularly well suited to microdebates. Similarly to Dunne et al.’s inconsistency budgets, a certain level of inconsistency in the theory is tolerated, thus attacks that sum up to the threshold level β, but do not exceed β, are tolerated within each extension. In ConArg’s α-extensions, however, defenses are also weighted, as well as attacks. In order to defend itself, an extension should have arguments that counter-attack external attacks, as usual, but the counter-attacks should overall outweigh the external attacks. An α-extension is thus a set of arguments that defends itself because those who agree with it outnumber those who agree with its attackers.
This feature is particularly relevant in our context: to be successful, an attack should be not only put in place, but also reach a significant consensus, otherwise it will not be effective. In this way, attacks that do not attract enough consensus are disregarded and the argumentative framework, i.e. the microdebate, cannot be spoiled by a single user with a “last minute” broadcast.
The application of α-extensions is illustrated in Fig. 9. The microdebate is the same as in Fig. 6, but this time repetitions and
As we see from the picture, adding group support reverses the dominant position in the microdebate. The α-extension contains
So we can say that if the outcome of microdebates is decided using WAF, the one who has the last word laughs best – only if enough words have been said that agree with that last word.
However, while microdebates allow users to contribute to the acceptability of an argument with multiple tweets, there is no guarantee that each argument in the microdebate is a well-formed one. We cannot guarantee that, because by adopting an abstract approach, we do not look inside the arguments. Thus the outcome of microdebates should not be considered as a logical truth, but as a socially accepted position.

Some examples of AFs extracted from our initial experimentation with students at the University of Bologna.
We ran some initial experiments in November 2012 with students from the University of Bologna. We asked them to use microdebates to discuss certain issues, such as:
what sort of jobs they wish for their future; what they think of microdebates; what they think of a candidate for the Italian Democratic Primaries; what they think of speaking in public.
Since back then we did not have a server to collect tweets nor any specially designed user interface, we asked users to participate using their devices and tools, and to address their tweets also to a Twitter profile we created specifically for this purpose (
Figure 10 shows a number of graphs corresponding to some of the microdebates resulting from the experimentation. We collected twelve microdebates. Those illustrated here belong to a sample that ranges from very short dialogues to more complex debate structures.
The resulting microdebates are of various graph structures, as there are no restrictions on the number and chronological order of the attacks. Indeed, linear graphs (see Fig. 10(b), (f)) as well as non-linear graphs (see Fig. 10(d), (e)) were possible outcomes of microdebates.
Based on this initial experimentation with students we observed, encouragingly, that users who had never been exposed to any type of formal argumentation, computational or otherwise, got accustomed to our syntax almost immediately.
As of January 2014, the Microdebates App tool introduced in Section 4 became available, so we could run further experimentation. A first study, described in [60], was designed to understand whether Microdebates App for Android could provide understandable, useful input to a human user, and under which circumstances. We also wanted to gauge how much the user experience would be influenced by the system’s calibration (in particular, by the value given to α), and whether having to create new cashtags would be seen as a hurdle by users not accustomed to microdebates.
For this study, we approached ten participants in the 25–34 age group, all of them with an Android phone and a Twitter account. The experiment was conducted in English, in Turkey among non-native English speaking participants, all with a reasonable command of English. We divided all participants into two equally sized groups; each group was given a topic, and a 40 min time frame, to discuss using Microdebates App. At the end of 40 min we gave a two-hour break. Then we gave a different topic, and an additional 40 min for a second microdebate. Eventually, we asked participants to answer an anonymous survey.
The topics were: Are occupy protest movements justified? and Is nuclear energy justified and should it be expanded? In the first debate, participants were allowed to create new cashtags in order to label their arguments. In the second debate, participants were given a fixed set of cashtags, each one with a brief explanation of the concepts around it. These conditions were the same for both groups. α was set to 1 for a group’s first debate (
We observed that the structure of the debates was not visibly influenced by α, and that there was no substantial difference between debates whose cashtags were given and those with free cashtags.
Based on the data we gathered via questionnaires, we observed an interesting correlation between a participant group’s interest in a topic and the number of tweets and explicit attacks produced by the group, all else being equal. When at least 4/5 participants declared interest in a topic, the discussion received 14 to 21 tweets, containing 6 to 10 attacks, and the connectivity of the argument network was 3. When fewer than 4/5 participants declared interest in a topic, the discussion received 7 to 9 tweets with 2 to 5 attacks, and the connectivity was less than 3.
In general, interest in the topic under discussion is key for obtaining argumentation frameworks rich with connections and weights, where groups engaged in microdebates are more likely to enjoy a sharper consensus. This is also thanks to the information extracted from attack edges and weights which enables a better, sharper visualization, which in turn may be generally perceived, by the group, to be appropriate and useful.
We designed and ran a second experiment, in Italian, on May 15, 2014, on the night of the pan-European presidential debate. The debate among candidates to the Presidency of the European Commission was broadcast live. We gathered a group of around 50 Italian students of Political Science in Bologna, plus some more students at other University sites (Trento and Siena), and gave them a 30 min tutorial about microdebates. As in the first study, all participants were equipped with their own devices and Internet connections. About half of the participants had an Android phone, and not all of them installed the App, so visual feedback was limited: most participants could see the other tweets, but could not enjoy the hashtags and browsing features provided by the Microdebates App. The web site was not yet available at that time.
The participants in Bologna were accommodated in a room with a large screen from which they could follow the two-hour debate. In preparation for the experiment, we identified 9 different topics which would be addressed by the candidates during the debate, and we published a corresponding list of 9 hashtags, one for each topic (
We gathered 293 tweets from 24 active participants. However, the majority of tweets contained syntax errors (for example, the
The differences between the outcomes of the two studies could surely be ascribed, at least in part, to differences in the experimental settings. The demographics were similar, but the technology used was different – much more varied in the second study – and the context and background were also quite different – the dynamics of a fast-paced and often emotionally engaging ongoing debate had a different impact on the participants. However, we believe that these two studies also reflect different debating styles identifiable in online communities.
According to a recent study published by the Pew Research Center [54], not all online conversations are of the same “type”. On the contrary, the study identifies at least six distinctive structures of social media crowds (“conversation archetypes”) which form depending on the subject being discussed, the information sources being cited, the social networks of the people talking about the subject, and the leaders of the conversation. Each has a different social structure and shape.
For example, polarized discussions feature two big and dense groups that have little connection between them. Polarized crowds on Twitter are not arguing: they are ignoring one another while pointing to different web resources and using different hashtags. Discussions in so-called community clusters are instead characterized by multiple smaller groups, which often form around a few hubs, each with its own audience, influencers, and sources of information. Community cluster conversations look like bazaars with multiple centers of activity. There we can see arguments: some information sources and subjects ignite multiple conversations, each cultivating its own audience and community. These can illustrate diverse angles on a subject based on its relevance to different audiences, revealing a diversity of opinion and perspective on a social media topic.
As there is no single way conversations take shape in social media, there is no single way arguments take shape, either. For this reason, the macroscopic difference between the outcomes of the two studies could also be explained, in part, by relating them to the different archetypes. In particular, we could argue that the participants in the political debate behaved like “polarized crowds”, as witnessed by the large number of endorsements and insults and by the few arguments and connections, whereas the participants in the first study behaved like “tight crowds”, where tight, information-rich argument graphs could emerge if the topic was interesting for that crowd.
Discussion
Recent findings in cognitive science [45] suggest that people are good at arguing, and indeed that the main function of reasoning is argumentative. However, when big numbers are in play, as may happen with very crowded online platforms or with very complex debates, it may be difficult for bystanders and potential contributors to make sense of what is going on.
The motivation behind this work is to improve online debates and support the agreement process, by formalizing and rationalizing the debate itself. Microdebates aim to encourage debaters to focus on the arguments involved in the debate, on the relations between such arguments, and on the possible evolutions of arguments.
A legitimate question is whether microdebates could be accommodated by Twitter using current tags, with no additional syntax at all. The answer is negative. The hashtag is grounded in Twitter habits as the way to indicate “what we are talking about”, therefore it cannot be used to identify arguments. It was thus unavoidable to introduce new symbols to label arguments and attacks. The recently introduced cashtag is meant to be used to keep track of specific stocks. To the best of our knowledge, double- and bang-cashtags are not in use yet. Weights are calculated based on the existing language and do not require further additions. All in all, the extension to the Twitter language that we propose is both necessary and, in a sense, minimal.
A purpose that microdebates could serve particularly well is to support the activities preceding deliberations, in online democracy and e-participation environments that call for new experimental solutions.15
This is especially true in the Italian context. Quoting Fiorella de Cindio, “Right now Italy is a lab for participatory online platforms since there is a strong need to rebuild trust into politics and politicians”
Microdebates follow the grassroots argumentation philosophy introduced in [55]. Users contribute to a debate by broadcasting annotated comments. As a result, arguments arise bottom-up.
Interestingly, arguments here are dynamic entities that evolve over time. A particular argument, identified by a cashtag, can be thrown into the debate arena even if it is not “well-formed”. We do not have a concept of well-formedness. There are no arbitrarily chosen standards that an argument is expected to meet. This marks a fundamental difference with other argumentation-based sense-making tools such as those reviewed in our introductory section.
Instead, many users can cooperate, tweet by tweet, to make that argument evolve, gain focus, strength and social support. New elements can be added at any time, turning a simple claim into a fully-fledged argument. Other arguments can arise, that counter those already in place, forming an argumentation framework. Such a collaborative effort does not require conforming to (rigid) rules, learning new interfaces, nor creating new social networking environments.
Microdebates also offer important degrees of flexibility. Our proposal is orthogonal to the semantics of argumentation, by design. Different extension-based semantics may suit different domains. Moreover, we are platform-independent. We do refer to Twitter, because the Twitter user base seems to be particularly well versed in this type of content tagging and could easily pick up the spirit of microdebates. Besides, Twitter has important features that greatly help in implementing the idea. However, microdebates may be run in other online social platforms as well.
In a broader perspective, the role of microdebates goes beyond that of an innovative tool for debating and reaching agreements. As an analytical tool, microdebates allow a deep analysis of arguers’ position in a debate. For instance, policy-makers need to understand why citizens feel in some way or another, about a given policy. There is a lot of material available in public online forums, but it is very hard to perform an analysis of arguments using only statistical approaches. Even state-of-the-art argumentation mining techniques are not yet able to provide high levels of accuracy [41].
As a tool to support scientific research, microdebates could produce significant benchmarks for argumentation. The lack of benchmark libraries for argumentation is a well-known issue [46]. As Modgil et al. say, a benchmark library will bring various benefits to the field of argumentation as it will support the implementation of new theoretical ideas, as well as their testing and comparison with the state of the art. Moreover, we can gain insights into the practical use of different semantics for the specific domains under debate.
Finally, microdebates could provide valuable input for agent-based social simulations. In particular, NetArg [26,27] is a model for agent-based social simulations, where agents are modeled using AFs that represent their beliefs. It would be interesting to use AFs produced by microdebates, to set up simulations and possibly predict how opinions may spread in a social network. That would contribute to sociological theory and could also be used as a social monitoring tool. Indeed, arguing is a social process, so we may use sociological models to capture diffusion of ideas and innovations among arguers, in order to monitor anomalies.
There are also intrinsic risks in our approach. In particular, users may not embrace our syntax, even if they already have an account on Twitter. This sort of risk is associated with many initiatives whose success depends on community engagement. Experimentation discussed in Section 6 indicates that users who have never been exposed to computational argumentation can get accustomed to the microdebates syntax almost immediately. This fact suggests that our syntax will not represent a hurdle for Twitter users, and that it will fit their habits without particular effort. But will it be engaging to join a “real world” microdebate? Only wider experimentation will answer this question. We made a Twitter profile16
We are actively working on the microdebates software tools described in Section 4, in order to enable further experimentation. The last experiments taught us that well-designed user interfaces are essential to the uptake of this technology, so we are actively working on improving them. Tools are also crucial to understanding the boundaries of our approach: skilled Twitter users may develop habits that could be different from what we expect, leading to unforeseen system behaviors.
Other risks are due to the openness of our approach. A critical issue with all open, non-moderated online debate platforms is that they are prone to (intentional or unintentional) “bad behavior”, and microdebates are no exception. They suffer from a weakness shared by many other democratic approaches: they offer maximal freedom to citizens, but are prone to exploits that could inflate support without real arguments, or even without real followers. Trolling is a well-known issue [5,57], as are fake accounts, which could be spammers or bots.18
Since arguments are dynamic entities that evolve over time, it is inevitable that the meaning of arguments changes over time. It may happen, for instance, that someone who referred to a given argument in the past no longer agrees with it at some later point, though the user’s contribution to the argument weight is still there. Addressing this issue does not pose a technical hurdle: a Twitter user can easily delete old tweets. However, it is unrealistic to assume that users will monitor the content of all arguments they have referred to in the past. In the end, the impact of argument dynamics and the user response to the issue are the kinds of effects we expect to be able to observe at the macro-level.
Another matter of discussion is the scope of microdebates. Twitter streams are usually public, i.e., visible to everybody. However, some microdebates may need to be kept private. We are working on privacy-enabling extensions using other online platforms, as there is no reason why the microdebates concept should be restricted to Twitter.
Scalability is another well-known issue of online debates [17]. We address it by following a principle of locality and distribution, and via an incentive mechanism. In particular, since users tag their own arguments, even if large crowds may bring in many arguments, tagging is still done locally. Moreover, because arguments that weigh more have a better chance to emerge, there is an incentive to build on consolidated, “heavy” arguments, rather than creating new arguments with minimum initial weight. This should help contain the number of arguments.
For future work, it would be useful to sift through arguments automatically in order to filter out the spam. We are aware that argument analysis [53] is a very difficult task, even when it is done by hand on a single argument. Besides, the bottom-up, unstructured nature of arguments arising from microdebates makes the task even harder. However, automated argument analysis is a rapidly expanding field [41] and some interesting results are becoming available [10,29,30,40,42,47,48,58]. Thus a scenario of intelligent argument filtering may not be as unrealistic as it seemed a few years ago. We are working on integrating an “argument filtering” module into our visualization tools, so that “better” arguments can be given more emphasis.
The concept of group support (or “social acceptance” [39]) also needs to be further explored. It is somewhat related to a democratic principle: what the majority believes to be fair will be considered fair. Suppose a scenario in which microdebates are widely adopted to collect opinions in settings like political elections, just as some TV shows nowadays use tweets to measure what the audience thinks about an issue, or to fact-check what politicians say in live debates.19
See for example the EPSRC EDV project,
While we believe that argumentative skills are fundamental to foster democratization processes, we recognize that argumentative elements in generic social media tools are very basic: Twitter, Facebook and Google Plus use “RT”, “Like” and “+1” buttons, while YouTube also added a “thumbs down” option. “Argumentation support has not yet moved firmly from the academic lab, into the mainstream”, conclude Schneider et al. [52] after a comprehensive review of the state-of-the-art of actual argument tools, and they claim that different interfaces are needed to support different kinds of arguing, because people often argue to position and establish themselves, not only to solve hard problems. This is where microdebates come into play. To the best of our knowledge, this is the first attempt to introduce unstructured, community-based argumentation into a popular social web platform.
Footnotes
Acknowledgements
We thank the anonymous reviewers for providing high quality, constructive feedback. We thank Nefise Yağlıkçı, Lorenzo Michelacci, and Günce Çağloğu for their work with the implementation of the Microdebates App and of the web interface, and our colleagues and students at the Political Sciences department of the University of Bologna for their contribution in the experimentation. A special thank you and farewell to Aldo Di Virgilio.
The research described here was mostly carried out when the first author was affiliated with University of Bologna’s DISI, in the research staff of the ePolicy project. This work was partially supported by the ePolicy EU project FP7-ICT-2011-7, grant agreement 288147. Possible inaccuracies of information are under the responsibility of the project team. The text reflects solely the views of its authors. The European Commission is not liable for any use that may be made of the information contained in this paper.
The original posts can be retrieved from:
