Abstract
Social bots are currently regarded as an influential but also somewhat mysterious factor in public discourse and opinion making. They are considered to be capable of massively distributing propaganda in social and online media, and their application is even suspected to be partly responsible for recent election results. Astonishingly, the term social bot is not well defined and different scientific disciplines use divergent definitions. This work starts with an attempt at a balanced definition, before providing an overview of how social bots actually work (taking the example of Twitter) and what their current technical limitations are. Despite recent research progress in Deep Learning and Big Data, there are many activities bots cannot handle well. We then discuss how bot capabilities can be extended and controlled by integrating humans into the process and reason that this is currently the most promising way to realize meaningful interactions with other humans. This finally leads to the conclusion that hybridization is a challenge for current detection mechanisms and has to be handled with more sophisticated approaches to identify political propaganda distributed with social bots.
Introduction
Social media is a phenomenon that has existed for a bit more than a decade now (Facebook went online in 2004, Twitter in 2006). For the first time, a large part of the world's population is able to participate in a direct and partly globally visible exchange of information.
Together with the increasing importance of social media in everyday life and the growing reach of these networks, their use (or misuse) for orchestrated information distribution, from advertisement up to political propaganda, becomes attractive for different stakeholders. Due to the underlying technical nature of the communication medium, automated and thus cost-efficient access to social media channels is easy. Similar to email services several years ago, social media channels are used for simple spamming. 1 However, since about 2010, reports on trolling or automated so-called social bot activity in social media have been increasing, especially with a focus on political manipulation and propaganda. 2–4 Undoubtedly, social bots nowadays have a high societal impact, 5 independent of the approaches realizing them. This leaves research with new and multidisciplinary challenges: detecting and fighting automated and orchestrated manipulation via social media necessitates insight into and understanding of the motivation, processes, economics, and current limits of manipulation. Computer scientists track networks, measure interactions, build algorithms, and are concerned with security issues. However, they are usually unfamiliar with communication aspects and effects. Social scientists have to understand new (semiautomatic) ways of distributing information or propaganda and answer questions of possible societal impact. Both have to collaborate with statisticians and researchers in the area of artificial intelligence to understand the challenges and limits of developing big data-based detection mechanisms.
As a first step, this work covers technical details and processes, economic considerations, as well as limits of automated manipulation via social networks in a multidisciplinary way and provides a baseline for further discussion. For an initial common understanding, we review the existing interpretations of the term social bot and propose a consolidated definition. This definition is complemented by a comprehensive discussion and classification of automated actors in the web.
Then, we focus on the technical details and challenges in the development of social bots. We first show the construction and implementation principle of a responsive Twitter bot and extend this implementation to a framework for realizing human-like behavior. In addition, the latter is validated by a social bot experiment on Twitter, using 30 social bots to gain followers and distribute (ethically harmless) content. For both concepts, we discuss the realization costs.
In a final step, we address the existing gap between automation on the behavioral level and automation on the communication level. In this study, we argue that it is currently most cost-efficient to automate bots on a behavioral level, while content generation and bot–human communication are still controlled by humans. In the context of the performed bot network experiment, we empirically show that current automatic detection mechanisms cannot distinguish hybrid bots from human users.
Definition and Taxonomy of Social Bots
When journalists, bloggers, or scientists report on social bots and their potential influence on society, many of these articles provide their own definition of the term social bot. Very often, these definitions differ strongly from each other, some focusing on technical details, others highlighting social interaction. Sometimes, the definitions even contradict each other or explicitly exclude a class of social bots that others include. Although the capabilities and effects of social bots taking part in public Internet communication have been discussed frequently, no common understanding of the expression itself has evolved.
As the term itself suggests, its definitions stem from a mixed, partly social science and partly technical perspective, while the weighting of the perspectives is usually up to the respective definition's author.
From a technical perspective, the term bot is often related to robots, automation, and algorithms. 6 All of these terms are certainly part of the understanding of social bots; however, using them interchangeably conflates technically different concepts, such as algorithms and robots, in a simplistic way and may lead to misunderstandings. Geiger 7 defines social bots, in a more exact but still very general way, as automated software agents. Emmer 8 adds properties such as artificial intelligence and the ability to act autonomously on the web.
The social science perspective usually addresses the social or political implications of the actions of social bots. Wooley 9 states that social bots “mimic human social media users” and “manipulate public opinion and disrupt organizational communication.” He also defines so-called “political bots” as a special case of social bots. Hegelich 10 highlights that social bots are hidden actors with a political agenda. He explicitly distinguishes them from chat bots or other “assistants.” A wider definition, which specifically considers the communication behavior, is given by Frischlich et al. 11 The authors point out that the imitation of human communication (behavior) is a key feature of social bots. This certainly also includes chat bots. Even more generally, Kollanyi et al. 12 consider interaction with other users through automated social media as the key property of social bots. Interestingly, social media platforms such as Facebook recently recognized possible effects of social bots by admitting “false amplifications”; however, they do not use the term social bot anywhere in their publication. 13
Many application examples of social bots are presented in a recent overview article by Ferrara et al. 14 This work makes it possible to identify many types of bots and to evaluate the available definitions. In addition, the authors give their own, but (in our view) slightly too narrow, definition of social bots: “A social bot is a computer algorithm that automatically produces content and interacts with humans on social media, trying to emulate and possibly alter their behavior.” 14 We keep several aspects of this definition but do not restrict ourselves to social media usage alone. In addition, we include the communication aspect introduced by Frischlich et al. 11 and cover the interaction property by referring to agent behavior.
The term social bot is a superordinate concept, which summarizes different types of (semi-) automated agents. These agents are designed to fulfill a specific purpose by means of one- or many-sided communication in online media.
The most significant difference to other definitions is that we define social bots as a high-level concept, which comprises many types of specific bots. In addition, our definition covers:
• fully automated as well as partly human-steered bot action,
• autonomous action (agent-like),
• an orientation toward a goal,
• multiple modes of communication,
• and a wider ecosystem (all online media).
In the following paragraphs, we give several examples of social bots and specific subtypes. Moreover, we discuss bots that are not covered by our definition and are thus not considered social bots.
Social bots
The most popular type of social bot is the chat bot, “a software system, which can interact or chat with a human user in natural language such as English.” 15 The recent wave of chat bot development probably originated in the context of the Loebner Prize competition, 16 where Hugh Loebner set the task of finding the program that acts most human-like. 17 Nevertheless, chat bots are only as intelligent as their scripts and the databases behind them. That is why they are often developed only for specific topics. Companies often use chat bots to handle customer service issues. One can find them “in daily life, such as help desk tools, automatic telephone answering systems, tools to aid in education, business, and e-commerce.” 18 As chat bots are created to communicate in dialogs with specific users or customers, a multitude of chat platforms is conceivable, such as private chats of social media pages, as well as other online media such as email or help sections on private company websites. The bots partly replace human interaction and are used to fulfill simple preprocessing tasks, such as figuring out the correct contact person for a specific service issue.
While chat bots focus on one-to-one communication, spam bots are developed to reach a large audience. The goal of this one-to-many communication is to spread information, advertisements, or phishing links without involving the recipient. As they are used to communicate a certain message on behalf of a company, group, or person, they nevertheless fall into the category of social bots.
As mentioned earlier, political bots can be seen as a special type of social bot with the aim of spreading political content or participating in political discussions on online platforms. 9 Political bots are designed by politically motivated groups to communicate their opinions and mindsets. A typical goal of political bots is, for example, boosting the popularity of a specific idea or person on a (social) media platform by generating “likes” or “follows”. Furthermore, political bots may make use of the characteristics of chat bots or spam bots. They discover public conversations, posts, and comments by identifying keywords and intervene in or flood them using generated or prepared (sometimes propagandistic) content. Whether political bots are able to participate in simple conversations with other users or just spread spam in a non-reactive way depends only on the aim (and technical skill) of the operator and the code behind the bot profile. Human-like political bots that act on social media platforms such as Twitter and Facebook are potentially capable of influencing other users. Especially if many bots cooperate in bot networks, they are able to arouse undeserved awareness for topics or simulate political moods. Examples of potential bot armies were discovered in the context of the 2016 U.S. presidential election. Bessi and Ferrara 19 found that nearly 19% of all election-related Twitter posts during this time were sent by social bots. Furthermore, the German news page Spiegel Online reports on parties that considered using social bots to support their campaigns for the German federal election of 2017. 20,21
Another type of social bot is the class of mobile phone assistants. Software such as Apple's Siri 22 is designed to manage human-to-machine communication using natural language as input and output. Nearly every function of the mobile phone can be accessed via voice commands. In this case, the social bot acts as a translator between human users and the mobile phone. Similar to chat bots, but supported by voice recognition, keyword identification, and voice synthesis, the program performs appropriate actions or presents search results to the user.
Bots not regarded as social bots
Bots that are not covered by our definition of social bots are, for example, content management bots or curator bots. The job of a curator bot is to manage or collect content and to present it to humans in an easy-to-digest way. In contrast to social bots, the communication aspect is not pronounced for curator bots; they only work silently with content. Wikipedia bots are an appropriate example of this class of bots. Pywikibot 23 helps users to maintain articles by deleting redundant whitespace, generating links to related pages, or correcting typos. Other examples of content bots are data aggregation bots, which are built to collect and manage data for later analysis.
Game bots help their users to be successful in computer games. The tasks of these bots can be as varied as the games they are used in. Game bots can act as opponents to enable training or help to navigate through the game. Furthermore, they can be used for cheating or can stand in for short periods of unavailability. So-called farming bots, for example in games like World of Warcraft, perform simple tasks and free players from time-consuming but necessary duties. 24 Nowadays, game bots that realize all these functions and more are available in USB stick format from graphics card vendors. In contrast to social bots, game bots do not focus on communication and interaction but exclusively on substituting users by imitation.
Service-Level-Agreement (SLA) negotiators focus on machine-to-machine communication. These bots are built to handle SLAs autonomously. Again, there is no human communication or interaction aspect regarding this class of bots, which is why they are not covered by our definition of social bots.
Discussion
As also shown by the categorization into social and other types of bots above, we consider human–machine interaction as a key factor. Social bots automate social interaction via communication. Every online medium where human communication takes place through publicly visible posts, chats, comment functions, or direct messages is a possible point of connection and enables the involvement of social bots. Nevertheless, our definition of social bots should be seen as a high-level concept. Social bots differ from each other as much as the reasons they were built for and have to be discussed within their specific context. Considering the mentioned examples, it is obvious that social bots are also designed for various tasks other than influencing people. Announcements such as Facebook's support for group bots and bot repositories 25 lead us to expect that social bots are going to be a pervasive part of the Internet experience in the upcoming years. When discussing the possibilities of social bots to influence single users up to whole societies, we should therefore use more precise notions and terms.
Automation Using Social Bots
The application of social bots for multiple purposes (from advertisement to propaganda) implies different technical challenges as well as economic considerations. On the one hand, costs increase rapidly with technical complexity (for example, when making social bots more human-like). On the other hand, simple technical realizations may have a big enough impact to maximize monetary or political revenue in some cases. In the following sections, we present a very simple technical realization of a social bot and extend it to a moderately complex, behaviorally human-like actor on Twitter, while trying to keep costs rather low. We then present an experiment using 30 of those Twitter bots and move on to an economic discussion of hybrid extensions of social bots in the next chapter.
A simple reactive Twitter bot example
One of the simplest ways to develop a reactive social bot uses the Twitter Stream API. 26 This basically means that the bot listens to the ongoing worldwide Twitter activity and reacts to arriving posts. More formally, we use a Twitter Stream Listener component that registers with Twitter and additionally implement a simple actuator component, which is triggered by incoming Twitter posts and uses the Twitter REST API 27 to reply to these posts, if applicable. Figure 1 provides a schematic overview of the components and data flow of the social bot. For the full implementation details using the Python tweepy 28 framework, refer to the listing given in the Appendix. Note that due to bandwidth management of the public Twitter stream, only a subset of posts will reach the listener. Depending on the registered topics and activity on the platform, only about 1% up to 40% (in very restricted cases also more) of the actual traffic may arrive at the listener. 14

Components and data flow of a simple Twitter bot realization using the Twitter Stream API. (Source: Authors).
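The interplay of listener and actuator described above can be sketched without any network access. The following is a minimal illustration only and not the listing from the Appendix: the `Post` record, the `StreamListener` and `GreetingActuator` classes, and the in-memory `sent` list that stands in for calls to the Twitter REST API are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field
from typing import Iterable, List


@dataclass
class Post:
    """Minimal stand-in for an incoming Twitter post (hypothetical type)."""
    user: str
    text: str


@dataclass
class GreetingActuator:
    """Replies to a post; `sent` stands in for calls to the REST API."""
    sent: List[str] = field(default_factory=list)

    def on_post(self, post: Post) -> None:
        # A real bot would send the reply here instead of storing it.
        self.sent.append(f"@{post.user} Hello!")


@dataclass
class StreamListener:
    """Filters a stream of posts by tracked topics and triggers the actuator."""
    topics: List[str]
    actuator: GreetingActuator

    def matches(self, post: Post) -> bool:
        text = post.text.lower()
        return any(topic.lower() in text for topic in self.topics)

    def consume(self, stream: Iterable[Post]) -> None:
        for post in stream:
            if self.matches(post):
                self.actuator.on_post(post)


# A fake stream replaces the live Twitter stream in this sketch.
stream = [
    Post("alice", "Rainy day in Berlin #weather"),
    Post("bob", "Nothing to see here"),
]
actuator = GreetingActuator()
StreamListener(topics=["#weather"], actuator=actuator).consume(stream)
print(actuator.sent)
```

Keeping the actuator separate from the listener mirrors the component split in the text: the reaction logic can be exchanged without touching the stream handling.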
Functionality
The presented social bot enables us to react to Twitter posts directly by replying to the sender. In our implementation, the Twitter Stream Listener consumes the current Twitter stream with respect to a given set of hashtags or topics. Thereby, we are able to adapt to a specific context or domain of interest. Although the current implementation only greets the user of a received post, the functionality of the actuator can easily be extended. The application of this bot ranges from simple demonstration (and greeting) purposes to simple service activities based on standardized responses, such as
• Returning the weather forecast for a city or region mentioned in the current post. For this purpose, the actuator may use external weather information sources such as OpenWeatherMap. 29
• Answering questions on specific topics detected in the current post. Using the Google Knowledge Graph, 30 a powerful ontology network can be connected to the bot, covering an enormous knowledge base.
• In a political context: responding to specific topics and confronting users (usually independent of the content they posted) with a number of fixed political statements.
These three application examples already demonstrate the potential of a very simple social bot implementation comprising not more than 30 lines of code for a fully functional frame.
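How an actuator might select one of several standardized responses can be sketched as a simple keyword dispatch. This is an illustrative assumption, not the authors' implementation: the `HANDLERS` table, the handler functions, and the reply texts are invented for this example, and the weather handler is a placeholder that does not query any external service.

```python
from typing import Callable, Dict, Optional


def weather_reply(text: str) -> str:
    # Placeholder: a real bot would query an external weather service here.
    return "Here is the forecast you asked about."


def statement_reply(text: str) -> str:
    # Fixed statement, independent of what the user actually wrote.
    return "Our position on this topic has not changed."


# Hypothetical routing table: tracked keyword -> response handler.
HANDLERS: Dict[str, Callable[[str], str]] = {
    "weather": weather_reply,
    "election": statement_reply,
}


def respond(user: str, text: str) -> Optional[str]:
    """Return a standardized reply for the first matching keyword, else None."""
    lowered = text.lower()
    for keyword, handler in HANDLERS.items():
        if keyword in lowered:
            return f"@{user} {handler(text)}"
    return None


print(respond("alice", "What is the weather in Berlin?"))
```

Because every reply comes from a fixed table, the output is highly repetitive, which is one reason such a bot is easy to recognize as automated.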
Costs
Obviously, the costs for developing a simple social bot are almost negligible. Implementation time is certainly less than one hour for a moderately experienced developer (including error-handling code, which is not provided in our listing). The main effort has to be put into the setup of the bot's Twitter account. For this purpose, a standard Twitter account must be created and connected to a mobile number for developer access to the API. Both can easily be done anonymously using a fake email address and an anonymously registered mobile number.
Clearly, the behavior of the presented bot can easily be detected as automated action. Neither a human recipient of a message nor a current automated detection mechanism will consider it to be sent by another human. Instant and standardized reactions, permanent activity, and restricted capabilities to analyze content will expose the social bot as such.
A social bot with human-like behavior
Development of a social bot with sophisticated human-like behavior faces three main challenges:
(1) Producing credible and intelligent content, which is accepted as such by human consumers.
(2) Leaving a trace of human-like metadata in social networks.
(3) Creating an adequate (often balanced) network of friends or followers to spread information.
While the first challenge is a rather open issue in science and even more so in practice (we will comment on this later), the second aspect can be handled to a certain extent by imitating human actions in social networks and sticking to normal human temporal and behavioral patterns. This includes performing activities in a typical day–night cycle, carefully measured actions on the social media platform, as well as variability in actions and timing. Thus, on Twitter, a bot should pause between actions to simulate phases of inactivity (sleep or work), limit posting and Retweeting activities to a realistic, human-like level, and also vary these pauses and limits.
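The timing aspect can be illustrated with a small scheduling function. This is a sketch under stated assumptions rather than the scheduler used in the experiment: the function name `next_action_time`, the default wake and sleep hours, the mean pause, and the jitter range are all hypothetical values chosen for illustration, and the function assumes `wake_hour < sleep_hour`.

```python
import random
from datetime import datetime, timedelta
from typing import Optional


def next_action_time(
    now: datetime,
    wake_hour: int = 8,           # assumed start of the active phase
    sleep_hour: int = 23,         # assumed start of the night rest
    mean_pause_min: float = 45.0,  # assumed average pause between actions
    jitter: float = 0.5,          # pause varies by +/- 50% around the mean
    rng: Optional[random.Random] = None,
) -> datetime:
    """Propose the next action time inside a day-night cycle."""
    if rng is None:
        rng = random.Random()
    # Vary the pause so that actions are not evenly spaced.
    factor = 1.0 + rng.uniform(-jitter, jitter)
    candidate = now + timedelta(minutes=mean_pause_min * factor)
    if wake_hour <= candidate.hour < sleep_hour:
        return candidate
    # The candidate falls into the night: defer to the next wake-up time.
    wake = candidate.replace(hour=wake_hour, minute=0, second=0, microsecond=0)
    if candidate.hour >= sleep_hour:
        wake += timedelta(days=1)
    # Do not start exactly on the hour every morning.
    return wake + timedelta(minutes=rng.uniform(0, 60))


print(next_action_time(datetime(2017, 1, 1, 22, 50)))
```

A limit on the number of posts or Retweets per day could be enforced in the same way, by comparing a daily counter against a randomly varied threshold.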
Another key issue is to grow a network of followers or friends. For social media, a network of friends implies a certain reach: the larger the network of followers, the more Twitter users receive the content distributed by the respective account. To create a network, Lehmann 31 proposes an effective strategy based on a simple observation: users follow other users hoping that those follow back (which they often do, if the proactive profile does not obviously look bot-like), thereby establishing a friend relationship. In case this does not happen within a certain time span (the other user does not follow back), the one-sided connection is often dissolved to keep a balanced following-follower ratio. Exceptions are very prominent accounts with usually strongly imbalanced following-follower ratios (far more followers than followed users) and accounts that are mainly used to distribute advertisement (far more followed users than followers). The overall principle of follow-for-follow is not only respected by most human users but can also be applied to widen the follower networks of bot accounts.
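The three account types distinguished above can be told apart with a simple ratio check. The sketch below is only an illustration: the function name `classify_account` and the threshold `factor = 3.0` are assumptions made for this example and are not values taken from the experiment.

```python
def classify_account(followers: int, following: int, factor: float = 3.0) -> str:
    """Roughly classify an account by its following-follower ratio.

    `factor` is a hypothetical threshold: one side must exceed the other
    by this factor before the account counts as imbalanced.
    """
    if followers > factor * max(following, 1):
        return "prominent"   # far more followers than followed users
    if following > factor * max(followers, 1):
        return "advertiser"  # far more followed users than followers
    return "balanced"        # promising candidate for follow-for-follow


for followers, following in [(500, 480), (100_000, 200), (50, 4_000)]:
    print(followers, following, classify_account(followers, following))
```

Under the follow-for-follow principle, only accounts classified as balanced would be contacted, since the other two types rarely follow back.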
Extending bot functionality
The previously presented simple social bot implementation can easily be extended to address challenges two and three. To this end, several actuators are created that independently perform specific actions on Twitter. Considering the schematic depiction in Figure 2, we briefly describe the important components:
• CollectionActuator: This component listens to the Twitter stream and stores user names as candidates to follow later on. The selection of following candidates can be made with respect to different characteristics such as the following-follower ratio (balanced accounts are preferred), activity on Twitter (potential multipliers are preferred), and Tweet properties (users sending popular Tweets are preferred).
• BotProfile: The personal profile of a social bot is defined using a dedicated component, which stores all constraints and guidelines to simulate a certain behavior. Here, the day–night cycle and rest periods can be defined, general parameters for the posting and Retweeting behavior can be set, and an individual following behavior is formulated. Note that all settings should be guidelines only and have to be supplemented by some random variability. The component also provides functionality to request the next action time for all other actuators. This function interprets the given behavior values and (by adding some random noise) proposes the next action.
• FollowActuator: This actuator ensures a continuous execution and management of the follow-for-follow procedure. With respect to the BotProfile, the component follows a certain number of previously collected users (see CollectionActuator) and supervises reactions. If a contacted user follows back, the component adds this user as a friend. If there is no response within a certain time window (we used about 24 hours here), the one-sided friendship is canceled and the user is blacklisted.
• PostActuator: This actuator enables the bot to post or Retweet on Twitter. For this purpose, a database of individual Tweets and collected Tweets is accessed. The number of actions is determined by the BotProfile.
• PictureActuator: The ability to post pictures on Twitter is implemented by this actuator. In analogy to the behavior of the PostActuator, pictures and matching comments are extracted from a picture database and posted on Twitter.

Components and data flow of the advanced social bot with behavioral settings, follow-for-follow mechanism, and human-like activity profile. (Source: Authors).
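The bookkeeping of the follow-for-follow procedure can be sketched as a small state machine. The class below borrows the name FollowActuator from the component list, but its fields and methods (`pending`, `friends`, `blacklist`, `follow`, `review`) are assumptions made for illustration and do not reproduce the authors' code. Calls to Twitter are omitted: the current follower set is simply passed in.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Dict, Set


@dataclass
class FollowActuator:
    """Tracks follow requests and resolves them after a time window."""
    window: timedelta = timedelta(hours=24)  # about 24 hours, as in the text
    pending: Dict[str, datetime] = field(default_factory=dict)
    friends: Set[str] = field(default_factory=set)
    blacklist: Set[str] = field(default_factory=set)

    def follow(self, user: str, now: datetime) -> bool:
        """Register a follow unless the user is already known."""
        if user in self.blacklist or user in self.friends or user in self.pending:
            return False
        self.pending[user] = now  # a real bot would call the API here
        return True

    def review(self, followers_now: Set[str], now: datetime) -> None:
        """Promote users who followed back; drop and blacklist the rest."""
        for user, since in list(self.pending.items()):
            if user in followers_now:
                del self.pending[user]
                self.friends.add(user)
            elif now - since >= self.window:
                del self.pending[user]    # one-sided friendship is canceled
                self.blacklist.add(user)  # never contact this user again


start = datetime(2017, 1, 1, 9, 0)
bot = FollowActuator()
bot.follow("alice", start)
bot.follow("bob", start)
bot.review({"alice"}, start + timedelta(hours=25))
print(bot.friends, bot.blacklist)
```

In a full framework, the times at which `follow` and `review` run would be requested from the BotProfile so that the procedure respects the bot's activity pattern.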
Experimental evaluation
To evaluate our mimicry approach for human behavior in a real-world context, we set up an experiment comprising 30 Twitter bots. In cooperation with the German TV station Pro7, we created 30 fake profiles, see also Table 1, and equipped them with the social bot framework described before. Each bot ran the same code; however, we individualized the bot profiles and Twitter Stream Listeners. Each social bot had its own day–night cycle, activity pattern, and following behavior. In addition, each bot listened to an individual set of topics within the Twitter stream (which is equivalent to expressing personal areas of interest for each bot identity). Overall, the experiment was divided into three phases:
(1) Building a network of followers during a setup and testing period of 2 days and the following 8 days of combined action. Thus, the experiment lasted 10 days in total, of which only the eight productive days were documented. Note that half of the social bots were mutually befriended by default, while the second half started with no followers at all.
(2) Publishing content in a coordinated way to test the potential of setting a trend on Twitter. The published content was devised by human actors and only distributed by the bots.
(3) Revealing the social bot identity of the respective fake accounts to followers and the public (supported by a TV documentary on the experiment, which is available in German 32).
Note that three Twitter bots were banned by Twitter during the experiment and are thus not included. (Source: Data collected by authors' experiments using BotOrNot)
Phase one, in particular, demonstrated that the follow-for-follow approach could successfully be applied to acquire followers automatically. As shown in Figure 3, the number of followers increased continuously over the 8 evaluated days, resulting in about 1,400 followers after this short time period.

The plot shows the growth of the follower network for the initial Twitter bot setup in about 1 week. Twenty-seven of the initial 30 Twitter bots continuously performed the follow-for-follow strategy automatically without any human intervention. Potential followers were selected from the Twitter stream with regard to individual topics. (Source: Visualization by authors computed from data collected via the Twitter API).
During the second phase, two hashtags were promoted to test the reach of the acquired followers:
• We used the hashtag #schreinachten with a funny reference to the German words Schrei for cry and Weihnachten for Christmas. This artificial word addressed the feeling of holiday stress in a humorous manner. Social bots acting on this topic sent out short messages such as (translated to English): “I have no Christmas present for my girlfriend, yet #schreinachten.” or “My grandma always gives me the chocolate I hate most… #schreinachten.” In addition, the whole message was posted in capital letters to symbolize shouting. The goal of this topic was to animate followers to think of further funny messages related to the hashtag.
• The hashtag #sayyes was attached to positive messages of joy and happiness. Here, messages like “Being happy today, the whole day! #sayyes” or “Made a proposal today and she said yes! #sayyes” were posted by the social bots. The goal of spreading this hashtag was to initiate a wave of positive messages.
Both hashtags were unknown or inactive before the start of the experiment. Thus, no bias was introduced by external actors or political events. At the same time, no ongoing conversation was influenced, and only little bias was introduced into Twitter traffic. From an ethical perspective, we consider the chosen topics and hashtags as harmless. The topics used were obviously humorous. Although they did address certain feelings to motivate users to participate, in our view, the distribution of the specific content by social bots did not attack or violate any personal feelings. To reveal the experimental setting afterward, we explicitly included the third phase (reporting on the experiment) in the experimental concept. Still, we are aware that our experiment had two main consequences: (a) Due to the funny and apolitical character of the applied hashtags, it was more difficult to reach the Twitter trending topics. (b) We expected involved users to react differently to the third phase. However, we expected that personal and emotional involvement with the topics would be low.
In fact, the second hashtag briefly appeared in the German top 100 trending topics; however, a significant trend could not be established. Phase three showed that many human followers had been deceived by the fake bot identities and actions. Reactions from Twitter users varied, ranging from disappointment to anger and from amusement to disbelief. Overall, no significant discussion or expression of anger followed the disclosure.
Although our experiment is only a snapshot of what is possible by applying human-like acting social bots, some important insights can be extracted. First of all, tedious tasks such as building a follower network as well as posting and Retweeting content may be automated without being exposed as a bot. Second, the automatically generated network can be used to spread content to all followers at any point in time. This will cause at least brief visibility and possibly push a topic to reach wider popularity. Finally, human users can easily be deceived by simple but fairly realistic social bot behaviors.
Certainly, an important ingredient for the success of our social bots was, besides the human-like behavior patterns, the human-generated content published by all bots. As mentioned before, we used manually generated content to be spread by the bots. We include the discussion of this aspect in our cost review.
Costs
The development time of the extended Twitter bot (less than 2 days) can still be neglected compared to the functionality and benefit of automation provided by the general framework. The more tedious task was to generate all 30 fake accounts on Twitter. Thereafter, we were able to deploy the same code 30 times with only minor adaptations regarding the individual configuration of each bot. Then, phase one (growing the network) was performed by all bots without any human intervention.
Likewise, publishing content in phases two and three needed no intervention. However, content was not automatically generated but provided by humans. We decided to do so after reasoning on the following two questions: What would have been the costs of generating content automatically, and which content quality can be achieved?
Implementing the generation of intelligent and creative content for our hashtags would have required far more effort than setting up the whole social bot framework. Simple approaches based on templates still require some human interaction and lack creativity. More complex generators based on learned patterns still follow firm rule sets, which limit the variability of linguistic expressions. Both would probably have reduced the credibility of our social bots due to repetitive content. Furthermore, due to the application of 30 cross-linked social bots and their continuous Retweeting behavior, a single message was repeated many times by other bots and followers, thereby extending its reach automatically.
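Why template-based generation quickly becomes repetitive can be made concrete with a small sketch. The template, the slot fillers, and the function name `expand_template` are invented for this illustration and were not used in the experiment. The point is only that the number of distinct messages equals the product of the slot sizes, and that every filler still has to be written by a human.

```python
from itertools import product
from typing import Dict, List


def expand_template(template: str, slots: Dict[str, List[str]]) -> List[str]:
    """Return every message the template can produce from the slot fillers."""
    names = list(slots)
    return [
        template.format(**dict(zip(names, combo)))
        for combo in product(*(slots[name] for name in names))
    ]


# Hypothetical template and fillers, invented for this illustration only.
template = "{opening} {event} #example"
slots = {
    "opening": ["Oh no,", "Typical,"],
    "event": [
        "the train is late again",
        "it is raining again",
        "the coffee is cold",
    ],
}
messages = expand_template(template, slots)
print(len(messages))  # 2 openings x 3 events = 6 distinct messages
```

With several posts per bot per day across a network of bots, such a small pool would be exhausted quickly, and the recurring sentence frame would remain visible even with larger filler lists.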
Hybrid Social Bots
The extended social bot framework presented in the previous chapter is able to mimic human behavior on the action level, which means that a social bot is able to automatically create a follower network and manage content. Content production, however, is done by human actors. Figure 4 arranges different bot types in a purely qualitative manner with respect to the degree of automation and orchestration. It shows the automation–orchestration relationship of human users and simple social bots as single actors and as human troll farm or bot army, respectively. Regarding automation, we define hybrid social bots as an intermediate class between fully automated (behaviorally simple) bots and purely human users. Under orchestration, the communication approaches and activity patterns of single actors certainly differ. We assume that an army of simple bots often follows a mere client–server model with rather similarly acting single bots and little autonomy per agent. We believe that hybrid bot networks, too, are often still centrally controlled. Each bot, however, possesses a behavioral autonomy, which mimics human behavior. In contrast, human troll farms usually act on a central interest, context, or overall goal but have the highest autonomy per agent. For them, central content generation becomes dispensable. Note that the arrangement of bot types in Figure 4 is not based on quantitative data regarding the degree of automation or orchestration. The purpose of the figure is to visualize the previously described qualitative perspectives on fully automated bots, hybrid bots, and troll actors.

Figure 4. Qualitative classification of the potential influence of humans and bots in social media, with respect to automation and orchestration. (Source: Classification proposed and designed by authors).
In this chapter, we argue that hybridization of bots is an effective (compared with an army of social bots) and low-cost (compared to a human troll farm) approach to gain a high potential of influence via social media by simulating human behavior and speech. We will show that a network of these hybrid bots is able to sufficiently outsmart current automatic detection mechanisms such as BotOrNot.33,34
Hybridization as low-cost mimicry approach
The current societal opinion on bot technology seems to be driven by recent success stories of AI, for example, the prominently featured wins of an artificial intelligence against world-class Go players. These successes follow to a large extent from the development of deep learning algorithms 35 that are (a) able to use big data collections for learning and (b) benefit from extreme parallelization. However, at the core of deep learning successes, we see human-competitive (or even better) pattern matching and modeling capabilities. It is not at all trivial to apply these capabilities to creative tasks, and human communication skills in particular are still beyond what algorithms can do.
Riedl 36 gives an overview and vision of how AI can approach computational interactive narrative, which requires that computers can understand human communication and react adequately. Attempts in this direction are currently still very limited, as, for example, shown by Martin 37 in terms of computational improvisation in relatively open (not targeted) communication.
Existing chat bots can answer simple questions in a limited domain of their expertise, but lack skills to participate in an open discussion. Recent attempts to improve their capabilities include the ParlAI 38 platform published by Facebook. However, these approaches are currently active research directions. While progress in modification of images is astonishing (Zhu 39 provides a tool that can translate images to other styles, such as the style of a specific painter), this is not yet possible for working with text, which is, in this respect, considered much more complex than image translations.
In the related fields of computational creativity and procedural content generation (mainly for games), we see similar problems, which has led to so-called mixed initiative approaches 40 where a human designer and a computer program work together, taking turns, to reach a specific design goal. Without human interaction, the available methods would not be able to produce results in a human compatible style. At least for the time being, it is seemingly mandatory to use hybrid approaches to establish results that can be perceived as human-generated and thereby appear human-like.
Hybridization as strategy against rule-based detection mechanisms
To evaluate our social bots against state-of-the-art detection mechanisms, we confront them with the BotOrNot service provided by Indiana University. 34 The BotOrNot service tries to assess the overall probability that a submitted Twitter account is automated. To this end, the service compares the account against previously learned patterns regarding its metadata, network, behavioral timing, friend relationships, sentiment, and content. The authors report on “more than 1,150 features” that constitute the patterns in all the named high-level classes. 34 Finally, the results of all indicators are aggregated to a value in [0,1], which represents the probability of an account being controlled by a social bot. Table 1 shows the overall rating for each continuously active social bot account of our experiment. The probability ranges between 0.37 and 0.60 with an average of 0.48. This confirms that, on average, no clear bot identification is possible for our social bots.
To judge the quality of this score distribution for our bots, we generate a baseline distribution of BotOrNot score values of worldwide user accounts.
Methodology
As a basis for user extraction, we used data from the Twitter Decahose Stream, which provides a random 10% sample of worldwide Twitter traffic. The Twitter Decahose Stream provides roughly 300 posts per second. This sums up to about 160 GB of data per day. From this huge data sample of a single day, we extracted unique user accounts at four points in time: midnight, morning (6 am), noon (12 pm), and evening (6 pm), to respect possible effects of the day–night cycle. The gathered user accounts (about 1,200) were classified by using the BotOrNot API provided by the BotOrNot service. As our social bots acted in the German language domain, we additionally extracted only German user accounts at the same points in time for a second, localized baseline distribution of scores.
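The two volume figures quoted above can be cross-checked with a short calculation. The following is only a back-of-the-envelope sketch: it assumes a constant rate of 300 posts per second over the whole day and decimal gigabytes (1 GB = 10^9 bytes), neither of which is stated in the text.

```python
# Back-of-the-envelope check of the Decahose figures quoted in the text.
POSTS_PER_SECOND = 300   # from the text: "roughly 300 posts per second"
DATA_PER_DAY_GB = 160    # from the text: "about 160 GB of data per day"
SECONDS_PER_DAY = 24 * 60 * 60

# Assumption: the post rate is constant over the whole day.
posts_per_day = POSTS_PER_SECOND * SECONDS_PER_DAY

# Assumption: decimal gigabytes, i.e. 1 GB = 10**9 bytes.
bytes_per_post = DATA_PER_DAY_GB * 10**9 / posts_per_day

print(f"{posts_per_day:,} posts per day")
print(f"about {bytes_per_post / 1000:.1f} kB per post")
```

Under these assumptions, the sample amounts to roughly 26 million posts per day at about 6 kB per post, which is plausible for posts delivered together with their full metadata.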
Comparison of bots and average accounts
The comparison of our social bots' overall scores to the baseline distributions for the worldwide and German users is shown in Figure 5. Although the bots cannot clearly be classified as bots with respect to the score measure, a retrospective evaluation shows that their scores are significantly higher than the baseline scores of our samples from the worldwide and German Twitter stream.

Figure 5. Statistics of the overall BotOrNot score values for the social bot network (red box plots, grouped into bots that initially act as single entity or group, respectively) contrasted with two baseline overall scores for a set of sample users. The sample users are taken from the worldwide (green box plots) and German (light green box plots) Twitter Decahose Stream at four points in time. The analysis was performed using the BotOrNot API. (Source: Visualization by authors, based on data acquired through the Twitter API and Decahose Stream).
To further analyze these findings, we additionally take a look at the detailed metafeatures provided by BotOrNot and the corresponding scores.
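A box plot comparison of this kind rests on the five-number summary of each group of scores. The sketch below shows how such summaries could be computed with the Python standard library. The score lists are placeholders for illustration only and are not the data of the experiment; the function name `five_number_summary` and the "inclusive" quartile method are assumptions, and plotting libraries may use slightly different quartile conventions.

```python
from statistics import quantiles


def five_number_summary(scores):
    """Return minimum, quartiles, and maximum of a list of scores in [0, 1]."""
    # quantiles(..., n=4) returns the three cut points between the quarters.
    q1, med, q3 = quantiles(scores, n=4, method="inclusive")
    return {"min": min(scores), "q1": q1, "median": med, "q3": q3, "max": max(scores)}


# Placeholder values for illustration only; NOT the scores from the experiment.
bot_scores = [0.40, 0.45, 0.50, 0.55, 0.60]
baseline_scores = [0.10, 0.20, 0.30, 0.40, 0.50]

for name, scores in (("bots", bot_scores), ("baseline", baseline_scores)):
    summary = five_number_summary(scores)
    print(name, {key: round(value, 2) for key, value in summary.items()})
```

Overlapping boxes with clearly different medians correspond to the situation described in the text: the groups differ on aggregate, while a single score taken from the overlap region does not reveal which group it belongs to.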
• Content-related features: Figure 6 shows the detailed results for the sentiment, content, and language scores. For the sentiment score, features such as happiness, valence, arousal, and dominance as well as polarization and emoticon statistics of Tweets are evaluated and aggregated. Here, our bots behave like the German baseline users. Both German users and bots generally score higher than the worldwide baseline, which may be caused by the fact that sentiment analysis for the German language is more difficult than for English. We also stress that the BotOrNot rule set has been trained using English accounts, 19 and thus, German content is certainly hard to analyze. The same observation can be made for the content feature, which aggregates Tweet length and entropy. Here, the bots also lie within the range of the German baseline. Language features combine statistics on part-of-speech tags in Tweets, which are low-level features on the tagged or annotated grammar and context of words used in the Tweets. Here, a significant difference to both baselines is observed. The reason for this may be the high amount of slang terms and thus the grammatically complex structure of the Tweets used to push the topics in phase two of our social bot experiment.
• Metadata-related features: For metadata features, we observe that on average our bots behave similarly to the German user baseline, except for the user score. The user score aggregates account-specific metadata information such as age of the account and profile description as well as frequency and temporal development of actions on Twitter. Especially for these features, our experimental bot accounts are certainly too young to be classified as human-like. All other metafeatures, however, confirm the human-likeness of the social bot behavior, especially when we consider friendship features, networking, and temporal behavior. Here, no significant difference to the German baseline accounts can be identified (see Figure 7).

Figure 6. Detailed statistics of three metafeatures (sentiment, content, and language) for our social bots (red) and the baseline accounts worldwide (green) and from Germany (light green). (Source: Visualization by authors based on data acquired through the Twitter API and Decahose Stream).

Figure 7. Detailed statistics of four metafeatures (friendship, network, temporal, and user) for our social bots (red) and the baseline accounts worldwide (green) and from Germany (light green). (Source: Visualization by authors based on data acquired through the Twitter API and Decahose Stream).
A comment on detection mechanisms and hybridization
The evaluation of our social bot network showed that multiple features of a state-of-the-art detection tool such as BotOrNot can be bypassed. In particular, the scores attacked by our automation framework (friendship, network, temporal behavior) do not distinguish the bots from the evaluated random German account sample. Only features on content and the user profile show indications of bot behavior. Furthermore, for all samples (bot and baseline), the indicators reveal a large variance of scores for the provided accounts. In fact, the seemingly significant difference between social bot and baseline accounts is only identifiable due to the a priori grouping of the known social bot accounts. If confronted with a single bot account, the BotOrNot detection mechanism provides values ranging from 0.40 to 0.60 in our case, leaving an uncertainty of at least 0.40 when we consider the complete metric scale of [0.0, 1.0]. Although Bessi and Ferrara 19 define scores larger than 0.50 as social bots by trend, this does not provide a sufficient overall score for flagging any of our social bots as such. This underlines our introduced concept of hybrid bots: accounts can no longer be flagged as bots or humans if human interaction is present. In fact, we believe that the interaction of human and automatic behavior is the normal case when we speak of social bots. This should lead to a less binary perspective on social bots and on detection mechanisms.
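One way to read the "uncertainty of at least 0.40" is as the distance of a score from the nearer unambiguous end of the [0.0, 1.0] scale. The following sketch makes this reading explicit; it is an interpretation for illustration, not a procedure defined by BotOrNot, and the function names `distance_to_nearest_extreme` and `label_by_trend` are hypothetical. Only the 0.50 threshold is taken from the text.

```python
def distance_to_nearest_extreme(score):
    """Distance of a score in [0, 1] from the closer end of the scale (0 or 1)."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must lie in [0, 1]")
    return min(score, 1.0 - score)


def label_by_trend(score, threshold=0.50):
    """Apply the 'bot by trend' reading: scores above the threshold lean bot."""
    return "bot by trend" if score > threshold else "not flagged"


# The two ends of the range reported for single bot accounts in the text.
for score in (0.40, 0.60):
    print(score, label_by_trend(score), round(distance_to_nearest_extreme(score), 2))
```

Under this reading, both ends of the reported range are equally far from a clear verdict, although they fall on opposite sides of the 0.50 threshold. This illustrates why a binary flag is of limited use for scores in the middle of the scale.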
Conclusion
With this article, we have contributed an interdisciplinary perspective on social bot taxonomy, degrees of automation, developmental costs, and the benefit and importance of human interaction for making social bots invisible to modern detection mechanisms. In detail, we gave a consolidated definition of social bots and applied it to known variants of automated actors on the web. From a more technical perspective, we provided insight into the implementation and costs necessary to deploy simple but reactive social bots on Twitter. To increase credibility, we extended the simple bot implementation by mimicking human behavior in temporal and operational properties. Content production was left to humans, leading to a hybrid bot network. We experimentally deployed such a network and demonstrated its applicability in principle. Tedious tasks (such as collecting followers, Retweeting, or posting human-prepared content) were automated. Finally, we discussed the costs and current technological limits for full human-like hybridization. Furthermore, by means of an empirical analysis of the Twitter bot experiment and average user data extracted from the Twitter Decahose Stream, we have shown that hybrid social bots are able to bypass important indicators of current rule-based detection mechanisms such as BotOrNot.
Our results reveal several new challenges for future research in social bot detection. The next big challenge for detection systems will be to identify hybrid social bots, which exhibit real human behavior, on the one hand, and automatic patterns in some actions, on the other hand. We assume that rule-based methods will not suffice for these tasks. In fact, adaptive and real-time detection mechanisms, which are able to reconfigure and learn, are necessary to react to changing behavioral patterns almost instantly. In addition, we believe that the inclusion of human interaction in hybrid social bots should shift the focus from purely automated detection systems to hybrid detection systems that are able to judge content, background strategies, and distributed narratives by including human (possibly crowd) intelligence.
Acknowledgments
This work is part of the PropStop project, which is funded by the German Federal Ministry of Education and Research (FKZ 16KIS0495K). The authors are also supported members of the ERCIS network. We also thank Petra Huber and her team for their support in realizing our social bot experiment.
Author Disclosure Statement
No competing financial interests exist.
