A machine learning approach for socialbot targets detection on Twitter

Abstract

In online social networks (OSNs), socialbots are responsible for various malicious activities, and they are mainly programmed to imitate human behavior to bypass existing detection systems. Socialbots are generally successful in their malicious intent because of OSN users who follow them and thereby increase their reputation in the network. Analyzing socialbot networks and their users is vital to understanding the socialbot problem from the target users' perspective. In this paper, we present a machine learning-based approach for characterizing and detecting socialbot targets, i.e., users who are susceptible to being trapped by socialbots. We model OSN users based on their identity and behavior information, representing the static and dynamic components of their personality. The proposed approach classifies socialbot targets into three categories, viz. active, reactive, and inactive users. We evaluate the proposed approach using three classifiers over a dataset collected from a live socialbot injection experiment conducted on Twitter. We also present a comparative evaluation of the proposed approach with a state-of-the-art method and show that it performs significantly better. Feature ablation analysis shows that the network structure and the user intention and personality-related dynamic features are the most discriminative, whereas static features have the least impact on classification. Additionally, following rate, multimedia ratio, and follower rate are the most relevant features for segregating the different categories of socialbot targets. We also perform a detailed topical and behavioral analysis of socialbot targets and find active users to be suspicious. Further, joy and agreeableness are the most dominant emotion and personality trait, respectively, among the three categories of users.

Keywords

Machine learning; Social network analysis; Social network security; User profiling; Socialbots

1 Introduction

Twitter, a popular microblogging service, allows its users to connect with friends, celebrities, and politicians and to subscribe, in real time, to their views on events, personal life updates, etc. Unlike other online social networks (OSNs) that are mainly used for entertainment, Twitter is generally used to discuss current political scenarios and local and global news stories, and to stay updated on the personal life events and views of global leaders and celebrities. On Twitter, a user follows others to subscribe to their activities and content. Benign users generally use OSNs for legitimate reasons. However, the open nature, real-time message broadcasting, anonymity, and easy-to-use functionalities of OSNs have made them attractive to malicious users and their activities. Such malicious users create novel and sophisticated problems, such as astroturfing, fake news, and propaganda [16, 35], which differ significantly from conventional problems like spamming and malware injection [15, 26] in terms of sophistication, scalability, and robustness. Socialbots are one such class of malicious users, and they cause problems for both social media service providers and network users. Socialbots are OSN profiles operated by computer programs that mimic human behavior to resemble real human beings. They are programmed to deceive benign users into following them in order to build trust and reputation in the network. To this end, they exploit the homophily phenomenon to attract similar users, or target users who indiscriminately follow anyone without verification. The existing literature reports various studies conducted in different OSNs to analyze the potential of socialbots to influence user behavior and manipulate the network structure [1–3]. In a seminal work, Boshmaf et al. [1] conducted a socialbot injection experiment on Facebook and observed that 80% of friend requests from socialbots were accepted by the recipients when they had mutual friends. The authors also analyzed the economic feasibility of operating socialbots for large-scale infiltration attacks.

1.1 Motivation

The existing literature offers feature engineering, graph partitioning, and behavior modeling-based methods to tackle the challenging problem of socialbot detection in different OSNs [4–6, 47]. Researchers have also presented community-level decision-making and malicious campaign detection approaches [43, 46]. As detection approaches mature, socialbots also evolve to evade them [8]. Although detecting socialbots and other malicious profiles is important, characterizing and identifying the users who are susceptible to these profiles is also important for the security of both OSNs and their users. Since susceptible users facilitate the reputation-building process of socialbots by following them, profiling these users is an important research problem. Characterizing and identifying such users can help OSNs make users aware of the repercussions of their behavior and alert them when they accept connections from, or initiate connections with, unknown or suspicious users. In the existing literature, the problem of profiling susceptible users is understudied. In the first work of this kind, Wagner et al. [9] presented a machine learning approach for classifying susceptible and non-susceptible users. They utilized linguistic, network, and behavior-based features to characterize the two categories of users. Wald et al. [10] modeled Twitter users who either interacted with a socialbot (followed it) or replied to it (communicated through tweets). They used personality, linguistic, and demographic features to train different classification models for segregating the two categories of users. However, to the best of our knowledge, no existing approach models Twitter users based on their connection formation behavior with socialbots. Therefore, in this study, we categorize Twitter users into three categories: active, reactive, and inactive users.
Moreover, we model the problem as both a ternary and a binary classification problem to analyze the difference in behavior between active and reactive susceptible users and its impact on the trained classification models. We use personality and emotion-based features, which are robust and efficient and, to the best of our knowledge, have not been used in the existing literature for susceptible user profiling. We could have modeled this problem using a deep learning method exploiting network and word embeddings [45], but such a method may not be effective given the small size of our dataset.

1.2 Contributions

This work is an extension of our initial study published in conference proceedings [11]. The substantial extensions over the conference version include (i) a detailed description of the identified features, (ii) a comparative performance evaluation with a state-of-the-art method for both binary and ternary classification, (iii) a feature ablation analysis to investigate the discriminative power of different feature categories, and (iv) a detailed analysis of the behavioral differences among the three categories of users. In brief, we characterize three categories of OSN users based on their identity and behavioral characteristics, such as (i) who they are and (ii) what they tweet. The profile of a Twitter user consists of two types of information. The first is identity-related information, which includes name, age, and Twitter handle. This information either does not change or changes rarely over time. The second represents the online behavior of a user, which is generally dynamic. It includes tweet count, follower count, topic distribution, emotion distribution, and so on. We divide all characterizing features into two categories, static and dynamic, representing these two components of a user profile. Further, using the extracted features, we train three machine learning classifiers to classify users into the active, reactive, and inactive categories. We perform a feature ablation analysis to observe the classification power of each feature category. We also identify the dominant features using two feature selection and ranking algorithms. Finally, we investigate the tweet corpora and other behavior of the three categories of users to identify their topical distribution, suspicious behavior, and tone. In short, the main contributions of this paper can be summarized as follows.

A novel study that categorizes the Twitter users into three categories – active, reactive, and inactive users, based on their connection formation behavior with the socialbots.

Modeling the characterization and detection of susceptible users as both a ternary and a binary classification problem to assess the impact of the difference in behavior between active and reactive users.

Identification of robust and efficient features for effective profiling of susceptible users and a description of the identified features.

An extensive empirical analysis of the features to get insights regarding different behavioral aspects of the active, reactive, and inactive users.

The rest of the paper is organized as follows. Section 2 presents a brief review of the literature on profiling users who are susceptible to socialbots. Section 3 presents the problem definition and some preliminary concepts. Section 4 presents a detailed description of the identified features. Section 5 presents the experimental setup and evaluation results, including a comparative performance evaluation with a state-of-the-art method. Section 6 presents an analysis of the users to investigate behavioral characteristics such as topical distribution, malicious behavior, and emotions embedded in text. Finally, Section 7 summarizes the paper with concluding remarks.

2 Related works

2.1 Online social networks and socialbots

Since their inception, OSNs have faced various cyber threats and issues in the form of spamming, identity theft, cyberbullying, fake news, and so on [12–15]. Adversaries conduct such malicious activities using various forms of fake profiles, such as sock puppets and Sybils. Socialbots are another type of malicious profile, mainly designed and deployed in OSNs to perform astroturfing, spamming, spear phishing, and so on. In the existing literature, researchers have conducted various experiments to observe and analyze socialbots' behavior, their efficacy at infiltrating and manipulating OSNs, and other behavioral impacts [3, 17]. In one such experiment on aNobii, an OSN for book lovers, Aiello et al. [17] demonstrated that a socialbot can easily become one of the most influential users of the network. In another interesting experiment, Elyashar et al. [3] showed that socialbots can also trap technical users by exploiting social engineering tactics such as mutual friends and common profession. This suggests that every OSN user is at risk of socialbot attacks. Boshmaf et al. [18] found that operating a socialbot campaign is economically infeasible. Further, the authors identified three inherent vulnerabilities of the online social networking environment that are exploited by socialbots. Abokhodair et al. [19] unearthed a Twitter social botnet that was tweeting and coordinating among rebels in an anti-government movement. The authors also analyzed the botnet's growth, its content generation behavior in the discourse, and the content-level characteristics that differentiate its bots from benign users.

2.2 Profiling and detection of socialbot targets

A user profile is a set of information that characterizes a user, whereas user profiling is the process of identifying and modeling that information to characterize the user [20]. OSNs generate massive amounts of data, which enables researchers to study various interesting analytical problems, such as human behavior analysis, interest tag identification, and so on [48]. Researchers also use OSN data to track malicious activities and the underlying users. The existing literature offers user profiling strategies for the characterization and detection of different types of malicious entities, such as spammers, sock puppets, and socialbots. Oentaryo et al. [21] presented a feature engineering-based approach to classify Twitter users into broadcasters, consumers, and spambots. Pennacchiotti et al. [22] used linguistic and profile-based features to predict users' political orientation and ethnicity. The authors also investigated the topical distribution of users' content to observe their interests. Esparza et al. [23] presented a profiling approach to filter user timelines based on users' interests. The authors modeled users' interests by analyzing the content of both tweets and the webpages linked by URLs embedded in the tweets. Similarly, various approaches exist to characterize and detect content polluters, spammers, and bots in different OSNs [24–27]. Researchers have also used sequencing techniques to profile users who are compromised by malware and other illicit entities based on activity analysis, such as the first activity after login, browsing sequence, and so on. Inspired by DNA sequencing, Cresci et al. [5] presented a digital DNA sequencing method for profiling individuals' activities to classify social spambots and benign users. They grouped user activities by activity type and encoded each type with a different character.
However, very few studies have analyzed and profiled the users who interact with socialbots. In this direction, Wagner et al. [9] presented a machine learning approach to profile users who had any form of interaction (reply, mention, follow, or retweet) with socialbots, labeling them as susceptible users. They trained six different machine learning classifiers to segregate susceptible and non-susceptible users. Wald et al. [10] profiled users who interacted with (followed) or communicated with (mentioned, replied to, or sent direct messages to) the socialbots. Wald et al. used the same set of features as [9] and trained six different classifiers to classify the interacted and communicated users. To the best of our knowledge, none of the existing approaches categorizes socialbot targets based on their interaction initiation behavior or examines how they differ from unresponsive users. Also, no existing approach analyzes susceptible user profiling as both a ternary and a binary classification problem.

3 Problem definition and preliminaries

3.1 Problem definition

The objective of the proposed approach is to detect users who are vulnerable to socialbots on Twitter. The basic functions of our approach are as follows:

Input: A list of users.

Identification: Socialbot targets, i.e., users that can be trapped by the socialbots.

Classification: Classifying socialbot targets into active, reactive, and inactive users.

In [2], we presented analytical observations from a socialbot injection experiment on Twitter. In that experiment, hundreds of Twitter users created connections and interacted with the socialbots, forming the socialbots' injection network. On analysis, we found that socialbots generally target random users, but two categories of users facilitate them. The first category includes users who initiate a connection with a socialbot by following it; they are termed active socialbot targets. The second category includes users who respond to a socialbot's following activity by following it back; they are termed reactive targets. These two categories differ from users who do not respond to the socialbots' requests, who are termed inactive targets. Therefore, given the two categories of socialbot targets as input, our approach characterizes Twitter users and trains different machine learning models to detect them and segregate them from inactive users. Table 1 lists the symbols used in this paper along with brief descriptions.

Table 1
A list of symbols and their brief description

Symbol	Description
$N$	Socialbot injection network
$U_S$	Set of socialbots
$U_T$	Set of target users
$U$	Aggregate set of socialbots and target users, $U = U_S \cup U_T$
$E$	Set of connections in network $N$
$A$, $R$, $I$	Sets of active, reactive, and inactive targets, respectively
$T_u$	Set of tweets by a user $u$
$t$	A tweet from $T_u$
$T_{u}^{t}(τ)$	Set of $n$ topics extracted from tweet $t$
$T_{u}^{t}(ω)$	Set of relevance scores for the topic set $T_{u}^{t}(τ)$
$T_{u}^{t}(τ_{i}^{m})$	$m$-th level sub-topic of the $i$-th topic $T_{u}^{t}(τ_{i})$
$E_{u}^{i}(ɛ)$	Set of $n$ entities extracted from tweet $t$
$E_{u}^{i}(ω)$	Set of relevance scores for the entity set $E_{u}^{i}(ɛ)$

3.2 Preliminaries

3.2.1 Socialbot targets

Socialbots first create connections with OSN users to build trust in the network. They target users based on certain parameters, and different categories of users respond differently: some users start following socialbots without any request from them, others follow back only after being followed, and others do not respond at all. A network of socialbots and their target users is called a socialbot injection network. Let N(U, E) denote a socialbot injection network, where U is the aggregate set of socialbots U_S and target users U_T such that U = U_S ∪ U_T, and E is the set of connections (including follower and following relations) between them. The three categories of users can then be described as follows.

Active target:

In an injection network N(U, E), a target user u ∈ U_T is considered an active target $A_{u}$ of a socialbot s ∈ U_S if u follows s first. Such a user is called active because he/she readily follows accounts without any offline familiarity or connection request. Among the active targets, some users follow socialbots due to the homophily effect.

Reactive target:

In an injection network N(U, E), a user u ∈ U_T is considered a reactive target $R_{u}$ of a socialbot s ∈ U_S if u follows s back in response to being followed by s. Such a user is called reactive because he/she gets trapped through a push action (being followed).

Inactive target:

In OSNs, benign users generally connect with known users and public figures, whereas malicious users follow and respond to anyone to increase their follower count and, consequently, their network reach. In an injection network N(U, E), if a socialbot s ∈ U_S follows a user u ∈ U_T to get a follow back, but u neither follows back nor starts any kind of interaction with s, then u is called an inactive target $I_{u}$.

On Twitter, every target user can be classified, with respect to the socialbots, into one of the three aforementioned categories. Therefore, in an injection network N(U, E), the set of target users U_T comprises active, reactive, and inactive users.
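The three definitions above can be made operational from timestamped follow events. The following Python sketch is illustrative only: the `FollowEvent` record and the `first_follow` and `label_target` functions are hypothetical names, and the sketch considers only follow relations, so it does not check the additional condition that an inactive target starts no other interaction (such as a reply or mention) with the socialbot. Ties in timestamps are treated as reactive, which is also an assumption.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class FollowEvent:
    """Hypothetical record: `follower` started following `followee` at time `time`."""
    follower: str
    followee: str
    time: float  # e.g., a Unix timestamp


def first_follow(events: List[FollowEvent], follower: str, followee: str) -> Optional[float]:
    """Earliest time `follower` followed `followee`, or None if it never happened."""
    times = [e.time for e in events if e.follower == follower and e.followee == followee]
    return min(times) if times else None


def label_target(user: str, bot: str, events: Iterable[FollowEvent]) -> Optional[str]:
    """Return 'active', 'reactive', or 'inactive' for `user` with respect to `bot`,
    or None if neither followed the other (so `user` is not a target of `bot`)."""
    events = list(events)  # allow a generator to be scanned twice
    user_first = first_follow(events, user, bot)
    bot_first = first_follow(events, bot, user)
    if user_first is not None and (bot_first is None or user_first < bot_first):
        return "active"    # u followed s before (or without) being followed by s
    if user_first is not None:
        return "reactive"  # s followed u first, and u followed back
    if bot_first is not None:
        return "inactive"  # s followed u, but u never followed back
    return None
```

For example, a user who follows a socialbot that later follows him/her back is still labeled active, because the label depends only on who followed first.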

3.2.2 Ethical aspects

In OSNs, various privacy and security issues related to the sharing and exposure of users' information exist. At each stage of the experiment, we tried our best to ensure the privacy of the collected user information. Our socialbot injection experiment, monitoring, and data logging were conducted for academic research purposes, to investigate the impact of socialbots' attributes on Twitter users in different geographies. To comply with Twitter's rules and guidelines, we filtered out all radical and suspicious tweets while crawling tweets from trending topics and followers' timelines. The socialbots were also programmed to retweet only tweets that did not contain any spam or radical keywords. Moreover, quotes stored in the database were manually checked for inflammatory, racially offensive, controversial, or provocative content. Based on the ethical issues discussed by Elovici et al. [28], we identified several ethical aspects that we complied with during and after the injection experiment. Table 2 briefly describes the identified ethical considerations.

Table 2
Ethical consideration and respective compliance

S. no.	Ethical consideration	Compliance
1	User consent	We did not inform the targeted users in advance because that might have changed their behavior. However, at the end of the experiment, the socialbots were suspended.
2	Indirect exposure	We did not crawl information about the followers of the socialbots' followers, so there is no indirect exposure.
3	Exposure of human weaknesses	We identified certain human weaknesses towards socialbots and in accepting friend requests from unknown users.
4	Waste of resources	We did not create too many profiles, to avoid any adverse effect on the Twitter network, such as a change in discourse.
5	Impact on statistics	Since there were only 98 socialbots, they could not noticeably impact the network statistics.
6	Exposure of sensitive information	We injected socialbots using the Twitter API and did not crawl any private or unauthorized user information.
7	Confidentiality	We did not share any private or public information with any third party or organization.
8	Anonymity	The crawled data are logged in an encrypted format to ensure user privacy.

Moreover, our experiment was approved by, and performed under the guidance of, the departmental research committee, of which my doctoral supervisor was a member. We conducted every step of the experiment under the monitoring of the research committee. We did not share any private or public information about the crawled users with any third party or organization, and we will ensure that it is not shared in the future. In the experiment, we tried our best not to violate Twitter's terms of service¹ and privacy policy².

3.2.3 Data collection and semantic enrichment

The dataset associated with the three categories of socialbot targets, viz. active, reactive, and inactive users, was collected from a socialbot injection experiment conducted on Twitter. In the experiment, we injected 98 socialbots associated with the top six countries in terms of Twitter user base: USA, Brazil, UK, Japan, Indonesia, and India. The socialbots were programmed to randomly perform various activities, such as following, tweeting, and retweeting. The socialbot network was in operation for approximately four weeks. A detailed description of the injection experiment and the corresponding statistical results is presented in one of our previous works [2]. Throughout the experiment, the socialbots followed a total of 6,963 users. Among them, 1,248 users followed the socialbots back (reactive users). The remaining 5,715 users are inactive users because they did not acknowledge any socialbot's request. In addition, a total of 1,659 users followed the socialbots without being followed by them first (active users). In total, 2,907 active and reactive users were trapped by the socialbots. In this study, we consider only those trapped users who have at least 200 tweets in their timelines, because we consider 200 tweets sufficient to observe the emotional, personality, and topical behavior of a user. As a result, we have only 611 active and 773 reactive users. From these, we randomly selected 250 users from each of the active and reactive categories, and 250 users from the set of inactive users. Thus, the final dataset has 750 users, comprising 250 users from each of the three categories.
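The following sketch illustrates the filtering and balanced sampling step described above and checks that the reported counts add up. It is an illustration under stated assumptions rather than the authors' pipeline: the input dictionaries and the `build_balanced_dataset` function are hypothetical, the 200-tweet threshold is applied only to the trapped (active and reactive) users as the text states (whether inactive users were filtered the same way is not specified here), and the random seed is arbitrary.

```python
import random

MIN_TWEETS = 200        # minimum timeline size stated in the text
SAMPLE_PER_CLASS = 250  # users drawn from each category

# Sanity check on the counts reported above.
FOLLOWED_BY_BOTS, REACTIVE, INACTIVE, ACTIVE = 6963, 1248, 5715, 1659
assert REACTIVE + INACTIVE == FOLLOWED_BY_BOTS  # each followed user either followed back or not
assert ACTIVE + REACTIVE == 2907                # trapped users = active + reactive


def build_balanced_dataset(users_by_class, tweet_counts,
                           filter_labels=("active", "reactive"), seed=0):
    """Return {label: list of SAMPLE_PER_CLASS user ids}, sampled without replacement.

    users_by_class: hypothetical dict, e.g. {"active": [...], "reactive": [...], "inactive": [...]}
    tweet_counts:   hypothetical dict mapping user id -> number of tweets in the timeline
    filter_labels:  categories to which the MIN_TWEETS threshold is applied
    """
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    dataset = {}
    for label, users in users_by_class.items():
        if label in filter_labels:
            eligible = [u for u in users if tweet_counts.get(u, 0) >= MIN_TWEETS]
        else:
            eligible = list(users)
        if len(eligible) < SAMPLE_PER_CLASS:
            raise ValueError(f"only {len(eligible)} eligible users for {label!r}")
        dataset[label] = rng.sample(eligible, SAMPLE_PER_CLASS)
    return dataset
```

Sampling the same number of users per category yields a balanced dataset, so classifier accuracy is not inflated by predicting the majority class.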

Thereafter, we crawled 200 tweets and the profile information of each of the 750 users in the dataset, and processed them using Natural Language Understanding (NLU³), a natural language processing service from IBM. NLU first semantically enriches a text using an external knowledge base, and then extracts the topics and entities embedded within the enriched text. We used NLU because latent Dirichlet allocation [29] is not effective for short texts. In NLU, the extracted topics have sub-topics that are arranged hierarchically. NLU assigns every extracted topic/entity a score between 0 and 1 describing its relevance to the underlying tweet/document. For example, if T_u represents the set of tweets of user u, then for a tweet t ∈ T_u, NLU extracts topics as $T_{u}^{t} (τ, ω)$, where $T_{u}^{t} (τ)$ represents the set of n topics extracted from t, as given in Equation (1), and $T_{u}^{t} (ω)$ represents the corresponding set of relevance scores. Similarly, NLU extracts entities and their corresponding relevance scores. In $T_{u}^{t} (τ)$, $T_{u}^{t} (τ_{i})$ represents the $i$-th topic of t. Further, each topic $T_{u}^{t} (τ_{i})$ has hierarchical sub-topics, as given in Equation (2), where $T_{u}^{t} (τ_{i}^{m})$ represents the $m$-th level sub-topic of the $i$-th topic $T_{u}^{t} (τ_{i})$. The extracted entities $E_{u}^{i} (ɛ, ω)$ are organized similarly, except that they do not have sub-entities. Additionally, five emotions (anger, fear, joy, sadness, and disgust) are also extracted using NLU. We also extract the Big Five personality traits of all users using Personality Insights⁴, another natural language processing service from IBM.

$T_{u}^{t}(τ) = \{T_{u}^{t}(τ_{1}), T_{u}^{t}(τ_{2}), \ldots, T_{u}^{t}(τ_{n})\}$ (1)

$T_{u}^{t}(τ_{i}) = T_{u}^{t}(τ_{i}^{1}) \supset T_{u}^{t}(τ_{i}^{2}) \supset \cdots \supset T_{u}^{t}(τ_{i}^{m})$ (2)
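The topic hierarchy of Equation (2) can be illustrated with a short sketch. It assumes, hypothetically, that each topic arrives as a slash-delimited path paired with a relevance score, e.g. `("/sports/basketball", 0.9)`, in the style of hierarchical category labels; the function names `topic_chain` and `split_topics` and the exact input format are our assumptions, not the NLU service's API.

```python
from typing import List, Tuple


def topic_chain(label: str) -> List[str]:
    """Expand a slash-delimited topic path into its levels, most general first.

    Level 1 is the broadest topic and each later level is a narrower sub-topic,
    mirroring the nesting in Equation (2), where each level contains the next.
    """
    parts = [p for p in label.strip("/").split("/") if p]
    return ["/" + "/".join(parts[: k + 1]) for k in range(len(parts))]


def split_topics(scored_topics: List[Tuple[str, float]]):
    """Split (label, relevance) pairs into the topic set T(tau) and the score set T(omega)."""
    taus, omegas = [], []
    for label, score in scored_topics:
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"relevance score out of [0, 1]: {score}")
        taus.append(topic_chain(label))
        omegas.append(score)
    return taus, omegas
```

For instance, `topic_chain("/sports/basketball")` yields `["/sports", "/sports/basketball"]`, i.e., $T_{u}^{t}(τ_{i}^{1})$ is the broad topic and $T_{u}^{t}(τ_{i}^{2})$ its sub-topic.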

4 Features identification

This section presents the user characterization approach for profiling users who are susceptible to becoming socialbots' victims. Since trapped users aid the trust-building process of socialbots and eventually make them successful in OSNs, it is vital to characterize and profile trapped users to differentiate them from other users of the network who do not respond to socialbots' following requests. In OSNs, identity-related information associated with a user hardly changes, whereas behavior-related information changes frequently over time. Therefore, we divide a user's information into static and dynamic components. Figure 1 presents a schematic representation of the proposed approach. A brief discussion of the static and dynamic components, along with the different categories of features, is presented in the following sub-sections.

Fig. 1 Workflow of the proposed approach for classifying socialbot targets on Twitter.

4.1 Static features

In the real world, people identify a person through name, gender, etc., which generally do not change over time. Similarly, users provide some basic identity-related information on OSNs. However, no universally accepted mechanism exists for automatically verifying this information. Further, due to usability issues and fear of losing users, OSNs do not impose strict measures on user registration and operation. As a result, OSNs exhibit various vulnerabilities that malicious users abuse for illicit activities, such as fake profile creation [1]. Based on users' profile information, the following sub-section describes the static component, online identity (who you are?).

4.1.1 Online identity (who you are?)

On OSNs, a user verifies the sender of a friend request using the sender's publicly available information, to ensure the sender's identity. OSNs use different sets of personal information to construct a user profile and allow users to customize the visibility of their personal information to the extent they are comfortable with. Unlike other OSNs, Twitter requires only limited information, such as a name and a Twitter handle, for new account registration. In the proposed approach, the static component for user profiling includes a set of 9 features that are briefly summarized in Table 3. In the static category, 5 of the 9 features are based on users' bio descriptions and Twitter handles. The static features are generally straightforward to compute. The name and handle similarity is calculated using Jaro similarity [31], a string matching measure for comparing two short strings. The Jaro similarity J_s between two strings st₁ and st₂ can be computed using Equation 3, where m is the number of matching characters (identical characters no more than ⌊max(|st₁|, |st₂|)/2⌋ − 1 positions apart) and tr is half the number of transpositions.

Table 3
Static features and their descriptions

Feature name	Description
Timespan	Number of days between the date the user joined Twitter and the crawling date
Geo-enable-status	Whether the geo-enabled feature of the user's account is active
Profile description length	Number of characters in the user's profile description
Special character count in profile description	Number of special characters in the profile description
Special character ratio in profile description	Ratio of the number of special characters to the total number of characters in the profile description
Profile image status	Whether the user has a profile image
Handle length	Number of characters in the user's Twitter handle
Special character ratio in handle	Ratio of the number of special characters to the total number of characters in the user's handle
Name and handle similarity	Jaro similarity between the user's name and handle

$J_{s} = \begin{cases} 0 & \text{if } m = 0 \\ \frac{1}{3} \left( \frac{m}{|st_{1}|} + \frac{m}{|st_{2}|} + \frac{m - tr}{m} \right) & \text{otherwise} \end{cases}$ (3)
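The Jaro similarity of Equation 3 can be sketched in Python as follows. The sketch assumes the standard Jaro matching rule (two characters match only if they are equal and no more than ⌊max(|st₁|, |st₂|)/2⌋ − 1 positions apart), which the text does not spell out, and the function name `jaro_similarity` is illustrative. Whether names and handles are normalized (e.g., lowercased) before comparison is not stated, so no normalization is applied here.

```python
def jaro_similarity(st1: str, st2: str) -> float:
    """Jaro similarity J_s of Equation (3); returns 0 when no characters match."""
    len1, len2 = len(st1), len(st2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Standard Jaro matching window (assumed; not stated in the text).
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    m = 0  # number of matching characters
    for i, ch in enumerate(st1):
        for j in range(max(0, i - window), min(i + window + 1, len2)):
            if not matched2[j] and st2[j] == ch:
                matched1[i] = matched2[j] = True
                m += 1
                break
    if m == 0:
        return 0.0
    # Matched characters in the order they appear in each string.
    seq1 = [c for c, hit in zip(st1, matched1) if hit]
    seq2 = [c for c, hit in zip(st2, matched2) if hit]
    # tr: half the number of matched characters that appear in a different order.
    tr = sum(a != b for a, b in zip(seq1, seq2)) / 2
    return (m / len1 + m / len2 + (m - tr) / m) / 3


print(round(jaro_similarity("MARTHA", "MARHTA"), 4))  # 0.9444
```

For the classic pair MARTHA/MARHTA, m = 6 and tr = 1 (the swapped T and H), giving (1 + 1 + 5/6)/3 ≈ 0.9444.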

4.2 Dynamic features

We generally use static attributes of users for identity verification. However, apart from identity, the static attributes do not have any notable impact on the position of a user in the network. In an OSN, users are distinguished by their behavior. A user (malicious or benign) can easily manipulate static features by adjusting profile attributes as required, but her activities in the network reflect her offline behavior and her intent in joining the network. In this study, we model dynamic behavior, such as the network structure, tweet content, and topical distribution of a user, using a set of dynamic features. However, modeling user characteristics is difficult due to complex behavioral dynamics, and in OSNs it is even more challenging due to informal writing, code-mixed languages, noisy content, and the lack of efficient multilingual NLP tools. The dynamic features, $D_{u}$, can be grouped into four categories, namely, “textual preference (what you tweet?)” denoted by $D_{u}^{T}$, “interaction methods (how you tweet?)” denoted by $D_{u}^{I}$, “user intention and personality (why you tweet?)” denoted by $D_{u}^{P}$, and “network structure (whom you connect?)” denoted by $D_{u}^{N}$, as given in Equation (4). The following sub-sections briefly describe each of these dynamic feature categories.

$D_{u} = D_{u}^{T} \cup D_{u}^{I} \cup D_{u}^{P} \cup D_{u}^{N}$ (4)

4.2.1 Textual preference (what you tweet?)

In the real world, every individual has a distinct set of characteristics, and people differ from each other in writing style, language skills, lifestyle, and so on. In terms of textual preference, people from different professions have different language and writing styles. For example, journalists and bloggers are generally proficient writers who use long sentences to express their views on current affairs; advertisers post product- and service-related content; and normal users generally post about their daily life events. The textual preference category captures such characteristics of an individual using six features that reflect her linguistic preferences. The features of a user u in this category are represented by $D_{u}^{T}$ and briefly described in Table 4. Most features of this category are straightforward and are calculated on a set of 200 tweets for each user. The tweet similarity between the tweets of a user u is computed using the bag-of-words model: each tweet is converted into a word vector, and the Cosine similarity is then computed between each pair of tweets. If T_u represents the set of tweets of u, then to find the Cosine similarity between a pair of tweets t_i and t_j, both tweets are first tokenized and stop-words are filtered out, and then an aggregate vocabulary combining the words from both tweets is created. Thereafter, each tweet is converted into a binary vector over this vocabulary, wherein a value is 1 if the corresponding word occurs in the tweet and 0 otherwise. The Cosine similarity between the vectors of a pair of tweets is calculated using Equation 5, where $V_{u}^{i}$ and $V_{u}^{j}$ represent the numeric vector representations of tweets t_i and t_j, respectively. Finally, the tweet similarity of u is calculated as the average Cosine similarity over all tweet pairs, as given in Equation 6.
We do not present a detailed description of the other features of this category because they have already been described in Table 4.

Table 4
Textual preference-based features and their brief description

Feature name	Description
Tweet similarity	It is the average similarity between every pair of tweets
Average tweet length	It represents the average length of all the tweets
Tweet length variance	It measures the variance of the tweet lengths
Multimedia ratio in tweets	It is the ratio of the number of audio, video, and image items in tweets to the total number of tweets
Advertising keyword ratio	It is the ratio of the number of advertising keywords, identified using a dictionary, to the total number of tweets
URL ratio	It is the ratio of the number of URLs to the total number of tweets

$C(t_{i}, t_{j}) = \frac{V_{u}^{i} \cdot V_{u}^{j}}{\lVert V_{u}^{i} \rVert \, \lVert V_{u}^{j} \rVert}$ (5)

$TS (u) = \frac{2 \times \sum_{i = 1}^{| T_{u} |} \sum_{j = i + 1}^{| T_{u} |} C (t_{i}, t_{j})}{| T_{u} | (| T_{u} | - 1)}$ (6)
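A minimal sketch of the tweet similarity feature (Equations 5 and 6), assuming simple regex tokenization and a hypothetical, deliberately tiny stop-word list; the paper does not specify its tokenizer or stop-word list, and the helper names are illustrative.

```python
import re
from itertools import combinations
from math import sqrt

# Hypothetical, deliberately tiny stop-word list; the text does not name the list used.
STOP_WORDS = {"a", "an", "the", "is", "are", "to", "of", "and", "in", "on", "for", "at"}


def tokenize(tweet: str) -> set[str]:
    """Lowercase, split on non-word characters, and drop stop-words (assumed preprocessing)."""
    return {tok for tok in re.findall(r"\w+", tweet.lower()) if tok not in STOP_WORDS}


def cosine(t_i: str, t_j: str) -> float:
    """Equation (5) on binary vectors built over the combined vocabulary of the two tweets."""
    w_i, w_j = tokenize(t_i), tokenize(t_j)
    vocab = sorted(w_i | w_j)
    v_i = [1 if w in w_i else 0 for w in vocab]
    v_j = [1 if w in w_j else 0 for w in vocab]
    dot = sum(a * b for a, b in zip(v_i, v_j))
    norm = sqrt(sum(v_i)) * sqrt(sum(v_j))  # for 0/1 vectors, ||v||^2 is the count of ones
    return dot / norm if norm else 0.0


def tweet_similarity(tweets: list[str]) -> float:
    """Equation (6): average cosine similarity over all unordered pairs of tweets."""
    n = len(tweets)
    if n < 2:
        return 0.0  # undefined for fewer than two tweets; 0.0 is an assumed convention
    total = sum(cosine(t_i, t_j) for t_i, t_j in combinations(tweets, 2))
    return 2 * total / (n * (n - 1))


print(round(tweet_similarity(["cats are great", "dogs are great", "birds fly"]), 4))  # 0.1667
```

In the example, only the first two tweets share a word ("great"), so one of the three pairs scores 0.5 and the average is 0.5/3 ≈ 0.1667; the factor 2/(|T_u|(|T_u| − 1)) in Equation 6 is exactly one over the number of unordered pairs.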

4.2.2 Interaction method (how you tweet?)

In OSNs, users exhibit distinct behavior while interacting with the platform. Some users actively tweet and retweet, whereas others actively use the network but rarely tweet; such users use the social network only as consumers of information [32–34]. The existing literature contains various spammer, spambot, and socialbot detection techniques that exploit interaction-based user characteristics [31]. In a user profile, this category of features represents the interaction behavior of the user rather than her access methods. This category includes three features to model the interaction behavior of a user. The features of a user u in this category are denoted by $D_{u}^{I}$ and briefly described in Table 5. Like the previous features, these features are straightforward and easy to understand from the brief descriptions given in Table 5.

Table 5
Interaction method-based features and their brief description

Feature name	Description
Tweet rate	It represents the number of tweets per day by a user
Retweet rate	It represents the number of retweets per day by a user
Language count	It is the number of languages used in the tweets of a user
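The interaction features in Table 5 can be sketched as below. The `Tweet` record and its fields are hypothetical. The text does not say over which period the per-day rates are measured, so this sketch assumes the window from the oldest to the newest collected tweet (at least one day), and it counts only original tweets toward the tweet rate.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Tweet:
    """Hypothetical minimal tweet record; field names are illustrative."""
    created_at: datetime
    is_retweet: bool
    lang: str  # language code attached to the tweet (assumed available)


def interaction_features(tweets: list[Tweet]) -> dict[str, float]:
    """Tweet rate, retweet rate, and language count (Table 5) for one user."""
    if not tweets:
        return {"tweet_rate": 0.0, "retweet_rate": 0.0, "language_count": 0.0}
    times = [t.created_at for t in tweets]
    # Assumed observation window: oldest to newest collected tweet, at least one day.
    days = max((max(times) - min(times)).total_seconds() / 86400, 1.0)
    n_retweets = sum(t.is_retweet for t in tweets)
    n_tweets = len(tweets) - n_retweets  # original tweets only (assumption)
    return {
        "tweet_rate": n_tweets / days,
        "retweet_rate": n_retweets / days,
        "language_count": float(len({t.lang for t in tweets})),
    }


sample = [
    Tweet(datetime(2020, 1, 1), False, "en"),
    Tweet(datetime(2020, 1, 2), True, "en"),
    Tweet(datetime(2020, 1, 3), False, "hi"),
    Tweet(datetime(2020, 1, 5), False, "en"),
]
print(interaction_features(sample))
# {'tweet_rate': 0.75, 'retweet_rate': 0.25, 'language_count': 2.0}
```

The sample spans four days with three original tweets and one retweet in two languages.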

4.2.3 User intention and personality (why you tweet?)

In cyberspace, different OSNs exist for different categories of users, such as ResearchGate for researchers, Instagram for photo lovers, and so on. Moreover, within a single OSN like Twitter, users join and use the network for distinct reasons. For example, some users join Twitter to connect with their favorite celebrities and subscribe to their content, whereas others join it merely for entertainment. Companies, on the other hand, use Twitter to connect with customers, inform them about the latest updates, and get feedback on products and services. Above all, Twitter is widely used as a real-time news source and to follow politicians’ and celebrities’ views on current affairs [2]. Therefore, users’ tweets can be analyzed to observe their behavioral, emotional, and topical inclinations. In this category of features, we analyze users’ tweets to extract the big-five personality traits and emotional attitudes. We extract the big-five personality traits using the Personality Insights⁵ tool and use the dominating trait as a feature. Section 3.2.3 has already described the extraction process for topics, entities, emotional aspects, and personality traits. For each user, we extract all these features from a set of 200 tweets. The features of a user u in the user intention and personality category are denoted by $D_{u}^{P}$; this category consists of 14 attributes that are briefly explained in Table 6. Like the previous features, these features are straightforward and self-explanatory.

Table 6
User intention and personality-based features and their brief description

Feature name	Description
Topic count	It is the number of topics extracted from the 200 tweets of a user
Topic ratio	It represents the number of topics per tweet
Average topic weight	It represents the average relevance score of all the topics discussed in the 200 tweets of a user
Entity count	It represents the number of entities extracted from the 200 tweets of a user
Entity ratio	It represents the number of entities per tweet
Average entity weight	It represents the average relevance score of all the entities discussed in the 200 tweets of a user
Positive to negative sentiment ratio	It is the ratio of the sum of positive sentiment to the sum of negative sentiment from 200 tweets of a user
Sentiment orientation	It is the aggregate sentiment orientation based on the sentiment from all 200 tweets
Anger score	It represents the average anger score over 200 tweets of a user
Fear score	It represents the average fear score over 200 tweets of a user
Joy score	It represents the average joy score over 200 tweets of a user
Sadness score	It represents the average sadness score over 200 tweets of a user
Disgust score	It represents the average disgust score over 200 tweets of a user
Dominating character	It represents the most dominating character among the big-five personality traits from the 200 tweets
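Several Table 6 features are simple aggregates over per-tweet scores. The sketch below assumes that per-tweet emotion scores, per-tweet sentiment scores in [-1, 1], and per-user big-five trait scores have already been obtained (e.g., with the tools mentioned in Section 3.2.3); the function names and input shapes are illustrative, and returning infinity when a user has no negative tweets is an assumed convention.

```python
from statistics import mean

EMOTIONS = ("anger", "fear", "joy", "sadness", "disgust")


def emotion_scores(per_tweet: list[dict[str, float]]) -> dict[str, float]:
    """Average each emotion score over a user's tweets (Anger ... Disgust score in Table 6)."""
    return {e: mean(scores[e] for scores in per_tweet) for e in EMOTIONS}


def pos_neg_ratio(sentiments: list[float]) -> float:
    """Sum of positive scores over the absolute sum of negative scores.

    Assumes per-tweet sentiment scores in [-1, 1]; returns inf when no tweet is negative.
    """
    positive = sum(s for s in sentiments if s > 0)
    negative = -sum(s for s in sentiments if s < 0)
    return positive / negative if negative else float("inf")


def dominating_character(traits: dict[str, float]) -> str:
    """Big-five trait with the highest score (Dominating character in Table 6)."""
    return max(traits, key=lambda name: traits[name])


scores = emotion_scores([
    {"anger": 0.25, "fear": 0.0, "joy": 0.5, "sadness": 0.25, "disgust": 0.0},
    {"anger": 0.25, "fear": 0.5, "joy": 1.0, "sadness": 0.0, "disgust": 0.0},
])
print(max(scores, key=lambda e: scores[e]))  # joy
print(pos_neg_ratio([0.5, -0.25, 0.5, -0.25]))  # 2.0
traits = {"openness": 0.4, "conscientiousness": 0.6, "extraversion": 0.3,
          "agreeableness": 0.9, "neuroticism": 0.2}
print(dominating_character(traits))  # agreeableness
```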

4.2.4 Network structure (whom you connect?)

The ultimate goal of malicious users such as socialbots, spammers, and sock puppets is to maximize their number of followers. Among malicious users, socialbots exploit tactics such as common interests and mutual friends to attract followers and friends. In the process, socialbots generally trap other illicit users as well [2]. Therefore, analyzing the friendship network of socialbots is essential to monitor their fraudulent behavior. This category of features is denoted by $D_{u}^{N}$, and it represents the connection formation behavior of users who are susceptible to becoming victims of socialbots. Table 7 briefly describes each feature of this category. Note that the follower rate and following rate of a user are computed by dividing the user’s followers and followings counts by the Timespan, as defined in Table 3.

Table 7
Network structure-based features and their brief description

Feature name	Description
Follower rate	It is the number of followers gained by a user per day
Following rate	It is the number of users followed by a user per day
Follower to following ratio	It is the ratio of the number of followers to the number of followings of a user
#mentions	It is the number of users mentioned in the 200 tweets of a user
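A sketch of the network structure features of Table 7, using the Timespan definition from Table 3. The one-day floor on the Timespan, the clamping of a zero followings count, and counting distinct mentioned users for #mentions are assumptions of this sketch, as are the function names.

```python
import re
from datetime import date


def network_features(followers: int, followings: int,
                     joined: date, crawled: date) -> dict[str, float]:
    """Follower rate, following rate, and follower-to-following ratio (Table 7)."""
    timespan = (crawled - joined).days  # Timespan, as defined in Table 3
    days = max(timespan, 1)  # guard for accounts created on the crawl date (assumption)
    return {
        "follower_rate": followers / days,
        "following_rate": followings / days,
        # Denominator clamped to 1 when a user follows nobody (assumption).
        "follower_to_following_ratio": followers / max(followings, 1),
    }


def mention_count(tweets: list[str]) -> int:
    """Number of distinct @-mentioned users across a user's tweets (distinctness assumed)."""
    return len({m.lower() for t in tweets for m in re.findall(r"@(\w+)", t)})


print(network_features(300, 600, date(2020, 1, 1), date(2020, 1, 31)))
# {'follower_rate': 10.0, 'following_rate': 20.0, 'follower_to_following_ratio': 0.5}
print(mention_count(["hi @alice and @Bob", "@alice again"]))  # 2
```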

5 Experimental setup and results

This section describes the performance evaluation techniques, including classification model training and evaluation, comparative analysis, feature ablation analysis, feature ranking, and finally behavior analysis. The performance evaluation of the proposed approach is performed using three different classification models and four standard evaluation metrics over the dataset described in Section 3.2.3.

5.1 Classifiers and evaluation metrics

The proposed approach first formulates socialbot target profiling as a ternary classification problem, i.e., classifying users into three categories. We evaluate our approach using three multi-class machine learning classifiers, namely, naive Bayes, REP decision tree, and random forest. Naive Bayes is a probability-based classifier that assigns the feature vector of an instance to the most likely class using Bayes’ theorem, a statistical technique for computing the posterior probability of an event based on prior knowledge [36]. The REP decision tree first builds a decision tree and then prunes it to remove insignificant branches using the reduced error pruning approach [37]. Random forest is an ensemble classifier that combines the results of multiple decision trees whose randomization is drawn independently from the same distribution. The proposed approach uses 100 trees in the random forest-based classification. To predict the class of an instance using random forest, its feature vector is first passed to each decision tree of the forest, and majority voting is then used to determine its class label [38].
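The random forest prediction step described above (pass the feature vector to every tree, then take a majority vote) can be sketched as follows. Trees are modeled as hypothetical callables that map a feature vector to a label; growing the trees (each on its own bootstrap sample, with random feature selection) is left to a library such as Weka, and only the bootstrap sampling helper is shown. The tie-breaking rule and the toy threshold rules are assumptions of this sketch.

```python
import random
from collections import Counter
from typing import Callable, Sequence

FeatureVector = Sequence[float]
Tree = Callable[[FeatureVector], str]  # a trained tree maps a feature vector to a class label


def bootstrap_indices(n: int, rng: random.Random) -> list[int]:
    """Sample n training indices with replacement, one such sample per tree."""
    return [rng.randrange(n) for _ in range(n)]


def forest_predict(trees: Sequence[Tree], x: FeatureVector) -> str:
    """Pass x to every tree and return the majority-vote label."""
    votes = Counter(tree(x) for tree in trees)
    # Ties go to the label encountered first (Counter.most_common ordering).
    return votes.most_common(1)[0][0]


# Hypothetical toy "trees": threshold rules on [following rate, URL ratio].
toy_trees = [
    lambda x: "active" if x[0] > 10 else "inactive",
    lambda x: "active" if x[0] > 20 else "reactive",
    lambda x: "active" if x[1] > 0.3 else "inactive",
]
print(forest_predict(toy_trees, [22.35, 0.5]))  # active
print(forest_predict(toy_trees, [5.0, 0.1]))  # inactive
```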

We have evaluated the proposed approach using four standard evaluation metrics – true positive rate (TPR), false positive rate (FPR), Precision, and F-Measure. The TPR (aka Recall) is the fraction of correctly classified active or reactive users from the set of all active or reactive users. Mathematically, it is defined using Equation (7), where TP represents the number of correctly classified active or reactive users and FN represents the number of active or reactive users misclassified as inactive users. Similarly, FPR is the fraction of inactive users misclassified as active or reactive users, as defined in Equation (8), where FP represents the number of inactive users misclassified as active or reactive users, and TN represents the number of correctly classified inactive users. Precision of a classification model is the fraction of the correctly classified active or reactive users from the set of users classified as active or reactive, as defined in Equation (9). Finally, F-Measure of a classification model is the harmonic mean of the precision and recall (TPR), which is defined in Equation (10).

$\text{TPR (Recall)} = \frac{TP}{TP + FN}$ (7)

$\text{FPR} = \frac{FP}{FP + TN}$ (8)

$\text{Precision} = \frac{TP}{TP + FP}$ (9)

$\text{F-Measure} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$ (10)
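Equations 7 to 10 translate directly into code. The sketch below computes them from confusion counts with hypothetical example values; treating a zero denominator as 0.0 is an assumed convention. For the ternary problem, tools typically compute these metrics per class (one class versus the rest) and report a class-frequency-weighted average, an aggregation step that is not spelled out in the text.

```python
def safe_div(num: float, den: float) -> float:
    """Return num / den, or 0.0 when the denominator is zero (assumed convention)."""
    return num / den if den else 0.0


def evaluation_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """TPR (Recall), FPR, Precision, and F-Measure from Equations (7)-(10)."""
    tpr = safe_div(tp, tp + fn)
    fpr = safe_div(fp, fp + tn)
    precision = safe_div(tp, tp + fp)
    f_measure = safe_div(2 * precision * tpr, precision + tpr)
    return {"TPR": tpr, "FPR": fpr, "Precision": precision, "F-Measure": f_measure}


print(evaluation_metrics(tp=80, fp=10, tn=90, fn=20))
```

With these counts, TPR = 0.8, FPR = 0.1, Precision = 8/9, and F-Measure = 16/19 ≈ 0.842.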

5.2 Performance evaluation results

After feature extraction, the three aforementioned classifiers are trained as ternary classifiers using Weka⁶, an open-source tool that provides built-in Java implementations of various machine learning and data mining algorithms. We used 10-fold cross-validation to avoid bias and to ensure that every user participates in both training and testing. 10-fold cross-validation partitions the dataset into ten equal parts, wherein nine parts are used to train the model, which is then tested on the remaining part. This process is repeated ten times so that every user is used for both training and testing. Table 8 presents the evaluation results for all three classification models in terms of the aforementioned evaluation metrics. Among the trained models, the naive Bayes classifier with the useSupervisedDiscretization=TRUE setting performs moderately, with a TPR of 57.7%, as given in Table 8, but performs worst in terms of FPR (0.204). REP decision tree performs worst in terms of TPR and F-Measure, and its Precision (0.562) is nearly identical to that of naive Bayes (0.561), as can be observed from the first row of Table 8. Among the three classification models, random forest with default parameters demonstrates the best performance, with a TPR of 61.4%, as shown in the first row of Table 8. Based on these results, the overall performance of the trained models is modest. We attribute this mainly to the poor classification performance for reactive users, who are significantly similar to active users in terms of behavior and operational characteristics.
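The 10-fold partitioning described above can be sketched as follows, using 750 users (the total in Table 11) as an illustrative size. This version shuffles users once and deals them into ten equal folds; Weka’s own cross-validation also stratifies folds by class for nominal targets, which this sketch omits for brevity. The function name and seed are illustrative.

```python
import random


def k_fold_splits(n_users: int, k: int = 10, seed: int = 1):
    """Yield (train, test) index lists for k-fold cross-validation.

    Users are shuffled once and dealt into k near-equal folds; each fold is the
    test set exactly once while the remaining k - 1 folds form the training set.
    """
    indices = list(range(n_users))
    random.Random(seed).shuffle(indices)
    folds = [indices[f::k] for f in range(k)]
    for f in range(k):
        held_out = set(folds[f])
        train = [i for i in indices if i not in held_out]
        yield train, folds[f]


splits = list(k_fold_splits(750))
print(len(splits), len(splits[0][0]), len(splits[0][1]))  # 10 675 75
```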

Table 8
Performance evaluation results using Naive Bayes, REP Decision Tree, and Random Forest classifiers when trained for a 3-class problem

	Naive Bayes				REP Decision Tree				Random Forest
Feature Set	TPR	FPR	Precision	F-score	TPR	FPR	Precision	F-score	TPR	FPR	Precision	F-score
F	0.577	0.204	0.561	0.569	0.568	0.201	0.562	0.565	0.614	0.199	0.609	0.611
F\Online identity	0.565	0.209	0.556	0.560	0.546	0.224	0.530	0.537	0.601	0.205	0.603	0.601
F\Textual preference	0.551	0.225	0.547	0.545	0.534	0.228	0.533	0.533	0.581	0.214	0.581	0.581
F\Interaction methods	0.571	0.206	0.567	0.567	0.545	0.231	0.543	0.543	0.598	0.195	0.598	0.598
F\User intention and personality	0.567	0.203	0.570	0.568	0.559	0.215	0.561	0.548	0.590	0.201	0.587	0.588
F\Network structure	0.435	0.264	0.429	0.429	0.454	0.246	0.453	0.453	0.549	0.221	0.550	0.549

Since both active and reactive users are socialbot traps, we examine their combined effect on the performance of the classification models. Accordingly, we treat both as a single unit called trapped users and label them as one class, whereas inactive users constitute the second class, modeling trapped user detection as a binary classification problem. We then train the same set of classifiers and perform the experimental evaluation on the same dataset discussed in Section 3.2.3. Table 9 presents the evaluation results in terms of the aforementioned performance evaluation metrics. In the binary setting, random forest demonstrates the best performance in terms of TPR, Precision, and F-Measure, with values of 80.9%, 79.4%, and 80.0%, respectively, but moderate performance in terms of FPR, as shown in the first row of Table 9. Overall, the binary classification models perform significantly better than the ternary formulation of socialbot target detection. This result supports our hypothesis that active and reactive users are significantly similar in terms of working behavior.

Table 9

Performance evaluation results using Naive Bayes, REP Decision Tree, and Random Forest classifiers when trained for a 2-class problem

	Naive Bayes				REP Decision Tree				Random Forest
Feature Set	TPR	FPR	Precision	F-score	TPR	FPR	Precision	F-score	TPR	FPR	Precision	F-score
F	0.769	0.310	0.760	0.764	0.748	0.411	0.736	0.742	0.809	0.361	0.794	0.800
F\Online identity	0.755	0.290	0.765	0.759	0.740	0.415	0.733	0.736	0.805	0.356	0.799	0.802
F\Textual preference	0.764	0.292	0.753	0.758	0.735	0.419	0.722	0.719	0.789	0.372	0.781	0.776
F\Interaction method	0.733	0.301	0.756	0.744	0.744	0.402	0.729	0.736	0.793	0.373	0.785	0.780
F\User intention and personality	0.737	0.283	0.741	0.739	0.738	0.408	0.703	0.720	0.790	0.358	0.789	0.789
F\Network structure	0.707	0.301	0.712	0.709	0.734	0.409	0.719	0.726	0.778	0.345	0.774	0.755

We also performed a feature ablation analysis to assess the discriminative power of each category of features. In this analysis, one category of features is excluded from the feature vector to observe its impact on the performance of the classification models [39]. Accordingly, the aforementioned experiment is repeated five times, once for each excluded category. The results of the feature ablation analysis are presented in the second through sixth rows of Tables 8 and 9. Both tables show that, in terms of TPR, Precision, and F-Score, network structure-based features are the most discriminative for all the classification models, except for the Precision and F-Score of the REP decision tree model on the binary classification problem. This reflects the fact that different types of users have different connection formation behavior, and accordingly they exhibit different network structures. For example, follower-seeking users generally follow back everyone who follows them, and they are also quick to follow users without any prior familiarity with them. The user intention and personality-based features, which include most of the newly defined features (one of the contributions of this study), also show moderate discriminative power. The textual preference-based features also show good classification power for the ternary classification problem, which supports the view that different categories of users exhibit different writing styles and use different vocabulary. The second row of both Tables 8 and 9 shows that static features generally have the least impact on the performance of the trained models. This suggests that users, whether susceptible or not, generally set up their OSN accounts carefully.
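The ablation protocol (the full feature set F, then F with one category removed at a time, giving the six rows of Tables 8 and 9) can be sketched as a generator of feature subsets. The category lists below are shortened, illustrative subsets of Tables 3 to 7, not the full feature vector; each subset would then be passed to the same three classifiers.

```python
# Illustrative subset of each category's features (names from Tables 3-7; lists shortened).
FEATURE_CATEGORIES = {
    "Online identity": ["Timespan", "Handle length", "Name and handle similarity"],
    "Textual preference": ["Tweet similarity", "URL ratio", "Multimedia ratio in tweets"],
    "Interaction methods": ["Tweet rate", "Retweet rate", "Language count"],
    "User intention and personality": ["Topic count", "Joy score", "Dominating character"],
    "Network structure": ["Follower rate", "Following rate", "Follower to following ratio"],
}


def ablation_feature_sets(categories: dict[str, list[str]]):
    """Yield (label, features): the full set F, then F minus one category at a time."""
    all_features = [f for feats in categories.values() for f in feats]
    yield "F", all_features
    for name, feats in categories.items():
        excluded = set(feats)
        yield "F\\" + name, [f for f in all_features if f not in excluded]


for label, feats in ablation_feature_sets(FEATURE_CATEGORIES):
    print(label, len(feats))
```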

5.2.1 Comparative analysis

In the existing literature, the problem of modeling and profiling socialbot targets is understudied. In this direction of research, Wald et al. [40] injected a socialbot on Twitter that trapped 610 users. The authors then extracted different features and trained various machine learning models to segregate the two categories of users. They also performed feature ranking to identify the traits that are most relevant for classifying the two categories of users. This section presents a comparative performance evaluation of our approach against Wald et al. [40], which characterizes and predicts Twitter users who are susceptible to becoming socialbot traps. We compare the two approaches on both the ternary and binary classification problems using the same set of classifiers and evaluation metrics over the dataset described in Section 3.2.3. To this end, we implemented the Wald et al. [10] approach, trained all three classifiers, and recorded the values of the aforementioned evaluation metrics. It should be noted that Wald et al. do not address a ternary classification problem, because they classify users as either interacted or communicated with the socialbots. Therefore, to model it as a ternary classification problem, we added the inactive users as the third class for the Wald et al. approach. Figure 2 presents the comparative evaluation results. It shows that our approach significantly outperforms the Wald et al. [10] method for all classifiers and evaluation metrics in the ternary setting. Among the classifiers, REP decision tree performs best in terms of all four evaluation metrics, and the other two classifiers also perform better than the Wald et al. [10] method. We also compare our approach with the Wald et al. [10] method for the binary classification problem. Figure 3 presents the comparative evaluation results, showing that in this case too our approach outperforms the Wald et al. [10] method for all three classification models.

Fig. 2

Performance comparison results when modeled as ternary classification problem.

Fig. 3

Performance comparison results when modeled as binary classification problem.

5.2.2 Feature ranking

In this section, we aim to identify the most relevant features based on their discriminative power for the classification models. To this end, we perform experiments using two feature ranking algorithms, mutual information (MI) [41] and correlation attribute evaluation (CAE). MI measures the dependency between two random variables based on their joint probability distribution. For a feature F_i and class label l_k, the MI between them is calculated using Equation 11, where f and l represent values of feature F_i and class label l_k, respectively; p (f, l) represents the joint probability that F_i = f and l_k = l, and p (f) and p (l) are the marginal probabilities of the feature and the label, respectively.

$MI(F_{i}, l_{k}) = \sum_{f \in F_{i}} \sum_{l \in l_{k}} p(f, l) \log \frac{p(f, l)}{p(f) \, p(l)}$ (11)
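A sketch of Equation 11 using empirical probabilities estimated from counts. It assumes the feature has already been discretized (continuous features such as rates would need binning first) and uses base-2 logarithms, so the result is in bits; the base is not stated in the text, and the function name is illustrative.

```python
from collections import Counter
from math import log2


def mutual_information(feature_values: list, labels: list) -> float:
    """Equation (11) with empirical probabilities from co-occurrence counts.

    Feature values must be discrete; base-2 logarithm assumed (result in bits).
    """
    n = len(labels)
    joint = Counter(zip(feature_values, labels))
    p_f = Counter(feature_values)
    p_l = Counter(labels)
    mi = 0.0
    for (f, l), count in joint.items():  # pairs with p(f, l) = 0 contribute nothing
        p_fl = count / n
        mi += p_fl * log2(p_fl / ((p_f[f] / n) * (p_l[l] / n)))
    return mi


print(mutual_information(["high", "high", "low", "low"],
                         ["active", "active", "inactive", "inactive"]))  # 1.0
```

A feature that perfectly determines a balanced two-class label carries 1 bit of information, whereas a feature independent of the label gives 0.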

The mutual information determines how similar the joint distribution p (f, l) is to the product of the factored marginal distributions. On the other hand, the CAE ranking algorithm is based on Pearson’s correlation coefficient between the features and class labels, and it is defined using Equation 12, where F_i represents the i^th feature of the feature vector F, and l_k is one of the user categories.

$\rho_{(F_{i}, l_{k})} = \frac{\mathrm{Cov}(F_{i}, l_{k})}{\sigma_{F_{i}} \, \sigma_{l_{k}}}$ (12)
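Equation 12 needs a numeric class variable, so this sketch encodes a single user category l_k as a 0/1 indicator (one category versus the rest) and ranks features by absolute correlation. Both choices are assumptions of the sketch, not necessarily how the CAE implementation used in the paper handles multi-class labels; the function names and feature values in the example are made up for illustration.

```python
from math import sqrt


def pearson(xs: list[float], ys: list[float]) -> float:
    """Equation (12): covariance over the product of standard deviations.

    The 1/n factors in the covariance and standard deviations cancel, so plain sums are used.
    Returns 0.0 if either variable is constant (correlation undefined).
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def class_correlation(feature: list[float], labels: list[str], target: str) -> float:
    """Correlation of a feature with a 0/1 indicator for one user category (assumed encoding)."""
    indicator = [1.0 if label == target else 0.0 for label in labels]
    return pearson(feature, indicator)


def rank_features(features: dict[str, list[float]], labels: list[str], target: str) -> list[str]:
    """Order feature names by absolute correlation with the target category, strongest first."""
    scores = {name: abs(class_correlation(v, labels, target)) for name, v in features.items()}
    return sorted(scores, key=lambda name: scores[name], reverse=True)


labels = ["active", "active", "inactive", "inactive"]
toy = {"Following rate": [22.0, 18.0, 1.0, 2.0], "Handle length": [8.0, 12.0, 12.0, 8.0]}
print(rank_features(toy, labels, "active"))  # ['Following rate', 'Handle length']
```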

Table 10 presents the top-10 features identified by the two ranking algorithms. The following rate, multimedia ratio in tweets, and follower rate are among the most relevant features identified by both ranking algorithms. On examination, we found that active targets have the highest following rate, at 22.35 users per day, and use multimedia content in their tweets most frequently. On the other hand, reactive targets have the highest tweet rate, posting approximately 12 tweets per day. This suggests that users who are actively engaged in the network in terms of following, tweeting, and other activities are more prone to socialbot traps. We can also observe that static profile features are comparatively less relevant for discriminating among the user categories, as only a few of them appear in the top-10 (Timespan for MI; special character ratio in handle and profile description length for CAE), whereas network structure-based features, led by the following rate, are the most relevant for socialbot target detection.

Table 10

Top-10 features selected by two feature ranking algorithms

	Ranking algorithms
Rank	MI	CAE
1	Following rate	Following rate
2	Follower rate	URL ratio
3	Tweet similarity	Multimedia ratio in tweets
4	Follower to following ratio	Special character ratio in handle
5	Multimedia ratio in tweets	Entity ratio
6	Timespan	Tweet similarity
7	Average entity weight	Average entity weight
8	Anger score	Follower rate
9	Advertising keyword ratio	Profile description length
10	Entity ratio	Entity count

6 Analysis

In this section, we discuss several interesting observations based on the content and behavior analysis of the three categories of the socialbot targets. The following sub-sections present a detailed description of different analyses and underlying analytical results.

6.1 Topical analysis

Why does a user join an OSN? A user’s intention in joining a network can be inferred by analyzing the topics discussed in her tweets. The $D_{u}^{P}$ component of the dynamic profile models this aspect of a user’s personality. To this end, we perform a topic analysis of user tweets. We extract the topics from the tweets of each user and sort them in descending order of the relevance scores generated by NLU. Thereafter, the top-10 topics are selected for analysis. In NLU, topics are arranged hierarchically, from high-level topics down to more specific topics at the lower levels. For example, if health and fitness is a high-level topic, then its lower-level sub-topics can be rural health, doctors, and medicine, which can have further fine-grained sub-topics. Figure 4 (reproduced from the conference version of this paper [11]) presents the distributions of the highest-level and lowest-level topics for all three categories of users. On analysis of the 10 highest-level and 10 lowest-level topics, we found that active users frequently post about internet technologies, education, banking, etc., as shown in Figures 4(a) and 4(d). On the other hand, analysis of the lowest-level topics shows that reactive users tweet less frequently than active users.

Fig. 4

Frequency distribution of top-10 highest-level topics discussed by three groups of users (reproduced from [11]) – (a) active users, (b) reactive users, and (c) inactive users, and top-10 lowest-level topics discussed by three groups of users – (d) active users, (e) reactive users, and (f) inactive users

6.2 Suspicious behavior analysis

Are the active and reactive socialbot targets malicious? To answer this question, we analyze and evaluate the two categories of users based on parameters that are established indicators of spamming. First, we analyze the use of URLs and spam words in tweets, because these two indicators are important features in existing spammer detection approaches [9, 42]. On analysis, we found that active and reactive users frequently use URLs in tweets, and thereby have higher URL ratios of 0.50 and 0.39, respectively. On the other hand, inactive users use URLs in their tweets moderately, with a URL ratio of 0.29. Figure 5(a) presents the cumulative distribution of the URL ratio for the three categories of users. This figure shows that approximately 40% of reactive and inactive users have a URL ratio less than 0.20, whereas only a few active users have a URL ratio less than 0.20. It is evident from Figure 5(a) that active and reactive users are more likely to use URLs in their tweets. However, when we analyze the advertising keyword ratio, we find that the three categories of users show nearly identical behavior, as shown in Figure 5(b). Moreover, active and reactive users frequently use images and videos in their tweets and therefore show higher image and video rates, as shown in Figure 5(c). This further suggests the suspicious nature of the active and reactive users. We also analyze connection formation behavior through following activity and find that active users create connections most frequently, as is evident from Figure 5(d). Finally, we analyze the current status of the three categories of users (in terms of whether their accounts still exist, are not found, or are suspended), and the corresponding statistics are given in Table 11.
On Twitter, in response to an API request for a user’s information, error code 63 indicates that the user has been suspended⁷ for violating Twitter’s terms of service, whereas error code 50 indicates that the user has either deleted the account or changed its handle. In terms of suspicious behavior, suspension is more serious than “user not found”, because suspension results from malicious activities, whereas account deletion is generally a personal choice rather than a malicious act. Table 11 shows that active users have the lowest percentage of existing accounts (74.40%) and the highest suspension percentage (15.20%). On the other hand, inactive users have the highest percentage of existing accounts (82.80%), and only 2.40% of them are suspended, indicating fair and benign behavior. Therefore, based on the analyses of engagement and content, active and reactive users appear to be more suspicious than inactive users.
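The status mapping and the percentages in Table 11 can be reproduced with a short sketch. It assumes the error codes returned for each user lookup have already been extracted into a list (the request and response handling is not shown), and the function names are illustrative; the example rebuilds the Active row from its counts.

```python
SUSPENDED = 63  # error code for a suspended account, per the text
NOT_FOUND = 50  # error code for a deleted account or changed handle, per the text


def account_status(error_codes: list[int]) -> str:
    """Map the error codes returned for one user lookup to a Table 11 status."""
    if SUSPENDED in error_codes:
        return "Suspended users"
    if NOT_FOUND in error_codes:
        return "User not found"
    return "Existing users"  # no error: the account still exists


def status_summary(statuses: list[str]) -> dict[str, str]:
    """Count each status and format it as 'count (percent%)', as in Table 11."""
    total = len(statuses)
    return {s: f"{statuses.count(s)} ({100 * statuses.count(s) / total:.2f}%)"
            for s in ("Existing users", "User not found", "Suspended users")}


active = ([account_status([])] * 186 + [account_status([50])] * 26
          + [account_status([63])] * 38)
print(status_summary(active))
# {'Existing users': '186 (74.40%)', 'User not found': '26 (10.40%)', 'Suspended users': '38 (15.20%)'}
```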

Fig. 5

Cumulative distribution of spammy features for three categories of socialbot targets.

Table 11

Account status statistics of socialbot targets

User type	Total users	Existing users	User not found	Suspended users
Active	250	186 (74.40%)	26 (10.40%)	38 (15.20%)
Reactive	250	202 (80.80%)	27 (10.80%)	21 (8.40%)
Inactive	250	207 (82.80%)	37 (14.80%)	6 (2.40%)
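The status breakdown in Table 11 can be derived from user-lookup results with a few lines of code. The sketch below assumes (hypothetically) that each lookup has already been reduced to either `None` for a successful response or the integer error code returned by the API; only codes 63 (suspended) and 50 (user not found) come from the text, and any other code is counted separately as `unknown`. The function names are illustrative.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple

# Error codes described in the text.
SUSPENDED_CODE = 63  # account suspended for violating the terms of service
NOT_FOUND_CODE = 50  # account deleted or handle changed

STATUSES = ("existing", "not_found", "suspended", "unknown")


def account_status(error_code: Optional[int]) -> str:
    """Map a lookup outcome (None = success) to an account status label."""
    if error_code is None:
        return "existing"
    if error_code == SUSPENDED_CODE:
        return "suspended"
    if error_code == NOT_FOUND_CODE:
        return "not_found"
    return "unknown"


def status_table(outcomes: Iterable[Optional[int]]) -> Dict[str, Tuple[int, float]]:
    """Count each status and express it as a percentage of all looked-up users."""
    counts = Counter(account_status(code) for code in outcomes)
    total = sum(counts.values())
    if total == 0:
        return {s: (0, 0.0) for s in STATUSES}
    return {s: (counts[s], round(100 * counts[s] / total, 2)) for s in STATUSES}


if __name__ == "__main__":
    # Outcomes matching the "Active" row of Table 11.
    active = [None] * 186 + [NOT_FOUND_CODE] * 26 + [SUSPENDED_CODE] * 38
    for status, (count, pct) in status_table(active).items():
        print(f"{status}: {count} ({pct:.2f}%)")
```

Feeding in the counts of the Active row reproduces its percentages (74.40%, 10.40%, and 15.20%), with no `unknown` outcomes.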

6.3 Tone analysis

In addition to the topical and suspicious behavior analyses, this section analyzes the personality traits and tone of the three categories of users. In terms of emotional characteristics, the three categories of users behave similarly; however, some emotions are more prevalent than others. We found joy to be the most dominant emotion in all three groups of users. Figures 6(a), 6(b), and 6(c) show the cumulative distributions of three emotion scores, viz. joy, sadness, and anger, respectively, and confirm that joy is the most dominant emotion. Similarly, we analyze the Big Five personality traits and found that agreeableness is the most dominant personality trait and is equally prevalent among the three categories of users, as shown in Figure ??. This figure shows that approximately 90% of the users have an agreeableness score greater than 0.5. The dominance of agreeableness indicates that sympathetic, warm, and considerate behavior is prevalent among Twitter users; socialbots can abuse this tendency to gain followers and likes on their tweets, eventually building trust in the network.
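The two quantities discussed in this section, the most dominant emotion and the share of users above an agreeableness threshold, can be computed as sketched below. This is an illustrative assumption rather than the paper's procedure: each user is represented by a hypothetical dictionary of scores in [0, 1] (as produced by some tone-analysis tool, which we do not name), and emotions are ranked by mean score, whereas the paper compares cumulative distributions. The user records and function names are invented for illustration.

```python
from statistics import mean
from typing import Dict, List, Sequence


def dominant_dimension(users: Sequence[Dict[str, float]], keys: Sequence[str]) -> str:
    """Return the key (emotion or trait) with the highest mean score across users."""
    means = {k: mean(u[k] for u in users) for k in keys}
    return max(means, key=means.get)


def share_above(users: Sequence[Dict[str, float]], key: str, threshold: float = 0.5) -> float:
    """Fraction of users whose score for `key` is strictly greater than `threshold`."""
    if not users:
        return 0.0
    return sum(1 for u in users if u[key] > threshold) / len(users)


if __name__ == "__main__":
    # Hypothetical per-user scores.
    users: List[Dict[str, float]] = [
        {"joy": 0.72, "sadness": 0.21, "anger": 0.10, "agreeableness": 0.81},
        {"joy": 0.64, "sadness": 0.30, "anger": 0.18, "agreeableness": 0.47},
        {"joy": 0.58, "sadness": 0.25, "anger": 0.22, "agreeableness": 0.66},
    ]
    print(dominant_dimension(users, ["joy", "sadness", "anger"]))
    print(share_above(users, "agreeableness"))
```

Ranking by the mean is a simplification; a CDF comparison, as in Figure 6, also reveals whether one distribution lies to the right of another across the whole range.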

Fig. 6

Cumulative distribution of emotion scores for three categories of socialbot targets.

7 Conclusion and future work

In this paper, we have presented a machine learning approach for characterizing and detecting socialbot targets on Twitter. The proposed approach is modeled as a ternary classification problem that categorizes socialbot targets into three groups, viz. active, reactive, and inactive users, and profiles them using static and dynamic characteristics based on their identity, interaction, textual, and personality-based attributes. We have also modeled socialbot target detection as a binary classification problem, wherein active and reactive users are combined into a single category (termed trapped users) and inactive users form the second category. The performance of our approach is compared with a state-of-the-art method for detecting susceptible users. We have also performed a feature ablation analysis to assess the discriminative power of each feature category in segregating socialbot targets. The empirical analysis shows that network structure-based features are the most discriminative, whereas identity-based features are the least discriminative. We have also analyzed the relevance of all identified features and found that following rate, multimedia ratio in tweets, and follower rate are the most relevant features for detecting socialbot targets. Further, we have presented a detailed topical and behavioral analysis of the socialbot targets, which shows that active users are suspicious, that joy is the most dominant emotion, and that agreeableness is the most dominant personality trait among the three categories of users. Analyzing the temporal evolution of trapped users and their ecosystem is an interesting direction for extending the work reported in this paper.
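The binary reformulation summarized above amounts to a label mapping applied before training. The sketch below shows that mapping and the resulting class counts; the string labels are a hypothetical encoding of our own, and the count of 250 users per category mirrors Table 11. Merging two of three equal-sized categories yields a 2:1 ratio of trapped to inactive users, which may be worth keeping in mind when choosing evaluation metrics.

```python
from collections import Counter
from typing import Iterable, List

# Ternary labels that are merged into the single "trapped" category.
TRAPPED_SOURCES = {"active", "reactive"}


def to_binary(label: str) -> str:
    """Map a ternary label to the binary setting: trapped vs. inactive."""
    if label in TRAPPED_SOURCES:
        return "trapped"
    if label == "inactive":
        return "inactive"
    raise ValueError(f"unexpected label: {label!r}")


def binarize(labels: Iterable[str]) -> List[str]:
    """Apply `to_binary` to every label, preserving order."""
    return [to_binary(label) for label in labels]


if __name__ == "__main__":
    ternary = ["active"] * 250 + ["reactive"] * 250 + ["inactive"] * 250
    print(Counter(binarize(ternary)))
```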


References

1. Boshmaf, Muslukhov, Beznosov and Ripeanu, The Socialbot Network: When Bots Socialize for Fame and Money. In Proceedings of the 27th ACM Annual Computer Security Applications Conference, Orlando, USA, (2011), 93–102.

2. Fazil and Abulaish, Why a Socialbot is Effective in Twitter? A Statistical Insight. In Proceedings of the 9th International Conference on Communication Systems and Networks, Social Networking Workshop, Bengaluru, India, (2017), 562–567.

3. Elyashar, Fire, Kagan and Elovici, Homing Socialbots: Intrusion on a Specific Organization’s Employee using Socialbots. In Proceedings of the International Conference on Advances in Social Networks Analysis and Mining, Niagara Falls, Canada, (2013), 1358–1365.

4. Chavoshi, Hamooni and Mueen, DeBot: Twitter Bot Detection via Warped Correlation. In Proceedings of the 16th IEEE International Conference on Data Mining, Barcelona, Spain, (2016), 817–822.

5. Cresci, Pietro, Petrocchi, Spognardi and Tesconi, DNA-Inspired Online Behavioral Modelling and Its Application to Spambot Detection, IEEE Intelligent Systems 31(5) (2016), 58–64.

6. Pan, Liu and Hu, Discriminating Bot Accounts based Solely on Temporal Features of Microblog Behavior, Physica A 450(3) (2016), 193–204.

7. Davis, Varol, Ferrara, Flammini and Menczer, BotOrNot: A System to Evaluate Social Bots. In Proceedings of the 25th ACM International Conference on World Wide Web, Montreal, Canada, (2016), 273–274.

8. …, He, Jiang, Cao and Li, Combating the Evasion Mechanisms of Social Bots, Computers & Security 58(3) (2016), 230–249.

9. Wagner, Mitter, Korner and Strohmaier, When Social Bots Attack: Modeling Susceptibility of Users in Online Social Networks. In Proceedings of the WWW12 Workshops on Making Sense of Microposts, Lyon, France, (2012), 41–48.

10. Wald, Khoshgoftaar T.M., Napolitano and Sumner, Which Users Reply to and Interact with Twitter Social Bots? In Proceedings of the 25th IEEE International Conference on Tools with Artificial Intelligence, Herndon, USA, (2013), 135–144.

11. Fazil and Abulaish, Identifying Active, Reactive, and Inactive Targets of Socialbots in Twitter. In Proceedings of the 16th ACM International Conference on Web Intelligence, Leipzig, Germany, (2017), 573–580.

12. Bindu and Thilagam, Mining Social Networks for Anomalies: Methods and Challenges, Journal of Network and Computer Applications 68(2) (2016), 213–229.

13. Ahmed and Abulaish, A Generic Statistical Approach for Spam Detection in Online Social Networks, Computer Communications 36(10) (2013), 1120–1129.

14. Zhou, Zhang, Sun, Zheng and Liu, Analyzing and Modeling Dynamics of Information Diffusion in Microblogging Social Network, Journal of Network and Computer Applications 86(2) (2017), 92–102.

15. Chen, Wen, Zhang, Xiang, Oliver, Alelaiwi and Hasan, Investigating the Deceptive Information in Twitter Spam, Future Generation Computer Systems 72(6) (2017), 319–326.

16. Zhang, Zhang and Yan, On the Impact of Social Botnets for Spam Distribution and Digital Influence Manipulation. In Proceedings of the 6th IEEE International Conference on Communications and Network Security, National Harbor, USA, (2013), 46–54.

17. Aiello L.M., Deplano, Schifanella and Ruffo, People are Strange when you’re a Stranger: Impact and Influence of Bots on Social Networks. In Proceedings of the 6th International Conference on Weblogs and Social Media, Dublin, Ireland, (2012), 10–17.

18. Boshmaf, Muslukhov, Beznosov and Ripeanu, Design and Analysis of Social Botnet, Computer Networks 57(2) (2013), 556–578.

19. Abokhodair, Yoo and McDonald D.W., Dissecting a Social Botnet: Growth, Content and Influence in Twitter. In Proceedings of the 18th ACM Conference on Computer Supported Cooperative Work and Social Computing, Vancouver, Canada, (2015), 839–851.

20. Kuflik and Shoval, Generation of User Profiles for Information Filtering-Research Agenda. In Proceedings of the 23rd ACM International Conference on Research and Development in Information Retrieval, Athens, Greece, (2000), 313–315.

21. Oentaryo R.J., Murdopo, Prasetyo P.K. and Lim, On Profiling Bots in Social Media. In Proceedings of the International Conference on Social Informatics, Bellevue, USA, (2016), 92–109.

22. Pennacchiotti and Popescu, A Machine Learning Approach to Twitter User Classification. In Proceedings of the 5th AAAI International Conference on Weblogs and Social Media, Barcelona, Spain, (2011), 281–288.

23. Esparza S.G., Mahony M.P. and Smyth, CatStream: Categorising Tweets for User Profiling and Stream Filtering. In Proceedings of the ACM International Conference on Intelligent User Interfaces, Santa Monica, USA, (2013), 25–36.

24. Boshmaf, Ripeanu, Beznosov and Neto E.S., Thwarting Fake OSN Accounts by Predicting their Victims. In Proceedings of the 8th ACM Workshop on Artificial Intelligence and Security, Denver, USA, (2015), 81–89.

25. Chu, Gianvecchio, Wang and Jajodia, Detecting Automation of Twitter Accounts: Are you a Human, Bot, or Cyborg? IEEE Transactions on Dependable and Secure Computing 9(6) (2012), 811–824.

26. Lee, Eoff B.D. and Caverlee, Seven Months with the Devils: A Long-Term Study of Content Polluters on Twitter. In Proceedings of the 5th AAAI International Conference on Weblogs and Social Media, Barcelona, Spain, (2011), 185–192.

27. Singh, Bansal and Sofat, Behavioral Analysis and Classification of Spammers Distributing Pornographic Content in Social Media, Social Network Analysis and Mining 6(1) (2016), 1–18.

28. Elovici, Fire, Herzberg and Shulman, Ethical Considerations when Employing Fake Identities in Online Social Networks for Research, Science and Engineering Ethics 20(4) (2014), 1027–1043.

29. Blei D.M., Ng A.Y. and Jordan M.I., Latent Dirichlet Allocation, Journal of Machine Learning Research 3(1) (2003), 993–1022.

30. Jaro M.A., Advances in Record-Linkage Methodology as Applied to Matching the Census of Tampa, Florida, Journal of the American Statistical Association 84(406) (1989), 414–420.

31. Kwak, Lee, Park and Moon, What is Twitter, a Social Network or a News Media? In Proceedings of the ACM 19th International Conference on World Wide Web, Raleigh, NC, USA, (2010), 591–600.

32. Chu, Gianvecchio, Wang and Jajodia, Who is Tweeting on Twitter: Human, Bot, or Cyborg? In Proceedings of the 26th ACM Annual Computer Security Applications Conference, Austin, USA, (2010), 21–30.

33. Yang, Harkreader and Gu, Empirical Evaluation and New Design for Fighting Evolving Twitter Spammers, IEEE Transactions on Information Forensics and Security 8(8) (2013), 1280–1293.

34. Amleshwaram A.A., Reddy, Yadav, Gu and Yang, CATS: Characterizing Automation of Twitter Spammers. In Proceedings of the 5th IEEE International Conference on Communication Systems and Networks, Bengaluru, India, (2013), 1–10.

35. Abulaish, Kumari, Fazil and Singh, A Graph-Theoretic Embedding-Based Approach for Rumor Detection in Twitter. In Proceedings of the 18th IEEE/WIC/ACM International Conference on Web Intelligence, Thessaloniki, Greece, (2019), 466–470.

36. Rish, An Empirical Study of the Naive Bayes Classifier. In Proceedings of the 2001 AAAI Workshop on Empirical Methods in Artificial Intelligence, Seattle, USA, (2001), 41–46.

37. Quinlan J.R., C4.5: Programs for Machine Learning, Morgan Kaufmann, (1994).

38. Breiman, Random Forests, Machine Learning 45(1) (2001), 5–32.

39. Fawcett and Hoos H.H., Analysing Differences between Algorithm Configurations through Ablation, Journal of Heuristics 22(4) (2016), 431–458.

40. Wald, Khoshgoftaar T.M., Napolitano and Sumner, Predicting Susceptibility to Social Bots on Twitter. In Proceedings of the 14th IEEE International Conference on Information Reuse and Integration, San Francisco, USA, (2013), 6–13.

41. Mitchell T.R., Machine Learning, McGraw Hill, (1997).

42. Fazil and Abulaish, A Hybrid Approach for Detecting Automated Spammers in Twitter, IEEE Transactions on Information Forensics and Security 13(11) (2018), 2707–2719.

43. Fazil and Abulaish, A Socialbots Analysis-Driven Graph-based Approach for Identifying Coordinated Campaigns in Twitter, Journal of Intelligent & Fuzzy Systems 38(3) (2020), 2961–2977.

44. Beskow D.M. and Carley K.M., Bot-Match: Social Bot Detection with Recursive Nearest Neighbors Search. arXiv, (2020), 1–22.

45. Zhao, Zhou, Li, Tang and Zeng, DeepEmLAN: Deep Embedding Learning for Attributed Networks, Information Sciences 543(7) (2020), 382–397.

46. Chu, Wang, Liu and Liu, Social Network Community Analysis based Large-Scale Group Decision Making Approach with Incomplete Fuzzy Preference Relations, Information Fusion 60(7) (2020), 98–120.

47. Yang K.C., Varol, Hui and Menczer, Scalable and Generalizable Social Bot Detection through Data Selection. In Proceedings of the 34th AAAI International Conference on Artificial Intelligence, New York, USA, (2020), 1096–1103.

48. …, Liu, Zheng, Tang, He and Du, Bi-Labeled LDA: Inferring Interest Tags for Non-famous Users in Social Network, Data Science and Engineering 5(1) (2020), 27–47.