Abstract
The majority of studies on counter-terrorism using social network analysis consider homogeneous networks built over either terrorists or terrorist organizations. However, terrorist attacks are often defined by heterogeneous attributes such as location, time, target type, and organization. This paper constructs a heterogeneous terrorist attack network from such attributes and captures the heterogeneous influences propagated from different attributes using Personalized PageRank. Personalized PageRank is a flexible model capable of propagating supervised information while traversing a network under different personalization parameters. This study investigates various parametric setups to examine the influence of factors such as news media discussion, historical activities, and temporal behavior on one of the important counter-terrorism problems: prediction of future attacks of a terrorist organization. The experimental observations indicate that news media discussion and the network’s temporal behavior have a positive influence on the future activities of a terrorist organization. Further, this paper investigates how various node proximity based link prediction methods respond when predicting future relationships between a terrorist organization and other attributes (such as country, city, and target type). The majority of studies on link prediction using node proximity ignore node importance. However, in a heterogeneous environment, nodes from different classes may have different importance. This paper proposes new variants of four proximity based link prediction methods, namely Adamic Adar, Jaccard Coefficient, Resource Allocation, and Common Neighbor, which are able to incorporate node importance. With suitable experiments, we show that the proposed variants are more accurate at predicting relations than their state-of-the-art counterparts.
Keywords
Introduction
With the increasing relevance of complex network analysis for addressing real-world problems, a surge in the application of social network analysis (SNA) techniques to counter-terrorism and homeland security has been witnessed in recent times [19,36,37]. SNA solutions to problems such as node centrality, edge betweenness, node influence, and community detection have direct relevance to intelligence questions such as (i) identifying the criminals or communication channels whose elimination would break down an organization, (ii) determining hidden relationships between activists and terrorist organizations, and (iii) tracing the systematic evolution of a community. Although analogy between intelligence/counter-terrorism problems and the social network analysis problems is established long before
The majority of earlier works on SNA for counter-terrorism have mainly exploited homogeneous networks (networks with only one class of nodes and one class of ties). For example, networks of terrorists have been used to study
A terrorist attack event is defined by various attributes such as the date and time of the attack, place of attack, type of attack, target type, weapon used, and the terrorist organization involved [31]. This paper considers similar attributes and constructs a heterogeneous attack network using the Global Terrorism Database (GTD1
Several link prediction methods have been proposed in the literature, e.g., proximity based [21], path based [21], tie weight based [29,38], and random walk based [24] methods. Except for a few random walk based approaches, the majority of these methods do not consider node importance while predicting future relations. However, in the context of a heterogeneous network (the terrorist attack network in particular), node importance may play an important role. For instance, a terrorist organization may prefer to attack a country that is prominent across the globe in order to gain more attention. To capture node importance, this paper reformulates the existing proximity based link prediction methods and investigates the effects.
Information diffusion plays an important role in understanding the structural changes of a network. From [29] and [3], it is observed that information propagation in a network is also susceptible to exogenous information, such as news and social media, and to structural changes over time. Moreover, external sources such as news and social media play an important role in spreading the ideology, motivations, and hate messages of terrorist organizations [15,30,34]. Therefore, there is a need to incorporate the effects of external information when mining the terrorist attack network. As stated above, PPR can be used as a single model capable of incorporating different scenarios (for details refer to Section 3.2). We therefore exploit PPR to capture the effects of two types of external information, namely (i) news media and (ii) the temporal dynamics of the underlying network, on the estimation of node centrality. The centrality score obtained from PPR is then used as node importance by reformulating common neighbor (CN), Jaccard Coefficient (JC) [14], Adamic Adar (AA) [1], and resource allocation (RA) [49]. We observe that incorporating temporal dynamics improves the performance of predicting relations between heterogeneous attributes, such as terrorist group to country, city, and targets, whereas news media discussion helps in predicting relations between terrorist organizations.
We summarize main contributions of this paper as follows:
Projecting personalized PageRank as a potential solution to capture various heterogeneous scenarios (without changing underlying model) while estimating network centrality.
Projecting personalized PageRank as a potential framework to seamlessly incorporate information from external sources.
Reformulating existing proximity based link prediction methods to incorporate node weight.
Predicting relationship between heterogeneous attributes of the network.
The rest of the paper is organized as follows. Section 2 discusses existing studies on SNA in counter-terrorism, the relation between media and terrorism, and link prediction and centrality measures in heterogeneous information networks. Section 3 presents the proposed framework, including the detailed reformulation strategies for PPR and the proximity based link prediction methods. Section 4 describes the dataset and experimental setups, and Section 5 gives the experimental observations and discussion. Section 6 concludes the paper and presents future possibilities.
This section reviews four aspects of studies related to this paper, namely, (i) SNA for counter-terrorism, (ii) Media and Terrorism, (iii) Link prediction on Heterogeneous Information Network (HIN), and (iv) Centrality measures in HIN.
SNA for counter-terrorism
Social Network Analysis was exploited by Sparrow [40] for the first time to identify terrorists/criminals that need to be neutralized for destabilizing the network of conspirators. Though this study has demonstrated the strength of using SNA techniques in counter terrorism prior to
Media and terrorism
In the recent era, terrorism has evolved to a new level, which may be seen as a consequence of the availability of various easy-to-use communication media such as news, the Internet, and social media. The availability of such platforms enables terrorist organizations to perform various activities such as recruitment, finding sympathizers, spreading propaganda, and arranging campaigns [15,30].
The mutual dependency, or symbiotic relationship, between news media and terrorism is widely visible. Media industries are often attracted by the sensational and trending news created by terrorist organizations, and the reporting acts as free publicity for terrorists [34]. According to the study in [26], extensive news media reporting may sometimes act as an enabler of acts of terrorism. At the same time, media attention towards an attack often depends on the modus operandi of the attack, its geographical importance, and political orientation [16].
Link prediction in Heterogeneous Information Network (HIN)
Link prediction in HIN is an important problem and has gained much popularity in the recent past [39]. However, there is no study exploiting a HIN for answering questions related to counter-terrorism, such as (i) which terrorist organization can be a major threat to a country in the future, (ii) which city can be targeted by a terrorist organization in the future, and (iii) which target types can be attacked by a terrorist group in the future. Existing studies on link prediction over HIN can broadly be grouped into two approaches: (i) meta path based models [5,6,41,42,48], and (ii) probabilistic models [7,46,47]. Link prediction algorithms that exploit meta paths generally follow a two-step process. In the first step, meta path based feature vectors are extracted; in the second step, a machine learning model (such as regression or classification) is employed to determine the existence of links. For instance, in study [41], a model named
Probabilistic modeling for link prediction in HIN has also been well explored. The study [46] proposes an influence propagation based link prediction method by exploiting conditional probability in a multi-relational network. In a similar approach, the paper [47] proposes a topical factor graph model to aggregate information from different heterogeneous data sources and designs a semi-supervised learning model to find relationships. Further, the study [7] proposes a transfer-based ranking factor graph model which combines different social patterns to predict links in a network.
Unlike the above studies, which focus on tie weight, this paper focuses on node importance while formulating link prediction methods.
Centrality measures in HIN
Centrality measures have been used in many aspects of network mining problems [4,13,18]. Like link prediction, several studies have also proposed centrality measures for HIN by considering meta paths [20,25]. For example, Liu et al. [25] propose a ranking framework for publications with pseudo-relevance feedback by exploiting several meta-paths in a heterogeneous bibliographic network. Further, Li et al. [20] propose HRank to rank objects and meta-paths simultaneously in HIN. However, all of these works are limited to incorporating the topological structures (in the form of meta-path).
Tsai et al. [44] propose SocialRank, which uses external information such as social hints for image search and ranking in social networks. In a similar direction, this paper also exploits external information, such as news media and the temporal dynamics of the network, and ranks the nodes using Personalized PageRank (PPR).
Proposed framework
This section elaborates the proposed framework in detail. As described in Section 1, a terrorist attack can be defined by many attributes. In this study, we construct a HIN of terrorist attacks by connecting the various attributes defining the attacks (refer to Section 4 for details).

Heterogeneous network with three types of weighted node and link classes; the size of nodes and the thickness of links show their importance.
A heterogeneous information network can be represented as
(Weighted HIN).
It is defined by a six tuple
If
Node importance and centrality measures
The ranking of nodes may contribute to the relationships they form with other attributes. In a terrorist attack network in particular, a relationship between a high-influence terrorist organization and a target type of high reputation may be more likely to form. Intuitively, attacking a target type of high repute would serve the motives of a terrorist much faster. With this motivation, we incorporate node importance to predict the relationships between different nodes.
Several centrality measures have been proposed in the literature, e.g., degree [9], betweenness [23], closeness [9], PageRank [4], and Eigenvector [17]. As this study aims at incorporating the effects of external information on a heterogeneous terrorist attack network, we choose Personalized PageRank (PPR), a method capable of incorporating external information through its personalization parameter.
For counter-terrorism, attributes that appear frequently in the definition of terrorist attacks may be of more importance. To incorporate this intuition, we also propose the Total Frequency and Recent Frequency of appearance of the nodes (refer to Section 4.1) as personalization parameters for the proposed heterogeneous terrorist attack network.
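To make the role of the personalization parameter concrete, the following is a minimal sketch of Personalized PageRank computed by power iteration. It follows the standard textbook formulation and is not necessarily the exact formulation used in this paper. The function name, the damping value of 0.85, the toy adjacency matrix, and the choice to let dangling nodes jump according to the personalization vector are all assumptions made for illustration.

```python
import numpy as np

def personalized_pagerank(adj, personalization, damping=0.85, tol=1e-10, max_iter=1000):
    """Power-iteration PPR on a (possibly weighted) adjacency matrix.

    adj[i, j] is the weight of the link from node i to node j.
    personalization holds non-negative preference scores, e.g. a
    frequency of appearance per node (an illustrative assumption).
    """
    adj = np.asarray(adj, dtype=float)
    n = adj.shape[0]
    v = np.asarray(personalization, dtype=float)
    v = v / v.sum()                      # teleport distribution sums to 1
    out = adj.sum(axis=1)[:, None]       # out-strength of every node
    safe_out = np.where(out > 0, out, 1.0)
    # Row-stochastic transition matrix; rows without out-links follow v.
    P = np.where(out > 0, adj / safe_out, v[None, :])
    r = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        r_next = damping * (r @ P) + (1.0 - damping) * v
        if np.abs(r_next - r).sum() < tol:
            return r_next
        r = r_next
    return r

# Toy undirected network with four nodes (hypothetical data).
toy_adj = [[0, 1, 1, 0],
           [1, 0, 1, 0],
           [1, 1, 0, 1],
           [0, 0, 1, 0]]
uniform = personalized_pagerank(toy_adj, [1, 1, 1, 1])   # plain PageRank
biased = personalized_pagerank(toy_adj, [1, 1, 1, 5])    # prefers node 3
```

With a uniform personalization vector the sketch reduces to ordinary PageRank; raising the preference of a node raises its score without changing the underlying model, which is the property the framework relies on.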
Random walk based method
Given a Markov transition matrix
In the above PageRank formulation, a random walker jumps to any random node with equal probability i.e.,
Personalised PageRank for heterogeneous network
Unlike PageRank, personalised PageRank assumes that different users have different walking preferences. A person in the film industry may prefer to visit web pages in the entertainment domain more than those in literature. In such a scenario, traditional PageRank fails to capture the user’s preferences. Personalized PageRank, as discussed in [10,12], incorporates the user’s preferences and is defined as follows.
GTD used for constructing heterogeneous terrorist attack network
Let
The matrix
Since
Without loss of generality, we can realize that the centrality vector
As mentioned in the earlier discussion, apart from the topological information, we extend our study with other information such as appearance in news publications, temporal characteristics of network entities. Such information may be available in terms of a vector or a matrix. The proposed PPR has the capability to incorporate such information without any problem. If the information is available in the form of a node vector
Link prediction
The link prediction problem has a long history and has been explored in several ways [22,28]. Given a network graph, the task of link prediction has been studied from various perspectives: (i) prediction of links that exist but are not yet known [1], (ii) prediction of links that do not exist but are likely to appear in the future [21], and (iii) prediction of the life span of existing links (particularly in temporal and dynamic networks) [45]. In the past, researchers have proposed various unsupervised and supervised methods to address the link prediction problem. Among the unsupervised approaches, existing methods include (i) local node proximity based methods [21], (ii) global path based methods [21], (iii) latent methods using matrix factorization [27], and (iv) random walk based methods [24]. On the supervised side, the effect of various classification frameworks on link prediction has been investigated by exploiting various features [43]. All of these studies consider homogeneous networks. However, many of the real networks available today, such as social networks, co-authorship networks, and terrorism-related networks, are heterogeneous, with different types of nodes and links. Given this heterogeneous nature, there is a need to adapt the existing methods or to propose new methods suitable for heterogeneous networks. This paper focuses on adapting and reformulating existing methods (local node proximity based methods in particular) and studies their effects on link prediction in the heterogeneous network. Given a weighted heterogeneous network G as defined in Definition 1, let
This study uses the Global Terrorism Database (GTD) collected by the National Consortium for the Study of Terrorism and Responses to Terrorism [31]. Each terrorist attack is described by approximately one hundred features. Out of these, this study considers ten features that can potentially define an attack event: country, region, provincial state, city, attack type, target type, target subtype, terrorist organization, weapon type, and weapon subtype. These ten features represent ten different node classes in the experimental heterogeneous network. Table 1 illustrates a sample of the experimental dataset extracted from GTD.
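As an illustration of how attack records can be turned into a heterogeneous network, the sketch below links every pair of attribute values that appear in the same attack and counts how often each pair co-occurs. This is only one plausible construction: the field names, the sample records, and the use of co-occurrence counts as edge weights are assumptions for illustration and are not taken from the GTD schema or from Definition 1.

```python
from collections import Counter
from itertools import combinations

# Hypothetical attack records; the keys are illustrative node classes.
attacks = [
    {"organization": "Org A", "country": "Country X", "city": "City P", "target_type": "Police"},
    {"organization": "Org A", "country": "Country X", "city": "City Q", "target_type": "Military"},
    {"organization": "Org B", "country": "Country Y", "city": "City R", "target_type": "Police"},
]

def build_attack_network(records):
    """Return (node_class, edge_weight) for a co-occurrence network.

    Nodes are (class, value) pairs so that equal strings in different
    classes stay distinct. edge_weight counts the number of attacks
    in which two nodes appear together (an assumed weighting).
    """
    node_class = {}
    edge_weight = Counter()
    for record in records:
        nodes = [(cls, value) for cls, value in record.items() if value]
        for node in nodes:
            node_class[node] = node[0]
        # Connect every pair of attributes describing the same attack.
        for u, v in combinations(sorted(nodes), 2):
            edge_weight[(u, v)] += 1
    return node_class, edge_weight

node_class, edge_weight = build_attack_network(attacks)
```

Keeping the class inside the node key makes it straightforward to restrict later analysis to one relation type, for example organization to country.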
Based on Definition 1 and the dataset described in Table 1, the experimental heterogeneous network
The vector
Characteristics of dataset and external information
Number of nodes in different classes for training data, TarType: target type, TarSubtype: target subtype, WeapType: weapon type, WeapSubtype: weapon sub type, and AttType: attack type
As reported in Section 2.2, external information sources such as news media and television share a symbiotic relationship with terrorist activities. The co-occurrence in news articles of different attributes defining a terrorist attack suggests relational dependencies among them. Therefore, to investigate the effect of news media, this paper exploits the news articles published in a popular Indian newspaper, THE HINDU. We generate a new network that considers only the nodes and edges of the GTD network. If two nodes co-occur in an article, we put an edge between the node pair. Since this network is built particularly over the GTD network with THE HINDU, we name this network as
From earlier studies [8,32], it can be inferred that incorporating temporal changes by different attributes and various relationships may result in better performance in estimating centrality, link prediction, etc. For example in counter-terrorism, recent activity of terrorist organizations may be of more importance than their past behavior. To incorporate this we further create a tensor GTD network (
Suppose the dataset is divided in
As discussed in Section 1, one of the objectives of this paper is to study the effects of external information such as, news media (
Experimental observations
This section investigates the ability of parametric Personalized PageRank formulated in Sections 3.2.2 and 3.2.3 for performing various counter-terrorism related analytical tasks without changing the underlying model, but by passing appropriate user-defined personalization parameters. We begin with investigating effect of different personalization parameters of PPR on node importance. Here we analyze the linear dependency between the rank obtained by the model for terrorist organizations with their actual activeness frequency in future (from 2010 to 2014) using Pearson Correlation Coefficient. The study further extends to understand the effect of node importance (node weight) on different proximity based link prediction methods. The link prediction analysis is focused mainly on three heterogeneous relations; (i) relation between terrorist organization and country indicating a future possible attack by organization on a country,
Centrality and its correlation with future activities
In network science, centrality measures define the level of prominence of a node in a network. Among various centrality measures, PPR is a flexible random walk based model in which the walker visits the nodes under a supervised direction. This section investigates which formulations of the PPR model parameters provide a high correlation with the ground truth (i.e., future activities). For this task, we consider all the terrorist organizations that carried out attacks during the testing period (i.e., between 2010 and 2014) and rank them by their frequency of attacks. This ordered list is considered as the ground truth. We then estimate Pearson’s correlation between the ground truth and the observed PPR centrality order (estimated from the training data) using various personalization parameters. We investigated five PPR model parameters as described in Section 4.1. These five model parameters are formulated to study the effect of four different aspects; (i) effect of media reporting (
Average AUC score by AA, JC, RA, and CN using weights from
,
and their Unweighted estimates
Average AUC score by AA, JC, RA, and CN using weights from
Average AUC score by AA, JC, RA, and CN using weights from PR,
The plots in Fig. 2 show the correlation between the ranking of terrorist organizations obtained with PPR and their ranking by frequency of attacks between 2010 and 2014. Since Fig. 2 is arranged in descending order of the future attack frequency of terrorist organizations, the correlation over only the top few organizations is low, but positive. As the number of organizations increases, the correlation score also increases considerably. The motivation for the correlation analysis is to assess the linear dependency between the ranking produced by the proposed models and the actual future activities of terrorist organizations. Further, we compare the ranking performance of all the proposed models over all the terrorist organizations. It is evident from Fig. 2 that appropriate personalization may enhance correlation performance. Moreover, over all ranges of top organizations, at least three parametrized models outperform the non-parametrized model.
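The top-k correlation analysis described above can be sketched as follows. The sketch assumes that the Pearson coefficient is computed on rank positions (rank 1 for the highest score) restricted to the k organizations with the highest future attack frequency; whether the paper correlates rank positions or raw scores is not stated here, so this choice, like the function names and the toy numbers, is an assumption.

```python
import numpy as np

def ranks(scores):
    """Rank positions, 1 = highest score; ties are broken by input order."""
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")
    r = np.empty(len(order), dtype=float)
    r[order] = np.arange(1, len(order) + 1)
    return r

def topk_rank_correlation(centrality, future_freq, k):
    """Pearson correlation of rank positions over the top-k organizations.

    centrality[i] is the training-period score of organization i and
    future_freq[i] its attack count in the test period. Requires k >= 2.
    """
    centrality = np.asarray(centrality, dtype=float)
    future_freq = np.asarray(future_freq, dtype=float)
    # Select the k most active organizations according to the ground truth.
    top = np.argsort(-future_freq, kind="stable")[:k]
    return np.corrcoef(ranks(centrality[top]), ranks(future_freq[top]))[0, 1]

# Hypothetical scores for four organizations.
future = [40, 30, 20, 10]
aligned = topk_rank_correlation([0.4, 0.3, 0.2, 0.1], future, k=4)
reversed_order = topk_rank_correlation([0.1, 0.2, 0.3, 0.4], future, k=4)
```

Sweeping k from a small value up to the total number of organizations would produce one curve per personalization setup, in the spirit of the comparison shown in Fig. 2.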

Correlation of different ranking models with future attack frequencies (Ground Truth),
Further, from the plot, it is clearly visible that temporal based parameters (
Topological information of a network alone is not sufficient to predict future activities of a terrorist organization. Considering additional information such as historical activities of the organization, media appearance and recent activities etc. along with topological structure helps in predicting future activities.
In the above section, we investigated the correlation between the future activities of an organization and its topological centrality, historical activities, and recent activities, and analyzed how well each predicts future activities. On close inspection of GTD, we found that terrorist organizations mostly repeat their earlier modes of attack, target types, etc. To investigate this, we predict repeated future relationships between a terrorist organization and the other attributes defining an attack. The proposed heterogeneous network, constructed as in Definition 1 from the GTD dataset, consists of ten attribute classes. The proposed framework can potentially study link prediction between all
Table 4 and Table 5 compare the link prediction performance, in terms of Average AUC score, of six node centrality setups against the baseline prediction models that do not consider node weights (Unweighted). Out of 72 prediction cases (six centralities, four prediction models, and three types of relations to be predicted), the proposed node weighted versions outperform their unweighted counterparts in at least 70% of the cases. This shows that node weight helps in improving prediction accuracy. Among the four prediction models, RA dominates the other three models (CN, JC, and AA) in all the setups except for

Average Precision for top 50 predicted relations by RA.

Terrorist Attack Distribution over 2000–2014.
While observing significant improvement in prediction performance after considering node importance (as compared to unweighted counterpart), it is observed that the six personalization parameters provide comparable AUC scores. Though we observe significant differences in correlation analysis as reported in Fig. 2, the comparable prediction performance in Table 4 and Table 5 may be due to the averaging effect while estimating AUC scores. To investigate this, we further evaluate the predicted relations using Average Precision at K (
Among the four link prediction methods, we choose RA because it performs better than the others in at least 70% of the cases (see Table 4 and Table 5).
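For reference, the four proximity scores compared here can be sketched as below. With `weight=None` the function computes the standard unweighted CN, JC, AA, and RA scores. The weighted branch, which scales each neighbor's contribution by a node importance value such as a PPR score, is only a hypothetical illustration of how node weight could enter these measures; the exact reformulations proposed in this paper are not reproduced here. The node names and weights are invented.

```python
import math

def neighbors(edges):
    """Build an undirected neighbor map from a list of (u, v) pairs."""
    nbrs = {}
    for u, v in edges:
        nbrs.setdefault(u, set()).add(v)
        nbrs.setdefault(v, set()).add(u)
    return nbrs

def proximity_scores(nbrs, x, y, weight=None):
    """CN, JC, AA, and RA scores for the distinct node pair (x, y).

    weight maps a node to its importance. When it is None every node
    counts as 1.0, which gives the standard unweighted definitions.
    The weighting scheme itself is an illustrative assumption.
    """
    w = (lambda z: 1.0) if weight is None else (lambda z: weight[z])
    common = nbrs[x] & nbrs[y]
    union = nbrs[x] | nbrs[y]
    cn = sum(w(z) for z in common)
    return {
        "CN": cn,
        "JC": cn / sum(w(z) for z in union) if union else 0.0,
        # A common neighbor of two distinct nodes has degree >= 2, so log > 0.
        "AA": sum(w(z) / math.log(len(nbrs[z])) for z in common),
        "RA": sum(w(z) / len(nbrs[z]) for z in common),
    }

# Hypothetical toy network mixing organizations, a country, and target types.
toy_edges = [("OrgA", "Police"), ("OrgA", "Military"), ("CountryX", "Police"),
             ("CountryX", "Military"), ("CountryX", "OrgB"), ("OrgB", "Police")]
nbrs = neighbors(toy_edges)
plain = proximity_scores(nbrs, "OrgA", "CountryX")
importance = {"OrgA": 0.2, "OrgB": 0.2, "CountryX": 0.3, "Police": 0.5, "Military": 0.1}
weighted = proximity_scores(nbrs, "OrgA", "CountryX", weight=importance)
```

RA penalizes high-degree common neighbors more strongly than AA (division by degree rather than by its logarithm), which is one reason the two can rank the same candidate pairs differently.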
Figure 3 shows the
In the above discussion, we averaged the scores over the entire set of terrorist organizations. However, the attack frequency of different organizations varies widely. Figure 4 shows the attack frequency distribution of the organizations. It clearly shows that very few terrorist organizations carry out the majority of the attacks globally. The characteristics of active and less active organizations may differ. To understand such possible patterns, we further analyze

Average Precision by RA for Top 50 predicted relations formed by Most Active and Rest of the terrorist organizations.
Top five Terrorist Organization Alliance predicted by RA using all the parametric variants of PPR as centrality measures listed in Section 4.1
Similar to the observation in Fig. 3, weighted methods perform better than unweighted ones in almost all cases when predicting relations formed by the most active and by the rest of the terrorist organizations (refer to Fig. 5). It can also be seen that
Relationship prediction improves when the effect of external information is incorporated as node weights. However, different information sources have different effects. Recent information is important for the most active terrorist organizations, while news media helps in tracking the less active organizations.
The terrorist attack network and link prediction methods explored in this study can also be used to analyze potential alliances among terrorist organizations without modifying the underlying framework. This is because two terrorist organizations may be allies if they have common targets or share a common ideology. Table 6 lists the top five alliances obtained using RA with the different parametric PPR variants as centrality measures. We manually verified the top alliances reported in Table 6 against external sources: (i) Wikipedia pages related to the concerned terrorist organizations, and (ii) IDSA publications on possible alliances. Almost 60% of the predicted alliances are found to be true (highlighted in Table 6). For example, it is easy to verify from the above sources that Maoist, CPI-Maoist, and LTTE help each other with various needs, such as training and finances, and are potential allies.
This paper investigates the ability of personalized PageRank over a heterogeneous terrorist attack network to incorporate different forms of personalization parameters for addressing various counter-terrorism related issues. The effects of various parametric setups have been investigated using the Global Terrorism Database for two important counter-terrorism tasks: (i) prediction of future attacks by a terrorist organization, and (ii) prediction of the relationships between a terrorist organization and other attributes (country, city, and target types). Across the parametric setups, we observe that PPR personalized with the network’s temporal dynamics performs better in predicting future attacks of terrorist organizations. Further, it is observed that incorporating node importance can enhance relationship prediction performance. With RA as link predictor we found that relation prediction score is improved by 7.5% for
This study has limited its investigation to three types of relationships. It can easily be extended to more relationships. Further, this paper has considered only one news publication to study the influence of media discussion on various counter-terrorism tasks. It may be extended to more media sources to capture a wider spectrum of information and relationships.
