An in-depth study on key nodes in social networks

Abstract

In social network analysis, identifying important nodes (key nodes) is a significant task in various applications. The three most popular related tasks are influential node ranking, influence maximization, and network dismantling. Although these studies differ in motivation, they share many similarities, which can confuse readers and users outside the domain. Moreover, few studies have explored the correlations between key nodes obtained from different tasks, which hinders a deeper understanding of social networks. In this paper, we contribute to the field by conducting an in-depth survey of different kinds of key nodes, comparing them under our proposed framework and revealing their deep relationships. First, we clarify and formalize the three popular tasks under a uniform standard. Then we collect a group of crucial metrics and propose a fair comparison framework to analyze the features of key nodes identified in different research fields. Through extensive experiments and in-depth analysis on twenty real-world datasets, we not only explore the correlations between key nodes derived from the three tasks, but also draw insightful conclusions that explain how these key nodes differ from each other and reveal the unique features that suit them to their corresponding tasks. Furthermore, we show that Shapley centrality can identify more general key nodes, which can also be applied, to a certain extent, to all three tasks simultaneously.

Keywords

Social network; Key node; Influential node ranking; Influence maximization; Network dismantling

1. Introduction

Social network analysis is a broad and active research topic that uses graph structures to represent people and their relationships in both the real and virtual worlds. Different people have different social circles, so individuals occupy different positions and have different importance in a network. For example, a person with 1,000,000 followers on Twitter is usually more important to the network than a person with 100 followers. In social network analysis, identifying a set of key nodes is a very important task that has inspired various research studies with different motivations. In real life, key nodes are generally recognized as celebrities in social activities, such as business leaders, research experts, outstanding players, movie stars and famous politicians. Identifying these key nodes is useful for many applications, e.g. online advertising [1], expert finding [2], virus propagation prevention [3] and network anomaly detection [4]. Currently, there are three popular research fields concerned with key nodes in social network analysis: Influential Node Ranking (also Node Ranking), Influence Maximization and Network Dismantling. Node ranking aims to rank nodes correctly by influence (especially highly influential nodes). Influence maximization [5, 6] aims to maximize the coverage of information dissemination starting from a small number of initial spreaders. Network dismantling [7] aims to break the network into many disconnected pieces by removing a minimal group of nodes.

Although many methods have been proposed in these three fields, the literature has two inherent drawbacks. (1) Non-expert readers may easily confuse these closely related domains. Intuitively, the three kinds of studies are quite similar, since they all aim to identify some kind of key nodes. Moreover, the same concepts, such as degree centrality and community structure, may be adopted and improved in these studies to evaluate the importance of nodes. Readers can also be misled when paper titles give the wrong impression. For example, some papers titled “identifying influential nodes/spreaders” actually deal with the task of influence maximization [8, 9, 10, 11]. Also, the term “top-k influential nodes” has different meanings in [12] and [13]: the former implies Node Ranking, while the latter refers to Influence Maximization. Notably, a well-known paper titled “Influence Maximization in Complex Networks Through Optimal Percolation” [14], published in Nature, essentially presents a novel method for network dismantling. (2) Non-expert users may feel overwhelmed when trying to obtain key nodes in a network without a specific purpose, since these methods cannot answer a simple but general question: what are the key nodes in a network? Although many existing studies focus on key nodes, each of them identifies key nodes with its own single motivation and only the related metrics, so the identified key nodes are biased. Therefore, these studies cannot give a general picture of key nodes, since their concept of key nodes is confined to their own tasks.

In addition, few studies have explored the correlations between these three key node tasks. For example, are the key nodes identified by the three tasks similar to each other? If they are, can the key nodes identified by one task also be applied to another? If not, what unique features behind these different key nodes limit their applicability to their own tasks? Intuitively, Influential Node Ranking and Influence Maximization are both related to a node’s influence, so their key nodes could be similar. Also, since activating the key nodes from Influence Maximization maximizes the coverage of information spread, removing them may also block the spread and thus dismantle a network. By analogy, since removing the key nodes from Network Dismantling limits the information outbreak to a sub-extensive size, activating them may also promote the spread of influence. In other words, both kinds of key nodes may act as hubs throughout a network. Although Radicchi et al. [15] have studied the difference between key nodes from Influence Maximization and Network Dismantling, the algorithms they chose for comparison are not representative and the datasets they used are rather small.

In this paper, we clarify and formalize these three research tasks and present their respective motivations and metrics. By comparing and analyzing the features of key nodes derived from different tasks, we collect and build a series of crucial metrics under a uniform standard. On this basis, we propose a fair comparison framework that evaluates a key node set from different aspects. With the proposed framework, we further evaluate the key nodes identified by existing studies to explore the correlations between them. Besides, we find that a network centrality based on cooperative game theory identifies the most representative key nodes under the general comparison framework.

This work faces several challenges compared with previous studies. First, although many studies have focused on these three popular fields, a description and formalization of them under a uniform standard is still lacking. Second, although the three tasks have different motivations, the correlations between their identified key nodes remain unknown. Third, key nodes identified by different studies may differ in features such as ordering, output size and dependence on a propagation model, which makes it hard to evaluate them fairly and generally. Finally, existing research still lacks a method that identifies a group of more general key nodes.

Overall, the main contribution of our work is to solve the above problems by conducting an in-depth survey of different kinds of key nodes, which is elaborated point by point as follows.

•
Formalization and comparison of existing key node research. We clarify the three popular key node research tasks (Node Ranking, Influence Maximization and Network Dismantling) and formalize them under a uniform standard. From extensive experimental analysis on 20 datasets and two specific case studies, we summarize the correlations between the key nodes identified by different tasks and draw further insightful conclusions within the three research fields.
•
Presentation of a fair key node comparison framework. To overcome the distinct features of key nodes from different tasks, we propose a fair comparison framework that collects crucial metrics, based on an analysis of existing studies, to evaluate a key node set in general. Under this framework, we compare the key nodes from different research fields and show the unique features that benefit their own tasks.
•
Exploration of deep features for different key nodes. Through comparison and deep analysis, we give further insight into the crucial features of these key nodes. We also explore and infer their distribution characteristics, which are confirmed with visualization on several networks. Moreover, we verify all of our explorations via two concrete case studies on two real-world networks.
•
Identification of key nodes with generality. Based on all the experiments, we find that the Shapley value [16], derived from cooperative game theory, is the most suitable for identifying general key nodes. We show that the key nodes identified by Shapley centrality are practically a combination of the core parts from different studies and can also be applied to the three popular tasks simultaneously with good performance.

2. Related works

Several excellent studies already review one of these three research fields. For example, Lü et al. [17] extensively review the approaches in influential node ranking, emphasizing the physical concepts and related methods. Arora et al. [18] conduct an in-depth benchmarking study of the state-of-the-art algorithms in influence maximization, while Li et al. [19] review these algorithms and give a rigorous theoretical comparison of them. Wandelt et al. [20] compare network dismantling algorithms on many different types of networks; their results can serve as a reference for selecting an appropriate network dismantling method for a given network, considering both accuracy requirements and run time constraints.

However, since our work aims to explore the correlations between key nodes from three different tasks rather than to compare algorithms within the same task, we only briefly review the existing algorithms, focusing on analyzing and classifying their basic ideas. As shown in Fig. 1, we summarize existing key node research, paying attention to the approaches each field uses for its task, and introduce some representative algorithms.

Figure 1.

Approaches and some representatives for identifying key nodes.

2.1 Influential node ranking algorithms

Algorithms for node ranking usually adopt classic network centralities and can be roughly classified into local-view and global-view approaches. LocalRank [21] and H-index [22] are two classic local-view approaches that extend degree centrality. Local-view approaches are fast and simple but less effective when used alone, because they only consider the local neighborhood. Global-view approaches are often built on global centralities. For example, LeaderRank [23] improves the classic PageRank, KS-IF [24] is a refinement of k-core, and GLR [25] reduces the computational complexity of closeness.
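For illustration, a local-view centrality such as H-index can be computed directly from an adjacency structure. The following is a minimal sketch, assuming an undirected graph stored as a Python dict that maps each node to the set of its neighbors; the function name `h_index_centrality` is hypothetical and this is not the reference implementation of [22]. A node's H-index is taken as the largest h such that the node has at least h neighbors whose degree is at least h.

```python
def h_index_centrality(adj):
    """H-index of every node in an undirected graph.

    adj: dict mapping node -> set of neighbors (assumed representation).
    Returns the largest h such that the node has at least h neighbors
    whose degree is at least h.
    """
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    scores = {}
    for v, nbrs in adj.items():
        # Neighbor degrees, largest first.
        degs = sorted((degree[u] for u in nbrs), reverse=True)
        h = 0
        # degs[i-1] >= i holds on a prefix of the list; h is that prefix length.
        for i, d in enumerate(degs, start=1):
            if d < i:
                break
            h = i
        scores[v] = h
    return scores


# Usage: in the complete graph K4 every node has 3 neighbors of degree 3.
k4 = {i: {j for j in range(4) if j != i} for i in range(4)}
print(h_index_centrality(k4))  # {0: 3, 1: 3, 2: 3, 3: 3}
```

Because only neighbor degrees are inspected, the cost is roughly linear in the number of edges (plus sorting), which reflects why local-view approaches are fast.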

Since this task aims to rank nodes by their individual influence spread, estimating single node influence is a natural choice. Chen et al. [26] summarize this kind of approach as single node influence (SNI) centrality and propose the ASNI-RR algorithm to quickly estimate single node influence based on reverse reachable (RR) sets [27, 28, 29].
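The idea behind RR-set estimation can be sketched as follows. This is a generic reverse-influence-sampling estimator under the WIC model, not ASNI-RR itself (which we assume adds its own sample-size control); the names `random_rr_set` and `estimate_sni`, the in-neighbor dict representation, and the default sample count are our own assumptions. The estimator relies on the identity that $\sigma(\{v\})$ equals $N$ times the probability that $v$ appears in an RR set generated from a uniformly random root. For an undirected network, the neighbor dict can be passed as `in_adj` directly.

```python
import random


def random_rr_set(in_adj, root, rng):
    """Sample one reverse reachable (RR) set under the WIC model.

    in_adj: dict mapping node -> list of in-neighbors (assumed representation).
    Each edge (u, w) is live with probability 1/|in(w)|; the RR set contains
    every node that reaches `root` through live edges.
    """
    rr = {root}
    stack = [root]
    while stack:
        w = stack.pop()
        preds = in_adj[w]
        for u in preds:
            # The coin for edge (u, w) is only flipped if u is not yet in the set.
            if u not in rr and rng.random() < 1.0 / len(preds):
                rr.add(u)
                stack.append(u)
    return rr


def estimate_sni(in_adj, theta=10000, seed=0):
    """Estimate sigma({v}) for all v as N * (fraction of RR sets containing v)."""
    rng = random.Random(seed)
    nodes = list(in_adj)
    counts = dict.fromkeys(nodes, 0)
    for _ in range(theta):
        for v in random_rr_set(in_adj, rng.choice(nodes), rng):
            counts[v] += 1
    return {v: len(nodes) * c / theta for v, c in counts.items()}


# Usage: directed chain a -> b -> c; every in-degree is 1, so p = 1 on each edge.
chain_in = {"a": [], "b": ["a"], "c": ["b"]}
print(estimate_sni(chain_in)["a"])  # 3.0, since a reaches every node
```

A single batch of RR sets yields estimates for all nodes at once, which is what makes this family of methods attractive for ranking.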

2.2 Influence maximization algorithms

For influence maximization, approximation algorithms and estimation algorithms are the two main kinds of approaches. Based on the submodularity and monotonicity of the influence function, Kempe et al. [30] present the first approximation algorithm (Greedy), which iteratively picks the node that brings the maximal gain in influence as a seed node. Approximation algorithms guarantee that their solution achieves at least a $1-1/e-\epsilon$ fraction of the optimal influence spread. Since then, various approximation algorithms have been proposed to reduce the time cost of Greedy, such as NewGreedy [31], CELF [32], PMC [33], TIM [29] and IMM [27].
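The Greedy procedure can be sketched generically, assuming a black-box function `spread(S)` that returns (an estimate of) $\sigma(S)$ for a frozenset `S`. In [30] this estimate comes from Monte Carlo simulation; in the usage example here we substitute a deterministic set-coverage function, which is also monotone and submodular, purely so the behaviour is easy to check. The names `greedy_seeds` and `coverage` are hypothetical, and `k` is assumed not to exceed the number of nodes.

```python
def greedy_seeds(nodes, spread, k):
    """Greedy seed selection: k rounds, each adding the node with the
    largest marginal gain spread(S | {v}) - spread(S)."""
    seeds = []
    current = spread(frozenset())
    for _ in range(k):
        best, best_gain = None, float("-inf")
        for v in nodes:
            if v in seeds:
                continue
            gain = spread(frozenset(seeds) | {v}) - current
            if gain > best_gain:  # ties keep the earlier node
                best, best_gain = v, gain
        seeds.append(best)
        current += best_gain
    return seeds


covers = {"a": {1, 2, 3}, "b": {3, 4}, "c": {5, 6}, "d": {1, 2}}


def coverage(S):
    """Number of items covered by S (a monotone submodular stand-in for sigma)."""
    return len(set().union(*(covers[v] for v in S)))


print(greedy_seeds(list(covers), coverage, 3))  # ['a', 'c', 'b']
```

Each round evaluates `spread` once per remaining node, so with a simulation-based estimate the total cost is about $kN$ expensive evaluations, which is the bottleneck the later algorithms address.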

On the other hand, estimation algorithms estimate a score for each node rather than its influence spread. Combined with certain seed selection strategies, these algorithms can be quite fast but offer no theoretical guarantees; examples include DegreeDiscount [31], PMIA [34], EasyIM [35] and FNS [36].
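As an example of an estimation algorithm, the DegreeDiscountIC heuristic of [31] can be sketched as below. It assumes a uniform propagation probability `p` on every edge (the setting of [31], not the WIC model used later in this paper) and an undirected graph stored as a dict of neighbor sets. Each time a seed is chosen, the degree of each unselected neighbor $v$ is discounted to $dd_v = d_v - 2t_v - (d_v - t_v)\,t_v\,p$, where $t_v$ counts the already selected neighbors of $v$. The function name is hypothetical.

```python
def degree_discount(adj, k, p=0.01):
    """DegreeDiscountIC-style seed selection (sketch).

    adj: dict node -> set of neighbors; p: uniform propagation probability.
    """
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    dd = dict(degree)             # discounted degrees
    t = dict.fromkeys(adj, 0)     # number of selected neighbors per node
    seeds, chosen = [], set()
    for _ in range(k):
        u = max((v for v in adj if v not in chosen), key=lambda v: dd[v])
        seeds.append(u)
        chosen.add(u)
        for v in adj[u]:
            if v not in chosen:
                t[v] += 1
                dd[v] = degree[v] - 2 * t[v] - (degree[v] - t[v]) * t[v] * p
    return seeds


edges = [("A", "B"), ("A", "a1"), ("A", "a2"), ("A", "a3"),
         ("B", "b1"), ("B", "b2"), ("B", "b3"),
         ("C", "c1"), ("C", "c2"), ("C", "c3")]
adj = {}
for x, y in edges:
    adj.setdefault(x, set()).add(y)
    adj.setdefault(y, set()).add(x)
# Plain top-2 by degree would return A and B; the discount pushes B below C.
print(degree_discount(adj, 2))  # ['A', 'C']
```

The example shows the typical effect of such seed selection strategies: neighbors of chosen seeds are penalized, so the seeds spread out instead of clustering.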

2.3 Network dismantling algorithms

To dismantle a network, some algorithms iteratively remove the node with the highest centrality, which we summarize as the node-level approach. The key to these algorithms is to design a structure-relevant centrality that can be calculated quickly. Collective Influence (CI) [14] is a classic node-level method whose centrality is based on optimal percolation. Also, Clusella et al. [37] propose the Explosive Immunization (EI) algorithm, which combines the explosive percolation paradigm with the idea of maintaining a fragmented distribution of clusters. It is worth mentioning that Recalculated Betweenness (RBW) [38] has shown great effectiveness in a comparative analysis [20] but demands a heavy time cost.
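The node-level idea can be illustrated with Collective Influence. The sketch below uses the CI formula of [14], $CI_{\ell}(i)=(k_i-1)\sum_{j\in\partial\mathrm{Ball}(i,\ell)}(k_j-1)$, where $\partial\mathrm{Ball}(i,\ell)$ is the set of nodes at shortest-path distance exactly $\ell$ from $i$, and recomputes CI after each removal. It is a simplified, quadratic-time sketch on a dict-of-sets graph (our assumption), without the efficient updates and reinsertion of the original method; the function names are hypothetical.

```python
from collections import deque


def collective_influence(adj, ell=2):
    """CI_ell score of every node; adj maps node -> set of neighbors."""
    deg = {v: len(nbrs) for v, nbrs in adj.items()}
    ci = {}
    for i in adj:
        # BFS to depth ell; nodes at distance exactly ell form the frontier.
        dist = {i: 0}
        queue = deque([i])
        frontier = []
        while queue:
            u = queue.popleft()
            if dist[u] == ell:
                frontier.append(u)
                continue
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        ci[i] = (deg[i] - 1) * sum(deg[j] - 1 for j in frontier)
    return ci


def ci_removal_order(adj, k, ell=2):
    """Adaptively remove the current top-CI node k times."""
    g = {v: set(nbrs) for v, nbrs in adj.items()}
    removed = []
    for _ in range(min(k, len(g))):
        ci = collective_influence(g, ell)
        top = max(ci, key=ci.get)
        removed.append(top)
        for w in g.pop(top):
            g[w].discard(top)
    return removed


# Usage: on the path 0-1-2-3-4 with ell = 1 the middle node scores highest.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
print(collective_influence(path, ell=1))  # {0: 0, 1: 1, 2: 2, 3: 1, 4: 0}
print(ci_removal_order(path, 1, ell=1))   # [2]
```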

Graph-level approaches first decycle a network (eliminate all its cycles) to obtain trees and then dismantle these trees with a greedy strategy. The key to these algorithms is to decycle a network quickly and at minimal cost. Representatives of this kind of approach are Min-Sum [7], CoreHD [39] and BPD [40].
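The decycling phase of a graph-level method such as CoreHD [39] can be sketched as follows: repeatedly reduce the graph to its 2-core (which is empty exactly when the graph is a forest) and delete the node with the highest degree inside the 2-core. This minimal sketch assumes an undirected dict-of-sets graph and recomputes the 2-core from scratch each round; it omits the subsequent tree-breaking and reinsertion steps, and the function names are hypothetical.

```python
def two_core(g):
    """Return the 2-core of g by repeatedly peeling nodes of degree < 2."""
    core = {v: set(nbrs) for v, nbrs in g.items()}
    stack = [v for v, nbrs in core.items() if len(nbrs) < 2]
    while stack:
        v = stack.pop()
        if v not in core:
            continue  # already peeled
        for w in core.pop(v):
            core[w].discard(v)
            if len(core[w]) < 2:
                stack.append(w)
    return core


def corehd_decycle(adj):
    """Remove highest-degree 2-core nodes until the graph is acyclic."""
    g = {v: set(nbrs) for v, nbrs in adj.items()}
    removed = []
    core = two_core(g)
    while core:
        top = max(core, key=lambda v: len(core[v]))
        removed.append(top)
        for w in g.pop(top):
            g[w].discard(top)
        core = two_core(g)
    return removed


# Usage: a "bowtie" (two triangles sharing node 0) is decycled by removing 0.
bowtie = {0: {1, 2, 3, 4}, 1: {0, 2}, 2: {0, 1}, 3: {0, 4}, 4: {0, 3}}
print(corehd_decycle(bowtie))  # [0]
```

Once the 2-core is empty, every remaining component is a tree, which is what makes the subsequent greedy tree-breaking step straightforward.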

3. Tasks and their motivations

3.1 Social network and information propagation model

A social network with $N$ individuals and $M$ social relations can be represented by a graph $G=(V,E)$, where $V$ ($N=\lvert V\rvert$) is the set of vertices (also nodes) and $E$ ($M=\lvert E\rvert$) is the set of edges. An in-neighbor $u$ of a node $v$ is a node that has a direct link towards $v$; $v$ is then an out-neighbor of $u$. We denote all in-neighbors of $v$ as $\Gamma_{in}(v)$ and all out-neighbors of $v$ as $\Gamma_{\textit{out}}(v)$.

In information dissemination, each edge $(u,v)\in E$ is mapped to a spreading probability $p(u,v)\in(0,1]$ determined by the specific propagation model. A propagation model also defines the rules of the dissemination process. For example, the Weighted Independent Cascade (WIC) model is a classical propagation model. In the WIC model, nodes are either active or inactive. At time 0, the nodes in a given set are activated while all other nodes are inactive. Starting with these activated nodes, the spreading process unfolds in discrete time steps. At time $t\geqslant 1$, each node $v$ activated at $t-1$ has one chance to activate each of its inactive out-neighbors $u$ with probability $p(v,u)=1/\lvert\Gamma_{in}(u)\rvert$. The process ends when no more nodes can be activated. Let $I(S)$ denote the set of all nodes activated during a process started with $S$. In social network analysis, the influence spread of a node set $S$ (denoted as $\sigma(S)$) is usually defined as the expected number of activated nodes during a complete dissemination process started with $S$, namely $\sigma(S)=\mathbb{E}[\lvert I(S)\rvert]$. In addition, the single node influence of an arbitrary node $v$ is denoted as $\sigma(\{v\})$.
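A Monte Carlo estimate of $\sigma(S)$ under the WIC model follows directly from this description. The sketch below assumes a directed graph stored as a dict mapping each node to a list of its out-neighbors (for an undirected network, each edge would appear in both directions) and averages $\lvert I(S)\rvert$ over a number of independent simulations; the function names, the default number of runs, and the fixed random seed are our own choices.

```python
import random


def simulate_wic(out_adj, in_degree, seeds, rng):
    """Run one WIC cascade from `seeds` and return |I(S)|."""
    active = set(seeds)
    frontier = list(active)
    while frontier:  # each loop iteration is one time step t
        newly_active = []
        for v in frontier:
            for u in out_adj[v]:
                # v gets a single chance to activate each inactive out-neighbor u
                # with probability p(v, u) = 1 / |Gamma_in(u)|.
                if u not in active and rng.random() < 1.0 / in_degree[u]:
                    active.add(u)
                    newly_active.append(u)
        frontier = newly_active
    return len(active)


def influence_spread(out_adj, seeds, runs=1000, seed=0):
    """Monte Carlo estimate of sigma(S) = E[|I(S)|]."""
    in_degree = dict.fromkeys(out_adj, 0)
    for outs in out_adj.values():
        for u in outs:
            in_degree[u] += 1
    rng = random.Random(seed)
    total = sum(simulate_wic(out_adj, in_degree, seeds, rng) for _ in range(runs))
    return total / runs


# Usage: in the chain a -> b -> c every in-degree is 1, so p = 1 on each edge.
chain = {"a": ["b"], "b": ["c"], "c": []}
print(influence_spread(chain, {"a"}))  # 3.0
```

Because WIC fixes $p(v,u)$ from the in-degrees, the model needs no extra parameters; the only tuning knob in this sketch is the number of simulation runs.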

3.2 Influential node ranking

Influential Node Ranking (NR) aims to sort the nodes of a network in a monotonic order that is as consistent as possible with the ranking by single node influence. Let $R^{\mathit{NR}}$ denote the ranking result of an NR algorithm and $R^{\sigma}$ denote the ranking obtained by sorting $\sigma(\{v\})$ in descending order. Let $n_{c}$ and $n_{d}$ denote the numbers of concordant and discordant pairs between $R^{\mathit{NR}}$ and $R^{\sigma}$. Then the task of node ranking can be formalized as

$\displaystyle R^{\mathit{NR}}=\mathop{\arg\max}\tau(R^{\mathit{NR}},R^{\sigma})=\mathop{\arg\max}\frac{n_{c}-n_{d}}{0.5N(N-1)}.$ (1)

Kendall’s coefficient $\tau$ [41] evaluates the similarity between two rankings. Its value lies between $-1$ and $1$, and a higher $\tau$ indicates a more accurate ranking from the NR algorithm. Note that an NR algorithm need not calculate or estimate $\sigma(\{v\})$. Since calculating the exact influence spread of every single node is time-consuming, node ranking aims to efficiently obtain an ordering of nodes that is as consistent as possible with the result of sorting $\sigma(\{v\})$. In particular, ranking highly influential nodes correctly is more significant than ranking weakly influential nodes.
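Eq. (1) can be evaluated directly by counting concordant and discordant pairs. The quadratic-time sketch below assumes that both rankings are lists containing the same $N$ nodes, ordered from most to least influential, with no ties; the name `kendall_tau` is hypothetical. For large networks, an $O(N\log N)$ implementation such as `scipy.stats.kendalltau` (which by default computes the tie-corrected tau-b variant) would be preferable.

```python
from itertools import combinations


def kendall_tau(r_nr, r_sigma):
    """Kendall's tau between two strict rankings of the same N nodes (Eq. 1)."""
    pos_nr = {v: i for i, v in enumerate(r_nr)}
    pos_sigma = {v: i for i, v in enumerate(r_sigma)}
    n = len(r_nr)
    n_c = n_d = 0
    for u, v in combinations(r_nr, 2):
        # A pair is concordant if both rankings order u and v the same way.
        s = (pos_nr[u] - pos_nr[v]) * (pos_sigma[u] - pos_sigma[v])
        if s > 0:
            n_c += 1
        elif s < 0:
            n_d += 1
    return (n_c - n_d) / (0.5 * n * (n - 1))


# Swapping the top two nodes gives 2 concordant and 1 discordant pair.
print(kendall_tau(["a", "b", "c"], ["b", "a", "c"]))  # 0.3333333333333333
```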

Improving existing centralities and estimating single node influence are two common directions in NR studies. For example, k-core centrality is frequently adopted and modified in the first direction, as it identifies a set of tightly interlinked nodes as the core of the graph. In contrast, algorithms in the second direction can easily obtain a good ranking as long as they estimate $\sigma(\{v\})$ with high accuracy.
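The k-core centrality mentioned above assigns each node its core number, i.e. the largest k such that the node belongs to a subgraph in which every node has degree at least k. A minimal peeling sketch, assuming an undirected dict-of-sets graph (the function name `core_numbers` is hypothetical), is:

```python
def core_numbers(adj):
    """Core number of every node, computed by iterative peeling."""
    deg = {v: len(nbrs) for v, nbrs in adj.items()}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        # Move to the next shell: k is at least the smallest remaining degree.
        k = max(k, min(deg[v] for v in remaining))
        stack = [v for v in remaining if deg[v] <= k]
        while stack:
            v = stack.pop()
            if v not in remaining:
                continue
            remaining.remove(v)
            core[v] = k
            for w in adj[v]:
                if w in remaining:
                    deg[w] -= 1
                    if deg[w] <= k:
                        stack.append(w)
    return core


# Usage: a triangle 0-1-2 with a pendant node 3 attached to node 2.
g = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
print(sorted(core_numbers(g).items()))  # [(0, 2), (1, 2), (2, 2), (3, 1)]
```

Many nodes share the same core number, which is one reason why refinements such as KS-IF are needed before k-core can serve as a fine-grained ranking.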

If required to present a set consisting of $\lambda$ key nodes, NR algorithms would first generate their rankings, and then provide the top $\lambda$ nodes as the result.

3.3 Influence maximization

Influence Maximization (IM) aims to find a set $S$ of $k$ nodes (called seed nodes) such that the influence spread $\sigma(S)$ is maximal. Let $S^{\mathit{IM}}$ denote the node set identified by an IM algorithm. Then the task of influence maximization can be formalized as

$\displaystyle S^{\mathit{IM}}=\mathop{\arg\max}_{\lvert S\rvert=k}\sigma(S).$ (2)

Compared with node ranking, influence maximization concentrates on collective influence, which is closely associated with marginal gain, rather than on single node influence. The marginal gain of a node $v$ with respect to a node set $S$, denoted as $\sigma(v|S)$, is the increase in influence spread when $v$ is activated in addition to $S$. Mathematically, $\sigma(v|S)=\sigma(S\cup\{v\})-\sigma(S)$.

It has been proved that the influence function $\sigma(\cdot)$ is both submodular and monotone [30]. Thus, successively picking the node with the maximal marginal gain as a seed node theoretically guarantees a solution within a factor of $1-1/e-\epsilon$ of the optimal one. Many algorithms aim to approximate $\sigma(v|S)$ better at a lower time cost. There are also IM algorithms that combine network centralities with a proper seed selection strategy, which usually updates the centralities or the network structure each time a seed node is picked. These algorithms can be fast and memory-friendly but have no theoretical guarantees.
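Submodularity also enables the lazy-evaluation trick of CELF [32]: a marginal gain computed in an earlier round is an upper bound on the node's current gain, so only the top entry of a priority queue needs to be refreshed. The sketch below assumes a black-box, monotone and submodular `spread(S)` on frozensets and uses hypothetical function names; it follows the greedy rule while typically evaluating `spread` far fewer times. The usage example substitutes a small set-coverage function for $\sigma$.

```python
import heapq


def celf_seeds(nodes, spread, k):
    """Lazy greedy (CELF) seed selection for a monotone submodular spread."""
    current = spread(frozenset())
    # Heap entries: (-gain, tie-break index, node, round in which gain was computed).
    heap = [(-(spread(frozenset([v])) - current), i, v, 0) for i, v in enumerate(nodes)]
    heapq.heapify(heap)
    seeds = []
    while len(seeds) < k and heap:
        neg_gain, i, v, computed_in = heapq.heappop(heap)
        if computed_in == len(seeds):
            # Gain is up to date and every other entry is an upper bound: pick v.
            seeds.append(v)
            current -= neg_gain
        else:
            gain = spread(frozenset(seeds) | {v}) - current
            heapq.heappush(heap, (-gain, i, v, len(seeds)))
    return seeds


def coverage(S):
    """Number of items covered by S (a monotone submodular stand-in for sigma)."""
    covers = {"a": {1, 2, 3}, "b": {3, 4}, "c": {5, 6}, "d": {1, 2}}
    return len(set().union(*(covers[v] for v in S)))


print(celf_seeds(["a", "b", "c", "d"], coverage, 3))  # ['a', 'c', 'b']
```

The integer tie-break index in each heap entry keeps comparisons well-defined even when nodes themselves are not orderable.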

It is worth noting that the Shapley value [16], a fundamental concept from cooperative game theory, formalized as Shapley centrality and implemented by ASV-RR [26], evaluates the importance of nodes from a unique perspective. Shapley centrality scores each node by its expected marginal contribution over uniformly random permutations of the nodes. ASV-RR can be viewed as an NR algorithm that concentrates more on the expected marginal gain.
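To make the definition concrete, the Shapley value of each node can be computed exactly by averaging its marginal contribution over all orderings of the players. This brute-force sketch is only feasible for a handful of nodes, since it enumerates $n!$ permutations; ASV-RR [26] instead estimates the values with RR sets. Here the characteristic function `value(S)` is an assumed black box, which for Shapley centrality would be $\sigma(S)$; the usage example substitutes a small coverage function, and the function name is hypothetical.

```python
from itertools import permutations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley values: average marginal contribution over all orderings."""
    phi = dict.fromkeys(players, 0.0)
    for order in permutations(players):
        coalition = set()
        prev = value(frozenset(coalition))
        for v in order:
            coalition.add(v)
            cur = value(frozenset(coalition))
            phi[v] += cur - prev  # marginal contribution of v in this ordering
            prev = cur
    n_orders = factorial(len(players))
    return {v: total / n_orders for v, total in phi.items()}


covers = {"a": {1, 2, 3}, "b": {3}}


def coverage(S):
    return len(set().union(*(covers[v] for v in S)))


print(shapley_values(["a", "b"], coverage))  # {'a': 2.5, 'b': 0.5}
```

The values sum to `value` of the full player set (here 3), the efficiency property that distinguishes Shapley centrality from a plain single node score.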

If required to present a set consisting of $\lambda$ key nodes, IM algorithms would set $k=\lambda$ , and then provide the identified $k$ seed nodes as the result.

3.4 Network dismantling

Network Dismantling (ND) aims to find the minimal set of nodes whose removal breaks the network into many disconnected components, with the size of the largest one smaller than a preset threshold $\rho$. Let $S^{\mathit{ND}}$ denote the node set identified by an ND algorithm, $G\{V-S\}$ denote the network remaining after all nodes in $S$ are removed from $G$, and $G_{\textit{lcc}}$ denote the largest connected component of $G$. Then the task of network dismantling can be formalized as

$\displaystyle S^{\mathit{ND}}=\mathop{\arg\min}_{\lvert G\{V-S\}_{\textit{lcc}}\rvert<\rho}\lvert S\rvert.$ (3)

The threshold $\rho$ is usually set to $\lceil N/100\rceil$ or $\lceil\sqrt{N}\rceil$ .

Decycling is an efficient technique that makes the ND task much easier. Decycling seeks a subset of nodes whose removal leaves a network acyclic, i.e. eliminates all cycles in the network. An acyclic graph can be easily dismantled with a greedy tree-breaking strategy [7]. Besides, reinsertion is a critical step in ND algorithms. No matter how an ND algorithm removes nodes to bring $\lvert G_{\textit{lcc}}\rvert$ below the threshold $\rho$, some of the removed nodes can inevitably be reinserted into the graph without any increase in $\lvert G_{\textit{lcc}}\rvert$, even though they had to be removed at an earlier stage. Thus, all existing ND algorithms try to reinsert nodes in the last step to minimize their final $\lvert S^{\mathit{ND}}\rvert$.
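Reinsertion can be sketched with a union-find structure, since putting a node back only ever merges components. The variant below is one common greedy strategy and an assumption on our part, not necessarily the one used by any particular ND algorithm: it repeatedly reinserts the removed node whose return would create the smallest component, as long as that component stays below $\rho$ (the condition of Eq. (3)), and stops when no such node remains. A stricter variant would cap merges at the current $\lvert G_{\textit{lcc}}\rvert$. The graph is assumed to be an undirected dict of neighbor sets, and the function name is hypothetical.

```python
def greedy_reinsert(adj, removed, rho):
    """Greedily reinsert removed nodes while every component stays below rho.

    Returns (reinserted nodes in order, nodes that remain removed).
    """
    removed = set(removed)
    parent, size = {}, {}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def add_node(v):
        # Put v back and union it with all of its present neighbors.
        parent[v], size[v] = v, 1
        for w in adj[v]:
            if w in parent:
                a, b = find(v), find(w)
                if a != b:
                    if size[a] < size[b]:
                        a, b = b, a
                    parent[b] = a
                    size[a] += size[b]

    def merged_size(v):
        # Size of the component v would join if it were reinserted now.
        roots = {find(w) for w in adj[v] if w in parent}
        return 1 + sum(size[r] for r in roots)

    for v in adj:
        if v not in removed:
            add_node(v)
    reinserted = []
    while removed:
        best = min(removed, key=merged_size)
        if merged_size(best) >= rho:
            break
        removed.remove(best)
        add_node(best)
        reinserted.append(best)
    return reinserted, removed


path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
print(greedy_reinsert(path, {1, 2, 3}, rho=3))  # ([2], {1, 3})
```

In the usage example, removing nodes 1, 2 and 3 of a five-node path is more than necessary: node 2 can be put back without creating a component of size 3 or more.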

Unfortunately, ND algorithms cannot be required to present a set of exactly $\lambda$ key nodes. On the one hand, dismantling a network is a collective effect: excluding any identified node would cause the task to fail. On the other hand, because of reinsertion, all nodes that remain removed are equally important, so it is unreasonable to rank the nodes identified by ND algorithms. Therefore, when required to present a set of key nodes, an ND algorithm can only give a set of $\lambda=f^{\mathit{ND}^{*}}(\rho,G)$ nodes, i.e. the number of key nodes is determined jointly by the specific algorithm $\mathit{ND}^{*}$, the threshold $\rho$, and the network $G$.

3.5 Discussion

NR, IM and ND all aim to identify key nodes in a network, each with its own motivation and focus. In summary, NR aims to obtain the ranking of all nodes by single node influence (especially the highly influential ones), whether or not it actually computes that influence. IM aims to find a given number of nodes whose joint influence spread is maximal. ND aims to disintegrate a network by removing a minimal set of nodes. The features of these three kinds of key node sets are compared in Table 1.

Table 1
Features of key nodes identified from three tasks

Task	Ordering	Amount	Correctness	Spreading model
NR	Ordered	$N$	Available	Dependent
IM	Ordered	$k$	Unavailable	Dependent
ND	Unordered	$f^{\mathit{ND}^{*}}(\rho,G)$	Unavailable	Independent

First, key nodes identified by NR and IM are ordered, while those identified by ND are not. Second, the number of key nodes differs entirely across the three tasks. Third, the correctness of key nodes identified by an NR algorithm can be checked against a ground truth (the ranking by $\sigma(\{v\})$), whereas no such ground truth is available for IM and ND. Finally, NR and IM rely on the propagation model, while ND is independent of it. Thus, building a fair and general comparison framework under which different kinds of key node sets can be evaluated together is a challenging task.

4. Comparison framework

In this section, we present our comparison framework which fairly evaluates a key node set $S^{*}$ from different aspects. We collect a group of crucial metrics from the three popular research fields (NR, IM and ND) respectively with a uniform comparing size $\lambda$ . The evaluating metrics collected in our general comparison framework are summarized in Fig. 2.

Figure 2.

General Comparison Framework. $S^{*}$ denotes a key node set for comparison. $\pi_{\beta}(S)$ denotes a top section of an ordered node set $S$ with the first $\beta$ nodes. $\lambda$ denotes the uniform comparing size. All the metrics have been used in experiments, whose results correspond successively to Table 3, Fig. 3, Table 4, Fig. 4, Tables 5 and 6.

4.1 NR relevant metrics

First, regarding the task of node ranking, key nodes of NR are viewed as the baseline. Since key nodes from ND have no ordering, i.e. all of them are almost equally important, it is only fair to include all of them in the comparison. Thus, we use the Jaccard index instead of the $\tau$ metric to evaluate the ability of a method to identify highly influential individuals. Let $S^{*}$ denote a key node set for comparison and $S^{\mathit{NR}}$ denote the key nodes identified by an NR algorithm. We calculate the Jaccard index between them by

$\displaystyle J(S^{*})=\frac{\lvert\pi_{\lambda}(S^{*})\cap\pi_{\lambda}(S^{\mathit{NR}})\rvert}{\lvert\pi_{\lambda}(S^{*})\cup\pi_{\lambda}(S^{\mathit{NR}})\rvert}$ (4)

where $\pi_{\beta}(S)$ denotes a top section of an ordered node set $S$ with the first $\beta$ nodes.
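Eq. (4) translates directly into code. In this minimal sketch (the function name is hypothetical), both key node sets are given as lists ordered from most to least important; for an unordered ND set, any order can be used as long as all of its nodes fall within the top $\lambda$.

```python
def jaccard_top(s_star, s_nr, lam):
    """Jaccard index between the top-lam sections of two key node lists (Eq. 4)."""
    a, b = set(s_star[:lam]), set(s_nr[:lam])
    return len(a & b) / len(a | b)


print(jaccard_top([1, 2, 3, 4], [2, 3, 5, 6], lam=3))  # 0.5
```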

Besides, we also test the top NR frequency for a key node set. Specifically, for certain proportions (e.g. $x=$ 1%, 5%, 10% and 20%) of top NR nodes, we check how many of them also appear in a key node set $S^{*}$ as follows.

$\displaystyle\textit{topf}_{x}(S^{*})=\lvert\{v|v\in\pi_{xN}(S^{\mathit{NR}})\wedge v\in\pi_{\lambda}(S^{*})\}\rvert.$ (5)

Although both the Jaccard index and the top NR frequency focus on the overlap of two sets, they measure similarity from different aspects. The Jaccard index evaluates the overall similarity between $S^{*}$ and the NR nodes under the same comparing size $\lambda$, while the top NR frequency evaluates, at increasing proportions, how many of the top influential nodes identified by NR appear in $S^{*}$. Also, $S^{*}$ and the NR nodes share the same size $\lambda$ when calculating the Jaccard index. In contrast, the number of NR nodes involved in the top NR frequency is determined by $N$ of the test network and can be either greater or less than the comparing size $\lambda$.
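Eq. (5) can also be sketched in a few lines. Because $xN$ is generally not an integer, the sketch takes the proportion as an integer percentage and rounds $xN$ up; this rounding rule is our assumption, as the text does not specify it. `s_nr` is assumed to be the full NR ranking of all $N$ nodes, and the function name is hypothetical.

```python
def top_nr_frequency(s_star, s_nr, lam, percent):
    """Number of the top `percent`% NR nodes that also appear in pi_lam(S*) (Eq. 5)."""
    n = len(s_nr)                          # s_nr ranks all N nodes
    cutoff = (percent * n + 99) // 100     # ceil(x * N) in integer arithmetic
    top_nr = set(s_nr[:cutoff])
    return len(top_nr & set(s_star[:lam]))


ranking = list(range(100))  # node 0 is the most influential
print(top_nr_frequency([0, 5, 50, 99], ranking, lam=4, percent=10))  # 2
```

Integer arithmetic avoids floating-point surprises when computing the cutoff, e.g. for 1% of a 62-node network the cutoff is 1 node.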

4.2 IM relevant metrics

For the task of influence maximization, influence spread should undoubtedly be included in the comparison. Similarly, for each network, the top $\lambda$ section of a key node set, i.e. $\pi_{\lambda}(S^{*})$, is used to simulate the influence spread (IS) as follows.

$\displaystyle IS(S^{*})=\sigma(\pi_{\lambda}(S^{*})).$ (6)

Moreover, we also examine the relations within a key node set, i.e. the number of edges existing between these nodes. As the rich-club effect [42] (a small number of nodes with many links are very well connected to each other) is vital to the IM task, it makes sense to check these relations and how they affect the influence spread. The relations in a key node set, $\textit{rela}(S^{*})$, are defined as follows.

$\displaystyle\textit{rela}(S^{*})=\lvert\{(v,u)|v,u\in\pi_{\lambda}(S^{*})\wedge(v,u)\in E\}\rvert.$ (7)
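Counting the relations of Eq. (7) only requires checking adjacency among the top $\lambda$ nodes. The sketch assumes an undirected dict-of-sets graph, in which each edge is stored from both endpoints, and counts every undirected edge once; the function name is hypothetical.

```python
def relations(adj, s_star, lam):
    """Number of edges whose endpoints both lie in pi_lam(S*) (Eq. 7)."""
    top = set(s_star[:lam])
    # Each undirected edge is seen from both endpoints, hence the halving.
    return sum(1 for v in top for u in adj[v] if u in top) // 2


g = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
print(relations(g, [0, 1, 2, 3], lam=3))  # 3
```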

4.3 ND relevant metrics

For the network dismantling task, we check the ability of a key node set to dismantle networks, namely, the size of the remaining largest connected component when the top $\lambda$ nodes of the set, i.e. $\pi_{\lambda}(S^{*})$, are removed from the network. Letting $G\{V-S\}$ denote the remaining network after all nodes in $S$ are removed from $G$, we evaluate the dismantling ability (DA) of a key node set $S^{*}$ by

$\displaystyle DA(S^{*})=\lvert G\{V-\pi_{\lambda}(S^{*})\}_{\textit{lcc}}\rvert/N.$ (8)

The smaller $DA(S^{*})$ is, the better $\pi_{\lambda}(S^{*})$ dismantles the network.
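Eq. (8) needs only a breadth-first search over the nodes that survive the removal. A minimal sketch on an undirected dict-of-sets graph (function names hypothetical):

```python
from collections import deque


def largest_cc_size(adj, removed):
    """Size of the largest connected component after deleting `removed`."""
    seen = set(removed)
    best = 0
    for s in adj:
        if s in seen:
            continue
        seen.add(s)
        queue, size = deque([s]), 0
        while queue:
            v = queue.popleft()
            size += 1
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
        best = max(best, size)
    return best


def dismantling_ability(adj, s_star, lam):
    """DA(S*) = |G{V - pi_lam(S*)}_lcc| / N (Eq. 8); smaller is better."""
    return largest_cc_size(adj, set(s_star[:lam])) / len(adj)


path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
print(dismantling_ability(path, [2, 0], lam=1))  # 0.4
```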

Since reinsertion is a critical step for ND algorithms, we also keep removing nodes from the top of $S^{*}$ until $\lvert G_{\textit{lcc}}\rvert$ falls below the threshold $\rho$ and then use a greedy strategy to reinsert nodes. Letting $\textit{reinsert}(\cdot)$ denote the number of nodes reinserted by the greedy strategy, we calculate the minimum removal (MR) of $S^{*}$ as

$\displaystyle MR(S^{*})=\chi^{*}-\textit{reinsert}(\pi_{\chi^{*}}(S^{*})),\quad\chi^{*}=\min\{\chi:\lvert G\{V-\pi_{\chi}(S^{*})\}_{\textit{lcc}}\rvert<\rho\}.$ (9)
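Eq. (9) can be sketched as below: remove nodes from the top of $S^{*}$ until the largest component is smaller than $\rho$, then greedily reinsert removed nodes. The reinsertion rule (put back the node that would form the smallest component, as long as that component stays below $\rho$) is our assumption, since the text only says that a greedy strategy is used. The sketch recomputes components after each step for clarity rather than speed, returns `None` if $S^{*}$ cannot reach the threshold, and uses hypothetical function names on an undirected dict-of-sets graph.

```python
from collections import deque


def components(adj, removed):
    """Label the components of the graph without `removed`; return (labels, sizes)."""
    comp, sizes = {}, []
    for s in adj:
        if s in removed or s in comp:
            continue
        cid = len(sizes)
        comp[s] = cid
        queue, size = deque([s]), 0
        while queue:
            v = queue.popleft()
            size += 1
            for w in adj[v]:
                if w not in removed and w not in comp:
                    comp[w] = cid
                    queue.append(w)
        sizes.append(size)
    return comp, sizes


def minimum_removal(adj, s_star, rho):
    """MR(S*) from Eq. (9): chi* minus the number of greedily reinserted nodes."""
    def lcc(removed):
        return max(components(adj, removed)[1], default=0)

    removed = set()
    for v in s_star:  # remove nodes from the top of S* until |G_lcc| < rho
        if lcc(removed) < rho:
            break
        removed.add(v)
    if lcc(removed) >= rho:
        return None  # S* cannot dismantle G down to the threshold
    chi = len(removed)

    reinserted = 0
    while True:
        comp, sizes = components(adj, removed)
        best, best_size = None, rho
        for v in removed:
            # Component size v would create if it were put back now.
            size = 1 + sum(sizes[c] for c in {comp[w] for w in adj[v] if w in comp})
            if size < best_size:
                best, best_size = v, size
        if best is None:
            break
        removed.remove(best)
        reinserted += 1
    return chi - reinserted


path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
print(minimum_removal(path, [2, 1, 3, 0, 4], rho=2))  # 2
```

In the example, three removals are needed to isolate every node of the path, but node 2 can be reinserted afterwards, leaving the optimal two-node solution {1, 3}.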

5. Experiments

5.1 Datasets and algorithms

In this subsection, we introduce the twenty test networks of different sizes and categories used in the experiments. These datasets are summarized in Table 2. All networks are pre-processed, and their largest connected component is used instead of the original network, since an initially fragmented network would greatly reduce the significance of the ND task. The numbers of nodes and edges are denoted by $N$ and $M$, and $D_{\textit{avg}}$ denotes the average degree. $\rho$ is the dismantling threshold for the network, which is set to $\lceil\sqrt{N}\rceil$. Since $\rho$ is fixed, we simplify $f^{\mathit{ND}^{*}}(\rho,G)$ to $\psi^{\mathit{ND}^{*}}$ to represent the number of key nodes identified by a certain ND algorithm. The Category column indicates the source type of a network.

Table 2
Summary of twenty testing networks

Net	$N$	$M$	$D_{\textit{avg}}$	$\rho$	Category
Dolphins (DP)	62	159	5.13	8	Animal
Polbooks (PB)	105	441	8.40	11	Social
Football (FB)	115	613	10.66	11	Sport
Airline (AL)	332	2126	12.81	11	Infrastructure
NetScience (NS)	379	914	4.82	20	Collaboration
USAir (UA)	500	2980	11.92	23	Infrastructure
Email (EM)	1133	5451	9.62	34	Communication
Bible (BB)	1707	9059	10.61	42	Lexical
Hamster (HS)	1788	12476	13.96	43	Social
Yeast (YE)	2224	6609	5.94	48	Biological
CaGrQc (CG)	4158	13422	6.46	65	Collaboration
Gnutella06 (GT)	6192	15757	5.09	79	Computer
CaHepTh (CH)	8638	24806	5.74	93	Collaboration
PGP (PP)	10680	24316	4.55	104	Social
CaCondMat (CC)	21363	91286	8.55	147	Collaboration
CAIDA (CA)	26475	53381	4.03	163	Computer
IntTopo (IT)	34761	107720	6.20	187	Computer
Brightkite (BK)	56739	212945	7.51	239	Social
WordNet (WN)	145145	656230	9.04	381	Lexical
Douban (DB)	154908	327162	4.22	394	Social

•

(i) Dolphins (DP) [43]: This is a network of frequent associations between 62 dolphins in a community living off Doubtful Sound, New Zealand.

•

(ii) Polbooks (PB): This is a network of frequent co-purchases of books about US politics published around the time of the 2004 presidential election and sold by Amazon.com (data: http://www.orgnet.com/divided2.html).

•

(iii) Football (FB) [44]: This is a network of American football games between Division IA colleges during regular season Fall 2000.

•

(iv) Airline (AL): This is the air traffic network of flights between US airports in 1997 (data: http://vlado.fmf.uni-lj.si/pub/networks/data).

•

(v) NetScience (NS) [45]: This is a collaboration network compiled by M. Newman in May 2006, covering scientific collaborations between scientists working on network theory and experiment.

•

(vi) USAir (UA) [46]: This network depicts the air travel connections among 500 US airports with the largest amount of traffic from publicly available data.

•

(vii) Email (EM) [47]: This is an email communication network at the University Rovira i Virgili in Tarragona in the south of Catalonia in Spain.

•

(viii) Bible (BB): This network contains nouns (places and names) of the King James Version of the Bible and information about their co-occurrences (data: http://konect.uni-koblenz.de/networks).

•

(ix) Hamster (HS): This network depicts the friendships between users of the website Hamsterster.com (data: http://konect.uni-koblenz.de/networks).

•

(x) Yeast (YE) [48]: This is a biological network consisting of protein-protein interactions.

•

(xi) CaGrQc (CG) [49]: This is a collaboration network from the arXiv, covering scientific collaborations between authors of papers submitted to General Relativity and Quantum Cosmology category.

•

(xii) Gnutella06 (GT) [49, 50]: This network is the snapshot of the Gnutella peer-to-peer file sharing network from August 2002 where nodes represent Gnutella hosts.

•

(xiii) CaHepTh (CH) [49]: This is a collaboration network from the arXiv, covering scientific collaborations between authors of papers submitted to High Energy Physics – Theory category.

•

(xiv) PGP (PP) [51]: This is the interaction network of users of the Pretty Good Privacy (PGP) algorithm.

•

(xv) CaCondMat (CC) [49]: This is a collaboration network from the e-print arXiv, covering scientific collaborations between authors of papers submitted to the Condensed Matter category.

•

(xvi) CAIDA (CA) [49]: This is a network of connections between autonomous systems of the Internet, collected by the CAIDA project in 2007.

•

(xvii) IntTopo (IT) [52]: This is a network of connections between the Internet's autonomous systems, i.e., collections of connected IP routing prefixes controlled by network operators.

•

(xviii) Brightkite (BK) [53]: This network depicts user-user friendships from Brightkite, a former location-based social network where users share their locations.

•

(xix) WordNet (WN) [54]: This is a lexical network containing words from the WordNet dataset.

•

(xx) Douban (DB)⁴: This is a social network from the Chinese online website www.douban.com.

⁴ http://socialcomputing.asu.edu.

In order to explore the correlations between key nodes derived from the three popular research tasks, we choose representative algorithms that are both effective and efficient. Throughout the experiments, ASNI-RR (SNI for short) [26], which approximates the real single node influence with high probability, is chosen for Node Ranking (NR); IMM [27], one of the best RIS-based algorithms, which guarantees a $1-1/e-\epsilon$ approximation ratio, is chosen for Influence Maximization (IM); and CoreHD (CHD for short) [39], which combines fast decycling with high solution quality, is chosen for Network Dismantling (ND). In addition, the Shapley centrality computed via ASV-RR (SHP for short) [26] is included for comparison. We adopt the Weighted Independent Cascade (WIC) model, which requires no parameters, to simulate the information dissemination process. Since the ND algorithm CoreHD is part of the comparison, we set the uniform comparison size $\lambda$ to $\psi^{\mathit{CHD}}$ (the number of nodes CoreHD removes to dismantle the network) wherever a common set size is needed.
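To make the difference between SNI and SHP concrete, the sketch below estimates both from reverse-reachable (RR) sets under WIC. It assumes the common WIC convention that, on an undirected graph, a node $u$ activates its neighbor $v$ with probability $1/\deg(v)$, and it assumes the RR-set estimators $\mathrm{SNI}(v) \approx n\cdot\Pr[v\in R]$ and $\mathrm{SHP}(v) \approx n\cdot\mathbb{E}[\mathbf{1}\{v\in R\}/|R|]$. It is an illustration only, not the ASNI-RR/ASV-RR code: the adaptive choice of the number of RR sets is omitted, and the function names are hypothetical.

```python
import random
from collections import defaultdict


def random_rr_set(adj, rng):
    """Sample one reverse-reachable (RR) set under the WIC model.

    `adj` maps each node to the set of its neighbours (undirected graph).
    Edge u -> w is assumed live with probability 1 / deg(w).
    """
    root = rng.choice(list(adj))
    rr = {root}
    stack = [root]
    while stack:
        w = stack.pop()
        if not adj[w]:
            continue
        p = 1.0 / len(adj[w])  # WIC: every in-edge of w has probability 1/deg(w)
        for u in adj[w]:
            if u not in rr and rng.random() < p:
                rr.add(u)
                stack.append(u)
    return rr


def estimate_sni_shp(adj, theta, seed=0):
    """Estimate single node influence and Shapley centrality from theta RR sets."""
    rng = random.Random(seed)
    n = len(adj)
    hits = defaultdict(float)    # number of RR sets containing v
    shares = defaultdict(float)  # sum of 1/|R| over the RR sets R containing v
    for _ in range(theta):
        rr = random_rr_set(adj, rng)
        for v in rr:
            hits[v] += 1.0
            shares[v] += 1.0 / len(rr)
    sni = {v: n * hits[v] / theta for v in adj}
    shp = {v: n * shares[v] / theta for v in adj}
    return sni, shp


# Star graph: centre "c" linked to four leaves.
star = {"c": {1, 2, 3, 4}, 1: {"c"}, 2: {"c"}, 3: {"c"}, 4: {"c"}}
sni, shp = estimate_sni_shp(star, theta=2000)
print(sni["c"], round(sum(shp.values()), 6))
```

In the star, every leaf has degree 1, so the centre joins every RR set and its SNI estimate equals $n=5$ exactly. The SHP estimates always sum to $n$, since each RR set hands out a total credit of 1; this mirrors the efficiency property of the Shapley value, which spreads the collective influence over individuals instead of crediting each node with its full solo influence.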

Intuitively, top NR nodes, SHP nodes and IM nodes are all related to influence spread, which suggests that they may be highly similar. However, top NR nodes usually have a high degree and thus tend to connect with each other (the rich-club effect), while IM nodes target collective influence and may therefore be scattered. Meanwhile, removing ND nodes fragments the whole network, so ND nodes may also be scattered throughout the network. Moreover, since activating IM nodes maximizes the influence spread, removing them may also block the spread and thus dismantle a network; conversely, ND nodes may also be effective spreaders.

5.2 From the perspective of NR

Table 3
Jaccard index between key node sets

Net	$\psi^{\textit{CHD}}$	SNI&SHP	SNI&IMM	SNI&CHD	IMM&CHD
DP	20	0.9048	0.4815	0.5385	0.4815
PB	35	0.7949	0.4583	0.6667	0.5909
FB	65	0.7105	0.4444	0.5294	0.4444
AL	52	0.7931	0.4054	0.7049	0.4054
NS	30	0.7143	0.4286	0.5000	0.2766
UA	60	0.7910	0.3793	0.7143	0.3953
EM	330	0.7647	0.4442	0.6667	0.4348
BB	281	0.8013	0.4674	0.7082	0.4635
HS	364	0.7500	0.4359	0.6621	0.4560
YE	330	0.7188	0.4103	0.6667	0.3953
CG	363	0.6315	0.3698	0.5513	0.3297
GT	1049	0.6314	0.5094	0.5610	0.4932
CH	1043	0.6322	0.3598	0.6259	0.3355
PP	469	0.4063	0.2540	0.4889	0.2088
CC	2660	0.6384	0.3578	0.6155	0.3592
CA	453	0.7905	0.6354	0.5784	0.4732
IT	796	0.6311	0.4473	0.5293	0.5076
BK	6335	0.6667	0.3439	0.6437	0.3412
WN	10384	0.8807	0.7300	0.8353	0.6877
DB	6306	0.6363	0.3808	0.6157	0.3657
Mean	1571.25	0.7144	0.4372	0.6201	0.4223

As NR, IM and ND all identify key nodes in a network, it is intriguing to examine the intersections between them. We calculate the Jaccard index between the SNI set, used as the baseline, and the key node sets identified by SHP, IMM and CHD. We also calculate the Jaccard index between IMM and CHD to explore the similarity between IM and ND. The comparison size $\lambda$ is set to $\psi^{\mathit{CHD}}$, and the results are shown in Table 3.

Although SHP evaluates the importance of individuals by their expected marginal gain, it still shows a high similarity with SNI (mean Jaccard index 0.7144). Surprisingly, CHD nodes and SNI nodes are also similar to a certain degree (0.6201), although their motivations are unrelated. Moreover, even though SNI and IMM are both designed around influence spread, their intersection is small (0.4372). This suggests that finding influential individuals and maximizing group influence are essentially two different tasks. Finally, IMM nodes and CHD nodes are the least similar pair (0.4223), even though intuition suggests they should be alike.
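The Jaccard index in Table 3 is $J(A,B)=|A\cap B|/|A\cup B|$. For two sets of equal size $\lambda$ that share $k$ nodes it reduces to $k/(2\lambda-k)$; for instance, SNI&SHP = 0.9048 on DP ($\lambda=20$) corresponds to 19 shared nodes, since $19/21\approx0.9048$. A minimal sketch (function names are illustrative):

```python
def jaccard(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B| between two key node sets."""
    a, b = set(a), set(b)
    union = a | b
    # Two empty sets are treated as identical.
    return len(a & b) / len(union) if union else 1.0


def jaccard_equal_size(k, lam):
    """Jaccard index of two size-`lam` sets that share `k` nodes."""
    return k / (2 * lam - k)


# Two hypothetical 20-node sets sharing 19 nodes (cf. DP, SNI&SHP in Table 3).
sni_set = set(range(20))
shp_set = set(range(1, 21))
print(round(jaccard(sni_set, shp_set), 4), round(jaccard_equal_size(19, 20), 4))
```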

Next, we compute the top-NR frequency of the $\lambda$ SHP, IMM and CHD nodes, i.e., how many of them appear within given portions of the top-ranked SNI nodes. Frequencies for the top 1%, 5%, 10% and 20% SNI nodes are shown in Fig. 3, where the bar height gives the relative frequency with which the top $x$% SNI nodes appear in each of the other key node sets.
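A sketch of the top-NR frequency, assuming it is normalized by the size $\lambda$ of the compared key node set and that the top $x\%$ is rounded up to a whole number of nodes; both choices are our assumptions, and the helper name is hypothetical.

```python
import math


def top_fraction_coverage(ranking, key_set, x):
    """Share of `key_set` found among the top x% of `ranking`.

    `ranking` lists all nodes in descending order of SNI score.
    """
    k = max(1, math.ceil(len(ranking) * x / 100))
    top = set(ranking[:k])
    key = set(key_set)
    return len(top & key) / len(key)


ranking = list(range(100))     # node i has the i-th highest SNI score
imm_like = {0, 1, 2, 50, 99}   # hypothetical key node set
for x in (1, 5, 10, 20):
    print(x, top_fraction_coverage(ranking, imm_like, x))
```

Here two of the five hypothetical key nodes sit far down the ranking, so widening the window from 5% to 20% adds no coverage, which is the kind of saturation reported for IMM.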

Figure 3.

Top SNI frequency of SHP, IMM and CHD.

Several findings can be drawn from these results. First, SHP nodes again show a high similarity with SNI nodes. Second, CHD nodes appear more frequently among the top SNI nodes than IMM nodes do, which implies that the nodes that must be removed to dismantle a network usually rank high. Finally, even with a generous share of top SNI nodes (e.g., the top 10% for FB, UA, CG and BK, or the top 20% for AL, EM, YE and CC), most IMM nodes remain uncovered. This also reveals that IM is more concerned with the marginal gain an individual brings than with the influence spread it can achieve by itself.

5.3 From the perspective of IM

From the perspective of IM, we first run 10,000 Monte Carlo simulations to approximate the influence spread of each key node set of size $\lambda$ on all test networks. Results are shown in Table 4.
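The Monte Carlo estimate can be sketched as follows, assuming the WIC convention $p(u,v)=1/\deg(v)$ on an undirected graph: each run is one independent cascade, and the spread is the average number of activated nodes over all runs. Function names are illustrative, not the authors' code.

```python
import random


def wic_cascade(adj, seeds, rng):
    """Run one WIC cascade and return the number of activated nodes."""
    active = set(seeds)
    frontier = list(active)
    while frontier:
        newly_active = []
        for u in frontier:
            for v in adj[u]:
                # u gets a single chance to activate each inactive neighbour v.
                if v not in active and rng.random() < 1.0 / len(adj[v]):
                    active.add(v)
                    newly_active.append(v)
        frontier = newly_active
    return len(active)


def influence_spread(adj, seeds, runs=10_000, seed=0):
    """Average cascade size over `runs` Monte Carlo simulations."""
    rng = random.Random(seed)
    return sum(wic_cascade(adj, seeds, rng) for _ in range(runs)) / runs


star = {"c": {1, 2, 3, 4}, 1: {"c"}, 2: {"c"}, 3: {"c"}, 4: {"c"}}
print(influence_spread(star, {"c"}))  # the centre always reaches every leaf
print(influence_spread(star, {1}))    # close to 1 + (1/4) * 4 = 2
```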

Table 4
Influence spread of key node sets

Net	$\psi^{\mathit{CHD}}$	SNI	SHP	IMM	CHD
DP	20	46.72	46.72	48.55	46.94
PB	35	76.02	76.85	76.59	77.06
FB	65	92.95	93.65	94.37	94.90
AL	52	231.43	238.89	240.40	223.87
NS	30	164.92	171.0	176.46	153.28
UA	60	318.29	346.95	355.15	325.93
EM	330	828.68	857.10	870.75	836.98
BB	281	1044.20	1081.00	1110.87	1062.14
HS	364	1234.85	1304.11	1326.23	1242.61
YE	330	1515.55	1617.99	1677.70	1511.41
CG	363	1847.66	2041.06	2187.92	1862.75
GT	1049	4552.40	4921.70	4964.00	4505.02
CH	1043	4447.74	4960.34	5214.10	4459.87
PP	469	3560.77	4529.36	4733.68	3488.61
CC	2660	11053.58	12179.21	12673.13	11234.27
CA	453	18258.26	18564.54	18657.16	17724.93
IT	796	23723.64	24787.36	25038.92	24059.08
BK	6335	35452.11	38212.77	39562.94	35642.77
WN	10384	74056.67	79787.62	82131.92	72877.41
DB	6306	130663.66	134489.71	136672.39	122317.24

Unsurprisingly, IMM nodes achieve the best influence spread, as that is what IM algorithms are designed to maximize. Since Shapley centrality measures the expected marginal contribution of an individual over uniformly random permutations, SHP nodes also perform much better than SNI and CHD nodes. SNI and CHD perform comparably poorly in influence spread, even though their nodes have different characteristics: top SNI nodes have high individual influence, while CHD nodes are comparatively scattered in the networks. We may infer that neither high individual influence alone nor a scattered distribution alone suffices to maximize the influence spread; both are needed.

Next, we examine the relations within each key node set (i.e., the number of edges connecting these key nodes, as calculated in Eq. (7)) and how they affect the influence spread. The more relations a node set contains, the stronger the rich-club effect it may exhibit, and the smaller its influence spread tends to be, since densely connected seeds largely reach the same nodes. For easier comparison, Fig. 4 shows the ratios of the number of relations within SNI, SHP and CHD nodes to that within IMM nodes, rather than the raw counts.
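Eq. (7) is not reproduced in this section; assuming it counts the edges whose two endpoints both lie in the key node set, the relations and the ratios plotted in Fig. 4 can be computed as below (helper names are hypothetical).

```python
def internal_edges(adj, key_set):
    """Number of undirected edges with both endpoints in `key_set`."""
    s = set(key_set)
    # Each internal edge is seen once from each endpoint, hence the // 2.
    return sum(1 for u in s for v in adj[u] if v in s) // 2


def relation_ratio(adj, key_set, imm_set):
    """Relations within `key_set` divided by relations within the IMM set.

    Returns inf when the IMM set has no internal edge.
    """
    base = internal_edges(adj, imm_set)
    return internal_edges(adj, key_set) / base if base else float("inf")


# Toy graph: triangle a-b-c plus a pendant edge a-d.
adj = {"a": {"b", "c", "d"}, "b": {"a", "c"}, "c": {"a", "b"}, "d": {"a"}}
print(internal_edges(adj, {"a", "b", "c"}), relation_ratio(adj, {"a", "b", "c"}, {"a", "d"}))
```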

Figure 4.

Ratios of relations between key node sets.

As Fig. 4 shows, IMM nodes have far fewer relations than SNI, SHP and CHD nodes. The average ratios SNI/IMM, SHP/IMM and CHD/IMM are 2.59, 1.83 and 1.85, respectively. SNI nodes have the most relations, which partly explains why they perform worse in influence spread even though SNI is based on single node influence. However, SHP and CHD nodes have a similar number of relations, while SHP is much better than CHD in terms of influence spread. This strengthens our inference that single node influence and node distribution are both important for IM, and lacking either of them leads to failure in this task.

5.4 From the perspective of ND

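We evaluate the dismantling ability of the SNI, SHP, IMM and CHD node sets of size $\lambda$ and report the results in Table 5. The smaller the value, the better the node set dismantles the network; the last column $\rho/N$ gives the dismantling threshold $\rho$ normalized by the network size $N$.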
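Assuming the value reported in Table 5 is the size of the largest connected component (LCC) left after deleting the $\lambda$ key nodes, divided by $N$ (consistent with the $\rho/N$ column being on the same scale), it can be computed with a plain graph search. This is a sketch with hypothetical names, not the authors' evaluation code.

```python
def lcc_size(adj, removed=()):
    """Size of the largest connected component after deleting `removed`."""
    seen, best = set(removed), 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:  # depth-first search over the remaining graph
            u = stack.pop()
            size += 1
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        best = max(best, size)
    return best


def dismantling_ability(adj, key_set):
    """|LCC| / N after removing the key node set (smaller is better)."""
    return lcc_size(adj, key_set) / len(adj)


path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c", "e"}, "e": {"d"}}
print(dismantling_ability(path, {"c"}))
```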

Table 5
Dismantling ability of key node sets

Net	SNI	SHP	IMM	CHD	$\rho/N$
DP	0.3387	0.3710	0.3065	0.1129	0.1270
PB	0.1714	0.1905	0.5810	0.0952	0.0976
FB	0.4348	0.4348	0.4348	0.0870	0.0933
AL	0.2259	0.1325	0.3976	0.0542	0.0549
NS	0.1187	0.1293	0.5462	0.0475	0.0514
UA	0.1100	0.1020	0.3380	0.0440	0.0447
EM	0.4643	0.4342	0.4687	0.0291	0.0297
BB	0.4259	0.4265	0.5618	0.0240	0.0242
HS	0.4167	0.4077	0.4614	0.0235	0.0236
YE	0.4056	0.3862	0.4186	0.0211	0.0212
CG	0.4556	0.4162	0.4830	0.0042	0.0042
GT	0.4524	0.3545	0.3585	0.0126	0.0127
CH	0.5295	0.5058	0.5627	0.0107	0.0108
PP	0.5141	0.4460	0.4969	0.0096	0.0097
CC	0.5868	0.5481	0.6156	0.0068	0.0068
CA	0.2217	0.2369	0.2770	0.0061	0.0061
IT	0.3252	0.2517	0.2944	0.0053	0.0054
BK	0.4556	0.4162	0.4830	0.0042	0.0042
WN	0.4481	0.3903	0.5112	0.0026	0.0026
DB	0.1504	0.1271	0.1387	0.0025	0.0025
Mean	0.3626	0.3354	0.4368	0.0302	0.0316

Predictably, CHD is the best at dismantling a network, as that is what ND algorithms are designed for. Surprisingly, SHP beats both SNI and IMM in dismantling ability on average (mean 0.3354), even though SHP nodes have been shown to resemble SNI nodes (in Jaccard index and top SNI frequency) and IMM nodes (in influence spread). IMM nodes perform worst at this task (mean 0.4368), which again shows that IM and ND are fundamentally different despite their intuitive similarity.

The dismantling ability of CHD nodes in Table 5 is far better than that of the other methods because CHD includes a reinsertion step, which is critical for the ND task. For a fair comparison, we therefore let SNI, SHP and IMM dismantle the network with a reinsertion step as well and measure their minimum removal. Specifically, we keep removing the top SNI, SHP and IMM nodes until $\lvert G\{V-\pi_{\chi}(S^{*})\}_{\textit{lcc}}\rvert$ reaches the threshold $\rho$. Then, a greedy strategy reinserts as many of the removed nodes as possible. Results are shown in Table 6.
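The removal-then-reinsertion procedure can be sketched as below. The section does not specify the greedy reinsertion order; this sketch simply tries the removed nodes in reverse removal order and returns a node to the graph whenever the LCC stays within $\rho$. That ordering is our assumption (a common alternative prefers nodes that would join the smallest clusters), and the names are hypothetical.

```python
def lcc_size(adj, removed):
    """Largest connected component size after deleting the nodes in `removed`."""
    seen, best = set(removed), 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:
            u = stack.pop()
            size += 1
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        best = max(best, size)
    return best


def min_removal(adj, ranking, rho):
    """Remove ranked nodes until LCC <= rho, then greedily reinsert."""
    removed, order = set(), []
    for v in ranking:  # phase 1: removal in ranked order
        if lcc_size(adj, removed) <= rho:
            break
        removed.add(v)
        order.append(v)
    for v in reversed(order):  # phase 2: greedy reinsertion
        removed.discard(v)
        if lcc_size(adj, removed) > rho:
            removed.add(v)  # reinserting v would exceed the threshold
    return removed


path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c", "e"}, "e": {"d"}}
print(sorted(min_removal(path, ["b", "c", "d", "a", "e"], rho=2)))
```

On this path, the ranking first removes b and c; reinsertion then puts b back because the LCC still does not exceed $\rho=2$, so only c remains removed.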

Table 6
Minimum removal of key node sets

Net	SNI	SHP	IMM	CHD	RBW
DP	19	19	19	20	20
PB	33	33	34	35	32
FB	68	66	70	65	62
AL	51	52	75	52	54
NS	25	26	32	30	27
UA	60	63	94	60	58
EM	329	330	372	330	321
BB	282	281	326	281	276
HS	380	369	448	364	346
YE	331	333	377	330	316
CG	350	359	378	363	347
GT	1107	1071	1133	1049	1074
CH	1035	1012	1162	1043	991
PP	457	442	553	469	405
CC	2633	2642	3011	2660	2525
CA	436	436	477	453	424
IT	774	774	815	796	752
BK	6299	6323	7011	6335	6017
WN	10249	10275	10869	10384	9851
DB	6322	6353	6335	6306	6172

In Table 6, each number is how many nodes must be removed to dismantle the network with the reinsertion step; the smaller the number, the better the performance. Even with reinsertion, IMM remains worse at this task. SNI and SHP perform well and are sometimes even better than CHD. We also include another ND algorithm, Recalculated Betweenness (RBW) [38], which achieves the smallest minimum removal on most networks. RBW has been shown to be superior among ND algorithms, though it is rather time-consuming [20]. As the Jaccard index already revealed, NR nodes are more similar to ND nodes than IM nodes are, so dismantling a network by removing top SNI or SHP nodes is feasible to some extent.

5.5 Further insight

Overall, several significant findings can be drawn from the experiments above. First, NR focuses only on evaluating each individual, whereas IM and ND both aim at different kinds of collective effects; the two kinds of tasks are fundamentally different. Second, although NR and IM both pursue influence spread, their key node sets are not very similar. Merely identifying influential nodes is insufficient for the IM task, because IM concentrates on the marginal gain of influence spread, which may lead to a scattered node distribution. Third, although both relate to information dissemination, IM and ND cannot substitute for each other: IM nodes are poor at dismantling a network, and ND nodes are not good at maximizing collective influence. Finally, Shapley centrality shows a surprising capacity: SHP nodes are similar to SNI, IMM and CHD nodes simultaneously and can be applied to all three tasks with good performance.

Intrigued by these phenomena, we investigate what causes them. In this subsection, we therefore focus on the key node sets themselves rather than on the tasks they serve, and conduct further experiments on their features, including degree distribution, subgraph features and key node visualization.

5.5.1 Degree distribution

As degree is arguably the most basic and important centrality, we investigate the average degree of the key node sets identified by SNI, SHP, IMM and CHD. For each network, we also select the $\psi^{\mathit{CHD}}$ nodes with the largest degrees as an upper bound and compute the ratio of the average degree of each key node set to this upper bound. We are also interested in the neighbors of the key nodes, which reflect whether key nodes tend to link to high-degree nodes. Therefore, for each key node, we first calculate the average degree of all its neighbors and then take the mean of these values over the key node set. Let $D_{\textit{avg}}(\textit{key})$ denote the average degree of a key node set, $D_{\textit{avg}}(\textit{max})$ the upper bound, and $D_{\textit{avg}}(\textit{nei})$ the mean, over the key node set, of the average neighbor degree of each key node. The corresponding results are shown in Fig. 5.
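The three degree measures can be sketched directly from these definitions; the function name is hypothetical, and the sketch assumes every key node has at least one neighbor.

```python
def degree_measures(adj, key_set, psi):
    """Return (D_avg(key), D_avg(max), D_avg(nei)) for one key node set.

    Assumes every key node has at least one neighbour.
    """
    deg = {v: len(nbrs) for v, nbrs in adj.items()}
    key = list(key_set)
    d_key = sum(deg[v] for v in key) / len(key)
    top = sorted(deg.values(), reverse=True)[:psi]  # psi largest degrees
    d_max = sum(top) / len(top)
    # Average neighbour degree of each key node, then the mean over key nodes.
    d_nei = sum(sum(deg[u] for u in adj[v]) / len(adj[v]) for v in key) / len(key)
    return d_key, d_max, d_nei


star = {"c": {1, 2, 3, 4}, 1: {"c"}, 2: {"c"}, 3: {"c"}, 4: {"c"}}
d_key, d_max, d_nei = degree_measures(star, {1, 2}, psi=2)
print(d_key / d_max, d_nei / d_key)  # the two ratios compared in Fig. 5
```

Picking two leaves of a star gives a low degree ratio but a high neighbor ratio, the same pattern as low-degree key nodes attached to hubs.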

Figure 5.

Distribution of average degree for key nodes.

The average degrees of the key nodes on the 20 networks show a strong regularity. $D_{\textit{avg}}(\textit{key})$ follows SNI $>$ SHP $>$ CHD $>$ IMM, and $D_{\textit{avg}}(\textit{nei})/D_{\textit{avg}}(\textit{key})$ almost always follows SNI $>$ CHD $>$ SHP $>$ IMM. It is not surprising that SNI nodes have the highest degree (very close to the upper bound) and IMM nodes the lowest. However, since SHP is based on the marginal gain of influence spread and performs well in the IM task, it is rather unexpected that SHP nodes rank second in degree, far above IMM nodes. By contrast, the average degrees of the neighbors are fairly close across the four key node sets. SNI nodes score slightly higher than the other three in $D_{\textit{avg}}(\textit{nei})/D_{\textit{avg}}(\textit{key})$, which suggests that SNI nodes tend to link to each other, so that both they and their neighbors have high degrees. Moreover, although the $D_{\textit{avg}}(\textit{nei})/D_{\textit{avg}}(\textit{key})$ of IMM is close to that of SHP and CHD, the $D_{\textit{avg}}(\textit{key})$ of IMM is much lower, which implies that the average degree of its neighbors is also lower. According to Fig. 5, SHP and CHD nodes are rather similar, although they behave very differently in the preceding IM experiments.

For both the key node sets themselves and their neighbors, the differences between SNI, SHP, IMM and CHD are not that large. The reason may be that a group of high-degree nodes is identified as key nodes by all four methods. As we are more interested in the differences between the key node sets, we compute the overlapping part (the intersection) of the four sets and remove it from each set. As in Fig. 5, we then check both the average degree and the mean of the neighbors' average degree for the non-overlapping key nodes of each method. For better comparison and analysis, we also compute these two measures for the overlapping part. The results are shown in Fig. 6, where OVP denotes the overlapping nodes.
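Splitting off the overlapping part and recomputing the upper bound (as defined in the caption of Fig. 6) can be sketched as follows; the toy key node sets are hypothetical, and the star graph is only for illustration.

```python
def split_overlap(key_sets):
    """Return OVP (nodes shared by all sets) and each set with OVP removed."""
    sets = {name: set(nodes) for name, nodes in key_sets.items()}
    ovp = set.intersection(*sets.values())
    return ovp, {name: nodes - ovp for name, nodes in sets.items()}


def upper_bound_excluding(adj, excluded, size):
    """Average degree of the `size` highest-degree nodes outside `excluded`."""
    degs = sorted((len(adj[v]) for v in adj if v not in excluded), reverse=True)[:size]
    return sum(degs) / len(degs) if degs else 0.0


star = {"c": {1, 2, 3, 4}, 1: {"c"}, 2: {"c"}, 3: {"c"}, 4: {"c"}}
key_sets = {"SNI": {"c", 1, 2}, "SHP": {"c", 1, 3}, "IMM": {"c", 3, 4}, "CHD": {"c", 1, 4}}
ovp, rest = split_overlap(key_sets)
psi = 3
print(ovp, rest["SNI"], upper_bound_excluding(star, ovp, psi - len(ovp)))
```

For the non-overlapping sets, the bound uses the $\psi^{\mathit{CHD}}-|\mathit{OVP}|$ largest degrees outside OVP; for OVP itself, `upper_bound_excluding(adj, set(), len(ovp))` gives the unrestricted bound.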

Figure 6.

Distribution of Average Degree for Non-overlapping Key Nodes. OVP denotes the overlapping part identified by SNI, SHP, IMM and CHD simultaneously. For the key node sets, $D_{\textit{avg}}(\textit{max})$ is the average degree of the $\psi^{\mathit{CHD}}-\lvert\mathit{OVP}\rvert$ highest-degree nodes once OVP nodes are excluded. For OVP nodes, $D_{\textit{avg}}(\textit{max})$ is the average degree of the $\lvert\mathit{OVP}\rvert$ highest-degree nodes in the whole network.

In Fig. 6, the differences between the four key node sets become more evident once the overlapping part is excluded. The average degree of the key nodes still follows SNI $>$ SHP $>$ CHD $>$ IMM, while the pattern of $D_{\textit{avg}}(\textit{nei})/D_{\textit{avg}}(\textit{key})$ becomes more complex. Compared with Fig. 5a, $D_{\textit{avg}}(\textit{key})/D_{\textit{avg}}(\textit{max})$ drops for all four key node sets, especially for IMM. SNI shows the smallest decrease after the overlapping nodes are removed, implying that SNI nodes are mostly high-degree nodes. SHP is still above CHD, exhibiting relatively higher degrees. This is also unexpected, since IMM, another marginal-gain-based method, drops sharply in average degree. Compared with Fig. 5b, $D_{\textit{avg}}(\textit{nei})/D_{\textit{avg}}(\textit{key})$ increases notably for all four key node sets. The reason is that the overlapping part consists mostly of high-degree nodes, whose exclusion greatly reduces $D_{\textit{avg}}(\textit{key})$. In addition, the remaining nodes may still link to the excluded high-degree nodes, which keeps $D_{\textit{avg}}(\textit{nei})$ at a high level. According to Fig. 6, the average degree of the OVP (overlapping) nodes is rather high (lower only than that of SNI nodes), while the average degree of their neighbors is extremely low. These results indicate that the overlapping part of the four key node sets consists of comparatively high-degree nodes that are rarely linked to each other.

In addition, we compute the mean of the two measures over the twenty networks; results are shown in Table 7. The “Case” column indicates whether the overlapping part is excluded from the key node sets (“OVP” means the overlapping part is excluded, and “NOVP” means it is not). $D/D_{\textit{max}}$ and $D_{\textit{nei}}/D$ are short for $D_{\textit{avg}}(\textit{key})/D_{\textit{avg}}(\textit{max})$ and $D_{\textit{avg}}(\textit{nei})/D_{\textit{avg}}(\textit{key})$, respectively.

Table 7
Mean distribution of average degree for key nodes on all networks

Case	Measure	SNI	SHP	IMM	CHD	OVP
NOVP	$D/D_{\textit{max}}$	0.9840	0.9345	0.8094	0.9198	/
	$D_{\textit{nei}}/D$	0.9261	0.8160	0.7816	0.8756	/
OVP	$D/D_{\textit{max}}$	0.9734	0.8557	0.4994	0.8006	0.9065
	$D_{\textit{nei}}/D$	1.7620	1.4444	1.8007	1.7920	0.6087

Overall, these mean results agree with our earlier observations. When the overlapping part is not excluded (first two rows of Table 7), SNI has the highest value in both measures, indicating that most SNI nodes are the clustered high-degree nodes of the networks. In contrast, IMM is the lowest in both. Compared with CHD, SHP nodes have an even higher degree, but their neighbors have a somewhat lower degree. Hence, SHP contains more high-degree nodes, but these nodes may be less clustered. When we consider only the non-overlapping key nodes (last two rows of Table 7), $D/D_{\textit{max}}$ drops for all sets. Despite a slight decrease, SNI remains the highest. OVP reaches 0.9065, higher than SHP, CHD and IMM. As noted before, IMM drops sharply, to below 0.5 (0.4994). This reveals that a high degree is not necessary for the influence maximization task. In terms of $D_{\textit{nei}}/D$, SHP is an outlier (about 1.44), whereas this measure is around 1.8 for SNI, IMM and CHD. We assume the reason is that SHP nodes, like IMM nodes, do not tend to link with each other. Although IMM reaches 1.8007, this is mainly due to the comparatively low degree of its nodes. In fact, the $D_{\textit{nei}}/D_{\textit{max}}$ of IMM is 0.8993, still much lower than that of SHP (1.2360).
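The two $D_{\textit{nei}}/D_{\textit{max}}$ values quoted above follow from the OVP rows of Table 7 if the two mean ratios are simply multiplied (we assume this is how they were obtained; a mean of per-network products would generally differ slightly):

$$\frac{D_{\textit{nei}}}{D_{\textit{max}}}=\frac{D_{\textit{nei}}}{D}\cdot\frac{D}{D_{\textit{max}}},\qquad \text{IMM: } 1.8007\times0.4994\approx0.8993,\qquad \text{SHP: } 1.4444\times0.8557\approx1.2360.$$

This makes explicit why IMM's high $D_{\textit{nei}}/D$ is an artifact of its small denominator: once rescaled by the common upper bound, its neighbors are clearly lower in degree than those of SHP nodes.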

5.5.2 Subgraph feature

For SNI, SHP, IMM and CHD, we extract the corresponding key-node subgraphs, each consisting only of the key nodes and the edges between them. Hence, on each network, the four subgraphs have the same number of nodes ($\psi^{\mathit{CHD}}$) but different numbers of edges.

First, to check how clustered the different key nodes are, we count the triangles in these subgraphs. Since Fig. 4 already shows that SNI nodes have the most relations, we analogously compute the ratios of the number of triangles within the SHP, IMM and CHD subgraphs to that within the SNI subgraph. Results are shown in Fig. 7.
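A minimal sketch of the triangle count in a key-node subgraph; the ratios in Fig. 7 then divide the counts for SHP, IMM and CHD by the count for SNI. The function name is illustrative.

```python
from itertools import combinations


def triangles_in_subgraph(adj, key_set):
    """Count triangles in the subgraph induced by `key_set`."""
    s = set(key_set)
    count = 0
    for u in s:
        inside = [v for v in adj[u] if v in s]
        for v, w in combinations(inside, 2):
            if w in adj[v]:
                count += 1
    return count // 3  # each triangle is found once from each of its 3 corners


# Complete graph K4 on nodes a-d.
k4 = {v: {u for u in "abcd" if u != v} for v in "abcd"}
print(triangles_in_subgraph(k4, "abcd"), triangles_in_subgraph(k4, "abc"))
```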

Except for the Dolphins network, SNI always has the largest number of triangles, indicating a highly clustered structure. As previously inferred, IMM nodes are scattered and of relatively low degree, so IMM always has the fewest triangles (average ratio 0.2791). Interestingly, although CHD nodes have slightly more relations than SHP nodes (average ratios 1.85 vs. 1.83 in Fig. 4), SHP has slightly more triangles than CHD in most cases (average ratios 0.6765 vs. 0.6101). As shown in Table 7, although SHP nodes have a higher average degree than CHD nodes, their neighbors have a relatively lower degree than those of CHD nodes, which led to our earlier assumption that SHP nodes do not tend to link with each other. In summary, SHP nodes have a high average degree, fewer edges but more triangles, and lower-degree neighbors. We therefore infer that SHP nodes consist of two parts: nodes that are relatively clustered in the high-degree area of a network, and high-degree nodes that are rather isolated. This could explain why, despite their high degree and larger number of triangles, SHP nodes still have a lower $D_{\textit{nei}}/D$ and perform well in the IM task.

Table 8
Number of connected components in key-node subgraphs

Net	$\psi^{\mathit{CHD}}$	SNI	SHP	IMM	CHD
DP	20	1 (1.000)	1 (1.000)	6 (0.450)	3 (0.750)
PB	35	1 (1.000)	1 (1.000)	3 (0.943)	1 (1.000)
FB	65	1 (1.000)	1 (1.000)	1 (1.000)	1 (1.000)
AL	52	1 (1.000)	2 (0.981)	4 (0.942)	1 (1.000)
NS	30	1 (1.000)	2 (0.967)	14 (0.200)	7 (0.400)
UA	60	1 (1.000)	1 (1.000)	8 (0.883)	1 (1.000)
EM	330	1 (1.000)	1 (1.000)	11 (0.970)	1 (1.000)
BB	281	1 (1.000)	2 (0.996)	15 (0.943)	3 (0.993)
HS	364	1 (1.000)	8 (0.975)	28 (0.918)	3 (0.989)
YE	330	3 (0.994)	10 (0.970)	64 (0.797)	7 (0.982)
CG	363	5 (0.989)	27 (0.920)	119 (0.620)	18 (0.942)
GT	1049	3 (0.998)	17 (0.985)	54 (0.947)	39 (0.962)
CH	1043	13 (0.987)	43 (0.952)	250 (0.717)	45 (0.954)
PP	469	13 (0.966)	92 (0.759)	223 (0.486)	42 (0.902)
CC	2660	9 (0.997)	56 (0.976)	374 (0.842)	32 (0.987)
CA	453	4 (0.993)	6 (0.989)	15 (0.969)	10 (0.980)
IT	796	3 (0.997)	6 (0.994)	25 (0.969)	5 (0.995)
BK	6335	4 (0.998)	101 (0.982)	1135 (0.811)	46 (0.993)
WN	10384	38 (0.995)	164 (0.979)	1033 (0.888)	150 (0.978)
DB	6306	16 (0.998)	31 (0.995)	264 (0.958)	7 (0.999)

Figure 7.

Distribution of triangles in key-node subgraph.

Then, to further explore the topology of these subgraphs, we count the connected components in each subgraph and measure the size of the largest one. Results are shown in Table 8, where the integer is the number of connected components and the fraction in parentheses is the size of the largest connected component relative to $\psi^{\mathit{CHD}}$ (the subgraph size).
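Both quantities in Table 8 can be sketched with one search restricted to the key nodes; the function name is hypothetical.

```python
def component_stats(adj, key_set):
    """Return (#components, |largest component| / |key_set|) of the induced subgraph."""
    s = set(key_set)
    if not s:
        return 0, 0.0
    seen, sizes = set(), []
    for start in s:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:
            u = stack.pop()
            size += 1
            for v in adj[u]:
                if v in s and v not in seen:  # stay inside the key-node subgraph
                    seen.add(v)
                    stack.append(v)
        sizes.append(size)
    return len(sizes), max(sizes) / len(s)


path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c", "e"}, "e": {"d"}}
print(component_stats(path, {"a", "b", "d"}))
```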

As pointed out previously, SNI nodes are clustered high-degree nodes, while IMM nodes are rather scattered. Therefore, SNI subgraphs have the fewest connected components and IMM subgraphs the most. The largest connected component of SNI subgraphs is very large, above 0.99 on most networks, whereas that of IMM subgraphs is smaller. For SHP and CHD, the sizes of the largest connected components are close. However, in 14 of the 20 networks, SHP subgraphs have at least as many components as CHD subgraphs. This agrees with our inference that SHP nodes consist mostly of clustered nodes plus some isolated ones.

5.5.3 Key nodes visualization

Figure 8.

Key nodes visualization on airline network.

To study how the different key nodes are distributed in the networks, we draw the network topologies and mark SNI, SHP, IMM and CHD nodes in red, purple, green and cyan, respectively. Edges linking these key nodes are painted in the corresponding colors. As most networks show common patterns, we only present two of them in this subsection, i.e., the Airline network (Fig. 8) and the CaGrQc network (Fig. 9).

Figure 9.

Key nodes visualization on CaGrQc network.

Some common patterns can be observed in Figs 8 and 9. First, most key nodes are located in the central areas of the networks, which usually consist of many clustered high-degree nodes. Second, SNI nodes are heavily clustered, while IMM nodes are much more scattered. Compared with IMM, the distributions of SHP and CHD nodes are more similar to that of SNI, except that they are relatively sparser in the central areas. Finally, although the distributions of SHP and CHD nodes are the most alike, SHP includes a number of isolated nodes in the outer layer of the network, especially in Fig. 9b.

In summary, NR nodes are mainly the clustered high-degree nodes in the center of networks, so the key for node ranking algorithms is to order these high-degree nodes sensibly. IM nodes are scattered throughout the network, and thus there are fewer edges between them. This also makes the average degree of IM nodes much lower, since high-degree nodes tend to link with each other. ND nodes are actually clustered rather than scattered, indicating that network dismantling essentially extracts a “core chain” to break a network into pieces rather than finding the so-called “weak nodes” [14]. SHP nodes are quite special: they resemble ND nodes in the center of networks, while some of them are isolated, just like IM nodes. Therefore, SHP nodes can handle both the IM and ND tasks simultaneously.

6. Case study

6.1 Dolphins network

In this subsection, we take the Dolphins (DP) network as a specific case study to further analyze these key node sets and their features. Figure 10 shows the network topology as well as the distribution of different key node sets.

Figure 10.

Key Nodes in Dolphins Network ($\psi^{\mathit{CHD}}=20$). (a) Top 20 NR key nodes identified by SHP. Nodes in pink are identified simultaneously by SHP, SNI, IMM and CHD. Nodes in orange are identified by SHP and SNI. Nodes in yellow are identified by SHP, SNI and IMM. Nodes in blue are identified by SHP, SNI and CHD. The node in brown is identified by SHP and CHD. (b) Top 20 IM key nodes identified by IMM. (c) ND key nodes identified by CHD. (d) Top 5 NR key nodes identified by SNI and SHP. Nodes in purple are identified by SHP. Nodes in red are identified by SNI. Nodes in orange are identified by both SHP and SNI.

As Fig. 10a–c shows, every one of the top 20 key nodes identified by SHP also belongs to the SNI, IMM or CHD set. In comparison, the top 20 IMM nodes are scattered overall, while CHD nodes are more clustered in the central areas. Because of the complicated network structure, it is unrealistic to cut the connections between two parts by removing only a single node; a group of locally clustered nodes has to be removed to break down the topology. This also explains why CHD nodes perform poorly in influence spread although they are the most similar to SHP nodes.

Although SHP and SNI nodes overlap the most, they differ once their orderings are taken into account. As Fig. 10d reveals, the top 5 SNI nodes lie within a limited region of the network, while the top 5 SHP nodes are much more scattered. This explains why SHP nodes handle the IM task well even though they are highly similar to SNI nodes.

6.2 DM collaboration network

In this subsection, we take a collaboration network of 80 experts in the data mining field as the second case study. These experts are selected according to their h-index on AMiner⁵, and their co-authorships as of July 8th, 2020 are extracted from DBLP⁶. We draw their relationships in Fig. 11. An edge between two authors indicates that they have co-authored at least one academic paper. The size of a node is proportional to its degree.

⁵ https://www.aminer.org/.
⁶ https://dblp.uni-trier.de/.

Figure 11.

The DBLP collaboration network in data mining field.

In Table 9, we show the top 10 authors identified by SNI, SHP and IMM, respectively. The number in parentheses denotes the author's rank by degree. Of the 80 nodes in this network, CHD selects 27. Except for Svetha Venkatesh (the 8th author in IMM) and Rakesh Agrawal (the 10th author in SNI), all authors shown in Table 9 are selected by CHD to dismantle this network.

Table 9
Top 10 authors from DBLP network identified by SNI, SHP and IMM

SNI	SHP	IMM
Jian Pei (1)	Jian Pei (1)	Jian Pei (1)
Philip S. Yu (3)	Philip S. Yu (3)	Jiawei Han (2)
Jiawei Han (2)	Jiawei Han (2)	Philip S. Yu (3)
Haixun Wang (4)	Qiang Yang (8)	Qiang Yang (8)
Charu C. Aggarwal (5)	Haixun Wang (4)	Johannes Gehrke (32)
Qiang Yang (8)	Jeffrey Xu Yu (7)	Haixun Wang (4)
Bing Liu (6)	Charu C. Aggarwal (5)	Ee-Peng Lim (26)
Jeffrey Xu Yu (7)	Bing Liu (6)	Svetha Venkatesh (80)
Xindong Wu (9)	Christos Faloutsos (11)	Jeffrey Xu Yu (7)
Rakesh Agrawal (10)	Xindong Wu (9)	Heikki Mannila (15)

Table 10
Intersection between one set and the union of the others

Net	SHP	SNI	IMM	CHD	$\psi^{\mathit{CHD}}$
DP	20	19	16	18	20
PB	34	33	28	33	35
FB	63	64	52	60	65
AL	52	48	36	45	52
NS	28	30	23	22	30
UA	60	55	40	53	60
EM	327	306	252	291	330
BB	279	265	209	250	281
HS	362	337	282	317	364
YE	329	307	247	283	330
CG	347	314	267	289	363
GT	1044	958	989	924	1049
CH	1003	948	761	872	1043
PP	432	372	347	336	469
CC	2571	2349	1933	2249	2660
CA	449	430	408	348	453
IT	786	639	681	649	796
BK	6195	5680	4433	5372	6335
WN	10012	9262	7774	8687	10384
DB	6306	6225	5855	6037	6306
Mean	0.979	0.917	0.778	0.852	1.000

As the rankings show, SNI essentially re-sorts the top-degree nodes; in fact, the top 15 authors in SNI are exactly the 15 authors with the largest degree in this network. IMM always selects the node that brings the maximal marginal gain, so some lower-degree nodes rank above high-degree ones. For example, Johannes Gehrke, whose degree ranks 32nd, is placed 5th by IMM because he has not collaborated with any of the first four authors. Similarly, Ee-Peng Lim, whose degree ranks 26th, is placed 7th by IMM because, among the first six authors, he has collaborated only with Philip S. Yu. Although the top 10 authors in SHP are also high-degree nodes, their order agrees more with IMM than with SNI: the relative order of the top 6 SHP authors is almost the same as in IMM. Moreover, all top 10 authors in SHP are included in CHD to accomplish the network dismantling task.
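The marginal-gain behavior described above can be illustrated with the greedy maximum-coverage step that RIS-based methods such as IMM run over a collection of RR sets. The sketch below omits IMM's sampling phase (which decides how many RR sets to draw) and uses hand-made RR sets; ties are broken by node name, which is our choice, and the names are hypothetical.

```python
from collections import Counter


def greedy_max_coverage(rr_sets, k):
    """Pick up to k nodes, each time the one covering the most uncovered RR sets."""
    uncovered = list(range(len(rr_sets)))
    chosen = []
    for _ in range(k):
        gain = Counter(v for i in uncovered for v in rr_sets[i])
        if not gain:
            break
        best = min(gain, key=lambda v: (-gain[v], str(v)))  # largest marginal gain
        chosen.append(best)
        uncovered = [i for i in uncovered if best not in rr_sets[i]]
    return chosen


# "A" and "B" often co-occur; "C" is rarer but disjoint from them.
rr_sets = [{"A", "B"}] * 3 + [{"A"}] * 2 + [{"B"}] + [{"C"}] * 2
single = Counter(v for rr in rr_sets for v in rr)   # SNI-style counts
print(sorted(single, key=lambda v: -single[v])[:2])  # ['A', 'B']
print(greedy_max_coverage(rr_sets, 2))               # ['A', 'C']
```

Ranking nodes by their individual counts picks A and B, whereas the greedy step skips B because most of its RR sets are already covered by A, much as IMM places Johannes Gehrke above higher-degree authors who share many collaborators with earlier picks.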

7. SHP node

As revealed by the previous experiments, SHP nodes can be applied to the NR, IM and ND tasks simultaneously with good performance. The reason may be that SHP nodes overlap almost entirely with the other key node sets; in other words, SHP may select the core part of their union. Inspired by this assumption, we further check how many SHP nodes actually belong to the union of the other key node sets, and how often a node in the intersection of the SNI, IMM and CHD sets is also an SHP node. For comparison, we perform the same checks for SNI, IMM and CHD nodes, in case such overlap is common to all key node sets. Results are shown in Tables 10 and 11.

Table 11
Intersection of the other three key node sets

Net	SHP	SNI	IMM	CHD	$\Omega(G)$
DP	10	10	14	13	10
PB	21	22	26	21	20
FT	29	30	40	32	26
AL	28	30	41	30	28
NS	11	11	15	18	11
UA	31	34	48	33	31
EM	177	185	247	197	174
BB	164	172	218	177	162
HS	202	211	267	214	197
YE	172	178	234	188	169
CG	157	164	228	190	154
GT	524	616	613	676	497
CH	468	489	669	538	460
PP	142	158	207	190	142
CC	1222	1319	1762	1375	1206
CA	276	283	304	350	276
IT	443	512	528	492	443
BK	2882	3064	4361	3205	2856
WN	4909	5291	6744	5673	4871
DB	4842	4989	5422	5304	4825
Mean	0.983	0.927	0.733	0.850	1.000

Figure 12. Correlations between key node sets.

In Table 10, $\psi^{\mathit{CHD}}$ denotes the size of each key node set, and each entry counts how many nodes of one set also belong to at least one of the other three sets; the larger the number, the more completely that set is covered by the others. In Table 11, $\Omega(G)$ is the size of the intersection of all four node sets, and each entry counts the nodes in the intersection of the other three sets, so it is always greater than or equal to $\Omega(G)$; the smaller the number, the more completely that set covers the intersection of the others. The Mean row of each table averages, over the twenty networks, the ratio of the Table 10 entry to $\psi^{\mathit{CHD}}$ and the ratio of $\Omega(G)$ to the Table 11 entry, respectively. On average, a node drawn from the SHP set belongs to the union of the SNI, IMM and CHD sets with probability 97.9%, and a node in the intersection of the SNI, IMM and CHD sets is also an SHP node with probability 98.3%. Both values are the highest among the four key node sets (compared with 0.917, 0.778 and 0.852 in Table 10 and 0.927, 0.733 and 0.850 in Table 11 for SNI, IMM and CHD), so SHP does appear to select the core of the other three key node sets.
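Both overlap statistics and the Mean rows can be computed directly from the key node sets. Below is a minimal Python sketch, assuming each network's key node sets are given as Python sets keyed by method name and that the Mean row averages per-network ratios, as described above. The function names and the toy sets are hypothetical and are not taken from the experiments.

```python
from statistics import mean


def coverage_ratios(sets):
    """Per-network ratios for each key node set (sets: name -> set of nodes).

    union ratio: share of a set's nodes that also lie in the union of the
    other sets (Table 10 entry divided by the set size).
    intersection ratio: Omega(G) divided by the size of the intersection of
    the other sets (Omega(G) divided by the Table 11 entry).
    """
    omega = len(set.intersection(*sets.values()))  # Omega(G)
    ratios = {}
    for name, nodes in sets.items():
        others = [s for other, s in sets.items() if other != name]
        in_union = len(nodes & set().union(*others))
        in_inter = len(set.intersection(*others))
        # If the other sets share no node, Omega(G) is also 0 and the ratio is undefined.
        inter_ratio = omega / in_inter if in_inter else float("nan")
        ratios[name] = (in_union / len(nodes), inter_ratio)
    return ratios


def mean_ratios(networks):
    """'Mean' row: average each per-network ratio over all networks."""
    per_net = [coverage_ratios(sets) for sets in networks]
    return {name: (mean(r[name][0] for r in per_net),
                   mean(r[name][1] for r in per_net))
            for name in per_net[0]}


if __name__ == "__main__":
    # Hypothetical key node sets for one toy network (k = 4).
    toy = {"SHP": {1, 2, 3, 4}, "SNI": {1, 2, 3, 5},
           "IMM": {1, 4, 6, 7}, "CHD": {1, 2, 4, 8}}
    for name, (u, i) in coverage_ratios(toy).items():
        print(f"{name}: union ratio {u:.2f}, intersection ratio {i:.2f}")
```

In this toy case only the SHP set scores 1.00 on both ratios, which is the pattern the Mean rows of Tables 10 and 11 show for SHP on the real networks.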

Based on these results, Fig. 12 depicts the relationship between the four key node sets as a Venn diagram. Almost all SHP nodes lie in the shaded region, and almost all of the meshed part of the shaded region belongs to the SHP set, indicating that SHP nodes are essentially a combination of core key nodes from the NR, IM and ND tasks.

8. Conclusion

In this paper, we focus on key nodes and their features in social networks. In particular, we consider three popular research fields that identify key nodes, namely influential node ranking (NR), influence maximization (IM) and network dismantling (ND). We find that non-expert readers may easily confuse these three closely related domains, and that none of the existing studies aims to provide a set of general key nodes. Moreover, although many studies have been devoted to finding key nodes in these three fields, the correlations between the nodes they identify remain unknown. We therefore conduct an in-depth survey of key nodes in social networks, to give researchers a clear picture of the different key node-related tasks and a deeper understanding of these key nodes. First, we clarify and formalize the three tasks under a uniform standard and summarize the features of the key node sets they identify. We then propose a fair comparison framework that evaluates a key node set from several aspects. Through comprehensive experiments, we explore the correlations between representative key node sets and draw the following conclusions about the three tasks behind them.

• NR focuses on evaluating an individual’s independent influence, while IM and ND aim at collective effects, which indicates that they are fundamentally different. NR nodes mainly consist of clustered high-degree nodes at the center of the network, so the key to NR algorithms is ordering these high-degree nodes more reasonably.

• Although NR and IM both make use of influence spread, it is not appropriate to treat influence maximization as finding influential spreaders/nodes. IM is far more concerned with the marginal contribution of a node, so the influence spread of $k$ selected seed nodes may not decrease at all even if some highly influential individuals are excluded from selection.

• Although IM and ND are intuitively similar and both relate to information dissemination (IM aims to promote it while ND aims to block it), they cannot swap roles. IM nodes are scattered throughout the network, while ND nodes are much more clustered. In fact, ND usually extracts a “core chain” to break a network into pieces rather than finding so-called “weak nodes”.
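One simple way to quantify the contrast between scattered IM nodes and clustered NR and ND nodes is to count the connected components of the subgraph induced by a key node set, the quantity Table 8 reports for key-node subgraphs (assuming these are the subgraphs induced by each key node set): fewer components for the same number of nodes indicates a more clustered set. The following is a minimal Python sketch under the assumption that the network is given as an adjacency dictionary; the function name, example graph and node sets are hypothetical.

```python
from collections import deque


def induced_components(adjacency, key_nodes):
    """Number of connected components in the subgraph induced by key_nodes.

    adjacency maps each node to its neighbours; only edges whose two
    endpoints are both key nodes are followed (breadth-first search).
    """
    key_nodes = set(key_nodes)
    seen = set()
    components = 0
    for start in key_nodes:
        if start in seen:
            continue
        components += 1  # start is not reachable from any earlier component
        seen.add(start)
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for w in adjacency.get(u, ()):
                if w in key_nodes and w not in seen:
                    seen.add(w)
                    queue.append(w)
    return components


if __name__ == "__main__":
    # Hypothetical graph: a chain 5-1-2-3-4-6.
    adjacency = {1: {2, 5}, 2: {1, 3}, 3: {2, 4}, 4: {3, 6}, 5: {1}, 6: {4}}
    print(induced_components(adjacency, [1, 2, 3]))  # clustered set: 1
    print(induced_components(adjacency, [5, 2, 6]))  # scattered set: 3
```

For real networks the same count can be obtained with a graph library, but a plain BFS keeps the sketch dependency-free.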

Furthermore, we find that key nodes identified by Shapley centrality are essentially a combination of core key nodes from NR, IM and ND algorithms. SHP nodes consist of two parts: nodes that are relatively clustered in high-degree areas, like NR and ND nodes, and high-degree nodes that are rather isolated, like IM nodes. Therefore, SHP nodes can be applied to all three tasks simultaneously with good performance.

Footnotes

Acknowledgments

This work was supported by the Fundamental Research Funds for the Central Universities (No. 2022QN1093).

References

1. Lagrée, Cappé, Cautis and Maniu, Algorithms for online influencer marketing, ACM Transactions on Knowledge Discovery from Data (TKDD) 13(1) (2018), 1–30.

2. Dong, Yang, Zeng, Yuan, Jin, N.Q.V. Hung, P.T. Cong and Zheng, Misinformation-oriented expert finding in social networks, World Wide Web 23(2) (2020), 693–714.

3. Zhu, Huang and Li, MPPM: Malware propagation and prevention model in online SNS, in: 2014 IEEE International Conference on Communications Workshops (ICC), IEEE, 2014, pp. 682–687.

4. T.L. Fond, Neville and Gallagher, Designing size consistent statistics for accurate anomaly detection in dynamic networks, ACM Transactions on Knowledge Discovery from Data (TKDD) 12(4) (2018), 1–49.

5. Domingos and Richardson, Mining the network value of customers, in: Proc. 7th ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, 2001, pp. 57–66.

6. Richardson and Domingos, Mining knowledge-sharing sites for viral marketing, in: Proc. 8th ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, 2002, pp. 61–70.

7. Braunstein, Dall’Asta, Semerjian and Zdeborová, Network dismantling, Proc. Natl. Acad. Sci. U.S.A. 113(44) (2016), 12368–12373.

8. J.-X. Zhang, D.-B. Chen, Dong and Z.-D. Zhao, Identifying a set of influential spreaders in complex networks, Sci Rep 6 (2016), 27823.

9. Ullah and Lee, Identification of influential nodes based on temporal-aware modeling of multi-hop neighbor interactions for influence spread maximization, Physica A 486 (2017), 968–985.

10. Rahimkhani, Aleahmad, Rahgozar and Moeini, A fast algorithm for finding most influential people based on the linear threshold model, Expert Syst. Appl. 42(3) (2015), 1353–1361.

11. Wang, Cong, Song and Xie, Community-based greedy algorithm for mining top-k influential nodes in mobile social networks, in: Proc. 16th ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, 2010, pp. 1039–1048.

12. De Salve, Mori, Guidi, Ricci and R.D. Pietro, Predicting Influential Users in Online Social Network Groups, ACM Transactions on Knowledge Discovery from Data (TKDD) 15(3) (2021), 1–50.

13. Bian, Y.S. Koh, Dobbie and Divoli, Identifying top-k nodes in social networks: A survey, ACM Computing Surveys (CSUR) 52(1) (2019), 1–33.

14. Morone and H.A. Makse, Influence maximization in complex networks through optimal percolation, Nature 524(7563) (2015), 65–68.

15. Radicchi and Castellano, Fundamental difference between superblockers and superspreaders in networks, Phys. Rev. E 95(1) (2017), 012318.

16. L.S. Shapley, A value for n-person games, Contrib. Theory Games 2(28) (1953), 307–317.

17. Lü, Chen, X.-L. Ren, Q.-M. Zhang, Y.-C. Zhang and Zhou, Vital nodes identification in complex networks, Phys. Rep.-Rev. Sec. Phys. Lett. 650 (2016), 1–63.

18. Arora, Galhotra and Ranu, Debunking the myths of influence maximization: An in-depth benchmarking study, in: Proc. ACM SIGMOD Int. Conf. Manag. Data, 2017, pp. 651–666.

19. Fan, Wang and K.-L. Tan, Influence maximization on social graphs: A survey, IEEE Trans. Knowl. Data Eng. 30(10) (2018), 1852–1872.

20. Wandelt, Sun, Feng, Zanin and Havlin, A comparative analysis of approaches to network-dismantling, Sci Rep 8(1) (2018), 1–15.

21. Chen, Lü, M.-S. Shang, Y.-C. Zhang and Zhou, Identifying influential nodes in complex networks, Physica A 391(4) (2012), 1777–1787.

22. Lü, Zhou, Q.-M. Zhang and H.E. Stanley, The H-index of a network node and its relation to degree and coreness, Nat. Commun. 7 (2016), 10168.

23. Zhou, Lü and Chen, Identifying influential spreaders by weighted LeaderRank, Physica A 404 (2014), 47–55.

24. Wang, Zhao and Du, Fast ranking influential nodes in complex networks using a k-shell iteration factor, Physica A 461 (2016), 171–181.

25. Salavati, Abdollahpouri and Manbari, Ranking nodes in complex networks based on local structure and improving closeness centrality, Neurocomputing 336 (2019), 36–45.

26. Chen and S.-H. Teng, Interplay between social influence and network centrality: A comparative study on Shapley centrality and single-node-influence centrality, in: Proc. 26th Int. Conf. World Wide Web, 2017, pp. 967–976.

27. Tang, Shi and Xiao, Influence maximization in near-linear time: A martingale approach, in: Proc. ACM SIGMOD Int. Conf. Manag. Data, 2015, pp. 1539–1554.

28. Borgs, Brautbar, Chayes and Lucier, Maximizing social influence in nearly optimal time, in: Proceedings of the Twenty-Fifth Annual ACM-SIAM Symposium on Discrete Algorithms, SIAM, 2014, pp. 946–957.

29. Tang, Xiao and Shi, Influence maximization: Near-optimal time complexity meets practical efficiency, in: Proc. ACM SIGMOD Int. Conf. Manag. Data, 2014, pp. 75–86.

30. Kempe, Kleinberg and É. Tardos, Maximizing the spread of influence through a social network, in: Proc. 9th ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, 2003, pp. 137–146.

31. Chen, Wang and Yang, Efficient influence maximization in social networks, in: Proc. 15th ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, 2009, pp. 199–208.

32. Leskovec, Krause, Guestrin, Faloutsos, VanBriesen and Glance, Cost-effective outbreak detection in networks, in: Proc. 13th ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, 2007, pp. 420–429.

33. Ohsaka, Akiba, Yoshida and K.-i. Kawarabayashi, Fast and accurate influence maximization on large networks with pruned monte-carlo simulations, in: Proc. 28th AAAI Conf. Artif. Intell, 2014.

34. Chen, Wang and Wang, Scalable influence maximization for prevalent viral marketing in large-scale social networks, in: Proc. 16th ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, 2010, pp. 1029–1038.

35. Galhotra, Arora and Roy, Holistic influence maximization: Combining scalability and efficiency with opinion-aware models, in: Proc. ACM SIGMOD Int. Conf. Manag. Data, 2016, pp. 743–758.

36. Rui, Yang, Fan and Wang, A neighbour scale fixed approach for influence maximization in social networks, Computing 102 (2020), 427–449.

37. Clusella, Grassberger, F.J. Pérez-Reche and Politi, Immunization and targeted destruction of networks using explosive percolation, Phys. Rev. Lett. 117(20) (2016), 208301.

38. Holme, B.J. Kim, C.N. Yoon and S.K. Han, Attack vulnerability of complex networks, Phys. Rev. E 65(5) (2002), 056109.

39. Zdeborová, Zhang and H.-J. Zhou, Fast and simple decycling and dismantling of networks, Sci Rep 6 (2016), 37954.

40. Mugisha and H.-J. Zhou, Identifying optimal targets of network attack by belief propagation, Phys. Rev. E 94(1) (2016), 012305.

41. M.G. Kendall, A new measure of rank correlation, Biometrika 30(1/2) (1938), 81–93.

42. Zhou and R.J. Mondragón, The rich-club phenomenon in the Internet topology, IEEE Commun. Lett. 8(3) (2004), 180–182.

43. Lusseau, Schneider, O.J. Boisseau, Haase, Slooten and S.M. Dawson, The bottlenose dolphin community of Doubtful Sound features a large proportion of long-lasting associations, Behav. Ecol. Sociobiol. 54(4) (2003), 396–405.

44. Girvan and M.E. Newman, Community structure in social and biological networks, Proc. Natl. Acad. Sci. U.S.A. 99(12) (2002), 7821–7826.

45. M.E.J. Newman, Finding community structure in networks using the eigenvectors of matrices, Phys. Rev. E 74(3) (2006), 036104.

46. Colizza, Pastor-Satorras and Vespignani, Reaction-diffusion processes and metapopulation models in heterogeneous networks, Nat. Phys. 3(4) (2007), 276–282.

47. Guimera, Danon, Diaz-Guilera, Giralt and Arenas, Self-similar community structure in a network of human interactions, Phys. Rev. E 68(6) (2003), 065103.

48. Zhao, Cai, Xue, Zhu, Zhang, Sun, Ling, Zhang et al., Topological structure analysis of the protein-protein interaction network in budding yeast, Nucleic Acids Res. 31(9) (2003), 2443–2450.

49. Leskovec, Kleinberg and Faloutsos, Graph evolution: Densification and shrinking diameters, ACM Trans. Knowl. Discov. Data 1(1) (2007), 2–es.

50. Ripeanu, Iamnitchi and Foster, Mapping the Gnutella Network, IEEE Internet Comput, 2002, 50–57.

51. Boguná, Pastor-Satorras, Díaz-Guilera and Arenas, Models of social networks based on social distance attachment, Phys. Rev. E 70(5) (2004), 056122.

52. Zhang, Liu, Massey and Zhang, Collecting the Internet AS-level topology, ACM SIGCOMM Comp. Commun. Rev. 35(1) (2005), 53–61.

53. Cho, S.A. Myers and Leskovec, Friendship and mobility: user movement in location-based social networks, in: Proc. 17th ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, ACM, 2011, pp. 1082–1090.

54. Miller, WordNet: An electronic lexical database, MIT Press, 1998.

5 https://www.aminer.org/.
