CNLPSO-SL: A two-layered method for identifying influential nodes in social networks

Abstract

In networks, dynamic phenomena such as opinions, behaviors, and information propagate through the connections between entities. One of the main issues in such a dynamic process is to find a set of individuals with high influence on others’ decisions. This is known as the “influence maximization” problem, which aims to find a subset of nodes that maximizes the total number of adopters at the end of the process.

In this paper, by combining community structure with the influence maximization problem, we propose a two-layered method for identifying influential nodes. In the first layer, an optimization-based method is applied to detect the potential communities. In the second layer, a criterion is used that offers a trade-off between low-relevance centralities and methods with high complexity. Our method is implemented on real social networks of different scales, and its performance is evaluated by the total number of infected nodes at the end of the process. The experimental results indicate that our method outperforms the other considered approaches in terms of efficiency and scalability.

Keywords

Social network, data mining, influence maximization problem, influential node, diffusion model, community structure, heuristics

1. Introduction

Social networks are an abstract representation of social systems in which information and ideas propagate through the interactions between entities [58]. Moreover, everybody can use a social network according to their own interests, which indicates the flexibility and high potential of these networks [23].

One of the most important tasks in analyzing social networks is the detection of community structure [38], which has been used in a wide range of applications [3, 4, 21, 22, 29]. A community is defined as a group of entities in which the number of interactions within the group is much higher than the number of interactions with entities outside it [35]. Figure 1 shows an example of a social network with 23 nodes divided into 2 communities.

Figure 1.

A social network including 2 communities.

On the other hand, with the rapid development of online social networks such as Facebook, Twitter, and YouTube, influence maximization has become one of the main issues in diffusion processes. It is defined as the task of finding influential nodes through which innovations and information are propagated so that, at the end of the process, a large fraction of individuals adopt the information (or product) [7]. Influential nodes are nodes that have more influence on their neighbors’ behaviors and decisions [5, 35]; in other words, they have good spreading properties in the network [34]. Studying influential nodes can help us better understand the features of a complex network that are necessary for controlling the processes occurring on it [55]. Figure 2 shows a directed social network consisting of 7 nodes, in which node 2 is considered an influential node based on degree centrality. Although degree centrality can be defined based on either the in-degree or the out-degree of a node, only the out-degree is considered in this example.

Figure 2.

An example of influential node based on degree centrality.

Given the importance of community structure, and given that nodes within a community have considerable influence on their neighbors’ decisions, choosing influential nodes from candidate communities instead of the whole network can be a good approximation. Restricting a social network to its properly extracted communities therefore reduces the cost of identifying influential nodes [35, 51].

Thus, in this paper, a two-layered method called CNLPSO-SL is proposed. In the first layer, the communities of a given social network are identified; in the second layer, the influential nodes within candidate communities are detected by applying a semi-local centrality.

The rest of the paper is organized as follows: Section 2 overviews related work on the influence maximization problem. Section 3 states the main concepts of influence maximization. Section 4 presents our method in two subsections: the method description and its implementation. Section 5 presents the datasets, the compared methods, the experimental results, and the comparison between the methods. Finally, Section 6 concludes the paper.

2. Related works

In recent years, many methods have been proposed for identifying influential nodes in order to solve the influence maximization problem. Domingos and Richardson were the first to study influence maximization as an algorithmic problem, in which customers are the nodes of a social network and their influence is modeled as a Markov random field [8]. In [42], they extended their previous work and applied it to knowledge-sharing sites, showing how to find optimal viral marketing plans while reducing the computational cost. The authors of [20] were among the first to formulate influence maximization as a discrete optimization problem; their simple greedy algorithm, based on a hill-climbing strategy, obtains a solution within 63% (that is, 1 − 1/e) of the optimum for several classes of models. In [10], a set greedy algorithm was proposed that deals with overlapping neighbors: the nodes with the largest number of uncovered neighbors are selected as influential ones, and the process continues until all K seed nodes have been selected. In [25], the Cost-Effective Lazy Forward (CELF) algorithm was proposed, which uses submodularity to reduce the number of influence-spread evaluations; two improved versions of this method were proposed in [17, 31]. In [48], a genetic greedy algorithm was proposed that combines the strengths of genetic and greedy algorithms. Many other greedy-based approaches have also been proposed for the influence maximization problem [49, 56, 59].
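The lazy-evaluation idea behind CELF [25] can be illustrated with a short Python sketch. It is not the implementation of [25]: as a simplifying assumption, the spread of a seed set is the deterministic number of nodes covered by the seeds and their out-neighbors (in the spirit of the set greedy algorithm of [10]) rather than a simulated cascade. Because this objective is monotone and submodular, a stale marginal gain is an upper bound, so a node whose freshly recomputed gain still tops the heap can be accepted without re-evaluating the others.

```python
import heapq


def coverage(graph, seeds):
    """Stand-in spread: number of distinct nodes among the seeds and their out-neighbors."""
    covered = set(seeds)
    for s in seeds:
        covered.update(graph.get(s, ()))
    return len(covered)


def celf(graph, k, spread=coverage):
    """Lazy greedy selection of k seeds for a monotone submodular spread function."""
    seeds, current = [], 0
    # Max-heap via negated gains: (-gain, node, size of the seed set when the gain was computed).
    heap = [(-spread(graph, [v]), v, 0) for v in graph]
    heapq.heapify(heap)
    while len(seeds) < k and heap:
        neg_gain, v, computed_at = heapq.heappop(heap)
        if computed_at == len(seeds):
            # Gain is fresh for the current seed set and bounds all others: accept v.
            seeds.append(v)
            current += -neg_gain
        else:
            # Stale gain: recompute it; by submodularity it can only have shrunk.
            gain = spread(graph, seeds + [v]) - current
            heapq.heappush(heap, (-gain, v, len(seeds)))
    return seeds, current
```

For the small graph `{1: [2, 3, 4], 2: [3], 3: [], 4: [5], 5: [6, 7, 8], 6: [], 7: [], 8: []}`, `celf(graph, 2)` returns `([1, 5], 8)`; node 5 needs only one re-evaluation before it is accepted.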

Furthermore, many other methods detect influential nodes by measuring node importance (centrality), a fundamental criterion for characterizing both the structure and the dynamics of complex networks. Degree, betweenness, and closeness centrality are among the most widely used criteria in the influence maximization problem [9, 14, 28, 33, 55]. Degree centrality [36] uses the degree of a node to select influential users: the more neighbors a node has, the more important it is [55]. Another widely used criterion is betweenness centrality [11], defined as the number of shortest paths passing through a given node. Closeness centrality [12] is based on the distances between nodes and measures how long it takes information to spread from a given node to the others. The authors of [28] proposed a neighborhood centrality that considers the centrality of a node and of its neighbors. The DIL method [27] is based on the degree and the importance of lines, and requires only local features of nodes to evaluate their importance. Moreover, many extended versions of the aforementioned centralities have been proposed [1, 18, 45, 47].
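Closeness centrality, as described above, can be sketched for an unweighted, undirected graph with one breadth-first search per node. The normalization used here (number of reachable nodes divided by the sum of the distances to them) is one common choice and an assumption of this sketch; [12] may normalize differently.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distances from source to every reachable node (unweighted graph)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def closeness(adj, v):
    """(number of reachable nodes) / (sum of their distances from v); 0.0 if v is isolated."""
    dist = bfs_distances(adj, v)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total > 0 else 0.0
```

On the path 1-2-3, the middle node scores 1.0 and the end nodes 2/3. Running one BFS from every node costs O(n(n + m)) overall, which illustrates why such global measures become expensive on large networks.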

In addition to the methods mentioned above, some approaches take the community structure into account. For example, in [2], the authors identified the communities of a given social network by applying Pareto’s Law, and then used degree, link, and closeness centrality to identify influential nodes. In [51], the CGA algorithm was proposed, in which the communities of a social network are detected by a label propagation method; then, a Mix-greedy algorithm selects the influential nodes in each candidate community, where the candidate communities are chosen by a dynamic programming method.

Because optimization-based methods have become one of the most effective ways of solving the community detection problem, our proposed method applies CNLPSO-DE to identify the communities of a given network. Since large communities include more nodes than smaller ones, we then use a size criterion to select the candidate communities, and finally identify the influential nodes with a semi-local centrality, which offers a trade-off between low-relevance centralities and methods with high complexity.

3. Preliminaries

In this section, some of the initial concepts of influence maximization, such as the influence graph and diffusion models, are described. Throughout, $G=(V,E)$ denotes the graph of a given social network, where $V(G)=\{v_{1},v_{2},...,v_{n}\}$ and $E(G)=\{e_{1},e_{2},...,e_{m}\}$ are the finite sets of vertices and edges, respectively; each edge $e_{i}$ connects two vertices of the graph, $n$ is the total number of nodes, and $m$ is the total number of edges.

3.1 Influence maximization problem

Influence maximization is defined as a discrete optimization problem: an initial “seed set” $S$ with cardinality $\left|S\right|=k$ is selected so as to maximize the number of nodes influenced (for example, engaged in discussions or adopting a product) under a specified diffusion model [19, 30, 50].
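Written compactly, as a sketch in our own notation, the problem is to find

$\displaystyle S^{*}=\arg\max_{S\subseteq V,\ |S|=k}\ \sigma(S)$

where $\sigma(S)$, a symbol introduced here only for illustration, denotes the expected number of nodes that are active at the end of a diffusion process started from the seed set $S$ under the chosen diffusion model.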

Figure 3.

Structure of the influence graph [50].

3.2 Influence graph

The influence graph is a directed graph representing who influences whom. Figure 3 shows a social influence graph in which the edge $e_{1}$ denotes the influence of user $v_{1}$ on user $v_{2}$ [50], meaning that $v_{1}$ can change the decision of $v_{2}$ about adopting a specified product or idea. The number in parentheses at each node denotes its out-degree.

3.3 Diffusion models

Diffusion models are essential for simulating the propagation of ideas, products, information, diseases, etc. In this section, we describe two of the most widely used diffusion models: the independent cascade model and the linear threshold model. In both, a node is in one of two states, “active” or “inactive”. Initially, all nodes except the seeds are inactive, and a node becomes active once it adopts the information. Only the transition from the “inactive” state to the “active” state is allowed in these models [58].

3.3.1 Independent cascade (IC) model

In this model, a random value $P_{uv}\in[0,1]$ is assigned to each link $(u,v)$, representing the probability of diffusion from node $u$ to node $v$. Given an initial set of active nodes $A$, the model proceeds as follows:

When a node $u$ first becomes active at time step $t$, it has a single chance to activate each of its inactive neighbors $v$, succeeding with probability $P_{uv}$. If it succeeds, node $v$ becomes active at time step $t+1$. Whether or not $v$ becomes active, $u$ makes no further attempts to activate $v$ at later time steps. If $v$ has multiple parents that first became active at time $t$, their activation attempts are sequenced in an arbitrary order. The process ends when no more activations are possible. Figure 4 presents the algorithm of this model [57].
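A minimal Python sketch of one IC run follows, assuming the graph is a dictionary mapping each node to its out-neighbors and that `prob` maps each link `(u, v)` to $P_{uv}$ (both representations are our own choice). The nodes activated in one step form the frontier of the next, so each newly active node tries each inactive neighbor exactly once.

```python
import random


def independent_cascade(graph, prob, seeds, rng=None):
    """Simulate one IC run and return the final set of active nodes.

    graph: node -> list of out-neighbors; prob: (u, v) -> P_uv.
    """
    if rng is None:
        rng = random.Random()
    active = set(seeds)
    frontier = list(seeds)  # nodes that became active in the previous step
    while frontier:  # stops when no more activations are possible
        newly_active = []
        for u in frontier:  # parents are processed in a fixed, arbitrary order
            for v in graph.get(u, ()):
                # u gets exactly one attempt on each still-inactive neighbor v.
                if v not in active and rng.random() < prob[(u, v)]:
                    active.add(v)
                    newly_active.append(v)
        frontier = newly_active  # u never tries again in later steps
    return active
```

With deterministic probabilities, for example `graph = {1: [2, 3], 2: [4], 3: [], 4: []}` and `prob = {(1, 2): 1.0, (1, 3): 0.0, (2, 4): 1.0}`, seeding node 1 activates `{1, 2, 4}`. Passing a seeded `random.Random` makes stochastic runs reproducible.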

Figure 4.

Independent cascade model [57].

Figure 5.

Influence propagation in linear threshold model [24].

3.3.2 Linear threshold (LT) model

In this model, a uniformly distributed threshold $\theta_{i}$ is assigned to each node $i$ and a weight $w_{ij}$ to each link, where $\theta_{i}$ indicates the amount of influence required to activate node $i$, and $w_{ij}$ reflects the strength of the relation between nodes $i$ and $j$. The decision of node $i$ to become active depends on the total weight of its active neighbors: if this total reaches the threshold, that is, $\sum_{j\in N_{i}^{a}}w_{ij}\geqslant\theta_{i}$, where $N_{i}^{a}$ is the set of active neighbors of $i$ and $N_{i}$ the set of all its neighbors, then node $i$ becomes active and adopts the product [43]. The process ends when no more activations are possible [58]. In addition, the weights of every node $i$ are normalized so that [43, 58]:

$\displaystyle\sum_{j\in N_{i}}w_{ij}\leqslant 1$ (1)

Figure 5 shows an influence propagation process under the LT model, in which the same threshold $\theta=0.3$ is used for every node. The propagation starts from node A, whose neighbors are B and D. Following the mechanism of the linear threshold model and the value of $\theta$, the next influenced node is B.
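The LT dynamics can be sketched in the same style. As an assumption, thresholds are passed in as fixed values (as in the $\theta=0.3$ example above) instead of being drawn at random, and `in_weights[i][j]` holds $w_{ij}$ for each neighbor $j$ of node $i$. In each time step, a node becomes active when the total weight of its already-active neighbors reaches its threshold.

```python
def linear_threshold(in_weights, thresholds, seeds):
    """Run the LT model with fixed thresholds.

    in_weights: node i -> {neighbor j: w_ij}, with sum_j w_ij <= 1 (Eq. 1).
    thresholds: node i -> theta_i.
    Returns the final active set and the nodes activated at each time step.
    """
    active = set(seeds)
    history = []
    while True:
        # A node activates when the weight of its already-active neighbors reaches theta_i.
        newly = {
            i
            for i, nbrs in in_weights.items()
            if i not in active
            and sum(w for j, w in nbrs.items() if j in active) >= thresholds[i]
        }
        if not newly:  # no more activation is possible
            return active, history
        active |= newly
        history.append(newly)
```

With the hypothetical weights `{"B": {"A": 0.4}, "C": {"B": 0.5}, "D": {"A": 0.2, "C": 0.2}}` (not those of Fig. 5), a threshold of 0.3 everywhere, and seed A, node B is activated first, then C, and finally D once both of its neighbors are active.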

4. Proposed method

Given the importance of community structure and its considerable effect on influence maximization, in this paper we propose a two-layered method, CNLPSO-SL, that detects influential nodes by combining community detection with the influence maximization problem. In the first layer, the communities of a given social network are identified. In the second layer, candidate communities are selected by a size criterion, and a heuristic measure is then used to identify the influential nodes within them. This section has two subsections: Section 4.1 describes each part of CNLPSO-SL, and Section 4.2 gives its implementation.

4.1 Method description

In our proposed method, an optimization-based algorithm named CNLPSO-DE is applied for network partitioning. Then, candidate communities are selected by a size criterion, and the influential nodes within them are identified by applying a semi-local centrality. The overall architecture of CNLPSO-SL is shown in Fig. 8, and each of its parts is explained in detail below.

4.1.1 Network partitioning

Recently, optimization-based methods have become one of the most effective ways of tackling the challenges of community detection [16]. Therefore, in the network partitioning phase, the multi-objective optimization method CNLPSO-DE is applied [39]. It combines benefits of evolutionary algorithms, such as their strength and flexibility, with the convergence speed and high performance of particle swarm optimization. Moreover, compared with single-objective approaches, it has the following benefits [46]:

• The detected solutions are always as good as or better than single-objective ones.
• Community structure is evaluated from different aspects.
• The number of communities can be determined automatically during the process.

The algorithm starts with a population of particles, in which each particle is a partition $\Omega$ of the network and represents a solution to the given problem, encoded as an integer string:

$\displaystyle X=\{x^{1},x^{2},...,x^{N}\}$ (2)

where $N$ is the number of vertices and $x^{i}$ is the label of the community to which node $i$ belongs. Each vertex is labeled with an integer between 1 and $N$, and vertices with the same label belong to the same community (Fig. 6).
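Decoding a particle (Eq. (2)) into communities amounts to grouping vertices by label. A minimal Python sketch, with vertices numbered from 1 as in the text and communities listed in order of first appearance (an arbitrary choice of the sketch):

```python
from collections import defaultdict


def decode(labels):
    """Turn a particle X = [x^1, ..., x^N] into a list of communities.

    Vertex i (numbered from 1) joins the community labelled x^i.
    """
    communities = defaultdict(list)
    for node, label in enumerate(labels, start=1):
        communities[label].append(node)
    return list(communities.values())
```

For example, the particle `[1, 1, 2, 2, 1, 3]` decodes to the three communities `[[1, 2, 5], [3, 4], [6]]`.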

First, the particles are initialized by an initialization method based on an opposition-based learning strategy [41]. Then, the particles move through the search space to find the optimal solution, updating their current states by updating their positions and velocities. To maintain the diversity of the population and escape from local optima, a mutation operator is applied. The quality of each partition is evaluated by a commonly used multi-objective fitness function (for further details, readers can refer to [39]).

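The objective functions of [39] are not spelled out here. As an illustration only, the sketch below computes Newman-Girvan modularity, a widely used measure of partition quality; treating it as one of the CNLPSO-DE objectives is an assumption, not a statement about [39].

```python
def modularity(edges, labels):
    """Modularity of a partition of an undirected, unweighted graph.

    edges: list of (u, v) pairs; labels: node -> community label.
    Q = sum over communities c of [ L_c / m - (d_c / (2m))**2 ].
    """
    m = len(edges)
    intra = {}       # L_c: number of edges with both endpoints in community c
    degree_sum = {}  # d_c: total degree of the nodes in community c
    for u, v in edges:
        cu, cv = labels[u], labels[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            intra[cu] = intra.get(cu, 0) + 1
    return sum(intra.get(c, 0) / m - (d / (2 * m)) ** 2
               for c, d in degree_sum.items())
```

For two triangles joined by a single edge, splitting them into their natural communities gives $Q = 5/14 \approx 0.357$, while putting every node in one community gives $Q = 0$.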
4.1.2 Selecting candidate communities

After the communities of the given social network have been detected, the main challenge is to select suitable communities for identifying the influential nodes. One criterion for this is to select large communities, because they contain more nodes than smaller communities, and selecting initial nodes within these communities results in more activations during the diffusion process [54].
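Selecting candidate communities by size reduces to a sort. In the sketch below, communities of equal size keep their original order and `k` may not exceed the number of detected communities; both are assumptions of the sketch rather than rules stated in the text.

```python
def select_candidates(communities, k):
    """Return the k largest communities, largest first (ties keep input order)."""
    if k > len(communities):
        raise ValueError("k cannot exceed the number of detected communities")
    # sorted() is stable even with reverse=True, so equal-sized communities keep their order.
    return sorted(communities, key=len, reverse=True)[:k]
```

For instance, `select_candidates([[1], [2, 3, 4], [5, 6], [7, 8]], 2)` returns `[[2, 3, 4], [5, 6]]`.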

Figure 6.

Decoding process of a particle [39].

Figure 7.

An example of a network consisting of 23 nodes and 40 edges. Although node 23 has a lower degree than node 1, it may have an even higher influence [6].

Figure 8.

The architecture of CNLPSO-SL.

4.1.3 Identifying influential nodes within candidate communities by semi-local centrality

Detecting influential nodes is a major challenge in complex networks, and researchers have proposed a large number of centrality criteria for it. Although these criteria are applied to the influence maximization problem, they face some challenges [9]. For example, degree centrality, one of the most widely used criteria, takes the number of links directly connected to a node as its degree [27] and has a reasonable runtime performance [45]. However, it is a less relevant measure, because it ignores the network structure beyond a node’s immediate neighbors. In contrast, global measures such as betweenness and closeness centrality may be better at detecting influential nodes, but because of their high computational complexity they are not suitable for large-scale networks [6].

On the other hand, experimental results indicate that measures considering the neighborhoods of nodes are more accurate for identifying influential nodes [28]. We therefore apply a semi-local centrality [6], which is a trade-off between the low-relevance degree centrality and time-consuming criteria such as betweenness and closeness centrality. It outperforms degree centrality [32] because it considers more structural information, and it has a lower computational complexity than closeness centrality [6]. The criterion considers both the nearest and the next-nearest neighbors of a given node, and is computed as follows:

$\displaystyle Q(u)=\sum\limits_{w\in\Gamma(u)}N(w)$ (3)

$\displaystyle C_{L}(v)=\sum\limits_{u\in\Gamma(v)}Q(u)$ (4)

where $\Gamma(u)$ is the set of the nearest neighbors of node $u$, and $N(w)$ is the number of the nearest and next-nearest neighbors of node $w$. Figure 7 shows an example of semi-local centrality.
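Equations (3) and (4) translate directly into Python. The sketch assumes an undirected graph stored as a dictionary of neighbor sets and computes $C_L$ for every node of the graph passed in; the text does not say whether neighbors outside a candidate community are counted, so passing the whole graph or only the community subgraph is a choice left open here.

```python
def semi_local_centrality(adj):
    """Semi-local centrality C_L(v) of every node (Eqs. 3 and 4).

    adj: node -> set of neighbors (undirected graph).
    """
    # N(w): number of distinct nodes within two hops of w, excluding w itself.
    n2 = {}
    for w, nbrs in adj.items():
        reach = set(nbrs)
        for x in nbrs:
            reach |= adj[x]
        reach.discard(w)
        n2[w] = len(reach)
    # Eq. (3): Q(u) is the sum of N(w) over the nearest neighbors w of u.
    q = {u: sum(n2[w] for w in adj[u]) for u in adj}
    # Eq. (4): C_L(v) is the sum of Q(u) over the nearest neighbors u of v.
    return {v: sum(q[u] for u in adj[v]) for v in adj}
```

On a small star with a tail (edges 1-2, 1-3, 1-4, 4-5), nodes 1 and 4 both score 14, nodes 2 and 3 score 10, and node 5 scores 6: node 4 matches the hub despite its lower degree, echoing the point of Fig. 7.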

After the candidate communities have been detected, the semi-local centrality is computed for each node within them, and the computed values are sorted in decreasing order; the node with the maximum centrality value is selected as an initial active node. Once the set of initial active nodes has been determined, the propagation through the given network is simulated with the IC diffusion model [10, 44, 57].
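Seed selection can then be sketched as follows. Since the number of seeds is tied to the number of candidate communities, this sketch assumes one seed per candidate community, namely the node with the highest score in it; `score` is any precomputed centrality dictionary, such as the $C_L$ values.

```python
def select_seeds(candidates, score):
    """One seed per candidate community: the node with the highest score.

    candidates: communities (lists of nodes), e.g. largest first.
    score: node -> precomputed centrality such as C_L.
    """
    seeds = []
    for community in candidates:
        # Sort in decreasing order of centrality and keep the top node.
        ranked = sorted(community, key=lambda v: score[v], reverse=True)
        seeds.append(ranked[0])
    return seeds
```

With the candidates `[[1, 2, 3], [4, 5]]` and the scores `{1: 14, 2: 10, 3: 10, 4: 14, 5: 6}`, the seed set is `[1, 4]`.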

4.2 Method implementation

This section presents the implementation of the proposed method in two parts: the first part presents the overall implementation of the method, and the second part presents the implementation of the applied heuristic.

4.2.1 Implementation of CNLPSO-SL approach

Figure 9 shows the pseudocode of CNLPSO-SL, where a social network $G$, the number of seeds $k$, and the diffusion probability $P$ are given as inputs. In the first step, CNLPSO-DE is called to identify the existing communities. Next, the communities are sorted in decreasing order of size, and the candidate communities are selected based on the given value of $k$ (if $k=5$, the 5 biggest communities are selected); in other words, the number of candidate communities is determined by $k$. Finally, semi-local centrality is applied to select the influential nodes, and the diffusion is propagated through the network by the IC model. The output is the total number of influenced nodes at the end of the process.

Figure 9.

Pseudocode of CNLPSO-SL algorithm.

Figure 10.

Pseudocode of the heuristic [7].

4.2.2 Implementation of heuristics

Because selecting seeds from large communities rather than from smaller ones can trigger more adoptions of a product or information, some configurations were adjusted following [7]. Their basic idea is to use two heuristics, “$\textit{add}\_\textit{node}$” and “$\textit{delete}\_\textit{node}$”, where “$\textit{add}\_\textit{node}$” refers to a node selected from a large community and “$\textit{delete}\_\textit{node}$” to a node from a smaller one. A large community is defined as a community whose size is larger than the average size of the candidate communities, $\textit{AvgSC}=\frac{1}{k}\sum\limits_{i=1}^{k}\textit{size}(sc_{i})$, where $sc_{i}$ is the $i$-th of the $k$ candidate communities.

The adjustment is meant to investigate whether selecting nodes from large communities results in more influence spread, where the influence spread $I_{S}(t)$ is defined as the expected number of influenced nodes at time $t$. Figure 10 shows the pseudocode of the process, in which the heuristics “$\textit{add}\_\textit{node}$” and “$\textit{delete}\_\textit{node}$” are denoted by “$a\_\textit{node}$” and “$d\_\textit{node}$”, respectively. According to the pseudocode, after the candidate communities are sorted, the node with the second-highest semi-local centrality in the largest community is selected as $a\_\textit{node}$ and tentatively swapped in for $d\_\textit{node}$. If the number of influenced nodes after the swap is larger than before, $d\_\textit{node}$ is replaced with $a\_\textit{node}$.
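A hypothetical Python reading of this replacement step is sketched below; it is not the pseudocode of Fig. 10. Assumptions: `seeds` holds one seed per candidate community in size order, $d\_node$ is the seed of the smallest candidate community and is only considered when that community is smaller than AvgSC, and `spread` is any callable that estimates the number of influenced nodes (for example, an average over repeated IC runs).

```python
def swap_heuristic(candidates, score, seeds, spread):
    """Hypothetical add_node / delete_node replacement step.

    candidates: candidate communities sorted by size, largest first.
    score: node -> semi-local centrality.
    seeds: one seed per candidate community, in the same order.
    spread: callable estimating the number of influenced nodes for a seed list.
    """
    if len(candidates) < 2 or len(candidates[0]) < 2:
        return seeds  # no second node or no other community to swap with
    avg_sc = sum(len(c) for c in candidates) / len(candidates)  # AvgSC
    if len(candidates[-1]) >= avg_sc:
        return seeds  # the smallest candidate is not a "small" community
    ranked = sorted(candidates[0], key=lambda v: score[v], reverse=True)
    a_node = ranked[1]  # second-highest C_L in the largest community
    d_node = seeds[-1]  # seed taken from the smallest candidate community
    trial = [a_node if v == d_node else v for v in seeds]
    # Keep the swap only if it increases the number of influenced nodes.
    return trial if spread(trial) > spread(seeds) else seeds
```

Because the spread estimate is passed in, the same function works with a deterministic toy objective for testing or with a Monte Carlo estimate in a real experiment.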

Figure 11.

Dolphin social network [15].

5. Experiment results

This section is organized in five parts. First, the datasets are described in Section 5.1. Then, the main idea of each compared method is explained in Section 5.2. Section 5.3 states the evaluation metric, and Section 5.4 presents the experimental results. Finally, Section 5.5 presents the discussion.

5.1 Dataset

In our experiments, datasets of three different scales (small, medium, and large) were used. Some of them have a ground truth, so the exact number of their communities is known, such as the dolphin social network; others, such as Netscience and Hep-physics, have an unknown number of communities (the datasets were obtained from http://www-personal.umich.edu/%7Emejn/netdata/).

The first dataset is a network of 62 dolphins (Fig. 11) compiled by Lusseau over 7 years of studying the dolphins’ behavior. The network is naturally divided into two communities, and each edge represents a frequent association between two dolphins [15, 37].

Netscience, the second dataset, is a network of 1589 nodes, where nodes represent scientists and each edge connects two authors who collaborated on the same article. It is a co-authorship network of scientists working on network theory and experiment [16, 46].

The third dataset, with 10748 nodes, is a weighted co-authorship network of scientists posting preprints on high-energy theory [18, 35, 46]; an unweighted version of the network is used in this paper. These datasets have been widely used in both community detection and influence maximization research [16, 27, 28, 32, 46, 52], and their details are given in Table 1.

Table 1
Dataset description

Dataset	Number of nodes	Number of edges	Number of communities
Dolphin social network	62	159	2
Netscience	1589	2742	unknown
HEP-physics	10748	52992	unknown

5.2 Comparing methods

Degree centrality is one of the most widely used heuristics for estimating the influence of a node in a complex network; the node with the maximum degree is selected as an influential node. A high degree indicates a large number of connections between a node and its neighbors. Degree centrality is defined as follows [13]:

$\displaystyle C_{d}(i)=\left|{NB_{h}(i)}\right|=\sum\limits_{j=1}^{n}a_{ij}$ (5)

Figure 12.

Dolphin network with $p=$ 0.05.

Figure 13.

Dolphin network with $p=$ 0.01.

where $C_{d}(i)$ is the degree centrality of node $i$, and $NB_{h}(i)$ is the set of neighbors of node $i$ within an $h$-hop distance, so that $\left|{NB_{h}(i)}\right|$ is their number; in most cases $h=1$, for which the second equality in Eq. (5) holds. Also, $n$ is the total number of nodes in the given network, and $a_{ij}$ is defined as in Eq. (6):

$\displaystyle a_{ij}=\begin{cases}1&\textit{if there is a link between }v_{i}\textit{ and }v_{j}\\ 0&\textit{otherwise}\end{cases}$ (6)
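Equations (5) and (6) with $h=1$ amount to row sums of the adjacency matrix. A minimal Python sketch of the degree-centrality baseline (DC) follows, using 0-based node indices as an assumption of the sketch. If a directed matrix with $a_{ij}=1$ for a link from $v_i$ to $v_j$ is passed in, the row sums are out-degrees, as in the example of Fig. 2.

```python
def degree_centrality(a):
    """C_d(i) = sum_j a_ij for an adjacency matrix a given as a list of rows (h = 1)."""
    return [sum(row) for row in a]


def top_k_by_degree(a, k):
    """Indices (0-based) of the k nodes with the largest degree; ties keep index order."""
    degrees = degree_centrality(a)
    return sorted(range(len(a)), key=lambda i: degrees[i], reverse=True)[:k]
```

For the matrix `[[0, 1, 1, 1], [1, 0, 0, 0], [1, 0, 0, 1], [1, 0, 1, 0]]`, the degrees are `[3, 1, 2, 2]` and the two top-ranked nodes are `[0, 2]`.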

Figure 14.

Netscience network with $p=$ 0.01.

Figure 15.

Netscience network with $p=$ 0.05.

Figure 16.

Hep-physics network with $p=$ 0.01.

Figure 17.

Hep-physics network with $p=$ 0.05.

Figure 18.

Hep-physics network with $p=$ 0.1.

Figure 19.

Hep-physics network with $p=$ 0.15.

This method is widely used as a benchmark for identifying influential nodes [9, 26, 27, 28, 32, 40, 53, 55]. The next compared method is the random ranking method [2, 6, 43, 45, 52], in which $k$ nodes are selected at random from the entire population as the seed set (referred to as RN in this paper). Finally, we chose the semi-local centrality proposed in [6] as another compared method (named SL in our experiments), to study the difference between the results obtained with and without considering the community structure. All methods are implemented in the same operational environment.

5.3 Evaluation with IC-model

Influence spread is a function for evaluating the performance of the methods: given an initial set of active nodes $A$, the influence spread $R(A)$ is defined as follows:

$\displaystyle R(A)=\frac{V_{A}}{N}$ (7)

where $V_{A}$ is the total number of nodes influenced by $A$ during the diffusion process and $N$ is the total number of nodes in the network [51].
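Because each IC run is random, $R(A)$ is estimated by averaging over repeated runs (100 in the experiments of Section 5.4). The sketch below assumes a single diffusion probability `p` on every link, matching the single value $P$ used per experiment, and counts the seeds themselves in $V_A$; both are assumptions. A fixed random seed makes the estimate reproducible.

```python
import random


def ic_run(graph, p, seeds, rng):
    """One IC run with the same diffusion probability p on every link; returns |active|."""
    active, frontier = set(seeds), list(seeds)
    while frontier:
        nxt = []
        for u in frontier:
            for v in graph.get(u, ()):
                if v not in active and rng.random() < p:  # single attempt per link
                    active.add(v)
                    nxt.append(v)
        frontier = nxt
    return len(active)


def influence_spread(graph, p, seeds, runs=100, seed=0):
    """Monte Carlo estimate of R(A) = V_A / N (Eq. 7), averaged over `runs` IC runs."""
    rng = random.Random(seed)
    n = len(graph)  # every node must appear as a key of graph
    total = sum(ic_run(graph, p, seeds, rng) for _ in range(runs))
    return total / (runs * n)
```

On the chain `{1: [2], 2: [3], 3: [], 4: []}` seeded at node 1, `p = 1.0` gives $R(A)=0.75$ and `p = 0.0` gives $0.25$.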

5.4 Results

After network partitioning, selecting candidate communities, and identifying the influential nodes, the diffusion model (IC) is applied in order to propagate the information or product over the network. Eventually, based on the total number of influenced nodes, the influence spread is calculated.

Furthermore, since $k$ specifies the number of candidate communities in our method, its value is bounded by the number of detected communities. Also, because the diffusion process is random, each experiment was repeated 100 times, and the average of the results is reported.

The experimental results are shown in Figs 12 to 19, where the x-axis and y-axis denote the number of seeds and the average influence spread, respectively.

Our first dataset is the dolphin network, which is naturally divided into two communities, whereas our method identified 4 potential communities. Thus, we set the maximum value of $k$ to 4. With the diffusion probability $P=0.05$, results are reported for $k=1$ to $k=4$. As Fig. 12 shows, the random method has the smallest influence spread among the compared methods. The results of degree and semi-local centrality are almost identical, and CNLPSO-SL (denoted as C-SL) outperforms the other methods.

The next dataset is a medium-scale network, Netscience, consisting of 1589 nodes. The results are shown in Figs 14 and 15 for two diffusion probabilities, $P=0.01$ and $P=0.05$. As the number of seeds increases, a larger influence spread is obtained. On this dataset, our method also outperforms the others for both diffusion probabilities. The parameter $k$ was set to values from 0 to 30 in steps of 5. Furthermore, as Figs 14 and 15 show, increasing the diffusion probability increases the number of nodes influenced at the end of the process. On this dataset, the average influence spread is ordered as follows: C-SL $>$ DC $>$ SL $>$ RN.

The last dataset is the large-scale Hep-physics network, with 10748 nodes and 52992 edges; the results for four diffusion probabilities, $P=0.01$, 0.05, 0.1, and 0.15, are shown in Figs 16 to 19. The influence spread increases with both the diffusion probability and the number of seeds. This is expected, because more initial seeds lead to more propagation. According to the results, the average influence spread is ordered as C-SL (CNLPSO-SL) $>$ DC (degree centrality) $>$ SL (semi-local centrality heuristic) $>$ RN (random). On this dataset, degree centrality also outperforms semi-local centrality, which may be due to the structure of the physics network, since semi-local centrality is sensitive to network structure.

5.5 Discussion

In this research, by combining the community detection and influence maximization problems, our method gains several advantages over the other approaches considered. First, instead of searching the whole network for influential nodes, only the candidate communities are examined, which reduces the cost; in contrast, degree centrality, the random ranking method, and semi-local centrality search the whole network. Second, our method detects the communities that may trigger more adoption of information or products, which in turn can maximize the influence spread. Third, as the experimental results show, when the scale of the network increases (from a medium-scale to a large-scale network), our method maintains its good efficiency and still outperforms the compared methods, so scalability is a further advantage of the proposed method. Finally, some heuristics were considered in order to study whether selecting nodes from large communities results in more influence spread.

Our method was implemented under the IC diffusion model with different diffusion probabilities and was run on three real social networks; the results show the good efficiency of CNLPSO-SL compared with the other approaches. According to the experiments, the random ranking method has the worst performance on almost all datasets. The results of degree centrality are close to those of semi-local centrality on most datasets; nevertheless, degree centrality performs better than semi-local centrality, which we attribute to the nature of the considered datasets. Semi-local centrality is sensitive to the structure of the network and is more suitable for heterogeneous networks; however, it still obtained acceptable results in the experiments. Finally, by considering the community structure, our method had the best performance on all networks.

6. Conclusion

In this research, by combining community detection and influence maximization, a two-layered method, CNLPSO-SL, was proposed to identify influential nodes. In the first layer, an optimization-based algorithm named CNLPSO-DE was applied for network partitioning. Next, to select candidate communities for identifying influential nodes, a size criterion was used: the communities were sorted in decreasing order of size, and the $k$ biggest communities were selected. Finally, a semi-local centrality was used to identify influential nodes, because it is a trade-off between low-relevance centralities and methods with high complexity. The experimental results on real networks of different scales indicate the good performance and scalability of CNLPSO-SL compared with the other methods.

References

1. Agryzkov, Tortosa and J.F. Vicent, New highlights and a new centrality measure based on the Adapted PageRank Algorithm for urban networks, Applied Mathematics and Computation 291 (2016), 14–29.

2. Anjerani and Moeini, Selecting influential nodes for detected communities in real-world social networks, in: 2011 19th Iranian Conference on Electrical Engineering, 2011, pp. 1–6.

3. Bedi and Sharma, Community detection in social networks, Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 6 (2016), 115–135.

4. G.A. Bello, Angus, Pedemane, J.K. Harlalka, F.H.M. Semazzi, Kumar and N.F. Samatova, Response-Guided Community Detection: Application to Climate Index Discovery, in: Machine Learning and Knowledge Discovery in Databases: European Conference, ECML PKDD 2015, Porto, Portugal, September 7–11, 2015, Proceedings, Part II, Appice, P.P. Rodrigues, Santos Costa, Gama, Jorge and Soares, eds., Springer International Publishing, Cham, 2015, pp. 736–751.

5. D.-B. Chen, Gao, Lü and Zhou, Identifying Influential Nodes in Large-Scale Directed Networks: The Role of Clustering, PLoS ONE 8 (2013), e77455.

6. Chen, Lü, M.-S. Shang, Y.-C. Zhang and Zhou, Identifying influential nodes in complex networks, Physica A: Statistical Mechanics and Its Applications 391 (2012), 1777–1787.

7. Y.-C. Chen, S.-H. Chang, C.-L. Chou, W.-C. Peng and S.-Y. Lee, Exploring Community Structures for Influence Maximization in Social Networks, in: The 6th SNA-KDD Workshop’12, ACM, Beijing, China, 2012, pp. 1–9.

8. Domingos and Richardson, Mining the network value of customers, in: Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, San Francisco, California, 2001, pp. 57–66.

9. Gao, Mahadevan and Deng, A new method of identifying influential nodes in complex networks based on TOPSIS, Physica A: Statistical Mechanics and Its Applications 399 (2014), 57–69.

10. P.A. Estevez, Vera and Saito, Selecting the Most Influential Nodes in Social Networks, in: 2007 International Joint Conference on Neural Networks, 2007, pp. 2397–2402.

11. L.C. Freeman, S.P. Borgatti and D.R. White, Centrality in valued graphs: A measure of betweenness based on network flow, Social Networks 13 (1991), 141–154.

12. N.E. Friedkin, Theoretical Foundations for Centrality Measures, American Journal of Sociology 96 (1991), 1478–1504.

13. Y.-H. …, C.-Y. Huang and C.-T. Sun, Using global diversity and local topology features to identify influential network spreaders, Physica A: Statistical Mechanics and Its Applications 433 (2015), 344–355.

14. Gao, Wei, Mahadevan and Deng, A modified evidential methodology of identifying influential nodes in weighted networks, Physica A: Statistical Mechanics and Its Applications 392 (2013), 5490–5500.

15. Gong, Jiao and Du, Memetic algorithm for community detection in networks, Physical Review E 84 (2011), 1–9.

16. Gong, Cai, Chen and Ma, Complex Network Clustering by Multiobjective Discrete Particle Swarm Optimization Based on Decomposition, IEEE Transactions on Evolutionary Computation 18 (2014), 82–97.

17. Goyal and Lakshmanan, CELF++: optimizing the greedy algorithm for influence maximization in social networks, in: Proceedings of the 20th International Conference Companion on World Wide Web, ACM, Hyderabad, India, 2011, pp. 47–48.

18. Hou, Yao and Liao, Identifying all-around nodes for spreading dynamics in complex networks, Physica A: Statistical Mechanics and Its Applications 391 (2012), 4012–4017.

19. Jung, Heo and Chen, IRIE: Scalable and Robust Influence Maximization in Social Networks, in: Proceedings of the 2012 IEEE 12th International Conference on Data Mining, IEEE Computer Society, 2012, pp. 918–923.

20. Kempe, Kleinberg and Tardos, Maximizing the Spread of Influence through a Social Network, Theory of Computing 11 (2015), 104–147.

21. Keyvanpour and Azizani, Classification of Approaches and Challenges of Frequent Subgraphs Mining in Biological Networks, CoRR abs/1207.3543 (2012).

22. M.R. Keyvanpour, Javideh and M.R. Ebrahimi, Detecting and investigating crime by means of data mining: a general crime matching framework, 2011, pp. 872–880.

23. M.R. Keyvanpour, Moradi and Hasanzadeh, Digital Forensics 2.0, in: Computational Intelligence in Digital Forensics: Forensic Investigation and Applications, A.K. Muda, Y.-H. Choo, Abraham and S.N. Srihari, eds., Springer International Publishing, Cham, 2014, pp. 17–46.

24. Kumari and S.N. Singh, Online influence maximization using rapid continuous time independent cascade model, in: 2017 7th International Conference on Cloud Computing, Data Science & Engineering – Confluence, 2017, pp. 356–361.

25. Leskovec, Krause, Guestrin, Faloutsos, VanBriesen and Glance, Cost-effective outbreak detection in networks, in: Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, San Jose, California, USA, 2007, pp. 420–429.

26. Liu, Jiang and Liu, Identifying social influence in complex networks: A novel conductance eigenvector centrality model, Neurocomputing 210 (2016), 141–154.

27. Liu, Xiong, Shi and Wang, Evaluating the importance of nodes in complex networks, Physica A: Statistical Mechanics and Its Applications 452 (2016), 209–219.

28. Liu, Tang, Zhou and Do, Identify influential spreaders in complex networks, the role of neighborhood, Physica A: Statistical Mechanics and Its Applications 452 (2016), 289–298.

29. Sun, Wen, Cao and T.L. Porta, Algorithms and Applications for Community Detection in Weighted Networks, IEEE Transactions on Parallel and Distributed Systems 26 (2015), 2916–2926.

30. … and Guo, Mining communities in social network based on information diffusion, IEEJ Transactions on Electrical and Electronic Engineering 11 (2016), 604–617.

31. Guo, Yang, Zhang and Jocshi, Improved Algorithms of CELF and CELF++ for Influence Maximization, Journal of Engineering Science and Technology Review 7 (2014), 32–38.

32. L.-l. …, H.-F. Zhang and B.-H. Wang, Identifying influential spreaders in complex networks based on gravity formula, Physica A: Statistical Mechanics and Its Applications 451 (2016), 205–212.

33. … and Ma, Identifying and ranking influential spreaders in complex networks with consideration of spreading probability, Physica A: Statistical Mechanics and Its Applications 465 (2017), 312–330.

34. F.D. Malliaros, M.-E.G. Rossi and Vazirgiannis, Locating influential nodes in complex networks, Scientific Reports 6 (2016).

35. Narayanam and Narahari, A Shapley Value-Based Approach to Discover Influential Nodes in Social Networks, IEEE Transactions on Automation Science and Engineering 8 (2011), 130–147.

36. Pastor-Satorras and Vespignani, Epidemic Spreading in Scale-Free Networks, Physical Review Letters 86 (2001), 3200–3203.

37. Pizzuti, A Multiobjective Genetic Algorithm to Find Communities in Complex Networks, IEEE Transactions on Evolutionary Computation 16 (2012), 418–430.

38. Pourkazemi and Keyvanpour, A survey on community detection methods based on the nature of social networks, in: ICCKE 2013, 2013, pp. 114–120.

39. Pourkazemi and Keyvanpour, Community Detection in Social Network by Using a Multi-objective Evolutionary Algorithm, Intelligent Data Analysis 21 (2017), 385–409.

40. Rahimkhani, Aleahmad, Rahgozar and Moeini, A fast algorithm for finding most influential people based on the linear threshold model, Expert Systems with Applications 42 (2015), 1353–1361.

41. Rahnamayan, H.R. Tizhoosh and M.M.A. Salama, A novel population initialization method for accelerating evolutionary algorithms, Computers & Mathematics with Applications 53 (2007), 1605–1614.

42. Richardson and Domingos, Mining knowledge-sharing sites for viral marketing, in: Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, Edmonton, Alberta, Canada, 2002, pp. 61–70.

43. Roelens, Baecke and D.F. Benoit, Identifying influencers in a social network: The value of real referral data, Decision Support Systems 91 (2016), 25–36.

44. Saito, Nakano and Kimura, Prediction of Information Diffusion Probabilities for Independent Cascade Model, in: Knowledge-Based Intelligent Information and Engineering Systems: 12th International Conference, KES 2008, Zagreb, Croatia, September 3–5, 2008, Proceedings, Part III, Lovrek, R.-J. Howlett and L.-C. Jain, eds., Springer Berlin Heidelberg, Berlin, Heidelberg, 2008, pp. 67–75.

45. Sheikhahmadi, M.A. Nematbakhsh and Shokrollahi, Improving detection of influential nodes in complex networks, Physica A: Statistical Mechanics and Its Applications 436 (2015), 833–845.

46. Shi, Yan, Cai and Wu, Multi-objective community detection in complex networks, Applied Soft Computing 12 (2012), 850–859.

47. Srinivas and R.L. Velusamy, Identification of influential nodes from social networks based on Enhanced Degree Centrality Measure, in: 2015 IEEE International Advance Computing Conference (IACC), 2015, pp. 1179–1184.

48. C.W. Tsai, Y.C. Yang and M.C. Chiang, A Genetic NewGreedy Algorithm for Influence Maximization in Social Network, in: 2015 IEEE International Conference on Systems, Man, and Cybernetics, 2015, pp. 2549–2554.

49. Ventresca and Aleman, Efficiently identifying critical nodes in large complex networks, Computational Social Networks 6 (2015).

50. Wang, Jin, Lin, Cheng and Yang, Influence maximization in social networks under an independent cascade-based model, Physica A: Statistical Mechanics and Its Applications 444 (2016), 20–34.

51. Wang, Cong, Song and Xie, Community-based greedy algorithm for mining top-K influential nodes in mobile social networks, in: Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, Washington, DC, USA, 2010, pp. 1039–1048.

52. Wang, Zhao and Du, Fast ranking influential nodes in complex networks using a k-shell iteration factor, Physica A: Statistical Mechanics and Its Applications 461 (2016), 171–181.

53. Wasserman and Faust, Social Network Analysis: Methods and Applications, Cambridge University Press, Cambridge, United Kingdom, 1994.

54. Wei, Yajun and Siyu, Efficient influence maximization in social networks, in: Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, Paris, France, 2009, pp. 199–208.

55. Yang and Xie, Efficient identification of node importance in social networks, Information Processing & Management 52 (2016), 911–922.

56. Zang, Wang, Yao and Guo, A Fast Climbing Approach for Diffusion Source Inference in Large Social Networks, in: Data Science: Second International Conference, ICDS 2015, Sydney, Australia, August 8–9, 2015, Proceedings, Zhang, Huang, Shi, Zhu, Tian and Zhang, eds., Springer International Publishing, Cham, 2015, pp. 50–57.

57. Zhang, Zhu, Wang and Zhao, Identifying influential nodes in complex networks with community structure, Knowledge-Based Systems 42 (2013), 74–84.

58. Zhao and Jin, Identification of influential nodes in social networks with community structure based on label propagation, Neurocomputing 210 (2016), 34–44.

59. Zhou, Zhang, Guo and Guo, An upper bound based greedy algorithm for mining top-k influential nodes in social networks, in: Proceedings of the 23rd International Conference on World Wide Web, ACM, Seoul, Korea, 2014, pp. 421–422.

