Abstract
The purpose of the influence maximization is to find the top
Introduction
Nowadays, with the increasing popularity of online social networks, such as Sina Weibo, WeChat and FaceBook, the business finds that social networks play an important role in information spreading. In the studies of information spreading, one of the most popular topics is influence maximization, which selects a small number of individuals as seed set that can influence nodes as much as possible. In addition, the influence maximization has many significant applications, such as viral marketing and career predicting. Considering an application in fortune teller as an example. A people’s career path consists of several career stages. We can collect the current characteristics and topics to ensure the most influential factors that decide the proper job positions, which is crucial to the employees, employers, and headhunters.
The influence maximization, as one of the most important themes in social network analysis, is first defined by Domingos and Richardson [1]. Since then, Kempe et al. [2] propose two kinds of information spreading model: Independent Cascade (IC) model and Linear Threshold (LT) model. Meanwhile, they prove that the problem under spreading model is NP-hard and present a greedy algorithm [3] to obtain an optimal influence spread, which depends on the Monte-Carlo simulations. However, their algorithm is still impractical when facing large-scale social networks.
A large number of following works aim to deal with the context problem from efficient heuristic algorithms [4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15] perspective, which can improve the running time. The research [16] finds that the community structure reveals the fundamental properties of the social network. Inspired by this, studies [17, 18, 19, 20, 21, 22, 23] tackle the influence maximization problem by using the community structure. However, they ignore the reality that in general, the real-world social networks are dynamic and the community structure changes timely. Recently, studies [28, 29] mine the community from a dynamic social network. Inspired by this, based on the community structure, we deal with the influence maximization problem on the dynamic networks.
In this paper, we propose a Community-Based algorithm for Influence Maximization on Dynamic social networks, called D-CBIM algorithm. In our algorithm, we first detect the nodes community. Then, we evaluate the importance of nodes according to the nodes location analysis in each community. Moreover, we partition the dynamic node into two types: add to the social network or delete from the social network. After that, we select the influential nodes as the candidate seeds. And finally, we use the CELF (Cost Effective Lazy Forward) algorithm to obtain the final influential seed set. More specially, the major contributions of this paper are summarized in the following part:
Considering the dynamic networks, we propose a novel heuristic algorithm, D-CBIM algorithm, which divides the dynamic changes into two situations: add a node or delete a node. We present a new method to evaluate the importance of nodes, which takes the nodes location analysis into account. Extensive experiments have been carried out to estimate our algorithm on real social networks. The experimental results show that our algorithm performs better than other heuristic algorithms in both influence spread and running time and runs less time than the CELF algorithm for we consider the dynamic networks.
The rest of the paper is organized below. Firstly, we retrospect the related work on influence maximization in Section 2. Then, in Section 3, we describe the preliminaries of our paper. After that, we present our proposed D-CBIM algorithm based on community structure for influence maximization on dynamic networks in Section 4. In addition, in Section 5, we evaluate our proposed algorithm compared with many other algorithms. Finally, in Section 6 we expose our conclusions and the future directions.
Since Kempe et al. [2] prove the influence maximization problem is NP-hard and present a basic greedy algorithm to approximate the optimal solution, a series of studies have been proposed to deal with this problem. In general, there are mainly two categories: greedy algorithm and heuristic algorithm.
The greedy algorithm
Kempe et al. [2] propose a basic greedy algorithm, which repeatedly selects the node with the largest marginal influence as the seed set. There is no doubt that it is most accurate in terms of influence spread. Unfortunately, it has a serious efficiency problem, which is inappropriate to large scale networks. In order to reduce the time complexity, taking the submodularity into account, Leskovec et al. [3] propose the CELF algorithm, which significantly reduces the running time. Since then, NewGreedy and MixGreedy are proposed in the study [4]. The NewGreedy algorithm evaluates the influence spread of nodes in the same iteration by the Monte Carlo simulations. In addition, MixGreedy combines the advantages of CELF and NewGreedy algorithm. However, although those improved greedy algorithms reduce the time complexity, they are still unsuitable for large-scale networks. In conclusion, the greedy algorithm can guarantee the accuracy of influence spread, but it is impractical in real social networks.
The heuristic algorithm
Diverging from the greedy algorithm, aiming at solving the efficiency problem, the following heuristic algorithms [4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15] are proposed. Chen et al. [4] propose the Degree Discount algorithm, which regards the node with the largest degree as the influential seed. In order to reduce the influence spread, if a node is selected as a seed, the neighbors will be discounted. However, this method ignores the propagation probability. In addition, they propose PMIA (Prefix Excluding Maximum Influence Arborescence) algorithm [5], which utilizes the local influence arborescences to evaluate the influence spread. And Kyomin Jung et al. [6] propose Influence Ranking Influence Estimation(IRIE), which runs two orders of magnitude faster than the PMIA algorithm. In addition, Kistak et al. [7] assume that the node who locates in the core of the network tends to have a larger influence spread. Based on this assumption, Cao et al. [8] propose the Core Covering Algorithm (CCA) algorithm, which selects the nodes with both large degree and core as seeds. However, the CCA algorithm leads to low influence spread for it may cover too many influential nodes.
Different from the above algorithms, many researchers [17, 18, 19, 20, 21, 22, 23] deal with the influence maximization problem from the community structure perspective. Cao et al. [17] are the first to deal with influence maximization based on the community structure, which assumes that different community is independent. Chen et al. [18] propose the CIM (Community-based Influence Maximization), which consists of three phases: community detection, candidate seeds, and final seeds. Basing on multi-neighbor potential in community networks, Shang et al. [19] propose an influence maximization framework, called CoFIM, which consists of two parts: seeds expansion and influence spreading. In conclusion, the heuristic algorithms based on the community structure are generally faster than the traditional algorithms. Unfortunately, we notice that a few studies focus on the dynamic social networks, ignoring the reality that the real world social networks are dynamic and community structure changes timely, which may have an effect on influence maximization problem. For example, we select
From the above discussion, to deal with the context situation, we should take the dynamic network into account. In this paper, we divide our algorithm into three phases: (i) community detection, (ii) candidate seed set on the dynamic network, and (iii) final seed set. In the first phase, we use the Louvain [24] algorithm to obtain the community structure. Then, we analyze the node location to judge the importance of the node. The candidate nodes are those nodes who locate in the core of the community. In addition, we consider the situation that a node may add to or delete from the social network. And in the third phase, we select
Preliminaries
In this section, at first, we formally describe the influence maximization and diffusion model. After that, we describe the Louvain algorithm which is utilized to detect community. And finally, we present the CELF algorithm, which can optimize the final seed set.
Influence maximization problem and diffusion model
Influence maximization is aimed to specify
Up to now, there are mainly two diffusion models: IC model and LT model. The IC mode not only is easy to understand but also accords with the information diffusion process. Thus, in this paper, we select the IC model as our diffusion model. The following is a simple definition of the IC model.
In the IC model, every node only has one state: active or inactive. Initially, all the seeds
Louvain algorithm
To the best of our knowledge, the Louvain [24] algorithm is a fast and effective modularity optimization algorithm to detect community for its time complexity is
Modularity optimization. We assume that each node is a community firstly. Then, for each node Community aggregation. We regard the new communities as an independent node and repeat the first step. The two steps will be repeated until no movement of the node.
The main idea of CELF [3] is that using the submodularity to reduce the time complexity of the greedy algorithm.
The proposed algorithm
In this section, we introduce the implement of our algorithm, which is based on the community structure to deal with the influence maximization problem on the dynamic networks. The framework of our proposal is shown in Fig. 1. At a high level, the algorithm can be summed up in three stages blew. At first, we utilize the Louvain algorithm to detect community. Then, we conduct each community to select the influential nodes by analyzing the location. Moreover, we use the CELF algorithm to find the final seed set from the candidate seed set, which can improve both running time and influence spread.
The structure of our D-CBIM algorithm.
Community structure [16] is characterized as a partition of network nodes into subgroups. The nodes within the community are closely connected while the nodes between the communities are sparsely connected. Newman et al. [25] propose the definition of modularity, using to access the quality of the community structure. In general, a social network is modeled as
Where
For Louvain algorithm is an effective and accuracy algorithm, we choose the Louvain algorithm to detect community. We have introduced the process of Louvain algorithm in Section 3, which is a modularity-based algorithm. The study [24] gives the description of the Louvain algorithm, whose gain in modularity is denoted following.
Where
In order to find the most influential nodes in each community, we assign the nodes into three types:
From the above discussion, we define the propagation degree to evaluate the node influence. Degree centrality is a straightforward and significant measurement for selecting the influential nodes during information diffusion, which selects nodes with maximum degree as initial seeds [26, 27]. However, it ignores the propagation rate. Thus, in this paper, we regard the nodes with the largest propagation degree as the influential nodes, which combines the degree and propagation rate. In order to improve accuracy, we consider the two-hop neighbor nodes. The propagation degree (
Where
After we conduct the Louvain algorithm and identify the kinds of nodes in each community, in this section, we select the peak node as the candidate seed set for its propagation degree higher than its two-hop neighbor nodes. The Louvain algorithm performs well in detecting community on the static networks. Unfortunately, in reality, the social network is dynamic and changes every time. Due to the goal of our algorithm is to select the influential nodes, thus, based on the Louvain algorithm, we update the set of peak node when the social network changes. There could be two situations for peak node:
Add node. When a node adds to the social network, we need to judge the kinds of this node. If this node is a peak node, we should add this node to the candidate seed set, which is shown in Fig. 2.
Changes when adding a peak node to social network. Delete node. When a node is removed from the social network, we need to judge the kinds of this node. If this node is a peak node, we need to find another node to replace it. For the node may have a similar attribute with its neighbor nodes, thus, we select the neighbor node with the largest propagation degree replace the removed peak node as shown in Fig. 3.
Changes when deleting a peak node from social network.


In conclusion, the candidate seed set consists of all the peak node although the network changes.
After obtaining the candidate seed set, how to find the optimal solution is the goal of this section. CELF algorithm [3] is a representational greedy algorithm, which runs 700 times faster than the basic greedy algorithm. Therefore, in this paper, we utilize the CELF algorithm to find the most influential seeds from the candidate seed set.
The D-CBIM algorithm is shown as follows:
Where
The goal of our algorithm is to find the most influential final seed set on the dynamic social network. Our algorithm consists of three parts. Firstly, we utilize the Louvain algorithm to detect the community (line 1). Then, we calculate the propagation rate to identify the kinds of nodes (in lines 2–4). After that, we add the peak node to the candidate seed set as shown in line 5. Considering the dynamic social network, we identify the kinds of the changed node (lines 6–10). And the last past is that we use the CELF algorithm to select the k influential seed set from the candidate seed set (lines 11–12).
Time complexity
The total calculation of Algorithm 2 mainly consists of three parts: detect community, candidate seeds calculation, and final seeds calculation.
In the first part, we utilize the Louvain algorithm to detect community, whose time complexity is
Results
We carry out experiments to access the performance of our algorithm over four real-world datasets and five compared algorithms: CELF, PMIA, CoFIM, CCA, and IRIE. All the codes are implemented in Python 2.7, and all the tests are run on CPU 2.8 GHz and 128 GB memory.
Experiment setup
Dataset preparation
We utilize four public real-world networks downloaded from the SNAP (
ca-astroPh. The dataset contains scientific collaborations between authors papers submitted to Condense Matter category. Email-Enron. This dataset is derived by the CALO Project, which covers about 150 users, and most of them are senior management. NetPHY. This dataset is from the scientific research cooperation network of physics. Amazon. The dataset is collected by crawling Amazon website, which covers product metadata, reviews, and links.
These networks are selected in our experiment for they cover various types of networks in many fields, whose statistical properties are summaried in Table 1. In addition, as to the dynamic social network, we suppose that when a node adds to the social network, it randomly connects with other nodes. And when a node removes from the social network, it will remove the connected edges.
Basic information of the context datasets
We compare our algorithm with five traditional algorithms including one greedy algorithm and four heuristic algorithms, which contains the representative of the greedy algorithm – CELF algorithm, path-based heuristic algorithm – PMIA algorithm, core-based heuristic – CCA, community-based algorithm – CoFIM algorithm and two steps heuristic – IRIE algorithm. The descriptions of the implementation details of these algorithms are summarized below.
D-CBIM: Our algorithm contains three phases: community detection, candidate seed set on dynamic social network and CELF algorithm. CELF: As a typical representative of the greedy algorithm, CELF gets higher effectiveness while not suits for large social networks. PMIA: A heuristic algorithm, PMIA, based on the maximum influence arborescence, runs faster than many heuristic algorithms. In addition, we run the PMIA algorithm with a fixed CCA: Integrating the hierarchy and influence radius based on the K-shell method, CCA is not very effective in the aspect of influence spread. Here, we set influence CoFIM: CoFIM, a community-based algorithm, conducts seeds expansion between different communities firstly, then propagates on the intra-community. IRIE: IRIE is an effective algorithm for influence maximization, which incorporates a new message passing and influence estimation. In addition,
To accurately obtain the influence spread of the above six algorithms, we run 10000 times for each targeted set as done by the previous works and then present the average of the influence spread in the following subsections.
In this section, we first conduct the change for the influence spread when a node adds to the social network or deletes from the social network. After that, we analyze the experimental results of our proposal on four real-world datasets from the aspects of influence spread and running time.
Dynamic social network
Tables 2 and 3 present the changes on ca-astroPh and NetPHY dataset when a node adds to or deletes from the social network, where the dynamic time is the time that updates the structure of the social network to find the candidate seed set. Obviously, the changes between delete and add is similar in the aspects of dynamic time and influence spread. Thus, in the following comparison with other baseline algorithms, we adopt the situation that a peak node is removed from the social network.
Changes on ca-astrtoph dataset
Changes on ca-astrtoph dataset
Changes on netphy dataset
Influence spread on four different datasets.
Figure 4 shows the performance of the six algorithms introduced in Section 5.1 over four datasets under the IC model. In order to obtain a persuasive comparison, we simulate the experiments with different size of seed set, which ranges from 10 to 50.
As a whole, Fig. 4 shows that our D-CBIM algorithm outperforms other heuristic algorithms while obtains a close influence spread to CELF algorithm, which illustrates the superiority of our algorithm in tradeoff the influence spread and running time. In general, the CELF outperforms the other five heuristic algorithms for it can guarantee within (
Running time on different datasets for selecting 100 seed nodes.
The running time of the algorithms D-CBIM, CELF, PMIA, CCA, CoFIM, and IRIE are
Discussion on the results
We can draw several conclusions from the experimental results. In the first place, taking the nodes location analysis into account, we present a new method to evaluate the importance of nodes that is utilized to divide the nodes into three kinds: peak node, valley node, and slope node. In addition, in order to apply to dynamic social networks, we partition the dynamic changes into two situations: add nodes or delete nodes. Moreover, empirical evaluations show that our proposed algorithm achieves a better performance on influence spread compared with the four baseline algorithms: PMIA, CCA, CoFIM, and IRIE. The reason is probably that these algorithms only consider the static social network, ignoring the dynamic of joining and deleting nodes, which leads to inaccurate results. Moreover, our algorithm obviously improves the running time of CELF and obtains a similar influence spread as CELF. Unfortunately, one limitation of D-CBIM is that it has an unsatisfactory advantage in running time.
Conclusions
In this paper, we propose a new community-based algorithm to address the influence maximization problem, which is applicable to dynamic social networks. Firstly, we utilize a simple and effective algorithm, call Louvain algorithm, to detect community. Then, in each community, propagation degree is utilized to identify the kinds of nodes, which can ensure the candidate seed set. Especially, considering the change of social network, we divide the dynamic nodes into two situations: delete nodes or add nodes. Finally, we adopt the CELF algorithm to optimize the solution from the candidate seed set.
In the future, we will try the best to look for a better method to deal with the influence maximization problem for the dynamic social network. As an example, we can from the aspects of the kinds of edges when a new node adds to the network.
Footnotes
Acknowledgments
This paper is supported by the Nature Science Foundation of China (No. 71772107).
