Abstract
Community structure, a foundational concept in understanding networks, is one of the most important properties of dynamic networks. A large number of dynamic community detection methods proposed are based on the temporal smoothness framework that the abrupt change of clustering within a short period is undesirable. However, how to improve the community detection performance by combining network topology information in a short period is a challenging problem. Additionally, previous efforts on utilizing such properties are insufficient. In this paper, we introduce the geometric structure of a network to represent the temporal smoothness in a short time and propose a novel Dynamic Graph Regularized Symmetric NMF method (DGR-SNMF) to detect the community in dynamic networks. This method combines geometric structure information sufficiently in current detecting process by Symmetric Non-negative Matrix Factorization (SNMF). We also prove the convergence of the iterative update rules by constructing auxiliary functions. Extensive experiments on multiple synthetic networks and two real-world datasets demonstrate that the proposed DGR-SNMF method outperforms the state-of-the-art algorithms on detecting dynamic community.
Keywords
Introduction
Community structure, one of the most prominent features of networks, plays an important role in network analysis [1, 2]. Accordingly, a large number of community detection techniques have been proposed to identify accurate community structure of networks [3, 4]. However, most of these efforts focus on dealing with community detection in static networks, which have overlooked the temporal feature of community in dynamic networks. In reality, networks especially the online social networks present a high degree of dynamics when the users (nodes) join and leave optionally at any time, which brings noises to networks. Consequently, traditional methods, used in static networks, are not competent for community detection in dynamic networks that are ambiguous and subject to noise. Therefore, a growing number of efforts have been made on community detection in dynamic networks recently [5].
Although a lot of work has been done to address the problem of Dynamic Community Detection (DCD), it is still highly not trivial to design effective and efficient algorithms for DCD. Several reasons are responsible for this issue. Firstly, since no unified definition is given for dynamic community, the varied definitions will make different DCD algorithms incomparable so as to reduce the scalability of algorithms. Secondly, the introduction of time dimension increases the model complexity. Unlike the traditional community detection, DCD must take into account evolution of network structure which stems from the variation of nodes and edges in networks over time. Lastly but not least, the diversity of evolution events for community brings great challenges to DCD. According to the existing literature [6, 7], there are eight kinds of evolution events for community, namely birth, death, growth, contraction, merge, split, continue, and resurgence. At present, no any DCD algorithms can fully identify and track all these evolution events. Despite various difficulties mentioned above, some excellent techniques have been proposed to improve the performance of DCD. Among them, evolutionary clustering, first proposed by Chakrabarti et al. [8], is one of the most widely used approaches for DCD. This method detect communities at time
where the snapshot cost measures how well a community structure fits the network at time
Non-negative Matrix Factorization (NMF), widely used in information retrieval, computer vision and pattern recognition, has currently been introduced to solve the problem of community detection by presenting its excellent performance [12, 13]. As for the DCD, some researchers [14, 15, 16] have incorporated NMF into the temporal smoothness framework, forming many effective DCD models. However, though studies [17, 18] have shown that NMF is optimal for learning the parts of objects, it cannot capture the geometrical structure of the data space which is essential for clustering. According to spectral graph theory [19] and manifold learning theory [20], graph regularization method can obtain the geometric structure of data. In addition, under the assumption of temporal smoothness, the geometric structure of networks will not change sharply in a short time period. Therefore, we argue that it will be intuitively effective to use graph regularization to model the temporal cost function.
Our main contributions can be summarized as follows:
First of all, we propose a novel Dynamic Graph Regularized Symmetric Non-negative Matrix Factorization (DGR-SNMF) model for DCD, which introduces the graph regularization term as the temporal cost function at the previous time, holding the smoothness of geometric structure. Secondly, we design efficient update rules which can adapt to the change of number of nodes and number of communities over time, and provide the theoretical analysis on their convergence. Finally, extensive experiments on dynamic synthetic networks and real-world networks show the superior performance of DGR-SNMF over the other state-of-the-art methods.
The rest of the paper is organized as follows. In Section 2, we discuss related work. Section 3 describes the details of the proposed model. Section 4 depicts the DGR-SNMF algorithm and proves its convergence. In Section 5 experimental results on synthetic and real-world dynamic networks are presented, followed by a comparison with other existing methods. Finally in Section 6, we give some conclusions and discuss the future directions.
Although it’s not been a long time since the concept of community in complex network was introduced, a large number of approaches have been proposed to solve the problem of community detection in networks from different perspectives. As an intrinsic nature of network, temporal dimension always plays a significant role in the network analysis. Therefore, dynamic community detection has attracted wide attentions recently, so that many approaches have been summarized and classified. According to Aynaud et al. [21], DCD methods can be divided into three categories: two-stage approaches, evolution clustering and coupling graphs. While Hartmann et al. [22] argued that all the existing DCD methods could be identified as either online or offline method. Rossetti and Cazabet [5] proposed a latest survey about community detection in dynamic networks, which presented the distinctive features and challenges of DCD and proposed a novel classification of published approaches. Since our method is based on the temporal smoothness framework, we mainly focus on the related work based on evolutionary clustering.
Lin et al. [23] introduced the evolutionary clustering to analyze communities and their evolutions in dynamic networks. They proposed a novel framework called FacetNet, which can discover communities by jointly maximizing the fit to the observed data and the temporal evolution. Chi et al. [24] proposed two frameworks based on evolutionary spectral clustering in which temporal smoothness is incorporated. The two frameworks, called Preserving Cluster Quality (PCQ) and Preserving Cluster Membership (PCM), are different in how the temporal cost is defined. PCQ measures the consistence between the current clustering result and the network at the previous time step. While PCM represents the smoothness of clustering results between the current and the previous snapshots. Folino and Pizzuti [25] formulated DCD as a multi-objective optimization problem, which maximizes clustering accuracy at the current time step and minimizes clustering drift from one time step to the successive one simultaneously. Ma and Dong [26] proposed two Evolutionary Nonnegative Matrix Factorization (ENMF) frameworks and proved the equivalence relationship among ENMF, the evolutionary modularity density and evolutionary spectral clustering. In addition, they introduced a semi-supervised method called sE-NMF which incorporates a priori information into ENMF. Gao et al. [14] presented a DCD model using nonnegative matrix factorization, which incorporated the community membership transition matrix into the temporal cost term to regularize the smooth changes of community structure. Jiao et al. [15] proposed a similar DCD model with Gao’s. However, the difference between them lies in the snapshot cost. The former used the standard NMF method as the snapshot cost function, while the latter adopted the symmetric NMF (SNMF). Liu and Yuan [27] proposed a DCD model using triple nonnegative matrix factorization which introduced the node weight matrix to achieve good interpretability. Yang et al. [16] presented a DCD method based on NMF considering the strength between nodes. Their basic idea is that node pairs with stronger connection strength have more possibility to be in the same community. Different from other methods mentioned above, they constructed a smoothed node strength matrix representing the connection strength of nodes. Then, the smoothed matrix is decomposed by standard NMF to detect communities in dynamic networks. Our work differs from all these studies in that we employ the graph regularization to capture the historical geometric structure information, which has not been utilized in the existing DCD methods.
Proposed model
Notation
In this subsection, we present the notations that will be used in the remaining of the paper. Let
Snapshot cost
A simple toy example of the stochastic block model: (a) the original graph, (b) the bipartite graph with two communities, and (c) how to generate an edge 
According to the existing literature, many approaches have been presented to measure how well a community structure fits the given network. Therein, the stochastic block model [28] is a good measurement which introduces bipartite graph to factorize graph. Similar to Lin et al. [23], we adopt a toy example with 6 nodes and 2 communities (see Fig. 1) to illustrate the main idea of this model. The model employs a special bipartite graph (Fig. 1b), where an edge can only occur between a node
where
where
where the lower the value of
In the evolutionary process, the underlying network is continuously updated over time by either inserting or removing nodes or edges. Under the assumption of temporal smoothness framework, it is almost impossible that nodes or edges change drastically between two consecutive time slices. Therefore, the geometric structure of a graph will not change dramatically between two neighboring time steps. Based on this idea, we argue that it is beneficial to employ the intrinsic geometry to regularize the temporal evolution of network. According to related studies on spectral graph theory [19] and manifold learning theory [20], we introduce graph Laplacian to capture the intrinsic geometrical structure of the data space. The graph Laplacian is defined as follows
where
Here, we can get a mapping function which is sufficiently smooth on the data manifold by minimizing Eq. (6).
Consequently, we construct a temporal cost function as follows
where
According to Eqs (1), (4), and (7), we can derive the total cost function
Here, the optimal community structure at time
Considering a given set of dynamic networks
The update rules
In this section, we introduce an iterative approach to solve the optimization objective function. As to the first temporal slice of a given dynamic networks
where the parameter
For temporal slices after the first one, the objective function can be reformulated as
Let
Due to
According to the Karush-kuhn-Tucker condition
Consequently, we can obtain the following update rule for networks at time
where the effect of the parameter
According to the update rules Eqs (10) and (15), we design a Dynamic Graph Regularized Symmetric Non-negative Matrix Factorization algorithm (DGR-SNMF) to detect community structure of a sequence of dynamic networks. The brief flow of DGR-SNMF algorithm is presented in Algorithm 1. Some technique details are described below.
Firstly, due to its unavailability, especially in real world, how to determine the number of community is a challenging problem. Among the existing approaches, we choose to employ an objective function guided strategy proposed by Ma and Dong [26], which is based on the matrix spectrum theory. According to the theory, given a network
where
Secondly, we must determine how to handle the case of inserting or removing nodes, which is common in real dynamic networks. From Eq. (15), we can find that the history geometric structure information is captured by
Although the change of the number of communities cannot interrupt the update process based on Eq. (15), we should consider how to initialize the community indicator matrix
In order to testify the convergence of our iterative update algorithm, we employ the method proposed by Lee and Seung [32], which makes use of an auxiliary function similar to that used in the Expectation-Maximization algorithm [33, 34]. So, we begin with the definition of the auxiliary function.
are satisfied.
where the superscript
According to the lemma of the auxiliary function, it is clear to see that constructing an appropriate auxiliary function is the key to prove the convergence of our algorithm under the update step in Eq. (15). Considering the terms of Eq. (8) which are only relevant to
Limited to space, we omit the proof for Lemmas 2 and 3.
is an auxiliary function for
Proof. It is straightforward to see
In addition, according to Lemma 2 and the inequality of
For the negative terms in Eq. (22), we employ the inequality of
and
Based on Eqs (23)
In order to find out the minimum of
In addition, according to the same derivation as in [35, 37, 38, 39], the Hessian matrix of
is a diagonal matrix with positive entries. Thus
Comparing Eqs (15) and (29), it is obvious that they are equivalent. Similarly, the update rule from Eq. (10) can also be derived by applying the same steps mentioned above. Therefore, we can guarantee the proposed algorithm is convergent.
To evaluate the performance of DGR-SNMF algorithm, we employ two existing algorithms for comparison, including DYNMOGA [25] and FacetNet [23]. The reason why these two are selected is that, as far as we know, DYNMOGA is renowned for its excellent performance, and FacetNet is among the most well-known algorithms for evolutionary clustering. For each algorithm, 50 runs are conducted on all experiments and the average results are reported.
Evaluation measures
In experiments, we employed two widely used evaluation metrics, which are Normalized Mutual Information (NMI) [40] and modularity [41], to measure the quality of dynamic communities detected by algorithms. Among them, NMI and modularity are applicable to networks with real communities and without ground truth respectively. Then, we will briefly describe the details of these two metrics.
NMI, stemming from information theory, can be defined as
where
The other metric modularity, an elegant concept proposed by Newman and Girvan [41], can be formulated as
where
Dynamic LFR benchmark
To generate realistic benchmark networks, Greene et al. [42] developed a dynamic benchmark generator1
Parameter effects of in terms of NMI on synthetic switch datasets. (a) 
Due to the existence of the weighted parameter
In order to testify the performance of DGR-SNMF, we first compare our algorithm with 2 baseline methods mentioned above, i.e., FacetNet and DYNMOGA, in dynamic LFR networks with switch events. Moreover, we set the best parameters in these two algorithms according to related literature. Due to the existence of the ground truth for the community memberships at each time step, we employ NMI to represent the performance. Figure 3 gives the performance on dynamic synthetic benchmarks with
To furthermore investigate the performance of DGR-SNMF, we apply these algorithms on the dynamic synthetic networks with some main events such as birth and death, expansion and contraction, merging and splitting, which may characterize the evolution of dynamic networks. By the generator proposed by Greene et al. [42], we generate three dynamic synthetic networks with 10 consecutive time steps for the three different types of events. Each dynamic network is constituted by 100 nodes, average node degree 10, maximum node degree 20, and mixing parameter 0.2. For birth and death events, 20% of communities are created by removing nodes from existing communities, and 10% of the existing communities are randomly removed. To generate expansion and contracion events, 10% of communities are randomly expanded and contracted by 25% of their size at each time step, and nodes are chosen at random. Among the networks embedded by merging and splitting events at each subsequent time step, 10% of communities are split, another 10% of communities are chosen at random, and couples of them are merged. Figure 4 depicts the NMIs obtained by compared algorithms on the three types of dynamic networks, which clearly shows that DGR-SNMF achieves the best performance in all three evolution events. In particular, it is worth to note that the performance of FacetNet decreases dramatically as time goes for the network with birth and death events, while the other two algorithms are stable. The reason for this is that the FacetNet algorithm is based on the PCM framework, in which the error in communities at the previous time step can be passed down to the detected communities at the next neighboring time steps. However, DGR-SNMF and DYNMOGA algorithms rely on the geometric structure and network topology at two consecutive time steps respectively, which avoids this limitation. In addition, we notice that the DYNMOGA algorithm is not always superior to FacetNet as can be seen in Fig. 4b. On the contrary, DGR-SNMF outperforms the other two algorithms at almost all time steps in all three types of events, except for the fifth time steps in merging and splitting events (as seen in Fig. 4c).
NMI on dynamic LFR synthetic datasets with switch events. (a) 
NMI on dynamic LFR synthetic datasets with (a) birth and death events, (b) expansion and contraction events, (c) merging and splitting events.
In this section, we introduce two other kinds of data sets, named SYN-FIX and SYN-VAR from [44], to verify the performance of our algorithm. SYN-FIX is a dynamic network with a fixed number of communities, while SYN-VAR is a dynamic network with a variable number of communities over times. SYN-FIX consists of 128 nodes divided into 4 communities where each of them contains 32 nodes. Every node has an average degree of 16 and shares
We run DGR-SNMF and FacetNet on the SYN-FIX and SYN-VAR, and compare the results with DYNMOGA as seen in Figs 5 and 6. It should be noted that the results reported for DYNMOGA are from [25], provided by the authors. Figure 5 demonstrates that when the community structure is clear (Fig. 5a), the NMIs of DGR-SNMF and DYNMOGA are comparable, up to 1.0 at almost all time steps. However, as the fuzziness increases, DGR-SNMF shows better performance than DYNMOGA (Fig. 5b). Although the performance of DYNMOGA is better than that of DGR-SNMF at fifth and sixth time steps, its NMI fluctuates dramatically, ranging from 0.9 to 1.0. However, the DGR-SNMF is relatively stable, maintaining an NMI value above 0.98. Overall, compared with DYNMOGA, our algorithm increases averagely 3.4% in terms of NMI on SYN-FIX networks when
NMI on SYN-FIX networks when (a) 
NMI on SYN-VAR networks when (a) 
In this section, a commonly used synthetic dataset, proposed by Lin et al. [23], is employed to investigate the effectiveness of our algorithm. This network consists of 128 nodes clustered into 4 communities of 32 nodes each. Each node shares
From the comparison results given in Fig. 7, it can be seen that our algorithm shows excellent performance, especially when the community structure is distinct, i.e.,
NMI on commonly used synthetic datasets when avgdegree 
RDyn,2
Figure 8 gives a detailed comparison of the performance of various algorithms for RDyn benchmark. We can observe that DGR-SNMF achieves the best performance in most cases followed by FacetNet. In order to see clearly the performance difference between DGR-SNMF and FacetNet, we analyze the distribution of the difference of NMI obtained by the two algorithms separately (Fig. 8b, d, f). Although the performance improvement of DGR-SNMF is not obvious, the proportion of positive values increases with the decrease of the community quality threshold
Comparison of the DCD algorithms on RDyn in terms of NMI ((a) 
In foregoing paragraphs, we analyze the effectiveness of DGR-SNMF on some synthetic dynamic networks, where the ground truth community is known. In this section, we will check whether our algorithm is effective on two real-world dynamic networks t namely Cell Phone Calls (CPC)3
We first apply DGR-SNMF on the CPC networks to detect dynamic communities. For comparison, besides DYNMOGA, we also employ Adaptive Label Propagation Algorithm (ALPA) [46] with excellent performance to verify the performance of our method. In addition, SNMF algorithm is also applied to the CPC networks to verify whether the geometric structure is effective in detecting dynamic communities. Since the true community structure is unknown, we employ modularity to measure the community quality. Table 1 reports the results of these algorithms and some statistical information regarding the CPC networks. As can be seen from Table 1, DGR-SNMF achieves optimal performance, which is represented by the larger modularity values, in all networks except the first one where there is no historical geometric structure to use. Therefore, our algorithm, incorporating geometric structure based on SNMF, performs better than SNMF, showing that the historical geometric structure is indeed effective for community detection in dynamic networks. In addition, it is worth noting that although ALPA performs well on some synthetic dynamic networks, DGR-SNMF is superior to ALPA on real networks.
The detail statistic information of CPC network
T is the number of time steps,
Although the true community structure is unknown, we can follow the same approach of Lin et al. [23] to generate the overall network by aggregating all edges over all ten time steps into a single network, and apply our algorithm to detect communities. After that, the communities obtained on the aggregated network can be considered as the ground truth partition. Then, we can compute the NMI between the ground truth partition and the communities obtained at each time step, as reported in Fig. 9a. It can be seen that the similarity between the communities obtained on the aggregated network and those calculated at each time step is around 0.5, which represents a balance between the snapshot and the temporal costs as described by [25]. However, different from DYNMOGA, the number of communities detected at each time step is close to that obtained on the overall network, which is consistent with intuition.
NMI for CPC network (a) and EM network (b).
Similarly, we perform the same analysis on the EM network as shown in Table 2 and Fig. 9b. As can be seen from Table 2, DGR-SNMF is still the best of the four algorithms, although it is slightly inferior to DYNMOGA at the fifth time step. In addition, we also observe that the number of community division on overall network is 7, which is close to the average number of communities for all 12 networks. As for the NMI between the ground truth division on the overall network and the communities detected for each network, we come to the same conclusion as the CPC network.
The detail statistic information of EM network
T is the number of time steps,
Dynamic community detection represents one of the most challenging problems with broad applications. In this paper, we present a graph regularized non-negative matrix factorization (DGR-SNMF) based approach to address the issue of DCD. In addition, we design an effective iterative algorithm based on multiplicative update rules, and provide the detailed proof process. Experimental results on multiple kinds of synthetic dynamic networks and two real-world datasets show that DGR-SNMF achieves better performance than other state-of-the-art algorithms, providing an insight into understanding the effect of historical geometric structure in detecting dynamic communities.
Even though our algorithm outperforms the other algorithms in terms of overall performance, the improvements on some networks are not significant. Therefore, it is worth further study about how to make best use of the historical geometric structure to detect dynamic communities. Furthermore, extending DGR-SNMF to detect overlapping communities in dynamic networks is also interesting and meaningful.
Footnotes
Acknowledgments
The authors wish to thank Francesco Folino for providing us the codes of DYNMOGA, including the generator of dynamic LFR benchmark and the results obtained by his method on these synthetic dynamic networks, Giulio Rossetti for the software generator of RDyn benchmark.
