USI-AUC: An evaluation criterion of community detection based on a novel link-prediction method

Abstract

Modularity Evaluation (ME) is commonly used in community detection to evaluate disjoint and overlapping communities. In this paper, two obvious defects of ME are revealed and proved: the non-decreasing contribution of isolated nodes to modularity, and the lack of an appropriate measure for overlapping communities. We also propose a new evaluation criterion, USI-AUC, which is the Area Under the Curve (AUC) of a link-prediction method based on the Uniform-Structure-Information (USI) model. We test the new criterion on various datasets and find that it avoids the issues exposed in ME.

Keywords

Evaluation of communities; modularity evaluation; USI model; link prediction; USI-AUC criterion

1. Introduction

Community structure is an important property of complex networks: many different types of networks in society and nature exhibit strong community effects [1, 2], especially in the current Big Data era [3]. Taking the social software WeChat as an example, individuals with common interests form multiple WeChat groups.

In recent years, research on community detection has been very active, and how to discover valuable community structures in a network remains a hot topic in the field [4]. Researchers have proposed numerous community detection algorithms, which fall mainly into two categories according to whether they allow overlap: disjoint-community detection and overlapping-community detection. For disjoint-community detection, modularity-optimization [5, 6], spectral-clustering [7, 8], hierarchical-clustering [9], and label-propagation [10, 11] methods have been proposed. Palla et al. [12] extended disjoint-community detection to allow nodes to belong to different communities, thereby opening up research on overlapping communities. Typical overlapping methods include clique percolation [13], line graph and link partitioning [14], local expansion and optimization [15], and fuzzy detection methods [16].

With the continuous development of community detection algorithms, the evaluation of detected communities has also become a research hotspot. Because the community structure of real networks is usually unknown, evaluation strategies from various angles are used. However, there is no generally accepted evaluation method for community detection. Existing criteria include Newman's typical modularity criterion [5] and the omega index for disjoint communities [17, 18]. For overlapping communities, Pizzuti [19] proposed an evaluation formula based on vertex and edge density; however, its value increases with the number of communities. Additionally, some studies extended evaluation metrics from disjoint to overlapping communities. Nicosia et al. [20] extended the modularity criterion to directed networks, and Nepusz et al. [16] and Shen et al. [21, 22] extended modularity to undirected networks, with widespread applications. Modularity and extended modularity are of great significance for studying community structure. However, as criteria for evaluating communities, ME (comprising the modularity and extended-modularity criteria) also exhibits theoretical defects, including the increasing contribution of isolated nodes to modularity and the lack of an appropriate measure for overlapping communities. Taking overlapping communities as an example, Fig. 1 shows the original division and an overlapping division of the Zachary dataset [23]. Clearly, it is more reasonable to treat nodes 3 and 9 as common nodes; however, the extended modularity EQ of the overlapping division is 0.351, which is lower than the modularity of the original division (0.371).

Figure 1.

Diagram of the original and overlapping communities. Nodes 3 and 9 are active in both the left and the right communities, so it is reasonable to treat them as common nodes of both communities. However, EQ does not provide a reasonable evaluation of this division.

Although this phenomenon was mentioned in Ref. [24], no theoretical analysis was given there. In this study, the theoretical defects of ME are analyzed in detail through two theorems, supported by theoretical proofs and data verification. Based on this analysis, we propose as an evaluation criterion the area under the curve (AUC) accuracy of a link-prediction method based on a Uniform-Structure-Information (USI) model, which we call USI-AUC. This criterion allows a unified and effective evaluation of disjoint and overlapping communities, and we find that it avoids the issues exposed in ME.

The remainder of the paper is organized as follows. In Section 2, we review related work on evaluating overlapping communities, especially the evaluation criteria that do not require known real communities, i.e., ME. In Section 3, we prove two theorems that establish the defects of ME. In Section 4, we present the USI-AUC evaluation criterion based on the link-prediction method. In Section 5, we provide experimental verification of the two theorems and a comparative analysis of USI-AUC and other criteria.

2. Related work

Unlike disjoint community detection, for which a number of measures have been proposed to compare identified partitions with known partitions, only a few measures are suitable for sets of overlapping communities [24]. These measures fall mainly into two categories: those based on labeled network data, and evaluation criteria that do not need labeled data.

2.1 Evaluation criterion based on labeled network

In the case of a labeled network, the accuracy criteria precision, recall, and F1-score can be used to evaluate a community detection algorithm. Precision is the number of correctly detected overlapping nodes $N_{c}$ divided by the total number of detected overlapping nodes $N_{a}$; recall is $N_{c}$ divided by the true number of overlapping nodes $N$; and the F1-score is the harmonic mean of precision and recall:

$\displaystyle\text{precision}=\frac{N_{c}}{N_{a}},$ (1)

$\displaystyle\text{recall}=\frac{N_{c}}{N},$ (2)

$\displaystyle\text{F1-score}=\frac{2\,\text{precision}\cdot\text{recall}}{\text{precision}+\text{recall}},$ (3)
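
As a minimal sketch of Eqs. (1)-(3), assuming the detected and true overlapping nodes are given as sets of node IDs, the three scores can be computed as follows. The function name `overlap_prf` is hypothetical, and returning 0 when a denominator is empty is our own convention rather than something the text specifies.

```python
def overlap_prf(detected, true):
    """Precision, recall and F1-score of detected overlapping nodes, Eqs. (1)-(3)."""
    detected, true = set(detected), set(true)
    n_c = len(detected & true)   # N_c: correctly detected overlapping nodes
    n_a = len(detected)          # N_a: all detected overlapping nodes
    n = len(true)                # N: true overlapping nodes
    precision = n_c / n_a if n_a else 0.0   # convention: 0 if nothing was detected
    recall = n_c / n if n else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)   # harmonic mean
    return precision, recall, f1


# Toy example: true overlapping nodes {3, 9}; a detector reports {3, 10}.
print(overlap_prf({3, 10}, {3, 9}))  # (0.5, 0.5, 0.5)
```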

Lancichinetti et al. [15] extended the notion of Normalized Mutual Information (NMI) [25], within the framework of information theory, to the evaluation of overlapping communities. The omega index [17] is the overlapping version of the Adjusted Rand Index; it is based on the pairs of nodes on which two community divisions agree. For labeled network datasets, NMI and omega are the most widely used measures.

2.2 Evaluation criterion without labeled network

In most community detection studies the true communities are unknown, so evaluation criteria based on modularity are usually adopted. In an unweighted, undirected network, modularity is defined as the fraction of edges that fall inside communities minus the expected value of this fraction when the network is replaced by a random network with the same node degrees and the same community assignment [5]. The underlying assumption of the modularity metric is that the higher the edge density inside communities, the better the community division. Assuming that the network has been divided into $l$ communities $P=\{C_{1},C_{2},\ldots,C_{l}\}$, the modularity $Q$ is defined as

$\displaystyle Q=\frac{1}{2m}\sum_{i=1}^{l}\sum_{v\in C_{i},w\in C_{i}}\left(A_{vw}-\frac{k_{v}k_{w}}{2m}\right),$ (4)

where $A_{vw}$ is the $(v,w)$ entry of the adjacency matrix $A$, $m$ is the number of edges in the network, $v$ and $w$ index the nodes, $k_{v}$ is the degree of node $v$, and $C_{i}$ denotes the $i$-th community.
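
For a disjoint division, the double sum in Eq. (4) can be collapsed community by community, which is convenient for hand calculation. Writing $L_i$ for the number of edges inside $C_i$ and $d_i$ for the total degree of its nodes (symbols introduced here only for this derivation), each internal edge is counted twice in the ordered double sum, so:

```latex
\sum_{v,w\in C_i}A_{vw}=2L_i,\qquad
\sum_{v,w\in C_i}k_v k_w=\Big(\sum_{v\in C_i}k_v\Big)^2=d_i^2
\;\Longrightarrow\;
Q=\frac{1}{2m}\sum_{i=1}^{l}\Big(2L_i-\frac{d_i^2}{2m}\Big)
 =\sum_{i=1}^{l}\Big[\frac{L_i}{m}-\Big(\frac{d_i}{2m}\Big)^2\Big].
```

For instance, a community with $L_i=5$ internal edges and total degree $d_i=12$ in a network with $m=11$ contributes $5/11-(12/22)^2\approx 0.1570$ to $Q$.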

Figure 2.

Example of modularity calculation.

As shown in Fig. 2, the network contains 8 nodes and 11 edges, and its adjacency matrix is

$\displaystyle A=\left[\begin{matrix}0&1&1&1&0&0&0&0\\ 1&0&0&1&0&0&0&0\\ 1&0&0&1&0&0&0&0\\ 1&1&1&0&1&1&0&0\\ 0&0&0&1&0&1&1&1\\ 0&0&0&1&1&0&0&0\\ 0&0&0&0&1&0&0&1\\ 0&0&0&0&1&0&1&0\end{matrix}\right],$ (5)

The degree of node $v_{1}$ is $k_{1}=\sum_{w=1}^{8}A_{1w}=3$. Similarly, $k_{2}=2$, $k_{3}=2$, $k_{4}=5$, $k_{5}=4$, $k_{6}=2$, $k_{7}=2$, and $k_{8}=2$. If the network is divided into two communities, $C_{1}=\{v_{1},v_{2},v_{3},v_{4}\}$ and $C_{2}=\{v_{5},v_{6},v_{7},v_{8}\}$, the modularity $Q$ given by Eq. (4) is

$\displaystyle Q=\frac{1}{2\times 11}\left\{\left[\left(3-\frac{3\times(3+2+2+5)}{22}\right)+\left(2-\frac{2\times(3+2+2+5)}{22}\right)+\left(2-\frac{2\times(3+2+2+5)}{22}\right)+\left(3-\frac{5\times(3+2+2+5)}{22}\right)\right]+\left[\left(3-\frac{4\times(4+2+2+2)}{22}\right)+\left(1-\frac{2\times(4+2+2+2)}{22}\right)+\left(2-\frac{2\times(4+2+2+2)}{22}\right)+\left(2-\frac{2\times(4+2+2+2)}{22}\right)\right]\right\}=0.3140$ (6)
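
The hand calculation in Eq. (6) can be checked with a direct transcription of Eq. (4). This is a minimal sketch in plain Python; the function name `modularity` and the 0-based node indexing (index 0 is $v_1$) are our own conventions.

```python
# Adjacency matrix of the network in Fig. 2, Eq. (5); index 0 corresponds to node v1.
A = [
    [0, 1, 1, 1, 0, 0, 0, 0],
    [1, 0, 0, 1, 0, 0, 0, 0],
    [1, 0, 0, 1, 0, 0, 0, 0],
    [1, 1, 1, 0, 1, 1, 0, 0],
    [0, 0, 0, 1, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 1, 0, 0, 1],
    [0, 0, 0, 0, 1, 0, 1, 0],
]


def modularity(A, communities):
    """Modularity Q of Eq. (4) for a disjoint division.

    `communities` is a list of lists of 0-based node indices.
    """
    k = [sum(row) for row in A]   # degrees k_v
    two_m = sum(k)                # 2m
    q = sum(A[v][w] - k[v] * k[w] / two_m
            for c in communities for v in c for w in c)
    return q / two_m


# Division used in Eq. (6): C1 = {v1, v2, v3, v4}, C2 = {v5, v6, v7, v8}.
print(round(modularity(A, [[0, 1, 2, 3], [4, 5, 6, 7]]), 4))  # 0.314
```

Passing `[[0, 1, 2], [3, 4, 5, 6, 7]]` instead (node $v_4$ placed in the second community) gives about 0.1612.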

For overlapping-community detection, Nepusz et al. [16] extended the modularity $Q$ to $Q_{f}$ by weighting the association of each node with each community:

$\displaystyle Q_{f}=\frac{1}{2m}\sum_{i=1}^{l}\sum_{v\in C_{i},w\in C_{i}}\alpha_{iv}\alpha_{iw}\left(A_{vw}-\frac{k_{v}k_{w}}{2m}\right),$ (7)

where

$\displaystyle\alpha_{iv}\in[0,1]\ \text{for all}\ 1\leqslant i\leqslant l,\ 1\leqslant v\leqslant N;\qquad\sum_{i=1}^{l}\alpha_{iv}=1\ \text{for all}\ 1\leqslant v\leqslant N;\qquad 0<\sum_{v=1}^{N}\alpha_{iv}<N\ \text{for all}\ 1\leqslant i\leqslant l.$

Here $N$ is the total number of nodes in the network, and the weighted coefficient $\alpha_{iv}$ measures the strength of association between node $v$ and community $i$. When every node belongs to exactly one community with coefficient 1 (i.e., the division is disjoint), the extended modularity $Q_{f}$ degenerates into the modularity $Q$.

To simplify the computation of the weighted coefficients, Shen et al. [21] proposed a specific extension of the modularity $Q$:

$\displaystyle EQ=\frac{1}{2m}\sum_{i=1}^{l}\sum_{v\in C_{i},w\in C_{i}}\frac{1}{O_{v}O_{w}}\left(A_{vw}-\frac{k_{v}k_{w}}{2m}\right),$ (8)

where $O_{v}$ is the number of communities to which node $v$ belongs. EQ is therefore the special case of $Q_{f}$ with $\alpha_{iv}=1/O_{v}$.
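
Eqs. (7) and (8) can be sketched in the same plain-Python style. In this sketch, `alpha[i]` is a dictionary mapping each node of community $i$ to its coefficient; this layout and the function names `extended_modularity` and `eq_modularity` are our own assumptions. For a disjoint division every $O_v=1$, so EQ reduces to $Q$.

```python
# Adjacency matrix of the network in Fig. 2, Eq. (5); index 0 corresponds to node v1.
A = [
    [0, 1, 1, 1, 0, 0, 0, 0],
    [1, 0, 0, 1, 0, 0, 0, 0],
    [1, 0, 0, 1, 0, 0, 0, 0],
    [1, 1, 1, 0, 1, 1, 0, 0],
    [0, 0, 0, 1, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 1, 0, 0, 1],
    [0, 0, 0, 0, 1, 0, 1, 0],
]


def extended_modularity(A, communities, alpha):
    """Q_f of Eq. (7).

    communities: list of lists of 0-based node indices (they may overlap).
    alpha: alpha[i][v] is the belonging coefficient of node v to community i.
    """
    k = [sum(row) for row in A]
    two_m = sum(k)
    q = sum(alpha[i][v] * alpha[i][w] * (A[v][w] - k[v] * k[w] / two_m)
            for i, c in enumerate(communities) for v in c for w in c)
    return q / two_m


def eq_modularity(A, communities):
    """EQ of Eq. (8), i.e. Q_f with alpha_iv = 1 / O_v."""
    O = [sum(v in c for c in communities) for v in range(len(A))]   # O_v
    alpha = [{v: 1.0 / O[v] for v in c} for c in communities]
    return extended_modularity(A, communities, alpha)


# Disjoint division of Fig. 2: every O_v = 1, so EQ equals Q.
print(round(eq_modularity(A, [[0, 1, 2, 3], [4, 5, 6, 7]]), 4))  # 0.314
```

An overlapping division, such as one in which node $v_4$ (index 3) is shared, is passed simply as `[[0, 1, 2, 3], [3, 4, 5, 6, 7]]`.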

3. The theoretical defects of ME

Based on the definition of modularity, $Q$ , and extended modularity, $Q_{f}$ or EQ, we propose the following definition:

Definition 1.

The contribution of a single node, $v_{0}$ , to the extended modularity, $Q_{f}$ , in an arbitrary community, $C_{i}$ , is defined as

$\displaystyle\Delta_{v_{0}}^{C_{i}}=\frac{1}{2m}\sum_{w\in C_{i}}\alpha_{iv_{0}}\alpha_{iw}\left(A_{v_{0}w}-\frac{k_{v_{0}}k_{w}}{2m}\right){\rm I}_{C_{i}}(v_{0}),$ (9)

where

$\displaystyle{\rm I}_{C_{i}}(x)=\begin{cases}1, & x\in C_{i},\\ 0, & x\notin C_{i}.\end{cases}$ (10)

It follows directly from Definition 1 that $Q_{f}=\sum_{i}\sum_{j}\Delta_{v_{j}}^{C_{i}}$, where $j$ runs over all nodes of the network.

Figure 3.

Example of calculating the contribution of a single node to the extended modularity.

Taking Fig. 3 as an example, the contribution of node $v_{4}$ to the extended modularity in community $C_{1}$ is

$\displaystyle\Delta_{v_{4}}^{C_{1}}=\frac{1}{2\times 11}\left[\alpha_{14}\alpha_{11}\left(A_{41}-\frac{k_{4}k_{1}}{22}\right)+\alpha_{14}\alpha_{12}\left(A_{42}-\frac{k_{4}k_{2}}{22}\right)+\alpha_{14}\alpha_{13}\left(A_{43}-\frac{k_{4}k_{3}}{22}\right)+\alpha_{14}\alpha_{14}\left(A_{44}-\frac{k_{4}k_{4}}{22}\right)\right]=\frac{1}{22}\left(\frac{7}{22}\alpha_{14}\alpha_{11}+\frac{12}{22}\alpha_{14}\alpha_{12}+\frac{12}{22}\alpha_{14}\alpha_{13}-\frac{25}{22}\alpha_{14}\alpha_{14}\right)$ (11)

If the weight coefficients are given in accordance with EQ, i.e., $\alpha_{14}=\frac{1}{2}$ and $\alpha_{11}=\alpha_{12}=\alpha_{13}=1$, then $\Delta_{v_{4}}^{C_{1}}=0.0191$. Similarly, the contribution of node $v_{4}$ in community $C_{2}$ is $\Delta_{v_{4}}^{C_{2}}=-0.0191$ (computed independently of $\Delta_{v_{4}}^{C_{1}}$), whereas the contribution of node $v_{1}$ in community $C_{2}$ is 0, because $v_{1}$ does not belong to $C_{2}$, i.e., ${\rm I}_{C_{2}}(v_{1})=0$.
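
The single-node contribution of Eq. (9) can be checked numerically with a short sketch. We assume, consistent with Eq. (11) and the values above, that Fig. 3 uses the network of Fig. 2 with $C_1=\{v_1,v_2,v_3,v_4\}$ and $C_2=\{v_4,\ldots,v_8\}$. The function name `contribution` and the dictionary layout of `alpha` are hypothetical.

```python
# Adjacency matrix of the network in Fig. 2, Eq. (5); index 0 corresponds to node v1.
A = [
    [0, 1, 1, 1, 0, 0, 0, 0],
    [1, 0, 0, 1, 0, 0, 0, 0],
    [1, 0, 0, 1, 0, 0, 0, 0],
    [1, 1, 1, 0, 1, 1, 0, 0],
    [0, 0, 0, 1, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 1, 0, 0, 1],
    [0, 0, 0, 0, 1, 0, 1, 0],
]


def contribution(A, communities, alpha, v0, i):
    """Contribution Delta_{v0}^{C_i} of node v0 to Q_f in community C_i, Eq. (9).

    The indicator I_{C_i}(v0) of Eq. (10) makes it zero when v0 is not in C_i.
    """
    c = communities[i]
    if v0 not in c:
        return 0.0
    k = [sum(row) for row in A]
    two_m = sum(k)
    return sum(alpha[i][v0] * alpha[i][w] * (A[v0][w] - k[v0] * k[w] / two_m)
               for w in c) / two_m


# Overlapping division assumed for Fig. 3: v4 (index 3) is shared by C1 and C2.
comms = [[0, 1, 2, 3], [3, 4, 5, 6, 7]]
# EQ weights: alpha = 1/2 for the common node v4 and 1 for all other nodes.
alpha = [{0: 1.0, 1: 1.0, 2: 1.0, 3: 0.5}, {3: 0.5, 4: 1.0, 5: 1.0, 6: 1.0, 7: 1.0}]
print(round(contribution(A, comms, alpha, 3, 0), 4))   # 0.0191
print(round(contribution(A, comms, alpha, 3, 1), 4))   # -0.0191
print(contribution(A, comms, alpha, 0, 1))             # 0.0 (v1 is not in C2)
```

Summing `contribution` over all communities and all nodes reproduces $Q_f$, as stated after Definition 1.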

On the basis of Definition 1, Newman's definition of modularity, and the extended-modularity definitions of Nepusz et al. [16] and Shen et al. [21], we obtain the following two theorems.

Theorem 1.

For an arbitrary single node, there must be a disjoint community structure in which this node is a non-negative contributor to the whole modularity, so that deleting the node from this community structure does not increase modularity. In particular, when the node is isolated, deleting it does not increase modularity.

Proof.

Assume that the network has been divided into $l$ communities $P=\{C_{1},C_{2},\ldots,C_{l}\}$, and let $k_{v_{0}}$ be the degree of an arbitrary node $v_{0}$, which belongs to the $i$-th community. The contribution of $v_{0}$ to the whole modularity in community $C_{i}$ is

$\displaystyle\Delta_{v_{0}}^{C_{i}}=\frac{1}{2m}\sum_{w\in C_{i}}\left(A_{v_{0}w}-\frac{k_{v_{0}}k_{w}}{2m}\right),$ (12)

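Since $\sum_{w}A_{v_{0}w}=k_{v_{0}}$, $\sum_{w}k_{w}=2m$, and the communities of a disjoint division together contain every node, we have

$\displaystyle\sum_{i=1}^{l}\sum_{w\in C_{i}}\left(A_{v_{0}w}-\frac{k_{v_{0}}k_{w}}{2m}\right)=k_{v_{0}}-\frac{k_{v_{0}}\cdot 2m}{2m}=0.$ (13)

Because these $l$ sums add up to zero, they cannot all be negative. Therefore, there must be a disjoint community structure in which node $v_{0}$ is a non-negative contributor to the whole modularity.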

If $\sum_{w\in C_{i}}(A_{v_{0}w}-\frac{k_{v_{0}}k_{w}}{2m})=0$ for all $i=1,2,\ldots,l$, then the modularity remains constant when node $v_{0}$ is deleted; otherwise, there must be some $i$ with $\sum_{w\in C_{i}}(A_{v_{0}w}-\frac{k_{v_{0}}k_{w}}{2m})>0$, and deleting node $v_{0}$ from that community decreases the modularity. In summary, modularity does not increase when arbitrary nodes are deleted from this community structure. ∎

This theorem indicates that modularity is sensitive to isolated nodes: even if the isolated nodes are divided reasonably, the modularity criterion may still decrease, so it cannot reflect whether the division of isolated nodes is reasonable.

Lemma 1.

If moving a node to a different community does not change the weighted coefficients of other nodes, the influence of the move on the whole modularity depends only on the change in the contribution of the moved node, and the accompanying change in the contributions of the other nodes equals that of the moved node.

Proof.

When the community membership of an arbitrary node $v_{0}$ changes, the change in the contribution of an arbitrary community $C_{i}$ to the modularity is

$\displaystyle\begin{aligned}&\sum_{w\in C_{i}^{new}}\Delta_{w}^{C_{i}^{new}}-\sum_{w\in C_{i}^{old}}\Delta_{w}^{C_{i}^{old}}\\&=\Delta_{v_{0}}^{C_{i}^{new}}-\Delta_{v_{0}}^{C_{i}^{old}}+\sum_{w\in C_{i}^{new}\setminus\{v_{0}\}}\Delta_{w}^{C_{i}^{new}}-\sum_{w\in C_{i}^{old}\setminus\{v_{0}\}}\Delta_{w}^{C_{i}^{old}}\\&=\frac{1}{2m}\sum_{w\in C_{i}^{new}\setminus\{v_{0}\}}\alpha_{iv_{0}}^{new}\alpha_{iw}^{new}\left(A_{v_{0}w}-\frac{k_{v_{0}}k_{w}}{2m}\right){\rm I}_{C_{i}^{new}}(v_{0})\\&\quad-\frac{1}{2m}\sum_{w\in C_{i}^{old}\setminus\{v_{0}\}}\alpha_{iv_{0}}^{old}\alpha_{iw}^{old}\left(A_{v_{0}w}-\frac{k_{v_{0}}k_{w}}{2m}\right){\rm I}_{C_{i}^{old}}(v_{0})\\&\quad+\frac{1}{2m}\sum_{v\in C_{i}^{new}\setminus\{v_{0}\}}\sum_{w\in C_{i}^{new}\setminus\{v\}}\alpha_{iv}^{new}\alpha_{iw}^{new}\left(A_{vw}-\frac{k_{v}k_{w}}{2m}\right){\rm I}_{C_{i}^{new}}(v)\\&\quad-\frac{1}{2m}\sum_{v\in C_{i}^{old}\setminus\{v_{0}\}}\sum_{w\in C_{i}^{old}\setminus\{v\}}\alpha_{iv}^{old}\alpha_{iw}^{old}\left(A_{vw}-\frac{k_{v}k_{w}}{2m}\right){\rm I}_{C_{i}^{old}}(v)\\&=2\left(\Delta_{v_{0}}^{C_{i}^{new}}-\Delta_{v_{0}}^{C_{i}^{old}}\right),\end{aligned}$ (14)

where $\Delta_{w}^{C_{i}^{new}}$ denotes the contribution of node $w$ to $Q_{f}$ in community $C_{i}$ after node $v_{0}$ is moved, and $\Delta_{w}^{C_{i}^{old}}$ denotes the contribution of node $w$ to $Q_{f}$ in community $C_{i}$ before the move. The derivation shows that the change in the modularity of community $C_{i}$ depends only on the change in the contribution of node $v_{0}$.

Because Eq. (14) holds for an arbitrary community $C_{i}$, the influence of node $v_{0}$ on each of the other involved communities depends only on the change in its contribution to those communities; hence the influence of node $v_{0}$ on the whole modularity depends only on the change in the contribution of node $v_{0}$. ∎

Lemma 1 shows that, when investigating the impact of moving a node to a different community without changing the coefficients of the other nodes, it is only necessary to consider the change in the contribution of the moved node, without regard to the accompanying changes in the contributions of the other nodes.

For example, in Fig. 3, when the common node $v_{4}$ is moved to community $C_{1}$ or to community $C_{2}$, Definition 1 implies that not only does the contribution of node $v_{4}$ to the modularity change, but so do the contributions of nodes $v_{1},v_{2},v_{3},v_{5},v_{6},v_{7},v_{8}$ in the communities that $v_{4}$ leaves or joins. The significance of this lemma is that we only need to examine the change in the contribution of node $v_{4}$ itself, without regard to the contributions of the other nodes; this also facilitates the proof of Theorem 2. In this example, the influence on the whole modularity of moving the common node $v_{4}$ to community $C_{1}$ is $2(\Delta_{v_{4}}^{C_{1}^{new}}-\Delta_{v_{4}}^{C_{1}^{old}})+2(\Delta_{v_{4}}^{C_{2}^{new}}-\Delta_{v_{4}}^{C_{2}^{old}})=2(0.0382-0.0191)+2(0-(-0.0191))=0.0764$, and the influence of moving it to community $C_{2}$ is $2(\Delta_{v_{4}}^{C_{1}^{new}}-\Delta_{v_{4}}^{C_{1}^{old}})+2(\Delta_{v_{4}}^{C_{2}^{new}}-\Delta_{v_{4}}^{C_{2}^{old}})=2(0-0.0191)+2(-0.0382-(-0.0191))=-0.0764$.

Theorem 2.

For any given overlapping community structure, there must be a disjoint community structure whose modularity is no less than that of the overlapping one.

Proof.

For any given overlapping community structure, assume that the nodes $v_{1},v_{2},\ldots,v_{p}$ serve as common nodes and belong to $q_{v_{1}},q_{v_{2}},\ldots,q_{v_{p}}$ overlapping communities, respectively. Then the contribution of the common node $v_{j}$ to the whole modularity is

$\displaystyle\Delta_{v_{j}}=\frac{1}{2m}\sum_{i=1}^{q_{v_{j}}}\sum_{w\in C_{v_{j}}^{i}}\alpha_{iv_{j}}\alpha_{iw}\left(A_{v_{j}w}-\frac{k_{v_{j}}k_{w}}{2m}\right)$ (15)

where $C_{v_{j}}^{i}$ denotes the $i$-th community containing node $v_{j}$. The contribution of node $v_{j}$ to the whole modularity in community $C_{v_{j}}^{i}$ is

$\displaystyle\Delta_{v_{j}}^{C_{v_{j}}^{i}}=\frac{1}{2m}\sum_{w\in C_{v_{j}}^{i}}\alpha_{iv_{j}}\alpha_{iw}\left(A_{v_{j}w}-\frac{k_{v_{j}}k_{w}}{2m}\right)$ (16)

Now, inspect the common nodes one by one.

(i)

When moving one node (e.g., node $v_{1}$) does not change the weighted coefficients of other nodes, by Lemma 1 we need only consider the change in the contribution of node $v_{1}$. Among the $q_{v_{1}}$ overlapping communities to which $v_{1}$ belongs, there must be one in which $v_{1}$ contributes most to the modularity. Without loss of generality, assume that this is the first overlapping community $C_{v_{1}}^{1}$, with contribution $\Delta_{v_{1}}^{C_{v_{1}}^{1}}$. According to Eq. (16), turning $v_{1}$ into a non-common node that belongs only to $C_{v_{1}}^{1}$ does not reduce the modularity. Therefore, after moving $v_{1}$, the modularity of the overlapping division with $(p-1)$ common nodes is $\geqslant$ that of the division with $p$ common nodes, i.e., the whole modularity is not reduced. Repeating this argument for the remaining $(p-1)$ common nodes converts the overlapping division into a disjoint one without reducing the whole modularity.

(ii)

When moving one node (e.g., node $v_{1}$) changes the weighted coefficients of other nodes, consider the common nodes as a group, separately from the rest of the network. The influence on modularity of moving these common nodes together is equivalent to the influence of moving them one at a time, with each move assumed not to change the weighted coefficients of the other nodes. From (i), the modularity is not reduced when these nodes are turned into non-common nodes, which yields a disjoint community structure.

∎

Theorem 2 shows that, from the perspective of modularity, there is no reasonable overlapping community structure: given any "reasonable" overlapping division, adjusting its common nodes yields a disjoint division that modularity rates at least as highly. Therefore, even if an overlapping division is reasonable, it is difficult to evaluate it rationally using $Q_{f}$ and EQ.

According to this theorem, when the common node $v_{4}$ in Fig. 3 is moved to community $C_{1}$ or to community $C_{2}$, at least one of the two moves must yield a modularity no less than that of the overlapping division. The communities obtained after moving node $v_{4}$ to $C_{1}$ and to $C_{2}$ are shown in Fig. 4. If the weight coefficients are given in accordance with EQ, the modularity with node $v_{4}$ as a common node is 0.2376, and it changes to $0.2376+0.0764=0.3140$ and $0.2376-0.0764=0.1612$ when $v_{4}$ is moved to $C_{1}$ and $C_{2}$, respectively.

Figure 4.

An example illustrating Theorem 2.

To address the theoretical defects of modularity evaluation, we present an evaluation criterion using the accuracy of a novel link prediction method.

4. USI-AUC-evaluation criterion for community detection

There is a strong positive correlation between the quality of the community division produced by a community detection algorithm and the accuracy of link prediction. If link prediction depends only on community structure information, the more reasonable the community structure, the more accurate the link prediction; conversely, the more accurate the link prediction, the more reasonable the community structure. Based on this analysis, we perform link prediction using only community information and take the resulting AUC as the evaluation criterion for community detection.

Existing methods for link prediction using community information fall into two types: restricting the computation of similarity indices from the whole network to the interior of specific communities [26, 27], and treating community information as a kind of attribute information used to weight the similarity index [28]. Both types treat community information as supplementary information for improving the accuracy of link prediction. However, a link-prediction method used to evaluate community detection should take only community information into consideration, without any other information. Therefore, neither method is applicable to evaluating community detection. This paper presents an evaluation criterion based on link prediction with the USI model, which uses only community information and takes the AUC accuracy as the measure of community detection quality (USI-AUC). The criterion can be applied to disjoint, overlapping, and hierarchical communities, among other structures.

First, we define the USI model; we then present the link-prediction algorithm based on the USI model, followed by the USI-AUC evaluation metric. This process is illustrated in Fig. 5.

Figure 5.

Diagram of the USI-AUC-evaluation criterion for community detection.

4.1 The definition of Uniform-Structure-Information model

Definition 2.

USI model.

(i)
$A_{0}$ represents the set of all vertices in the network. Define the iterated power sets as:

$\displaystyle A_{1}=2^{A_{0}},\ A_{2}=2^{A_{1}},\ A_{3}=2^{A_{2}},\ \ldots,\ A_{i}=2^{A_{i-1}},\quad i=1,2,3,\ldots$

For any $i$, the elements that satisfy a specific relationship $f$ form a family of sets $\mathfrak{D}=\{D_{1},D_{2},\ldots,D_{n}\}$:

$\displaystyle f:(A_{i})^{k}\rightarrow\mathfrak{D},\quad\mathfrak{D}\subset(A_{i})^{k},\quad k=1,2,\ldots,|A_{i}|,$ (17)

where

$\displaystyle(A_{i})^{k}=\left\{\{A_{i1},A_{i2},\ldots,A_{ik}\}:A_{ij}\in A_{i},\ 1\leqslant j\leqslant k\right\}.$

Specifically, when $i=0$,

$\displaystyle f:(A_{0})^{k}\rightarrow\mathfrak{D}_{0},\quad\mathfrak{D}_{0}\subset(A_{0})^{k}.$
(ii)
Define a discrete metric space on the elements of each set $D_{i}$ of the family $\mathfrak{D}$:

$\displaystyle d_{i}(x,y)=\begin{cases}p_{i}\ (p_{i}\leqslant 1), & x\neq y\ \text{and}\ x,y\in D_{i},\\ 0, & x=y\ \text{and}\ x,y\in D_{i}.\end{cases}$ (18)

The USI model provides a concise description of hierarchical and overlapping relationships among nodes in a network and can explain various relationships between how nodes are connected and how they are organized. Definition 2(i) states that nodes or sets can form a new set whose elements share a specific relationship or property. In Definition 2(ii), the metric $p$ of the discrete metric space can be interpreted as the probability that two elements of a set come into contact with each other; this probability is identical for all pairs of elements within a given set.

According to the definition of the USI model, the elements of a set may be either sets or non-sets. When the elements are non-sets, the metric $p$ directly expresses the contact probability between nodes; when the elements are sets, $p$ expresses the contact probability between sets. For example, all the students in a class form a set of non-sets, and $p$ describes the contact probability of any two students in the class, which is the same for every pair of students. As another example, all the classes in a school form a set of sets; here $p$ describes the contact probability between any two classes, and thus indirectly determines the contact probability between two students in different classes.

Definition 3.

The order of elements.

In the USI model, the elements of $A_{k}$ and of its nonempty subsets are called k-order elements $(k=0,1,2,3,\ldots)$. Elements of order $k\geqslant 2$ are called high-order elements.

Definition 4.

The order of sets.

In the USI model, the order of the elements contained in a set is referred to as the order of the set.

For example, consider a network with 3 nodes, $A_{0}=\{1,2,3\}$. Obviously, the elements 1, 2, and 3 are all 0-order elements according to Definition 3, and $A_{0}$ is a 0-order set according to Definition 4. Because $\{1,2\}\in 2^{A_{0}}=\{\varnothing,\{1\},\{2\},\{3\},\{1,2\},\{1,3\},\{2,3\},\{1,2,3\}\}=A_{1}$, the element $\{1,2\}$ is a 1-order element. Naturally, $\{\{1,2\}\}$ is a 1-order set by Definition 4, because its element $\{1,2\}$ is a 1-order element. As another example, $\Lambda_{1}=\{\{1,2\},\{1,3\},\{1,2,3\}\}$ is a nonempty subset of $A_{1}$, $\Lambda_{1}\subseteq A_{1}$, so each element of $\Lambda_{1}$, e.g., $\{1,2,3\}$, is a 1-order element, and $\Lambda_{1}$ is a 1-order set. Similarly, $\{\{1,3\},\{1,2,3\}\}\in 2^{\Lambda_{1}}\subseteq 2^{A_{1}}=A_{2}$; therefore, the element $\{\{1,3\},\{1,2,3\}\}$ is a 2-order element.
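
The orders in this example can be checked mechanically by representing nested sets as Python `frozenset`s. In this sketch, an element's order is its nesting depth and a set's order is the order of its elements. The helper names `power_set`, `element_order`, and `set_order` are hypothetical, and the code assumes that all members of a set share the same order (as in every example of this section); `max` is used only as a guard.

```python
from itertools import chain, combinations


def power_set(s):
    """2^s as a frozenset of frozensets (the empty set included)."""
    items = list(s)
    return frozenset(frozenset(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)))


def element_order(x):
    """Order of an element (Definition 3): a node is 0-order, and a set of
    k-order elements is a (k+1)-order element, since A_{k+1} = 2^{A_k}."""
    if not isinstance(x, frozenset):
        return 0
    return 1 + max((element_order(e) for e in x), default=0)


def set_order(s):
    """Order of a set (Definition 4): the order of the elements it contains."""
    return max(element_order(e) for e in s)


A0 = frozenset({1, 2, 3})
A1 = power_set(A0)   # A_1 = 2^{A_0}, 8 elements
lam1 = frozenset({frozenset({1, 2}), frozenset({1, 3}), frozenset({1, 2, 3})})
two = frozenset({frozenset({1, 3}), frozenset({1, 2, 3})})
print(len(A1), set_order(A0), element_order(frozenset({1, 2})),
      set_order(lam1), element_order(two))   # 8 0 1 1 2
```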

The USI model is a generalization of the hierarchical-structure model [29] and the stochastic block model [30, 31]. If and only if each set contains two elements and each element belongs only to the set of the corresponding order, the USI model degenerates to a hierarchical-structure model that may contain high-order elements. When the highest set order is one and the intersection of all elements is the empty set, the USI model degenerates to a stochastic block model.

When the specific relationship $f$ is given by the division produced by a community detection algorithm, so that the detected communities form the sets of the USI model, the AUC of link prediction based on the USI model is used as the evaluation criterion for that algorithm. Because the USI model does not require each 0-order element to belong to exactly one set, this criterion can evaluate both disjoint and overlapping communities.

Proposition 1.

A $k(k\geqslant 1)$-order set $\mathfrak{X}^{k}=\{X^{k}_{1},X^{k}_{2},\ldots,X^{k}_{n}\}$ can be reduced to a $(k-1)$-order set by taking the union of its elements:

$\displaystyle g:\mathfrak{X}^{k}\rightarrow\bigcup_{i=1}^{n}X^{k}_{i}.$ (19)

Clearly, $\bigcup_{i=1}^{n}X^{k}_{i}$ is a $(k-1)$-order set, denoted by $\mathfrak{X}^{k-1}$.

Corollary 1.

A $k(k\geqslant 2)$-order set $\mathfrak{X}^{k}=\{X^{k}_{1},X^{k}_{2},\ldots,X^{k}_{n}\}$ can be reduced to a 0-order set by performing $k$ iterations of the element-union operation $g$.

Proof.

According to Proposition 1, a $k$-order set can be reduced to a $(k-1)$-order set by the element-union operation $g$. The $(k-1)$-order set can in turn be reduced to a $(k-2)$-order set, and so on, down to a 0-order set. ∎

For example, consider the 3-order set

$\displaystyle\mathfrak{X}^{3}=\left\{X_{1}^{3},X_{2}^{3},X_{3}^{3}\right\}=\left\{\left\{\left\{\{1,2,3\},\{6\}\right\},\left\{\{4,5\}\right\}\right\},\left\{\left\{\{6\},\{7,8,9\}\right\}\right\},\left\{\left\{\{6,9\},\{9\}\right\}\right\}\right\}.$ (20)

Evidently, $X_{1}^{3}=\left\{\left\{\{1,2,3\},\{6\}\right\},\left\{\{4,5\}\right\}\right\}$, $X_{2}^{3}=\left\{\left\{\{6\},\{7,8,9\}\right\}\right\}$, and $X_{3}^{3}=\left\{\left\{\{6,9\},\{9\}\right\}\right\}$ are 3-order elements. According to Corollary 1, the 3-order set is first reduced to a 2-order set,

$\displaystyle\mathfrak{X}^{2}=\bigcup_{i=1}^{3}X_{i}^{3}=\left\{\left\{\{1,2,3\},\{6\}\right\},\left\{\{4,5\}\right\}\right\}\cup\left\{\left\{\{6\},\{7,8,9\}\right\}\right\}\cup\left\{\left\{\{6,9\},\{9\}\right\}\right\}=\left\{\left\{\{1,2,3\},\{6\}\right\},\left\{\{4,5\}\right\},\left\{\{6\},\{7,8,9\}\right\},\left\{\{6,9\},\{9\}\right\}\right\},$ (21)

the 2-order set is reduced to a 1-order set,

$\displaystyle\mathfrak{X}^{1}=\bigcup_{i=1}^{4}X_{i}^{2}=\left\{\{1,2,3\},\{6\}\right\}\cup\left\{\{4,5\}\right\}\cup\left\{\{6\},\{7,8,9\}\right\}\cup\left\{\{6,9\},\{9\}\right\}=\left\{\{1,2,3\},\{6\},\{4,5\},\{7,8,9\},\{6,9\},\{9\}\right\},$ (22)

and the 1-order set is reduced to a 0-order set,

$\displaystyle\mathfrak{X}^{0}=\bigcup_{i=1}^{6}X_{i}^{1}=\{1,2,3\}\cup\{6\}\cup\{4,5\}\cup\{7,8,9\}\cup\{6,9\}\cup\{9\}=\{1,2,3,4,5,6,7,8,9\}.$ (23)
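
The element-union operation $g$ of Eq. (19) and the chain of reductions in Eqs. (21) to (23) can be reproduced with nested `frozenset`s, where `frozenset().union(*X)` takes the union of all elements of `X`. This is a minimal sketch; the names `g` (after Eq. (19)) and the shorthand `fs` are our own.

```python
def g(X):
    """Element-union operation of Eq. (19): reduce the order of a set by one."""
    return frozenset().union(*X)


def fs(*items):
    """Hypothetical shorthand for writing nested frozensets."""
    return frozenset(items)


# The 3-order set of Eq. (20).
X3 = fs(
    fs(fs(fs(1, 2, 3), fs(6)), fs(fs(4, 5))),   # X_1^3
    fs(fs(fs(6), fs(7, 8, 9))),                 # X_2^3
    fs(fs(fs(6, 9), fs(9))),                    # X_3^3
)
X2 = g(X3)   # Eq. (21): a 2-order set with 4 elements
X1 = g(X2)   # Eq. (22): a 1-order set; the duplicate {6} collapses
X0 = g(X1)   # Eq. (23): all nodes
print(len(X2), len(X1), sorted(X0))   # 4 6 [1, 2, 3, 4, 5, 6, 7, 8, 9]
```

Because Python sets discard duplicates, the element $\{6\}$, which appears in two 2-order elements, occurs only once in $\mathfrak{X}^{1}$. This is why the last union in Eq. (23) runs over six elements.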

Corollary 2.

A $k(k\geqslant 2)$-order set $\mathfrak{X}^{k}=\{X^{k}_{1},X^{k}_{2},\ldots,X^{k}_{n}\}$, whose elements can each be regarded as a $(k-1)$-order set, can be reduced to a 1-order set by applying $(k-1)$ iterations of the element-union operation $g$ to each of the $(k-1)$-order sets $X^{k}_{1},\ldots,X^{k}_{n}$.

Proof.

Regarding each element of the $k$-order set as a $(k-1)$-order set, by Corollary 1 each of these can be reduced to a 0-order set by $(k-1)$ iterations of the element-union operation $g$. The original $k$-order set thereby becomes a 1-order set whose elements are these 0-order sets. ∎

According to Corollary 2, for ${\mathfrak{X}}^{3}=\left\{{X}_{1}^{3},{X}_{2}^{3},{X}_{3}^{3}\right\}$, the 3-order elements ${X}_{1}^{3},{X}_{2}^{3},{X}_{3}^{3}$ are reduced to 2-order elements,

$\displaystyle{X}_{1}^{2}=\left\{\left\{1,2,3\right\},\left\{6\right\}\right\}\cup\left\{\left\{4,5\right\}\right\}=\left\{\left\{1,2,3\right\},\left\{6\right\},\left\{4,5\right\}\right\},$
$\displaystyle{X}_{2}^{2}=\left\{\left\{6\right\},\left\{7,8,9\right\}\right\},$
$\displaystyle{X}_{3}^{2}=\left\{\left\{6,9\right\},\left\{9\right\}\right\},$

and the 2-order elements ${X}_{1}^{2},{X}_{2}^{2},{X}_{3}^{2}$ are reduced to 1-order elements,

$\displaystyle{X}_{1}^{1}=\left\{1,2,3\right\}\cup\left\{6\right\}\cup\left\{4,5\right\}=\left\{1,2,3,4,5,6\right\},$
$\displaystyle{X}_{2}^{1}=\left\{6\right\}\cup\left\{7,8,9\right\}=\left\{6,7,8,9\right\},$
$\displaystyle{X}_{3}^{1}=\left\{6,9\right\}\cup\left\{9\right\}=\left\{6,9\right\}.$

So, we get the 1-order set,

$\displaystyle{\mathfrak{X}}^{1}=\left\{{X}_{1}^{1},{X}_{2}^{1},{X}_{3}^{1}\right\}=\left\{\left\{1,2,3,4,5,6\right\},\left\{6,7,8,9\right\},\left\{6,9\right\}\right\}.$ (24)
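Corollary 2 differs from Corollary 1 in that $g$ is applied inside each element instead of to the set as a whole. The sketch below reproduces Eq. (24) under the assumption that sets are nested Python `frozenset`s and that $g$ is the union of all elements of a set; the helpers `g`, `fs`, and `reduce_elements_to_order1` are hypothetical names.

```python
from functools import reduce


def g(nested: frozenset) -> frozenset:
    """Element-union operation (assumed): union of all elements of a set."""
    return reduce(frozenset.union, nested, frozenset())


def fs(*items) -> frozenset:
    """Shorthand for writing nested frozensets."""
    return frozenset(items)


def reduce_elements_to_order1(elements, k: int):
    """Corollary 2 as read here: apply g (k - 1) times to each element of a
    k-order set, so every element becomes a 0-order set (a set of nodes)."""
    reduced = []
    for element in elements:
        for _ in range(k - 1):
            element = g(element)
        reduced.append(element)
    return reduced


X3 = [
    fs(fs(fs(1, 2, 3), fs(6)), fs(fs(4, 5))),  # X_1^3
    fs(fs(fs(6), fs(7, 8, 9))),                # X_2^3
    fs(fs(fs(6, 9), fs(9))),                   # X_3^3
]
X1 = reduce_elements_to_order1(X3, k=3)
print([sorted(s) for s in X1])  # [[1, 2, 3, 4, 5, 6], [6, 7, 8, 9], [6, 9]]
```

A list is used for the result only so the printed order matches $X_{1}^{1}, X_{2}^{1}, X_{3}^{1}$; mathematically $\mathfrak{X}^{1}$ is a set. Unlike Corollary 1, node 6 survives in all three elements, so the overlap between communities is preserved.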
4.2 USI-AUC criterion based on a link-prediction method

Here, link prediction was performed on a training set. The link set $E$ of the network was first divided into two parts: the training set $E^{T}$, treated as known information, and the testing (probe) set $E^{P}$, used only for testing, whose information may not be used for prediction. Thus $E^{T}\cup E^{P}=E$ and $E^{T}\cap E^{P}=\emptyset$. In this paper, the training set contained 90% of the links, and the remaining 10% formed the probe set.
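The text does not say how the 90/10 split is drawn, so the sketch below assumes a uniformly random split of the link list; `split_edges` and its `seed` parameter are hypothetical.

```python
import random


def split_edges(edges, train_fraction=0.9, seed=None):
    """Randomly split the link set E into a training set E^T and a probe set E^P.

    Returns two disjoint sets whose union is E.
    """
    rng = random.Random(seed)
    shuffled = sorted(edges)  # fixed starting order so a given seed is reproducible
    rng.shuffle(shuffled)
    cut = round(train_fraction * len(shuffled))
    return set(shuffled[:cut]), set(shuffled[cut:])


# Hypothetical toy network: a ring of 20 nodes (20 links).
E = {(i, (i + 1) % 20) for i in range(20)}
E_T, E_P = split_edges(E, seed=1)
print(len(E_T), len(E_P))  # 18 2
```

Fixing the seed makes a single split reproducible; repeating the evaluation over several seeds is one way to see how much the final score depends on the particular split.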

In this section, we regard the community division produced by a community detection algorithm as a partition of sets in the USI model and estimate the metric $p$ in the discrete metric space. We then define a link-prediction score based on the USI model, from which the USI-AUC evaluation criterion is obtained.

4.2.1 The partition of sets

For 0-order sets ($i=0$, $k=1,2,\ldots,|A_{0}|$), the 0-order sets are partitioned by taking the division produced by the community detection algorithm as the specific relationship $f$. Let $C_{i}$ be a community found by the algorithm, so that the division is $P=\{C_{1},C_{2},\ldots,C_{n}\}$. The specific relationship $f_{1}$ is then

$\displaystyle f_{1}:(A_{0})^{k}\rightarrow\mathfrak{D}_{1k},\ \mathfrak{D}_{1k}\subset(A_{0})^{k},\ k=1,2,\ldots,|A_{0}|,$ (25)

$\displaystyle\mathfrak{D}_{1k}=f_{1}[(A_{0})^{k}]=\{C_{i}:|C_{i}|=k,C_{i}\in P\}.$ (26)

For 1-order sets, if the communities found by the algorithm contain no hierarchical-structure information, only interactions between pairs of 0-order sets are considered: the specific relationship $f_{2}$ is defined so that any two 0-order sets form a 1-order set.

Assume

$\displaystyle D_{i}\in\bigcup_{k=1}^{|A_{0}|}\mathfrak{D}_{1k}=\mathfrak{D}_{1}.$ (27)

When $i=1$ and $k=2$ , according to a specific relationship, $f_{2}$ :

$\displaystyle f_{2}:(A_{1})^{2}\rightarrow\mathfrak{D},\ \mathfrak{D}\subset(A_{1})^{2},$ (28)

we obtain

$\displaystyle\mathfrak{D}=f_{2}[(A_{1})^{2}]=\{\{D_{i},D_{j}\}:\forall D_{i},D_{j}\in\mathfrak{D}_{1},D_{i}\neq D_{j}\}.$ (29)

For 1-order and higher-order sets, if the communities found by the algorithm contain hierarchical-structure information, that hierarchy is taken as the specific relationship $f_{3}$ that forms the 1-order or higher-order sets:

$\displaystyle f_{3}:(A_{i})^{k}\rightarrow\mathfrak{D},\ \mathfrak{D}\subset(A_{i})^{k},\ k=1,2,\ldots,|A_{i}|.$ (30)
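For a non-hierarchical division, the relationships $f_{1}$ and $f_{2}$ reduce to simple bookkeeping. The sketch below groups the communities of $P$ by size, as in Eq. (26), and forms all pairs of distinct 0-order sets, as in Eq. (29). The hierarchical case $f_{3}$ depends on the algorithm's own hierarchy and is not covered; the function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def partition_zero_order(P):
    """Eq. (26): D_{1k} collects the communities of P that contain exactly k nodes."""
    D1 = defaultdict(list)
    for community in P:
        D1[len(community)].append(frozenset(community))
    return dict(D1)


def pair_zero_order_sets(D1):
    """Eq. (29): any two distinct 0-order sets D_i != D_j form a 1-order set."""
    distinct = []
    for sets_of_size_k in D1.values():
        for s in sets_of_size_k:
            if s not in distinct:
                distinct.append(s)
    return [frozenset(pair) for pair in combinations(distinct, 2)]


# Hypothetical division P with three communities.
P = [{1, 2, 3}, {4, 5}, {6, 7, 8}]
D1 = partition_zero_order(P)
print({k: [sorted(s) for s in v] for k, v in D1.items()})
# {3: [[1, 2, 3], [6, 7, 8]], 2: [[4, 5]]}
print(len(pair_zero_order_sets(D1)))  # 3
```

With $n$ communities, Eq. (29) yields $C_{n}^{2}$ 1-order sets, so the number of pairs grows quadratically with the number of communities.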

4.2.2 Estimation of metric $p$ in discrete metric space

According to the difference in set order, the estimation method for metric $p$ is divided into three cases: estimation according to the 0-order set, estimation according to the 1-order set, and estimation according to the high-order set. Specific algorithms are provided as follows.

(i)
The estimation of metric $p$ according to the 0-order set $G_{0}$ .

Assume

$\displaystyle|G_{0}|=N_{1},$ (31)

where $|\cdot|$ denotes cardinality, so set $G_{0}$ contains $N_{1}$ nodes.

In a set consisting only of 0-order elements, two elements are either connected or not, and metric $p$ is the probability that a pair is connected. Assume that the number of links between nodes in set $G_{0}$ is a random variable $X$ following a binomial distribution $B(N,p)$, where $N=C_{N_{1}}^{2}$ is the maximum possible number of links. Metric $p$ is estimated by maximum likelihood.

The likelihood function is

$\displaystyle L(p,x)=p^{x}(1-p)^{N-x},x=0,1,2,\ldots,N,$ (32)

where $x$ is the number of actual links between nodes in set $G_{0}$.

Let

$\displaystyle\frac{dL(p,x)}{dp}=0.$ (33)

We can obtain

$\displaystyle\widehat{p}=\frac{x}{N}.$ (34)
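The step from Eq. (33) to Eq. (34) is easiest to see through the log-likelihood, which has the same maximizer as $L(p,x)$ for $0<p<1$ (shown here for $0<x<N$):

$\displaystyle\ln L(p,x)=x\ln p+(N-x)\ln(1-p),$

$\displaystyle\frac{d\ln L(p,x)}{dp}=\frac{x}{p}-\frac{N-x}{1-p}=0\;\Longrightarrow\;x(1-p)=(N-x)p\;\Longrightarrow\;\widehat{p}=\frac{x}{N},$

$\displaystyle\frac{d^{2}\ln L(p,x)}{dp^{2}}=-\frac{x}{p^{2}}-\frac{N-x}{(1-p)^{2}}<0,$

so the stationary point is a maximum. The binomial coefficient $C_{N}^{x}$ omitted from Eq. (32) does not depend on $p$ and therefore does not change the estimate.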

Figure 6.
Example of estimating the metric $p$ in a 0-order set.

As shown in Fig. 6, the 0-order set has 10 nodes and 12 edges, so the estimate of metric $p$ is $\widehat{p}=12/C_{10}^{2}=12/45=4/15$.
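The Fig. 6 arithmetic can be checked with exact fractions. In this sketch `x` is counted from an edge list; under the training/probe split of Section 4.2 it would presumably be counted on training links only, which is our assumption. Both function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def count_internal_links(nodes, edges) -> int:
    """x: number of links with both endpoints inside the node set."""
    nodes = set(nodes)
    return sum(1 for u, v in edges if u in nodes and v in nodes)


def estimate_p_order0(num_nodes: int, num_links: int) -> Fraction:
    """Eq. (34): p_hat = x / N, where N = C(N_1, 2) is the number of possible links."""
    N = comb(num_nodes, 2)
    if N == 0:
        raise ValueError("a 0-order set needs at least two nodes")
    return Fraction(num_links, N)


print(estimate_p_order0(10, 12))  # 4/15, the Fig. 6 example
x = count_internal_links({1, 2, 3}, [(1, 2), (2, 3), (3, 4)])
print(x, estimate_p_order0(3, x))  # 2 2/3
```

`Fraction` keeps the estimate exact, which makes it easy to compare against hand-computed values such as $4/15$; in a full pipeline a float is usually sufficient.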
(ii)
The estimation of metric $p$ according to the 1-order set $G_{1}$

$\displaystyle G_{1}=\{S_{i}:S_{i}\text{ is 1 order element,}i=1,2,3,\ldots\},$ (35)

$\displaystyle|G_{1}|=K.$ (36)

1-order elements produced by an overlapping-community detection algorithm may intersect. Let $k_{i}$ be the number of nodes of $S_{i}$ outside the common intersection:

$\displaystyle\left|S_{i}\setminus\bigcap_{j=1}^{K}S_{j}\right|=k_{i}.$ (37)

The maximum number of possible links between 1-order elements is

$\displaystyle N=\sum_{i<j}k_{i}k_{j}.$ (38)

Metric $p$ for set $G_{1}$ is therefore defined as the connecting probability between 1-order elements. Its estimation again reduces to a binomial distribution $B(N,p)$ and follows the same method as in case (i), giving $\widehat{p}=x/N$, where $x$ is the number of actual links between different 1-order elements.

In Fig. 7a the 1-order elements do not intersect, so we only need the actual number of edges between them (6) and the maximum possible number of edges ($6\times 7=42$), giving $\widehat{p}=6/(6\times 7)=1/7$. In Fig. 7b the elements intersect, so the intersection is excluded when counting both the actual number of edges (7) and the maximum possible number of edges ($9\times 5=45$), giving $\widehat{p}=7/(9\times 5)=7/45$.
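Case (ii) can be sketched in the same way. This version rests on three assumptions drawn from the text and the Fig. 7 arithmetic: nodes shared by all elements are dropped first (Eq. (37)); Eq. (38) is read as a sum over unordered pairs $i<j$, which is what the product $6\times 7=42$ for two elements implies; and $x$ counts observed links joining two different reduced elements. The function name and toy data are hypothetical, since the Fig. 7 networks are not reproduced here.

```python
from fractions import Fraction
from functools import reduce
from itertools import combinations


def estimate_p_order1(elements, edges) -> Fraction:
    """Sketch of the case-(ii) estimate of metric p for G_1 = {S_1, ..., S_K}."""
    if len(elements) < 2:
        raise ValueError("need at least two 1-order elements")
    # Eq. (37): drop the nodes shared by all elements, leaving k_i nodes in S_i.
    common = reduce(set.intersection, (set(s) for s in elements))
    reduced = [set(s) - common for s in elements]
    # Eq. (38), summed over unordered pairs i < j.
    N = sum(len(a) * len(b) for a, b in combinations(reduced, 2))
    if N == 0:
        raise ValueError("no possible links between the elements")
    # x: observed links joining two different reduced elements.
    x = sum(
        1
        for a, b in combinations(reduced, 2)
        for u, v in edges
        if (u in a and v in b) or (u in b and v in a)
    )
    return Fraction(x, N)


A, B = {1, 2, 9}, {3, 4, 5, 9}  # hypothetical elements sharing node 9
links = [(1, 3), (2, 4), (1, 2), (4, 5), (1, 9), (9, 3)]
print(estimate_p_order1([A, B], links))  # 1/3
```

Here node 9 is removed from both elements, leaving $k_{1}=2$ and $k_{2}=3$, so $N=6$; only the links $(1,3)$ and $(2,4)$ cross between the reduced elements, giving $\widehat{p}=2/6=1/3$.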

Figure 7.
Example of estimating the metric $p$ in 1-order sets.

(iii)
The estimation of metric $p$ according to the high-order set $G_{k}(k\geqslant 2)$ .

By Corollary 2, set $G_{k}$ is first reduced to a 1-order set, and $p$ is then estimated as in case (ii).

4.2.3 USI-model-based link-prediction scoring

In overlapping-community detection, a node can belong to multiple communities, so two nodes may have several channels through which a link can form, much as in human communication. Each additional channel increases the probability that the two nodes are linked. Since a node can likewise belong to several sets in the USI model, the link-prediction score is the probability that a node pair makes contact through at least one of its channels, with the channels treated as independent (parallel):

$\displaystyle s_{xy}=1-\prod_{i=1}^{N_{xy}}(1-p_{xy}^{i}),$ (39)

where $s_{xy}$ is the final link-prediction score of node pair $xy$, $p_{xy}^{i}$ is the contact probability of node pair $xy$ in their $i$-th common community (the estimate of $p$ for that community), and $N_{xy}$ is the number of communities to which both $x$ and $y$ belong.
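Eq. (39) is the probability that at least one of several independent channels fires. A minimal sketch, assuming the per-community probabilities $p_{xy}^{i}$ have already been estimated; `usi_score` is a hypothetical name.

```python
from math import prod


def usi_score(common_community_probs) -> float:
    """Eq. (39): s_xy = 1 - prod_i (1 - p_xy^i) over the N_xy common communities.

    Each common community is treated as an independent channel for a link.
    """
    return 1.0 - prod(1.0 - p for p in common_community_probs)


print(usi_score([0.2, 0.5]))  # about 0.6 = 1 - 0.8 * 0.5
print(usi_score([]))          # 0.0: no common community, hence no channel
```

A pair sharing two communities with $p=0.2$ and $p=0.5$ scores about $0.6$, higher than either channel alone, and a pair with no common community scores 0, which is how overlapping memberships raise the predicted link probability.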

4.3 USI-model-based evaluation criterion for community detection

Given the final scores $s_{xy}$, the accuracy of link prediction is quantified on the test set by the AUC [32], the area under the receiver operating characteristic curve. AUC is the probability that a randomly chosen missing link (a link in the probe set) receives a higher score than a randomly chosen nonexistent link. If, among $n$ independent comparisons, the missing link scores higher in $n^{\prime}$ comparisons and the two scores are equal in $n^{\prime\prime}$ comparisons, the AUC is defined as

$\displaystyle\textit{AUC}=\frac{n^{\prime}+0.5n^{\prime\prime}}{n}.$ (40)
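Eq. (40) can be computed either over every (missing, nonexistent) pair or from $n$ sampled comparisons as the text describes. Both variants are sketched below; the function names and the sample scores are hypothetical.

```python
import random


def _compare(s_missing, s_nonexistent):
    """Return (1, 0) if the missing link scores higher, (0, 1) on a tie, else (0, 0)."""
    if s_missing > s_nonexistent:
        return 1, 0
    if s_missing == s_nonexistent:
        return 0, 1
    return 0, 0


def auc_exhaustive(missing_scores, nonexistent_scores) -> float:
    """Eq. (40) using every (missing, nonexistent) pair as one comparison."""
    n = n1 = n2 = 0
    for s_m in missing_scores:
        for s_n in nonexistent_scores:
            higher, tie = _compare(s_m, s_n)
            n, n1, n2 = n + 1, n1 + higher, n2 + tie
    return (n1 + 0.5 * n2) / n


def auc_sampled(missing_scores, nonexistent_scores, n=10000, seed=None) -> float:
    """Eq. (40) using n independent random comparisons."""
    rng = random.Random(seed)
    n1 = n2 = 0
    for _ in range(n):
        higher, tie = _compare(rng.choice(missing_scores), rng.choice(nonexistent_scores))
        n1, n2 = n1 + higher, n2 + tie
    return (n1 + 0.5 * n2) / n


missing = [0.9, 0.4]      # hypothetical scores of probe-set links
nonexistent = [0.4, 0.1]  # hypothetical scores of absent links
print(auc_exhaustive(missing, nonexistent))  # 0.875 = (3 + 0.5 * 1) / 4
```

The exhaustive form is exact but costs one comparison per pair; the sampled form trades a small random error for a fixed cost of $n$ comparisons, which matters when the number of nonexistent links is large.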

This AUC is defined as the evaluation criterion based on the USI model (USI-AUC). The steps of this algorithm are listed in Table 1.

Table 1

Algorithm steps.

Step 1:	Divide the network into a training set and a test set;
Step 2:	According to the result $P$ of the community detection algorithm, partition the 0-order, 1-order, and high-order sets following Eqs. (26), (29), and (30);
Step 3:	Estimate the metric $p$ on each set: 0-order and 1-order sets follow Eq. (34), and high-order sets are first reduced to 1-order sets by Corollary 2 and then follow Eq. (34);
Step 4:	Calculate the link-prediction score $s_{xy}$ according to Eq. (39);
Step 5:	Using the link-prediction scores, compute the AUC on the test set, i.e., USI-AUC.
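Because Section 5 uses only 0-order sets, the five steps of Table 1 collapse into one short routine. The sketch below is our reading of Table 1, not the authors' code. It assumes an undirected simple graph with sortable node labels, a uniformly random split, estimation of $p$ from training links only, nonexistent links taken as node pairs absent from $E$, and $n$ sampled comparisons for Eq. (40). The function `usi_auc` and its parameters are hypothetical.

```python
import random
from itertools import combinations
from math import comb, prod


def usi_auc(nodes, edges, communities, train_fraction=0.9, n=10000, seed=0):
    """Sketch of Table 1 restricted to 0-order sets; returns a USI-AUC estimate."""
    rng = random.Random(seed)
    # Step 1: split E into a training set E^T and a probe set E^P at random.
    links = sorted({tuple(sorted(e)) for e in edges})
    rng.shuffle(links)
    cut = round(train_fraction * len(links))
    train, probe = links[:cut], links[cut:]
    # Steps 2-3: each community is a 0-order set; Eq. (34) on training links.
    estimates = []
    for community in communities:
        c = set(community)
        possible = comb(len(c), 2)
        x = sum(1 for u, v in train if u in c and v in c)
        estimates.append((c, x / possible if possible else 0.0))

    # Step 4: Eq. (39) over the communities that contain both nodes.
    def score(u, v):
        return 1.0 - prod(1.0 - p for c, p in estimates if u in c and v in c)

    # Step 5: Eq. (40), probe links versus node pairs absent from E.
    existing = set(links)
    nonexistent = [pr for pr in combinations(sorted(nodes), 2) if pr not in existing]
    if not probe or not nonexistent:
        raise ValueError("need at least one probe link and one nonexistent link")
    higher = ties = 0
    for _ in range(n):
        s_m, s_n = score(*rng.choice(probe)), score(*rng.choice(nonexistent))
        higher += s_m > s_n
        ties += s_m == s_n
    return (higher + 0.5 * ties) / n


# Hypothetical network: two 5-node cliques joined by the single link (4, 5).
clique_a, clique_b = range(5), range(5, 10)
E = list(combinations(clique_a, 2)) + list(combinations(clique_b, 2)) + [(4, 5)]
print(usi_auc(range(10), E, [set(clique_a), set(clique_b)]))
```

In this toy network every nonexistent pair crosses the two communities and scores 0, so the result is close to 1 unless the bridge link $(4,5)$ lands in the probe set, in which case its ties pull the AUC toward 0.75. Running the routine with several seeds shows how much the score depends on the split.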

5. Experimental verification

To reduce computational complexity and simplify the problem, we considered only partitions of 0-order sets. First, experiments on modularity and EQ verified the two theorems. We then compared how the USI-AUC and modularity criteria evaluate disjoint and overlapping communities on several datasets. Finally, we tested the statistical significance of the comparison between the USI-AUC criterion and other criteria.

5.1 Verification of the validity of the two theorems

5.1.1 Experimental verification of Theorem 1

Partitioning the Dolphin dataset [33] with the classical Newman-modularity algorithm [5] produced five communities (see Table 2), with modularity $Q=0.5086$ for this division. We then deleted each node of the network in turn and recomputed the modularity.

Table 2
Partitions of the Dolphin dataset using the Newman-modularity algorithm. Each row indicates a community, and the contents are node numbers

Community	Node numbers
1	4, 9, 37, 40, 60
2	1, 3, 8, 11, 20, 29, 31, 41, 43, 45, 48
3	2, 6, 7, 10, 14, 18, 23, 26, 27, 28, 32, 33, 42, 49, 55, 57, 58, 61
4	5, 12, 16, 19, 22, 24, 25, 30, 36, 46, 52, 56
5	13, 15, 17, 21, 34, 35, 38, 39, 44, 47, 50, 51, 53, 54, 59, 62

Deleting any single node reduced the modularity below $Q=0.5086$ (see Table 3), which verifies Theorem 1.
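The section does not spell out what deleting a node means for the partition. The sketch below assumes it means turning the node into an isolated singleton community, the situation Theorem 1 concerns, and recomputes Newman modularity in plain Python on a hypothetical toy graph (the Dolphin data is not reproduced here). The names `modularity` and `isolate` are our own.

```python
def modularity(edges, communities):
    """Newman modularity Q = sum_c [L_c/m - (d_c/2m)^2] of a disjoint division."""
    m = len(edges)
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    q = 0.0
    for community in communities:
        c = set(community)
        internal = sum(1 for u, v in edges if u in c and v in c)  # L_c
        total_degree = sum(degree.get(node, 0) for node in c)     # d_c
        q += internal / m - (total_degree / (2 * m)) ** 2
    return q


def isolate(communities, node):
    """Hypothetical reading of 'deleting' a node: move it into its own singleton community."""
    remaining = [set(c) - {node} for c in communities]
    return [c for c in remaining if c] + [{node}]


# Toy graph (not the Dolphin data): two triangles joined by the link (2, 3).
E = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
P = [{0, 1, 2}, {3, 4, 5}]
q = modularity(E, P)                        # 5/14, about 0.357
q_isolated = modularity(E, isolate(P, 0))   # 17/98, about 0.173
print(round(q, 4), round(q_isolated, 4))
```

On this toy graph, isolating any one of the six nodes lowers $Q$, the same pattern Table 3 reports for the Dolphin network.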

Table 3

Modularity values after deleting individual nodes in Dolphin dataset, with values representing the modularity after deleting a corresponding node. Each cell corresponds to the node number 1–62 in turn

Number	1	2	3	4	5	6	7	8	9	10
1–10	0.4899	0.4896	0.4941	0.4973	0.5035	0.4898	0.4803	0.4952	0.4984	0.4756
11–20	0.4889	0.5032	0.5041	0.4708	0.4782	0.4918	0.4812	0.4659	0.4792	0.5004
21–30	0.5049	0.4907	0.5039	0.4997	0.4844	0.4945	0.4945	0.4914	0.5015	0.4814
31–40	0.4826	0.5039	0.4945	0.4751	0.4921	0.5035	0.4987	0.4892	0.4782	0.5032
41–50	0.5045	0.4851	0.4837	0.4828	0.5067	0.4773	0.4995	0.4837	0.5039	0.4995
51–60	0.4954	0.4636	0.5030	0.4995	0.4881	0.4985	0.4992	0.4722	0.5041	0.4917
61–62	0.5039	0.5012

5.1.2 Experimental verification of Theorem 2

We made one node a common node of two communities and computed EQ for every such choice. Of the 248 resulting EQ values, 244 were lower than the original modularity $(Q=0.5086)$ and four exceeded it (0.5110, 0.5111, 0.5121, and 0.5136, corresponding to nodes 21, 21, 41, and 45, respectively). According to Theorem 2, a disjoint division with higher modularity can then be obtained by reassigning these common nodes. Moving node 21 from community five to community two changed the modularity to $Q=0.5127$, moving node 41 from community two to community five changed it to $Q=0.5149$, and moving node 45 from community two to community five changed it to $Q=0.5184$. Moving all three nodes simultaneously changed the modularity to $Q=0.5191$.

We then made two nodes common nodes of two communities, starting from the adjusted division above, and computed EQ. Of the 7560 resulting EQ values, 7558 were lower than the modularity of the adjusted division $(Q=0.5191)$ and two exceeded it. EQ reached its maximum, $\textit{EQ}=0.5216$, when nodes 21 and 45 served as common nodes. Following Theorem 2, we converted this overlapping division into a disjoint one by reassigning the two nodes between communities one and five and deleting the other common nodes. The resulting modularity, $Q=0.5232$, exceeded the maximum $\textit{EQ}=0.5216$, thereby verifying Theorem 2. The adjusted communities are listed in Table 4.
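EQ is not restated in this section, so the sketch below assumes the standard extended modularity of Shen et al. [21], in which each node's contribution is divided by the number of communities containing it. It runs on a hypothetical toy graph rather than the Dolphin data, and `extended_modularity` is our own name.

```python
def extended_modularity(edges, cover):
    """EQ as assumed from Shen et al. [21]:

    EQ = 1/(2m) * sum_c sum_{i,j in c} (A_ij - k_i*k_j/(2m)) / (O_i*O_j),
    where O_i counts the communities containing node i.
    """
    m = len(edges)
    adjacent, degree, memberships = set(), {}, {}
    for u, v in edges:
        adjacent.update({(u, v), (v, u)})
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    for community in cover:
        for node in community:
            memberships[node] = memberships.get(node, 0) + 1
    total = 0.0
    for community in cover:
        for i in community:
            for j in community:  # ordered pairs, including i == j
                a_ij = 1.0 if (i, j) in adjacent else 0.0
                expected = degree.get(i, 0) * degree.get(j, 0) / (2 * m)
                total += (a_ij - expected) / (memberships[i] * memberships[j])
    return total / (2 * m)


# Toy graph (not the Dolphin data): two triangles joined by the link (2, 3).
E = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
disjoint = [{0, 1, 2}, {3, 4, 5}]
overlapping = [{0, 1, 2}, {2, 3, 4, 5}]  # node 2 becomes a common node
print(round(extended_modularity(E, disjoint), 4))     # 0.3571, equals Q = 5/14
print(round(extended_modularity(E, overlapping), 4))  # 0.2628
```

For a disjoint division every $O_{i}=1$ and EQ reduces to ordinary modularity; on this toy graph, making node 2 a common node lowers EQ from about 0.357 to about 0.263.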

Table 4
Adjusted communities, with each row indicating a community and its node numbers

Community	Node numbers
1	4, 9, 37, 40, 60, 21
2	1, 3, 8, 11, 20, 29, 31, 43, 48
3	2, 6, 7, 10, 14, 18, 23, 26, 27, 28, 32, 33, 42, 49, 55, 57, 58, 61
4	5, 12, 16, 19, 22, 24, 25, 30, 36, 46, 52, 56
5	13, 15, 17, 34, 35, 38, 39, 44, 47, 50, 51, 53, 54, 59, 62, 41, 45

It is possible to generalize the case involving two common nodes to cases involving multiple common nodes; therefore, it is unnecessary to enumerate these results here.

5.2 Evaluation of the disjoint-community detection algorithm using USI-AUC criterion

This section compares the evaluation effect of the USI-AUC and modularity criteria on 11 disjoint-community detection algorithms applied to six datasets. The 11 algorithms and their parameters are listed in Table 5.

Table 5
The brief description of 11 disjoint-community detection algorithms and corresponding parameters

Algorithm	Brief description	Parameter(s)
StabilityOpt[34]	A community detection method using stability as an optimization criterion.	0.1, 0.5, 0.9, 1.3, 1.7, 2.1, 2.5, and 2.9.
Ronhovde[35]	An exceptionally accurate spin-glass-type Potts model for community detection.	0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, and 0.5.
SpectralClust1[36]	A graph-partitioning algorithm.	6
SpectralClust2[37]	A graph-partitioning algorithm for perceptual problems.	6
Reichardt[38]	Statistical mechanics of community detection.	0.5, 1.0, 1.5, 2.0, 2.5, and 3.0.
ModulMax1[39]	A simple method that reveals the hierarchical structure of the network.	N/A
ModulMax2[5]	Classic Newman’s modularity method.	N/A
LFK[15]	A method based on local optimization of a fitness function.	0.5, 1.0, 1.5, 2.0, 2.5, and 3.0.
HSLSW[40]	A method based on a quality function and a fast local-expansion algorithm for uncovering communities.	0.5, 1.0, 1.5, 2.0, 2.5, and 3.0.
Danon[41]	A method that takes inhomogeneities in communities into account.	N/A
AFG[42]	A method that allows for multiple-resolution screening of the modular structure.	0.5, 1.0, 1.5, 2.0, 2.5, and 3.0.

The six datasets are Lesmis[43], Zachary[23], Football[1], CKM-3[44], PPI-Cell[45], and Metabolic[46]. Their basic topological features are listed in Table 6.

Table 6

The basic topological features of six example networks

Network	$|V|$	$|E|$	$<k>$	$<d>$	C	r	H
Lesmis	77	258	6.70	2.63	0.723	$-0.143$	1.78
Zachary	34	78	4.59	2.41	0.606	$-0.476$	1.69
Football	115	613	10.66	2.51	0.403	0.162	1.01
CKM-3	246	423	3.44	4.24	0.356	0.102	1.33
PPI-Cell	127	237	3.73	4.45	0.455	0.035	1.65
Metabolic	453	2025	8.94	2.66	0.660	$-0.226$	4.49

$|V|$ and $|E|$ are the total numbers of nodes and links, respectively. $<k>$ is the average degree of the nodes in a network, and $<d>$ is the average distance between nodes. C and r are the clustering coefficient and the assortativity coefficient, respectively. $H$ is the degree heterogeneity, defined as $H=\frac{<k^{2}>}{<k>^{2}}$.
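Two of the Table 6 columns follow directly from the degree sequence. A minimal sketch, assuming an undirected simple graph; the function name and the star graph are hypothetical.

```python
def average_degree_and_heterogeneity(nodes, edges):
    """Return <k> and H = <k^2> / <k>^2 for an undirected simple graph."""
    degree = {node: 0 for node in nodes}  # isolated nodes keep degree 0
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    k_mean = sum(degree.values()) / len(degree)
    k2_mean = sum(d * d for d in degree.values()) / len(degree)
    return k_mean, k2_mean / k_mean ** 2


# Hypothetical star: one hub linked to four leaves (degrees 4, 1, 1, 1, 1).
k_mean, H = average_degree_and_heterogeneity(range(5), [(0, i) for i in range(1, 5)])
print(k_mean, round(H, 4))  # 1.6 1.5625
```

$H=1$ exactly when every node has the same degree, and it grows as degrees become more uneven. Since $<k>=2|E|/|V|$, the table can be checked directly; for Lesmis, $2\times 258/77\approx 6.70$.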

Running the algorithms with different parameters on the six datasets yielded 57 divisions. Each division was evaluated with both the USI-AUC and modularity criteria; the results, sorted by increasing modularity, are shown in Fig. 8.

Figure 8.

Comparison of USI-AUC and modularity criteria. The “*” represents the score associated with the USI-AUC criterion, and the “+” represents the score associated with the modularity criterion. In each scatter plot, the divisions are sorted by increasing modularity, and the corresponding USI-AUC scores are shown at the same positions.

Representative experimental results are listed in Table 7. For each community detection algorithm, we selected the optimal parameter according to the USI-AUC criterion and listed the scores under both criteria together with their rankings.

Table 7

The scores and rankings of disjoint-community detection algorithms using USI-AUC and modularity criteria. For each algorithm, the parameter that maximizes the USI-AUC criterion was selected

(a) Lesmis	USI-AUC	Ranking of USI-AUC	Modularity	Ranking of modularity	(b) Zachary	USI-AUC	Ranking of USI-AUC	Modularity	Ranking of modularity
StabilityOpt	0.824	8	0.533	7	StabilityOpt	0.748	9	0.392	8
Ronhovde	0.862	1	0.307	11	Ronhovde	0.780	6	0.359	9
SpectralClust1	0.817	10	0.428	10	SpectralClust1	0.783	1	0.407	5
SpectralClust2	0.817	10	0.432	9	SpectralClust2	0.705	11	0.252	11
Reichardt	0.823	9	0.537	6	Reichardt	0.778	7	0.399	6
ModulMax1	0.840	4	0.552	1	ModulMax1	0.781	2	0.419	1
ModulMax2	0.842	3	0.544	4	ModulMax2	0.781	2	0.419	1
LFK	0.835	7	0.545	3	LFK	0.742	10	0.343	10
HSLSW	0.837	5	0.511	8	HSLSW	0.778	7	0.399	6
Danon	0.836	6	0.544	4	Danon	0.781	2	0.419	1
AFG	0.849	2	0.552	1	AFG	0.781	2	0.419	1
(c) Football	USI-AUC	Ranking of USI-AUC	Modularity	Ranking of modularity	(d) CKM-3	USI-AUC	Ranking of USI-AUC	Modularity	Ranking of modularity
StabilityOpt	0.844	2	0.600	4	StabilityOpt	$-$	$-$	$-$	$-$
Ronhovde	0.816	10	0.562	9	Ronhovde	0.856	7	0.688	6
SpectralClust1	0.814	11	0.538	10	SpectralClust1	0.832	9	0.653	10
SpectralClust2	0.803	9	0.520	11	SpectralClust2	0.582	10	0.108	9
Reichardt	0.830	8	0.573	8	Reichardt	0.904	2	0.741	3
ModulMax1	0.841	3	0.604	1	ModulMax1	0.846	8	0.644	9
ModulMax2	0.845	1	0.603	2	ModulMax2	0.887	6	0.739	4
LFK	0.832	7	0.600	4	LFK	0.907	1	0.690	5
HSLSW	0.834	6	0.593	6	HSLSW	0.904	2	0.684	7
Danon	0.836	5	0.581	7	Danon	0.895	5	0.748	2
AFG	0.840	4	0.603	2	AFG	0.897	4	0.750	1
(e) PPI-Cell	USI-AUC	Ranking of USI-AUC	Modularity	Ranking of modularity	(f) Metabolic	USI-AUC	Ranking of USI-AUC	Modularity	Ranking of modularity
StabilityOpt	0.823	1	0.613	1	StabilityOpt	0.734	4	0.434	3
Ronhovde	0.822	2	0.502	10	Ronhovde	0.730	6	0.194	9
SpectralClust1	0.701	11	0.321	11	SpectralClust1	0.687	10	0.277	8
SpectralClust2	0.810	7	0.535	8	SpectralClust2	$-$	$-$	$-$	$-$
Reichardt	0.811	6	0.561	6	Reichardt	0.742	1	0.408	6
ModulMax1	0.816	3	0.604	3	ModulMax1	0.734	4	0.421	4
ModulMax2	0.813	4	0.608	2	ModulMax2	0.735	3	0.437	2
LFK	0.788	9	0.560	7	LFK	0.711	8	0.337	7
HSLSW	0.762	10	0.513	9	HSLSW	0.694	9	0.152	10
Danon	0.812	5	0.603	4	Danon	0.723	7	0.416	5
AFG	0.810	7	0.603	4	AFG	0.739	2	0.440	1

Table 8

Comparing the evaluation of USI-AUC and modularity criteria using the labeled dataset Zachary

Zachary	USI-AUC	Ranking of USI-AUC	Modularity	Ranking of modularity
Original division	0.730	1	0.371	1
SpectralClust1	0.657	4	0.298	4
SpectralClust2/Reichardt	0.697	3	0.360	2
LFK/HSLSW	0.710	2	0.352	3

5.2.1 Comparative analysis of general trends

For the disjoint-community detection algorithms, the USI-AUC and modularity criteria were positively correlated: USI-AUC generally increased with modularity. However, Fig. 8 and Table 7 also show fluctuations between the two indices, caused mainly by isolated nodes. For example, the strong fluctuation at the fifth division of the Lesmis dataset in Fig. 8a arises because nodes {11, 14, 15, 16, 31, 33, 34, 41, 43, 68} were treated as isolated nodes. Nodes {11, 14, 15, 16, 33, 41, 68} all have degree 1 and are linked to nodes of larger degree, and nodes {31, 34, 43} have degree two or three and are also linked to nodes of larger degree. With so little information about these nodes, it is unclear whether they belong to the community containing their large-degree neighbors, so treating them as isolated is reasonable. These results are consistent with Theorem 1 and indicate that the USI-AUC criterion provides a reasonable evaluation of communities containing isolated nodes.

5.2.2 Comparative analysis using labeled datasets

We used the community-labeled datasets Zachary and Football to analyze the two metrics. In the Zachary dataset, the original division consists of two communities. Five of the 11 algorithms produced two-community divisions, which fall into three distinct types. The results are listed in Table 8.

Table 8 shows that the two metrics rank the Zachary divisions almost identically (they swap only the second and third places), and both score the original division highest, which shows that both criteria are reasonable here.

The same experiment on the Football dataset (see Table 9) showed that both the USI-AUC and modularity criteria scored the algorithms' divisions higher than the original communities. Either both indices misjudged these divisions, or some communities are unlabeled. For example, although each team in the dataset is attributed to its own league, some teams may still establish relationships with teams from other leagues, producing community structures that the labels do not mark.

Table 9
Comparison of evaluations using USI-AUC and modularity criteria on the labeled dataset Football

Football	USI-AUC	Ranking of USI-AUC	Modularity	Ranking of modularity
Original division	0.799	4	0.554	4
StabilityOpt	0.839	1	0.597	1
Reichardt	0.822	3	0.567	3
HSLSW	0.838	2	0.595	2

Table 10

Rankings of disjoint- and overlapping-community detection algorithms on the Dolphin and Football datasets according to the USI-AUC and extended modularity (EQ) criteria. For each algorithm, the parameter that maximizes USI-AUC was selected

(a) Dolphin	USI-AUC	Ranking of USI-AUC	Modularity (EQ)	Ranking of modularity (EQ)	(b) Football	USI-AUC	Ranking of USI-AUC	Modularity (EQ)	Ranking of modularity (EQ)
FOCS(2 0.5)	0.816	1	0.361	13	FOCS(2 0.5)	0.796	13	0.484	12
FOCS(1 0.5)	0.797	2	0.376	11	FOCS(1 0.5)	0.799	12	0.418	13
StabilityOpt	0.795	4	0.523	1	StabilityOpt	0.844	2	0.600	4
Ronhovde	0.782	9	0.501	7	Ronhovde	0.816	9	0.562	9
SpectralClust1	0.785	7	0.484	8	SpectralClust1	0.814	10	0.538	10
SpectralClust2	0.733	13	0.384	10	SpectralClust2	0.803	11	0.520	11
Reichardt	0.790	6	0.518	3	Reichardt	0.830	8	0.573	8
ModulMax1	0.780	10	0.502	6	ModulMax1	0.841	3	0.604	1
ModulMax2	0.748	12	0.373	12	ModulMax2	0.845	1	0.603	2
LFK	0.791	5	0.521	2	LFK	0.832	7	0.600	4
HSLSW	0.768	11	0.473	9	HSLSW	0.834	6	0.593	6
Danon	0.785	7	0.506	5	Danon	0.836	5	0.581	7
AFG	0.797	2	0.513	4	AFG	0.840	4	0.603	2

Table 11

Parameter description of LFR benchmark

Parameter	Description
$|V|$	Number of nodes
$<k>$	Average degree
maxk	Maximum degree
mu	Mixing parameter
minc	Minimum for the community sizes
maxc	Maximum for the community sizes
on	Number of overlapping nodes
om	Number of memberships of the overlapping nodes
C	Average clustering coefficient

5.3 Evaluation of the overlapping-community detection algorithm using USI-AUC criterion

The USI-AUC criterion can also evaluate overlapping communities. For example, Fig. 1 shows that making nodes 3 and 9 common nodes of the two communities in the Zachary dataset yields a USI-AUC of 0.742, higher than the USI-AUC of the original division (0.730 in Table 8). The EQ, however, was 0.351, lower than the modularity of the original division (0.371 in Table 8), in line with Theorem 2. As Fig. 1 illustrates, nodes 3 and 9 are active in both communities, so the USI-AUC evaluation is the more reasonable one.

Table 12
Parameter setting of LFR benchmark network generation

Network	$|V|$	$<k>$	maxk	mu	minc	maxc	on	om	C
LFR1	500	8	50	0.1	15	40	20	3	0.6
LFR2	500	8	50	0.3	15	40	20	4	0.6
LFR3	500	10	50	0.1	15	40	20	3	0.6
LFR4	500	10	50	0.3	15	40	20	4	0.6

Table 13

The basic topological features of 4 generated networks

Network	$|V|$	$|E|$	$<k>$	$<d>$	C	r	H
LFR1	500	2028	8.112	3.897	0.629	$-0.148$	1.877
LFR2	500	2152	8.608	3.735	0.566	$-0.063$	2.060
LFR3	500	2340	9.360	3.543	0.606	$-0.098$	1.762
LFR4	500	2345	9.740	3.818	0.583	$-0.021$	1.726

Table 14

Statistical significance test for the comparison of the USI-AUC measure with other evaluation measures

Network	Precision	Recall	F1-score	NMI	Omega	Modularity	USI-AUC
LFR1	1	1	1	1	1	0.771	0.928
LFR1-1	1	0.926	0.962	0.848	0.863	0.774	0.902
LFR1-2	0.896	0.861	0.878	0.784	0.822	0.666	0.869
LFR2	1	1	1	1	1	0.575	0.825
LFR2-1	1	0.893	0.943	0.824	0.846	0.577	0.805
LFR2-2	0.896	0.866	0.881	0.769	0.765	0.448	0.760
LFR3	1	1	1	1	1	0.711	0.931
LFR3-1	1	0.926	0.962	0.855	0.871	0.712	0.869
LFR3-2	0.896	0.861	0.878	0.744	0.779	0.580	0.861
LFR4	1	1	1	1	1	0.586	0.822
LFR4-1	1	0.893	0.943	0.684	0.686	0.587	0.805
LFR4-2	0.896	0.866	0.881	0.681	0.731	0.465	0.760

For evaluation of the overlapping-community detection algorithm, we used the Dolphin and Football datasets and partitioned the networks using the fast overlapping community detection algorithm (FOCS) [47] and the other 11 disjoint algorithms. Evaluation results according to USI-AUC and EQ are shown in Table 10.

The USI-AUC criterion can evaluate both disjoint and overlapping communities. On the Dolphin dataset, USI-AUC was highest for the FOCS algorithm with parameters (2, 0.5), indicating that this overlapping division is the most reasonable. On the Football dataset, by contrast, FOCS scored lower USI-AUC than all the other algorithms, indicating that overlapping divisions add little there. Under EQ, the overlapping FOCS divisions ranked near the bottom on both datasets, consistent with the analysis of Theorem 2; EQ therefore did not provide a reasonable evaluation of the overlapping-community detection algorithm.

5.4 Statistical significance test for the comparison of the USI-AUC measure with other measures

We compared USI-AUC with other evaluation measures (precision, recall, F1-score, NMI, Omega, and modularity) on artificial networks with community labels generated by the LFR benchmark [48]. The LFR parameters are described in Table 11, the settings used to generate four networks with labeled overlapping communities are given in Table 12, and the basic topological features of the four generated networks are listed in Table 13.

LFR1, LFR2, LFR3, and LFR4 are networks with overlapping communities generated by the LFR benchmark. LFR1-1, LFR2-1, LFR3-1, and LFR4-1 are disjoint divisions obtained from LFR1, LFR2, LFR3, and LFR4, respectively, by randomly assigning each common node to one community. LFR1-2, LFR2-2, LFR3-2, and LFR4-2 are adjusted overlapping divisions that simulate the output of a detection algorithm. As Table 14 shows, USI-AUC decreases from each ground-truth division to its -1 and -2 variants, in better agreement with the criteria based on the labeled networks. Modularity, however, is slightly higher for each disjoint division (-1) than the extended modularity of the ground-truth overlapping division, which is consistent with Theorem 2.

6. Conclusions

In this paper, we have proved that ME has theoretical defects when evaluating disjoint and overlapping communities. On that basis, we proposed USI-AUC, an evaluation criterion that uses the AUC accuracy of link prediction based on the USI model. The criterion avoids the two defects of ME, namely the contribution of isolated nodes to modularity and the inability to identify the best overlapping division; it effectively evaluated both disjoint and overlapping communities and was more consistent with the criteria based on labeled networks. However, USI-AUC fluctuates with the choice of training and test sets, so the criterion should be computed over multiple partitions of each dataset.

In this paper, we applied the USI-AUC evaluation index to 0-order sets; it can be extended to 1-order and higher-order sets. Our results also indicated that the USI model is suitable for weighted networks, so the USI-AUC index is applicable to evaluating community detection in weighted networks. The USI model cannot, however, be applied directly to directed networks, and future work lies in generalizing the USI model and the USI-AUC criterion to directed networks. We will also try to apply current HPC techniques (e.g., cloud computing [49]) to meet the challenges of massive networks.


Acknowledgments

We acknowledge Xi Tan and Wei Liu for their inspirations. This work was partially supported by the Foundation for Innovative Research Groups of the National Natural Science Foundation of China (No. 61521003), and National Natural Science Foundation of China (No. 61601513).

References

Girvan

and Newman

M.E.J.

, Community structure in social and biological networks, Proceedings of the National Academy of Sciences of the United States of America 99(12) (2002), 7821–7826.

Fortunato

, Community detection in graphs, Physics Reports 486(3–5) (2009), 75–174.

Cheng

Kotoulas

Ward

T.E

and Theodoropoulos

, Robust and skew-resistant parallel joins in shared-nothing systems, in: ACM International Conference on Conference on Information and Knowledge Management, 2014, pp. 1399–1408.

Wang

R.S.

Zhang

and Zhang

X.S.

, Modularity and community detection in bipartite networks, Computer Science, 2015.

Newman

M.E.J.

, Fast algorithm for detecting community structure in networks, Physical Review E Statistical Nonlinear & Soft Matter Physics 69(6) (2004), 066133–066133.

Newman

M.E.J.

, Community detection in networks: Modularity optimization and maximum likelihood are equivalent, 2016.

Langone

Alzate

and Suykens

J.A.K.

, Kernel spectral clustering for community detection in complex networks, Proc of the IEEE World Congress on Computational Intelligence 20 (2012), 1–8.

Habashi

Ghanem

N.M.

and Ismail

M.A.

, Enhanced community detection in social networks using active spectral clustering, in: The ACM Symposium 2016, pp. 1178–1181.

Cheng

Liu

Huang

and Zhu

, Hierarchical clustering based on hyper-edge similarity for community detection, in: Ieee/wic/acm International Conferences on Web Intelligence and Intelligent Agent Technology, 2012, pp. 238–242.

10. Raghavan U.N., Albert R. and Kumara S., Near linear time algorithm to detect community structures in large-scale networks, Physical Review E Statistical Nonlinear & Soft Matter Physics 76(3 Pt 2) (2007).

11. Liu and Murata, Advanced modularity-specialized label propagation algorithm for detecting communities in networks, Physica A Statistical Mechanics & Its Applications 389(7) (2010), 1493–1500.

12. Palla, Derényi, Farkas and Vicsek, Uncovering the overlapping community structure of complex networks in nature and society, Nature 435(7043) (2005), 814–818.

13. Gregory, Finding overlapping communities in networks by label propagation, New Journal of Physics 12(10) (2009), 2011–2024.

14. Ahn Y.Y., Bagrow J.P. and Lehmann, Link communities reveal multiscale complexity in networks, Nature 466(7307) (2010), 761–764.

15. Lancichinetti, Fortunato and Kertész, Detecting the overlapping and hierarchical community structure of complex networks, New Journal of Physics 11(3) (2009), 19–44.

16. Nepusz, Petróczi, Négyessy and Bazsó, Fuzzy communities and the concept of bridgeness in complex networks, Physical Review E 77(1 Pt 2) (2008), 119–136.

17. Collins L.M. and Dent C.W., Omega: A general formulation of the Rand index of cluster recovery suitable for non-disjoint solutions, Multivariate Behavioral Research 23(2) (1988), 231–242.

18. Gregory, Fuzzy overlapping communities in networks, Journal of Statistical Mechanics Theory & Experiment 2(2) (2011), P02017.

19. Pizzuti, GA-Net: A genetic algorithm for community detection in social networks, in: Parallel Problem Solving From Nature – PPSN X, International Conference Dortmund, Germany, September 13–17, 2008, Proceedings, 2008, pp. 1081–1090.

20. Nicosia, Mangioni, Carchiolo and Malgeri, Extending modularity definition for directed graphs with overlapping communities, 2008.

21. Shen, Cheng, Cai and Hu M.B., Detect overlapping and hierarchical community structure in networks, Physica A Statistical Mechanics & Its Applications 388(8) (2009), 1706–1712.

22. Shen H.W., Cheng X.Q. and Guo J.F., Quantifying and identifying the overlapping community structure in networks, Journal of Statistical Mechanics Theory & Experiment 2009(7) (2009), 07042.

23. Zachary W.W., An information flow model for conflict and fission in small groups, Journal of Anthropological Research 33(4) (1977), 473.

24. Xie, Kelley and Szymanski B.K., Overlapping community detection in networks: The state-of-the-art and comparative study, ACM Computing Surveys 45(4) (2013), 115–123.

25. Danon, Díaz-Guilera, Duch and Arenas, Comparing community structure identification, Journal of Statistical Mechanics Theory & Experiment 2005(9) (2005), 09008.

26. Valverde-Rebaza and De Andrade Lopes, Exploiting behaviors of communities of Twitter users for link prediction, Social Network Analysis and Mining 3(4) (2013), 1063–1074.

27. Valverde-Rebaza, Valejo, Berton, De Paulo Faleiros and De Andrade Lopes, A naive Bayes model based on overlapping groups for link prediction in online social networks, in: The ACM/SIGAPP Symposium on Applied Computing, 2015, pp. 1136–1141.

28. Soundarajan and Hopcroft, Using community information to improve the precision of link prediction methods, in: Proceedings of the 21st International Conference Companion on World Wide Web, 2012, pp. 607–608.

29. Clauset A., Moore C. and Newman M.E.J., Hierarchical structure and the prediction of missing links in networks, Nature 453(7191) (2008), 98–101.

30. Holland P.W., Laskey K.B. and Leinhardt, Stochastic blockmodels: First steps, Social Networks 5(2) (1983), 109–137.

31. Airoldi E.M., Blei D.M., Fienberg S.E. and Xing E.P., Mixed membership stochastic blockmodels, Journal of Machine Learning Research 9(5) (2008), 1981–2014.

32. Hanley J.A. and McNeil B.J., The meaning and use of the area under a receiver operating characteristic (ROC) curve, Radiology 143(1) (1982), 29–36.

33. Lusseau, Schneider, Boisseau O.J., Haase, Slooten and Dawson S.M., The bottlenose dolphin community of Doubtful Sound features a large proportion of long-lasting associations, Behavioral Ecology & Sociobiology 54(4) (2003), 396–405.

34. Martelot E.L. and Hankin, Multi-scale community detection using stability as optimisation criterion in a greedy algorithm, 2011.

35. Ronhovde and Nussinov, Local resolution-limit-free Potts model for community detection, Physical Review E Statistical Nonlinear & Soft Matter Physics 81(2) (2010), 387–395.

36. Shi and Malik, Normalized cuts and image segmentation, IEEE Transactions on Pattern Analysis and Machine Intelligence 22(8) (2000), 888–905.

37. Hespanha, An efficient MATLAB algorithm for graph partitioning, 2004.

38. Reichardt and Bornholdt, Statistical mechanics of community detection, Physical Review E Statistical Nonlinear & Soft Matter Physics 74(2) (2006).

39. Blondel V.D., Guillaume J.L., Lambiotte and Lefebvre, Fast unfolding of community hierarchies in large networks, J Stat Mech, abs/0803.0476, 2008.

40. Huang, Sun, Liu, Song and Weninger, Towards online multiresolution community detection in large-scale networks, PLoS ONE 6(8) (2011), 492–492.

41. Danon, Díaz-Guilera and Arenas, Effect of size heterogeneity on community identification in complex networks, Journal of Statistical Mechanics Theory & Experiment 2006(11) (2006), 11010.

42. Arenas, Fernandez and Gomez, Analysis of the structure of complex networks at different resolution levels, New Journal of Physics 10(5) (2008), 4656–4658.

43. Knuth D.E., The Stanford GraphBase: a platform for combinatorial computing, in: ACM/SIGACT-SIAM Symposium on Discrete Algorithms, 25–27 January 1993, Austin, Texas, 1993, pp. 41–43.

44. Coleman and Menzel, The diffusion of innovation among physicians, Sociometry 20(20) (1977), 253–269.

45. Kolaczyk E.D., Statistical Analysis of Network Data, Springer New York, 2009.

46. Duch and Arenas, Community detection in complex networks using extremal optimization, Physical Review E Statistical Nonlinear & Soft Matter Physics 72(2 Pt 2) (2005), 986–1023.

47. Bandyopadhyay, Chowdhary and Sengupta, FOCS: Fast overlapped community search, IEEE Transactions on Knowledge & Data Engineering 27(11) (2015), 2974–2985.

48. Lancichinetti, Fortunato and Radicchi, Benchmark graphs for testing community detection algorithms, Physical Review E Statistical Nonlinear & Soft Matter Physics 78(2) (2008), 046110.

49. Cheng and Kotoulas, Efficient skew handling for outer joins in a cloud computing environment, IEEE Transactions on Cloud Computing, 2015, p. 1.

Table 2. Partitions of the Dolphin dataset obtained with the Newman-modularity algorithm. Each column is one community; the entries are node numbers.

Table 4. Adjusted communities. Each column is one community; the entries are its node numbers.

Table 5. Brief descriptions of the 11 disjoint-community detection algorithms and their parameters

Table 9. Comparison of USI-AUC and modularity evaluations on the labeled Football dataset

Table 12. Parameter settings for LFR benchmark network generation
