Exploiting higher-order patterns for community detection in attributed graphs

Abstract

As a fundamental task in cluster analysis, community detection is crucial for the understanding of complex network systems in many disciplines such as biology and sociology. Recently, due to the increase in the richness and variety of attribute information associated with individual nodes, detecting communities in attributed graphs has become a more challenging problem. Most existing works focus on the similarity between pairwise nodes in terms of both structural and attribute information while ignoring the higher-order patterns involving more than two nodes. In this paper, we explore the possibility of making use of higher-order information in attributed graphs to detect communities. To do so, we first compose tensors to specifically model the higher-order patterns of interest from the aspects of network structures and node attributes, and then propose a novel algorithm to capture these patterns for community detection. Extensive experiments on several real-world datasets with varying sizes and different characteristics of attribute information demonstrate the promising performance of our algorithm.

Keywords

Attributed graph, community detection, clustering, higher-order patterns

Figure 1.

A social example of an attributed graph.

1. Introduction

A graph consists of nodes that represent individual objects and edges that connect nodes to describe the relationships between them [6, 26]. Graphs are commonly used to represent complex network systems and have been widely applied in practice, for example in membrane computing [9, 31], disease diagnosis [17, 18], and industrial system solutions [19, 20]. With the rapid development of information technology, network sizes have increased very quickly, and there has been a concomitant increase in the richness and variety of the attribute information of nodes. Taking social networks as an example, user-generated content provides an alternative way to characterize users, thus facilitating the analysis of social communities [7, 25]. Another example can be found in protein-protein interaction networks, where the functional attributes of proteins are useful for understanding biological mechanisms [12, 15]. A formal representation of such networks is the attributed graph.

Two sources of information are available in an attributed graph. One is the structural, or topological, information of the graph and the other is the attribute information of the nodes in that graph. In particular, the structural information is represented by the set of links between nodes, while the attribute information is the rich content associated with nodes, such as the user-generated content in social networks and the functional attributes of proteins in protein-protein interaction networks. An example of an attributed graph is presented in Fig. 1, where the social network on the left side denotes the structural information describing the topology of that attributed graph, while the user profiles on the right side denote the attribute information associated with each user. As has been pointed out by [21, 32], community analysis has been increasingly used for social, ecological, and other networks. However, the problem of community detection in attributed graphs (CDAG) raises new challenges, as these two sources of information must be taken into account simultaneously. Motivated by the intuition that nodes in the same community should be similar in terms of both structure and attribute information [27], several attempts have been made to solve the CDAG problem, and most of them are either distance-based or model-based.

Distance-based algorithms generally compose an augmented graph by introducing virtual links to connect nodes and their attribute values, and then design different distance metrics so that conventional clustering algorithms, such as Markov and K-Medoids clustering, can be applied directly to detect communities in that augmented graph. Based on both structural and attribute similarities through a unified distance measure, SA-Cluster [36] partitions a given attributed graph into a set of clusters, each of which is a densely connected subgraph where attribute values are homogeneous. In [1], the correlation between attribute sets and dense subgraphs is measured by the proposed statistical significance measures. CODICIL [37] introduces a measure of signal strength between two nodes in the network by fusing their link strength with content similarity and then prunes edges that are locally irrelevant for each node so that standard community discovery algorithms can be applied. Combining content relevance and network structures, FCAN [14] allows the detection of overlapping communities. As a recent attempt in this direction, SToC [2] composes a new heterogeneous distance measure from an alternative view and performs the clustering task for attributed graphs based on bottom-k sketches and traditional graph-theoretic concepts.

Compared to distance-based algorithms, model-based ones consider the detection problem from an alternative view such that the design of specific distance measures can be avoided. In particular, a variety of Bayesian models have been proposed to simulate the generative process of an attributed graph, and the set of communities that most likely generates the given attributed graph is taken as the final solution for the task of community detection. In [29], a discriminative model is developed to alleviate the impact of irrelevant content attributes such that link and content analysis can be combined for community detection. As an accurate and scalable algorithm to solve the CDAG problem, CESNA [10] constructs a statistical model to simulate the interaction between the network structure and the node attributes, thus improving robustness in the presence of noise. To seamlessly handle different types of edges and node attributes, GBAGC [38] provides a general and principled framework and develops an efficient variational method for community detection under this framework. In [34], two models, namely the node-information involved mixed-membership model and the node-information involved latent-feature model, are developed to systematically incorporate additional node information, and the efficiency of these two models is improved by using conjugate priors. Recently, motivated by the fact that node attributes contribute differently to the formation of communities, a three-layer node-attribute-value hierarchical structure is introduced by [13] to describe the attribute information in a flexible and interpretable manner, and TARA is then proposed to solve the optimization problems for community detection.

However, existing algorithms only enforce clustering consistency between pairs of nodes. More specifically, distance-based algorithms introduce a variety of measures to compute the similarity of two nodes in the augmented networks, while model-based algorithms rely on the difference in the probability of belonging to the same community between pairwise nodes when sampling their links and respective attribute values. Yet one should note that emphasizing the close similarity between pairwise nodes cannot guarantee the desired consistency in both structure and attribute information with the other nodes in the same community, especially for many real-world applications, whose attributed graphs often suffer from sparse structures and dispersed distributions of attribute values [3, 35]. It is therefore necessary to consider higher-order patterns involving more than two nodes, from the aspects of both network structures and node attributes, in order to improve the performance of community detection on the CDAG problem.

Recently, due to their ability to represent the higher-dimensional structure of data, tensor models have been preferred for capturing the higher-order connectivity patterns in network data at the level of small subgraphs, or network motifs [4]. For a particular network motif, the clustering framework proposed by [5] is able to detect communities each of which is composed of many instances of that motif. Generalizing from spectral clustering based on random walks, GTSC [30] designs a new stochastic process that models higher-order Markov chains, namely the super-spacey random walk, to simultaneously cluster the rows, columns, and slices of a non-negative three-mode tensor. By viewing tensors as hypergraphs, the co-clustering algorithm proposed by [8] partitions the corresponding hypergraph based on a random sampling technique.

However, few of them can be applied to address the CDAG problem, as the incorporation of attribute information into the higher-order patterns largely remains an open issue. In this paper, we extend previous works [4, 5] to attributed graphs and propose a novel Tensor-based cOmmunity Detection Algorithm (TODA) for efficiently detecting communities based on higher-order patterns available in the structure and attribute information of attributed graphs. Our contributions can be summarized as follows.

We specifically design the higher-order patterns from the aspects of network structures and node attributes, and formulate the CDAG problem as tensor spectral clustering for attributed graphs by modeling these patterns as tensors.

We then present our TODA algorithm to solve the CDAG problem by adopting the theory of spacey random walk, and also introduce a few heuristics to achieve a more efficient performance when applying TODA to large attributed graphs.

2. Preliminaries

Here, an attributed graph is represented with a 3-element tuple as:

$\displaystyle G=\{V,E,\Lambda\}$ (1)

where $V=\{v_{i}\}~(1\leqslant i\leqslant n_{V})$ is the set of all $n_{V}$ nodes, $E=\{e_{ij}\}$ denotes a total of $n_{E}$ links, and $\Lambda=\{\Lambda_{m}\}~(1\leqslant m\leqslant n_{\Lambda})$ consists of $n_{\Lambda}$ attributes that can be associated with each of the nodes in $V$. If there is a link $e_{ij}\in E$, the two nodes $v_{i}$ and $v_{j}$ are connected in the network.

Given an arbitrary attribute $\Lambda_{m}\in\Lambda$, we define its domain, i.e., $\textit{dom}(\Lambda_{m})$, as the set of possible values that can be taken by $\Lambda_{m}$, and $|\textit{dom}(\Lambda_{m})|$ is the size of $\textit{dom}(\Lambda_{m})$. An $n_{V}\times|\textit{dom}(\Lambda_{m})|$ matrix $\textbf{X}_{m}$ is used to describe whether an attribute value of $\Lambda_{m}$ is taken by a node. For instance, if the $p$-th $(1\leqslant p\leqslant|\textit{dom}(\Lambda_{m})|)$ value in $\textit{dom}(\Lambda_{m})$ is taken by $v_{i}$, we have

$\displaystyle x_{ip}^{m}=1$ (2)
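The construction of $\textbf{X}_{m}$ can be illustrated with a minimal sketch. The attribute name, its domain, the node values, and the helper `indicator_matrix` are all hypothetical, indices are 0-based, and we assume that entries not set to 1 are 0 and that a node may take several values of the same attribute.

```python
def indicator_matrix(values, domain):
    """Build X_m: one row per node, one column per value in dom(Lambda_m)."""
    col = {val: p for p, val in enumerate(domain)}
    X = [[0] * len(domain) for _ in values]
    for i, taken in enumerate(values):
        for val in taken:  # assumption: a node may take several values
            X[i][col[val]] = 1
    return X

# Hypothetical attribute "interest" for three users.
domain = ["music", "sports", "travel"]
values = [{"music"}, {"music", "travel"}, {"sports"}]
X = indicator_matrix(values, domain)
for row in X:
    print(row)
```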

For simplicity, we use upper-case underlined boldface letters to denote tensors, and the same non-bold lower-case underlined italic letters stand for their entries. For instance, a three-mode tensor can be denoted as $\underline{\textbf{T}}\in\mathbb{R}^{n_{1}\times n_{2}\times n_{3}}$, where $n_{1}$, $n_{2}$ and $n_{3}$ are the numbers of elements in the corresponding directions. The entry of $\underline{\textbf{T}}$ at index $(i,j,k)$ is a non-negative real number, i.e., $\underline{t}(i,j,k)$, for $1\leqslant i\leqslant n_{1}$, $1\leqslant j\leqslant n_{2}$, and $1\leqslant k\leqslant n_{3}$. $\underline{\textbf{T}}_{:,j,k}\in\mathbb{R}^{n_{1}}$ is the $(j,k)$ column of $\underline{\textbf{T}}$. The $k$-th frontal slice of $\underline{\textbf{T}}$ is denoted as $\underline{\textbf{T}}_{:,:,k}\in\mathbb{R}^{n_{1}\times n_{2}}$.

The CDAG problem to be addressed in this paper is to partition $V$ into $K$ communities $C=\{V_{k}\}$ such that $\bigcup_{k=1}^{K}V_{k}=V$ and $V_{k}\cap V_{l}=\emptyset$ for $1\leqslant k\neq l\leqslant K$ .

3. Tensor-based community detection algorithm

In our TODA algorithm, the CDAG problem is formulated as a $K$ -partition of $V$ by simultaneously considering both the structural and attribute information of $G$ . To solve it, TODA adopts the spectral clustering framework, which was originally proposed by [30] for higher-order network structures. In this section, we show that their algorithm can be extended to attributed graphs.

3.1 Tensor representation of higher-order patterns

We compose the higher-order patterns of $G$ from its structural and attribute information. For the higher-order structural patterns, triangle motifs are preferred, as they are the fundamental units in complex networks [16, 28]. To represent the corresponding third-order structural patterns for triangle motifs, we use a three-mode tensor $\underline{\textbf{T}_{S}}$ defined as:

$\displaystyle\underline{\textbf{T}_{S}}=\big{(}\underline{t_{S}}(i,j,k)\big{)}$ (3)

where $1\leqslant i,j,k\leqslant n_{V}$ and

$\displaystyle\underline{t_{S}}(i,j,k)=\begin{cases}1,&\text{if $e_{ij}$, $e_{jk}$ and $e_{ik}\in E$},\\ 0,&\text{otherwise}.\end{cases}$ (4)

A triangle motif is observed among the nodes $v_{i}$, $v_{j}$ and $v_{k}$ if $\underline{t_{S}}(i,j,k)$ is equal to 1. Moreover, the definition of $\underline{\textbf{T}_{S}}$ can be extended to patterns beyond the third order.
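Since $\underline{\textbf{T}_{S}}$ is typically very sparse, one way to hold it in memory is as a set of index triples whose entry equals 1. The sketch below is an illustration under that assumption, not the authors' implementation; the helper `triangle_tensor` and the toy edge list are hypothetical, and node indices are 0-based.

```python
from itertools import permutations

def triangle_tensor(n, edges):
    """Return the set of (i, j, k) with t_S(i, j, k) = 1 for an undirected graph."""
    adj = [set() for _ in range(n)]
    for i, j in edges:
        adj[i].add(j)
        adj[j].add(i)
    T = set()
    for i in range(n):
        for j in adj[i]:
            if j <= i:
                continue
            # Common neighbours of v_i and v_j close a triangle.
            for k in adj[i] & adj[j]:
                if k > j:  # visit each triangle once, with i < j < k
                    T.update(permutations((i, j, k)))
    return T

# Toy graph: one triangle {0, 1, 2} plus a pendant edge (2, 3).
T_S = triangle_tensor(4, [(0, 1), (1, 2), (0, 2), (2, 3)])
print(sorted(T_S))
```

Each triangle contributes six entries because Eq. (4) is symmetric in $i$, $j$ and $k$.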

Unlike most existing algorithms, which require a conversion from multi-valued attributes to binary attributes in advance, the use of tensors allows us to preserve the similarity information for each of the attributes in $\Lambda$. To indicate the similarity among nodes $v_{i}$, $v_{j}$ and $v_{k}$ in terms of an arbitrary attribute $\Lambda_{m}$, a five-mode tensor $\underline{\textbf{T}_{A}}=\big(\underline{t_{A}}(m,i,j,k,p)\big)$ is adopted. The entry at index $(m,i,j,k,p)$ is defined as:

$\displaystyle\underline{t_{A}}(m,i,j,k,p)=\begin{cases}1,&\text{if $x_{ip}^{m}$, $x_{jp}^{m}$ and $x_{kp}^{m}$ are equal to 1},\\ 0,&\text{otherwise}.\end{cases}$ (5)

In Eq. (5), the case of $\underline{t_{A}}(m,i,j,k,p)=1$ means that the $p$-th value in $\textit{dom}(\Lambda_{m})$ is commonly found in the attribute information of $v_{i}$, $v_{j}$ and $v_{k}$.
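For a single attribute, the entries of $\underline{\textbf{T}_{A}}$ never need to be stored explicitly, since each one is a product of three entries of $\textbf{X}_{m}$. The following sketch assumes a small, made-up indicator matrix and a hypothetical helper `shared_values`; indices are 0-based.

```python
def shared_values(X_m, i, j, k):
    """Return the indices p with t_A(m, i, j, k, p) = 1 for one attribute."""
    return [p for p in range(len(X_m[0]))
            if X_m[i][p] == 1 and X_m[j][p] == 1 and X_m[k][p] == 1]

# Hypothetical indicator matrix: 4 nodes, 3 possible attribute values.
X_m = [
    [1, 0, 1],
    [1, 0, 1],
    [1, 1, 0],
    [0, 1, 0],
]
common_012 = shared_values(X_m, 0, 1, 2)  # nodes 0, 1, 2 share value 0 only
common_013 = shared_values(X_m, 0, 1, 3)  # node 3 shares nothing with 0 and 1
print(common_012, common_013)
```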

3.2 Transition probability tensor

Since all entries in $\underline{\textbf{T}_{S}}$ and $\underline{\textbf{T}_{A}}$ involve three nodes, we incorporate these higher-order patterns into a second-order Markov chain. The transition probability tensor of this chain is denoted as $\underline{\textbf{P}}\in\mathbb{R}^{n_{V}\times n_{V}\times n_{V}}$, which is a three-mode tensor. In $\underline{\textbf{P}}$, each entry $\underline{p}(i,j,k)$ is the probability of moving to the node $v_{i}$ given the current node $v_{j}$ and the previous node $v_{k}$. Hence, we have

$\displaystyle\underline{p}(i,j,k)=\textrm{Prob}(Z_{t+1}=v_{i}\,|\,Z_{t}=v_{j},Z_{t-1}=v_{k})$ (6)

where $Z_{t}$ represents the node visited at the time $t$ . In the context of $G$ , the value of $\underline{p}(i,j,k)$ can be computed using Eq. (7).

$\displaystyle\underline{p}(i,j,k)=\frac{\underline{p_{S}}(i,j,k)+\underline{p_{A}}(i,j,k)}{2}$ (7)

where

$\displaystyle\underline{p_{S}}(i,j,k)=\frac{\underline{t_{S}}(i,j,k)}{\sum_{i=1}^{n_{V}}\underline{t_{S}}(i,j,k)}$ (8)

$\displaystyle\underline{p_{A}}(i,j,k)=\frac{1}{n_{\Lambda}}\sum_{m=1}^{n_{\Lambda}}\frac{\sum_{p}\underline{t_{A}}(m,i,j,k,p)}{\sum_{i,p}\underline{t_{A}}(m,i,j,k,p)}$ (9)

With respect to Eq. (7), $\underline{\textbf{P}}$ is column stochastic, thus eligible to be considered as a Markov chain with states $Z_{t}$ and $Z_{t-1}$ . Furthermore, according to $\underline{p_{S}}(i,j,k)$ and $\underline{p_{A}}(i,j,k)$ , the probability of visiting $v_{i}$ at time $t+1$ is determined by the amount of higher-order patterns observed among $v_{i}$ , $v_{j}$ and $v_{k}$ from structural and attribute perspectives.

However, in many practical applications, the instances of $G$ are normally sparse in terms of both structure and attribute information. That is to say, some pairs of nodes may encounter the situation of $\sum_{i=1}^{n_{V}}\underline{t_{S}}(i,j,k)=0$, or $\sum_{i,p}\underline{t_{A}}(m,i,j,k,p)=0$, or both. To address this issue, we collect the states $(v_{j},v_{k})$ for which the corresponding denominators vanish into the following sets.

$\displaystyle\Gamma_{S}=\big\{(v_{j},v_{k})\,\big|\,\sum_{i=1}^{n_{V}}\underline{t_{S}}(i,j,k)=0\big\}$ (10)

$\displaystyle\Gamma_{A}=\big\{(v_{j},v_{k},\Lambda_{m})\,\big|\,\sum_{i,p}\underline{t_{A}}(m,i,j,k,p)=0\big\}$ (11)

Hence, regarding $\underline{p_{S}}(i,j,k)$, we simply set $\underline{p_{S}}(i,j,k)=0$ when $(v_{j},v_{k})\in\Gamma_{S}$. For $\underline{p_{A}}(i,j,k)$, a more appropriate definition is given by

$\displaystyle\underline{p_{A}}(i,j,k)=\begin{cases}\frac{1}{|\Lambda_{+}(j,k)|}\sum_{\Lambda_{m}\in\Lambda_{+}(j,k)}\frac{\sum_{p}\underline{t_{A}}(m,i,j,k,p)}{\sum_{i,p}\underline{t_{A}}(m,i,j,k,p)},&\text{if }|\Lambda_{+}(j,k)|>0,\\ 0,&\text{otherwise}.\end{cases}$ (12)

where

$\displaystyle\Lambda_{+}(j,k)=\big\{\Lambda_{m}\,\big|\,(v_{j},v_{k},\Lambda_{m})\notin\Gamma_{A}\big\}$ (13)

and $|\Lambda_{+}(j,k)|$ is the size of $\Lambda_{+}(j,k)$. In doing so, the case of $\sum_{i,p}\underline{t_{A}}(m,i,j,k,p)=0$ is disregarded in Eq. (12).
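To make the zero-denominator handling concrete, the sketch below computes one column $\underline{p}(\cdot,j,k)$ for a toy graph by averaging the structural and attribute parts as in Eq. (7), with the attribute part averaged only over the attributes in $\Lambda_{+}(j,k)$. It is an illustration under simplifying assumptions (dense loops, 0-based indices, a hypothetical helper `transition_column`, made-up data), not the authors' implementation.

```python
from itertools import permutations

def transition_column(j, k, n, T_S, X_list):
    """Return [p(i, j, k) for i in range(n)] for a fixed pair (v_j, v_k)."""
    # Structural part: normalise the (j, k) column of T_S, or keep it at zero
    # when the column is empty, i.e. (v_j, v_k) is in Gamma_S.
    s = [1.0 if (i, j, k) in T_S else 0.0 for i in range(n)]
    s_tot = sum(s)
    p_S = [v / s_tot for v in s] if s_tot > 0 else [0.0] * n

    # Attribute part: average only over attributes with a non-empty column.
    p_A = [0.0] * n
    active = 0  # size of Lambda_+(j, k)
    for X in X_list:
        a = [sum(X[i][p] * X[j][p] * X[k][p] for p in range(len(X[0])))
             for i in range(n)]
        a_tot = sum(a)
        if a_tot > 0:
            active += 1
            for i in range(n):
                p_A[i] += a[i] / a_tot
    if active:
        p_A = [v / active for v in p_A]
    return [(ps + pa) / 2 for ps, pa in zip(p_S, p_A)]

# Toy data: a single triangle and one attribute with two possible values.
T_S = set(permutations((0, 1, 2)))
X_list = [[[1, 0], [1, 0], [1, 1]]]
column = transition_column(1, 2, 3, T_S, X_list)
print(column)
```

In this example both parts are non-empty for $(v_{1},v_{2})$, so the resulting column sums to one.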

Since $\underline{\textbf{P}}$ can be determined by Eq. (7), it is intuitive for us to convert the corresponding second-order Markov chain into the first-order Markov chain by considering the full state space of $(v_{j},v_{k})$ . However, the stationary distribution computed by doing so requires the memory storage of $O(n_{V}^{2})$ , which is infeasible especially for large attributed graphs. Next, we see how to alternatively approximate the stationary distribution in a space-friendly manner.

3.3 Approximation of the second-order Markov chain

Figure 2.

Two different stochastic processes. (a) regular second-order Markov chain; (b) spacey random walk process.

We now describe the stochastic process by following the theory of spacey random walk proposed by [30, 5]. Generally speaking, in the spacey random walk process, once the process visits $Z_{t}$ at time $t$ , it spaces out and forgets its second last state, i.e., $Z_{t-1}$ . Instead, a state, denoted as $Y_{t}$ , is drawn from the sequence of past states, i.e., $H_{t}=\{Z_{1},\dots,Z_{t}\}$ with the probability given by

$\displaystyle\text{Prob}(Y_{t}=v_{k}\,|\,H_{t})=\frac{1}{n_{V}+t}\left(1+\sum_{r=1}^{t}\text{Ind}\{Z_{r}=v_{k}\}\right)$ (14)

where $\textrm{Ind}\{\cdot\}$ is the indicator function. In particular, the value of $\textrm{Ind}\{Z_{r}=v_{k}\}$ is 1 if the state $Z_{r}$ at time $r$ is $v_{k}$, and 0 otherwise. After that, the process transitions to $Z_{t+1}$ as a second-order Markov chain with the last two states $Z_{t}$ and $Y_{t}$, as described in Fig. 2. Formally, the transition probabilities of this stochastic process are defined as below.

$\displaystyle\text{Prob}(Z_{t+1}=v_{i}\,|\,Z_{t}=v_{j},Y_{t})=(1-\alpha)u_{i}+\alpha\sum_{k}\text{Prob}(Z_{t+1}=v_{i}\,|\,Z_{t}=v_{j},Y_{t}=v_{k},H_{t})\times\text{Prob}(Y_{t}=v_{k}\,|\,H_{t})$ (15)

where $\alpha$ is a constant, $u_{i}$ is the teleportation probability, and $\textrm{Prob}(Z_{t+1}=v_{i}\,|\,Z_{t}=v_{j},Y_{t}=v_{k},H_{t})=\textrm{Prob}(Y_{t}=v_{i}\,|\,H_{t})$ if $(v_{j},v_{k})\in\Gamma_{S}\cap\Gamma_{A}$; otherwise, $\textrm{Prob}(Z_{t+1}=v_{i}\,|\,Z_{t}=v_{j},Y_{t}=v_{k},H_{t})=\underline{p}(i,j,k)$.

One should note that a state $(v_{j},v_{k})\in\Gamma_{S}\cap\Gamma_{A}$ represents an undefined transition, where the corresponding columns of $\underline{\textbf{T}_{S}}$ and $\underline{\textbf{T}_{A}}$ are entirely zero. Instead of filling in $\underline{\textbf{T}_{S}}_{:,j,k}$ and $\underline{\textbf{T}_{A}}_{:,j,k}$ with stochastic vectors, we randomly select a state from the history according to Eq. (14).
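The distribution from which $Y_{t}$ is drawn can be computed directly from visit counts. The sketch below assumes 0-based node indices and a hypothetical helper `history_distribution`; every node receives one pseudo-count, so nodes that have never been visited can still be selected.

```python
from collections import Counter

def history_distribution(history, n):
    """Return [Prob(Y_t = v_k | H_t) for k in range(n)] with one pseudo-count per node."""
    counts = Counter(history)
    t = len(history)
    return [(1 + counts[k]) / (n + t) for k in range(n)]

# Hypothetical walk over 3 nodes that visited v_0 once and v_2 twice.
probs = history_distribution([0, 2, 2], n=3)
print(probs)
```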

To approximate the second-order Markov chain defined by Eq. (6), an equivalent first-order Markov chain can be derived from the stationary distribution of the spacey random walk by following a procedure in [30]. Assuming that M and x are the transition matrix of the first-order Markov chain and the corresponding stationary distribution respectively, the equations of M and x are then given by

$\displaystyle\textbf{M}=\underline{\textbf{P}}[\textbf{x}]+\textbf{x}(\textbf{e}^{T}-\textbf{e}^{T}\underline{\textbf{P}}[\textbf{x}])$ (16)

$\displaystyle\textbf{x}=\alpha\underline{\textbf{P}}\textbf{x}^{2}+\alpha(1-\lVert\underline{\textbf{P}}\textbf{x}^{2}\rVert_{1})\textbf{x}+(1-\alpha)\textbf{u}$ (17)

where

$\displaystyle\underline{\textbf{P}}[\textbf{x}]=\sum_{k=1}^{n_{V}}x_{k}\underline{\textbf{P}}_{:,:,k}$ (18)

Hence, we can use the iterative fixed-point algorithm to yield x with Eq. (17) and then obtain M with Eq. (16). Given M and x, the stochastic process of the first-order Markov chain can be determined and we then move to the part of solving the CDAG problem under the framework of spectral clustering.
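A minimal sketch of this step is given below, assuming that $\underline{\textbf{P}}$ is stored sparsely as a dictionary from $(i,j,k)$ to its value and that $(\underline{\textbf{P}}\textbf{x}^{2})_{i}=\sum_{j,k}\underline{p}(i,j,k)x_{j}x_{k}$. The function names, the tolerance, and the iteration cap are hypothetical choices, and convergence of the plain iteration is not guaranteed in general.

```python
from itertools import permutations

def apply_P(P, x):
    """Compute (P x^2)_i = sum_{j,k} p(i, j, k) * x_j * x_k for a sparse P."""
    y = [0.0] * len(x)
    for (i, j, k), v in P.items():
        y[i] += v * x[j] * x[k]
    return y

def stationary(P, u, alpha=0.8, tol=1e-12, max_iter=1000):
    """Fixed-point iteration on x = a*P x^2 + a*(1 - |P x^2|_1)*x + (1 - a)*u."""
    x = list(u)
    for _ in range(max_iter):
        y = apply_P(P, x)
        gap = 1.0 - sum(y)  # mass lost on all-zero columns of P
        new = [alpha * yi + alpha * gap * xi + (1 - alpha) * ui
               for yi, xi, ui in zip(y, x, u)]
        if sum(abs(a - b) for a, b in zip(new, x)) < tol:
            return new
        x = new
    return x

def first_order_matrix(P, x):
    """Build M = P[x] + x (e^T - e^T P[x]) with P[x] = sum_k x_k P[:, :, k]."""
    n = len(x)
    Px = [[0.0] * n for _ in range(n)]
    for (i, j, k), v in P.items():
        Px[i][j] += x[k] * v
    col = [sum(Px[i][j] for i in range(n)) for j in range(n)]
    return [[Px[i][j] + x[i] * (1.0 - col[j]) for j in range(n)] for i in range(n)]

# Toy tensor: a single triangle, so p(i, j, k) = 1 for every permutation.
P = {idx: 1.0 for idx in permutations((0, 1, 2))}
u = [1.0 / 3.0] * 3
x = stationary(P, u, alpha=0.8)
M = first_order_matrix(P, x)
print(x)
print(M)
```

By construction, each update keeps $\textbf{x}$ a probability vector and every column of $\textbf{M}$ sums to one, which the toy example can be used to check.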

3.4 Spectral clustering framework

Since M is the transition matrix of an equivalent first-order Markov chain, we first compute the second left eigenvector of it by following the steps of spectral clustering. After that, we sort the nodes by their corresponding values in the eigenvector in ascending order and apply a sweep cut to the sorted nodes.

Given that $V_{k}$ is the current node set to be partitioned and $v_{i}\in V_{k}$ is the node corresponding to the $i$-th smallest entry of the second left eigenvector, the node sets $Q_{i}=\{v_{1},\ldots,v_{i}\}$ and $\bar{Q}_{i}=V_{k}-Q_{i}$ constitute one of the candidate partitions of $V_{k}$. To assess the quality of this partition, we introduce the cut and volume measures for attributed graphs from the structural and attribute perspectives.

When we evaluate the partition $Q_{i}$ by only considering the structural information of $G$ , the cut and volume measures are defined as follows.

$\displaystyle\textit{cut}_{S}(Q_{i})=\sum_{v_{i},v_{j},v_{k}\in V_{k}}\underline{t_{S}}(i,j,k)-\sum_{v_{i},v_{j},v_{k}\in Q_{i}}\underline{t_{S}}(i,j,k)-\sum_{v_{i},v_{j},v_{k}\in\bar{Q}_{i}}\underline{t_{S}}(i,j,k)$ (19)

$\displaystyle\textit{vol}_{S}(Q_{i})=\sum_{v_{i}\in Q_{i}}\underline{t_{S}}(i,j,k)+\sum_{v_{j}\in Q_{i}}\underline{t_{S}}(i,j,k)+\sum_{v_{k}\in Q_{i}}\underline{t_{S}}(i,j,k)$ (20)

In particular, since the entries of $\underline{\textbf{T}_{S}}$ encode triangle motifs, the value of $\textit{cut}_{S}(Q_{i})$ counts the triangle entries that involve nodes from both $Q_{i}$ and $\bar{Q}_{i}$, and the value of $\textit{vol}_{S}(Q_{i})$ counts the triangle entries in which the nodes of $Q_{i}$ participate. Then the structure conductance measures the ratio of $\textit{cut}_{S}(Q_{i})$ to the smaller of $\textit{vol}_{S}(Q_{i})$ and $\textit{vol}_{S}(\bar{Q}_{i})$, and its definition is given by

$\displaystyle\textit{con}_{S}(Q_{i})=\frac{\textit{cut}_{S}(Q_{i})}{\textrm{min}(\textit{vol}_{S}(Q_{i}),\textit{vol}_{S}(\bar{Q}_{i}))}$ (21)
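The structure conductance can be evaluated by brute force on a small example. The sketch below rests on two assumptions that are ours rather than the source's: the free indices in the volume sums range over the current node set, so that the volume counts, for every non-zero entry, how many of its three indices fall in $Q_{i}$; and a zero denominator is mapped to the worst value 1. The helper name and the toy graph are hypothetical.

```python
from itertools import permutations

def structure_conductance(T_S, Q, V):
    """Brute-force cut, volume and conductance of the split (Q, V - Q)."""
    V = set(V)
    Q = set(Q)
    Qbar = V - Q
    entries = [e for e in T_S if set(e) <= V]
    inside = sum(1 for e in entries if set(e) <= Q)
    outside = sum(1 for e in entries if set(e) <= Qbar)
    cut = len(entries) - inside - outside
    # Assumed reading of the volume: index slots of non-zero entries that lie in the set.
    vol_Q = sum(sum(1 for v in e if v in Q) for e in entries)
    vol_Qbar = sum(sum(1 for v in e if v in Qbar) for e in entries)
    denom = min(vol_Q, vol_Qbar)
    return cut / denom if denom > 0 else 1.0  # convention of this sketch only

# Toy graph with triangles {0,1,2}, {3,4,5} and a bridging triangle {2,3,4}.
T_S = set()
for tri in [(0, 1, 2), (3, 4, 5), (2, 3, 4)]:
    T_S.update(permutations(tri))

con = structure_conductance(T_S, Q=[0, 1, 2], V=range(6))
print(con)
```

Only the bridging triangle is cut by this split, which gives 6 cut entries against a smaller volume of 24.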

Regarding the quality of $Q_{i}$ in terms of attribute information, the cut and volume measures for the attribute $\Lambda_{m}$ are defined as:

$\displaystyle\textit{cut}_{A}(Q_{i},\Lambda_{m})=\sum_{p=1}^{|\textit{dom}(\Lambda_{m})|}\sum_{v_{i},v_{j},v_{k}\in V_{k}}\underline{t_{A}}(m,i,j,k,p)-\sum_{v_{i},v_{j},v_{k}\in Q_{i}}\underline{t_{A}}(m,i,j,k,p)-\sum_{v_{i},v_{j},v_{k}\in\bar{Q}_{i}}\underline{t_{A}}(m,i,j,k,p)$ (22)

$\displaystyle\textit{vol}_{A}(Q_{i},\Lambda_{m})=\sum_{p=1}^{|\textit{dom}(\Lambda_{m})|}\sum_{v_{i}\in V_{k}}\underline{t_{A}}(m,i,j,k,p)+\sum_{v_{j}\in Q_{i}}\underline{t_{A}}(m,i,j,k,p)+\sum_{v_{k}\in\bar{Q}_{i}}\underline{t_{A}}(m,i,j,k,p)$ (23)

According to Eq. (22), the value of $\textit{cut}_{A}(Q_{i},\Lambda_{m})$ is the number of attribute values that are shared by the nodes of $Q_{i}$ and those of $\bar{Q}_{i}$ in the domain of $\Lambda_{m}$ . The value of $vol_{A}(Q_{i},\Lambda_{m})$ is the number of attribute values that are shared by the nodes of $Q_{i}$ and those of $G$ in the domain of $\Lambda_{m}$ . Then the attribute conductance for $\Lambda_{m}$ is the ratio of $cut_{A}(Q_{i},\Lambda_{m})$ to $vol_{A}(Q_{i},\Lambda_{m})$ or $vol_{A}(\bar{Q}_{i},\Lambda_{m})$ and it can be computed as below.

$\displaystyle\textit{con}_{A}(Q_{i},\Lambda_{m})=\frac{\textit{cut}_{A}(Q_{i},\Lambda_{m})}{\textrm{min}(\textit{vol}_{A}(Q_{i},\Lambda_{m}),\textit{vol}_{A}(\bar{Q}_{i},\Lambda_{m}))}$ (24)

Concerning the attribute preferences in forming communities, the overall attribute conductance, denoted as $\textit{con}_{A}(Q_{i})$ , is determined by the attribute with the smallest value of $\textit{con}_{A}(Q_{i},\Lambda_{m})$ . Hence, we have

$\displaystyle\textit{con}_{A}(Q_{i})=\textrm{min}(\textit{con}_{A}(Q_{i},\Lambda_{1}),\ldots,\textit{con}_{A}(Q_{i},\Lambda_{n_{\Lambda}}))$ (25)

Heuristically, if we only consider the attribute information, $V_{k}$ is more likely to be partitioned into $Q_{i}$ and $\bar{Q}_{i}$ by the attribute that obtains the smallest value of $\textit{con}_{A}(Q_{i},\Lambda_{m})$ .

Once we obtain $\textit{con}_{S}(Q_{i})$ and $\textit{con}_{A}(Q_{i})$ , the conductance of $Q_{i}$ in terms of both structure and attribute information is then given by

$\displaystyle\textit{con}(Q_{i})=\frac{\textit{con}_{S}(Q_{i})+\textit{con}_{A}(Q_{i})}{2}$ (26)

One should note that the definitions of $\textit{con}_{S}(Q_{i})$ and $\textit{con}_{A}(Q_{i})$ guarantee that $\textit{con}_{S}(Q_{i})\in[0,1]$ and $\textit{con}_{A}(Q_{i})\in[0,1]$, which in turn ensures that $\textit{con}(Q_{i})\in[0,1]$. The quality of $Q_{i}$ can therefore be assessed by its conductance without considering the difference in magnitude between $\textit{con}_{S}(Q_{i})$ and $\textit{con}_{A}(Q_{i})$. The smaller the conductance of $Q_{i}$ is, the better the quality of $Q_{i}$ will be.

For each node $v_{i}$ in $V_{k}$, we evaluate the partition into $Q_{i}$ and $\bar{Q}_{i}$ according to the conductance measure, and the partition with the smallest conductance is selected as the best solution to split $V_{k}$. A complete description of TODA is presented in Algorithm 1.

Algorithm 1. A complete description of TODA
Input: $G$, $K$, u, $\alpha$
Output: $\bigcup_{k=1}^{K}V_{k}$
1: Initialize $C=\emptyset$
2: Set $V_{\textit{max}}=V$ as the current node set to be partitioned
3: repeat
4: Set $\underline{\textbf{T}_{S}}$ and $\underline{\textbf{T}_{A}}$ for $V_{\textit{max}}$ with Eqs (3) and (5) respectively
5: Set $\underline{\textbf{P}}$ with Eq. (7)
6: Obtain x and M using the iterative fixed-point algorithm with Eqs (17) and (16) respectively
7: Compute the second left eigenvector of M
8: Perform sweep cut and obtain the partition set $\{Q,\bar{Q}\}$ with the best quality given by Eq. (26)
9: Remove $V_{\textit{max}}$ from $C$ if $V_{\textit{max}}\in C$
10: $C=C\cup\{Q,\bar{Q}\}$
11: $V_{\textit{max}}=$ the set of nodes in the largest community of $C$
12: until size of $C=K$
13: Return $C$
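The sweep cut step can be sketched as a scan over the prefixes of the sorted node order. In the sketch below, the structure and attribute conductance values are made-up numbers supplied through lookup tables purely for illustration; in TODA they would come from the cut and volume measures defined above. The helpers `combined` and `sweep_cut` are hypothetical names.

```python
def combined(con_S, con_A_per_attribute):
    """Average the structure conductance with the smallest attribute conductance."""
    return (con_S + min(con_A_per_attribute)) / 2

def sweep_cut(order, conductance):
    """Return (Q, Q_bar, value) for the prefix of `order` with the smallest conductance."""
    best_i, best_val = None, float("inf")
    for i in range(1, len(order)):  # keep both sides non-empty
        val = conductance(order[:i])
        if val < best_val:
            best_i, best_val = i, val
    return order[:best_i], order[best_i:], best_val

# Hypothetical conductance values, indexed by prefix length (illustration only).
toy_con_S = {1: 0.9, 2: 0.6, 3: 0.25, 4: 0.5, 5: 0.8}
toy_con_A = {1: [0.8, 0.7], 2: [0.5, 0.6], 3: [0.3, 0.1], 4: [0.4, 0.6], 5: [0.9, 0.9]}

def toy_conductance(prefix):
    i = len(prefix)
    return combined(toy_con_S[i], toy_con_A[i])

order = [0, 1, 2, 3, 4, 5]  # nodes sorted by their eigenvector entries
Q, Q_bar, value = sweep_cut(order, toy_conductance)
print(Q, Q_bar, value)
```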

3.5 Incremental update of conductance measure

When computing the conductance with Eq. (26), we find that it takes $O(\beta n_{V})$ time, where $\beta$ is the number of non-zeros in $\underline{\textbf{P}}$, which constrains the efficiency of TODA when it is applied to large attributed graphs. Since $Q_{i+1}$ differs from $Q_{i}$ by only one node, we introduce an incremental update for computing the conductance so that the time can be reduced to $O(\beta+n_{E}n_{V})$, given that $\sum_{m}\lvert\textit{dom}(\Lambda_{m})\rvert\ll n_{E}$.

For two consecutive partitions $Q_{i-1}$ and $Q_{i}$ ($i>2$), assuming that $Q_{i-1}\cup\{v_{i}\}=Q_{i}$, we only need to consider the changes made by $v_{i}$. In particular, when computing the structure conductance, instead of computing $\textit{cut}_{S}(Q_{i})$ and $\textit{vol}_{S}(Q_{i})$ with Eqs (19) and (20) respectively, the incremental update of $\textit{cut}_{S}(Q_{i})$ and $\textit{vol}_{S}(Q_{i})$ only requires the results of $\sum_{v_{j},v_{k}\in Q_{i-1}}\underline{t_{S}}(i,j,k)$ and $\sum_{v_{j},v_{k}\in\bar{Q}_{i-1}}\underline{t_{S}}(i,j,k)$, which can be obtained in $O(n_{E})$ time. Hence, after some algebraic manipulations, $\textit{cut}_{S}(Q_{i})$ and $\textit{vol}_{S}(Q_{i})$ can be updated by

$\displaystyle\textit{cut}_{S}(Q_{i})=\textit{cut}_{S}(Q_{i-1})+3\left(\sum_{v_{j},v_{k}\in\bar{Q}_{i-1}}\underline{t_{S}}(i,j,k)-\sum_{v_{j},v_{k}\in Q_{i-1}}\underline{t_{S}}(i,j,k)\right)$ (27)

$\displaystyle\textit{vol}_{S}(Q_{i})=\textit{vol}_{S}(Q_{i-1})+3\sum_{v_{j},v_{k}\in\bar{Q}_{i-1}}\underline{t_{S}}(i,j,k)$ (28)

One should note that, since attributed graphs are undirected, we have

$\displaystyle\sum_{v_{j},v_{k}\in\bar{Q}_{i-1}}\underline{t_{S}}(i,j,k)=\sum_{v_{i},v_{k}\in\bar{Q}_{i-1}}\underline{t_{S}}(i,j,k)=\sum_{v_{i},v_{j}\in\bar{Q}_{i-1}}\underline{t_{S}}(i,j,k)$ (29)

In this regard, there is no need for us to compute $\sum_{v_{i},v_{k}\in\bar{Q}_{i-1}}\underline{t_{S}}(i,j,k)$ and $\sum_{v_{i},v_{j}\in\bar{Q}_{i-1}}\underline{t_{S}}(i,j,k)$. That is why we use a factor of three in Eqs (27) and (28) to avoid unnecessary computation.

Similarly, we can also update $\textit{cut}_{A}(Q_{i},\Lambda_{m})$ and $\textit{vol}_{A}(Q_{i},\Lambda_{m})$ in an incremental manner, which takes $O(\sum_{m}\lvert\textit{dom}(\Lambda_{m})\rvert)$ time. The update rules are given by Eqs (30) and (31) respectively.

$\displaystyle\textit{cut}_{A}(Q_{i},\Lambda_{m})=\textit{cut}_{A}(Q_{i-1},\Lambda_{m})+3\sum_{p}\left(\sum_{v_{j},v_{k}\in\bar{Q}_{i-1}}\underline{t_{A}}(m,i,j,k,p)-\sum_{v_{j},v_{k}\in Q_{i-1}}\underline{t_{A}}(m,i,j,k,p)\right)$ (30)

$\displaystyle\textit{vol}_{A}(Q_{i},\Lambda_{m})=\textit{vol}_{A}(Q_{i-1},\Lambda_{m})+3\sum_{p}\sum_{v_{j},v_{k}\in\bar{Q}_{i-1}}\underline{t_{A}}(m,i,j,k,p)$ (31)
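As a sanity check on the incremental rule for the structural cut, the sketch below compares it with the brute-force definition on a toy graph. Only the cut update is checked here; the sparse set representation of $\underline{\textbf{T}_{S}}$, the helper names, and the 0-based node order are assumptions of this illustration.

```python
from itertools import permutations

def cut_bruteforce(T, Q, V):
    """All entries in V minus those fully inside Q minus those fully inside V - Q."""
    V, Q = set(V), set(Q)
    Qbar = V - Q
    entries = [e for e in T if set(e) <= V]
    inside = sum(1 for e in entries if set(e) <= Q)
    outside = sum(1 for e in entries if set(e) <= Qbar)
    return len(entries) - inside - outside

def pair_sum(T, i, S):
    """Sum of t_S(i, j, k) over v_j, v_k in S for a fixed first index i."""
    return sum(1 for (a, b, c) in T if a == i and b in S and c in S)

def cut_incremental(prev_cut, T, i, Q_prev, V):
    """Update the cut when v_i moves from the complement into Q_prev."""
    Q_prev = set(Q_prev)
    Qbar_prev = set(V) - Q_prev
    return prev_cut + 3 * (pair_sum(T, i, Qbar_prev) - pair_sum(T, i, Q_prev))

# Toy graph with triangles {0,1,2}, {3,4,5} and {2,3,4}.
T_S = set()
for tri in [(0, 1, 2), (3, 4, 5), (2, 3, 4)]:
    T_S.update(permutations(tri))

order = [0, 1, 2, 3, 4, 5]
incremental, brute = [], []
cut = 0  # cut of the empty prefix
for pos, v in enumerate(order[:-1]):
    cut = cut_incremental(cut, T_S, v, order[:pos], order)
    incremental.append(cut)
    brute.append(cut_bruteforce(T_S, order[:pos + 1], order))
print(incremental, brute)
```

The two sequences agree on this example, which reflects the counting argument behind the factor of three: every entry containing $v_{i}$ appears once for each of the three positions that $v_{i}$ can occupy.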

3.6 Computational complexity

For each iteration that partitions $V_{\textit{max}}$ in Algorithm 1, the time complexity of Lines 6–7 is $O(\beta)$. Line 8 calculates the cuts and volumes for the sweep cut in spectral clustering, and this procedure dominates the running time with a complexity of $O(\beta+n_{V}n_{E})$ based on our analysis in Section 3.5. To summarize, the time complexity of each iteration is $O(\beta+n_{V}n_{E})$. Since the number of iterations needed to generate $K$ clusters is $K-1$, the overall complexity of TODA is $O(K\beta+Kn_{V}n_{E})$. One should note that the maximum value of $\beta$ is $n_{V}^{3}$, which is rarely encountered in practice.

4. Experiment results

To evaluate the performance of TODA, we performed a series of extensive experiments on two practical applications, namely document classification and social community detection. TODA was compared with several state-of-the-art algorithms specifically developed for solving the CDAG problem: CODICIL [37], CESNA [10], GBAGC [38] and niMM [34]. Among them, CODICIL is a distance-based algorithm while CESNA, GBAGC and niMM are model-based. Moreover, we also selected a classical clustering algorithm, Fuzzy c-means (FCM), for comparison, as it can be used to define a baseline performance level. Regarding the implementation of FCM, we adopted the Euclidean distance measure to calculate the distance between pairwise nodes in the domain of $\Lambda$.

4.1 Evaluation metrics

To determine to what extent the communities detected by the different algorithms matched with the ground truth, we adopted two evaluation metrics, NMI and Accuracy. The reason why these two metrics were chosen is that they are both very widely used to evaluate performance of clustering algorithms [10, 14].

NMI is an acronym for Normalized Mutual Informa-tion, which is an information-theoretic measure that can compute the degree of matching between the detected communities and ground truth. Assuming that $\textbf{Z}=\{Z_{k}\}(1\leqslant k\leqslant K)$ is a known set of ground truth communities, NMI is then defined as:

$\displaystyle\textit{NMI}=\frac{\sum_{k=1}^{K}\sum_{l=1}^{K}n_{V_{k},Z_{l}}\log\left(\frac{n_{V}\,n_{V_{k},Z_{l}}}{n_{V_{k}}\,n_{Z_{l}}}\right)}{\sqrt{\left(\sum_{k=1}^{K}n_{V_{k}}\log\frac{n_{V_{k}}}{n_{V}}\right)\left(\sum_{l=1}^{K}n_{Z_{l}}\log\frac{n_{Z_{l}}}{n_{V}}\right)}}$ (32)

where $n_{V_{k}}$ is the number of nodes in $V_{k}$ , $n_{Z_{l}}$ is the number of nodes in $Z_{l}$ , and $n_{V_{k},Z_{l}}$ is the number of common nodes shared by $V_{k}$ and $Z_{l}$ .
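As an illustration of Eq. (32), the following minimal sketch computes NMI from two per-node label lists. The function name `nmi`, the label-list encoding, and the convention of returning 0 when the denominator vanishes are assumptions made for this example and are not taken from the TODA implementation.

```python
import math
from collections import Counter


def nmi(detected, truth):
    """NMI following Eq. (32); inputs are per-node community labels (assumed encoding)."""
    n = len(detected)                         # n_V
    n_vk = Counter(detected)                  # n_{V_k}
    n_zl = Counter(truth)                     # n_{Z_l}
    n_joint = Counter(zip(detected, truth))   # n_{V_k, Z_l}, non-empty overlaps only

    # Numerator: empty overlaps are skipped, i.e. 0 * log(0) is treated as 0.
    num = sum(c * math.log(n * c / (n_vk[k] * n_zl[l]))
              for (k, l), c in n_joint.items())

    # Both sums are non-positive, so their product is non-negative.
    h_v = sum(c * math.log(c / n) for c in n_vk.values())
    h_z = sum(c * math.log(c / n) for c in n_zl.values())
    den = math.sqrt(h_v * h_z)

    # Assumption: if either partition is a single community, report 0.
    return num / den if den > 0 else 0.0


identical = nmi([0, 0, 1, 1], [1, 1, 0, 0])    # same partition, labels renamed
independent = nmi([0, 0, 1, 1], [0, 1, 0, 1])  # partitions carry no shared information
```

Because the measure depends only on the overlap counts, renaming the community labels leaves the score unchanged, which is why no mapping function is needed for NMI.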

As for the Accuracy measure, we first need to find the mapping function $f:V_{k}\mapsto Z_{l}$ . To determine $f$ , we compute $n_{V_{k},Z_{l}}$ for all combinations of $V_{k}$ and $Z_{l}$ and find the best-matched combinations through an iterative process. In each iteration, we take the remaining combination with the largest value of $n_{V_{k},Z_{l}}$ and add the mapping $V_{k}\mapsto Z_{l}$ to $f$ . Once the mapping is added, $V_{k}$ and $Z_{l}$ are omitted from future iterations. The process ends when each $V_{k}$ has a unique match in Z. Once $f$ is determined, the Accuracy measure is defined as in Eq. (33).

$\displaystyle\textit{Accuracy}=\frac{\sum_{k=1}^{K}n_{V_{k},f(V_{k})}}{\sum_{k=1}^{K}n_{V_{k}}}$ (33)
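The greedy matching described above and Eq. (33) can be sketched as follows. The function name `greedy_accuracy`, the label-list encoding, and the tie-breaking rule (first pair encountered wins) are assumptions of this example; the text does not specify how ties are resolved.

```python
from collections import Counter


def greedy_accuracy(detected, truth):
    """Greedy one-to-one matching of detected to ground truth communities, then Eq. (33)."""
    overlap = Counter(zip(detected, truth))   # n_{V_k, Z_l} for non-empty overlaps
    mapping = {}                              # f: V_k -> Z_l
    used_truth = set()

    # Visit pairs from the largest overlap downwards; the sort is stable,
    # so ties keep their order of first appearance (assumed tie-break).
    for (k, l), _count in sorted(overlap.items(), key=lambda kv: -kv[1]):
        if k not in mapping and l not in used_truth:
            mapping[k] = l
            used_truth.add(l)

    # A detected community left without a partner contributes 0 matched nodes.
    matched = sum(overlap[(k, l)] for k, l in mapping.items())
    return matched / len(detected), mapping


acc, f = greedy_accuracy([0, 0, 0, 1, 1, 2], ["a", "a", "b", "b", "b", "b"])
```

In this toy input, community 2 overlaps only with "b", which is already taken by community 1, so it stays unmatched and the score is 4/6.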

According to the definitions of NMI and Accuracy, their values are larger if the communities detected from $G$ match better with the ground truth Z. If $\{V_{k}\}$ match perfectly with those in Z, both NMI and Accuracy will have the value 1, which is the maximum value that they can take on.

To ensure that algorithms were compared at their best performances, their parameters were set to the values recommended by the authors. For algorithms without such recommendations, the parameters were tuned to give the best performance on each task. Regarding the parameter setting of TODA, we set the teleportation probability vector u to the uniform distribution. Regarding the value of $\alpha$ , according to Eq. (17), the procedure of determining the stationary distribution x proceeds a step with probability $\alpha$ according to the equivalent first-order Markov chain and with probability 1- $\alpha$ according to a fixed distribution. Hence, $\alpha$ should be set to a value not less than 0.5 so that the first-order Markov chain plays a more significant role in determining x. We therefore varied its value over the set $\{0.5,0.6,0.7,0.8,0.9\}$ and selected the one with the best mean performance.
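To make the role of $\alpha$ concrete, the sketch below iterates a first-order chain with teleportation, x ← αPx + (1-α)u, over the same grid of $\alpha$ values. It is a simplified stand-in rather than Eq. (17) itself: the 3-state column-stochastic matrix `P` is a hypothetical toy example and is not derived from the tensors used by TODA.

```python
def stationary(P, u, alpha, iters=200, tol=1e-12):
    """Iterate x <- alpha * P x + (1 - alpha) * u.

    P is column-stochastic: P[i][j] = Pr(next state = i | current state = j).
    """
    n = len(u)
    x = list(u)
    for _ in range(iters):
        nxt = [alpha * sum(P[i][j] * x[j] for j in range(n)) + (1 - alpha) * u[i]
               for i in range(n)]
        if sum(abs(a - b) for a, b in zip(nxt, x)) < tol:
            return nxt
        x = nxt
    return x


# Hypothetical toy chain (each column sums to 1).
P = [[0.0, 0.5, 0.5],
     [1.0, 0.0, 0.5],
     [0.0, 0.5, 0.0]]
u = [1 / 3, 1 / 3, 1 / 3]          # uniform teleportation vector

grid = [0.5, 0.6, 0.7, 0.8, 0.9]   # candidate values of alpha
results = {alpha: stationary(P, u, alpha) for alpha in grid}
```

With larger $\alpha$ the result is driven more by the chain and less by u; in the experiments the value is chosen by the downstream clustering scores, which this sketch does not evaluate.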

All algorithms were run in exactly the same computation environment on the same machine equipped with Intel Dual-Core Processors at 2.20GHz and 16 GB of RAM.

4.2 Applications on document classification

The purpose of document classification is to classify documents into different topics. Given a collection of unlabeled documents with available link and attribute information, we attempt to apply clustering algorithms to identify clusters that are highly correlated with the true groups of the documents.

4.2.1 Experiment setup

Two classical datasets were used to compare the performance of TODA with state-of-the-art algorithms: one was the Cora dataset and the other was the Citeseer dataset. These two datasets were obtained from LINQS$^{1}$ and they represented two different citation networks where nodes were scientific publications and links were added between nodes if one was cited by the other [24]. The properties of these two real networks are described in Table 1.

Table 1
Statistics of Cora and Citeseer datasets

	$n_{V}$	$n_{E}$	$n_{\Lambda}$	$n_{Val}^{*}$	$K$
Cora	2708	5429	1	1433	7
Citeseer	3312	4732	1	3703	6

* $n_{\textit{Val}}=\sum_{m}|\textit{dom}(\Lambda_{m})|$ .

For the Cora dataset, each document was classified into one of the seven categories including Case-based Reasoning, Genetic Algorithms, Neural Networks, Probabilistic Methods, Reinforcement Learning, Theory and Rule Learning. The keywords of each publication were obtained from a dictionary of 1433 unique words and they were considered as the values of the single attribute for the corresponding node.

For the Citeseer dataset, all the documents were classified into six categories including Agents, Artificial Intelligence, Database, Information Retrieval, Machine Learning and Human-Computer Interaction. When compared with the Cora dataset, the number of keywords involved in the Citeseer dataset was much larger, as the keywords were obtained from a dictionary of 3703 unique words.

4.2.2 Results

The experiment results obtained by applying all the algorithms to the Cora and Citeseer datasets are reported in Table 2.

Table 2
Community detection performances on document classification

	Cora dataset		Citeseer dataset
	NMI	Accuracy	NMI	Accuracy
TODA	$\textbf{0.41}^{*}$	0.54	$\textbf{0.23}$	$\textbf{0.48}$
CODICIL	0.37	0.55	0.20	0.41
CESNA	0.34	$\textbf{0.59}$	0.06	0.42
GBAGC	0	0.30	0	0.21
niMM	0.01	0.25	0.004	0.21
FCM	0.07	0.26	0.13	0.33

* Best scores are highlighted by bold face.

For the Cora dataset, in terms of NMI, TODA yielded the best score, as it performed better by 10.8% than CODICIL, which was the second best algorithm. Although TODA did not rank as highly in Accuracy as in NMI, it was still among the top three algorithms. We also noted that GBAGC and niMM yielded NMI scores close to zero. That is to say, these two algorithms had only limited ability to distinguish the documents in different categories. Regarding the performances of algorithms on the Citeseer dataset, TODA was the best algorithm in terms of both NMI and Accuracy.

Another point worth noting is that CESNA performed better than TODA in terms of Accuracy on the Cora dataset. After investigating the clustering results, we found that CESNA did not classify all documents, as its average rate of rejection was as high as 37% for the Cora and Citeseer datasets. That is to say, CESNA only considered the documents that were easy to classify when performing its clustering task. It is for this reason that CESNA performed better than TODA in terms of Accuracy on the Cora dataset. We also noted that TODA obtained a better performance than CESNA on the Citeseer dataset. That is to say, even though CESNA disregarded many documents during the clustering, its performance was not as good as that of TODA. Since the Citeseer dataset was sparser than the Cora dataset, we may reason that CESNA is more sensitive to the sparsity of the attributed graph.

4.3 Applications on social community detection

The purpose of social community detection is to identify meaningful communities from social networks. In this experiment, we applied TODA to two different social networks extracted from Facebook and Twitter respectively and discussed its performance as follows.

4.3.1 Experiment setup

The social networks of Facebook and Twitter are available for download from the Stanford Large Network Dataset Collection.$^{2}$ The characteristics of these social datasets are presented in Table 3, where we can observe that the networks composed from the social datasets were much larger and sparser than those obtained from the datasets of document classification. For the Facebook dataset, a total of 27 profile attributes were extracted and the number of possible values that could be taken by these attributes was 1406. In the Twitter dataset, a total of 33569 hashtags annotated by users in their tweets were considered as the possible attribute values for the single attribute associated with nodes.

Table 3
Statistics of Facebook and Twitter Datasets

	$n_{V}$	$n_{E}$	$n_{\Lambda}$	$n_{Val}$	$K$
Facebook	4089	170714	27	1406	193
Twitter	125120	2248406	1	33569	3140

Regarding the ground truth of these social datasets, McAuley and Leskovec [11] extracted clusters from the ego-networks of different users in the respective social networking sites. The ego-network of a single user was defined as a personal social network. Given such an ego-network, several social circles could be identified, each as a subset of the friends in the ego-network, and they were then processed further to obtain hand-labeled ground truth clusters.

4.3.2 Results

For the Facebook dataset, TODA and CESNA, respectively, obtained the best Accuracy and NMI scores as indicated by Table 4. We noted that the Accuracy score of TODA was better by 17.4% than that of CESNA, but the difference in NMI between TODA and CESNA was rather small. Hence, the overall performance of TODA was believed to be better than that of CESNA.

Table 4
Community detection performances on social networks

	Facebook dataset		Twitter dataset
	NMI	Accuracy	NMI	Accuracy
TODA	0.49	$\textbf{0.54}^{*}$	$\textbf{0.31}$	$\textbf{0.69}$
CODICIL	0.44	0.51	0.25	0.57
CESNA	$\textbf{0.51}$	0.46	0.20	0.65
GBAGC	0.13	0.36	0.16	0.63
niMM	0.13	0.25	0.22	0.52
FCM	0.14	0.34	0.22	0.58

* Best scores are highlighted by bold face.

When comparing the performances of all algorithms on the Twitter dataset, we observed from Table 4 the strong performance of TODA in terms of both NMI and Accuracy. In particular, the Accuracy score of TODA was better by 21.1%, 6.2%, 9.5%, 32.7% and 19% than those of CODICIL, CESNA, GBAGC, niMM and FCM respectively. For the NMI measure, TODA presented a significantly bigger margin against the other algorithms, as it outperformed them by 51% on average. Moreover, CESNA declined to assign communities to around 18% of nodes in the Facebook dataset. It is for this reason that CESNA performed better than TODA in terms of NMI on the Facebook dataset.
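The percentages quoted for the Twitter dataset are relative improvements, (TODA - other) / other, and can be reproduced from the scores in Table 4. The short sketch below does so; the helper name `gain` and the dictionary layout are illustrative choices only.

```python
# Twitter scores copied from Table 4 as (NMI, Accuracy).
toda = (0.31, 0.69)
others = {
    "CODICIL": (0.25, 0.57),
    "CESNA": (0.20, 0.65),
    "GBAGC": (0.16, 0.63),
    "niMM": (0.22, 0.52),
    "FCM": (0.22, 0.58),
}


def gain(ours, theirs):
    """Relative improvement of `ours` over `theirs`, in percent."""
    return 100.0 * (ours - theirs) / theirs


# Per-algorithm Accuracy improvement, rounded to one decimal place.
accuracy_gain = {name: round(gain(toda[1], s[1]), 1) for name, s in others.items()}

# Mean of the per-algorithm NMI improvements.
nmi_gains = [gain(toda[0], s[0]) for s in others.values()]
mean_nmi_gain = sum(nmi_gains) / len(nmi_gains)
```

The 51% figure corresponds to averaging the five per-algorithm NMI improvements, not to comparing TODA against the mean NMI of the other algorithms.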

Lastly, we also performed the Wilcoxon test to verify the significance of the differences in NMI and Accuracy scores. The results indicated that the differences between TODA and all the other algorithms except CESNA were significant at the 95% confidence level.

In sum, we have reason to believe that the strong performance of TODA supports the rationale for using higher-order patterns in the structure and attribute information of attributed graphs.

4.4 Comparison of running time

The running time of TODA was compared with that of the other algorithms on all the datasets used in the experiments, and the results are provided in Fig. 3.

Figure 3.

The comparison of running time for all algorithms.

For the small datasets (i.e., Cora and Citeseer), we noted that although GBAGC was not as accurate as the other algorithms, it was the fastest. However, for the Facebook dataset, even though the number of nodes was comparable to that of the Citeseer dataset, we found that the running time of GBAGC increased substantially between these two datasets. The reason was that the number of edges in the Facebook dataset was considerably larger than that of Citeseer. That is to say, GBAGC took much more time to simulate the generative process of edges, thus increasing the overall running time.

Regarding niMM, its running time was more sensitive to the increase in the network size, as it took much longer than the other algorithms when applied to large networks.

TODA was always among the two fastest algorithms. In particular, for the Facebook dataset, TODA yielded the best running time. From the curve of TODA in Fig. 3, we noted that the impact of the increase in $n_{E}$ was less significant than that of the increase in $n_{V}$ . The reason was that most of the computation required by TODA was related only to the nodes instead of the edges in $G$ . Even though the increase in edges could make the running time longer when TODA computed the conductance in a sweep cut operation, the incremental update for the cut and volume measures considerably reduced the time. This could be verified by the running time of TODA without incremental update, which was much slower than TODA.
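The benefit of updating the cut and volume incrementally can be illustrated with a generic edge-based sweep cut. The sketch below is not the tensor-based procedure of Section 3.5; the adjacency-dictionary format, the function name `sweep_cut`, and the given node ordering are assumptions made for illustration. Each time a node joins the prefix set, the cut and volume are adjusted from that node's neighbors only, instead of being recomputed from scratch.

```python
def sweep_cut(adj, order):
    """Return the prefix of `order` with the smallest conductance.

    adj: dict mapping each node to the set of its neighbors (undirected graph).
    order: nodes sorted by some score, e.g. entries of a stationary distribution.
    """
    total_vol = sum(len(nbrs) for nbrs in adj.values())
    in_set = set()
    cut = vol = 0
    best_phi, best_size = float("inf"), 0

    for size, v in enumerate(order[:-1], start=1):   # the full node set is skipped
        inside = sum(1 for w in adj[v] if w in in_set)
        # Edges from v into the set stop being cut; its other edges become cut.
        cut += len(adj[v]) - 2 * inside
        vol += len(adj[v])
        in_set.add(v)

        denom = min(vol, total_vol - vol)
        if denom > 0 and cut / denom < best_phi:
            best_phi, best_size = cut / denom, size

    return order[:best_size], best_phi


# Two triangles joined by the single edge 2-3 (toy example).
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
part, phi = sweep_cut(adj, [0, 1, 2, 3, 4, 5])
```

Each node's neighbor list is scanned once over the whole sweep, so the update cost is proportional to the number of edges rather than to the number of prefixes times the number of edges.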

4.5 Case study

To better demonstrate the advantages of TODA, we selected a community detected by TODA from the Twitter dataset and gave an in-depth explanation.

Figure 4.

A community detected by TODA from the Twitter Dataset.

In Fig. 4, we show the topological structure of a ground truth community, i.e., the circle with ID 189875309 from the Twitter dataset. After investigating the corresponding community detected by each of the algorithms, we found that only TODA exactly detected this community, whereas most of the other algorithms were only able to detect the part in the dashed circle. Nodes in the dashed circle were connected much more densely than those outside it. It is for this reason that the algorithms other than TODA only detected the part in the dashed circle. However, since TODA considered the higher-order patterns involving three nodes, we noted that the three red nodes actually formed a triangle motif that could be captured by TODA according to the definition of $\underline{\textbf{T}_{S}}$ . In this regard, TODA was provided with evidence that the three red nodes should be detected in the same community.

We also noted that the green node could not be part of a triangle motif, as it had only one neighboring node. That is to say, relying solely on the structure information was insufficient to detect the community of the green node. However, if we took the attribute information into account, we found that the green node shared three attribute values with the other two nodes. This value was higher than the entries in the attribute tensor that involved the other nodes in Fig. 4. Hence, we have reason to believe that the green node was more likely to belong to the community in Fig. 4, and accordingly TODA successfully detected this ground truth community. This case, again, demonstrated the importance of considering higher-order patterns from the structural and attribute perspectives when detecting communities from attributed graphs.
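The two kinds of evidence used in this case study can be mimicked on a toy graph. In the sketch below, the node names (`r1`, `r2`, `r3`, `g`), the hashtags, and the reading of an attribute tensor entry as the number of values shared by all three nodes are hypothetical; they are chosen only to mirror the situation in Fig. 4 and are not taken from the Twitter data or from the formal definitions of the tensors.

```python
from itertools import combinations


def triangles(adj):
    """Return every triangle motif once, as a sorted tuple of its three nodes."""
    found = set()
    for a in adj:
        for b, c in combinations(sorted(adj[a]), 2):
            if c in adj[b]:                    # a-b, a-c and b-c are all edges
                found.add(tuple(sorted((a, b, c))))
    return found


def shared_values(attrs, a, b, c):
    """Number of attribute values common to all three nodes (assumed tensor entry)."""
    return len(attrs[a] & attrs[b] & attrs[c])


# Three mutually connected "red" nodes and a "green" node with one neighbor.
adj = {"r1": {"r2", "r3", "g"}, "r2": {"r1", "r3"}, "r3": {"r1", "r2"}, "g": {"r1"}}
attrs = {
    "r1": {"#ml", "#graphs", "#kdd", "#nlp"},
    "r2": {"#ml", "#graphs", "#kdd"},
    "r3": {"#ml", "#vision"},
    "g": {"#ml", "#graphs", "#kdd", "#music"},
}

structural = triangles(adj)                           # the green node is in none
attribute = shared_values(attrs, "g", "r1", "r2")     # yet it shares values with r1, r2
```

The green node is absent from every triangle, so only the attribute-based count links it to the rest of the community in this toy setting.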

5. Conclusion

In this work, we proposed a novel community detection algorithm, i.e., TODA, for attributed graphs by considering higher-order patterns, which are able to enhance the internal consistency in terms of both structural and attribute information for more than two nodes (i.e., three nodes in our work). In doing so, the accuracy of community detection can be improved, as indicated by the experiment results of TODA. Moreover, the existence of higher-order patterns in the structure and attribute information allows TODA to correctly detect the communities of peripheral nodes that are not connected as densely as those in the core parts of communities. Last but not least, in order to address the CDAG problem, TODA explores the intrinsic correlation between communities and higher-order patterns, and is also able to extract semantically meaningful communities by incorporating the attribute information of nodes.

As there are many other structures, such as rectangle and pentagon motifs, we would like to extend TODA to connectivity patterns beyond the third order as part of our future work. We are also interested in verifying whether the performance of TODA can be further improved if different kinds of connectivity patterns, or their combinations, are taken into account. Lastly, considering the simplicity and efficiency of the probabilistic neural network [20, 22], we would like to integrate higher-order patterns into a neural dynamic classification model [23] for an improved performance of community detection.

Footnotes

1. http://www.cs.umd.edu/projects/linqs/projects/lbc/index.html

2. http://snap.stanford.edu/data/index.html

Acknowledgments

This research is supported in part by the National Natural Science Foundation of China under grants 61602352 and 61802317, and in part by the Pioneer Hundred Talents Program of Chinese Academy of Sciences.

References

1. Silva, Meira Jr., Zaki. Mining attribute-structure correlated patterns in large attributed graphs. arXiv preprint arXiv:1201.6568. 2012 Jan 31.

2. Baroni, Conte, Patrignani, Ruggieri. Efficiently clustering very large attributed graphs. In 2017 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM), 2017 Jul 31, pp. 369-376. IEEE.

3. Clauset, Newman, Moore. Finding community structure in very large networks. Physical review E. 2004 Dec 6; 70(6): 066111.

4. Benson, Gleich, Leskovec. Higher-order organization of complex networks. Science. 2016 Jul 8; 353(6295): 163-166.

5. Benson, Gleich, Lim. The spacey random walk: A stochastic process for higher-order data. SIAM Review. 2017; 59(2): 321-345.

6. Bordel, Alcarria, Martín, Sánchez-de-Rivera. An agent-based method for trust graph calculation in resource constrained environments. Integrated Computer-Aided Engineering. 2020 Jan 1; 27(1): 37-56.

7. Boyd, Ellison. Social network sites: definition, history, and scholarship. IEEE Engineering Management Review. 2010 Aug 30; 38(3): 16-31.

8. Hatano, Fukunaga, Maehara, Kawarabayashi. Scalable algorithm for higher-order co-clustering via random sampling. In Proceedings of the thirty-first AAAI conference on artificial intelligence, 2017 Feb 4, pp. 1992-1999.

9. delEtoile, Adeli. Graph theory and brain connectivity in Alzheimer’s disease. The Neuroscientist. 2017 Dec; 23(6): 616-626.

10. Yang, McAuley, Leskovec. Community detection in networks with node attributes. In 2013 IEEE 13th International Conference on Data Mining, 2013 Dec 7, pp. 1151-1156. IEEE.

11. Leskovec, Mcauley. Learning to discover social circles in ego networks. In Advances in neural information processing systems 2012, pp. 539-547.

12. Yuan, Liu, Xiong, Luo. Efficiently detecting protein complexes from protein interaction networks via alternating direction method of multipliers. IEEE/ACM transactions on computational biology and bioinformatics. 2018 Jun 5; 16(6): 1922-1635.

13. Chan, Yuan, Xiong. A variational Bayesian framework for cluster analysis in a complex network. IEEE Transactions on Knowledge and Data Engineering. 2020 Nov 1; 32(11): 2115-2128.

14. Chan. Fuzzy clustering in a complex network based on content relevance and link structures. IEEE Transactions on Fuzzy Systems. 2015 Jul 24; 24(2): 456-470.

15. Zhang, Pan, Yan, You. HiSCF leveraging higher-order structures for clustering analysis in biological networks. Bioinformatics. 2020 Sep 15.

16. Newman. The structure and function of complex networks. SIAM review. 2003; 45(2): 167-256.

17. Ahmadlou, Adeli. New diagnostic EEG markers of the Alzheimer’s disease using visibility graph. Journal of neural transmission. 2010 Sep 1; 117(9): 1099-109.

18. Ahmadlou, Adeli. Graph theoretical analysis of organization of functional brain networks in ADHD. Clinical EEG and neuroscience. 2012 Jan; 43(1): 5-13.

19. Ahmadlou, Adeli. Visibility graph similarity: A new measure of generalized synchronization in coupled dynamic systems. Physica D: Nonlinear Phenomena. 2012 Feb 15; 241(4): 326-332.

20. Ahmadlou, Adeli. Improved visibility graph fractality with application for the diagnosis of autism spectrum disorder. Physica A: Statistical Mechanics and its Applications. 2012 Oct 15; 391(20): 4720-4726.

21. Ahmadlou, Adeli. Functional community analysis of brain: A new approach for EEG-based investigation of the brain pathology. Neuroimage. 2011 Sep 15; 58(2): 401-408.

22. Ahmadlou, Adeli. Enhanced probabilistic neural network with local decision circles: A robust classifier. Integrated Computer-Aided Engineering. 2010 Jan 1; 17(3): 197-210.

23. Rafiei, Adeli. A new neural dynamic classification algorithm. IEEE transactions on neural networks and learning systems. 2017 Jul 25; 28(12): 3074-83.

24. Sen, Namata, Bilgic, Getoor, Galligher, Eliassi-Rad. Collective classification in network data. AI magazine. 2008 Sep 6; 29(3): 93.

25. Gorbunov, Rauterberg, Barakova. A cognitive model of social preferences in group interactions. Integrated Computer-Aided Engineering. 2019 Jan 1; 26(2): 185-196.

26. Strogatz. Exploring complex networks. Nature. 2001 Mar; 410(6825): 268-276.

27. Fortunato. Community detection in graphs. Physics reports. 2010 Feb 1; 486(3-5): 75-174.

28. Wang, Shen. Classification of diffusion tensor metrics for the diagnosis of a myelopathic cord using machine learning. International journal of neural systems. 2018 Mar 23; 28(2): 1750036.

29. Yang, Jin, Chi, Zhu. Combining link and content for community detection: a discriminative approach. In Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining, 2009 Jun 28, pp. 927-936.

30. Benson, Gleich. General tensor spectral co-clustering for higher-order data. In Advances in Neural Information Processing Systems 2016, pp. 2559-2567.

31. Bîlbîe, Păun, Pan, Neri. Simplified and yet Turing universal spiking neural P systems with communication on request. International journal of neural systems. 2018 Oct 15; 28(08): 1850013.

32. Liu, Chan, Ong. Contextual correlation preserving multiview featured graph clustering. IEEE transactions on cybernetics. 2019 Jul 19.

33. Bai, Ong. Manifold Regularized Stochastic Block Model. In 2019 IEEE 31st International Conference on Tools with Artificial Intelligence (ICTAI), 2019 Nov 4, pp. 800-807. IEEE.

34. Fan, Da Xu, Cao, Song. Learning nonparametric relational models by conjugately incorporating node information in a network. IEEE transactions on cybernetics. 2016 Feb 11; 47(3): 589-599.

35. Luo, Zhou, Shang. Non-negativity constrained missing data estimation for high-dimensional and sparse matrices from industrial applications. IEEE transactions on cybernetics. 2019 Feb 27; 50(5): 1844-1855.

36. Zhou, Cheng. Graph clustering based on structural/attribute similarities. Proceedings of the VLDB Endowment. 2009 Aug 1; 2(1): 718-729.

37. Ruan, Fuhry, Parthasarathy. Efficient community detection in large networks using content and links. In Proceedings of the 22nd international conference on World Wide Web, 2013 May 13, pp. 1089-1098.

38. Wang, Cheng. GBAGC: A general bayesian framework for attributed graph clustering. ACM Transactions on Knowledge Discovery from Data. 2014 Aug 25; 9(1): 5.
