Compression of community graph using graph mining techniques

Abstract

Representing a network as a graph has many applications and supports efficient knowledge extraction. As graphs are applied more widely, they grow in both size and complexity, so visualizing and analyzing a large community graph becomes increasingly challenging. A compression technique may therefore be used to study a large community graph for knowledge extraction, provided that the compression loses no information. This paper proposes an algorithm, “ComComGra”, which compresses a large community graph with various communities using graph mining techniques. The proposed algorithm is illustrated with two examples, one of which is a benchmark.

Keywords

Community, community graph, community members, compressed community graph, self-edge

1. Introduction

Applications of graphs are increasing rapidly, which leads to growth in both the size and the complexity of the graphs involved. Representing a large community graph in memory is therefore a challenging task, and direct visualization of such a graph is beyond human capability. Visualization becomes manageable when a large community graph is compressed into a smaller one that still contains the information of all the nodes of the community graph. Representing and compressing such a community graph without any loss of knowledge opens a new challenge for graph mining. The compressed community graph should contain all the information of the original community graph, so that visualization and knowledge extraction become easier and more efficient.

The paper begins with a formal introduction, followed by a literature survey of compressed graph representation models and an overview of the Greedy algorithm, an existing graph compression technique. The proposed algorithm is then discussed, followed by experimental results on two suitable examples. Finally, the paper ends with a general conclusion.

2. Literature survey

In a general compressed graph model, a non-weighted graph $G=(V_{G},E_{G})$ has a compressed representation $R=(S,C)$, which consists of a graph summary $S=(V_{S},E_{S})$ and a set of edge corrections $C$. Every node $v$ in $V_{G}$ belongs to a super node $V$ in $V_{S}$, which represents a set of nodes of $G$. An edge $E=(V_{i},V_{j})$ in $E_{S}$ represents the set of all edges connecting all pairs of nodes in $V_{i}$ and $V_{j}$. In the edge corrections $C$, $+e$ means adding the edge $e$ and $-e$ means deleting the edge $e$ when the original graph is recreated [8]. The minimum description length (MDL) principle defines the cost of a representation as the sum of the storage costs of its two components, i.e., $\text{cost}(R)=|E_{S}|+|C|$. The cost of mapping the nodes of $G$ onto super nodes is ignored, since it is quite small compared to the storage costs of the edge sets $E_{S}$ and $C$. The edge set $E_{S}$ and the edge corrections $C$ are determined only by the internal structure of the nodes [8].

In an approximate representation, the exact neighbor set of a node may be replaced by an approximate neighbor set. The resulting error means that a neighbor of a node $v$ in $G$ may not be a neighbor of $v$ in the representation, or vice versa [8]. The representations above deal only with unlabeled graphs, although edges often carry labels. In biological networks, for example, nodes are biological entities such as genes and proteins, and edges have meanings such as “codes for”, “functionally associated to”, and “belongs to”. A label-compatible representation therefore takes these labels into consideration when compressing a graph, which makes the result more meaningful [14].

In the node-grouping algorithm, the cost of storing the information between a node $V_{i}$ and one of its adjacent nodes $V_{j}$ is $C_{ij}=\min\{|A_{ij}|,\,1+|\pi_{ij}|-|A_{ij}|\}$, where $\pi_{ij}$ is the set of all possible node pairs between $V_{i}$ and $V_{j}$ and $A_{ij}$ is the set of those pairs that are actually edges of $G$. If $V_{i}$ and $V_{j}$ are not adjacent, the cost is zero, i.e., $C_{ij}=0$. The cost of one node, $C_{i}$, is the sum of the costs of storing the information between $V_{i}$ and all its adjacent nodes, i.e., $C_{i}=\sum_{j=1}^{N_{vi}}C_{ij}$, where $N_{vi}$ is the number of nodes adjacent to $V_{i}$. The cost reduction obtained by combining a pair of nodes $(u,v)$ into a new node $w$ is $S(u,v)=\frac{C_{u}+C_{v}-C_{w}}{C_{u}+C_{v}}$ [8].

Clustering algorithms, in which group members have similar characteristics and can be represented by super nodes, also address the graph compression problem. Graph partitioning algorithms [1, 7] are used to detect hidden community structures. Most of these algorithms focus on the distance matrix rather than on the structure of the original graph. Another feasible technique is to rank the centrality of nodes and edges and to construct a small graph consisting of the most important ones [12, 11]. The information-theoretic technique [5] constructs both lossy and lossless compressed graph representations.
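The two cost formulas above can be illustrated with a minimal C++ sketch. It assumes that $|\pi_{ij}|$ is the number of possible node pairs between $V_{i}$ and $V_{j}$ and that $|A_{ij}|$ is the number of those pairs actually joined by an edge. The function names `pair_cost` and `cost_reduction` are hypothetical and are not taken from [8].

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Cost C_ij of storing the edges between two nodes V_i and V_j.
// actual   = |A_ij|  (pairs joined by an edge; assumed meaning)
// possible = |pi_ij| (all possible pairs;     assumed meaning)
std::size_t pair_cost(std::size_t actual, std::size_t possible) {
    assert(actual <= possible);
    if (actual == 0) return 0;  // non-adjacent nodes cost nothing
    // Either list every actual edge, or store one super edge
    // plus one correction for each missing pair.
    return std::min(actual, 1 + possible - actual);
}

// Relative cost reduction S(u, v) = (C_u + C_v - C_w) / (C_u + C_v).
double cost_reduction(double cu, double cv, double cw) {
    assert(cu + cv > 0.0);
    return (cu + cv - cw) / (cu + cv);
}
```

For instance, with six possible pairs of which five are edges, one super edge plus one correction costs 2, which is cheaper than listing the five edges individually.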

The compressed representation of the graph has two parts. The first part is the graph summary, which collects the important communities and relationships of the original community graph; the second is the set of edge corrections, which helps to re-create the original community graph from the compressed community graph. The Greedy algorithm [13] iteratively groups the two nodes with the highest cost reduction. It has three phases: initialization, iterative merging, and output. In the initialization phase, the cost reduction $S$ is computed for all pairs of nodes. In each round of the merging phase, the globally best pair of nodes $(u,v)$ is chosen; $u$ and $v$ are removed from $V_{S}$ and a new node $w$ is added to $V_{S}$. In the output phase, the graph summary edge set $E_{S}$ and the edge corrections $C$ are constructed. The Greedy algorithm can construct a highly compressed graph for a given graph. A technique that compresses a large web graph into a smaller graph using breadth-first search (BFS) is proposed in [1], and the compression of an undirected weighted graph is proposed in [9].
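How a graph summary and its edge corrections re-create the original graph can be sketched as follows. This is only an illustration under stated assumptions: super nodes are given as lists of original node ids, summary edges as pairs of super node indices, and the corrections as two separate lists for $+e$ and $-e$. The names `Edge`, `make_edge`, and `reconstruct` are hypothetical and are not the data structures of [13].

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <utility>
#include <vector>

using Edge = std::pair<int, int>;  // undirected edge, stored with first <= second

Edge make_edge(int a, int b) { return a <= b ? Edge{a, b} : Edge{b, a}; }

// Expands every summary edge into all node pairs it stands for,
// then applies the corrections: +e inserts an edge, -e removes one.
std::set<Edge> reconstruct(
    const std::vector<std::vector<int>>& super_nodes,
    const std::vector<std::pair<std::size_t, std::size_t>>& summary,
    const std::vector<Edge>& plus,
    const std::vector<Edge>& minus) {
    std::set<Edge> edges;
    for (const auto& se : summary) {
        assert(se.first < super_nodes.size() && se.second < super_nodes.size());
        for (int u : super_nodes[se.first])
            for (int v : super_nodes[se.second])
                if (u != v) edges.insert(make_edge(u, v));
    }
    for (const Edge& e : plus) edges.insert(make_edge(e.first, e.second));
    for (const Edge& e : minus) edges.erase(make_edge(e.first, e.second));
    return edges;
}
```

With super nodes {1, 2} and {3, 4} joined by one summary edge, the correction $+(1,2)$ and the correction $-(2,4)$, the sketch returns the four edges (1,2), (1,3), (1,4), and (2,3).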

3. Proposed algorithm

3.1 Algorithm for compression of community graph

Algorithm ComComGra( )

n: To assign number of communities.

NCM[n][3]: To assign the community code (column 1), the number of community members of each community (column 2), and the community members’ codes (column 3 onwards).

tcm: To assign the total number of community members of the community graph.

CMM[tcm][tcm]: Adjacency matrix of the community graph, stored with an additional first row and first column that hold the community member codes.

CCM[n][n]: Adjacency matrix of the compressed community graph, stored with an additional first row and first column that hold the community codes.

code[]: To assign all the community members’ codes.

CommData.Txt: Dataset file of the community graph, consisting of the total number of communities, the community codes, the number of community members of each community, and the community members’ codes.

EdgeData.Txt: Dataset file containing the edge details of the community members, i.e., the ‘From Community Member Code’ and the ‘To Community Member Code’.

ComAdjMa.Txt: Data file to which the adjacency matrix of the community graph is written.

{

// to read the community data

ReadComData( );

// to assign the community member codes and the community codes

CoMeCodes( );

// to create the community adjacency matrix from the edge data

ComMemMatrix( );

// to write the community adjacency matrix in a file

WriteMemMatrix( );

// to count the edges within each community

SameComEdgeCount( );

// to count the edges between dissimilar communities

DissimilarComEdgeCount( );

// to display the adjacency matrix of the compressed community graph

CompressedComMatDisplay( );

}

3.2 Procedure to read community graph data and its edge data

Procedure ReadComData( )

{

// to open the community data file

open(“CommData.Txt”);

// to read the total number of communities from the data file “CommData.Txt”

read(n);

x:=1;

tcm:=0;

// to read the details of the ‘n’ communities, i.e., the community code, the

// number of community members, and the community members’ codes, from the

// data file “CommData.Txt”

for i:=1 to n do

{

read(NCM[i][1], NCM[i][2]);

// to count the total number of community members

tcm:= tcm + NCM[i][2];

for j:=1 to NCM[i][2] do

{

read(NCM[i][j+2]);

code[x]:=NCM[i][j+2];

x:=x+1;

}

}

// to close the community data file

close(“CommData.Txt”);

}
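The reading step can be sketched in C++ as follows. The sketch assumes that the file is whitespace-separated and that community codes and member codes are integers; the actual layout of “CommData.Txt” is only shown in a figure, so both points are assumptions. The names `Community` and `read_communities` are hypothetical.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <vector>

struct Community {
    int code = 0;              // community code (NCM[i][1] in the pseudocode)
    std::vector<int> members;  // member codes (NCM[i][3] onwards)
};

// Reads n, then for each community: its code, its member count,
// and that many member codes. Stops early if the stream fails.
std::vector<Community> read_communities(std::istream& in) {
    int n = 0;
    in >> n;
    std::vector<Community> result;
    for (int i = 0; i < n && in; ++i) {
        Community c;
        int count = 0;
        in >> c.code >> count;
        assert(count >= 0);  // a negative member count is malformed input
        for (int j = 0; j < count; ++j) {
            int member = 0;
            in >> member;
            c.members.push_back(member);
        }
        if (in) result.push_back(c);  // keep only completely read communities
    }
    return result;
}
```

Reading from a `std::istream` rather than a fixed file name lets the same function be exercised with a `std::istringstream` holding a small hand-written dataset.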

3.3 Procedure to assign community member codes

Procedure CoMeCodes( )

{

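// to assign the community member codes in the first row and first column of CMM[][]

k:=2;

for i:=1 to n do

{

for j:=1 to NCM[i][2] do

{

CMM[1][k] := NCM[i][j+2];

CMM[k][1] := NCM[i][j+2];

k:=k+1;

}

}

// to assign the community codes in the first row and first column of CCM[][]

for i:=2 to (n+1) do

{

CCM[i][1] := NCM[i-1][1];

CCM[1][i] := NCM[i-1][1];

}

}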

3.4 Procedure to create community adjacency matrix

Procedure ComMemMatrix ( )

{

open(“EdgeData.Txt”); // to open the edge data file

row:=1; col:=1;

while(Not EOF())

{

// to read the ‘From Community Member Code’

read(node1);

// to read the ’To Community Member Code’

read(node2);

for i:=1 to tcm do // row-side code detection

{

if(code[i]=node1) then break;

}

for j:=1 to tcm do // column-side code detection

{

if(code[j]=node2) then break;

}

// to assign the edge value 1 at (i+1) row and (j+1) column

CMM[i+1][j+1]:=CMM[j+1][i+1]:=1;

}

close(“EdgeData.Txt”); // to close the edge data file

}
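The construction of the member adjacency matrix can be sketched in C++ as follows. Note that this sketch swaps the linear code search of the pseudocode for a hash map from member code to position, and that it omits the extra header row and column; both are choices made here for brevity, not part of the procedure above. The names `Matrix` and `build_member_matrix` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <utility>
#include <vector>

using Matrix = std::vector<std::vector<int>>;

// codes: member codes listed community by community.
// edges: (from member code, to member code) pairs of an undirected graph.
Matrix build_member_matrix(const std::vector<int>& codes,
                           const std::vector<std::pair<int, int>>& edges) {
    std::unordered_map<int, std::size_t> index_of;
    for (std::size_t i = 0; i < codes.size(); ++i) index_of[codes[i]] = i;
    assert(index_of.size() == codes.size());  // member codes are assumed unique

    Matrix m(codes.size(), std::vector<int>(codes.size(), 0));
    for (const auto& e : edges) {
        auto from = index_of.find(e.first);
        auto to = index_of.find(e.second);
        if (from == index_of.end() || to == index_of.end()) continue;  // unknown code: skip
        m[from->second][to->second] = 1;  // record the edge in both directions
        m[to->second][from->second] = 1;
    }
    return m;
}
```

Skipping an edge whose code is not found is one possible policy; reporting it as an input error would be an equally reasonable choice.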

3.5 Procedure to write the community adjacency matrix in file

Procedure WriteMemMatrix( )

{

// to open the file for writing the community adjacency matrix

open(“ComAdjMa.Txt”);

for i:=1 to (tcm+1) do

{

for j:=1 to (tcm+1) do

{

if(i=1 and j=1) then write(“C”);

else write(CMM[i][j]);

}

}

close(“ComAdjMa.Txt”); // to close the file

}

3.6 Procedure to count edges of the same community graph

Procedure SameComEdgeCount ( )

{

d:=1; s:=0;

for i:=1 to n do

{

// s is the index of the last member of community i

s := s + NCM[i][2];

for j:=d to s do

{

for k:=d to s do

{

// to check the edge at CMM[j+1][k+1]

if (CMM[j+1][k+1]=1) then

{

CCM[i+1][i+1]:=CCM[i+1][i+1]+1;

}

}

}

// the next community starts just after member s

d:=s+1;

// to find the actual number of edges in the undirected graph

CCM[i+1][i+1]:=CCM[i+1][i+1]/2;

}

}
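The counting of edges inside each community can be sketched in C++ as follows. The sketch assumes a symmetric 0/1 matrix without header row or column, with members ordered community by community, and it uses half-open index ranges so that consecutive communities do not share a boundary member. The name `count_intra_edges` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// m: symmetric 0/1 adjacency matrix of the members.
// sizes[c]: number of members of community c.
// Returns, for every community, the number of edges among its own members.
std::vector<int> count_intra_edges(const std::vector<std::vector<int>>& m,
                                   const std::vector<std::size_t>& sizes) {
    std::vector<int> counts;
    std::size_t begin = 0;
    for (std::size_t size : sizes) {
        std::size_t end = begin + size;  // one past the last member of this community
        assert(end <= m.size());
        int ones = 0;
        for (std::size_t j = begin; j < end; ++j)
            for (std::size_t k = begin; k < end; ++k)
                ones += m[j][k];
        counts.push_back(ones / 2);  // each undirected edge appears twice in the block
        begin = end;                 // the next community starts right after this one
    }
    return counts;
}
```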

3.7 Procedure to detect edges of the dissimilar community graph

Procedure DissimilarComEdgeCount ( )

{

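// a, b: first and last row index of the current community

a:=1;

b:=NCM[1][2];

// c, d: first and last column index of the next community

c:=b+1;

d:=b;

for i:=2 to n do

{

d := d + NCM[i][2];

Addition(i-1, a, b, c, d);

a := b+1;

b := b + NCM[i][2];

c := d+1;

}

}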

3.8 Procedure to display the adjacency matrix of compressed community graph

Procedure CompressedComMatDisplay ( )

{

for i:=1 to (n+1) do

{

for j:=1 to (n+1) do

{

if(i=1 and j=1) then display(“C”);

else display(CCM[i][j]);

}

}

}

3.9 Procedure to assign the counted dissimilar community edges

Procedure Addition(p, a, b, c, d)

a, b: Initial and final value of row index.

c, d: Initial and final value of column index.

p: Index of the row-side community in the matrix CCM[n][n].

{

x := c;

y := d;

for i:=a to b do

{

k:=p+1;

Smiley:

for j:=c to d do

{

if(CMM[i+1][j+1]=1) then

{

// to count and assign the dissimilar community edges, row-side

CCM[p+1][k+1]:=CCM[p+1][k+1]+1;

// to count and assign the dissimilar community edges, column-side

CCM[k+1][p+1]:=CCM[k+1][p+1]+1;

}

}

// to move to the columns of the next community, if any

k:=k+1;

if(d<tcm) then

{

c := d+1;

d := d + NCM[k][2];

goto Smiley;

}

// to restore the column range for the next row

c:=x; d:=y;

}

}
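The counting of edges between dissimilar communities can also be written without a label and goto. The following C++ sketch is one such restructuring, not the procedure above verbatim: it precomputes the starting position of every community and scans one rectangular block of the member matrix per pair of communities. It assumes a symmetric 0/1 matrix without header row or column; the names `Matrix` and `count_inter_edges` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<int>>;

// m: symmetric 0/1 adjacency matrix, members ordered community by community.
// sizes[c]: number of members of community c.
// Returns an n x n matrix whose off-diagonal entry (p, q) is the number of
// edges between community p and community q; the diagonal is left at 0.
Matrix count_inter_edges(const Matrix& m, const std::vector<std::size_t>& sizes) {
    // offsets[c] is the position of the first member of community c.
    std::vector<std::size_t> offsets(sizes.size() + 1, 0);
    for (std::size_t c = 0; c < sizes.size(); ++c)
        offsets[c + 1] = offsets[c] + sizes[c];
    assert(offsets.back() == m.size());  // the sizes must cover every member

    Matrix ccm(sizes.size(), std::vector<int>(sizes.size(), 0));
    for (std::size_t p = 0; p < sizes.size(); ++p) {
        for (std::size_t q = p + 1; q < sizes.size(); ++q) {
            int count = 0;
            for (std::size_t i = offsets[p]; i < offsets[p + 1]; ++i)
                for (std::size_t j = offsets[q]; j < offsets[q + 1]; ++j)
                    count += m[i][j];
            ccm[p][q] = count;  // row-side entry
            ccm[q][p] = count;  // column-side entry (undirected graph)
        }
    }
    return ccm;
}
```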

The proposed algorithm, ComComGra, calls seven procedures in sequence, one of which uses a helper procedure, Addition( ). The first procedure, ReadComData( ), opens the dataset file “CommData.Txt” and reads the details of the community graph: the total number of communities, the community codes, the number of community members of each community, and the community members’ codes. These details are assigned to the variable n and to the matrix entries NCM[i][1], NCM[i][2], and NCM[i][j+2] respectively. The total number of community members is computed and assigned to the variable tcm. All the community members’ codes are then assigned to the array code[x].

The second procedure, CoMeCodes( ), assigns the community members’ codes from NCM[i][j+2] to the community member matrix CMM[tcm][tcm]. Similarly, the community codes are assigned from NCM[i-1][1] to the compressed community matrix CCM[n][n].

The third procedure, ComMemMatrix( ), opens the dataset file “EdgeData.Txt” and reads the details of the edges among the members of the community graph, i.e., the ‘From Community Member Code’ and the ‘To Community Member Code’. These are assigned to the variables node1 and node2, which are matched against the member codes on the row side and column side of the matrix CMM[tcm][tcm]; the corresponding entry is assigned the value 1 to record the edge. This procedure thus creates the community member matrix CMM[tcm][tcm], whose entries are only 0s and 1s.

The fourth procedure, WriteMemMatrix( ), opens the text file “ComAdjMa.Txt” and writes the community member matrix CMM[tcm][tcm] to it.

The fifth procedure, SameComEdgeCount( ), detects and counts the edges between the community members of each particular community. These counts are assigned to the diagonal of the compressed community matrix CCM[n][n], so that each diagonal entry is the total number of edges among the members of the same community.

The sixth procedure, DissimilarComEdgeCount( ), calls another procedure, Addition(i-1, a, b, c, d), which detects and counts the edges between the community members of dissimilar communities. These counts are assigned to the row side and column side of the compressed community matrix CCM[n][n].

Finally, the procedure CompressedComMatDisplay( ) displays the compressed community matrix CCM[n][n] of the community graph as the result. The running time complexity of the algorithm is $O(n^{3})$.
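The overall effect of the procedures can be summarized in a short C++ sketch. It deliberately uses a different technique from the matrix scans described above: it makes a single pass over the edge list and looks up the community of each endpoint, so it never builds the member matrix. It assumes that each undirected edge is listed once and that every member code maps to a 0-based community index; the names `Matrix`, `compress`, and `community_of` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <utility>
#include <vector>

using Matrix = std::vector<std::vector<int>>;

// n: number of communities.
// community_of: member code -> community index in [0, n).
// edges: undirected edges as (member code, member code), each listed once.
// Returns the n x n compressed matrix: the diagonal holds the number of
// edges inside a community, off-diagonal entries the edges between two.
Matrix compress(std::size_t n,
                const std::unordered_map<int, std::size_t>& community_of,
                const std::vector<std::pair<int, int>>& edges) {
    Matrix ccm(n, std::vector<int>(n, 0));
    for (const auto& e : edges) {
        std::size_t a = community_of.at(e.first);   // throws if the code is unknown
        std::size_t b = community_of.at(e.second);
        assert(a < n && b < n);
        ccm[a][b] += 1;
        if (a != b) ccm[b][a] += 1;  // keep the matrix symmetric
    }
    return ccm;
}
```

Because every edge contributes to exactly one diagonal entry or one symmetric pair of entries, the sum of the diagonal and the upper triangle of the result equals the number of input edges.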

4. Explanation and analysis

The authors have considered a community graph [2, 3, 4, 5, 6] as an example, with twenty-three community members belonging to four communities or clusters, $C_{1}$, $C_{2}$, $C_{3}$, and $C_{4}$, depicted in Fig. 1. Each community or cluster is considered as a sub-graph. The community $C_{1}$ has six community members with community member codes {1, 2, 3, 4, 5, 6}. Similarly, the community $C_{2}$ has five community members with community member codes {7, 8, 9, 10, 11}, the community $C_{3}$ has four community members with community member codes {12, 13, 14, 15}, and the community $C_{4}$ has eight community members with community member codes {16, 17, 18, 19, 20, 21, 22, 23}. The authors’ aim is to compress this community graph into a single compressed community graph. The community graph has two types of edges: edges between community members of the same community (drawn in black) and edges between community members of dissimilar communities (drawn in gray).

Figure 1. Community graph.

Figure 2. Adjacency matrix of the community graph.

To represent the above community graph in memory, the authors require an adjacency matrix of order 24 $\times$ 24, depicted in Fig. 2, in which the first row and first column are used exclusively for the community member codes. The communities $C_{1}$, $C_{2}$, $C_{3}$, and $C_{4}$ have their respective sub-adjacency matrices, which are shown as gray filled boxes. As noted, the community graph has two types of edges: edges between members of the same community and edges between members of dissimilar communities. The first type of edges of the communities $C_{1}$, $C_{2}$, $C_{3}$, and $C_{4}$ lie inside the sub-adjacency matrices shown as gray filled boxes in Fig. 2. These edges are counted and assigned to the diagonal of the matrix CCM[ ][ ]. For the above community graph, the counts are 9, 6, 5, and 14 edges respectively, as depicted in Fig. 3.

Figure 3. Compressed adjacency matrix.

Figure 4. Compressed community graph of community graph.

Similarly, the second type of edges of the communities $C_{1}$, $C_{2}$, $C_{3}$, and $C_{4}$ are shown as black filled boxes in Fig. 2. For example, there are two edges between community $C_{1}$ and community $C_{2}$: one from $C_{1}$’s member 3 to $C_{2}$’s member 11 and one from $C_{1}$’s member 5 to $C_{2}$’s member 7. The total numbers of edges from community $C_{1}$ to the communities $C_{2}$, $C_{3}$, and $C_{4}$ are 2, 3, and 4 respectively. Similarly, from community $C_{2}$ to $C_{4}$ and from community $C_{3}$ to $C_{4}$, the total numbers of edges are 5 and 2 respectively. These edges are counted and assigned row-wise and column-wise in the matrix CCM[ ][ ], depicted in Fig. 3.

Figure 5. Dataset of community graph.

Finally, the authors draw the compressed community graph, depicted in Fig. 4, from the compressed community adjacency matrix. Each of the communities $C_{1}$, $C_{2}$, $C_{3}$, and $C_{4}$ has a self-edge (also called a self-loop or self-cycle), whose weight indicates the total number of edges present among its members. For the above community graph, the self-edge counts of $C_{1}$, $C_{2}$, $C_{3}$, and $C_{4}$ are 9, 6, 5, and 14 respectively.
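Turning a compressed adjacency matrix into the weighted edges of a compressed community graph can be sketched in C++ as follows. The sketch assumes a square, symmetric matrix without header row or column; the names `WeightedEdge`, `compressed_edges`, and `total_original_edges` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct WeightedEdge {
    std::size_t from;  // community index
    std::size_t to;    // community index (equal to from for a self-edge)
    int weight;        // number of original edges represented
};

// Lists the self-edges (diagonal) and the edges between communities
// (upper triangle) that have a non-zero count.
std::vector<WeightedEdge> compressed_edges(const std::vector<std::vector<int>>& ccm) {
    std::vector<WeightedEdge> out;
    for (std::size_t i = 0; i < ccm.size(); ++i) {
        assert(ccm[i].size() == ccm.size());  // a square matrix is expected
        for (std::size_t j = i; j < ccm.size(); ++j)
            if (ccm[i][j] > 0) out.push_back({i, j, ccm[i][j]});
    }
    return out;
}

// Sums the weights, i.e., the number of edges of the original graph.
int total_original_edges(const std::vector<WeightedEdge>& edges) {
    int total = 0;
    for (const WeightedEdge& e : edges) total += e.weight;
    return total;
}
```

Fed with the counts quoted above, and assuming that the unlisted pair $C_{2}$ and $C_{3}$ has no connecting edge, the sketch yields nine weighted edges (four self-edges and five edges between communities).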

Figure 6. Edge dataset of community graph.

5. Experimental results

5.1 Example-I

The authors have considered Fig. 1 as the first example of a community graph and have created two dataset files, “COMDATA1.TXT” and “EDGE1.TXT”. The dataset “COMDATA1.TXT” holds the data of the community graph: the ‘Total Number of Communities’ (first row), the ‘Community Codes’ (first column, second row onwards), the ‘Total Number of Community Members’ (second column, second row onwards), and the ‘Community Member Codes’ (second row onwards, third column onwards), as depicted in Fig. 5. Similarly, the second dataset, “EDGE1.TXT”, holds the edge details between the community members of the community graph, as depicted in Fig. 6. Each edge is formed between the ‘From Community Member Code’ and the ‘To Community Member Code’.

When “CommData.Txt” is replaced with “COMDATA1.TXT” and “EdgeData.Txt” with “EDGE1.TXT”, and these dataset files are given as input to the algorithm as depicted in Fig. 7, the algorithm writes the adjacency matrix of the community graph to the data file “ComAdjMa.Txt”, depicted in Fig. 8. The algorithm then uses the data from “ComAdjMa.Txt” to create the compressed adjacency matrix of the community graph, depicted in Fig. 9. Using Fig. 9, the authors have drawn the compressed community graph depicted in Fig. 4.

5.2 Result-I

Figure 7. Input of dataset files.

Figure 8. Adjacency matrix of the community graph.

Figure 9. Adjacency matrix of the compressed community graph.

Figure 10. Football team graph.

5.3 Example-II (Benchmark)

The authors have considered the football team graph, a benchmark example [10] depicted in Fig. 10, as the second example. The football team graph has twelve football communities with community codes $C_{1}$ to $C_{12}$. There are a total of 115 football teams. Each team is considered as a node and is assigned a code from 1 to 115. These 115 football teams are clustered into the twelve communities. The community $C_{1}$ has nine football teams, with football team codes 2, 26, 34, 38, 46, 90, 104, 106, and 110. Similarly, the communities $C_{2}$ to $C_{12}$ have 8, 11, 12, 10, 5, 13, 8, 10, 12, 7, and 10 football teams respectively. The authors have created two dataset files, “COMDATA2.TXT” and “EDGE2.TXT”, for the football team graph. The dataset “COMDATA2.TXT” holds the data of the football team graph: the ‘Total Number of Football Team Communities’ (first row), the ‘Football Community Codes’ (first column, second row onwards), the ‘Total Number of Football Teams’ (second column, second row onwards), and the ‘Football Team Codes’ (second row onwards, third column onwards), as depicted in Fig. 11. Similarly, the second dataset, “EDGE2.TXT”, holds the edge (matches played) details between the football teams of the football team graph, as depicted in Fig. 12. Each edge represents an actual match played between two football teams and is formed between the ‘From Football Team Code’ and the ‘To Football Team Code’. “CommData.Txt” and “EdgeData.Txt” are replaced with “COMDATA2.TXT” and “EDGE2.TXT”, and these dataset files are given as input to the algorithm as depicted in Fig. 13. The algorithm then creates the data file “ComAdjMa.Txt”, in which the edges between the football teams are written as 1s, as depicted in Fig. 14. The algorithm uses the data from “ComAdjMa.Txt” to compress the adjacency matrix of the football team graph, as depicted in Fig. 15. From Fig. 15, the authors have drawn the compressed football team graph depicted in Fig. 16.

Figure 11. Dataset of football team graph.

Figure 12. Edge dataset of football team graph.

Figure 13. Input dataset file.
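The member counts quoted for the football team graph can be checked with a few lines of arithmetic. The C++ sketch below sums the community sizes and computes the number of cells in a square matrix that has one extra header row and column, as described for the first example; that the football matrices use the same layout is an assumption. The names `total_members` and `matrix_cells` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Total number of community members (tcm) from the per-community sizes.
std::size_t total_members(const std::vector<std::size_t>& sizes) {
    assert(!sizes.empty());  // a community graph has at least one community
    return std::accumulate(sizes.begin(), sizes.end(), std::size_t{0});
}

// Cells of a square matrix holding `items` rows and columns of data
// plus one header row and one header column.
std::size_t matrix_cells(std::size_t items) {
    return (items + 1) * (items + 1);
}
```

For the sizes 9, 8, 11, 12, 10, 5, 13, 8, 10, 12, 7, and 10 the total is 115 teams, so under this layout the member matrix has order 116 and the compressed matrix has order 13.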

The algorithm was implemented in C++ and compiled with Dev-C++. The experiment was run on a laptop with an Intel Core i5-3230M CPU @ 2.60 GHz and 4 GB of memory, running MS-Windows 7.

5.4 Result-II

Figure 14. Adjacency matrix of the football team graph.

Figure 15. Compressed adjacency matrix of the football team graph.

Figure 16. Compressed football team graph.

A compact representation of the compressed graph that allows both lossless and lossy graph compression, with bounds on the introduced error, is proposed in [13]. Combined with the MDL principle, it yields a highly intuitive coarse-level graph summary. Two algorithms, Greedy and Randomized, are also developed there. Greedy repeatedly picks the best pair of nodes to merge in the entire graph and outputs a highly compressed representation of the graph. Randomized performs the best merge on a randomly selected node.

A technique that compresses a large-scale web graph into a smaller one using BFS is proposed in [1]. This method of compression is based on the topological structure of the web graph rather than on the underlying URLs. It has two phases. In the first phase, the graph is traversed in BFS order and indices are assigned to the traversed nodes in a traversal list. In the second phase, the web graph is compressed using the traversal list data. The compression of a weighted undirected graph is proposed in [9]. In the compressed weighted graph, a pair of original nodes is connected by an edge if their super nodes are connected by one, and the weight of that edge is approximated by the weight of the super edge.

The proposed algorithm, ComComGra, considers a community graph made up of a set of sub-community graphs, with relationships among the members of the same community as well as among the members of dissimilar communities. Here a sub-graph means a cluster or a community having its own set of members. During the compression process, a cluster or a community is treated as one node. Hence the total number of communities or clusters is the total number of nodes in the compressed community graph, irrespective of the total number of community members present in the original community graph. This technique differs from the existing techniques explained above and adopts a graph-theoretic concept. The running time complexity of the proposed algorithm is $O(n^{3})$.

6. Conclusions

This paper extends earlier work by proposing an efficient algorithm for the compression of a large community graph made up of a set of sub-community graphs. The earlier literature, findings, and observations are available in the article [7]. The authors have compressed a large community graph into a compressed community graph using graph techniques, specifically by detecting and counting the edges between the nodes of the same community as well as of dissimilar communities. A simple graph technique has thus been used to compress a large community graph in an efficient way. The running time complexity of the proposed algorithm, ComComGra, is $O(n^{3})$. An appropriate benchmark example from a social community network background has been compressed, with satisfactory results. Finally, the paper concludes by focusing on the process of compressing a large community graph without loss of information of the original community graph.

References

1. Apostolico and Drovandi, Graph Compression by BFS, Algorithms 2 (2009), pp. 1031–1044.

2. Rao and Mitra, An Approach to Merging of two Community Sub-Graphs to form a Community Graph using Graph Mining Techniques, In Proceedings of 2014 IEEE ICCIC-2014, Coimbatore, India, 2014, pp. 460–466.

3. Rao and Mitra, A new approach for detection of common communities in a social network using graph mining techniques, In Proceedings of IEEE ICHPCA-2014, Bhubaneswar, India, 2014, pp. 1–6.

4. Rao, Mitra and Narayana, An approach to study properties and behavior of Social Network using Graph Mining Techniques, In Proceedings of DIGNATE 2014: ETEECT 2014, India, 2014, pp. 13–17.

5. Rao, Mitra and Mondal, Algorithm for Retrieval of Sub-Community Graph from a Compressed Community Graph using Graph Mining Techniques, 3rd ICRTC-2015, SRM University, Delhi, India, Published in Elsevier Procedia Computer Science 57 (2015), pp. 678–685.

6. Rao and Mitra, A Proposed Algorithm for Partitioning Community Graph into Sub-Community Graphs using Graph Mining Techniques, 3rd ICACNI 2015, KIIT University, Bhubaneswar, India, published in Smart Innovation, Systems and Technologies, 2015, pp. 3–15.

7. Rao, Mitra and Acharjya D.P., A new Approach of Compression of Large Community Graph Using Graph Mining Techniques, 3rd ERCICA 2015, Bangalore, India, Springer Verlag, volume 1, 2015, pp. 127–136.

8. Zhou, Graph Compression, http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.173.5857&rep=rep1&type=pdf.

9. Toivonen, Zhou, Hartikainen and Hinkka, Compression of Weighted Graphs, In Proceedings of KDD’11, San Diego, California, USA, 2011.

10. Girvan and Newman M.E.J., Community structure in social and biological networks, In Proceedings of the National Academy of Sciences of the United States of America 99(12) (2002), 7821–7826.

11. Newman M.E.J. and Girvan, Finding and evaluating community structure in networks, Physical Review E 69 (2004).

12. Sabidussi G., The centrality index of a graph, Psychometrika 31(4) (1966), 581–603.

13. Navlakha, Rastogi and Shrivastava, Graph summarization with bounded error, In Proceedings of the 2008 ACM SIGMOD international conference on Management of data, New York, USA, ACM, 2008, pp. 419–432.

14. Tian, Hankins and Patel, Efficient aggregation for graph summarization, In Proceedings of the 2008 ACM SIGMOD international conference on Management of data, New York, NY, USA, ACM, 2008, pp. 567–580.