Abstract
Representation of any network graphically has vast applications and used for knowledge extraction efficiently. Due to the increase in applications of a graph, the size of the graph becomes larger as well as its complexity becomes more and more. So visualization and analyzing of a large community graph are more challenging. Hence compression technique may be used to study a large community graph for knowledge extraction. During compression, there should not be any loss of information. This paper proposes an algorithm, “ComComGra” which compresses a large community graph with various communities using graph mining techniques. The proposed algorithm elaborates with two examples which include a benchmark example.
Introduction
The applications on graphs are increasing day by day rapidly which leads to increase in size and complexities of the graph. So representation of a large community graph in the memory is a very challenging task. So the direct visualization of a large community graph is beyond the human capability. Hence the visualization can be achieved easily by compressing a large community graph into a smaller one which should contain all the nodes information of the community graph. So to represent and compress of such kind of community graph without loss of any knowledge opens a new challenge to graph mining. The compressed community graph should contain all the information of the original community graph. So that visualization and the process of extraction of knowledge could be more efficient and easy.
The paper starts with a formal introduction followed by literature survey related to the compressed graph models representation as well as the overview of the Greedy Algorithm which is an existing technique of compression of the graph. The proposed algorithm has been discussed followed by experimental results of two suitable examples. Finally, the paper concludes with a general conclusion.
Literature survey
A general compressed graph a non-weighted graph
The compressed representation of the graph has two parts. The first part is the graph summary that collects the important communities and relationships of the original community graph and the second one is the collection of the set of edge corrections which help to re-create the original community graph from the compressed community graph. The greedy algorithm [13] is to iteratively group two nodes with the highest cost reduction. The Greedy algorithm has three phases: Initialization, Iterative merging, and Output. In the initialization phase, the cost reduction
Proposed algorithm
Algorithm for compression of community graph
n: To assign number of communities.
NCM[n] [3]: To assign community code, number of community members of each community, and community members code.
tcm: To assign the total number of community members of the community graph.
CMM[tcm][tcm]: Adjacency matrix of the Community Graph.
CCM[n][n]: Adjacency matrix of the Compressed Community Graph.
CODE[]: To assign all the community members code.
CommData.Txt: Dataset file of community graph consists of a total number of communities, the community codes, a total number of community members, and the community members’ code.
EdgeData.Txt: Dataset file contains the edge details of community members i.e. ‘From Community Member Code’ and ‘To Community Member Code’.
ComAdjMa.Txt: Data file to write the adjacency matrix of the community graph.
{
// to read community data and edge data of community
// members
ReadComData( );
// to assign each communities ’community members code’
CoMeCodes( );
// to create the community adjacency matrix
ComMemMatrix( );
// to write the community adjacency matrix in a file
WriteMemMatrix( );
// to count edges of the same community graph
SameComEdgeCount( );
// to count edges between dissimilar community graph
DissimilarComEdgeCount( );
// to display the adjacency matrix of compressed community
// graph
CompressedComMatDisplay( );
}
Procedure to read community graph data and its edge data
Procedure ReadComData( )
{
// to open the community data file
open(“CommData.Txt”);
// to read total number of communities from the data
file
// “CommData.Txt”
read(n);
x:=1;
tcm:=0;
// to read ‘n’ communities details such as community
code,
// number of community members, and community
// member’s code from the data file “CommData.Txt”
for i:=1 to (n+1) do
{
read(NCM[i][1], NCM[i][2]);
// to count the total number communities
tcm:= tcm + NCM[i][2];
for j:=1 to NCM[i][2] do
{
read(NCM[i][j+2]);
code[x]:=NCM[i][j+2];
x:=x+1;
}
}
// to close the community data file
close(“CommData.Txt”);
}
Procedure to assign community member codes
Procedure CoMeCodes( )
{
k:=2;
for i:=1 to (n+1) do
{
for j:=1 to NCM[i][2] do
{
CMM[1][k] := NCM[i][j+2];
CMM[k][1] := NCM[i][j+2];
k:=k+1;
}
}
// to assign the Community codes in CCM[][]
for i:=2 to (n+1) do
{
CCM[i][1] := NCM[i-1][1];
CCM[1][i] := NCM[i-1][1];
}
}
Procedure to create community adjacency matrix
Procedure ComMemMatrix ( )
{
open(“EdgeData.Txt”); // to open the edge data file
row:=1; col:=1;
while(Not EOF())
{
// to read the ‘From Community Member Code’
read(node1);
// to read the ’To Community Member Code’
read(node2);
for i:=1 to tcm do // row-side code detection
{
if(code[i]=node1) then break;
}
for j:=1 to tcm do // column-side code detection
{
if(code[j]=node2) then break;
}
// to assign the edge value 1 at (i+1) row and (j+1)
column
CMM[i+1][j+1]:=CMM[j+1][i+1]:=1;
}
close(“EdgeData.Txt”); // to close the edge data
file
}
Procedure to write the community adjacency matrix in file
Procedure WriteMemMatrix( )
{
// to open the file for writing the community
adjacency matrix
open(“ComAdjMa.Txt”);
for i:=1 to (tcm+1) do
{
for j:=1 to (tcm+1) do
{
if(i=1 and j=1) then write(“C”);
else write(CMM[i][j]);
}
}
close(“ComAdjMa.Txt”); // to close the file
}
Procedure to count edges of the same community graph
Procedure SameComEdgeCount ( )
{
d:=1; s:=0;
for i:=1 to n do
{
s := s + NCM[i][2];
for j:=d to s do
{
for k:=d to s do
{
// to check the Edge at CMM[j+1][k+1]
if (CMM[j+1][k+1]=1) then
{
CCM[i+1][i+1]:=CCM[i+1][i+1]+1;
}
}
}
d:=s;
// to find the actual number of edges in the
undirected graph
CCM[i+1][i+1]:=CCM[i+1][i+1]/2;
}
}
Procedure to detect edges of the dissimilar community graph
Procedure DissimilarComEdgeCount ( )
{
a:=1;
b:=NCM[1][2];
c:=b;
d:=b;
for i:=2 to (n+1) do
{
d := d + NCM[i][2];
Addition(i-1, a, b, c, d);
a := b;
b := b + NCM[i][2];
c := d;
}
}
Procedure to display the adjacency matrix of compressed community graph
Procedure CompressedComMatDisplay ( )
{
for i:=1 to (n+1) do
{
for j:=1 to (n+1) do
{
if(i=1 and j=1) then display(“C”);
else display(CCM[i][j]);
}
}
}
Procedure to assign the counted dissimilar community edges
Procedure Addition(p, a, b, c, d)
a, b: Initial and final value of row index.
c, d: Initial and final value of column index.
p: Initial index of the matrix CCM[n][n].
{
x := c;
y := d;
k := p+1;
for i:=a to b do
{
k:=p+1;
Smiley:
for j:=c to d do
{
if(CMM[i+1][j+1]=1)
{
// to count and assign the dissimilar community
edges
// row-side
CCM[p+1][k+1]:=CCM[p+1][k+1]+1;
// to count and assign the dissimilar community
edges
// column-side
CCM[k+1][p+1]:=CCM[k+1][p+1]+1;
}
}
k:=k+1;
if(d<tcm)
{
c := d;
d := d + NCM[k][2];
goto Smiley;
}
c:=x; d:=y;
}
}
The proposed algorithm, ComComGra has seven procedures. The 1
The 2
The 3
The 4
The 5
The 6
Finally, the procedure, CompressedComMatDisplay ( ) is to display the compressed community matrix, CCM[n][n] of the community graph as result. The running time complexity of the algorithm is O(n
Explanation and analysis
The authors have considered a community graph [2, 3, 4, 5, 6] as an example with twenty-three numbers of community members belonging to four types of communities or clusters i.e.,
Community graph.
Adjacency matrix of the community graph
To represent the above community graph in the memory, the authors require an adjacency matrix of order 24
Compressed adjacency matrix.
Compressed community graph of community graph.
Similarly, the 2
Dataset of community graph.
Finally, the authors draw the compressed community graph from the compressed community adjacency matrix depicted in Fig. 4. However, the community
Edge dataset of community graph.
Example-I
The authors have considered Fig. 1 as the first example of community graph and have created two dataset files namely “COMDATA1.TXT” and “EDGE1. TXT”. The dataset “COMDATA1.TXT” has data about the community graph such as ‘Total Number of Communities’ (1
Replace “CommData.Txt” with “COMDATA1.TXT” and “EdgeData.Txt” with “EDGE1.TXT” and input these dataset files to the algorithm depicted in Fig. 7, it successfully writes the adjacency matrix of the community graph in the data file namely “ComAdjMa.Txt” depicted in Fig. 8. Then the algorithm uses the data from the data file “ComAdjMa.Txt” and successfully creates the compressed adjacency matrix of the community graph depicted in Fig. 9. Using Fig. 9, the authors have drawn successfully the compressed community graph depicted in Fig. 4.
Result-I
Input of dataset files.
Adjacency matrix of the community graph.
Adjacency matrix of the compressed community graph.
Football team graph.
The authors have considered the second example as football team graph which is a benchmark example [10] depicted in Fig. 10. The football team graph has twelve football communities with community codes
Dataset of football team graph.
Edge dataset of football team graph.
Input dataset file.
dataset files namely “COMDATA2.TXT” and “EDGE2.TXT” for the football team graph. The dataset “COMDATA2.TXT” has data about the football team graph such as ‘Total Number of Football Team Communities’ (1
The algorithm was written in C++ and compiled with DevC++. The experiment was run on Intel Core I5-3230M CPU + 2.60 GHz Laptop with 4 GB memory running MS-Windows 7.
Adjacency matrix of the football team graph.
Compressed adjacency matrix of the football team graph.
Compressed football team graph.
To represent the compressed graph in compact nature, and allow for both lossless and lossy graph compression with bounds on the introduced error proposed by [13]. With the combination of MDL principle, it represents in a highly intuitive coarse-level graph summary. It also developed two algorithms, Greedy and Randomized. Greedy repeatedly picks the best pair of nodes to merge in the entire graph and outputs a highly compressed representation of the graph. Randomized performs the best merge on a randomly selected node.
To compress a large scale web graph into a smaller one using BFS technique is proposed by [1]. This method of compression is based on the topological structure of the Web Graph rather than on the underlying URLs. It has two phases of compression. In the first phase, it traverses in BFS way to assign the indices to the traversal node in a traversal list for compression. The second phase for compression of web graph using the traversal list data. To compress a weighted undirected graph is proposed by [14]. It compresses a weighted graph with a pair of original nodes is connected by an edge if their super nodes are connected by one, and that the weight of an edge is approximated to be the weight of the super edge.
The proposed algorithm, ComComGra considers a community graph with a set of sub-community graphs and there is a relationship among the members of the same communities as well as the dissimilar communities. Here a sub-graph means a cluster or a community having its own set of members. During the compression process, a cluster or a community is treated as one node. Hence the total number of communities or clusters will be the total number of nodes in the compressed community graph irrespective of the total number of community members present in the original community graph. This technique is completely different from the above-explained existing techniques and adopts the graph-theoretic concept. The running time complexity of the proposed algorithm is O(n
This paper is an extended work on proposing an efficient algorithm for compression of a large community graph with a set of sub-community graphs. The earlier literature, findings, and observations are already available in the article [7]. The authors have compressed a large community graph into a compressed community graph using the concepts of graph technique, especially by detecting and counting the edges between the nodes of similar communities as well as dissimilar communities. A simple graph technique has been used for compression of a large community graph in an efficient way. The running time complexity of the proposed algorithm, ComComGra is O(n
