Abstract
A number of graph-parallel computing abstractions have been proposed to address the needs of solving complex and large-scale graph computing. However, unnecessary and excessive communication and state sharing between nodes in these frameworks not only reduce the network efficiency but may also cause decrease in runtime performance. In this paper, we propose a mechanism called LightGraph, which reduces the synchronizing communication overhead for distributed graph-parallel computing abstractions. Besides identifying and eliminating the redundant synchronizing communications in existing systems, in order to minimize the required synchronizing communications LightGraph also proposes an edge direction-aware graph partitioning strategy. This new graph partitioning strategy optimally isolates the outgoing edges from the incoming edges of a vertex. We have conducted extensive experiments using real-world data, and our results verified the effectiveness of LightGraph. For example compared to PowerGraph LightGraph can not only reduce up to 31.5% synchronizing communication overhead for intra-graph synchronizations, but also cut up to 16.3% runtime for PageRank running on Livejournal dataset.
Introduction
Due to recent advances in high-throughput techniques in various fields, big data analytics technique is more and more popular used by multiple fields, such as system Biology [28, 24], government sector [10], moblie communication [27] and so on. Complex networks such as social, biological and technologies networks can be mathmatically modelled as graphs. The size of these real-world networks are often large consisting of millions or even billions of vertices, and hundreds of billions of edges. Making sense of large real-world networks ranging from social networks of friends; links between web pages in the World Wide Web; and gene regulatory networks, is an increasingly important problem. Thus, designing effective and scalable computing systems for analyzing and processing huge real-world graphs has gained significantly attention and effort.
To alleviate the communication overhead and accelerate the graph-structured application execution, we propose a mechanism that identifies and eliminates the avoidable communication during synchronization in existing distributed graph structured computing abstractions. We implemented our method and created LightGraph: a light communication distributed graph-parallel computation system. Furthermore, to minimize the required intra-graph synchronizations for PageRank-like applications, LightGraph also employs an edge direction-aware graph partitioning strategy, which optimally isolates the outgoing edges from the incoming edges of a vertex when creating and distributing replicas among different machines.
The rest of the paper is organized as follows. Section 2 introduces the related works. LightGraph is detailed in Section 3. Section 4 details the experiment design and result. Section 5 concludes this paper.
Related works
A number of distributed graph-parallel abstractions have emerged in literatures. Pregel [13] explores graph-parallelization through the use of a bulk synchronous distributed message-passing system. Several other systems are similar to Pregel such as GPS [19], Giraph [1]. PGX [8] developed by Oracle can process large scale graphs under either single-machine shared memory or distributed computing model. Stutz et al. propose the Signal-Collect [22] framework to concisely specify and execute a number of computations that are typical for Semantic Web. Naiad [15, 16] is able to conduct incremental iterative computation. However, it adopts traditional synchronous check pointing for fault tolerance and cannot respond to stragglers [26]. Distributed GraphLab [12] and its successors, PowerGraph [6] and Ligraph [29] exhibit more excellent performance than others with better graph processing rate and higher scalability [30, 7, 4]. Cyclops [3] is also a vertex-oriented graph-parallel framework. However, compared with LightGraph (written in C++) its java implementation based on Hama [21] drags its runtime performance down. Work [14] uses vertex-cut graph partitioning that considers both diverse vertex traffic and heterogeneous network costs. However, its partitioning method does not take the application characteristics into account like EDAP partitioning strategy proposed in this work. Work [20] proposes a light-weight processing framework called Frog with a hybrid coloring model. However, Frog only supports asynchronous computing model. In general, asynchronous computing model introduces much more communications than synchronous computing model. On the other hand, works [23, 25] only support synchronous computing model not like this work that can support both synchronous and asynchronous computing model. Compared with our preliminary work [31] this work details the proposed algorithms, conducts more extensive experiments verifying the effectiveness of LightGraph on processing various data set using different algorithms, reports that LightGraph outperforms more recent mechanisms.
In order to deal with the inherent problem, communication overhead, in distributed computing systems, much effort has been done as well. In traditional message passing abstractions, such as Pregel [13], Giraph [1], and GPS [19], all vertex-programs run simultaneously in a sequence of super-steps. In each super-step, each program instance receives all messages sent by its neighbors in the previous super-step and sends messages to its neighbors for next super-step [6]. In order to reduce the number of communication messages, Pregel introduces a commutative associative message combiner, which merges messages destined to a same vertex [13]. Work [18] proposes asynchronous broadcast and reduction operations to reduce communication associated with high-degree vertices. Works [6, 29] and LightGraph proposed in this work employ GAS (Gather, Apply, and Scatter) graph computing model and ensures the changes made to the vertex or edge data are automatically visible to adjacent vertices. Thus, LightGraph eliminates the messages transferred between adjacent vertices.
LightGraph: Lighten communication in distributed graph-parallel abstractions
Challenges of communication and synchronization
In order to process a large-scale graph, a distributed graph-parallel computing system needs to partition the graph into smaller sub-graphs and distribute the sub-graphs to different machines. GraphLab [12] uses an edge-cut approach while PowerGraph [6] adopts a vertex-cut strategy. Nevertheless, replicas (ghosts) have to be created for the vertices and edges across the cutting-line. And through synchronizing these replicas, computation states and data can traverse through the sub-graphs placed on different machines.
In both GraphLab and PowerGraph, communication occurs during synchronizations of replicas and the volume is proportion to the number of ghosts. One prominent problem in GraphLab is that when partitioning a power-law graph, it has to resort to a hashed (random) vertex placement algorithm that cuts across most of the edges and creates many unnecessary mirrors [6]. The communication overhead will then substantially increase and seriously impact the execution efficiency of graph-structured applications for power-law graphs. On the other hand, under PowerGraph, the vertex-cut partitioning process stores each edge exactly once, thus eliminates the need of edge-mirrors and data updates on edges do not need to be communicated to other sub-graphs [6].
In other words, with PowerGraph, only vertices are replicated and only vertex data need to be synchronized. One of the vertex-replicas is randomly chosen as the master and the remaining ghosts are noted as mirrors. In a typical vertex-program, the master runs the apply function and sends the updated vertex data to all mirrors. Although PowerGraph reduces the communication overhead significantly comparing to GraphLab, it still suffers from having very high communication overhead, which limits its performance and scalability [6].
Communication overhead in PowerGraph
Existing distributed graph-parallel computing systems including PowerGraph blindly synchronize all replicas of a vertex or edge when there is a data change in one of the replicas. However, there are certain graph algorithms such as PageRank [17], the data on some of the replicas will never be accessed in future computation iterations. Therefore, the communication for synchronizing these mirrors can be avoided.
Through experimenting with PowerGraph, we found that in certain graph computing applications/algorithms, the direction of information or data flow is consistent with the direction of the edges. More specifically, in a directed edge, the data on the target vertex is not needed by that edge or that edgeâs source vertex during computation. Thus, in a distributed graph, the data on a vertex replica that has no out-going edge will never be accessed by any other edges or vertices in future computation iterations; and such replicas do not need to be synchronized. PageRank [17], HITS [9] and SALSA [11] all fall into this category. We define this category of applications/algorithms as a PageRank-like application/algorithm as follows:
The computation happens on
and
in which
Where
We demonstrate our observation through a simplified example of running PageRank in PowerGraph with a sample graph shown in Fig. 1. Conceptually, the PageRank score of a node is the long-term probability that a random web surfer is at that node at a particular time step. The computation of the PageRank score of a webpage
where
A partial sample graph for PageRank algorithm.
Figure 2 shows an example of a 4-way vertex-cut of the graph based on PowerGraph’s partitioning algorithm. Let us assume that we are computing the PageRank score of vertex
Graph placement under PowerGraph.
Based on the above observation and analysis, we propose a mechanism that identifies and eliminates these avoidable communications during synchronizing master and replicas. We implemented our method on PowerGraph and created LightGraph: a light communication distributed graph-parallel computation system targeting to alleviate the communication overhead for PageRank-like algorithms. In particular, to achieve a light communication distributed graph-parallel computing system, we propose two novel methods in LightGraph: (1) a streamlined synchronization process that eliminates the unnecessary communications; and (2) an edge direction-aware vertex-cut partitioning strategy to maximize the proportion of mirrors with no out-going edges and further reduce the communication overhead.
The communication pattern of PowerGraph and LightGraph when master communicates with a mirror with no outgoing edges in PageRank-like algorithm. In LightGraph the synchronization and scatter phases of this mirror are eliminated.
An example of reduced communication in LightGraph comparing to PowerGraph during synchronization.
Under PowerGraph, as illustrated above, the data on mirrors without out-going edges will not be accessed in future computations for PageRank-like algorithms. Thus, even if the data on the master of a vertex has been updated there is still no need for the master to synchronize these mirrors. LightGraph identifies these mirrors during the initial partitioning process by checking their out-going degree and then eliminates the synchronizing operations pertaining to these mirrors during the execution stage of the graph application as shown by Fig. 3. Consequently, LightGraph is able to reduce the overall communications required for PageRank-like algorithms. Figure 4 gives an example of comparing the synchronization process in LightGraph with that under PowerGraph, in which the communication workload is reduced. The pseudo codes for the vertex computing in PowerGraph and LightGraph are shown in Algorithms 1 and 2, respectively.
Vertex Computing in PowerGraph[1]
Vertex Computing in LightGraph[1]
Edge direction-aware graph partition
As shown above, for PageRank-like algorithms, the synchronization to the mirrors without any outgoing edges can be eliminated. Naturally, we can reason that assuming the same (or similar) number of overall mirrors, the more mirrors that are created without any outgoing edges, the more synchronizing communication overhead can be avoided. Therefore, we propose a new graph displacement method in LightGraph to take into account the direction of edges during the initial graph partitioning phase, namely the edge direction-aware partition (EDAP) strategy. First, during the graph partitioning process, for a particular vertex, EDAP tries to assign edges with the same direction (inbound or outbound edge of a vertex) to the same machine. By this design EDAP maximum the proportion of replicas that has no outgoing edge in the overall vertex replicas. Second, instead of randomly appointing one of the vertex replicas as the master, EDAP chooses the master from the replicas that do have out-going edges, since the synchronization communication only occurs from the master to mirrors. EDAP optimally isolates the outgoing edges from the ingoing edges of a vertex among different machines while maintaining other partitioning mechanisms used by PowerGraph to guarantee good work balance and low number of replicas of vertices.
Figure 5 illustrates the new 4-way placement of the sample graph shown in Fig. 1 achieved by EDAP. Distinct from existing vertex-cut approaches (as demonstrated in Fig. 2), the inbound edges of vertex
An example of graph placement using the proposed edge direction-aware partitioning strategy.
An example of synchronizing communications under EDAP-based graph placement.
We implemented the proposed EDAP approach based on two existing partitioning strategies in PowerGraph: Random and Oblivious [30]. The Random strategy uses a hash function that randomly distributes edges to machines. The Random strategy is fully data-parallel during the partitioning process and can achieve near perfect balanced workload distribution on large graphs. On the other hand, the Oblivious partitioning strategy uses a sequential greedy heuristic whose goal is to place subsequent edges on appropriate machines to minimize the conditional expected replication factor. As defined in [30], the replication factor is the ratio of the number of overall replicas in the distributed graph over the number of vertices in the original input graph. In a
Therefore, the objective of the Oblivious strategy is to place the
where
Oblivious runs the greedy heuristic independently on each machine without additional communication and it has the best performance of all the partitioning strategies implemented in PowerGraph [30].
In LightGraph, we extended the Random and Oblivious partition strategies using the edge direction awareness feature and created the partition method: EDAP_Random and EDAP_Oblivious, respectively. In particular, based on the Random and Oblivious implementation, we added a heuristic operating process that isolates the outgoing edges from the incoming edges of a vertex among different machines. In detail, following the placement of previous
and
where
During the machine selection process we give higher priority to the machines, on which
or
By doing so, we maximize the chances of creating mirrors with only incoming or outgoing edges.
Algirithms 3–6 detail the pseudo codes of the mentioned partition strategies, respectively.
Random Partition in PowerGraph[1] // Random assign edge (source, target) to a machine p in {0, …numprocs-1} INPUT: Source, Target, # of processes OUTPUT: process/machine ID Return
Oblivious Partition in PowerGraph[1] // Greedy assign edge (source, target) to a machine p in {0, …numprocs-1} INPUT: Source, Target, # of processes OUTPUT: process/machine ID
EDAP_Random in LightGraph[1] // Edge direction-aware random assign edge (source, target) to a machine p in {0, …numprocs-1} //proc_dst_vertex[i].get(vertex ID) and proc_src_vertex[i].get(vertex ID) indicates whether a vertex has already been a target or a source of another edge in machine i; INPUT: Source, Target, # of processes OUTPUT: process/machine ID
Notations
EDAP_Oblivious in LightGraph[1] // Edge direction-aware greedy assign edge (source, target) to a machine p in {0, …numprocs-1} INPUT: Source, Target, # of processes OUTPUT: process/machine ID
In this section we conduct the volume of synchronizing communications analysis. We look inside the distribution structure of a graph and explore its relationship with the volume of synchronizing communications. Table 1 explains the related notations.
General analysis
Given a vertex,
mirrors with both incoming and outgoing edge; mirrors with outgoing edge and without incoming edge; mirrors with incoming edge and without outgoing edge;
We also introduce a flag, ø
In PowerGraph
After the Apply phase is done, data on master is updated. Then the master will synchronize all its mirrors with the new data.
All
There is no duplicated communication between any two different vertices. Consequently, the total number of synchronizing communication messages happening in the whole graph computing job is:
In LightGraph
LightGraph just eliminates the unnecessary synchronizing communications between mirror and master. Thus, LightGraph does not induce any impact on the mediated computing results of a vertex. Therefore, for each vertex
In LightGraph,
Consequently, the number of total synchronizing communication messages happening in the whole graph is:
Thus the number of reduced synchronizing communication messages achieved by LightGraph over PowerGraph is:
Thus,
In LightGraph, it is possible that all outgoing edges of a vertex
Then,
Thus, in the optimal case, compared with PowerGraph, a considerable propotion of the synchronizing communication overhead can be reduced by LightGraph.
Table 2 compares the key characteristics of LightGraph with three state-of-the-art graph parallel abstractions.
LightGraph vs existing systems
In this section, we show the effectiveness of LightGraph comparing with PowerGraph from various aspects through experiments.
Experiment environment
Our experiments were conducted on a 65-node (528 processors) Linux-based cluster. The cluster consists of one front-end node that runs the TORQUE resource manager and the Moab scheduler and 64 computing (worker) nodes. Each computing node has 16 GB of RAM and 2 quad-core Intel Xeon 2.66 GHz CPUs. The/home directory is shared among all nodes through NFS.
Benchmarking application and dataset
We selected PageRank and SSSP (shortest path algorithm) as benchmarking applications and processed three data sets listed in Table 3. These data sets are all large-scale graphs. The selection standard is to select graphs extracted from real-world use with diverse characteristics and different scales in size. LightGraph is only applicable for directed graphs. Thus all graphs selected are directed.
Summary of data sets
Summary of data sets
The number of synchronizing communication messages vs the number of mirrors needing to be synchronized for PageRank on Livejournal.
We conducted tests by running the selected benchmarking algorithms on the datasets. Each presented result is the average of at least three runs. We first ran the PageRank application on LiveJournal data set and Fig. 7 plots the number of synchronizing communication messages vs the number of mirrors needing to be synchronized under synchronous mode using 16 machines. In Fig. 7 the numbers of mirrors needing to be synchronized under LightGraph (Random) and LightGraph (Oblivious) are actually the numbers of mirrors with outgoing edge under Random and Oblivious partition, respectively. As Fig. 7 shows 86.8% and 89.6% of overall mirrors have outgoing edge under Random and Oblivious placement, respectively. Namely 13.2% and 10.4% of overall mirrors have no outgoing edge under Random and Oblivious placement, respectively. PowerGraph needs to synchronize all mirrors. Instead LightGraph only synchronizes 86.8% and 89.6% of overall mirrors under Random and Oblivious, respectively. Thus, the numbers of synchronizing communication messages are reduced by 14.7% and 10.8% by LightGraph (Random) and LightGraph (Oblivious) compared with PowerGraph (Random) and PowerGraph (Oblivious), respectively. EDAP partition strategy tries to further isolate the incoming and outgoing edge for each vertex. And as expected, under EDAP_Random and EDAP_Oblivious the percentages of mirrors with outgoing edge are reduced to 71.5% and 82.8%, respectively. Consequently, the numbers of synchronizing communication messages are reduced by 26.4% and 16.5% by LightGraph (EDAP_Random) and LightGraph (EDAP_Oblivious) compared with PowerGraph (Random) and PowerGraph (Oblivious), respectively.
Comparing the volume of synchronizing communication under synchronous computation mode for PageRank on livejournal.
Comparing the volume of synchronizing communication under asynchronous computation mode for PageRank on livejournal.
PageRank runtime under synchronous mode while computing livejournal.
PageRank runtime under asynchronous mode while computing livejournal.
Comparing the volume of synchronizing communication under synchronous computation mode for SSSP on BFS1.
SSSP runtime under synchronous mode while computing BFS1.
Comparing the volume of synchronizing communication under asynchronous computing mode for SSSP on Twitter.
SSSP runtime under asynchronous mode while computing Twitter.
SSSP runtime under asynchronous mode while computing Twitter.
Figures 8 and 9 show the volume of synchronizing communication happening in PageRank on Livejournal data set in different execution environments under synchronous and asynchronous computation mode, respectively. As expected, LightGraph and its edge direction-aware partitioning strategies can significantly eliminate avoidable synchronizing communication during the PageRank application execution under both modes. As the figures show, the synchronizing communication overhead is reduced up to 31.5% by LightGraph over PowerGraph. Moreover, as the number of machines increases, more synchronizing communication can be saved by in LightGraph.
Figures 10 and 11 show PageRank runtime on Livejournal data set using different partitioning strategies in LightGraph and PowerGraph under synchronous mode and asynchronous mode, respectively. As shown the figures, through eliminating the avoidable synchronizing communication, LightGraph shorten the execution time of PageRank under both synchronous and asynchronous modes. Moreover, the proposed edge direction-aware partitioning strategies, further improved the performance of the PageRank application. As the figures show, the runtime is shortened up to 16.3% by LightGraph over PowerGraph. Furthermore, LightGraph shows better performance gains through reducing synchronizing communication as the number of machines increases in the cluster.
Figures 12–15 show the selected experiment results of SSSP running on Twitter and BFS1. All reported experimental results of diverse benchmarking applications on different datasets demonstrate the universality of LightGraph.
In Fig. 16 we provide PageRank runtime of processing Livejournal data set comparisons between LightGraph and several representative existing systems. In our experiments Giraph, GPS, and GraphX all adopt their system default graph partition strategy: random edge-cut partitioning. And because all these there systems do not support asynchronous graph computing they all conduct the graph computing under synchronous computing mode. As the results demonstrate LightGraph outperforms other systems in runtime. For example, the runtimes of Giraph and GPS are 122.3 s and 110.6 s, respectively. On the other hand, the runtime of LightGraph (Sync, EDAP_Random) is only 59.4 s.
Although distributed graph-parallel computing systems such as PowerGraph can provide high computational capabilities and scalability for large-scale graph-structured computation, they often suffer heavy communication overhead. In this paper, we proposed LightGraph that eliminates the avoidable communications during synchronization of mirrors in existing distributed graph structured computing abstractions. Through extensive experiments with real-world network data, we show that LightGraph can not only reduce synchronizing communication significantly but also improve the runtime performance of PageRank-like applications.
