Abstract
Edge bundling is a technique used to improve the readability of large graph drawings by grouping edges to reduce visual complexity. This paper treats this task as a clustering problem, using compatibility metrics to evaluate solutions in an optimization pipeline combined with a clustering ensemble approach. The aim is to present the Clustering Ensemble-based Edge Bundling (CEBEB) method for solving the General-based Edge Bundling (GBEB) problem and report results for some given graphs. The CEBEB method proved very promising and generated better solutions than an existing evolutionary algorithm. Additionally, the paper introduces a new ensemble algorithm, specific for the GBEB, and reviews some previous results.
Introduction
The research area of Information Visualization primarily focuses on developing techniques to create visual representations of abstract data, making it easier for people to understand and comprehend. According to Card et al. 1 Information Visualization involves using interactive visual representations of data through computers to enhance cognitive abilities. This field of research includes several areas, such as Graph Drawing, which studies the problems and techniques related to the visual representation of graphs, to generate useful representations.
Graphs consist of mathematical models that are commonly used to represent physical or abstract entities and their relationships. To construct a graph, it’s often necessary to create a graphical arrangement that presents the nodes (or vertices) and edges. There are numerous ways to design such an arrangement, including different ways of displaying nodes and edges, various layouts, and approaches to positioning elements in a graphic presentation. The classical graph drawing literature discusses these aspects in detail.2–5 Some of them involve adopting specific layout patterns that may change according to the type of graphs (like hierarchical and circular layouts, general node-link layouts or the visibility representation), using straight lines, poly-lines or curves to represent the edges, adopting distinct graphical shapes (like rectangles or ellipses) for the nodes and creating a 2D or a 3D layout.
The aim of a good graph drawing is to contribute to improve the user’s cognitive process, helping to identify the global structure of the data, its relationships, and other patterns contained within. 6 Therefore, using esthetically pleasing drawings can make it easier to understand.
Graph drawing has applications in many areas, 7 such as in designing digital circuits in microelectronics, modeling systems in software engineering, and designing physical structures using computer-aided design (CAD). Besides that, in recent decades, among the most popular technologies are the increasing mobility of applications, cloud computing, and the Internet of Things (IoT). The use of these technologies generates a large volume of data, mostly coming from simple transactions, such as electronic purchases, e-mail, documents, messages, information disseminated on social networks, and even data generated by sensors embedded in computing devices. This type of information natively represents relationships between objects, and one of the most common ways of structuring such information is through the use of graphs.
However, for relatively large graphs with many edge crossings, legibility is sometimes significantly impaired by screen space constraints, coupled with limited human cognitive power and the overlapping of other visual elements in the drawing. 6 This visual complexity, commonly called “visual clutter,” can compromise even relatively simple tasks, such as finding the endpoint of an edge or perceiving the flow of information in the graph drawing. In that sense, edge bundling has been widely used for reducing visual complexity and improving the level of understanding of the information represented by graph drawings. Such an approach draws the edges of a graph as interpolated curves that share the same path, thus forming bundles, as seen in Figure 1.

Graph drawings with (a) straight lines and with (b) the application of the edge bundling technique. Note that some edges in (b) are drawn together, which causes them to curve.
The academic scientific community has presented several methods for edge bundling in general graphs, and specific methods for the special cases of circular graphs, hierarchical digraphs, and parallel coordinates, among others. The first approach in this area was proposed by Holten, 8 with the concept of joining adjacent edges in hierarchical data. Since then, many other entirely different approaches have been reported, for example, the image-based method, 9 which uses digital image processing operations to assemble edge bundles; the geometry-based method, 10 which employs a mesh for controlling the formation of the bundles; and the force-directed method 11 which uses the concepts of classical force-based graph drawing algorithms for joining edges. Figure 2 presents drawings generated using the mentioned approaches. It is important to note that there is a significant difference between the bundles and the style of the drawings produced by each approach.

In 2017, 12 an attempt was made to unify the concepts within this research area. Even so, there still exist several definitions of what a bundle of edges can be, and some definitions are not mathematically precise. In response to this, Ferreira et al. 13 proposed a general formulation of Edge Bundling (EB) from an optimization perspective, with definitions that enable the comparison and evaluation of specific edge bundling problems and methods. The authors considered that “edge bundling involves the creation of a graph drawing in which the edges are grouped through sharing segments in the plane, improving readability and preserving the connectivity and information of the original graph.” They formulated three specific EB problems: Angular-based Edge Bundling (ABEB), Compatibility-based Edge Bundling (CBEB), and General-based Edge Bundling (GBEB), which incrementally increase in complexity.
These problems fall in the category of explicit edge bundling, in which the set of edges that comprise each bundle has to be explicitly defined before the drawing step. The authors also presented an evolutionary algorithm, called Evolutionary Edge Bundling (EEB), to solve these problems. The EEB algorithm provided useful explicit bundling solutions, but at a high computational cost, particularly for the GBEB, which made its application infeasible for large graphs.
Another evolutionary approach for edge bundling was proposed by Saga et al. 14 It consists of a genetic algorithm that segments the edges using dummy vertices (false vertices, created by the split of a real edge into a sequence of line segments. The dummy vertices connect such segments) for routing them to make the bundles. The authors include esthetic criteria in their objective (fitness) function, for evaluating the visual quality of the generated drawings.
The current paper aims to investigate an alternative solution to the GBEB problem by following the perception mentioned by Lhuillier et al. 12 when reviewing other studies,15,16 that there is a conceptual similarity between edge bundling, clustering, and image segmentation problems, not yet fully analyzed. In that regard, the present work revisits the GBEB problem and discusses a method (abbreviated as CEBEB) for solving it that employs both clustering and clustering ensemble algorithms, which will be further detailed, embedded in an optimization pipeline. A first introduction to the CEBEB was made in a preliminary paper 17 and it is further explained and evaluated here. This work also reviews the previous results, by considering a correct implementation of the visibility compatibility criterion in both the proposed method and in the Evolutionary Edge Bundling (EEB) method, used for comparison.
The main contributions of this paper are:
An enhanced description and evaluation of an optimization method for the General-based Edge Bundling (GBEB) that uses clustering and clustering ensemble algorithms guided by edge-bundling compatibility metrics;
A new consensus algorithm for the GBEB based on edge compatibility; and
Proper results that support the idea of solving the GBEB problem as an edge clustering problem.
The remainder of this paper is organized as follows: Section “General-based edge bundling” defines the GBEB problem; Section “Clustering edge bundling and ensemble learning” provides a brief overview of the usage of clustering approaches to edge bundling and conceptualizes clustering ensemble learning; Section “Clustering ensemble-based edge bundle” introduces the proposed optimization method for the GBEB problem; Section “Experiments” shows the experimental setup and the results of the evaluation of the proposed method; and, finally, Section “Conclusion” suggests steps to further develop the reported work.
General-based edge bundling
Ferreira et al. 13 define three problems in edge bundling: ABEB, CBEB, and GBEB. Since only the last problem will be treated in this paper, only its definition is given below.
Let
For the definition of
Finally, Ferreira et al.
13
investigated a particular instance of the GBEB problem that aims to maximize a weighted sum of the optimization goals. The
The numerical thresholds
such that
To solve the GBEB problem, the authors developed the EEB method, which consists of an evolutionary algorithm that uses equation (2) as its fitness function. An individual in EEB is represented by a vector of size
Clustering edge bundling and ensemble learning
Clustering algorithms have already been used or adapted by some authors for edge bundling. These approaches, however, did not focus on solving an explicit edge bundling optimization problem.
For example, Zhou et al. 18 presented a technique that performs the creation of edge bundles using triangulation and a hierarchical clustering. That approach creates a triangulation based on a drawing of the graph with straight-line edges. The edge points generated by the triangulation are then subjected to a hierarchical clustering algorithm to assemble the bundles.
Telea and Ersoy 9 applied agglomerative hierarchical clustering as an intermediate step in the proposed Image-based Edge Bundling approach. In this case, the first step is to generate a preliminary bundled graph drawing generated with Hierarchical Edge Bundles (HEB). Subsequently, the drawing is submitted to an agglomerative hierarchical clustering algorithm to identify the bundles on the image and therefore perform rendering. However, the drawing used to perform the clustering already contained information about the bundles generated in the preliminary design.
Yamashita and Saga 19 proposed a concept of edge bundling using edge cluster information by a line graph. Such an approach consists of a clustering-based method for edge bundling that uses a node clustering algorithm for community detection in the line graph. Thereafter, the edges are bundled based on the cluster information using Force-Directed Edge Bundling. 11
An idea that has been quite popular is the ensemble learning, which consists of selecting a collection, or set, of hypotheses and combining their predictions using an average, voting, or applying some machine learning technique. Russell and Norvig 20 says that there are basically two reasons for implementing this type of approach. The first of these is to reduce bias, once the hypothesis space of a model can be very restrictive, imposing a strong bias. In this sense, an ensemble can be more expressive and therefore have fewer biases than the basic models. The second reason is to reduce variance: consider, for example, a set of classifiers that combine the solutions using majority voting. For the ensemble to misclassify a new example, the majority of classifiers must misclassify it. The probability of this occurring is much lower than a single incorrect classification by a single classifier.
The concept of ensemble was first used in the combination of supervised classifiers and, due to the success obtained, the idea was applied to the combination of results from different clusters. The approach emerged as an alternative to improve the quality of results of clustering algorithms and also to help clarify the following questions: “If there are two different clustering algorithms, and they are applied to the same set of data, different results can be obtained. But which one is correct? How can the results be evaluated?” 21
Given a set of objects, a clustering ensemble method consists of two main steps: (1) Generation, which deals with creating a set of partitions of these objects, and (2) Consensus Function, where a new separation, which is an integration of all partitions obtained in the generation step, is calculated. 22
Among the different types of consensus functions, some methods can be highlighted that transform the problem of combining clusters into the problem of graph or hypergraph partitioning. The difference between these methods is the way the graph is modeled from the set of clusters and how the cuts are defined to obtain the consensus partition. Strehl and Ghosh 23 defined the consensus partition as the partition that has the most information shared with all partitions in the cluster set. The authors proposed three heuristics based on partitioning to solve the problem. The set of clusters is represented as a hypergraph, and each partition is represented by a hyperedge. Such heuristics are detailed below:
Cluster-based Similarity Partition Algorithm (CSPA) – From the hypergraph, a
Hypergraph Partition Algorithm (HGPA) – This is a straightforward approach to clustering ensemble that partitions the data using the clusters provided as indications of strong ties. The problem is formulated as partitioning the hypergraph by cutting a minimum number of hyperedges, considering that they all have the same weight. The hypergraph is partitioned into
Metaclustering Algorithm (MCLA) – Firstly, all similarity between pairs of clusters
Another method, also based on graph partitioning, called Hybrid Bipartite Graph Formulation (HGBF), was proposed by Fern and Brodley, 25 in which clusters and objects are modeled together in the same graph. In this method, the bipartite graph is constructed in such a way that there are no edges between vertices that are instances or clusters. There is only an edge between two nodes if one of the nodes represents a cluster and the other represents an object that belongs to this cluster. The consensus partition is obtained by partitioning this graph using the METIS 24 algorithm or the spectral clustering 26 algorithm.
With a different approach, Li et al. 27 proposed the clustering consensus solution using Non-Negative Matrix Factorization (NMF). The authors define this approach as a “simple and demonstrably convergent algorithm for solving the consensus problems of clustering and semi-supervised clustering.” This method defines consensus as the median partition problem, using distance as a measure of similarity between partitions. The original problem definition is relaxed consecutively to transform the problem into an optimization problem that can be solved by an iterative process. However, this process may only find local minima, rather than a global minimum of the problem. 21
Clustering ensemble-based edge bundle
The proposed method, called Clustering Ensemble-based Edge Bundle (CEBEB), solves the GBEB problem on undirected graphs by using clustering and clustering ensemble algorithms for producing edge clusters and by combining them.
In short, the CEBEB clustering ensemble activity is divided as described below:
Level 1 Generate various solutions using some clustering algorithms with multiple parameter combinations;
Level 2 Submit the solutions generated at Level 1 to consensus algorithms;
Level 3 The best result that was obtained among the solutions generated at Levels 1 and 2 constitutes the output of the ensemble.
At each level, the generated solutions are evaluated by the objective function
The CEBEB method has an optimization pipeline that divides the resolution of the GBEB problem into three steps, as can be seen in Figure 3, detailed in the following subsections.

The steps of CEBEB method.
Compatibility calculation
Firstly, the compatibility between the edges of the graph is calculated. The geometric compatibility metrics to be used are the same as those already used to solve the GBEB problem.
13
The general compatibility of the graph
Bundle assembly
Next, a bundle assembly step is performed. It consists of three activities, which are structured as follows and are shown in Figure 4.

Bundle assembly pipeline of CEBEB.
Estimation of the number of clusters
In addition to the feature matrix constructed in the previous step, some clustering methods require an estimate of the number of clusters (bundles) as input. To calculate the estimated value of this parameter, which is represented by
Clustering
The employed approach involved the utilization of 10 clustering algorithms that are widely cited in the literature. To select these algorithms, diversity was sought in the way the clusters were constructed so that the solutions generated were as different as possible. Various combinations of parameters were systematically tested to yield diverse results in clustering. Each set of parameters was inputted and executed across all 10 algorithms (see Table 1).
Number of executions and parameters of each clustering algorithm used in the CEBEB.
Several of these selected clustering methods require an estimate of the number of clusters as part of their input data. In order to determine this value, some algorithms proposed in the literature were used. However, since the present objective is to explore the potential of clustering approaches to the problem of edge bundling, and as the results of these methods are approximate, there are used entries in the interval
The input values for the numerical parameters were defined empirically. Those parameters based on initialization (n_init) and the maximum number of iterations (max_iter) were the same for all methods. Finally, the seed for the random-number generator (parameter random_state), when available, was set to zero to enable the reproduction of the achieved results.
In this context, the proposed clustering ensemble combines the two ways of the generation mechanisms proposed by Vega-Pons and Ruiz-Shulcloper 21 : “different clustering algorithms” and “different initialization parameters.”
The results obtained from all executions of the algorithms used were evaluated by the objective function to order them by the quality of the solution. After this evaluation, the best result found by each method was selected and submitted to the next activity: the clustering consensus.
Clustering consensus
The best solutions generated by running these diverse algorithms and their parameter combinations are submitted to standard consensus algorithms in the literature: CSPA, 23 HGBF, 25 HGPA, 23 NMF 27 and MCLA, 23 detailed in the previous section.
However, these mentioned algorithms perform consensus using only the sets of clusters already produced by the clustering algorithms as initial information. In this sense, the authors proposed a new algorithm, called the Compatibility-based Cluster Consensus Algorithm (CCCA), which was developed to include the concept of edge compatibility in the consensus process.
The proposed Compatibility-based Cluster Consensus Algorithm (CCCA) performs a qualitative analysis of the clusters of the solutions generated by the clustering algorithms applied in the previous activity. The method works as follows:
Add every cluster (bundle) in all solutions to a unique list
Compute the
Eliminate duplicates in
Sort the clusters in
Try to combine, two by two, the highest-rated clusters in
If,
Output
The Maximum (MAX) algorithm is also used to select the solution with the best evaluation value obtained at the bundle assembly activity.
Finally, the best result obtained, that is, the one with the maximum evaluation value, is selected among the solutions generated until the moment and constitutes the output of the ensemble, as shown in Figure 5, that summarizes the clustering and clustering ensemble algorithms used in CEBEB pipeline.

Summary of clustering and clustering ensemble algorithms used on the pipeline: this is the data processing flow in the CEBEB approach.
Graph drawing
The third and final step of the CEBEB approach consists of providing the bundles to a rendering module to create a graph drawing that reflects the joining of the edges. However, similar to GBEB, edge routing was not the initial focus of CEBEB and has not yet been fully addressed in the context of this proposal.
This activity has been performed via a specialized implementation of the Force-Directed Edge Bundling approach by Holten and van Wijk, 11 carried out by Ferreira et al. 13 The algorithm was modified to take the bundles as input and draw them individually. According to Ferreira et al., 13 “the force-directed algorithm ensures that the individual axial symmetry of each bundle looks plausible.”
Experiments
To validate the CEBEB approach, some experiments were performed. The objective of these experiments was to ascertain whether the clustering algorithms, either individually or in an ensemble, are capable of assembling edge bundles for the GBEB problem with comparable or superior quality to those found by EEB. The experiments are now described for a set of five graphs from distinct domains, which were also analyzed by Ferreira et al. 13 in the context of the GBEB. The graphs employed are a synthetic general graph (20 nodes, 28 edges) 39 and four real-world graphs: Zachary Karate Club (34 nodes, 78 edges), 40 Dolphin Social Network (62 nodes, 159 edges), 41 Movie Lens (160 nodes, 161 edges) 42 and Les Misérables (77 nodes, 254 edges). 43
The compatibility measures used in these experiments were generated assuming an angular compatibility of
An important problem to note, however, is that Ferreira et al.
13
implemented the visibility compatibility measure differently. The authors calculated the visibility value between two edges using the order of the edges as they appear in a bundle. Thus, to facilitate comparison, both approaches had to be adjusted to consider the minimum value of
Results of 100 independent runs of EEB with the adjustment in visibility compatibility.
In the CEBEB pipeline, after calculating the compatibility between the edges of the graph, and as part of the Bundle Assembly step, the estimate of
The best results found by the algorithms in each activity of the Bundle Assembly step mentioned above are presented in Tables 3 to 7 for the various graphs used. The “Evaluation” column contains the values of the objective function (the higher the value, the better). The number of executions and the execution time for each algorithm are also shown.
Results of the CEBEB for the graph Synthetic.
Results of the CEBEB for the graph Zachary Karate Club.
Results of the CEBEB for the graph Dolphin Social Network.
Results of the CEBEB for the graph Movie Lens.
Results of clustering algorithms used in the CEBEB for the graph Les Misrables.
Finally, Figure 6 presents the comparison between the execution time of the CEBEB approach and 100 executions of the EEB, a non-deterministic evolutionary method used in Ferreira et al. 13 to solve the GBEB problem. Since, solving the GBEB repeatedly by EEB, implies the possibility of a different result at each time of execution, the CEBEB result was compared to those 100 independent EEB runs. Figure 7 also compares the results of CEBEB with the best, median, and worst case of EEB for the GBEB problem. As CEBEB has deterministic characteristics, the data were obtained from a single execution of the complete optimization pipeline proposed.

Comparison between the total execution time in seconds (s) of the CEBEB and 100 independent runs of the EEB for each of the five graphs studied.

Comparison between the evaluation of the solution obtained by the CEBEB and the best, median, and worst solutions obtained by the EEB for each graph.
The tests were performed on a Dell G3 3590 laptop with a 2.6 GHz Intel Core i7 processor and 8 GB of RAM, with the support of the Scikit-learn 44 as a machine learning API.
The edge-bundling graph drawing solutions generated by the experiments are presented in Figures 8 to 12. In these figures, the edges that were joined into bundles appear in red and the bundle with the highest number of edges is highlighted in blue. If there is more than one bundle with the same highest number of edges, both are painted in blue.

Comparison between the drawings generated by the EEB and CEBEB approaches for EB instances of Synthetic graph: (a) original drawing, (b) EEB (Bundles:15, Evaluation:57.79), and (c) CEBEB (Bundles:13, Evaluation:68.71).

Comparison between the drawings generated by the EEB and CEBEB approaches for EB instances of Zachary Karate Club graph: (a) original drawing, (b) EEB (Bundles:36, Evaluation:137.10), and (c) CEBEB (Bundles:34, Evaluation:144.74).

Comparison between the drawings generated by the EEB and CEBEB approaches for EB instances of the Dolphin Social Network graph: (a) original drawing, (b) EEB (Bundles:81, Evaluation:181.68), and (c) CEBEB (Bundles:89, Evaluation:227.52).

Comparison between the drawings generated by the EEB and CEBEB approaches for EB instances of Movie Lens graph: (a) original drawing, (b) EEB (Bundles:63, Evaluation:561.60), and (c) CEBEB (Bundles:46, Evaluation:722.64).

Comparison between the drawings generated by the EEB and CEBEB approaches for EB instances of Les Misrables graph: (a) original drawing, (b) EEB (Bundles:114, Evaluation:546.16), and (c) CEBEB (Bundles:108, Evaluation:731.05).
Discussion
One of the difficulties reported by Ferreira et al. 13 when conducting the GBEB tests was related to the execution time of the evolutionary algorithm for this specific problem, which made it impossible to carry out experiments on larger graphs. However, when the adjustment on the visibility compatibility calculation was made, the execution time of the algorithm decreased significantly, and it is quite similar to the CEBEB processing time for most of the analyzed graphs, as shown in Figure 6. Despite the time required by the CEBEB to solve the problem being similar or greater in some cases, it is worth mentioning that the quality of the solutions found by this method was about 30% better than the median of the EEB. In addition, in all the five graphs studied, the solution found by the CEBEB is even better than the best solution of the EEB (Figure 7).
One aspect that deserves special attention is the fact that the Agglomerative Hierarchical Clustering (AH) approach managed to find the best CEBEB solutions at Level 1 for all the graphs tested, and that solution also was selected as the best at Level 2 in four of the studied graphs. Such solutions are better than the median of the solutions found by the EEB and were obtained in a very short time, when considering only the execution of the different combinations of the AH parameters (Level 1), and comparing with the time spent in a single execution of the evolutionary algorithm.
On the other hand, the Affinity Propagation (AP) algorithm did not present good results for any of the graphs tested, being the worst solution in all cases. It is also worth mentioning that the method with the highest time and processing costs, the Gaussian Mixture Model (GMM), does not appear among the three best solutions for any of the studied graphs. This indicates that the CEBEB approach can be improved by excluding from the clustering ensemble those algorithms that are not so efficient in the domain of the GBEB problem. Consequently, the CEBEB run-time would also be improved.
Among the consensus methods, the MAX algorithm (which takes the solution with the highest objective function value) produced the best solution in four of the five graphs. The proposed CCCA method produced the best solution only for the Movie Lens graph. Nevertheless, in all cases, the latter method resulted in solutions with a much lower number of clusters than the other ensemble approaches.
Besides achieving the best of edge compatibility values, regarding the human element in the evaluation of edge bundles, the present authors consider most of the drawings generated from the solutions given by CEBEB to be more “visually pleasing” compared to the drawings of the best solutions found by EEB. As can be seen in Figures 8 and 10 to 12 (except in Figure 9), CEBEB’s visual solutions have fewer crossings between edges and/or bundles and appear to have captured geometric compatibility better.
It should be noted that, in some of these drawings, both the EEB and CEBEB methods have assembled edge bundles that can create ambiguity. This may indicate that the penalty value used in equation (4) (see the −3 constant) should be higher for bundles with low compatibility. Future investigations may be necessary to determine adjustments for that constant and for the thresholds.
Conclusion
This paper treats the task of solving the general edge bundling problem (the GBEB) as a clustering problem using compatibility metrics to evaluate the generated solutions in an optimization pipeline, combined in a Clustering Ensemble-based Edge Bundling (CEBEB) approach. A CEBEB-based method was developed and tested on various existing and synthetic graphs. As far as we know, it is the first reported approach that uses clustering ensemble algorithms applied to edge bundling with an optimization focus.
The proposed method was capable of generating relatively good solutions compared to those produced by an existing, evolutionary algorithm for the GBEB, called EEB. It also has the potential to be accelerated and become faster than the previous algorithm, as it is possible to select only the most effective clustering algorithms to compose Level 1 or employs hyperparameter analysis to select the most effective combination of parameters. Furthermore, the clustering algorithms can be run in parallel since they do not depend on each other within Levels 1 and 2.
Based on the results obtained so far, the proposed method appears to be a very promising alternative for identifying edge bundling solutions that have few bundles while maintaining a high compatibility between the edges.
Furthermore, the study helped improve the EEB results through an efficient implementation of the visibility compatibility criterion.
As future work, the specialized Force-Directed Edge Bundling used as the post-processing algorithm for creating the graph drawings could be tuned to avoid routing edges too closely to unrelated nodes. The proposed method could also be extended to perform a multi-objective assessment, so that each compatibility measure is treated as an independent criterion, thus providing compromise solutions for a final user manual selection step. Additionally, the CEBEB method should be tested on larger and more complex graphs, for which the edge bundling technique has a greater impact on visual readability. For example, it could be applied to very large social networks, featuring community clusters varying in density of nodes and with different distances between them.
