Abstract
In this paper, an efficient and polynomial edge detection algorithm based on a hierarchical graph-partition approach is presented. After transforming a digital image into a graph network, the proposed algorithm proceeds by iteratively dividing the image into regions, and then transforming this hierarchical region map into a sequence of boundary maps. This allows the proposed algorithm to operate naturally with colour or hyperspectral images, as well as to detect edges at different levels of detail in a simultaneous and consistent manner. Such a sequence of edge maps can be seen as jointly approximating the different levels of detail that humans may use when recognizing objects in an image. This idea is taken to base the evaluation methodology of the proposed algorithm, that extends the usual boundary-based evaluation methodology based on the matching of the automatic maps with a set of human ground truth, reference maps. The computational experiences carried out to benchmark the performance of the proposed algorithm over the BSDS500 dataset suggest that the proposed method attains a statistically significant better performance than some well-known detectors as Canny or Sobel.
Introduction
The purpose of computer vision is imitating human cognition in order to electronically perceive and understand a digital image or video [41]. Computer vision has many real-world applications, from machine inspection to medical imaging and others (see for example [6, 42]). Thus, computer vision aims to solve very complex tasks, for which it relies on various image processing operations such as image capture, preprocessing, edge detection, region extraction, region labelling, high-level identification, qualitative/quantitative conclusion, among others [41].
In this paper, we focus on the Edge Detection problem or task. Edge detection is widely applied in many disciplines as medical imaging, object detection and recognition tasks, among others [10, 45]. In some cases, the output of the edge detection algorithm already provides the solution needed in such tasks [4, 38]; in other cases, the obtained output is just an intermediate step of the image analysis operations to be applied, as in the feature detection and/or object recognition tasks [19].
Edge detectors are image processing algorithms which analyze brightness variations between pixels of an image in order to detect the edges or boundaries that define the different objects appearing in such an image. More specifically, edge detectors tipically analyze the intensity function of each pixel in the image [41], in such a way that when significant differences between two neighboring pixels are located, an edge (or boundary) of the region containing one of these pixels is signalled or detected [15]. These boundaries can be depicted in the digital image by drawing a white line onto these signalled pixels and setting the other, non-signalled ones to the black background. Most of the methods for edge detection have been developed for gray-scale images (see for example [7, 39]). Furthermore, the output of these edge detection algorithms is usually a unique binary boundary map.
The edge detection algorithm proposed in this paper deals with arbitrary digital images (gray-scale, RGB, hyperspectral, etc.), and is based on an unsupervised hierarchical graph-partition algorithm, namely the Divide-and-Link (D&L) algorithm [13]. The idea is to extend this hierarchical graph-partition algorithm to obtain a hierarchical sequence of region maps on the image [12], and then transform these regions into a sequence of boundary maps. We will refer to the proposed method as the Image Divide-and-Link (ID&L) algorithm. The hierarchical graph-partition algorithm we rely on to do this is similar to other classical hierarchical graph-based partition algorithms, (see, e.g., [11, 40] and the extensions in [16, 44]).
Thus, the output of the proposed ID&L method consists of a sequence of boundary maps in which the found objects are displayed hierarchically. This sequence of boundary maps as a whole can be seen as an approximation to the different levels of detail that humans may use when recognizing objects in an image; it allows, for instance, to choose the appropriate partition of interest or to obtain a soft boundary map by joining the sequence of boundary maps and assigning a level of intensity to each border according to its level of hierarchy. In this way, obtaining a sequential output provides flexibility in many aspects, as opposed to obtaining only one boundary map.
We have chosen the BSDS500 dataset [25] as the context for the computational experiences conducted for evaluating the proposed ID&L method. This dataset is used along the usual boundary-based evaluation methodology for edge-detection [1, 26], which is extended in this paper following [35]. In this way we allow many-to-many comparisons between the sequence of automatic boundary maps provided by the ID&L and the set of human reference boundary maps provided with BSDS500 as ground truth. The results of these experiments are quite promising, as ID&L shows a statistically significant better performance than common detectors as Canny [7] or Sobel [39].
The paper is organized as follows: In Section 2 we provide a basic description of the Divide-and-Link algorithm for hierarchical network clustering [13], which is the basis of the proposed algorithm. After these preliminaries, the proposed edge detection ID&L algorithm is described in Section 3. Particularly, Subsection 3.1 discusses how to model a digital image as a graph network, in which graph-edges are valued through a dissimilarity measure. The threshold selection procedure for the hierarchical partition process is explained in Subsection 3.2. The construction of successive region maps obtained from a set of thresholds is detailed in Subsection 3.3. A contraction procedure to simplify the image network into a coarse net and avoid separation of homogeneous regions is described in Subsection 3.4. An example of the construction of successive region maps from a set of thresholds is detailed in Subsection 3.5. Finally, the procedure to transform region maps into a boundary maps is explained in Subsection 3.6. Then, an example of the proposed edge detection ID&L algorithm is shown in Subsection 3.7. After the proposed ID&L method has been exposed, Section 4 describes the evaluation methodology that will be later applied at the computational experiences. Particularly, the usual boundary-based evaluation methodology [1, 9] is recalled in Subsection 4.1, and its extension to allow evaluating the ID&L algorithm is summarized in Subsection 4.2. Then, the benchmarking and the results of the computational experiences carried out in order to evaluate the performance of the proposed method and compare it with that of other well-known edge-detection techniques are described in Section 5. Finally, some conclusions are shed in Section 6.
The Divide-and-Link algorithm for hierarchical network clustering
This section is devoted to present a brief summary of the D&L hierarchical network clustering algorithm [13], which provides the basis for the proposed image-focused, edge-detection ID&L algorithm.
Previously to that, let us stress that in this paper we will be using the word edge in two different senses. The first relates to its usage in the term edge-detection, that refers to the border pixels that conform the boundary of an object in an image. The second usage refers instead to graph edges, that is, the undirected links that represent relationships between nodes of a graph.
This distinction is relevant since, when modeling a digital image as a graph, the graph’s nodes are associated to the image pixels, while graph’s edges usually represent a neighborhood relationship between pixels. However, the pixels that composes the boundary of an object in an image are usually referred to as edges or edge pixels (and typically identified through an edge-detection technique), although these are actually being represented by graph nodes instead of graph edges. Therefore, whenever the context does not make it clear to what usage of the word edge do we refer, we will try to avoid potential confusions by using the term object-edges for the first sense, and graph-edges for the second one.
As just introduced, the D&L algorithm deals with the task of hierarchical clustering of a graph. More specifically, the D&L algorithm departs from a graph G = (V, E) and two measures d and l, respectively allowing an assessment of each graph-edge’s divisive and linking power or strength. From these ingredients, the D&L algorithm produces a hierarchical partition of the network N = {G, d, l} by constructing a succession of spanning forests composed of the most-powerful divisive and linking edges, from which the divisive edges are removed in order to provide a succession of partitions of the graph G given by the connected components thus obtained. In each iteration, the set of graph-edges E is updated by deleting those edges that separate each pair of connected components, until the algorithm ends once E becomes the empty set. For the purpose of reference in the exposition of the new ID&L algorithm introduced in this work, the base D&L algorithm can be summarized as follows (see [13] for further details and a discussion on the performance of this D&L algorithm for community detection and other social network analysis tasks): For each edge e ∈ E of the graph, compute its divisive and linking power through the measures d and l, respectively denoted by d
e
and l
e
. Compute Rank the edges of the network, taking first the edges in Emax (i.e. the edges that are going to break or divide the network). The remaining edges are ranked l
e
1
> l
e
2
> … (i.e. the linking edges are ranked from the greatest linking power to the lowest). Based on this arrangement, a spanning forest S is constructed by means of a Kruskal-like algorithm. The graph partition We update the set of edges E by deleting those edges joining the different clusters of Repeat steps (1)–(6) while E≠ ∅.
The ID&L edge detection algorithm
The principles of the above D&L algorithm [13] can be adapted to the image processing context, and specifically extended in order to enable it to identify the borders of hierarchical regions in an image. In this way, we can propose a new algorithm for the edge detection task, based on the hierarchical region identification approach of the D&L algorithm. We will refer to such new algorithm described below as the Image-Divide-and-Link (ID&L) algorithm. This section is devoted to present it. Particularly, the proposed ID&L algorithm consists of several steps that are described in the following subsections.
In summary, the first step of the ID&L algorithm we propose consists on modeling an image as a network, where the pixels are modeled as a graph and the dissimilarities between pixels are computed (Subsection 3.1). In the second step, the thresholds values which are going to define the partitions in the iterative process are determined (Subsection 3.2). Then, the third step consists on determining hierarchical region maps (Subsection 3.3). A main improvement of the proposed ID&L algorithm in order to adapt it to the context of image processing consists on contracting the pixels that meet a threshold value in a single node in order to reduce computational time and to avoid dividing homogeneous regions (Subsection 3.4). In the final step, the region map obtained through the previous steps is transformed into a boundary map (Subsection 3.6).
Modeling an image as a network
Let us first expose the procedure to represent a digital image as a network (i.e., a graph in which the relationship that links nodes through edges is valued).
Graph representation of a digital image
An image I of dimension r × s is usually presented as a set of pixels I = {P
ij
, i = 1, … r ; j = 1, …, s}, where (i, j) represents the position of the pixel in the image, and where

Some topologies applied to images (networks). (a): four (b-c): eight, and (d) twelve neighbours.
Given an edge e ab joining two pixels a = P ij and b = Pi′j′, let d ab ≥ 0 be the degree of dissimilarity between them. Let also D = {d ab |e ab ∈ E} denote the set of all dissimilarities. Therefore, the information contained in a digital image can be summarized by the network N = {G ; D}.
In this paper we focus on the features of the proposed edge-detection algorithm. In the computational experience described in Section 5 we simply compute dissimilarity measure through the Euclidean distance between the visual spectral information of the pixels, in RGB-color converted to the CIELab color space (CIE76). However, in practice this measure has to be defined taking into account the specific problem and the characteristics of the elements considered.
Let us remark that the new ID&L algorithm just needs a single dissimilarity measure between adjacent pixels -not a pair of similarity/dissimilarity (or linking/divisive strength) measures as needed by the basis D&L algorithm. Notice also that this kind of dissimilarity information cannot be updated in each iteration, as happens with the D&L when applied to social networks analysis (see [13]).
Threshold values in the iterative process
It is important to notice that in the above described D&L algorithm for hierarchical network clustering, the main aim is to obtain a complete partition or dendrogram, in which the number of groups or clusters increases by just one unit in each iteration until all nodes belong to singleton clusters. This is meaningful in the context of social network analysis, as it provides a description of the process by which a given network of items breaks into successively smaller communities, up to the point in which communities are identified with single items.
However, in the context of image processing, such a maximum-depth description or dendrogram may actually not be interesting at all. This is particularly true regarding the edge-detection task, intended to separate objects in an image, which usually are composed of many pixels. Once a certain level of detail is attained, it may not be meaningful to further persist on separating pixels into differentiated, smaller objects.
For this reason, a main difference of the here proposed ID&L algorithm with respect to the basic D&L approach is that now the maximum depth of the dendrogram is prefixed by means of a pre-specified parameter K, that sets the number of iterations the divisive process will be repeated. The idea is that this parameter K identifies the number of different levels of detail that are applied in the edge-detection task, in such a way that each repetition of the divisive process will provide a further description of the details present in each previously separated object. In this sense, in our opinion K has to take relatively small values, usually lower than 10, but anyway much lower than N = r · s, the number of pixels in the image, that would constitute the upper bound in the case of a complete dendrogram. Anyway, this parameter has to be chosen taking into account the requirements that, on the one hand, the hierarchical sequence has to allow acquiring the different levels of detail actually needed and, on the other hand, the computational cost of obtaining the partition must be kept as reduced as possible.
Another main difference, consequence of this prefixed maximum-depth of the partition process, is that the thresholds α -that determine at each iteration which nodes will belong to different regions- have also to be pre-specified once K has been set.
The main idea at this regard is that these thresholds α have to follow a decreasing scheme, in such a way that, as at the algorithm’s first steps α takes a relatively high value, only strongly dissimilar adjacent pixels are separated; this allows recognizing first the main, more contrasted objects (or clusters) of the image. Afterwards, as α decreases, more and more pixels in the clusters already detected are also separated, allowing then to recognize further details in the objects. And obviously, the range in which α may vary depends on the values of the dissimilarities associated to the edges; if
In general, many solutions can be adopted to the interesting problem of selecting an appropriate number of detail levels K and obtaining an adequate sequence of K thresholds α1 > … > α K , in such a way that a divisive process, acquiring further and relevant levels of detail at each step, is produced. Both decisions have to depend on the size and characteristics of the problem. In this work, for the computational experiments described in Section 5, we chose to assign K = 6 and followed a mixed learning approach regarding the election of the α thresholds, based on a preselected family of thresholds configurations, from which a single configuration is selected after a learning process on a set of training images.
More specifically, in this work 8 different configurations for the values α are considered, in each of which K = 6 different intermediate values are chosen between the extremes

(I)-(VIII) Different distributions considered for the α values.
Once the depth of the dendrogram K and the thresholds
For each t from 0 to K + 1 do: Take α = α
t
and divide the set of edges E into Div = {e ∈ E / d
e
≥ α} and Link = {e ∈ E / d
e
< α}. Rank all the edges in E in two steps. First, rank the edges in Div from the highest to the lowest dissimilarity d
e
. After that, add the edges in Link from the lowest to highest dissimilarity d
e
. Based on this arrangement, a spanning forest S is constructed following a Kruskal-like algorithm. The partition Update the set of edges E by deleting those edges joining the different clusters of
Notice that two successive partitions
Coarse networks in the hierarchical partition algorithm
In the context of the application of the D&L hierarchical partition approach to a digital image modeled by a network as above, it is a good idea to try avoiding the potential noise derived from very homogeneous regions, which should not be divided for any threshold used in the partition process. Particularly, bunches of nodes connected by very low dissimilarity edges have to be somehow identified and excluded from the partition process. We use coarse networks to this aim and in order to reduce computational time. This constitutes another main difference between the application of the D&L approach to social network analysis (as performed in [13]) and the here proposed ID&L algorithm for edge-detection tasks.
The central idea is to contract the pixels network N = {G ; D}, so that each homogeneous region is collapsed in a single node (referred to as contracted or shrunken node), before the partition process begins. This kind of contraction is a well-known graph technique, see e.g. [14] for further details.
Thus, let α τ be a predefined dissimilarity threshold; the endpoints a, b of any edge e ab with a dissimilarity d ab ≤ α τ will be considered as a unique node, and the network is rearranged so that the edges incident to this new node are given by the edges that were incident to any of those endpoints. As a consequence, the pixels included in these new contracted nodes will never be separated in the hierarchical partition process. The above contracting operation may be performed in the following way: Let E (α τ ) = {e ab ∈ E ∣ d ab ≤ α τ } be the set of edges with dissimilarity lower than the predefined threshold. Any connected component of the partial graph G (α τ ) = (V, E (α τ )) is merged into a new node, in such a way that the edges incident to it correspond to the edges (in E) incident to some of the pixels of the associated connected component. This operation is illustrated in Fig. 3.

Nodes contraction: d ab ≤ α τ .
As a result of this contraction operation, a new network
The application of the above described hierarchical partition procedure to this contracted network produces the non-trivial partitions
The addition of this contraction step generalizes the hierarchical partition process described in Section 2 and Subsection 3.3, in the sense that for values
Before to proceed with the example, let us remark that the contraction of the pixel network can also be done without taking into account the dissimilarity values, just considering the neighbors of each pixels. For example, each window L × L can be coarsened into one pixel following the same idea. Thus, treating windows as a single node is equivalent to as if the image was reduced and there was no noise remaining in them. Therefore, this idea may provide a further reduction of the noise contained in an image. Once the images have been analyzed, these windows are re-expanded to their original size.
Consider the gray-scale image shown in Fig. 4a. This image is of dimension 5 × 6. Any pixel (i, j) ∈ {1, 2, 3, 4, 5} × {1, 2, 3, 4, 5, 6} has a gray intensity value P ij ∈ {1, 2, 3, 4, 5}, as shown in Fig. 4b. Take d ((i, j) , (i*, j*)) = | P ij - P i * j * | as the dissimilarity measure between pixels. The image network N = {G ; D} is depicted in Fig. 4c.

(a) Gray-scale image, (b) associated matrix, and (c) network N.
For this network, the maximum and minimum dissimilarity are given by
Excluding the partitions
This behavior is however easily avoided by applying the contraction operation on the initial network. The network

Iterations of a basic hierarchical region map process.

(a): Original network, G (V, E), where |V|=30 and |E|=49; and (b): Network

Course net over hierarchical region map: cut edges are red.
The successive non-trivial partitions of the improved hierarchical algorithm are shown in Fig. 7. Now, (s0, s1, …, s5) = (1, 2, 3, 5, 8, 30), and the clusters obtained after expanding the contracted nodes do not separate adjacent pixels with equal gray intensity. Notice also that the algorithm has to perform a much smaller computational effort in this case, since the number of nodes and edges actually considered are reduced significantly. This shows how the contraction operation introduced in Subsection 3.4 helps improving the ID&L hierarchical partition process, clearly allowing a much better adaptation to the image-processing context and particularly to the edge-detection task.
Once the clusters or regions have been determined, their borders or edges have to be identified. The partitions of the set of nodes into hierarchically-arranged families of connected regions obtained in the previous steps can be delimited by the borders of these regions. And these borders can be easily identified by means of the set of divisive graph-edges of the spanning forests. The boundary-edges set are the minimal set of graph-edges which separate the regions of an image partition (see [12]). An advantage of using a connected-components partition (or segmentation) approach to edge-detection is that regions are then disconnected by the boundary graph-edges, imposing the pixel boundaries of the objects to be closed curves.
A procedure to transform the region map output of the exposed hierarchical partition algorithm into a set of boundary maps, the typical output of an edge-detection algorithm, is provided next. To this aim, it is necessary to classify the pixels within a given hierarchical region map into a pair of Black and White classes as follows:
Black = {P
i
∈ V ∣ C (P
i
) = C (P
j
) forall e
ij
= (P
i
, P
j
) ∈ E} White = V - Black
The previous definition provides a method to visualize each partition of a hierarchical region map through a boundary map in which a pixel is colored white if it has at least one neighbor in other region, and pixels surrounded by pixels of the same region are colored black. In this way, it is possible to represent a region map as a boundary map (white-class pixels are boundaries and black-class pixels are not-boundaries).
Nevertheless, notice that after this transformation, the object-edges or boundaries are two-pixels thick since the procedure places the pixel-border between any two regions in both regions simultaneously. This may not be an adequate way of representing object-edges. Furthermore, it complicates any comparison with other edge detection techniques, which usually produce one-pixel thick object-edges. Then, to eliminate this problem, the edges between two regions are assigned to the boundary pixels of the larger region (i.e. that with a greater number of pixels). That is, from any pair of adjacent pixels in different regions, just the one in the larger region is kept as a white, boundary pixel, the other one in the smaller region being switched to black. In this way, the final boundary map is just one-pixel thick.
This regions-to-boundaries transformation operation constitutes the last step of the ID&L algorithm here proposed. Therefore, the output of the ID&L algorithm is a family of boundary maps Bt, t = 1, ..., K, obtained by applying this last step on each partition Pt, t = 1, ..., K provided by the hierarchical partition procedure.
Example of the image divide and Link algorithm
Figure 8 illustrates the output provided by the ID&L algorithm. This figure depicts the boundary maps resulting from applying the ID&L algorithm on the well-known Bear image. As it can be seen, the ID&L pictures the evolution of the hierarchical region map process, representing different detail levels of an image at each step. Particularly, such a hierarchical region map process can be understood as an approximation to the different detail levels with which an image can be segmented by humans.

Application of the ID&L algorithm on the Bear image. (a) Original image. (b)-(h) Boundary maps resulting from the vector of thresholds α = (α1, α2, α3, α4, α5, α6, α7, α8) = (47.79, 40.55, 33.29, 28.47, 23.63, 18.79, 11.55, 4.30).
Edge detection techniques as the proposed ID&L are most usually evaluated through a so-called boundary-based evaluation methodology. Given a digital image I, the basis of this methodology is to compare the boundary map provided by an edge detector algorithm applied on I with a set of human-made boundary maps that act as reference or ground truth about the objects actually present in I.
We have chosen to evaluate the proposed ID&L algorithm through this methodology, instead of the alternative region-based evaluation methodology. Two are the reasons for this election. First, the evaluation of object recognition techniques by means of their boundaries is far more developed and extended than its counterpart focusing on the regions or objects themselves. In this sense, there are more available benchmark datasets, and these are more standardized and accepted, for edge-based evaluation than for region-based evaluation. Similarly, the evaluation methodologies usually applied in the edge-based approach are quite more established and finds a more extended consensus than those applied in the region-based approach [2, 24]. And secondly, object recognition tasks are more typically identified with widely known edge-detectors such as Canny or Sobel than with segmentation procedures 1 , which have not found a so-wide reach. Consequently, these edge-detection techniques are usually more easily appreciated as providing appropriate benchmarks for comparisons than not-so-well-known segmentation techniques [18, 23].
Then, the first part of this section is devoted to expose the basics of the boundary-based evaluation methodology that will be applied in the computational experiments later in this paper, which is quite similar to the approach proposed in [26] (see the remark below in Section 4.1). However, in order to enable the evaluation of an edge-detection technique providing not just a single boundary map per image, but rather a sequence of boundary maps per image, as ID&L does, the basic framework of Section 4.1 has to be extended to the case of many-to-many comparisons between such sequence of automatic boundary maps and the set of human ground truth references. This extension is described in Section 4.2.
Basic boundary-based evaluation
In this section, we recall the basics of the boundary-based evaluation methodology in the context of edge detectors (see, for instance, [1]). The methodology for benchmarking boundary (or edge-) detection algorithms developed in [26] departs from a dataset with several different images, that can be further divided into a training and a test set. Each image in this dataset has to be accompanied by a set of human-made reference boundary maps that serve as ground truth for evaluating the automatic boundary maps that constitute the output of an edge detection technique (see [1]). Therefore, a one-to-many comparison between the automatic boundary map and the set of human maps has to be performed. This is accomplished by performing a set of one-to-one comparisons between the automatic map and each human map. The results of these one-to-one comparisons must then be aggregated to generate the overall result of the one-to-many comparison.
In more detail, the method evaluates the quality of a boundary detector by applying the precision-recall framework for comparing automatic boundaries against human ground truth. Each one-to-one comparison is based on a matching of the boundary pixels of the automatic map with those of the human map, i.e. on a one-to-one correspondence between the boundary pixels of both maps. A distance threshold δ is defined to specify the tolerance level to small boundary localization errors. Then, an unmatched automatic boundary pixel that lies closer than a distance δ from a human boundary pixel is counted as a true positive (TP). Otherwise, unmatched automatic boundary pixels are counted as false positives (FP). And unmatched human boundary pixels are counted as false negatives (FN). In this way, a confusion matrix is created, from which it is possible to calculate precision (
Now, it is necessary to aggregate the F-measures of the different one-to-one comparisons associated to an image I into a one-to-many quality measure of the automatic map as a boundary map for the objects in such image I. This aggregated measure for image I will be denoted by F
I
, and is obtained through an aggregation operator
Notice that F I measures the performance of a specific parametric configuration of a given edge-detection procedure with respect to an image I. In order to obtain a global measure of the performance of such parametric configuration over all the images in the dataset, the F I measures obtained on the different images of the set are usually averaged through an arithmetic mean. Let us denote by F* such global measure of performance. Then, notice that an F* measure is obtained on the training set for each parametric configuration of the method being evaluated, while only the F* measure of that parametric configuration found to be optimal on the training set is computed on the test set. Particularly, such optimal configuration is identified as that producing the highest or maximum F* on a training set.
In the previous section, we have described how to evaluate an edge detection method producing just a single automatic boundary map for each parametric configuration (for a given image). However, notice that a hierarchical procedure as the proposed ID&L algorithm instead produces a sequence of candidate boundary maps for each parametric configuration. This leads to a more complex evaluation context, in which many-to-many comparisons arise as a consequence of the many automatic boundary maps and many human boundary maps to be compared for each image and for each parametric configuration. Therefore, it is necessary to procure an adaptation of the edge-based evaluation methodology for sequences of candidate boundary maps.
To address these many-to-many comparisons arising in the context of the evaluation of a sequence of automatic boundary maps, we follow the approach introduced in [35]. The main point of this approach is to consider the set of human ground truths available for each image as a sample of the different levels of detail that may be used when delimiting objects in an image, in such a way that the sequence of automatic boundary maps is interpreted as an approximation to such sample. That is, each human reference’s level of detail is best approximated by exactly one automatic boundary map of the sequence. Therefore, these best-approximating automatic maps have to be taken as the basis for the evaluation of the sequence as a whole. In other words, by means of the approach in [35], the many-to-many comparisons problem is reduced into a kind of one-to-many comparison framework, in which just the automatic map that best matches a specific human map is used to compare the sequence with that human ground truth. Such one-to-one comparisons are carried out as described above, and their results are aggregated into a one-to-many performance measure F I for each image, and later in a global F* measure for each parametric configuration following the same principles.
Formally, let
From these best-approximating automatic maps for each human, a measure of the sequence’s performance in capturing the different levels of detail provided by the sample of human ground truths is computed via an aggregation operator
The evaluation procedure to obtain a global F* measure for each parametric configuration over the training set is the same as before. And similarly, just the F* of the optimal configuration is computed on the test set. Finally, let us a remark about the aggregation operator
In this section, we describe the computational experiments carried out for the benchmarking of the proposed ID&L algorithm. This benchmarking consists of a comparison of the results provided by the ID&L algorithm with those obtained through the well-known Canny and Sobel detectors, which constitute widely extended methodologies for object recognition, and particularly for edge-detection tasks. This comparison is carried out in the context of the Berkeley Segmentation Dataset and Benchmark 500 (BSDS500) [1, 26]. In order to do so, we have first used the BSDS500 training set (200 images which have a resolution of 481 × 321 pixels) to find out the optimal parametric configurations of each of the contending methods. Then, the optimal configurations of each method have been validated in the BSDS500 test set (another 200 images with a resolution of 481 × 321 pixels).
This section is then organized by first providing a more detailed description of the experimental setup or procedure used in these computational experiments. After that, the results obtained on the training set are described. And finally, the results obtained on the test set are presented and discussed.
Experimental procedure
The Berkeley Segmentation Dataset (BSDS500) [25] consist of 500 natural images, which are divided into a training set of 200 images, a test set of 200 images and a validation set of 100 images. Each image of BSDS500 is accompanied by a set of 4 to 7 human-made reference boundary maps. Our experimental procedure involves the comparison of five different methods, namely Sobel, Autoadapted Sobel (A-Sobel), Canny, Autoadapted Canny (A-Canny) and Image-Divide-and-Link (ID&L). Note that the autoadapted methods are self-configuring versions of their counterparts. A description of each of the methods of boundary detection, in terms of the Bezdek breakdown structure [3] and the parametric configuration used, is included in Table 1. Notice that each of such methods will be executed repeatedly on the BSDS500 training set to determine the best possible configuration of its parameters. In addition, the comparisons of the methods were performed with Gaussian filter (σ s = 1 in Tables 1-7) and without Gaussian filter (σ s = 0). Indeed, ID&L algorithm performs quite well without any previous preprocessing. This makes the proposal method differ even more from other proposals.
Schematic decomposition of the boundary detection algorithms used according to the Bezdek Breakdown Structure [3]
Schematic decomposition of the boundary detection algorithms used according to the Bezdek Breakdown Structure [3]
Schematic decomposition of the applied ID&L algorithm [12]
Global results for each of the methods on the BSDS500 training set
Global results for each of the methods on the BSDS500 test set
Wilcoxon test of the comparisons of the ID&L algorithm with all the considered edge-detectors for
Wilcoxon test of the comparisons of the ID&L algorithm with all the considered edge-detectors for
Wilcoxon test of the comparisons of the ID&L algorithm with all the considered edge-detectors for
Table 2 describes the setting of the ID&L algorithm. Notice that the range of some parameters depends on the dissimilarity measure d as well as on the specific ranges such d takes on each image. Particularly, in this paper we have computed the dissimilarity between adjacent pixels as the Euclidean distance between their spectral information, in RGB-color converted to the CIELab color space (CIE76). Moreover, although the ID&L algorithm has four parameters (K, α, α
τ
, L), in this illustrative experiment we only focus on varying the set of thresholds α = (α0, …, α
K
) that guides the region map process. The parameters K, L and α
τ
are left constant along the different executions of the algorithm. Their specific values have been chosen to produce enough suitable and detailed results. Particularly, we have observed that
Regarding the thresholds α, the 8 different general configurations exposed in Subsection 3.2 have been considered. As discussed there, each image possesses a different range
Each one of the edge detection methods and the proposed ID&L algorithm has been executed on the BSDS500 training set to determine their optimal parametric configurations through the generation of a precision and recall (P&R) curve. The main advantage of using precision and recall data is that, for each algorithm, it enables to compare not only the goodness of the partitions generated by them, but also the results produced by the same algorithm using different input parameters. In order to compare the performance of the ID&L algorithm regarding the others methods in a quantitative manner, we have graphed three P&R curves based on three aggregation operators

P&R curves for all the algorithms and best precision/recall values (bolded icons) obtained using

P&R curves for all the algorithms and best precision/recall values (bolded icons) obtained using

P&R curves for all the algorithms and best precision/recall values (bolded icons) obtained using
For more detail, the best precision/recall pairs, more specifically the highest F* measures, corresponding to the best resulting parameter configuration for each of the methods, are included in Table 3. In this table, for both conditioning cases σ s = 0 and σ s = 1, our proposed algorithm ID&L obtains the highest maximum, mean and minimum F* on the training set.
Once the optimal configuration of the parameters has been determined for each of the methods and for each of the aggregation operators, we have executed each algorithm with their optimal parameters on the BSDS500 test set. The results gathered on such set of test images are shown in Table 4.
It is possible to summarize these results by saying that ID&L obtains a higher performance in comparison to the other edge detection methods considered. Wilcoxon tests’ results are shown in Tables 5, 6 and 7, allowing to conclude that the proposed ID&L algorithm performs significantly better than the other considered detectors on the BSDS500 dataset.
In terms of run-time, the CPU time was no more than one minute for each boundary map obtained by the ID&L algorithm. These times were measured on a 3.00 GHz Core(TM)2 Duo with 3 GB of RAM.
Conclusions
This paper has presented the ID&L algorithm, an innovative edge detection method based on a hierarchical graph-partition approach. Particularly, the ID&L algorithm is based on the D&L hierarchical graph clustering algorithm, developed for hierarchical community detection tasks in the context of social network analysis. In this sense, the proposed ID&L detector results from a deep adaptation of the D&L algorithm to the context of image processing, including many new features that extends the previous technique to suit it to the task of edge detection. As described in Section 3, these new features include: the usage of just a single dissimilarity measure to configure a graph network from a digital image, the introduction of different parameters to control the sequence of levels of detail at which object are detected, the introduction of a coarse net step to collapse homogeneous regions and the procedure to transform region maps into boundary maps.
Due to its graph-based approach, the basic information used by ID&L algorithm is the dissimilarity degree between pairs of neighbouring nodes or pixels. This enables the ID&L algorithm to operate naturally on colour or hyperspectral digital images once an appropriate dissimilarity measure has been provided. Furthermore, due to the hierarchical nature of the underlying graph-partition methodology, the ID&L algorithm allows to simultaneously detect edges at different levels of detail, providing a sequence of boundary maps in which the detail level is hierarchically increased at each step (that is, new objects are detected inside the previously detected objects). Let us remark that, although other (non-hierarchical) edge detection procedures may provide different detail levels by varying the parameters configuration, typically the so-obtained boundary maps are not hierarchical. This feature of the proposed ID&L method tries to somehow emulate the human ability to recognize objects in a hierarchical way, attending to different scales and levels of contrast, detail, etc. To this aim, the ID&L algorithm allows the user to specify the number K of different levels of detail the algorithm has to work with, as well as the dissimilarity thresholds α = (α1, …, α K ) that separate these different levels. This hierarchical feature of the proposed ID&L detector poses the interesting problem of providing a method to automatically select adequate values for these parameters K and α from just the initial information provided in a digital image. In this work, we have prefixed K and then tested different spatial distributions or configurations of the thresholds α1, …, α K along the range of observed dissimilarities between pixels of an image. Experimental results suggest that a good performance is attained when thresholds are distributed more or less uniformly, but with a greater density at the middle of the dissimilarity range (e.g. distribution VII in Fig. 2). Anyway, this bi-level parameter selection problem constitutes a main direction of future work regarding the development of the ID&L algorithm. The application of some machine learning approach to search for an optimal parameter configuration, and the development of an heuristic, one-step method based on the frequency distribution of dissimilarities between pixels are also under development.
The computational experiments conducted for the benchmarking of the proposed ID&L algorithm on the BSDS500 image dataset suggest a good performance of ID&L when compared with best-known detectors based on Canny [7] or Sobel [39]. This performance comparison between typical edge detectors (providing a single boundary map for each image) and a hierarchical method as the ID&L (providing a sequence of maps) has been made possible by extending the usual boundary-based edge detection evaluation methodology as described in Section 4. This extension allows addressing many-to-many comparisons as those arising from the matching of a sequence of candidate boundary maps and a set of human reference maps that provide the ground truth for the evaluation.
Particularly, ID&L showed a statistically significant better performance than that of all the considered methods. Moreover, this statistically significant better performance is successively obtained when the performance measures associated to the comparisons with different human references are aggregated under different schemes (conjunctive, averaging, disjunctive). This means that, for instance, for a given image the best (resp. worst) matching between a map of the ID&L sequence and one of the human reference maps tends to be better than the best (worst) matching between the map provided by the non-hierarchical detectors and any human reference. This suggests that the superior performance of the ID&L detector is robust, or at least enough solid to be maintained under different semantics of the comparisons.
Footnotes
Let us emphasize that a main difference between segmentation and edge detection is that, while edge detection identifies boundaries by pixels (i.e. graph nodes), segmentation identifies boudaries as graph edges separating nodes or pixels in different regions. In this sense, although both are intended to address the same task of object recognition, edge detection and image segmentation are intrinsically different techniques, since they produce rather different outputs [
].
Acknowledgments
This research has been partially supported by the Government of Spain, grant TIN2015-66471-P, Government of Community of Madrid, grant S2013/ICE-2845 (CASI-CAM) and UCM Research Group 910149.
