A novel edge detection algorithm based on a hierarchical graph-partition approach

Abstract

In this paper, an efficient and polynomial edge detection algorithm based on a hierarchical graph-partition approach is presented. After transforming a digital image into a graph network, the proposed algorithm proceeds by iteratively dividing the image into regions, and then transforming this hierarchical region map into a sequence of boundary maps. This allows the proposed algorithm to operate naturally with colour or hyperspectral images, as well as to detect edges at different levels of detail in a simultaneous and consistent manner. Such a sequence of edge maps can be seen as jointly approximating the different levels of detail that humans may use when recognizing objects in an image. This idea is taken to base the evaluation methodology of the proposed algorithm, that extends the usual boundary-based evaluation methodology based on the matching of the automatic maps with a set of human ground truth, reference maps. The computational experiences carried out to benchmark the performance of the proposed algorithm over the BSDS500 dataset suggest that the proposed method attains a statistically significant better performance than some well-known detectors as Canny or Sobel.

Keywords

Edge detection hierarchical partition graph-based approach boundary map benchmarking

1 Introduction

The purpose of computer vision is imitating human cognition in order to electronically perceive and understand a digital image or video [41]. Computer vision has many real-world applications, from machine inspection to medical imaging and others (see for example [6 , 42]). Thus, computer vision aims to solve very complex tasks, for which it relies on various image processing operations such as image capture, preprocessing, edge detection, region extraction, region labelling, high-level identification, qualitative/quantitative conclusion, among others [41].

In this paper, we focus on the Edge Detection problem or task. Edge detection is widely applied in many disciplines as medical imaging, object detection and recognition tasks, among others [10 , 45]. In some cases, the output of the edge detection algorithm already provides the solution needed in such tasks [4 , 38]; in other cases, the obtained output is just an intermediate step of the image analysis operations to be applied, as in the feature detection and/or object recognition tasks [19].

Edge detectors are image processing algorithms which analyze brightness variations between pixels of an image in order to detect the edges or boundaries that define the different objects appearing in such an image. More specifically, edge detectors tipically analyze the intensity function of each pixel in the image [41], in such a way that when significant differences between two neighboring pixels are located, an edge (or boundary) of the region containing one of these pixels is signalled or detected [15]. These boundaries can be depicted in the digital image by drawing a white line onto these signalled pixels and setting the other, non-signalled ones to the black background. Most of the methods for edge detection have been developed for gray-scale images (see for example [7 , 39]). Furthermore, the output of these edge detection algorithms is usually a unique binary boundary map.

The edge detection algorithm proposed in this paper deals with arbitrary digital images (gray-scale, RGB, hyperspectral, etc.), and is based on an unsupervised hierarchical graph-partition algorithm, namely the Divide-and-Link (D&L) algorithm [13]. The idea is to extend this hierarchical graph-partition algorithm to obtain a hierarchical sequence of region maps on the image [12], and then transform these regions into a sequence of boundary maps. We will refer to the proposed method as the Image Divide-and-Link (ID&L) algorithm. The hierarchical graph-partition algorithm we rely on to do this is similar to other classical hierarchical graph-based partition algorithms, (see, e.g., [11, 40] and the extensions in [16, 44]).

Thus, the output of the proposed ID&L method consists of a sequence of boundary maps in which the found objects are displayed hierarchically. This sequence of boundary maps as a whole can be seen as an approximation to the different levels of detail that humans may use when recognizing objects in an image; it allows, for instance, to choose the appropriate partition of interest or to obtain a soft boundary map by joining the sequence of boundary maps and assigning a level of intensity to each border according to its level of hierarchy. In this way, obtaining a sequential output provides flexibility in many aspects, as opposed to obtaining only one boundary map.

We have chosen the BSDS500 dataset [25] as the context for the computational experiences conducted for evaluating the proposed ID&L method. This dataset is used along the usual boundary-based evaluation methodology for edge-detection [1, 26], which is extended in this paper following [35]. In this way we allow many-to-many comparisons between the sequence of automatic boundary maps provided by the ID&L and the set of human reference boundary maps provided with BSDS500 as ground truth. The results of these experiments are quite promising, as ID&L shows a statistically significant better performance than common detectors as Canny [7] or Sobel [39].

The paper is organized as follows: In Section 2 we provide a basic description of the Divide-and-Link algorithm for hierarchical network clustering [13], which is the basis of the proposed algorithm. After these preliminaries, the proposed edge detection ID&L algorithm is described in Section 3. Particularly, Subsection 3.1 discusses how to model a digital image as a graph network, in which graph-edges are valued through a dissimilarity measure. The threshold selection procedure for the hierarchical partition process is explained in Subsection 3.2. The construction of successive region maps obtained from a set of thresholds is detailed in Subsection 3.3. A contraction procedure to simplify the image network into a coarse net and avoid separation of homogeneous regions is described in Subsection 3.4. An example of the construction of successive region maps from a set of thresholds is detailed in Subsection 3.5. Finally, the procedure to transform region maps into a boundary maps is explained in Subsection 3.6. Then, an example of the proposed edge detection ID&L algorithm is shown in Subsection 3.7. After the proposed ID&L method has been exposed, Section 4 describes the evaluation methodology that will be later applied at the computational experiences. Particularly, the usual boundary-based evaluation methodology [1, 9] is recalled in Subsection 4.1, and its extension to allow evaluating the ID&L algorithm is summarized in Subsection 4.2. Then, the benchmarking and the results of the computational experiences carried out in order to evaluate the performance of the proposed method and compare it with that of other well-known edge-detection techniques are described in Section 5. Finally, some conclusions are shed in Section 6.

2 The Divide-and-Link algorithm for hierarchical network clustering

This section is devoted to present a brief summary of the D&L hierarchical network clustering algorithm [13], which provides the basis for the proposed image-focused, edge-detection ID&L algorithm.

Previously to that, let us stress that in this paper we will be using the word edge in two different senses. The first relates to its usage in the term edge-detection, that refers to the border pixels that conform the boundary of an object in an image. The second usage refers instead to graph edges, that is, the undirected links that represent relationships between nodes of a graph.

This distinction is relevant since, when modeling a digital image as a graph, the graph’s nodes are associated to the image pixels, while graph’s edges usually represent a neighborhood relationship between pixels. However, the pixels that composes the boundary of an object in an image are usually referred to as edges or edge pixels (and typically identified through an edge-detection technique), although these are actually being represented by graph nodes instead of graph edges. Therefore, whenever the context does not make it clear to what usage of the word edge do we refer, we will try to avoid potential confusions by using the term object-edges for the first sense, and graph-edges for the second one.

As just introduced, the D&L algorithm deals with the task of hierarchical clustering of a graph. More specifically, the D&L algorithm departs from a graph G = (V, E) and two measures d and l, respectively allowing an assessment of each graph-edge’s divisive and linking power or strength. From these ingredients, the D&L algorithm produces a hierarchical partition of the network N = {G, d, l} by constructing a succession of spanning forests composed of the most-powerful divisive and linking edges, from which the divisive edges are removed in order to provide a succession of partitions of the graph G given by the connected components thus obtained. In each iteration, the set of graph-edges E is updated by deleting those edges that separate each pair of connected components, until the algorithm ends once E becomes the empty set. For the purpose of reference in the exposition of the new ID&L algorithm introduced in this work, the base D&L algorithm can be summarized as follows (see [13] for further details and a discussion on the performance of this D&L algorithm for community detection and other social network analysis tasks):

For each edge e ∈ E of the graph, compute its divisive and linking power through the measures d and l, respectively denoted by d_e and l_e.

Compute $α = max_{e \in E} {d_{e}}$ and find E_max = {e ∈ E / d_e = α} ⊂ E.

Rank the edges of the network, taking first the edges in E_max (i.e. the edges that are going to break or divide the network). The remaining edges are ranked l_e
₁> l_e
₂ > … (i.e. the linking edges are ranked from the greatest linking power to the lowest).

Based on this arrangement, a spanning forest S is constructed by means of a Kruskal-like algorithm.

The graph partition $P$ provided in this step is obtained as the connected components of the spanning forest S^* obtained after the deletion of those edges of S that belong to the divisive links E_max.

We update the set of edges E by deleting those edges joining the different clusters of $P$ .

Repeat steps (1)–(6) while E≠ ∅.

3 The ID&L edge detection algorithm

The principles of the above D&L algorithm [13] can be adapted to the image processing context, and specifically extended in order to enable it to identify the borders of hierarchical regions in an image. In this way, we can propose a new algorithm for the edge detection task, based on the hierarchical region identification approach of the D&L algorithm. We will refer to such new algorithm described below as the Image-Divide-and-Link (ID&L) algorithm. This section is devoted to present it. Particularly, the proposed ID&L algorithm consists of several steps that are described in the following subsections.

In summary, the first step of the ID&L algorithm we propose consists on modeling an image as a network, where the pixels are modeled as a graph and the dissimilarities between pixels are computed (Subsection 3.1). In the second step, the thresholds values which are going to define the partitions in the iterative process are determined (Subsection 3.2). Then, the third step consists on determining hierarchical region maps (Subsection 3.3). A main improvement of the proposed ID&L algorithm in order to adapt it to the context of image processing consists on contracting the pixels that meet a threshold value in a single node in order to reduce computational time and to avoid dividing homogeneous regions (Subsection 3.4). In the final step, the region map obtained through the previous steps is transformed into a boundary map (Subsection 3.6).

3.1 Modeling an image as a network

Let us first expose the procedure to represent a digital image as a network (i.e., a graph in which the relationship that links nodes through edges is valued).

3.1.1 Graph representation of a digital image

An image I of dimension r × s is usually presented as a set of pixels I = {P_ij, i = 1, … r ; j = 1, …, s}, where (i, j) represents the position of the pixel in the image, and where $P_{ij} = [P_{ij}^{1}, P_{ij}^{2}, P_{ij}^{3}]$ represents the visible spectral information associated to the intensity of the (i, j)-pixel in the RGB spectrum. Then, given an image I, we can associate it to a graph G = (V, E), in which V is the set of pixels and the set of edges E is obtained by considering two nodes/pixels P_ij, P_i′j′ to be adjacent if and only if they share one coordinate and they are contiguous. Thus, an edge e = {a, b} (also denoted as e_ab) exists in E if and only if the pixels a = P_ij and b = P_i′j′ are adjacent. In other words, we consider a 4-connectivity neighboring topology (see Fig. 1(a)) on the image pixels. Other topologies can be considered, as shown in Fig. 1(b), (c) and (d).

Fig.1

Some topologies applied to images (networks). (a): four (b-c): eight, and (d) twelve neighbours.

3.1.2 Dissimilarity between pixels

Given an edge e_ab joining two pixels a = P_ij and b = P_i′j′, let d_ab ≥ 0 be the degree of dissimilarity between them. Let also D = {d_ab|e_ab ∈ E} denote the set of all dissimilarities. Therefore, the information contained in a digital image can be summarized by the network N = {G ; D}.

In this paper we focus on the features of the proposed edge-detection algorithm. In the computational experience described in Section 5 we simply compute dissimilarity measure through the Euclidean distance between the visual spectral information of the pixels, in RGB-color converted to the CIELab color space (CIE76). However, in practice this measure has to be defined taking into account the specific problem and the characteristics of the elements considered.

Let us remark that the new ID&L algorithm just needs a single dissimilarity measure between adjacent pixels -not a pair of similarity/dissimilarity (or linking/divisive strength) measures as needed by the basis D&L algorithm. Notice also that this kind of dissimilarity information cannot be updated in each iteration, as happens with the D&L when applied to social networks analysis (see [13]).

3.2 Threshold values in the iterative process

It is important to notice that in the above described D&L algorithm for hierarchical network clustering, the main aim is to obtain a complete partition or dendrogram, in which the number of groups or clusters increases by just one unit in each iteration until all nodes belong to singleton clusters. This is meaningful in the context of social network analysis, as it provides a description of the process by which a given network of items breaks into successively smaller communities, up to the point in which communities are identified with single items.

However, in the context of image processing, such a maximum-depth description or dendrogram may actually not be interesting at all. This is particularly true regarding the edge-detection task, intended to separate objects in an image, which usually are composed of many pixels. Once a certain level of detail is attained, it may not be meaningful to further persist on separating pixels into differentiated, smaller objects.

For this reason, a main difference of the here proposed ID&L algorithm with respect to the basic D&L approach is that now the maximum depth of the dendrogram is prefixed by means of a pre-specified parameter K, that sets the number of iterations the divisive process will be repeated. The idea is that this parameter K identifies the number of different levels of detail that are applied in the edge-detection task, in such a way that each repetition of the divisive process will provide a further description of the details present in each previously separated object. In this sense, in our opinion K has to take relatively small values, usually lower than 10, but anyway much lower than N = r · s, the number of pixels in the image, that would constitute the upper bound in the case of a complete dendrogram. Anyway, this parameter has to be chosen taking into account the requirements that, on the one hand, the hierarchical sequence has to allow acquiring the different levels of detail actually needed and, on the other hand, the computational cost of obtaining the partition must be kept as reduced as possible.

Another main difference, consequence of this prefixed maximum-depth of the partition process, is that the thresholds α -that determine at each iteration which nodes will belong to different regions- have also to be pre-specified once K has been set.

The main idea at this regard is that these thresholds α have to follow a decreasing scheme, in such a way that, as at the algorithm’s first steps α takes a relatively high value, only strongly dissimilar adjacent pixels are separated; this allows recognizing first the main, more contrasted objects (or clusters) of the image. Afterwards, as α decreases, more and more pixels in the clusters already detected are also separated, allowing then to recognize further details in the objects. And obviously, the range in which α may vary depends on the values of the dissimilarities associated to the edges; if $α > \bar{d}$ , where $\bar{d} = max (D)$ , all pixels would be assigned to the trivial cluster V. On the other hand, if $α \leq \underline{d}$ , where $\underline{d} = min (D)$ , each pixel would define a trivial cluster with only one element (i.e. singletons).

In general, many solutions can be adopted to the interesting problem of selecting an appropriate number of detail levels K and obtaining an adequate sequence of K thresholds α₁ > … > α_K, in such a way that a divisive process, acquiring further and relevant levels of detail at each step, is produced. Both decisions have to depend on the size and characteristics of the problem. In this work, for the computational experiments described in Section 5, we chose to assign K = 6 and followed a mixed learning approach regarding the election of the α thresholds, based on a preselected family of thresholds configurations, from which a single configuration is selected after a learning process on a set of training images.

More specifically, in this work 8 different configurations for the values α are considered, in each of which K = 6 different intermediate values are chosen between the extremes $\underline{d}$ and $\bar{d}$ according to different distributions. This is illustrated in Fig. 2, in which the different configurations are enumerated with the Roman numbers I to VIII. The first distribution I considers K equidistant thresholds uniformly distributed in the range $[\underline{d}, \bar{d}]$ . Then, other 4 combinations are derived departing from the midpoint $w = (\underline{d} + \bar{d}) / 2$ of the interval $[\underline{d}, \bar{d}]$ , in such a way that 3 thresholds are uniformly distributed at either side of w and the remaining 2 in the other side (distributions II and III). Configurations IV and V are obtained in a similar way, now uniformly placing 4 thresholds on either side of the midpoint w and the remaining one in the other. Finally, the remaining 3 distributions are obtained by dividing the range $[\underline{d}, \bar{d}]$ into 3 equal parts, so that one threshold is placed at the center of two of these parts, and the remaining two thresholds are uniformly distributed in the remaining part. These are configurations VI, VII and VIII.

Fig.2

(I)-(VIII) Different distributions considered for the α values.

3.3 Determining hierarchical region maps

Once the depth of the dendrogram K and the thresholds $\underline{d} = α_{0} > α_{1} > \dots > α_{K + 1} = \bar{d}$ are selected, the hierarchical partitions of the image network are obtained following similar steps to the D & L algorithm described in Section 2. The process is as follows.

For each t from 0 to K + 1 do:

Take α = α_t and divide the set of edges E into Div = {e ∈ E / d_e ≥ α} and Link = {e ∈ E / d_e < α}.

Rank all the edges in E in two steps. First, rank the edges in Div from the highest to the lowest dissimilarity d_e. After that, add the edges in Link from the lowest to highest dissimilarity d_e.

Based on this arrangement, a spanning forest S is constructed following a Kruskal-like algorithm.

The partition $P^{t}$ provided at this step is constructed as the connected components of the spanning forest S^* obtained after the deletion of those edges of E that belong to the divisive links Div (i.e. $P^{t}$ = {connected components of S - Div}.

Update the set of edges E by deleting those edges joining the different clusters of $P^{t}$ .

Notice that two successive partitions $P^{t - 1}$ and $P^{t}$ will be different if there exists an edge e_ab ∈ E such that α_t-1 > d_ab ≥ α_t. In this way, the number of different threshold’s values that can effectively provide different partitions is | {d_e|e ∈ E} | ≤ m = |E|. However, the number of applied thresholds is relevant to determine the partitions obtained after the last iteration, since the partition obtained after applying a single threshold α₂ is not necessarily the same as would be obtained by e.g. applying thresholds α₁ > α₂. Notice also that the above algorithm has a major loop based in the hierarchical depth K, the upper bound of which is n = |V|, which is attained when $| P^{i} | = | P^{i - 1} | + 1$ for all i. As commented above, in an edge-detection context a very large number of partitions might obviously be uninformative, and therefore we concluded that the parameter K has to take relatively small values. This significantly contributes to reduce the computational time needed to produce the sequence of partitions.

Remark. The use of a spanning tree structure is not new in the graph-based image segmentation context. Although [11 , 44] also use a spanning tree based on a Kruskal-like algorithm, the main difference with their works is that the proposed ID&L algorithm uses a spanning forest that is constructed with a varying edge order. That is, a decreasing dissimilarity-order for the edges whose weight is larger or equal than a threshold and an increasing order for the remaining edges. Also, the algorithm proposed by [11] does not explicitly produce a hierarchy of partitions as our hierarchical clustering algorithm does. In contrast to the ID&L algorithm, the algorithms presented in [28, 30] use only a single construction of the minimum spanning tree.

3.4 Coarse networks in the hierarchical partition algorithm

In the context of the application of the D&L hierarchical partition approach to a digital image modeled by a network as above, it is a good idea to try avoiding the potential noise derived from very homogeneous regions, which should not be divided for any threshold used in the partition process. Particularly, bunches of nodes connected by very low dissimilarity edges have to be somehow identified and excluded from the partition process. We use coarse networks to this aim and in order to reduce computational time. This constitutes another main difference between the application of the D&L approach to social network analysis (as performed in [13]) and the here proposed ID&L algorithm for edge-detection tasks.

The central idea is to contract the pixels network N = {G ; D}, so that each homogeneous region is collapsed in a single node (referred to as contracted or shrunken node), before the partition process begins. This kind of contraction is a well-known graph technique, see e.g. [14] for further details.

Thus, let α_τ be a predefined dissimilarity threshold; the endpoints a, b of any edge e_ab with a dissimilarity d_ab ≤ α_τ will be considered as a unique node, and the network is rearranged so that the edges incident to this new node are given by the edges that were incident to any of those endpoints. As a consequence, the pixels included in these new contracted nodes will never be separated in the hierarchical partition process. The above contracting operation may be performed in the following way: Let E (α_τ) = {e_ab ∈ E ∣ d_ab ≤ α_τ} be the set of edges with dissimilarity lower than the predefined threshold. Any connected component of the partial graph G (α_τ) = (V, E (α_τ)) is merged into a new node, in such a way that the edges incident to it correspond to the edges (in E) incident to some of the pixels of the associated connected component. This operation is illustrated in Fig. 3.

Fig.3

Nodes contraction: d_ab ≤ α_τ.

As a result of this contraction operation, a new network $\underline{N} = (\underline{V}, \underline{E})$ is obtained, where $\underline{V}$ is the set of connected components of G (α_τ) and every edge of $\underline{E}$ is associated to an original edge of E with a dissimilarity greater than α_τ.

The application of the above described hierarchical partition procedure to this contracted network produces the non-trivial partitions ${{\underline{P}}^{1}, \dots, {\underline{P}}^{K}}$ . Then, the partition procedure is finished when for any i and t the clusters ${\underline{C}}_{i}^{t} \in {\underline{P}}^{t}$ are expanded so that the original pixels are recovered.

The addition of this contraction step generalizes the hierarchical partition process described in Section 2 and Subsection 3.3, in the sense that for values $α_{τ} < \underline{d}$ the same partitions are obtained as if the contraction operation is not applied. Particularly, as we will illustrate in the next example, the previous application of this contraction operation constitutes a significant improvement with respect to the basis algorithm, both in terms of performance and computational effort.

Before to proceed with the example, let us remark that the contraction of the pixel network can also be done without taking into account the dissimilarity values, just considering the neighbors of each pixels. For example, each window L × L can be coarsened into one pixel following the same idea. Thus, treating windows as a single node is equivalent to as if the image was reduced and there was no noise remaining in them. Therefore, this idea may provide a further reduction of the noise contained in an image. Once the images have been analyzed, these windows are re-expanded to their original size.

3.5 Example of the hierarchical region map algorithm

Consider the gray-scale image shown in Fig. 4a. This image is of dimension 5 × 6. Any pixel (i, j) ∈ {1, 2, 3, 4, 5} × {1, 2, 3, 4, 5, 6} has a gray intensity value P_ij ∈ {1, 2, 3, 4, 5}, as shown in Fig. 4b. Take d ((i, j) , (i^*, j^*)) = | P_ij - P_{i
^*
j
^*} | as the dissimilarity measure between pixels. The image network N = {G ; D} is depicted in Fig. 4c.

Fig.4

(a) Gray-scale image, (b) associated matrix, and (c) network N.

For this network, the maximum and minimum dissimilarity are given by $\bar{d} = 4$ and $\underline{d} = 0$ . For illustrative purposes, let K = 4 (leading to K + 2 partitions, with 2 trivial partitions and K = 4 non-trivial ones) so that $α_{0} = 5 > \bar{d} \geq α_{1} = 4 > α_{2} > α_{3} > α_{4} = 1 > \underline{d} = 0 \geq α_{5} = 0$ .

Excluding the partitions $P^{0}$ and $P^{5}$ , the K = 4 non-trivial partitions obtained through the basis partition process exposed in Section 3.3 (that is, without applying the contraction operation) are shown in Fig. 5. Notice that the number s_t of clusters $C_{i}^{t}$ obtained at each iteration t are (s₀, s₁, …, s₅) = (1, 2, 3, 8, 16, 30). Particularly, at the maximum non-trivial level of detail, i.e. after using the threshold α₄ = 1, some adjacent pixels with the same gray intensity are placed on different clusters. In fact, this undesirable situation starts to occur with the threshold α₃ = 2, as the cycle-avoidance requirement imposed at the construction of the spanning forest also avoids some linking edges to be added. As a result, many similar adjacent pixels are separated at the last iterations, rendering an unnecessarily high number of clusters.

This behavior is however easily avoided by applying the contraction operation on the initial network. The network $\underline{N}$ obtained after a contraction step with α_τ = 0 is shown in Figs. 6 and 7.

Fig.5

Iterations of a basic hierarchical region map process.

Fig.6

(a): Original network, G (V, E), where |V|=30 and |E|=49; and (b): Network $\underline{G} = (\underline{V}, \underline{E})$ by using α_τ = 0, where |V|=8 and |E|=24.

Fig.7

Course net over hierarchical region map: cut edges are red.

The successive non-trivial partitions of the improved hierarchical algorithm are shown in Fig. 7. Now, (s₀, s₁, …, s₅) = (1, 2, 3, 5, 8, 30), and the clusters obtained after expanding the contracted nodes do not separate adjacent pixels with equal gray intensity. Notice also that the algorithm has to perform a much smaller computational effort in this case, since the number of nodes and edges actually considered are reduced significantly. This shows how the contraction operation introduced in Subsection 3.4 helps improving the ID&L hierarchical partition process, clearly allowing a much better adaptation to the image-processing context and particularly to the edge-detection task.

3.6 Transforming a region map into a boundary map

Once the clusters or regions have been determined, their borders or edges have to be identified. The partitions of the set of nodes into hierarchically-arranged families of connected regions obtained in the previous steps can be delimited by the borders of these regions. And these borders can be easily identified by means of the set of divisive graph-edges of the spanning forests. The boundary-edges set are the minimal set of graph-edges which separate the regions of an image partition (see [12]). An advantage of using a connected-components partition (or segmentation) approach to edge-detection is that regions are then disconnected by the boundary graph-edges, imposing the pixel boundaries of the objects to be closed curves.

A procedure to transform the region map output of the exposed hierarchical partition algorithm into a set of boundary maps, the typical output of an edge-detection algorithm, is provided next. To this aim, it is necessary to classify the pixels within a given hierarchical region map into a pair of Black and White classes as follows:

Definition 3.1. Let N = {G = (V, E) , D} be an image network, and let P = {C₁, ..., C_t} be a partition of such network. Let also $C (P_{i}) \in P$ denote the region to which the pixel P_i belongs, with P_i ∈ Vforall i ∈ {1, …, n}. Then, the White and Black classes are defined in the following way:

Black = {P_i ∈ V ∣ C (P_i) = C (P_j) forall e_ij = (P_i, P_j) ∈ E}

White = V - Black

The previous definition provides a method to visualize each partition of a hierarchical region map through a boundary map in which a pixel is colored white if it has at least one neighbor in other region, and pixels surrounded by pixels of the same region are colored black. In this way, it is possible to represent a region map as a boundary map (white-class pixels are boundaries and black-class pixels are not-boundaries).

Nevertheless, notice that after this transformation, the object-edges or boundaries are two-pixels thick since the procedure places the pixel-border between any two regions in both regions simultaneously. This may not be an adequate way of representing object-edges. Furthermore, it complicates any comparison with other edge detection techniques, which usually produce one-pixel thick object-edges. Then, to eliminate this problem, the edges between two regions are assigned to the boundary pixels of the larger region (i.e. that with a greater number of pixels). That is, from any pair of adjacent pixels in different regions, just the one in the larger region is kept as a white, boundary pixel, the other one in the smaller region being switched to black. In this way, the final boundary map is just one-pixel thick.

This regions-to-boundaries transformation operation constitutes the last step of the ID&L algorithm here proposed. Therefore, the output of the ID&L algorithm is a family of boundary maps B^t, t = 1, ..., K, obtained by applying this last step on each partition P^t, t = 1, ..., K provided by the hierarchical partition procedure.

Remark. Although this transformation produces a family of binary boundary maps, a single soft boundary map can be obtained through the aggregation of the set of partitions (see details in [12]).

3.7 Example of the image divide and Link algorithm

Figure 8 illustrates the output provided by the ID&L algorithm. This figure depicts the boundary maps resulting from applying the ID&L algorithm on the well-known Bear image. As it can be seen, the ID&L pictures the evolution of the hierarchical region map process, representing different detail levels of an image at each step. Particularly, such a hierarchical region map process can be understood as an approximation to the different detail levels with which an image can be segmented by humans.

Fig.8

Application of the ID&L algorithm on the Bear image. (a) Original image. (b)-(h) Boundary maps resulting from the vector of thresholds α = (α₁, α₂, α₃, α₄, α₅, α₆, α₇, α₈) = (47.79, 40.55, 33.29, 28.47, 23.63, 18.79, 11.55, 4.30).

4 A boundary-based evaluation methodology for the ID&L algorithm

Edge detection techniques as the proposed ID&L are most usually evaluated through a so-called boundary-based evaluation methodology. Given a digital image I, the basis of this methodology is to compare the boundary map provided by an edge detector algorithm applied on I with a set of human-made boundary maps that act as reference or ground truth about the objects actually present in I.

We have chosen to evaluate the proposed ID&L algorithm through this methodology, instead of the alternative region-based evaluation methodology. Two are the reasons for this election. First, the evaluation of object recognition techniques by means of their boundaries is far more developed and extended than its counterpart focusing on the regions or objects themselves. In this sense, there are more available benchmark datasets, and these are more standardized and accepted, for edge-based evaluation than for region-based evaluation. Similarly, the evaluation methodologies usually applied in the edge-based approach are quite more established and finds a more extended consensus than those applied in the region-based approach [2 , 24]. And secondly, object recognition tasks are more typically identified with widely known edge-detectors such as Canny or Sobel than with segmentation procedures 1 , which have not found a so-wide reach. Consequently, these edge-detection techniques are usually more easily appreciated as providing appropriate benchmarks for comparisons than not-so-well-known segmentation techniques [18, 23].

Then, the first part of this section is devoted to expose the basics of the boundary-based evaluation methodology that will be applied in the computational experiments later in this paper, which is quite similar to the approach proposed in [26] (see the remark below in Section 4.1). However, in order to enable the evaluation of an edge-detection technique providing not just a single boundary map per image, but rather a sequence of boundary maps per image, as ID&L does, the basic framework of Section 4.1 has to be extended to the case of many-to-many comparisons between such sequence of automatic boundary maps and the set of human ground truth references. This extension is described in Section 4.2.

4.1 Basic boundary-based evaluation

In this section, we recall the basics of the boundary-based evaluation methodology in the context of edge detectors (see, for instance, [1]). The methodology for benchmarking boundary (or edge-) detection algorithms developed in [26] departs from a dataset with several different images, that can be further divided into a training and a test set. Each image in this dataset has to be accompanied by a set of human-made reference boundary maps that serve as ground truth for evaluating the automatic boundary maps that constitute the output of an edge detection technique (see [1]). Therefore, a one-to-many comparison between the automatic boundary map and the set of human maps has to be performed. This is accomplished by performing a set of one-to-one comparisons between the automatic map and each human map. The results of these one-to-one comparisons must then be aggregated to generate the overall result of the one-to-many comparison.

In more detail, the method evaluates the quality of a boundary detector by applying the precision-recall framework for comparing automatic boundaries against human ground truth. Each one-to-one comparison is based on a matching of the boundary pixels of the automatic map with those of the human map, i.e. on a one-to-one correspondence between the boundary pixels of both maps. A distance threshold δ is defined to specify the tolerance level to small boundary localization errors. Then, an unmatched automatic boundary pixel that lies closer than a distance δ from a human boundary pixel is counted as a true positive (TP). Otherwise, unmatched automatic boundary pixels are counted as false positives (FP). And unmatched human boundary pixels are counted as false negatives (FN). In this way, a confusion matrix is created, from which it is possible to calculate precision (Prec) and recall (Rec) scores for each one-to-one comparison between an automatic and a human map. These precision and recall scores are then usually averaged through a harmonic mean into an F-measure, that constitute the most accepted alternative in recent years (see [1, 9]) to evaluate the performance of each one-to-one comparison. Formally, given a candidate automatic boundary map E_cd and a ground-truth human boundary map E_gt, its comparison’s F-measure is computed as follows: $F (E_{cd}, E_{gt}) = \frac{Prec (E_{cd}, E_{gt}) \cdot Rec (E_{cd}, E_{gt})}{β Prec (E_{cd}, E_{gt}) + (1 - β) Rec (E_{cd}, E_{gt})}$ (1) where a harmonic mean is obtained for β = 0.5 and $Prec (E_{cd}, E_{gt}) = \frac{TP}{TP + FP}$ (2) $Rec (E_{cd}, E_{gt}) = \frac{TP}{TP + FN}$ (3)

Now, it is necessary to aggregate the F-measures of the different one-to-one comparisons associated to an image I into a one-to-many quality measure of the automatic map as a boundary map for the objects in such image I. This aggregated measure for image I will be denoted by F_I, and is obtained through an aggregation operator $A$ (see [5] for more information on aggregation operators). This operator $A$ rules the semantics of how do we aggregate the one-to-one comparison’s results into a one-to-many comparison’s quality measure. Usually, this aggregation operator is taken to be the maximum, but other operators may be used if a different semantic is needed in a given context. We will further discuss this point below. Hence, let $E_{gt}^{I} = {E_{gt, 1}^{I}, \dots, E_{gt, n}^{I}}$ be a set of ground truth human boundary maps of image I. The result F_I of the one-to-many comparison of a candidate automatic map E_cd with respect to $E_{gt}^{I}$ is given by:

$\begin{matrix} F_{I} (E_{cd}, E_{gt}^{I}) \\ = A (F (E_{cd}, E_{gt, 1}^{I}), \dots, F (E_{cd}, E_{gt, n}^{I})) . \end{matrix}$ (4)

Notice that F_I measures the performance of a specific parametric configuration of a given edge-detection procedure with respect to an image I. In order to obtain a global measure of the performance of such parametric configuration over all the images in the dataset, the F_I measures obtained on the different images of the set are usually averaged through an arithmetic mean. Let us denote by F^* such global measure of performance. Then, notice that an F^* measure is obtained on the training set for each parametric configuration of the method being evaluated, while only the F^* measure of that parametric configuration found to be optimal on the training set is computed on the test set. Particularly, such optimal configuration is identified as that producing the highest or maximum F^* on a training set.

Remark. This evaluation methodology just described is, in fact, very similar to the approach proposed by [26], with two relevant exceptions. First, the correspondence of the boundaries in two boundary images is not carried out using the CSA algorithm, but using instead the Estrada-Jepson validation algorithm [9], which is publicly available for download at [34]. This decision is made to avoid the non-deterministic behavior of the CSA algorithm, as well as to allow boundary pixels to be matched to more than one pixel in the counterpart image. Second, the handling of the multiple ground truth is different. In [1] and [26] the one-to-many comparison yields a combined confusion matrix, so that the information from all of the one-to-one comparisons is indelibly merged. In our approach, each one-to-one comparison is carried out independently, and the results are only merged afterwards. In such a way, this alternative is more informative and interpretable, as well as clearer in measuring the “human-likeliness” of the computer-made boundary maps.

4.2 Extension of the boundary-based evaluation methodology for sequences of boundary maps

In the previous section, we have described how to evaluate an edge detection method producing just a single automatic boundary map for each parametric configuration (for a given image). However, notice that a hierarchical procedure as the proposed ID&L algorithm instead produces a sequence of candidate boundary maps for each parametric configuration. This leads to a more complex evaluation context, in which many-to-many comparisons arise as a consequence of the many automatic boundary maps and many human boundary maps to be compared for each image and for each parametric configuration. Therefore, it is necessary to procure an adaptation of the edge-based evaluation methodology for sequences of candidate boundary maps.

To address these many-to-many comparisons arising in the context of the evaluation of a sequence of automatic boundary maps, we follow the approach introduced in [35]. The main point of this approach is to consider the set of human ground truths available for each image as a sample of the different levels of detail that may be used when delimiting objects in an image, in such a way that the sequence of automatic boundary maps is interpreted as an approximation to such sample. That is, each human reference’s level of detail is best approximated by exactly one automatic boundary map of the sequence. Therefore, these best-approximating automatic maps have to be taken as the basis for the evaluation of the sequence as a whole. In other words, by means of the approach in [35], the many-to-many comparisons problem is reduced into a kind of one-to-many comparison framework, in which just the automatic map that best matches a specific human map is used to compare the sequence with that human ground truth. Such one-to-one comparisons are carried out as described above, and their results are aggregated into a one-to-many performance measure F_I for each image, and later in a global F^* measure for each parametric configuration following the same principles.

Formally, let $E_{gt}^{I} = {E_{gt, 1}^{I}, \dots, E_{gt, n}^{I}}$ be as before the set of ground truth images available for an image I. Then, for each human boundary map $E_{gt, i}^{I}, i = 1, \dots, n$ , a one-to-one comparison is carried out with each of the automatic maps B_t obtained from the sequence B = {B₁, ..., B_K}. This produces an F-measure F_ki for each comparison. Then, the automatic map Bⁱ of the candidate sequence with the highest F-measure for each human i is selected: $B^{i} = B_{k^{*}}, where k^{*} = \arg max_{1 \leq k \leq K} F_{ti} .$ (5)

From these best-approximating automatic maps for each human, a measure of the sequence’s performance in capturing the different levels of detail provided by the sample of human ground truths is computed via an aggregation operator $A$ ruling the semantics of the one-to-many comparisons’ result. Similarly to above, it is:

$\begin{matrix} F_{I} (B, E_{gt}^{I}) \\ = A (F (B^{1}, E_{gt, 1}^{I}), \dots, F (B^{n}, E_{gt, n}^{I})) . \end{matrix}$ (6)

The evaluation procedure to obtain a global F^* measure for each parametric configuration over the training set is the same as before. And similarly, just the F^* of the optimal configuration is computed on the test set. Finally, let us a remark about the aggregation operator $A$ . As commented above, the aggregation operator $A$ rules the semantics of how the comparisons along the different human ground truths are aggregated. Three aggregation operators are used in the preliminary experimental experiences here described:

$A = max$ : perhaps the most widely used, it represents a Turing-like approach measuring how close is the automatic detector to actually look like a human.

$A = mean$ : fits the idea of measuring how well the automatic boundary map approximates on the average the sample provided by the set of human ground truths.

$A = min$ : it is more exigent and fits the idea of measuring how well an automatic map is simultaneously close to all the sample provided by the set human ground truths.

5 Computational experiments and benchmarking

In this section, we describe the computational experiments carried out for the benchmarking of the proposed ID&L algorithm. This benchmarking consists of a comparison of the results provided by the ID&L algorithm with those obtained through the well-known Canny and Sobel detectors, which constitute widely extended methodologies for object recognition, and particularly for edge-detection tasks. This comparison is carried out in the context of the Berkeley Segmentation Dataset and Benchmark 500 (BSDS500) [1, 26]. In order to do so, we have first used the BSDS500 training set (200 images which have a resolution of 481 × 321 pixels) to find out the optimal parametric configurations of each of the contending methods. Then, the optimal configurations of each method have been validated in the BSDS500 test set (another 200 images with a resolution of 481 × 321 pixels).

This section is then organized by first providing a more detailed description of the experimental setup or procedure used in these computational experiments. After that, the results obtained on the training set are described. And finally, the results obtained on the test set are presented and discussed.

5.1 Experimental procedure

The Berkeley Segmentation Dataset (BSDS500) [25] consist of 500 natural images, which are divided into a training set of 200 images, a test set of 200 images and a validation set of 100 images. Each image of BSDS500 is accompanied by a set of 4 to 7 human-made reference boundary maps. Our experimental procedure involves the comparison of five different methods, namely Sobel, Autoadapted Sobel (A-Sobel), Canny, Autoadapted Canny (A-Canny) and Image-Divide-and-Link (ID&L). Note that the autoadapted methods are self-configuring versions of their counterparts. A description of each of the methods of boundary detection, in terms of the Bezdek breakdown structure [3] and the parametric configuration used, is included in Table 1. Notice that each of such methods will be executed repeatedly on the BSDS500 training set to determine the best possible configuration of its parameters. In addition, the comparisons of the methods were performed with Gaussian filter (σ_s = 1 in Tables 1 -7) and without Gaussian filter (σ_s = 0). Indeed, ID&L algorithm performs quite well without any previous preprocessing. This makes the proposal method differ even more from other proposals.

Table 1
Schematic decomposition of the boundary detection algorithms used according to the Bezdek Breakdown Structure [3]

Method Sobel A-Sobel Canny A-Canny

1. Conditioning Gaussian Smoothing with σ_s ∈ {0, 1}

2. Feat. ext. Sobel masks Canny filter with σ_c ∈ {2, 4}

3. Blending Euclidean magnitude of gradient and Non-maxima suppresion [36]

4. Scaling Thresholding Rosin method Hysteresis Liu method

t ∈ (0, 1) [37] t₁, t₂ ∈ (0, 1) [22]

Parameter list σ_s, t σ _s σ_s, σ_c, t₁, t₂ σ_s, σ_c

Method	Sobel	A-Sobel	Canny	A-Canny
1. Conditioning	Gaussian Smoothing with σ_s ∈ {0, 1}
2. Feat. ext.	Sobel masks	Canny filter with σ_c ∈ {2, 4}
3. Blending	Euclidean magnitude of gradient and Non-maxima suppresion [36]
4. Scaling	Thresholding	Rosin method	Hysteresis	Liu method
	t ∈ (0, 1)	[37]	t₁, t₂ ∈ (0, 1)	[22]
Parameter list	σ_s, t	σ _s	σ_s, σ_c, t₁, t₂	σ_s, σ_c

Table 2

Schematic decomposition of the applied ID&L algorithm [12]

Method	Image-Divide-and-Link
1. Conditioning	Gaussian Smoothing with σ_s ∈ {0, 1}
2. Numbers of cutting edges	K = 6
3. Threshold values	$α = (α_{0}, \dots, α_{K}) \in [\underline{d}, \dots, \bar{d}]$
4. Course net parameters	$α_{τ} = 9 % \times \bar{d}; L = 3$
Parameters list	σ_s, K, α, α_τ, L

Table 3

Global results for each of the methods on the BSDS500 training set

Conditioning	Method	max	mean	min
σ_s = 0	Sobel	0.4732	0.3823	0.2588
	A-Sobel	0.3909	0.3017	0.2044
	Canny (σ_c = 2)	0.5442	0.4466	0.3028
	Canny (σ_c = 4)	0.5670	0.4726	0.3247
	A-Canny (σ_c = 2)	0.4504	0.3549	0.2411
	A-Canny (σ_c = 4)	0.5360	0.4424	0.3085
	ID&L	0.5964	0.5037	0.3554
σ_s = 1	Sobel	0.5078	0.4138	0.2816
	A-Sobel	0.4334	0.3395	0.2299
	Canny (σ_c = 2)	0.5520	0.454	0.308
	Canny (σ_c = 4)	0.5663	0.4729	0.3252
	A-Canny (σ_c = 2)	0.4711	0.3747	0.2565
	A-Canny (σ_c = 4)	0.5363	0.4448	0.3100
	ID&L	0.6107	0.5166	0.3656

Table 4

Global results for each of the methods on the BSDS500 test set

Conditioning	Method	max	mean	min
σ_s = 0	Sobel	0.4790	0.3930	0.2836
	A-Sobel	0.3990	0.3148	0.2213
	Canny (σ_c = 2)	0.5366	0.4488	0.3244
	Canny (σ_c = 4)	0.5513	0.4705	0.3474
	A-Canny (σ_c = 2)	0.4609	0.3629	0.2607
	A-Canny (σ_c = 4)	0.5359	0.4517	0.3316
	ID&L	0.5819	0.4960	0.3761
σ_s = 1	Sobel	0.5098	0.4212	0.3049
	A-Sobel	0.4455	0.3570	0.2521
	Canny (σ_c = 2)	0.5435	0.4563	0.3319
	Canny (σ_c = 4)	0.5514	0.4709	0.3466
	A-Canny (σ_c = 2)	0.4803	0.3865	0.2754
	A-Canny (σ_c = 4)	0.5366	0.4531	0.3327
	ID&L	0.5926	0.5067	0.3860

Table 5

Wilcoxon test of the comparisons of the ID&L algorithm with all the considered edge-detectors for $A = \max$ with σ_s = 0

Method	Sobel	A-Sobel	Canny (σ_c = 2)	Canny (σ_c = 4)	A-Canny (σ_c = 2)	A-Canny (σ_c = 4)
Positive ranks	14	7	46	61	22	55
Negative ranks	186	193	154	139	178	145
Z	-11.456	-12.185	-8.059	-5.448	-11.279	-6.716
p-value	<.001	<.001	<.001	<.001	<.001	<.001

Table 6

Wilcoxon test of the comparisons of the ID&L algorithm with all the considered edge-detectors for $A = mean$ with σ_s = 0

Method	Sobel	A-Sobel	Canny (σ_c = 2)	Canny (σ_c = 4)	A-Canny (σ_c = 2)	A-Canny (σ_c = 4)
Positive ranks	12	2	37	71	10	52
Negative ranks	188	198	163	129	190	148
Z	-11.729	-12.235	-9.082	-4.908	-11.914	-7.158
p-value	<0.001	<0.001	<0.001	<0.001	<0.001	<0.001

Table 7

Wilcoxon test of the comparisons of the ID&L algorithm with all the considered edge-detectors for $A = \min$ with σ_s = 0

Method	Sobel	A-Sobel	Canny (σ_c = 2)	Canny (σ_c = 4)	A-Canny (σ_c = 2)	A-Canny (σ_c = 4)
Positive ranks	11	7	32	62	21	56
Negative ranks	189	193	168	138	179	144
Z	-11.899	-12.152	-9.864	-5.653	-11.463	-7.248
p-value	<0.001	<0.001	<0.001	<0.001	<0.001	<0.001

Table 2 describes the setting of the ID&L algorithm. Notice that the range of some parameters depends on the dissimilarity measure d as well as on the specific ranges such d takes on each image. Particularly, in this paper we have computed the dissimilarity between adjacent pixels as the Euclidean distance between their spectral information, in RGB-color converted to the CIELab color space (CIE76). Moreover, although the ID&L algorithm has four parameters (K, α, α_τ, L), in this illustrative experiment we only focus on varying the set of thresholds α = (α₀, …, α_K) that guides the region map process. The parameters K, L and α_τ are left constant along the different executions of the algorithm. Their specific values have been chosen to produce enough suitable and detailed results. Particularly, we have observed that $α_{τ} = 0.09 \times \bar{d}$ provides a good balance allowing to significantly reduce the computational effort of the algorithm while at the same time not contracting too much the network so some details can be still recognized. We have chosen K = 6 since it provides a set of levels of detail in the sequence of boundary maps similar to the number of human ground truths available for the different train and test images. Therefore, in order to compact the image, we have chosen L = 3, avoiding in this way too many details. Also, this size windows 3 × 3 is usually a standard size in edge detection.

Regarding the thresholds α, the 8 different general configurations exposed in Subsection 3.2 have been considered. As discussed there, each image possesses a different range $\underline{d}, \dots, \bar{d}$ , instead of assigning a set of specific values for the thresholds α₀, …, α_K to be used in all the images, it is rather more convenient to define a set of distributions of these thresholds in the range $\underline{d}, \dots, \bar{d}$ , which will provide the specific values for the thresholds once adapted to each image’s specific range of dissimilarities. Particularly, this approach allows the comparison between different parametric configurations of the thresholds to be consistent along the different images.

5.2 Results on the training set

Each one of the edge detection methods and the proposed ID&L algorithm has been executed on the BSDS500 training set to determine their optimal parametric configurations through the generation of a precision and recall (P&R) curve. The main advantage of using precision and recall data is that, for each algorithm, it enables to compare not only the goodness of the partitions generated by them, but also the results produced by the same algorithm using different input parameters. In order to compare the performance of the ID&L algorithm regarding the others methods in a quantitative manner, we have graphed three P&R curves based on three aggregation operators $A$ : maximum, mean and minimum. Figures 9, 10 and 11 shows the P&R curves for these three aggregation operators considered, and provides the best resulting parameter configurations obtained for each method over the training sample.

Fig.9

P&R curves for all the algorithms and best precision/recall values (bolded icons) obtained using $A = \max$ and σ_s = 0. The best resulting parameters corresponding to the bolded icons are: Sobel (t = 0.30), Canny (σ_c = 2, t₁ = 0.156, t₂ = 0.39), Canny (σ_c = 4, t₁ = 0.124, t₂ = 0.31), ID&L (K = 6, α = VII distribution, α_τ = 0.09, L = 3).

Fig.10

P&R curves for all the algorithms and best precision/recall values (bolded icons) obtained using $A = mean$ and σ_s = 0. The best resulting parameters corresponding to the bolded icons are: Sobel (t = 0.31), Canny (σ_c = 2, t₁ = 0.156, t₂ = 0.39), Canny (σ_c = 4, t₁ = 0.124, t₂ = 0.31), ID&L (K = 6, α = VII distribution, α_τ = 0.09, L = 3).

Fig.11

P&R curves for all the algorithms and best precision/recall values (bolded icons) obtained using $A = \min$ and σ_s = 0. The best resulting parameters corresponding to the bolded icons are: Sobel (t = 0.31), Canny (σ_c = 2, t₁ = 0.156, t₂ = 0.39), Canny (σ_c = 4, t₁ = 0.112, t₂ = 0.28), ID&L (K = 6, α = VII distribution, α_τ = 0.09, L = 3).

For more detail, the best precision/recall pairs, more specifically the highest F^* measures, corresponding to the best resulting parameter configuration for each of the methods, are included in Table 3. In this table, for both conditioning cases σ_s = 0 and σ_s = 1, our proposed algorithm ID&L obtains the highest maximum, mean and minimum F^* on the training set.

5.3 Results on the test set

Once the optimal configuration of the parameters has been determined for each of the methods and for each of the aggregation operators, we have executed each algorithm with their optimal parameters on the BSDS500 test set. The results gathered on such set of test images are shown in Table 4.

It is possible to summarize these results by saying that ID&L obtains a higher performance in comparison to the other edge detection methods considered. Wilcoxon tests’ results are shown in Tables 5, 6 and 7, allowing to conclude that the proposed ID&L algorithm performs significantly better than the other considered detectors on the BSDS500 dataset.

In terms of run-time, the CPU time was no more than one minute for each boundary map obtained by the ID&L algorithm. These times were measured on a 3.00 GHz Core(TM)2 Duo with 3 GB of RAM.

6 Conclusions

This paper has presented the ID&L algorithm, an innovative edge detection method based on a hierarchical graph-partition approach. Particularly, the ID&L algorithm is based on the D&L hierarchical graph clustering algorithm, developed for hierarchical community detection tasks in the context of social network analysis. In this sense, the proposed ID&L detector results from a deep adaptation of the D&L algorithm to the context of image processing, including many new features that extends the previous technique to suit it to the task of edge detection. As described in Section 3, these new features include:

the usage of just a single dissimilarity measure to configure a graph network from a digital image,

the introduction of different parameters to control the sequence of levels of detail at which object are detected,

the introduction of a coarse net step to collapse homogeneous regions and the procedure to transform region maps into boundary maps.

Due to its graph-based approach, the basic information used by ID&L algorithm is the dissimilarity degree between pairs of neighbouring nodes or pixels. This enables the ID&L algorithm to operate naturally on colour or hyperspectral digital images once an appropriate dissimilarity measure has been provided. Furthermore, due to the hierarchical nature of the underlying graph-partition methodology, the ID&L algorithm allows to simultaneously detect edges at different levels of detail, providing a sequence of boundary maps in which the detail level is hierarchically increased at each step (that is, new objects are detected inside the previously detected objects). Let us remark that, although other (non-hierarchical) edge detection procedures may provide different detail levels by varying the parameters configuration, typically the so-obtained boundary maps are not hierarchical. This feature of the proposed ID&L method tries to somehow emulate the human ability to recognize objects in a hierarchical way, attending to different scales and levels of contrast, detail, etc. To this aim, the ID&L algorithm allows the user to specify the number K of different levels of detail the algorithm has to work with, as well as the dissimilarity thresholds α = (α₁, …, α_K) that separate these different levels. This hierarchical feature of the proposed ID&L detector poses the interesting problem of providing a method to automatically select adequate values for these parameters K and α from just the initial information provided in a digital image. In this work, we have prefixed K and then tested different spatial distributions or configurations of the thresholds α₁, …, α_K along the range of observed dissimilarities between pixels of an image. Experimental results suggest that a good performance is attained when thresholds are distributed more or less uniformly, but with a greater density at the middle of the dissimilarity range (e.g. distribution VII in Fig. 2). Anyway, this bi-level parameter selection problem constitutes a main direction of future work regarding the development of the ID&L algorithm. The application of some machine learning approach to search for an optimal parameter configuration, and the development of an heuristic, one-step method based on the frequency distribution of dissimilarities between pixels are also under development.

The computational experiments conducted for the benchmarking of the proposed ID&L algorithm on the BSDS500 image dataset suggest a good performance of ID&L when compared with best-known detectors based on Canny [7] or Sobel [39]. This performance comparison between typical edge detectors (providing a single boundary map for each image) and a hierarchical method as the ID&L (providing a sequence of maps) has been made possible by extending the usual boundary-based edge detection evaluation methodology as described in Section 4. This extension allows addressing many-to-many comparisons as those arising from the matching of a sequence of candidate boundary maps and a set of human reference maps that provide the ground truth for the evaluation.

Particularly, ID&L showed a statistically significant better performance than that of all the considered methods. Moreover, this statistically significant better performance is successively obtained when the performance measures associated to the comparisons with different human references are aggregated under different schemes (conjunctive, averaging, disjunctive). This means that, for instance, for a given image the best (resp. worst) matching between a map of the ID&L sequence and one of the human reference maps tends to be better than the best (worst) matching between the map provided by the non-hierarchical detectors and any human reference. This suggests that the superior performance of the ID&L detector is robust, or at least enough solid to be maintained under different semantics of the comparisons.

Footnotes

Let us emphasize that a main difference between segmentation and edge detection is that, while edge detection identifies boundaries by pixels (i.e. graph nodes), segmentation identifies boudaries as graph edges separating nodes or pixels in different regions. In this sense, although both are intended to address the same task of object recognition, edge detection and image segmentation are intrinsically different techniques, since they produce rather different outputs [].

Acknowledgments

This research has been partially supported by the Government of Spain, grant TIN2015-66471-P, Government of Community of Madrid, grant S2013/ICE-2845 (CASI-CAM) and UCM Research Group 910149.

References

Arbeláez

, Maire

, Fowlkes

and Malik

, Contour detection and hierarchical image segmentation, IEEE Trans on Pattern Analysis an Machine Intelligence 33(5) (2011), 898–916.

Baddeley

A.J.

, An error metric for binary images, Robust computer vision 5978 (1992), 59–78.

Bezdek

J.C.

, Chandrasekhar

and Attikouzel

, A geometric approach to edge detection, IEEE Trans on Fuzzy Systems 6(1) (1998), 52–75.

Bustince

, Barrenechea

, Pagola

and Fernández

, Interval-valued fuzzy sets constructed from matrices: Application to edge detection, Fuzzy Sets and Systems 160(13) (2009), 1819–1840.

Bustince

, Montero

, Barrenechea

and Pagola

, Semiautoduality in a restricted family of aggregation operators, Fuzzy Sets and Systems 158(12) (2007), 1360–1377.

Campomanes-Álvarez

, Ibáñez

and Cordón

, Design of criteria to assess craniofacial correspondence in forensic identification based on computer vision and fuzzy integrals, Applied Soft Computing 46 (2016), 596–612.

Canny

, A computational approach to edge detection, IEEE Trans on Pattern Analysis and Machine Intelligence 8(6) (1986), 679–698.

Danková

, Hodáková

, Perfilieva

and Vajgl

, Edge detection using F-transform, 11th Int Conf on Intelligent Systems Design and Applications (2011), 672–677.

Estrada

F.J.

and Jepson

A.D.

, Benchmarking image segmentation algorithms, Int Journal of Computer Vision 85(2) (2009), 167–181.

10.

Fathy

and Siyal

M.Y.

, An image detection technique based on morphological edge detection and background differencing for real-time traffic analysis, Pattern Recognition Letters 16(12) (1995), 1321–1330.

11.

Felzenszwalb

P.F.

and Huttenlocher

D.P.

, Efficient graph-based image segmentation, Int Journal of Computer Vision 59(2) (2004), 167–181.

12.

Gómez

, Yáñez

, Guada

, Rodríguez

J.T.

, Montero

and Zarrazola

, Fuzzy image segmentation based upon hierarchical clustering, Knowledge-Based Systems 87 (2015), 26–37.

13.

Gómez

, Zarrazola

, Yáñez

and Montero

, A Divide-and-Link algorithm for hierarchical clustering in networks, Information Sciences 316 (2015), 308–328.

14.

Gondran

, Minoux

and Vajda

, Graphs and algorithms, John Wiley & Sons, Inc, 1984.

15.

Guada

, Gómez

, Rodríguez

J.T.

, Yáñez

and Montero

, Classifying image analysis techniques from their output, Int. Journal of Computational Intelligence Systems 9(supp. 1) (2016), 43–68.

16.

Guimarães

S.J.F.

, Cousty

, Kenmochi

and Najman

, An efficient hierarchical graph based image segmentation, 14th Int. Workshop on Structural and Syntactic Pattern Recognition, 2012.

17.

Hernández-Matamoros

, Bonarini

, Escamilla-Hernández

, Nakano-Miyatake

and Perez-Meana

, Facial expression recognition with automatic segmentation of face regions using a fuzzy based classification approach, Knowledge-Based Systems 110 (2016), 1–14.

18.

Hou

, Yuille

and Koch

, A meta-theory of boundary detection benchmarks, Proceedings of the NIPSWorkshop on Human Computation for Science and Computational Sustainability, 2012.

19.

Lalonde

, Beaulieu

and Gagnon

, Fast and robust optic disc detection using pyramidal decomposition and Hausdorffbased template matching, IEEE Trans. on Medical Imaging 20(11) (2001), 1193–1200.

20.

Liang

L.R.

and Looney

C.G.

, Competitive fuzzy edge detection, Applied Soft Computing 3(2) (2003), 123–137.

21.

Liu

and Haralick

R.M.

, Optimal matching problem in detection and recognition performance evaluation, Pattern Recognition 35(10) (2002), 2125–2139.

22.

Liu

, Yu

, Liu

and Li

, Bowstring-based dualthreshold computation method for adaptive Canny edge detector, 28th Int. Conf. of Image and Vision Computing New Zealand, 2013, pp. 13–18.

23.

López-Molina

, Bustince

and De Baets

, Separability criteria for the evaluation of boundary detection benchmarks, IEEE Trans on Image Processing 25(3) (2016), 1047–1055.

24.

López-Molina

and De Baets

, and Bustince

, Quantitative error measures for edge detection, Pattern Recognition 46(4) (2013), 1125–1139.

25.

Martin

, Fowlkes

, Tal

and Malik

, A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics, 8th Int. Conf. in Computer Vision 2 (2001), 416–423.

26.

Martin

, Fowlkes

and Malik

, Learning to detect natural image boundaries using local brightness, color, and texture cues, IEEE Trans on Pattern Analysis and Machine Intelligence 26(5) (2004), 530–549.

27.

Monga

, Deriche

and Rocchisani

, 3D edge detection using recursive filtering: Application to scanner images, CVGIP: Image Understanding 53(1) (1991), 76–87.

28.

Najman

, Cousty

and Perret

, Playing with Kruskal: Algorithms for morphological trees in edge-weighted graphs, Int Symposium on Mathematical Morphology and Its Applications to Signal and Image Processing 7883, Springer, 2013, pp. 135–146.

29.

Napalkova

, Rozenblit

J.W.

, Hwang

, Hamilton

A.J.

and Suantak

, An optimal motion planning method for computerassisted surgical training, Applied Soft Computing 24 (2014), 889–899.

30.

Ouzounis

G.K.

and Soille

, Pattern spectra from partition pyramids and hierarchies, Int Symposium on Mathematical Morphology and Its Applications to Signal and Image Processing, Springer, 2011, pp. 108–119.

31.

Pal

S.K.

and King

R.A.

, On edge detection of X-ray images using fuzzy sets, IEEE Trans on Pattern Analysis and Machine Intelligence 5(1) (1983), 69–77.

32.

Perfilieva

, Hodáková

and Hurtík

, Differentiation by the F-transform and application to edge detection, Fuzzy Sets and Systems 288 (2016), 96–114.

33.

Prewitt

, Object enhancement and extraction, Picture processing and Psychopictorics 10(1) (1970), 15–19.

34.

Kermit Research Unit (Ghent University), The Kermit Image Toolkit (KITT), 2015, url: www.kermitimagetoolkit.com

35.

Rodríguez

J.T.

, Guada

, Gómez

, Yáñez

and Montero

, A methodology for hierarchical image segmentation evaluation, IPMU Conf, (Springer, ed.), Communications in Computer and Information Science 610 (2016), 635–647.

36.

Rosenfeld

and Thurston

, Edge and curve detection for visual scene analysis, IEEE Trans on Computers 100(5) (1971), 562–569.

37.

Rosin

P.L.

, Unimodal thresholding, Pattern Recognition 34(11) (2001), 2083–2096.

38.

Silva

, Frery

, Sandri

, Bustince

, Barrenechea

and Marco-Detchart

, Optical images-based edge detection in synthetic aperture radar images, Knowledge-Based Systems 87 (2015), 38–46.

39.

Sobel

, An isotropic 3×3 image gradient operator, Machine Vision for Three Dimensional Scenes (1990), 376–379.

40.

Soille

, Constrained connectivity for hierarchical image partitioning and simplification, IEEE Trans on Pattern Analysis and Machine Intelligence 30(7) (2008), 1132–1145.

41.

Sonka

, Hlavac

and Boyle

, Image processing, analysis, and machine vision, Cengage Learning, 2014.

42.

Tellaeche

, Pajares

, Burgos-Artizzu

X.P.

and Ribeiro

, A computer vision approach for weeds identification through Support Vector Machines, Applied Soft Computing 11(1) (2011), 908–915.

43.

Verma

M.R.

, Majumdar

A.K.

and Chatterjee

, Edge detection in fingerprints, Pattern Recognition 20(5) (1987), 513–523.

44.

, Carlinet

, Géraud

and Najman

, Hierarchical segmentation using tree-based shape space, IEEE Trans on Pattern Analysis and Machine Intelligence 39(3) (2017), 457–469.

45.

Zielke

, Brauckmann

and Vonseelen

, Intensity and edge-based symmetry detection with an application to carfollowing, CVGIP: Image Understanding 58(2) (1993), 177–190.