Abstract
During the last years, object-based image segmentation (OBIA) has seen a considerable increase in the image segmentation. OBIA is generally based on superpixel methods, in which the clustering-based method plays an increasingly important role. Most clustering methods for generating superpixels suffer from inaccurate classification points with inappropriate cluster centers. To solve the problem, we propose a competitive mechanism-based superpixel generation (CMSuG) method, which both accelerates convergence and promotes robustness for noise sensitivity. Then, image segmentation results will be obtained by a region adjacent graph (RAG)-based merging algorithm after constructing an RAG. However, high segmentation accuracy is customarily accompanied by expensive time-consuming costs. To improve computational efficiency, we address a parallel CMSuG algorithm, the time of which is much less than the CMSuG method. In addition, we present a parallel RAG method to decrease the expensive time-consuming cost in serial RAG construction. By leveraging parallel techniques, the running time of the whole image segmentation method decline with the time complexity from O (N) + O (K2) to O (N/K) or O (K2), in which N is the size of an input image and K is the given number of the superpixel. In the experiments, both nature image and remote sensing image segmentation results demonstrate that our CMSuG method outperforms the state-of-the-art superpixel generation methods, and then performs well for image segmentation in turn. Compared with the serial segmentation method, our parallel techniques gain more than four times acceleration in both remote sensing image dataset and nature image dataset.
Introduction
Object-based image analysis (OBIA) with image objects instead of individual pixels, has attracted considerable attention in the computer vision field [1]. Image objects are more likely to extract the information of homogeneous regions, thus reduce the burden of redundant information extracted with single pixels. OBIA is usually composed of three parts: (1) image pre-segmentation; (2) feature extraction based on objects; (3) objects classification and image merging [1]. Good superpixels with higher homogeneity are suitable for image objects.
Superpixel is generated by grouping local adjacent similar pixels into a cluster [2]. The principle of grouping pixels contains intra-region similarity and inter-region dissimilarity, which lies in the colour and other properties of each pixel. It means that the pixels are similar in the same region and dissimilar in different parts. In contrast with single-pixel, superpixel significantly improves the computational efficiency and resists the sensitivity of noise in the image segmentation field [2, 3]. Due to the above advantages, superpixel plays a key role in computer vision applications, such as image segmentation and classification [4, 5], pattern recognition [6], object detection [7], and short video processing [8].
Recently, various images are segmented with superpixel, such as remote sensing image [9, 10] and natural image [11]. The widely used methods for superpixel generation are generally classified into two kinds: graph-based and non-graph-based algorithms [2]. In these methods, the simple linear iterative clustering (SLIC) [2] method is widely applied to image segmentation due to its lower computational complexity [12, 13]. SLIC and the improved SLIC methods depend on the initial clusters strongly, which suffers from dead points or underutilization points. These pixels are assigned to a cluster that is not suitable for them. The sensitivity to original cluster centres results in the wrong categorization of noise points.
Image segmentation based on superpixels is to classify all superpixels into a set of regions with the minimum number. These regions satisfy two conditions simultaneously: maximum correlation within a region and maximum dissimilarity between regions according to some properties of superpixels such as colours. Compared with the seeded region growing and mean-shift methods [1], graph methods implicitly arrange superpixels into the mathematical structure and obtain the segmentation result with more efficient computation. In all graph methods, the minimum spanning tree (MST) is widely used by the reason of the higher computational efficiency [14]. The MST algorithm is based on the construction of a region adjacency graph (RAG), where each vertex is denoted as a superpixel and each edge links the adjacency vertexes. The weight of each edge is computed with the dissimilarity of adjacent vertexes, which is calculated by the features of adjacent superpixels [14].
In this paper, we propose a competitive mechanism-based image segmentation method to improve above all problems. To tackle the problem in superpixel generation, we introduce competitive mechanism to weaken the dependence of initial clusters in fuzzy simple linear iterative clustering (FSLIC) [15]. Inspired by [16, 17], we propose a

Examples of superpixels are shown on 100/300/500 scale by the CMSuG algorithm and parallel CMSuG algorithm, in which habor is a remote sensing image and plane is a nature image.
In some actual image applications, real-time processing is an urgent request, such as search and rescue mission, real-time target detection and so on. However, the time complexity of CMSuG algorithm and superpixel-based RAG construction are related to the size of an input image. Then the time-consuming cost is too expensive in the whole image segmentation process. In the case of limited resources, to meet the need of real-time image segmentation, we use the graphical processing units (GPU) platform to improve the above-proposed image segmentation method by the advantage of low cost, lightweight and portability [18]. Because of the multi-thread parallel processing of GPU, we propose parallel CMSuG and parallel RAG construction methods to decrease computation time by assigning each pixel to a single thread in the Compute Device Unified Architecture (CUDA) of NVidia. Figure 1(b) and (d) display superpixel generation results by parallel CMSuG algorithm.
The main contributions of this paper is as follows:
(1) We propose a CMSuG algorithm by introducing competitive mechanism, which is able to cope with the absence of non-initial clusters in FSLIC.
(2) To complete the image segmentation task, we propose a post-merging algorithm with superpixel by using the RAG-based merging method.
(3) To overcome the time-consuming in CMSuG method and RAG construction, we address the parallel CMSuG method and RAG construction method. For doing that, image segmentation has a great efficiency, which in turn satisfying the request of real-time image processing.
Superpixel is a homogeneous group that clustering similar pixels in local adjacent areas. Due to that advantage, superpixel-based instead of pixel-based methods used in image segmentation can be likely to connect contextual structures. High-efficient computation is distinct in superpixel-based image segmentation method with fewer relationship construction.
In the superpixel-based image segmentation field, it is mainly composed of two phases: (1) superpixel generation; (2) post-merging based on the feature of superpixels. Generating superpixel can be accelerated by the support of parallel computation device GPU. With the extracted feature of each superpixel, post-merging is to classify all local adjacent superpixels with similar features into a category. An intuitive method is that the relationship between adjacent superpixels can be constructed in a graph, which is used to complete post-merging.
Superpixel generation methods
Superpixel generation methods include graph-based and non-graph-based superpixel generation algorithms.
Graph-based algorithms
An image is described as an undirected graph, which considers each pixel as a vertex and connects a pair of adjacent vertices with an edge [19]. The weight of edges is usually calculated with the dissimilarity between pixel and its eight neighbors [19]. Classical graph-based method mainly contains Felzenszwalb and Huttenlocher (FH) [19], normalized cut (NC) [20] and entropy rate superpixels(ERS) [3]. In fact, FH is an algorithm of looking for a minimal spanning tree (MST) with O (N log N) complexity, where N is the total number of image pixels. NC algorithm generates superpixels by minimizing the cost function of cut between different parties, the time complexity of which is
Non-graph-based algorithms
The non-graph-based algorithm is usually used with gradient ascent and clustering-based methods. Gradient ascent starts with an initial pixel and then update clusters in the direction of gradient descent [2], which mainly incorporates mean-shift (MS) [22], watershed (WS) [23] and Turbopixel (TP) [24] methods. Both MS and WS generate irregularly shaped superpixels due to the absence of a controlling compactness factor. The complexity of the two algorithms is O (N2) and O (N log N), respectively. In addition, superpixels generated by TP have weak boundary adherence with O (N) complexity.
Another commonly used superpixel generation method adopts the clustering theory. The SLIC method was firstly proposed in 2012 [2]. It is a modified clustering method based on the k-means clustering algorithm. The performance of SLIC outperforms the above two categories in compactness controlling, and it is faster than them with O (N) complexity. However, the algorithm is short of robustness [15]. Subsequently, to improve the performance of SLIC, certain clustering algorithms are proposed based on SLIC. Linear spectral clustering (LSC) [25] has achieved both boundary adherence and global image properties by combining NC formulation and computational similarity of SLIC, but without robustness in ambiguous pixels. Simple non-iterative clustering (SNIC) [26] updates the cluster centers online with lower memory requests, but the superpixel is irregular. All above superpixel generation algorithms based on SLIC falls into weak regularity and wrong pixel allocation with the initial pre-set cluster, the time complexity of which is O (N). Wu et al. [15] developed FSLIC by adopting multiple classifications for each pixel with fuzzy membership. The complexity of FSLIC is O (11N). Compared with the above SLIC, LSC, SNIC algorithms, FSLIC achieves a high-performance improvement. However, FSLIC segmentation still has weak robustness for blurry and noise images without considering pixels that do not belong to existing clusters.
Parallel superpixel generation method
The growing of GPU platform is a parallel computing architecture [27], by which the running time of image processing reduces sharply [28]. High-speed superpixel generation methods based on GPU improve the expensive time cost of previous superpixel generation. A modified MS algorithm was proposed under the CUDA framework [29], which worked more than 20 times faster than an improved MS algorithm based on CPU. The parallel TP algorithm [30] is a parallel level-set evolution process via superpixel expansion in each thread. It has improved 15 times faster than the TP algorithm. Carl et al. [31] introduced a parallel SLIC method based on GPU by assigning a pixel into a thread, which was 83 times faster than SLIC. Similar to the assignment of pixels in parallel SLIC, a modified LSC based on GPU has obtained impressive results [32], which has achieved 83 times faster than LSC. However, the lower time cost is accompanied by lower segmentation properties. Good performance of improved algorithms based on GPU is up to the basic algorithms on CPU.
Post-merging method based on graph
The superpixel-based merging method is performed well on a graph constructed on pre-segmentation [33]. Graph model includes RAG [19] and nearest neighbor graph (NNG) [34]. Although NNG accelerates the merging procedure by searching for the best neighbor, NNG is built with the dependence of RAG [35]. It is time-consuming for searching NNG in each iterative merging step. RAG Construction is to extract road [14, 36] based on superpixels generated by the superpixel generation method. In the RAG-based merging algorithm, it starts with a vertex, then merge it and a corresponding adjacent vertex until not satisfy the merging condition. However, RAG construction relies on travelling all pixels in each superpixel, which takes up time.
Methodology
Our competitive mechanism-based superpixel generation method for image segmentation contains the following steps: (1) CMSuG method. (2) Post-merging with superpixels. This procedure, two phrases included, is mainly about RAG construction with superpixels and RAG-based merging. The whole processing chain of the proposed image segmentation is illustrated in Fig. 2.

Flowchart of the proposed image segmentation method.
In clustering-based superpixel generation methods, initial clusters limit pixel classification. If a pixel belongs to an external cluster, we call that the pixel is assigned to the competitor of existing clusters. By introducing competitive mechanism in FSLIC, we propose a CMSuG method.
Superpixels are image blocks that are composed of adjacent pixels with similar features. So, each cluster is expected to be updated along the direction that contains the largest number of similar pixels. At the same time, the objective of the expectation is also to suppress the growth of clusters in a direction that deviates from the expected direction, which is competitors. In the CMSuG algorithm, not only each initial cluster is modified with a certain learning rate but also the competitors of which are learning with a smaller factor. The CMSuG algorithm is expressed as follows:
(1) Colour space conversion. The input image data is usually an RGB image, which is converted to CIELab colour space [l
i
, a
i
, b
i
]
T
. Each image point is represented by a five-dimensional vector composed of the color vector and XY coordinates [x
i
, y
i
]
T
, which is denoted as
(2) Cluster centroids initialization. Given superpixel number K, the size of each superpixel can be computed by N/K, where N is the total number of image pixels. Initial clustering centers are sampled at regular grid steps
(3) Multi-cluster associations foundation. For each pixel i in a 2S × 2S searching scope around the center
(4) Fuzzy membership matrix updating with competitive mechanism. The objective of updating clusters with competitive mechanism is to ensure each maximum cluster with minimum competitors. For the first, it is feasible to minimize the distances between all pixels and the corresponding cluster centers. Motivated by the theory of competitive learning [16], the second target is to suppress the growth of competitors by introducing novel membership constraint function
Let objective optimization problem (2) convert to an unconstrained optimization problem by introducing Lagrange multipliers, which describes as follows,
Then, the membership matrix U can be computed by following equation,
Neighboring pixels usually belong to the same cluster. To improve the robustness of pixels classification, the degree of membership u
ij
companies with the spatial information of 3 × 3 neighbors of each pixel [15]. Thus, an improving fuzzy membership matrix in the CMSuG method is computed with
(5) Cluster centers updating with new CIELab and coordinate values. Each cluster center is updated by the weighted linear combination of pixel set in which the ith pixel belongs to the jth cluster, as shown in Equation (5),
(6) Labels assignment and enforce connectivity. Each pixel is assigned to the cluster with the greatest fuzzy partition value. There may be some isolated pixels after labels assignment, these pixels are assigned to the nearest cluster center.
The detailed process of the CMSuG algorithm is shown in Algorithm1
Algorithm1

An example of RAG construction based on superpixels.
Superpixel-based image segmentation is depicted mathematically as follows [37]: I is an image with superpixel sets {R
i
}, final segment sets of I are {S
k
},
RAG = (V, E) can express adjacent relationships between segments obtained by the CMSuG method. Each vertex v i ∈ V in RAG represents an object, each edge e ij ∈ E connects a pair of adjacency vertex v i , v j , where i ≠ j. The weight w ij of edge e ij is the dissimilarity of adjacent superpixels. With the relationship built in an RAG, post-merging can be done to gain the final segmentation results.
RAG construction
An example of RAG construction with superpixels generated with CMSuG method is shown in Fig. 3, which vividly express the relationship between superpixel and RAG. As seen in Fig. 3, a graph constructed with superpixels mainly contains two phrases, building edges between adjacency superpixels and computing the weight of each edge. Each edge reflects the adjacent relationship between two adjacent superpixels, the weight of each edge is the difference between the two neighborhoods.

An example of RAG construction based on superpixels.
Weights of edges measure the difference of two adjacent superpixels, which depend on the features of all superpixels. Color features of images are likely to catch dissimilarity for its generalization in all fields. Three color attributes of superpixels are computed by the mean of all pixels’ attributions in a superpixel. Let S
i
(r
i
, g
i
, b
i
) represent the feature vector of the region i, which is calculated by the following formulas:
We set 3-dimension matrix
Algorithm 2

An example of RAG construction based on superpixels.
RAG-based merging algorithm is to obtain the segmentation result by some merging criterion. It means that two segments are merged into a new component by the way of measuring the difference of the two and a given criterion. The final segmentation result requires heterogeneous enough between the two adjacent segments, so do merging operation for two components
In the criterion, there is at least one vertex in
Initial component
Complexity analysis
As the fuzzy membership degree is updated with 8-neighbors for each pixel, and the fuzzy membership matrix is three dimensions, the computational time of the CMSuG algorithm is O (11N) + O (z), where N is the total number of image pixels, z is the initial cluster number. Actually, CMSuG algorithm needs few times iterations to reach termination condition. Without considering the maximum iteration number, the CMSuG algorithm is O (N) on account of z ⪡ N.
Serial RAG construction requires sequentially travelling all pixels and all superpixels to extract features, the time complexity of which is O (N) + O (K). The time complexity of adjacent matrix extraction is O (N * 8), which is because each pixel has 8-neighbors. At last, the time complexity of RAG output requests O (K2). It means that all superpixels need to travel. Then, the time complexity of serial RAG construction is O (N) + O (K2).
RAG-based merging algorithm travels all edges with O (K) by traversing all vertex pairs, the total number of which is at most 8 * K. The time complexity of the sorting operation is at least with O (|E| log(|E|)), where |E| is the total number of vertex pairs. Thus, the complexity of RAG-based algorithm with RAG is O (K log K) at most.
In the whole image segmentation method by using CMSuG method, the time complexity is O (N) + O (K2).
Parallel optimization
Because the time complexity of the whole image segmentation is related to an input image size N and the superpixel number K, the time-consuming cost is too expensive in image segmentation tasks. It is unbeneficial for real-time image processing. In the whole segmentation method, the CMSuG method and serial RAG construction algorithm are the processes that need much time.
Traveling all pixels in CMSuG and RAG construction can be modified with parallel thread on GPU. With the support of the parallel techniques, they are done with a thread per pixel.
Parallel CMSuG method
Each step, which is in the CMSuG method, runs in a way that processes all pixels sequentially. The expensive time cost is adverse for the application of real-time image processing. The troublesome computation problem becomes easier to move the image to GPU by assigning parallel threads. For the acceleration advantage of parallel threads of GPU, we propose a parallel CMSuG algorithm to modify the CMSuG algorithm. In the whole algorithm expected updating cluster centers, we set the thread number to be 16 * 16 in a block, so the block number in a grid is ⌈ (h I /16)⌉ * ⌈ (w I /16) ⌉, where h I is the height and w I is the width of an input image I.
For clearly depicted, the main parallel procedures on GPU platform are as follows:
(1) Image space conversion. Transfer I from CPU to GPU with CUDA function cudaMemcpyAsync (). Convert image to CIELab color space by using one thread per pixel, and store the result into I T .
(2) Cluster centers initialization. Use one thread for each cluster center to initialize cluster centers. For convenience, the block dimension is the same as image space conversion. Each cluster center contains center coordinates, total membership degree values of associated pixels, and color information. As seen that, the number of cluster centers is superpixel numbers in CMSuG method. Thus, a new map representing all cluster centers is created as M
C
, which size is h
C
* w
C
, where h
C
=⌈ h
I
/S ⌉ and w
C
=⌈ w
I
/S ⌉ are the total number of clusters in horizontal and vertical direction, and
(3) Finding multi-cluster associations. For each pixel in I T , compute the distance from the current pixel to its closest nine clusters instead of using search space in the CMSuG method. The information of each cluster is storied in matrix M C . The reason for doing above operations is that each pixel is visited by its nearest 3 × 3 cluster centers at most. Find the first t* minimum distance values for each pixel and storage them in matrix D while calculating the distance. At the same time, save these clusters’ labels that are the index of clusters into matrix G. The implication of D and G are the same as the CMSuG algorithm, the size of which is h I * w I * t*. After that, set a small enough value for D to avoid being zero. Do them operations on one thread per pixel, and the block size is the same as the step (2). The block size also applies to the following step (4) and step (5).
(4) Updating membership matrix with competitive mechanism. Update t*-dimension membership matrix U in dimensional order. Calculate H by combining each pixel with its 8-neighborhood. Update matrix with H. All above matrix is updated with one thread per pixel. The same definitions of U, H, are shown in the CMSuG method.
(5) Calculating membership matrix-based CIELab images and coordinate matrices. Computing them by combining original CIELab I
T
and a membership matrix (. In the calculated operation, each pixel is calculated with a thread, and new CIELab images are stored in
(6) Updating centers
Secondly, the values that are storied temporarily in
(7) Selection of optimal cluster association. Select the maximum membership degree value from multiple values in and then put the value in G l into final label matrix L by using one thread per pixel. The size of L is h I * w I * 1. The blocks in this this process and the following two steps are the same as that in step (2).
(8) Enforce connectivity. Eliminate isolated pixels with one thread per pixel to prompt the connectivity of each superpixel. If the label of each pixel and its surrounding pixels are all different, change it as its neighborhood. At last, transfer label L from GPU to CPU by CUDA function cudaMemcpy ().
Parallel RAG construction
Two steps in RAG construction, feature extraction, and adjacent matrix extraction, require traveling all pixels, which is time-consuming in the whole image segmentation. So, we propose a parallel RAG construction method with the advantage of GPU to improve the two processes. Our parallel RAG construction method consists parallel feature extraction and parallel adjacent matrix extraction based superpixels. The flowchart of parallel RAG construction is shown in Fig. 4.

Flowchart of parallel RAG construction.
First, transfer I and L from CPU to GPU with CUDA function cudaMemcpyAsync (), where I is the initial image and L is obtained with parallel CMSuG method. In the whole process, the number of threads in each block is 16 * 16.
One step is computing adjacent matrix, adjacent edges are found from each pixel and its 4-neighborhood with a thread per pixel. If the label of current pixel is different with its all neighbors, the value of current position in adjacent matrix is set as 1. Otherwise, it is set as 0. The block number is ⌈ (h I /16)* (w I /16) ⌉ in this step, where h I , w I , I is the same as parallel CMSuG method. And the adjacent edges are storied in matrix A. For convenience, let the size of matrix A be K * K with the pre-set number K of superpixel.
The other step is extracting color features of each superpixel by by using a similar operation as updating cluster centers in parallel CMSuG algorithm. The purpose of parallel feature extraction is to compute features of all vertexes by assigning a superpixel to only one thread. The description is shown vividly in Fig. 5. Construct a new map M
S
to storage superpixel information, the map contains color information

An example of parallel feature extraction process based on superpixels.
At last, transfer adjacent matrix A and feature matrix of superpixels M R * from GPU to CPU by CUDA function cudaMemcpy ().
Although, data transform between CPU and GPU is not be ignored in parallel computing. This process is improving with the optimizing CUDA function continuously. Without considering transform tasks, the time complexity of the parallel processing is analysed.
In the parallel CMSuG algorithm, recurrence in updating centers except for computing fuzzy partition matrix. The time complexity of parallel reduction is O (⌈9 * (N/K)/162 ⌉ * t*) with the need for three maps being reduced, where K is the pre-set number of superpixels, t* is the number of multi-clusters. With visiting the 8-neighborhood in computing fuzzy partition membership, so the time complexity of the parallel CMSuG algorithm is O (N/K) by the constant t*.
The adjacent matrix extracted only need four times cycle, and the time complexity of feature extraction is O (⌈9 * (N/K)/162 ⌉) in parallel RAG construction. However, the RAG construction travels all elements in an adjacent matrix with size K * K. Compared with series RAG construction, the time complexity of parallel RAG construction is reduced from O (N) + O (K2) to O (K2).
By using the parallel technique in the whole image segmentation method, the time complexity is either O (N/K) or O (K2), which depends on the value of
Experiments
Experiment settings
Research in images with undisclosed data is becoming more and more popular, the universality of the algorithm is particularly important. For that, we used the remote sensing data set (RSD) of the sea surface from Google Earth [39], which includes 50 images with 500 × 312 size. The data set are produced by our group, and the ground truth is made by three professionals. Then, we selected 50 nature images from BSD500 as the verified data sets, the size is 321 × 481.
There are certain parameters in our image segmentation method, the color initialization factor is 10 according to experience in previous papers [2]. Let fuzzy partition matrix exponent m be 2 by the introduction in FSLIC [15]. Most pixels do not have more than 3 labels that have been proved in FSLIC, so we adopted the maximum cluster number t* = 3. Let ξ be 16 in our experiments motivated by some experiments in previous papers [31]. Similar to FSLIC in [15], we set the maximum iterative number N* = 10. We selected parameter α as 0.9 according to experience and experiments.
All algorithms are programmed by Visual Studio 2019 with OpenCv 4.5 library and tested on a Windows 10 64-bit with Intel CPU Xeon E5-2689 at 2.6GHz, an NVIDIA GeForce GTX 1070Ti 8GB and 16GB RAM.
Benchmark metrics
Each step in our whole method, which contains superpixel generation, RAG construction, and graph-based image merging, influences the performance of final segmentation results. Thus, we compared the performance of these procedures individually. We evaluated the state-of-the-art superpixel generation algorithms FSLIC [15] and our CMSuG method, parallel FSLIC designed by us and our parallel CMSuG method by: boundary recall (BR) [40], boundary precision (BP) [26], Under-segmentation error (UE) [2, 26] and Compactness (CO) [41]. To test the segmentation results with a graph-based merging algorithm, we computed the BR, BP, and UE to depict the influence of different superpixel generation methods. The running time of these algorithms, which consist of superpixel generation, RAG construction, RAG-based merging algorithms, are displayed.
Boundary recall
Boundary adherence of superpixel is the most property of measuring segmentation methods [2]. Boundary recall(BR) measures the amount of boundary pixels coverage on superpixels and ground truth. A high BR indicates that a very high correct boundary hit rate can be shown. Usually, minor errors are allowed between superpixels and ground truth. Then, BR is calculated by:
BR only considers the true positive boundary rate for superpixels, while boundary precision (BP) adds the performance of false-positive ratio for judging accuracy. Poor BP score is possibly accompanied by extremely high BR. BP is a criterion for judging the positive superpixel boundaries in all superpixel boundaries, then we calculate the BP by
Under-segmentation error was proposed to evaluate the overlap error that computing the leakage pixel from superpixels across the boundary of ground truth [2]. Recent state-of-the-art works tend to use improved UE in [42], which is proposes to overcome both sides penalizes happening in a superpixel crosses a ground truth boundary. As below, the formula of UE is
Compactness (CO) of shape is the important index for superpixel generation, it is considered to produce the superpixel with regular shape and smooth boundaries. It is likely to capture information from compact superpixel boundaries while not easily extract many and diverse boundary information with non-compact superpixels. It is denoted as:
To evaluate the influence of different superpixel generation methods in image segmentation results, we have done some experiments in two images from different data sets. We present a visual comparison of the two segmentation results in Figs. 7. As shown in the two figures, the segmentation results with FSLIC, CMSuG, parallel FSLIC, and parallel CMSuG superpixel generation methods are displayed clearly. We selected 200 as the pre-segmentation scale in the two example images from RSD and BSD. In order to avoid the threshold is too large and cause under-segmentation, let τ = 40. The performance of these segmentation results is depicted in Tables 1 and 2. BR, BP, UE, and CO measure the boundary recall, boundary precision, under-segmentation error, and compactness of all superpixels generated by different superpixel generation methods. Correspondingly, BR M , BP M , and UE M are boundary recall, boundary precision and under-segmentation error for testing image segmentation results after running graph-based merging algorithm. Running time reflects the efficiency of each algorithm. We set TIME S , TIME R , TIME PR , and TIME M as the time of running superpixel generation algorithm, RAG construction method, parallel RAG construction method, and post-merging method respectively. In the table, all values with bold font indicate the best performance of all algorithms.

The first line images are original RSD image and superpixels with FSLIC, CMSuG, Paralle FSLIC, Parallel CMSuG on 200 scale. The second line are ground truth and segmentation results with merging threshold τ = 40.

The first line images are original BSD image and superpixels with FSLIC, CMSuG, Parallel FSLIC, Parallel CMSuG on 300 scale. The second line are ground truth and segmentation results with merging threshold τ = 40.
Performance metrics of superpixel segmentation algorithms at K=200 and merging threshold with τ = 40 on an RSD image
Table 1 depicts the performance of different image segmentation methods for an RSD image. Obviously, all performance expect running time of our image segmentation with CMSuG method is better than that of image segmentation with FSLIC method. Superpixels generated with our parallel CMSuG method perform best in BR, BP, and UE, while the best CO value is obtained by using our CMSuG method. As demonstrated in Table 1, superpixels generated by CMSuG algorithm exceed that by FSLIC. And we can see the same phenomenon that superpixels generated by parallel CMSuG and parallel FSLIC algorithms. That is to say, our superpixel generation method with competitive mechanism performs better than that without competitive mechanism method both in CPU and GPU platforms.
After merging an image with superpixels generated by the above algorithms, the BR M and UE M of parallel CMGuS method outperforms the other methods. The results in the better BR M , BP M , and UE M value for CMSuG when comparing with FSLIC. Highest BP M value of segmentation result with CMSuG method demonstrates that positive boundary is the highest. This value of parallel CMSuG method is also better than that of parallel FSLIC method.
We can see that superpixels generated by parallel CMSuG method are done with the highest efficiency. And an obvious phenomenon is that the parallel superpixel generation method is faster than the ordinary method by about one hundred times. Running time of parallel RAG construction method is far less than RAG construction on CPU platform, in which CMSuG method is than FSLIC method whether to take parallel idea. However, the merging running time of our CMSuG method is the highest while FSLIC is the lowest. In general, we can see that the performance of our image segmentation method with the CMSuG method is better than that without competitive mechanism in both CPU and GPU platforms.
Analysis of BSD
In Table 2, boundary recall and compactness of superpixels produced by CMSuG method outperform other superpixel generation methods. And under segmentation error of CMSuG method is lower than FSLIC algorithm with 0.002. Compared with the parallel FSLIC method in CO performance with 1.0442, superpixels with a higher compactness value of 1.0615 are obtained by the parallel CMSuG method. Although, the other performance of parallel CMSuG method is slightly lower than parallel FSLIC method, the time cost is the lowest. As illustrated Fig. 7, the segmentation result accompanied with obvious wrong segments happens in image segmentation with FSLIC method by the large area of fuselage gone. Thus, all performance gained by FSLIC method is the worst. CMSuG performs best in boundary precision and under segmentation error. The value of BR M is the highest of parallel FSLIC method, and the same measuring value along with the other two values of parallel CMSuG is slightly lower than it. We can see that the RAG-based image segmentation method with CMSuG algorithm obtains high-performance segmentation results in lower running time cost than FSLIC. Compared to image segmentation with parallel FSLIC, the running time of parallel CMSuG algorithm is slightly less than parallel FSLIC. Generally speaking, the image segmentation method by using competitive mechanism exceeds the non-competitive mechanism image segmentation method in CPU platform. In the GPU platform, the two parallel methods are about the same with a tiny difference. Our parallel RAG construction method is faster than the previous serial method by 700 times, which is beneficial for real-time image segmentation.
Performance metrics of superpixel segmentation algorithms at K=300 and merging threshold with τ = 40 on an BSD image
Performance metrics of superpixel segmentation algorithms at K=300 and merging threshold with τ = 40 on an BSD image
In the above two examples, the segmentation results with competitive mechanism achieve significantly superior results. Our CMSuG method generates superpixels by suppressing the growth of competitors. This technique helps all clusters group into robust associations rapidly. When the iterative number increases, two periods from all convergence processes are plotted. With a big change of error values, a picture does not show us overall changes significantly. Figure 8 shows the contrast of error values of the two superpixel generation techniques both on the remote sensing image of Fig. 6 and nature image of Fig. 7. We can see that errors decline with a far-fast time by the values of CMSuG being below that of FSLIC. Our CMSuG method reaches convergence with lower errors at the iterative time 10. For a remote sensing image, the error value is 25.71 obtained by CMSuG method while that is 48.17 by FSLIC. Error value is 69.31 gained with the CMSuG method while the other is 71.85 at the 10th iteration on an image from BSD.
So, the competitive mechanism technique is beneficial for fastening convergence in the superpixel generation method.
Quantitative analysis
Our image segmentation method consists of two steps: superpixel generation and post-merging with superpixels. The comparison of superpixel-based segmentation results is presented as certain line charts lie in Figs. 9 to 14 . Similar to qualitative analysis, we use P-CMSuG and P-FSLIC as simplified forms of parallel CMSuG and parallel FSLIC in all pictures.

Superpixels performance comparison of all methods in RSD.

Superpixel performance comparison of all methods in BSD500.

Time comparison of RAG construction based on superpixels generated with different superpixel generation methods, in which the graph in the left is obtained on RSD and the right is on BSD.

The post-merging results analysis with different superpixels in RSD.

The post-merging results analysis with different superpixels in BSD.

The whole time analysis with serial and parallel techniques in both RSD and BSD.
In this part, we analyzed 50 images from RSD with the above-mentioned metrics. The comparison of superpixels is presented as several line charts lie in Fig. 9. To verify the generalization of our algorithms, we selected 50 images from BSD500 and the performance is shown in Fig. 10.
It can be seen from Figs. 9 and 10 that the BR value of parallel CMSuG method is the highest in two dataset, and that of CMSuG is constantly higher than FSLIC in the whole sight. That is to say, superpixels generated by parallel CMSuG algorithm hit the boundary of ground truth with the maximum probability in the two data sets. And CMSuG also gains great probability of hitting the truth boundary by comparing the superpixel generation method without competitive mechanism. The same phenomenon is found in BP values, which is that BP values of parallel CMSuG is the highest and that of CMSuG is higher than FSLIC. The higher the BP values are, the more the positive boundaries. However, When the superpixel number grows to 700, BR and BP values are the lowest in RSD abnormally. An obvious conclusion is that competitive mechanism-based superpixel generation method obtains superpixels with perfect hitting possibility of boundaries and large number of positive boundary.
There is no significant difference of UE value between parallel CMSuG and parallel FSLIC. Under segmentation errors of parallel CMSuG is lower than CMSuG method in RSD while the opposite situation occurs in BSD. Whatever, FSLIC method obviously falls into a large under segmentation error with higher UE values. This indicates that competitive mechanism helps to achieve better superpixels.
The values of metric CO of the proposed method CMSuG are the highest, which shows that the CMSuG method is better than the FSLIC method for both remote sensing images and nature images. We can also see that CO values obtained by parallel CMSuG method is higher than parallel FSLIC algorithm on RSD. The higher the CO value, the better the superpixels. It means that, competitive mechanism-based superpixel generation algorithm obtains much more regular and compact superpixels than that without competitive mechanism.
Because of the computation of finding the biggest competitor, the CMSuG algorithm need a more expensive time cost than FSLIC algorithm. While in the parallel CMSuG algorithm, it is improved by GPU with parallel threads. Using parallel technique for CMSuG method achieves 55 times faster than serial CMSuG method in the two dataset. Coincidentally, compared with FSLIC, parallel FSLIC algorithm also gets speed up 50 times. And the time cost of the parallel CMSuG superpixel generation method is shorter than parallel FSLIC. These results are plotted in Figs. 9(e) and 10(e), in which line charts in the small box are enlarged displayed the running time of parallel algorithms. The above results demonstrated that our competitive mechanism-based superpixel generation algorithm in generating superpixels exceeds the-state-of-art algorithm FSLIC. For all the above analysis of superpixel generation methods, a notable conclusion is that our parallel CMSuG method could play an important role in real-time image processing.
Analysis of post-merging method
Our post-merging method consists of RAG construction and RAG-based merging with superpixels. The parallel RAG construction method is designed for decreasing time cost in the whole image segmentation. We make a comparison for the running time of serial and parallel RAG construction in Fig. 11. Each line represents the RAG construction time after obtaining superpixels in the two subgraphs. Serial construction is displayed at the top and parallel construction at the bottom. Superpixels generated with the same generation method are diagrammed with the same symbols.
It can be seen that parallel RAG construction time is significantly less than the serial algorithm by the maximum value of the ordinate. Time cost by running serial RAG construction method increases with the growth of superpixel numbers, while the time cost line oscillates at the same level by the parallel method. Parallel RAG construction method based on superpixels generated with parallel superpixel generation method takes 0.2 milliseconds more. Parallel RAG construction is faster than the serial method with at least 161 and 121 times on RSD and BSD respectively. No matter what scale for the two datasets, parallel RAG construction gains high-efficient improvement. So, in real-time image processing field, our parallel RAG construction can make image segmentation tasks more efficiently be completed.
For analysing the performance of the RAG-based merging method, we selected threshold τ = 20 in the following experiments. Post-merging results construction of RSD and BSD is manifested in Figs. 12 and 13. In the two displayed results, we can see that the BP value of all merging results based on parallel CMSuG algorithm is the best. That value obtained with CMSuG method, meantime, is higher than that with FSLIC except for an outlier at size of 700 in RSD. High boundary precious of segments based on CMSuG method catches our sights with its highest values in BSD. However, these values after parallel CMSuG method are at the lowest in BSD. In RSD, BP M of merging results with CMSuG method is higher than that of FSLIC except an outlier happens in 700. The results shown in 700 demonstrate that the segmentation results are related to the performance of superpixels. Under segmentation error of all segments after running merging algorithm with superpixels generated by using CMSuG method achieves the greatest success on both RSD and BSD data, which are shown in Figs. 12 and 13 clearly.
The running time of the graph-based merging algorithm is related to the number of superpixels, the more number the higher the time cost. The experiments show that merging time-based on all superpixel generation methods has no obvious difference. Actually, a large number of superpixels is unnecessary for image segmentation, which has great performance when set the superpixel number less than 400.
Analysis of whole segmentation method
We compare the segmentation results by our whole segmentation method and other three pixel-based segmentation methods, which contain MS [22] and FH [19] methods. For simplicity, we use CMIS as an image segmentation method with CMSuG and serial RAG construction. Similarly, the proposed image segmentation method with parallel CMSuG and parallel RAG techniques is simplified as P-CMIS. Tables 3 and 4 summary the performance of all segmentation methods, in which the BR*, BP*, UE* and TIME* represent boundary recall, boundary precision, under-segmentation error and running time with the whole segmentation methods respectively.
Segmentation results comparison on RSD set
Segmentation results comparison on RSD set
Segmentation results comparison on BSD set
Although the BR value obtained by MS algorithm is higher, the BP values are the lowest and the computation time is the highest in the two dataset. Our CMIS method can achive the best BP values, and the time is lower than that of MS and FH. As shown in the two tables, we can see that the values of the metrics TIME* of the proposed parallel method is the best. It demonstrates that our P-CMIS algorithm can quickly obtain the segmentation results. In addition, our CMIS method is faster than the MS and FH methods. In a word, both of our CMIS and P-CMIS methods have their own significant advantages. Our segmentation algorithm, not only parallel but serial methods, outperforms than the other two methods with its own advantages.
The running time of our two segmentation method has a huge difference. For that, we compared the total time of our competitive mechanism-based image segmentation method with parallel technique and without it. Figure 14 shows the total time of two image segmentation methods and its corresponding time assignment of each step running on two datasets with superpixel number 300.
A fact that can be seen in the figure is that the total time of CMIS is four times of P-CMIS when run on two different datasets. In the whole improved image segmentation, parallel technique is mainly used to generate superpixels and build RAG, and because of this, we achieve the high-efficient image segmentation method.
There are two important parameters in our CMIS method, controlling factor α and merging threshold τ. Parameter α is used to control the growth of rivals, which impacts on generating superpixels. Threshold τ controls the scale of all segments, which in turn affects the segmentation results. We analyze the changes of superpixel and image segments with the two parameters respectively.
Analysis of α
For CMSuG method, 0.1/0.3/0.5/0.7/0.9 were selected to product superpixels. We analyze the influence of α on the generation of superpixels based on CMSuG and parallel CMSuG method in two dataset RSD and BSD, which are shown in Figs. 15 16.

The performance comparisons of CMSuG and parallel CMSuG methods with different parameter α in RSD.

The performance comparisons of CMSuG and parallel CMSuG methods with different parameter α in BSD.

The post-merging results analysis with 200 superpixels in RSD is shown in first line. And the second line is the results with 400 superpixels.

The post-merging results analysis with 100 superpixels in BSD is shown in first line. And the second line is the results with 300 superpixels.
CMSuG algorithm fluctuates irregularly in boundary performance for RSD when α = 0.1 and α = 0.5. In BSD, the BR values increase with the development of α, the best performance is gained with α = 0.9. Good performance of BR values obtained by parallel CMSuG method are illustrated in both RSD and BSD when α = 0.9. As can be seen in Fig. 16, the highest BP value is also obtained with α = 0.9 by CMSuG method on BSD dataset. In Fig. 15, expect the outliers α = 0.1/0.5, CMSuG performs well with α = 0.9. Both superpixels of RSD and BSD, which are generated by parallel CMSuG method, have no difference in charts of BP values with different α.
Different UE values obtained by the CMSuG method are shown in Fig. 16. We can see that under segmentation errors are the lowest when α = 0.9. There is no obvious difference in all UE values generated by parallel CMSuG method whether in RSD or BSD. With α = 0.9, superpixels generated with CMSuG method are the most compact and regular.
As the parameter increases, the degree of restraint on competitors strengthens. Thus, superior superpixels happen to competitors with a low probability of becoming a cluster.
In order to evaluate the image segmentation results by different merging threshold τ selected as 20/40/60/80/100/120, we used the fixed variable of superpixel number as 200/400 in RSD and 100/300 in BSD.
After generating superpixels, image merging results are analysed in Figs. 17 18 . With increasing merging threshold, the performance of boundary recall values decreases on both RSD and BSD. BP value chart shows an upward trend followed by a downward trend. When merging threshold grows, under segmentation error increases significantly. Although, running time of merging step in CMIS increases with the growth of threshold τ. For merging time cost, the influence of threshold is less than superpixel number.
And an interesting phenomenon can be seen, merging operation with CMSuG superpixel generation method is always outstanding than FSLIC. Our experiments explain that the well-behaved merging result is CMSuG even with small numbers of objects. But selecting threshold with an adaptive method is necessary, which means that the same merging threshold is unsuitable for all number of superpixels.
Conclusion
Based on the FSLIC superpixel generation algorithm, we proposed a more robust superpixel generation method CMSuG by introducing competitive mechanism, which converges at a faster rate. The excellent performance of our method has been displayed by certain experiments compared to the state-of-the-art superpixel generation method in remote sensing dataset and nature dataset. However, good superpixels depend on superpixel scale, too fine pre-segmentation has a high number of superpixels. It will be more effective if a pending image has a suitable superpixel number by an adaptive method. Subsequently, outperforming segmentation results can be obtained by using a proposed superpixel-based post-merging algorithm with great superpixels. However, the merging result is affected by the threshold. Thus, adaptive threshold selection method could be of future interest.
To improve the speed of our segmentation method, a high-speed superpixel generation method, parallel CMSuG, was proposed based on GPU. And the superiority of its improvement greatly reduced the running time of CMSuG. Experiments in this paper showed the superiority of parallel CMSuG compared with parallel FSLIC. In addition, by using parallel techniques, a parallel RAG construction method was proposed. The two steps, parallel CMSuG and parallel RAG construction, speed up the overall segmentation algorithm fourfold. Because of the parallel technique, real-time image processing becomes true with high performance. In our parallel method, computer resources are not being fully utilized, and more improved resource assignments will be our future research.
Footnotes
Acknowledgment
The authors would like to thank the google company for the favor of remote sensing images. The paper is supported by the National Natural Science Foundation of China under Grant No. 62072135. The paper is also supported by the Postdoctoral Program of Heilongjiang Hengxun Technology Co., Ltd.
