Abstract
Concrete building structures are prone to cracking when subjected to environmental temperature changes, freeze-thaw cycles, and other operational and environmental factors. Failure to detect cracks in key structural members at an early stage can result in serious accidents and associated economic losses. This paper develops a new method that uses an SE-U-Net model within a conditional generative adversarial network (CGAN) to identify small cracks in concrete structures. The generative network is a pixel-level U-Net in which the original convolutional layers are combined with an attention mechanism and an SE module is added to the jump connections to improve the identifiability of the model. The discriminative network, a PatchGAN model, compares the generated images with real images. Adversarial training of the generator and discriminator improves the generator's performance on the crack image segmentation task, and the trained generative network is then used to segment cracks. For damage assessment, the crack skeleton is represented with a width of one pixel and recognized using the binary morphological crack skeleton method, from which the final length, area, and average width of the crack are determined through the geometric correction index. The results show that, compared with other methods, the proposed method better identifies subtle pixel-level cracks, with an identification accuracy of 98.48%. The method is of practical significance for crack identification and damage assessment of concrete structures.
Keywords
Introduction
Concrete building structures are affected by operational loads and the external environment, such as temperature changes and freeze-thaw cycles, and are prone to cracking. A crack can propagate gradually to form a wider or longer crack. When the crack width in a concrete structure, especially in a key structural member, exceeds 0.3 mm, the performance of the building can be seriously compromised (ACI, 1999). Consequently, regular crack detection for concrete structures is very important for assuring the performance, service life, and aesthetics of buildings.
Local non-destructive techniques such as pulsed laser scanning and infrared thermography have been developed for detecting cracks in concrete (Sabeenian et al., 2021). Pulsed laser scanning generates ultrasonic waves, and the propagating waves are received by a fixed transducer to detect defects in concrete members. Infrared thermography measures infrared radiation to construct a digital image for defect detection (Hao et al., 2022; Moghnieh et al., 2022). Both methods, however, suffer from unclear results and high costs. An alternative, computer vision, has gained interest for its cost-effectiveness in detecting surface cracks on concrete structures (Xu et al., 2018, 2019). Early crack detection methods based on image processing mainly use threshold recognition algorithms (Chen et al., 2019; Hoang, 2018a) and backpropagation (BP) neural networks (Li et al., 2014). Typical threshold recognition algorithms include Prewitt’s algorithm (Lei et al., 2018) and Canny’s algorithm (Hoang and Nguyen, 2018b); these algorithms are easily affected by background clutter, resulting in incomplete crack edge profiles. BP neural networks improve accuracy, but the difficulty of parameter adjustment leads to over- or under-identification (Liu et al., 2017). A probabilistic analysis framework that combines a dynamic Bayesian network with fracture mechanics has been used to predict the fatigue crack propagation of multiple fatigue details in long-span bridges (Xu et al., 2022). Automatic crack recognition using the Mask R-CNN network (Li et al., 2021; Wu et al., 2021b) is also commonly used. In all of the above methods, however, the continuity between crack pixels and background pixels is ignored for fine cracks, and detailed edge information is lost.
In recent years, with the rapid development of computer hardware, many deep learning algorithms for crack detection have been proposed. A convolutional neural network has been used to extract image features for the automatic detection of pavement cracks from smartphone images (Zhang et al., 2016). A restricted Boltzmann machine has been used to detect cracks (Li et al., 2018), and a fused convolutional neural network has detected distributed cracks in the steel girders of real bridges with great success (Li et al., 2019). The Dual Attention Network adaptively integrates local features with their global dependencies to obtain accurate segmentation results (Fu et al., 2019). Nevertheless, fine crack detection in concrete structures remains a major challenge because of incomplete edge contours, over- and under-recognition, loss of edge details, and low recognition accuracy.
Generative adversarial networks (GANs) have made remarkable progress in image generation for object recognition (Chen et al., 2021; Liu and Wu, 2021; Zhang et al., 2019) and in image enhancement (Gao et al., 2019; Zhang et al., 2020), effectively addressing the interdependence between cracks and background in crack detection and improving the accuracy of identification results. Image generation focuses on producing high-quality images with a GAN for model training or further analysis, whereas image enhancement uses a GAN to optimize existing semantic segmentation models and achieve more accurate crack segmentation by improving image quality. Since the existing crack dataset is sufficient to support this study, this paper focuses on optimizing the identification process through image enhancement techniques rather than on generating new training data. However, existing GAN-optimized semantic segmentation models lose detail when dealing with cracks that have incomplete edges; this problem is addressed in this paper.
Conventional GAN models, however, are very unstable to train because of the inherent adversarial relationship between the generative and discriminative networks. To better balance these two networks, a conditional GAN (CGAN) is constructed by adding conditional information to both the generative and discriminative networks (Mirza and Osindero, 2019). As a GAN-based learning method, the CGAN uses an adversarial game to continuously optimize the generative and discriminative networks so that the distribution of the generated images becomes increasingly similar to that of the real images, which enhances pixel-to-pixel continuity and thus improves recognition accuracy. Adversarial training methods have been proposed for training semantic recognition models to improve recognition accuracy (Yildiz et al., 2021). Traditional error functions, such as the mean squared error (MSE), are highly sensitive to outliers. If the sample contains outliers, the MSE assigns them higher weights, which sacrifices the prediction quality for the normal data points and eventually reduces the overall model performance. PatchGAN models perform binary discrimination on the truth or falsity of the input data (Gao et al., 2019; Huo et al., 2018). Unlike other discriminant functions that output a single value, PatchGAN outputs an N × N matrix X. Zhang et al. (2021) used PatchGAN discriminant networks in pix2pix networks, not only improving the speed of image computation but also ensuring the model focused on image details to meet the requirements of high-resolution and high-definition details.
Compared with object detection, pixel segmentation can identify and locate cracks more accurately because cracks occupy a relatively small proportion of pixels. To obtain quantitative information such as crack length and area, semantic segmentation algorithms have been used to detect cracks (Xu et al., 2023a), achieving significant improvement in the damage identification of complex structures (Xu et al., 2023b). The U-Net network is an improved fully convolutional neural network model (Shelhamer et al., 2015) that has been applied to improve image recognition performance in fields such as water body recognition for road detection (Zhang et al., 2018) and iris segmentation (Lian et al., 2018). It reduces information loss by mapping the features of the encoding layers to the decoding layers and transferring a large amount of context information to the high-resolution layers, so that crack edges can be identified accurately. A self-attentive adaptive (SAA) neuron has been added to the U-Net network to enable microcrack segmentation of real steel box girder bridges (Zhao et al., 2022). However, the edge detail information of fine cracks is still not detected. To address the loss of crack-edge details during crack recognition, the squeeze-and-excitation (SE) module incorporates an attention mechanism into the U-Net architecture to identify image features (Chen et al., 2020; Yang et al., 2021; Zhu et al., 2021). It assigns a weight to each feature channel based on the value of the feature image, enhancing crack features and suppressing irrelevant features such as the background. As a result, crack-feature recognition is strengthened, the edge detail information of cracks is retained, and the recognition accuracy is improved.
The CGAN can satisfy the requirements of pixel continuity, and the attention mechanism based on the SE module introduced in the U-Net can ensure the maintenance of crack-edge detail information in crack image recognition.
In this study, an SE-U-Net recognition model based on the CGAN has been developed to identify pixel-level fine cracks in concrete structures. The issues of incomplete edge contours, loss of edge details, and low recognition accuracy in fine crack detection are studied. Unlike traditional crack detection methods based on image features, the proposed method uses the CGAN as the basic architecture, combined with an improved U-Net pixel-level image recognition method. The improved U-Net model for pixel-level recognition is used as the generative network, and the PatchGAN model is used as the discriminative network. The generative network is trained against the discriminative network to continuously improve the image recognition accuracy and to solve the problems of incomplete edge contours and loss of edge details in crack-recognition images. To quantify the cracks, their morphological characteristics, including crack area, length, and width, are identified and calculated using the binary morphological skeleton recognition method. The results show that the proposed method could locate crack widths down to 0.1 mm and provide valuable insights for the health assessment of concrete and wooden structures.
Theory: Crack identification based on CGAN SE-U-net model
Conditional generative adversarial networks
Pixel classification using the baseline model can improve the accuracy of pixel recognition. However, the continuity between crack pixels and background pixels is easily ignored, so that crack-edge details are lost or the recognized cracks differ considerably in size and shape from the ground truth. A GAN can better solve such problems, and its structure is shown in Figure 1. However, the inherent adversarial relationship between the generative network (G) and the discriminative network (D) in a GAN makes the generated results unstable. To better balance the two networks, a CGAN can be constructed by adding conditional information to both the generative and discriminative networks. The generative network then no longer produces its output in a completely free and unsupervised way; instead, it generates a result that matches the specified conditions, which prevents it from identifying cracks incorrectly. The CGAN effectively alleviates the convergence difficulty of the GAN without significantly increasing the training burden.
Figure 1. The CGAN structure.
The objective function of the CGAN can be expressed as follows:
To enhance the generative power of the generator, stabilize the training of the generator, and ensure the integrity of the crack-edge segmentation, a distance function L1 is added to the objective function as follows:
The new objective function can be obtained by combining the original objective function with the distance function L1 as follows:
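The three expressions referred to above are not reproduced in this text. As a hedged reconstruction only, the standard conditional-GAN objective with an added L1 term, as commonly used in pix2pix-style image-to-image models, can be written as below. The symbols x (input crack image), y (labeled image), z (noise input), and the weighting factor λ are assumptions of this sketch and are not notation confirmed by the source.

```latex
\begin{align}
L_{\mathrm{CGAN}}(G, D) &= \mathbb{E}_{x,y}\big[\log D(x, y)\big]
  + \mathbb{E}_{x,z}\big[\log\big(1 - D(x, G(x, z))\big)\big] \\
L_{L1}(G) &= \mathbb{E}_{x,y,z}\big[\lVert y - G(x, z) \rVert_{1}\big] \\
G^{*} &= \arg\min_{G}\max_{D}\; L_{\mathrm{CGAN}}(G, D) + \lambda\, L_{L1}(G)
\end{align}
```

In this form the discriminator maximizes the first expression, while the generator minimizes both the adversarial term and its L1 distance to the labeled image.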
Generative networks: improving the U-net model
The U-Net network is a semantic segmentation convolutional neural network comprising three parts: an encoder, a decoder, and skip connections (Olaf et al., 2015), as shown in Figure 2. In contrast to a general convolutional neural network, the U-Net network adds upsampling while performing deconvolution to obtain a pixel-level segmented image. It requires less training data and is suitable for the accurate segmentation of small targets.
Figure 2. The U-Net network model.
In this study, a pixel-level U-Net segmentation network was used as the generator, and the convolutional layer was replaced with an SE module to improve the model segmentation effect. The SE module was also added to the jump connection to enhance the model’s ability to extract crack details, and a dropout layer was added to prevent the network from overfitting. In the construction of the GAN generative and discriminative networks, this study adopted a pixel-level U-Net structure as the generative network for the pixel-level recognition of crack edges.
To further address problems such as the loss of crack-edge details during crack recognition, this study used the SE structure to recognize image features. It weights each feature channel based on the value of the feature image, increasing the weight of crack features and reducing the weight of irrelevant features such as the background, to improve crack-feature recognition. Consequently, in this study, the original convolutional structure was replaced by a convolution followed by an SE structure. The combined SE module is shown in Figure 3.
Figure 3. The SE module.
The SE module uses two convolutions—that is, the first adjusts the number of channels and the resolution of the input feature image to the desired output size and the second is used to perform the SE operation.
First, the feature images are globally pooled and passed through a fully connected network with a bottleneck structure, and a sigmoid function is applied to the output of each neuron to obtain weights in the range (0, 1). The number of neurons in the bottleneck is determined by the hyperparameter r, the compression ratio. Because the SE module contains a convolutional layer followed by the SE structure, replacing a single convolutional layer with the SE module further improves the model’s recognition ability.
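The squeeze, excitation, and scaling steps described above can be illustrated with a small framework-free sketch. It is not the authors' implementation: the function name `se_block`, the weight matrices `w1` and `w2`, and the omission of bias terms are assumptions made for illustration, and a real model would use the layers of a deep learning framework.

```python
import math

def se_block(feature_maps, w1, w2):
    """Reweight the channels of `feature_maps` (a list of C maps, each H x W).

    w1: (C // r) x C weights of the bottleneck layer (hypothetical values).
    w2: C x (C // r) weights of the expansion layer (hypothetical values).
    Bias terms are omitted to keep the sketch short.
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    squeezed = [sum(map(sum, fm)) / (len(fm) * len(fm[0])) for fm in feature_maps]
    # Excitation, step 1: bottleneck fully connected layer with ReLU.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1]
    # Excitation, step 2: expand back to C channels; sigmoid keeps weights in (0, 1).
    weights = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
               for row in w2]
    # Scale: multiply every pixel of a channel by that channel's weight.
    scaled = [[[v * wt for v in row] for row in fm]
              for fm, wt in zip(feature_maps, weights)]
    return scaled, weights

# Two 2 x 2 channels with compression ratio r = 2, so the bottleneck has one neuron.
maps = [[[1.0, 1.0], [1.0, 1.0]], [[0.0, 0.0], [0.0, 0.0]]]
scaled, weights = se_block(maps, w1=[[1.0, 0.0]], w2=[[2.0], [-2.0]])
```

With these illustrative weights the first channel receives a larger weight than the second, which is the channel-wise emphasis the SE structure is meant to provide.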
Second, to satisfy the need for multilevel feature learning in recognition tasks, the SE attention module is added to the jump connections of the U-Net structure to learn the multiscale features of the encoder and obtain the importance of each feature channel. Crack features are boosted according to this importance, and the feature-recognition weights are distributed to all stages of the contraction path. This removes the U-Net’s reliance on fixed weights during image feature recognition, which enhances feature recognition and improves the recognition accuracy of the model.
The U-Net model with the SE module added is shown in Figure 4. The encoder comprises SE modules with downsampling, whose main function is to recognize features and expand the receptive field, and the decoder comprises SE modules with upsampling, whose main function is to recognize features and increase the resolution. A jump connection between the encoder and the decoder splices the shallow features in the encoder with the deep features in the decoder, providing detailed information to the decoder and enabling it to obtain finer recognition boundaries.
Figure 4. The U-Net model with the SE module added.
Discriminative networks: PatchGAN
When selecting a discriminative network, different construction forms can be chosen for different data types. In imaging, convolutional networks are often used to discriminate between input data sources. To meet the demands of adversarial networks, focus on image details, and preserve high-resolution, high-definition details, the PatchGAN model was used as the discriminative network. The output image of the generative network and the labeled image are used as separate inputs, and each image is convolved using a convolution kernel adapted to the input image to reduce the image area and identify the image features. A fully connected convolution was used to find the discriminative network values corresponding to the image recognized by the generative network and to the labeled image. The discriminative network loss can then be calculated using cross-entropy, with the Ranger method used for network learning optimization.
The U-Net was used as the generative network. After the image recognition data of the generative network were obtained, the PatchGAN multilayer convolutional network was used as the discriminative network model to maintain the high resolution and detail of the image. The recognition image output by the generative network and the real labeled image are used as inputs, and each image is convolved using a convolution kernel adapted to the input image to reduce the image area and identify the image features. Each pixel of the real image and the generated image is then scored. Finally, all the pixel scores are averaged to determine the authenticity of the image. Fully connected convolution is used to determine the discriminative network values corresponding to the image recognized by the generative network and to the labeled image. In this study, a 3 × 3 convolutional kernel with a stride of two was selected for the four-layer convolutional network; the specific discriminative network structure is shown in Figure 5.
Figure 5. The discriminator structure model.
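As a rough illustration of how four stride-2, 3 × 3 convolutions shrink the image and how the resulting scores are averaged, the sketch below computes the size of the score map and the mean score. The padding of one pixel, the 512-pixel input size, and the example scores are assumptions of this sketch; the source does not state the padding used.

```python
def conv_output_size(size, kernel=3, stride=2, padding=1):
    """Spatial size after one convolution (standard floor formula)."""
    return (size + 2 * padding - kernel) // stride + 1

def patch_grid_size(size, layers=4):
    """Side length of the score map after `layers` stride-2, 3 x 3 convolutions."""
    for _ in range(layers):
        size = conv_output_size(size)
    return size

def image_score(patch_scores):
    """Average the per-patch scores into a single authenticity score."""
    flat = [s for row in patch_scores for s in row]
    return sum(flat) / len(flat)

grid = patch_grid_size(512)  # 512 -> 256 -> 128 -> 64 -> 32
score = image_score([[0.9, 0.7], [0.8, 0.6]])  # hypothetical 2 x 2 score map
```

Under these assumptions each entry of the 32 × 32 score map judges one region of the input, and the mean of the map serves as the image-level decision.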
Although the number of parameters can be reduced by decreasing the number of convolutional kernels in the network, the number of feature maps generated per layer also decreases, which reduces the expressiveness of the network. In this study, we attempted to replace the stochastic gradient descent (SGD) algorithm with the Ranger optimizer. Stochastic gradient descent is one of the most widely used optimization algorithms in current neural networks, but its training speed can be slow, and the model can fall into local optimal solutions. For concrete crack-recognition applications, it is desirable for the model to exhibit better generalization ability and faster training speed.
During training, the Ranger optimizer updates the parameters during each iteration round based on the BP algorithm. Owing to the slender contour characteristics of cracks, the initial learning rate was 10 in this study. Moreover, the smaller the loss, the smaller the deviation between the identification result and the true value. If the loss value is large or changes considerably, the parameters—that is, the initial learning rate, iteration rounds, and network depth—must be finely adjusted. The Ranger optimizer was used to dynamically adjust the learning rate during training so that the model could achieve a higher convergence rate.
The overall composition of the crack image segmentation network
During adversarial learning, the loss functions of both the discriminator and the generator are calculated. The discriminator becomes better at judging the segmentation results of generated images as false, while continuous optimization of the generator makes its output fit the labeled image so that the discriminator judges it as true. The two networks are played off against each other throughout training, and the adversarial updates of the discriminator and generator serve as the update of the entire network, improving the segmentation accuracy of the generator. At the end of the training process, accurately segmented images are obtained from the generator. The overall framework of the model is shown in Figure 6.
Figure 6. The overall model framework.
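The two competing losses in this adversarial game can be sketched numerically as follows. This is an illustrative sketch, not the authors' training code: the function names, the example scores, and the L1 weight `lam=100.0` (a value commonly used in pix2pix-style models) are assumptions.

```python
import math

def bce(prob, target, eps=1e-7):
    """Binary cross-entropy for one predicted probability."""
    prob = min(max(prob, eps), 1.0 - eps)
    return -(target * math.log(prob) + (1 - target) * math.log(1.0 - prob))

def mean(values):
    return sum(values) / len(values)

def discriminator_loss(real_scores, fake_scores):
    """Push scores for labeled images toward 1 and for generated images toward 0."""
    return (mean([bce(p, 1) for p in real_scores])
            + mean([bce(p, 0) for p in fake_scores]))

def generator_loss(fake_scores, generated, label, lam=100.0):
    """Adversarial term (fool the discriminator) plus lam times the mean L1 distance."""
    adversarial = mean([bce(p, 1) for p in fake_scores])
    l1 = mean([abs(g - t) for g, t in zip(generated, label)])
    return adversarial + lam * l1

# Hypothetical patch scores and a flattened 4-pixel segmentation.
d_loss = discriminator_loss(real_scores=[0.9, 0.8], fake_scores=[0.2, 0.1])
g_loss = generator_loss(fake_scores=[0.2, 0.1],
                        generated=[0.0, 1.0, 1.0, 0.0],
                        label=[0.0, 1.0, 1.0, 1.0])
```

In an actual training loop the two losses would be minimized alternately, each with respect to its own network's parameters.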
Methods: Project examples
Crack database establishment and testing environment
In the field of image processing, it is critical to acquire and create high-quality images. CCD technology, a semiconductor imaging sensor, occupies an important position in solid-state arrays with a light-sensitive substrate comprising discrete silicon elements that can effectively capture subtle changes in the environment by injecting, transmitting, and detecting charges to provide accurate image data. CCD technology offers better performance, higher sensitivity, less image distortion, and faster response time, offering considerable advantages in crack detection. In this study, we considered this to be the best image-acquisition device.
To effectively suppress external interference, we treated the light source as an important part of image acquisition and carefully studied its indicators, including brightness, spatial distribution, spectral distribution, service life, and contrast. For example, light that is too dim reduces the image contrast and is more likely to cause noise. Consequently, continuous illumination is necessary to ensure image accuracy, and the light must be selected to suit the surroundings so that all information is captured and displayed. Based on these selection indicators, as well as the characteristics of the concrete and precast components, we chose a light-emitting diode (LED) as the light source for image acquisition. LED light sources exhibit excellent brightness, efficient luminescence, easily adjustable brightness, long life, and rapid response. Moreover, the light source comprises many LEDs, thus meeting the needs of a variety of illumination angles.
Before capturing images of cracks in precast concrete, it is important to ensure that the distance between the camera and the member is reasonable; this distance depends on the resolution of the camera and the required crack-detection accuracy. When cracks of size 0.2–0.3 mm appear in precast concrete members, the measurement resolution must be finer than 0.2 mm to ensure their proper use. For this purpose, a pixel resolution of 1600 × 1200 was used to image an actual area of 160 × 120 mm on the precast member, so that each pixel corresponds to 0.1 mm.
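The relationship between field of view, pixel resolution, and measurable crack width can be checked with a short calculation. The helper names below are hypothetical; the numbers are those given in the text.

```python
def mm_per_pixel(field_mm, pixels):
    """Physical size covered by one pixel along one image axis."""
    return field_mm / pixels

def width_in_pixels(crack_width_mm, scale_mm_per_px):
    """Number of pixels spanned by a crack of the given width."""
    return crack_width_mm / scale_mm_per_px

scale_x = mm_per_pixel(160.0, 1600)  # horizontal scale in mm per pixel
scale_y = mm_per_pixel(120.0, 1200)  # vertical scale in mm per pixel
px_for_02mm = width_in_pixels(0.2, scale_x)
```

Both axes give 0.1 mm per pixel, so a 0.2 mm crack spans about two pixels at this camera distance.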
As shown in Figure 7(a), the position of the camera can be determined through the following steps: (1) Draw a 160 × 120 mm rectangle on a blank sheet of paper and attach the sheet to the wall. (2) Mount the image-acquisition device horizontally and adjust its head so that it remains perpendicular to the sheet. (3) Adjust the position of the image-acquisition device until the rectangle on the sheet is fully captured. (4) Measure the distance between the paper and the camera several times and take the average value.
Figure 7. Crack image acquisition and processing. (a) Image-acquisition distance diagram, (b) Precast concrete members, (c) Equalization transformation of the crack image histogram.

Crack images of precast concrete were obtained from an engineering project in Tianjin, China. An industrial light source was used for illumination to maintain a uniform light intensity in the camera’s field of view, and a CCD camera with 20 million pixels was used to capture 5000 raw images of cracks in the precast elements at a resolution of 4000 × 3000. Prefabricated elements included columns, beams, slabs, stairs, and balconies. Figure 7(b) shows the precast concrete members.
To obtain a large number of crack images that met the input requirements of the model, the original images were cropped to obtain 5000 images of resolution 512 × 512. Each image contained a sample of 262,144 pixels, the entire database containing a sample of 1,310,720,000 pixels. The crack images were edge-labeled with a two-pixel width using the LabelMe software, black indicating the background and red indicating the crack—that is, (0 0 0) indicating a background pixel, and (255 0 0) indicating a crack pixel.
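The label encoding and the pixel counts above can be reproduced with a few lines. This is a minimal sketch: `rgb_label_to_mask` is a hypothetical helper that assumes the label is available as rows of (R, G, B) tuples, which is not necessarily how the LabelMe output was processed in the study.

```python
CRACK_RGB = (255, 0, 0)  # red marks a crack pixel; black (0, 0, 0) is background

def rgb_label_to_mask(label_image):
    """Convert an RGB label (rows of (R, G, B) tuples) into a 0/1 crack mask."""
    return [[1 if tuple(px) == CRACK_RGB else 0 for px in row]
            for row in label_image]

label = [[(0, 0, 0), (255, 0, 0)],
         [(255, 0, 0), (0, 0, 0)]]
mask = rgb_label_to_mask(label)

pixels_per_image = 512 * 512
pixels_in_database = 5000 * pixels_per_image
```

The two products agree with the 262,144 pixels per image and 1,310,720,000 pixels in the database stated in the text.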
To train and validate the performance of the model, the 5000 images were randomly shuffled and split into a training dataset of 4000 images (80%), a validation dataset of 500 images (10%), and a test dataset of the remaining 500 images (10%).
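A random 80/10/10 split of this kind can be sketched as follows. The function name `split_dataset` and the fixed seed are assumptions added for reproducibility of the example; the source does not describe how the shuffling was implemented.

```python
import random

def split_dataset(items, train=0.8, val=0.1, seed=0):
    """Shuffle the items and split them into training, validation, and test sets."""
    items = list(items)
    random.Random(seed).shuffle(items)  # seeded only so the example is repeatable
    n_train = round(len(items) * train)
    n_val = round(len(items) * val)
    return (items[:n_train],
            items[n_train:n_train + n_val],
            items[n_train + n_val:])

# Image indices 0..4999 stand in for the 5000 cropped crack images.
train_set, val_set, test_set = split_dataset(range(5000))
```

Every image lands in exactly one of the three subsets, so the test images are never seen during training or validation.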
Because images of cracks in concrete structures are easily disturbed by external light or electromagnetic waves, the quality of the crack images was degraded, with unclear edges and insufficient contrast. Consequently, these problems must first be solved by preprocessing to improve the accuracy of the subsequent recognition. The crack image preprocessing operations include grayscale conversion, contrast enhancement, and multi-structure morphological filtering and denoising, as shown in Figure 7(c).
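Two of the preprocessing operations, grayscale conversion and contrast enhancement by histogram equalization, can be sketched without any image library. The luminance weights (the common ITU-R BT.601 values) and the function names are assumptions; the morphological filtering step is not shown because its structuring elements are not specified in the text.

```python
def to_gray(rgb_image):
    """Luminance grayscale using the common ITU-R BT.601 weights (an assumption)."""
    return [[int(round(0.299 * r + 0.587 * g + 0.114 * b)) for r, g, b in row]
            for row in rgb_image]

def equalize_histogram(gray, levels=256):
    """Histogram equalization of an 8-bit grayscale image given as a list of rows."""
    flat = [v for row in gray for v in row]
    hist = [0] * levels
    for v in flat:
        hist[v] += 1
    # Cumulative distribution of the gray levels.
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)
    cdf_min = next(c for c in cdf if c > 0)
    total = len(flat)
    if total == cdf_min:  # constant image: there is nothing to stretch
        return [row[:] for row in gray]
    # Map each gray level so that the occupied levels spread over the full range.
    lut = [round(max(c - cdf_min, 0) * (levels - 1) / (total - cdf_min)) for c in cdf]
    return [[lut[v] for v in row] for row in gray]

low_contrast = [[100, 101], [102, 103]]
equalized = equalize_histogram(low_contrast)
```

In this toy example four nearly identical gray levels are spread across the whole 0 to 255 range, which is the contrast stretch that makes faint crack edges easier to separate from the background.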
The test environment for this experiment was built using TensorFlow on Windows with an Intel(R) Core(TM) i7-3770 CPU @ 3.40 GHz. Training, validation, and testing were executed on HP workstations configured with 8 GB Quadro P4000/PCIe/SSE2 GPUs. A virtual environment for the SE-U-Net network was created using Anaconda. CUDA 10.0 and cuDNN 7.4.1.5 were used to accelerate the training of the network.
Evaluation indicators
To quantitatively analyze the segmentation results of the model and evaluate the goodness of the crack-segmentation model for precast concrete components, the most commonly used evaluation metrics (Wu et al., 2021a), such as the pixel accuracy (PA), dice similarity coefficient (Dice), intersection over Union (IoU), and recall rate (Recall), for assessing the semantic segmentation effect were selected in this study.
Description of the four parameters:
• True positive (TP): the actual class is positive and the predicted result is positive.
• False positive (FP): the actual class is negative and the predicted result is positive.
• True negative (TN): the actual class is negative and the predicted result is negative.
• False negative (FN): the actual class is positive and the predicted result is negative.
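The metric formulas themselves are not reproduced in this text. The sketch below uses the standard definitions of PA, Dice, IoU, and Recall in terms of the four counts, which are assumed to match those used in the study; the function names and the example vectors are hypothetical.

```python
def confusion_counts(pred, truth):
    """Count TP, FP, TN, FN for flattened 0/1 prediction and label masks."""
    tp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 0)
    tn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 0)
    fn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 1)
    return tp, fp, tn, fn

def segmentation_metrics(tp, fp, tn, fn):
    """Standard pixel-level definitions of PA, Dice, IoU, and Recall."""
    return {
        "PA": (tp + tn) / (tp + fp + tn + fn),
        "Dice": 2 * tp / (2 * tp + fp + fn),
        "IoU": tp / (tp + fp + fn),
        "Recall": tp / (tp + fn),
    }

pred = [1, 1, 0, 0, 1, 0, 0, 0]   # hypothetical predicted crack mask
truth = [1, 0, 0, 0, 1, 1, 0, 0]  # hypothetical labeled crack mask
counts = confusion_counts(pred, truth)
metrics = segmentation_metrics(*counts)
```

Because crack pixels are a small share of each image, PA is dominated by the background, which is why Dice, IoU, and Recall are reported alongside it.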
Results and discussion
Model training
Training was performed with a batch size of one image and a maximum of 50 epochs, using a learning rate of 0.0005 for the entire network. A dropout rate of 0.4 was used, and the network was trained using the Ranger optimizer. Network training was initiated after completing the hyperparameter configuration. There were 4000 images in the training dataset, and one image was used as the input in each training iteration. During each iteration, the training accuracy and loss of the network were calculated. After each epoch, 30 images were randomly selected from the validation dataset to verify the performance of the network, and the validation accuracy and validation loss were calculated. Figure 8(a) and Figure 8(b) show the accuracy and loss curves, respectively, for the training and validation of the SE-U-Net. The red solid line represents the loss curve for the training dataset, and the blue solid line represents the loss curve for the validation dataset. As training proceeds, the errors on the training and validation sets decrease and the segmentation accuracy improves, with the training loss curve closely following the validation loss curve. After a sufficient number of epochs, neither the accuracy nor the loss of the model changes significantly, and the proposed model reaches convergence.
Figure 8. The experimental results of the SE-U-Net network. (a) Accuracy versus epoch during training, (b) Loss versus epoch during training, (c) The accuracy of the SE-U-Net network for the test images.
After network training and verification, we viewed the model segmentation PA on the test dataset to evaluate the network performance. Figure 8(c) shows the evaluation results of the SE-U-Net network on the test dataset with a learning rate of
Crack quantification calculation
To quantify cracks, their lengths and widths must be estimated. A simple approach approximates the crack length by the diagonal of the crack's bounding rectangle, which works well for regular or straight cracks. However, for curved cracks with complex shapes this approximation introduces errors and prevents accurate results from being obtained. Consequently, a new method was proposed in this study to estimate the crack length L: after the crack skeleton was extracted, it was represented with a width of one pixel, and its final length was determined by calculating the geometric correction index, expressed as follows:
Once the geometrically corrected length (L) of the crack skeleton has been calculated, the average width (W) of the crack can be determined as the ratio of the estimated crack pixel area to the length (equation (10)).
When a single pixel width is used to represent the crack skeleton, the length of the crack can be obtained by summing the skeleton pixels, the crack area being the sum of all crack pixels and the average width being obtained by the ratio of area to length.
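The pixel-counting rule just described can be sketched as follows. The function assumes that the binary crack mask and its one-pixel-wide skeleton are already available, for example from a morphological thinning routine. The geometric correction index is not included because its expression is not reproduced in this text, and `mm_per_px` is a hypothetical scale parameter.

```python
def quantify_crack(mask, skeleton, mm_per_px=1.0):
    """Estimate crack area, length, and mean width from 0/1 masks.

    mask:      rows of 0/1 values, 1 marking a crack pixel.
    skeleton:  rows of 0/1 values, 1 marking a one-pixel-wide skeleton pixel.
    mm_per_px: physical size of one pixel (hypothetical scale parameter).
    """
    area_px = sum(map(sum, mask))        # area = number of crack pixels
    length_px = sum(map(sum, skeleton))  # length = number of skeleton pixels
    if length_px == 0:
        return {"area": 0.0, "length": 0.0, "mean_width": 0.0}
    area = area_px * mm_per_px ** 2
    length = length_px * mm_per_px
    return {"area": area, "length": length, "mean_width": area / length}

# A straight crack 5 pixels long and 3 pixels wide, with its skeleton in the middle row.
mask = [[1, 1, 1, 1, 1],
        [1, 1, 1, 1, 1],
        [1, 1, 1, 1, 1]]
skeleton = [[0, 0, 0, 0, 0],
            [1, 1, 1, 1, 1],
            [0, 0, 0, 0, 0]]
result = quantify_crack(mask, skeleton, mm_per_px=0.1)
```

For this straight example the mean width comes out at three pixels, or 0.3 mm at 0.1 mm per pixel; for diagonal or curved skeletons, plain pixel counting misestimates the length, which is the kind of error a geometric correction is meant to reduce.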
Table 1. Test set crack prediction results and quantitative analysis.
Table 1 lists the cracks predicted and quantified in the test dataset. We used the recognition model to segment the cracks and compute their skeletons. We then counted the number of pixels in each skeleton to derive the crack lengths. Finally, we summed all crack pixels to obtain the areas and divided each area by the corresponding length to derive the average widths. The quantitative analysis shows that the predicted lengths, areas, and average widths of the cracks are close to the true values of the labels.
Figure 9 shows the crack prediction metrics for a sample of 100 test dataset images, including the area, length, and width of the cracks, together with the corresponding labeled dimensions. Except for a few anomalous predictions, most of the data points lie close to the line y = x, indicating that the predictions do not differ much from the labeled values. Based on the statistics shown in Figure 9(a) and
Figure 9. Statistical results of crack geometry on the test dataset: (a) area, (b) length, (c) average width.
Comparative study
Table 2. The recognition results and accuracy of different models.
Table 2 lists the detection results of the five methods in detecting the six cracks. DeepLabV3 + captures multi-scale information by increasing the receptive field, but the recognition rate of small cracks is still low. U-Net++ enhances the network depth and width but has limitations in segmenting the details of small areas, especially when the cracks are similar to the surrounding texture. The Mask-Rcnn model in crack image processing is prone to over- and under-segmentation, which may be due to its shallower network depth. The sampling module used in the surface study is difficult to extract depth features with limited crack data, which is not conducive to feature localization. The U-net used alone detected cracks and represented them completely, but also suffered from unsegmented fine cracks and some pixel point errors. Despite the overall accuracy of these methods, they are still prone to over-segmentation, under-segmentation and loss of edge details on small cracks.
By contrast, the greater depth and the encoding-decoding structure of the GAN-based U-Net model proposed in this study allow it to extract richer semantic features. In addition, the adversarial game between the generator and the discriminator during training improves the segmentation accuracy after repeated training. The proposed method improves the PA by 5.9%, recovers the edge information of the object, enhances the continuity between pixels, and ensures segmentation accuracy. Although U-Net++ detected the fine edge cracks in Case 6 as well as the proposed method did, it could not detect the fine edge cracks in Case 4. That is, the model proposed in this paper is able to accurately identify cracks up to 0.3 mm, improving the accuracy on small cracks without significantly degrading the results on cracks of normal width. The Dice, IoU, Recall, and PA of the five models are shown in Table 2.
Based on the data summarized in Table 2, the proposed GAN-based U-Net segmentation method is more accurate than the other methods. The algorithm proposed in this study achieved 97.08%, 94.45%, 97.32%, and 98.48% for the Dice, IoU, Recall, and PA, respectively. The accuracy, completeness, and similarity of the model predictions with respect to the real labels all improved.
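For reference, the four metrics can be computed from a predicted mask and its label as in the following sketch. It uses the common confusion-matrix definitions of Dice, IoU, Recall, and pixel accuracy (PA) for binary segmentation, which are assumed here and may differ in detail from the formulas used in this study. The function name `segmentation_metrics` and the toy arrays are hypothetical.

```python
import numpy as np

def segmentation_metrics(pred, label):
    """Dice, IoU, Recall and PA for a pair of binary masks (crack = nonzero)."""
    pred = np.asarray(pred) > 0
    label = np.asarray(label) > 0

    tp = np.sum(pred & label)    # crack pixels correctly predicted
    fp = np.sum(pred & ~label)   # background predicted as crack
    fn = np.sum(~pred & label)   # crack pixels that were missed
    tn = np.sum(~pred & ~label)  # background correctly predicted

    def ratio(num, den):
        # Return 0.0 for an empty denominator instead of dividing by zero.
        return float(num) / float(den) if den > 0 else 0.0

    return {
        "Dice": ratio(2 * tp, 2 * tp + fp + fn),
        "IoU": ratio(tp, tp + fp + fn),
        "Recall": ratio(tp, tp + fn),
        "PA": ratio(tp + tn, tp + tn + fp + fn),
    }

# Toy 3 x 4 example: 3 true positives, 1 false positive, 1 false negative.
label = np.array([[0, 1, 1, 0],
                  [0, 1, 1, 0],
                  [0, 0, 0, 0]])
pred = np.array([[0, 1, 1, 0],
                 [0, 1, 0, 0],
                 [0, 1, 0, 0]])

metrics = segmentation_metrics(pred, label)
print(metrics)
```

Because background pixels dominate crack images, PA is usually the highest of the four values, which is why Dice and IoU are reported alongside it.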
Ablation Experiment of Different Optimizers.
Moreover, Figure 10 shows that the SGD optimizer fell into a local optimum early in training (epochs 0-20), whereas the Ranger optimizer converged faster and reached a smaller final loss than SGD. This indicates that, for the crack identification task, the Ranger optimizer accelerates training more effectively and avoids falling into a local optimum, thus significantly improving the model performance.
Training loss over epochs for Ranger and SGD.
Ablation Experiment.
To verify the effect of the CGAN network, we removed it and named the resulting network “w/o CGAN” for comparison. The experimental results are shown in Table 4. With the CGAN network, the Dice, IoU, Recall, and PA improved by 6.16%, 10.57%, 6.05%, and 2.74%, respectively, which is a significant improvement in model performance.
To evaluate the effectiveness of the SE module in the crack segmentation task, a series of ablation experiments were designed in this study. The method with the SE module removed is named “w/o SE module”. The experimental results are shown in Table 4. With the SE module, the Dice, IoU, Recall, and PA improved by 2.68%, 1.15%, 4.16%, and 3.21%, respectively. These results indicate that the SE module enhances the network’s ability to recognize crack features to a certain extent and, in particular, improves the segmentation accuracy.
By comparing the contributions of the CGAN network and the SE module to model accuracy in Table 4, we find that the CGAN network is more effective than the SE module in improving model accuracy.
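To make the role of the ablated SE module concrete, the following sketch shows the forward pass of a generic squeeze-and-excitation block in NumPy. It follows the standard SE formulation (global average pooling, two fully connected layers with ReLU and sigmoid, channel-wise rescaling) and is not the exact module used in the skip connections of the proposed network. The reduction ratio `r`, the random weights, and the function name `se_block` are hypothetical.

```python
import numpy as np

def se_block(features, w1, b1, w2, b2):
    """Generic squeeze-and-excitation forward pass.

    features -- array of shape (C, H, W)
    w1, b1   -- first dense layer, shapes (C // r, C) and (C // r,)
    w2, b2   -- second dense layer, shapes (C, C // r) and (C,)
    Returns the rescaled features and the per-channel weights.
    """
    squeezed = features.mean(axis=(1, 2))                # squeeze: (C,)
    hidden = np.maximum(0.0, w1 @ squeezed + b1)         # ReLU bottleneck
    weights = 1.0 / (1.0 + np.exp(-(w2 @ hidden + b2)))  # sigmoid in (0, 1)
    # Excitation: scale every channel by its learned importance weight.
    return features * weights[:, None, None], weights

# Hypothetical sizes and random (untrained) parameters for illustration only.
rng = np.random.default_rng(0)
C, H, W, r = 8, 4, 4, 4
features = rng.standard_normal((C, H, W))
w1 = rng.standard_normal((C // r, C)) * 0.1
b1 = np.zeros(C // r)
w2 = rng.standard_normal((C, C // r)) * 0.1
b2 = np.zeros(C)

scaled, weights = se_block(features, w1, b1, w2, b2)
print(scaled.shape, weights.shape)
```

In a trained network, channels that respond to crack features would receive weights near 1 and background-dominated channels weights near 0, which is the reweighting effect that the “w/o SE module” variant lacks.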
Conclusions
A CGAN-based SE-U-Net model has been developed for the identification of fine cracks in concrete structures. The proposed model used a CGAN as its architecture and a modified U-Net as the generative network, with an SE attention module that enhanced the weights of the crack features and suppressed the weights of the background features to improve the network’s ability to capture crack details. PatchGAN was used as the discriminative network to score each pixel of the real and generated images, and it was combined with the Ranger optimization algorithm to address the problem of too many parameters in the optimization process, so that the model converged quickly and its recognition efficiency and accuracy improved. Through repeated adversarial training of the generative and discriminative networks, the crack-recognition image produced by the generative network became very close to the real image, which allowed the generative network to recognize cracks with high accuracy. The results showed that the proposed method could accurately and effectively identify fine cracks in concrete structures. The main conclusions are as follows:
(1) Compared with traditional recognition algorithms and other models, the proposed method retained detailed crack-edge information for fine cracks of pixel-level width and near-fine extreme cracks, with the PA reaching 98.48%.
(2) The crack skeleton was represented by a single pixel width and identified by binary morphology. The area, length, and average width of the cracks could be calculated, providing useful information for the damage assessment of building structures.
Footnotes
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Anhui Province Key Laboratory of Intelligent Building and Building Energy Saving (Grant Number IBES2020KF08).
