Abstract
Surface cracks in reinforced concrete (RC) bridge structural elements pose significant concerns regarding durability, strength, and serviceability. Micro-cracks, if left unchecked, can propagate into macro-cracks due to several factors, leading to deterioration of structural health and increased maintenance costs. Generative Adversarial Networks (GANs), which leverage Convolutional Neural Networks (CNNs), provide a powerful alternative by learning from unstructured image datasets and generating realistic visual outputs. This study presents a novel GAN-based mechanism, referred to as CrackGAN, for predicting the propagation patterns of cracks in RC bridge elements. The model synthesizes realistic crack growth patterns and allows for predictive visualization of their evolution. The predicted results were validated by comparing them with real-time crack propagation recorded in laboratory tests on RC beams and slabs. The results were evaluated using the Structural Similarity Index Measure (SSIM) and Learned Perceptual Image Patch Similarity (LPIPS), demonstrating strong agreement between predicted and experimentally observed crack propagation.
Keywords
Introduction
Concrete is one of the most widely utilized construction materials for infrastructure such as bridges, buildings, and pavements. Despite its durability, concrete structures are vulnerable to degradation over time due to factors including overloading, aggressive environmental exposure, and material aging (Koch et al., 2015; Han et al., 2022; Laxman et al., 2023). Among the earliest and most critical signs of structural distress are cracks, both micro-cracks and macro-cracks, which not only compromise structural strength but can also lead to progressive failure if left unaddressed (Kim et al., 2019; Tan and Bao, 2021). Understanding the formation and propagation of cracks is therefore essential from both structural health monitoring and lifecycle maintenance perspectives. Cracks allow moisture and harmful chemicals to infiltrate the concrete, accelerating the corrosion of embedded reinforcement once these cracks reach the rebar level. This process can result in delamination, spalling, and a loss of serviceability and safety (Lin and Wang, 2020; Long et al., 2021; Tian et al., 2022). Consequently, effective inspection and damage assessment procedures are vital for ensuring the integrity and longevity of concrete infrastructure (Valenca et al., 2017; Tan et al., 2021). Currently, many inspection systems still rely heavily on manual visual assessment by trained personnel (Gao and Tong, 2023). However, manual inspection introduces subjectivity and inconsistency and is prone to human error, especially due to fatigue and environmental distractions (Rao et al., 2021; Sowden, 1990; Phares et al., 2004). In light of these limitations, recent advancements in artificial intelligence (AI), particularly in the field of deep learning (DL), have shown great promise in automating and improving the accuracy of structural damage detection (Hsieh and Tsai, 2020; Jiao et al., 2024; Ramzan et al., 2021).
DL, as a data-driven approach, has demonstrated superior performance in domains such as image classification, object detection, speech recognition, and computer vision (Ho et al.,2021, 2022; Park et al.,2022; Zou et al.,2018). In structural health monitoring, DL algorithms are increasingly being used to process drone-acquired imagery for detecting and tracking cracks in hard-to-access areas. These models typically comprise three main components: (1) a suitable neural network architecture, (2) a loss function that guides the learning process, and (3) an optimization algorithm to update the model’s parameters. The task of crack segmentation and growth prediction has seen promising developments using DL methods (Anitescu et al.,2019; Do et al.,2019; Hsu et al.,2020; Goswami et al.,2020; Chakraborty et al.,2022). However, a key limitation in the current literature is the lack of robust approaches capable of accurately predicting future crack propagation patterns on concrete surfaces. Most existing studies focus either on damage detection or segmentation at a given point in time, with limited capability to forecast the temporal evolution of crack morphology. Furthermore, DL models typically require extensive experimentation with different architectures and hyper parameters, as no single model is universally optimal across datasets. Bayar and Bilir (2019) made an early attempt to model crack growth patterns using Voronoi diagrams and machine learning. Their work divided surface cracks into segments and attempted to predict the subsequent path of propagation. While pioneering in nature, the study was limited in accuracy and spatial resolution, providing only rough estimations of future crack paths. Separately, the development of Generative Adversarial Networks (GANs) by Goodfellow et al. (2014) opened new avenues for image generation tasks. 
Variants such as conditional GAN (cGAN) (Mirza & Osindero,2014), Pix2Pix (Isola et al.,2016), and CycleGAN (Zhu et al.,2017) have achieved significant success in image-to-image translation. Bianchi and Hebdon (2021) explored the use of inverse GANs for simulating corrosion patterns on steel bridges, leveraging semantic features learned from latent spaces originally developed for facial recognition tasks. However, their study focused exclusively on steel structures, and did not provide experimental validation for the generated outcomes, leaving a critical gap in the application of GANs to concrete crack forecasting.
To address these research gaps, this study presents a novel inverse GAN-based framework to predict the future propagation of single-surface cracks on Reinforced Concrete (RC) bridge elements. Unlike prior work that either relied on segmentation or coarse prediction of crack paths, the proposed method is capable of generating high-resolution, data-driven forecasts of how a crack will likely propagate over time. The model is trained on a comprehensive dataset of approximately 8,200 256 × 256 pix crack images sourced from three distinct databases: (1) Bridge Crack Library 2.0, (2) Conglomerate Concrete Crack Detection Dataset, and (3) Synthesized Crack Image Dataset. To isolate the core problem, only images of non-branching (single) cracks were used in the training process. The generated predictions were validated against laboratory observations of crack propagation in RC slab and beam specimens. Quantitative evaluation using Structural Similarity Index Measure (SSIM >0.9) and Learned Perceptual Image Patch Similarity (LPIPS <0.1) demonstrated a high degree of accuracy, confirming the model’s potential for real-world application. In summary, this study presents a novel, image-based predictive tool for crack propagation in concrete structures using inverse GANs filling a critical void in current structural health monitoring approaches and offering a scalable solution for proactive maintenance planning. Experiments demonstrate that the predicted crack development obtains Structural Similarity Index Measure values above 0.90 and Learned Perceptual Image Patch Similarity values below 0.20 relative to laboratory-recorded actual crack development. Moreover, apart from the perceptual similarity, physical trends including overall increasing crack length, widening within the vicinity of crack tips, and crack tip orientation consistent with the original direction of crack development are captured by the model. 
In contrast to existing methods for crack detection and segmentation using deep learning, instead of being confined to current-state analysis, the presented approach allows predictive visualization and retains geometric properties supportive of actual reinforced concrete crack development.
It is important to highlight here that in this study, prediction refers to data-driven estimation of future crack states through learned latent-space transitions, rather than physics-based fracture simulation. This is significantly different from existing generative models like Pix2Pix or CycleGAN, which are used more directly in image-to-image translation without considering the aspects of time progression, state continuity, or direction. In existing frameworks, each generated image is independently generated based on the input image, without considering any method related to encoding or conserving consistency in progress in the subsequent states. The proposed CrackGAN, therefore, incorporates the notion of a state variable in the latent space, along with a learnt direction, which enables the prediction related to the potential progression of the existing crack. Although it still incorporates images, without directly acting as a replacement for fracture modeling, experiments carried out utilizing the metrics related to SSIM, LPIPS, as well as geometrical characterization metrics related to cracks, prove the generated states in the subsequent phases to be consistent with the actual progression related to the crack, thus justifying the ‘predictive’ nature in the defined context.
Methodology for concrete crack prediction
The underlying strategy is designed as a multi-step pipeline, illustrated in Figure 1. Each stage of the pipeline corresponds directly to one part of the strategy, so that the reader can follow how the data evolves from raw crack images to predicted crack propagation images. The proposed strategy has five main stages: (i) crack detection and segmentation, (ii) CrackGAN training and latent space learning, (iii) latent space projection and direction estimation, (iv) crack propagation via Slerp, and (v) experimental validation. Each stage produces intermediate results, which are shown in the flowchart for clarity. The process begins with RGB crack images extracted from various databases. These images are passed through a Pix2Pix cGAN, which produces binary crack masks. The binary masks are used directly as inputs to CrackGAN, which achieves dimensionality reduction through a 100-dimensional latent space that encodes the directionality, length, and other characteristics of the extracted cracks. This latent representation acts as the main state variable for predictive analysis. Latent space projection is then performed to map real crack images into the learned latent space, which provides a valid starting point for crack evolution. Next, directional crack propagation is modeled by estimating a unit direction vector in the latent space from paired latent embeddings and confidence scores obtained using a ResNet-based classifier and SVM-based hyperplane learning. To generate intermediate latent states of progressive crack growth, spherical interpolation (Slerp) is performed along this direction.
Finally, these interpolated latent vectors are decoded back into the image space using the CrackGAN generator, producing a sequence of predicted crack propagation images. The predictions are numerically validated against experimentally observed crack growth in RC slabs and beams using SSIM and LPIPS metrics. By showing the intermediate representations explicitly and mapping each block in the workflow to a corresponding subsection, readers can more easily see the relationship between the methods and understand how each component contributes to the overall predictive crack propagation framework. Figure 1 shows the methodology adopted in this study. Each step and its results are discussed in detail in Section 6 of this paper.
Figure 1. Methodology for crack prediction.
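As a rough illustration of the SSIM metric used in the validation stage, the following sketch computes a single-window (global) SSIM between two flattened grayscale images. It is a simplification under stated assumptions: the standard metric averages the same expression over local windows, and the function name `global_ssim` and the [0, 1] intensity scale are chosen for this example rather than taken from the study. LPIPS is not sketched here because it depends on a pretrained network.

```python
def global_ssim(x, y, data_range=1.0):
    """Single-window SSIM between two equally sized, flattened images.

    Simplification: the usual SSIM is the mean of this quantity over
    local windows; here one window covers the whole image.
    """
    n = len(x)
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((a - mu_x) ** 2 for a in x) / n
    var_y = sum((b - mu_y) ** 2 for b in y) / n
    cov = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    # Standard stabilizing constants with K1 = 0.01 and K2 = 0.03.
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


# Hypothetical 2 x 2 binary masks, flattened row by row.
predicted = [0.0, 1.0, 0.0, 1.0]
observed = [0.0, 1.0, 0.0, 1.0]
score = global_ssim(predicted, observed)
```

Identical masks give a score of 1, and the score decreases as the predicted and observed crack patterns diverge.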
Reproducible training specifications for Pix2Pix, ResNet-50, and SVM
To ensure full reproducibility of the developed framework, this section provides thorough training conditions for all learning components working in the study, including Pix2Pix for crack detection, ResNet-50 for crack confidence estimation, and Support Vector Machine (SVM) for latent-space direction learning.
Pix2Pix crack detection and segmentation
The Pix2Pix model was trained on a paired dataset of RGB crack images and their corresponding binary crack masks compiled from the Conglomerate Concrete Crack Detection Dataset and Bridge Crack Library 2.0. Each image was standardized to a resolution of 256 × 256 pixels. The dataset was divided into training, validation, and test sets in a ratio of 70:15:15, with no crack instance shared between splits so that model performance could be assessed without bias. Data augmentation was applied during training: images were resized from 256 × 256 to 286 × 286 and randomly flipped, and pixel intensities were then normalized to the range [−1, 1].
The architecture of Pix2Pix contained a U-Net-style generator, with the encoder-decoder network design and skip connections introduced to preserve spatial features, as well as a 70 × 70 PatchGAN discriminator to ensure the neighborhood structure is intact in the crack masks. The Pix2Pix model was trained with the Adam optimizer, and the learning rate parameters were set to 0.0002, with momentum coefficients of β1 = 0.5 and β2 = 0.999. The loss function of the model added adversarial loss and the L1 reconstruction loss. The size of the batch was set to 1, and the model was trained for 165 epochs. All of these settings were chosen based on their general efficacy in conditional GAN-based image-to-image translation tasks and were used strictly in developing the proposed model.
Although standalone architectures like U-Net or DeepLabv3 + can be used for crack segmentation, Pix2Pix is adopted in this work due to its end-to-end image-to-image translation capability that enforces the strong spatial correspondence between input images and output crack masks. This property is crucial for the downstream latent-space modeling adopted in the proposed framework, where geometric continuity and boundary sharpness directly influence crack propagation prediction. Unlike purely pixel-wise segmentation networks, Pix2Pix learns the joint distribution between crack appearance and mask geometry through adversarial training, resulting in structurally coherent crack representations. Since this work’s primary objective is predictive crack propagation and not segmentation accuracy optimization alone, Pix2Pix is used as a reliable preprocessing stage. A systematic comparison with alternative segmentation architectures is identified as future work.
ResNet-50 crack/non-crack confidence estimation
A binary classification dataset was created by combining synthesized and real crack images, using samples from CrackGAN and the outputs of Pix2Pix-based crack detection. Each image was manually annotated as crack or non-crack based on the presence of cracks. To keep the classification unbiased, equal numbers of crack and non-crack samples were used. The dataset was split into training, validation, and test sets comprising 70%, 15%, and 15% of the data, respectively. The classification model was a ResNet-50 pre-trained on the ImageNet database, with the last fully connected layer replaced by a binary classification layer. Images were resized to 224 × 224 pixels before being fed to the network. The network was trained for 50 epochs with the Adam optimizer, a binary cross-entropy loss function, and a learning rate of 0.0001. During training, the validation data was used to detect convergence and prevent overfitting. The confidence output by the ResNet-50 model served as a quantitative indicator that was used for direction learning within the latent space.
SVM-latent space direction estimation
For directional learning in the latent space, the input to the Support Vector Machine (SVM) was created in the form of a concatenated feature vector that included the inverted representations of the crack images in the latent space along with the confidence scores assigned by the ResNet-50 classifier. Paired samples were created to show the progression of cracks at different instances and enabled the detection of regular patterns of crack evolution by the Support Vector Machine. The kernel function used was the Radial Basis Function (RBF) kernel with the regularization parameter (C) set to (C = 1.0) and the gamma value set to automatically scale to feature variances. Normalization of the features was done via Z-Normalization before the initiation of the learning phase. The robustness of the model was attained via 5-fold cross-validation with the objective of the learning phase to find the optimal separation hyperplane that best identifies the primary direction of the growth of the cracks in the latent space. The normalized normal to the separation hyperplane was then used to identify the direction vector for the growth of the cracks in the latent space. In instances where the logs of the primary learning phase were not retained
Dataset composition, feature-space consistency, and preprocessing strategy
The robustness and generalizability of the proposed crack propagation framework were ensured by sourcing crack images from three independent databases: (i) Bridge Crack Library 2.0, (ii) the Conglomerate Concrete Crack Detection Dataset, and (iii) a Synthesized Crack Image Dataset. Because these datasets come from different acquisition conditions, a systematic analysis was performed before model training to ensure consistency in feature space and data distribution. From the viewpoint of feature space, all three datasets predominantly include surface-level concrete cracks under visible-spectrum imaging with similar low-level visual characteristics, such as edge discontinuities, high-frequency linear features, and contrast gradients between crack pixels and the background concrete texture. To minimize domain bias, only single, non-branching crack instances were retained, which constrained the morphological feature space to variation in crack length, curvature, and width and excluded complex crack networks. This ensured that topological discontinuities due to branching or intersecting cracks did not influence the latent representations learned by CrackGAN. Before training, all images were passed through a unified preprocessing pipeline. Raw RGB images were first resized to a fixed spatial resolution of 256 × 256 pixels using bilinear interpolation to ensure architectural compatibility across Pix2Pix and CrackGAN. In this paper, Pix2Pix-based segmentation is adopted from the established conditional GAN framework to generate binary crack masks, following the methodology described by Hasan et al. (2025). These binary masks were then normalized to the range [−1, 1] and used as direct inputs to the CrackGAN generator. No histogram equalization or contrast enhancement was performed, to avoid artificially distorting crack morphology. Finally, roughly equal sampling rates were enforced across sources for balanced representation.
Of the total dataset comprising around 8,280 crack images, 34% were drawn from Bridge Crack Library 2.0, 38% were drawn from the Conglomerate Concrete Crack Detection Dataset, and 28% were drawn from the Synthesized Crack Image Dataset. The synthesized dataset was intentionally under-sampled to prevent bias toward artificially generated crack geometries. A stratified split was used to partition the dataset to preserve the distributional consistency of sources into 70% training, 15% validation, and 15% test sets. Only the validation set was used for monitoring adversarial convergence for preventing overfitting, whereas the test set was held out exclusively for final performance evaluation and experimental validation against laboratory observations of crack propagation. No images from the same physical specimen or instance of a crack were shared across splits, with strict independence between the training and evaluation data assured. A harmonized pre-processing and sampling strategy for CrackGAN makes sure it learns in a statistically coherent and physically meaningful feature space the behavior of crack propagation, which enhances reliability in predictive visualizations presented here.
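The source-stratified 70/15/15 partition described above can be sketched as follows. This is a minimal illustration rather than the study's actual splitting code: the function name `stratified_split`, the `(image_id, source)` sample format, and the fixed seed are assumptions. The sketch also does not enforce the additional constraint that images from the same physical specimen stay within one split, which would require grouping by specimen before shuffling.

```python
import random
from collections import defaultdict


def stratified_split(samples, fractions=(0.70, 0.15, 0.15), seed=0):
    """Split (image_id, source) pairs so each source keeps the same ratios."""
    rng = random.Random(seed)
    by_source = defaultdict(list)
    for image_id, source in samples:
        by_source[source].append(image_id)

    train, val, test = [], [], []
    for source in sorted(by_source):  # sorted for a reproducible order
        items = by_source[source]
        rng.shuffle(items)
        n_train = round(fractions[0] * len(items))
        n_val = round(fractions[1] * len(items))
        train += items[:n_train]
        val += items[n_train:n_train + n_val]
        test += items[n_train + n_val:]  # remainder goes to the test set
    return train, val, test


# Hypothetical sample list: 100 images from each of three sources.
samples = [(f"{src}_{i}", src) for src in ("bcl", "conglomerate", "synth")
           for i in range(100)]
train_ids, val_ids, test_ids = stratified_split(samples)
```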
Network performance, hyperparameter sensitivity, and convergence behavior
The performance of the proposed CrackGAN framework was assessed in terms of not only output quality but also training stability and convergence behavior. Given the adversarial nature of GAN-based learning, particular attention was paid to optimizer selection, learning rate sensitivity, and generator-discriminator balance.
Effect of hyperparameters
Key hyperparameters influencing model behavior included the latent vector dimension (100), learning rate (0.0002), batch size (1), and the optimizer momentum parameters (β1 = 0.5, β2 = 0.999). Preliminary sensitivity checks showed that learning rates below 0.0001 significantly slowed convergence, whereas higher learning rates (>0.0003) resulted in unstable adversarial oscillations and mode collapse. The adopted configuration provided a trade-off between convergence speed and training stability, in keeping with established practice for DCGAN-based architectures. The latent space dimensionality was fixed at 100 to balance representational richness with computational efficiency. Increasing the latent dimensionality beyond this value did not produce perceptible benefits in the representation of crack morphology, whereas lower dimensions significantly reduced variability and caused over-smoothing in the generated crack patterns.
The CrackGAN model exhibited characteristic adversarial training dynamics over 165 epochs. Both Generator and Discriminator losses began at approximately 1.8–1.9 and showed an initial rapid decline during the first 10 epochs as the networks learned basic feature representations. A period of erratic oscillations emerged between epochs 10 and 95, reflecting the adversarial struggle where the generator attempted to produce convincing synthetic crack images while the discriminator learned to distinguish real from generated samples. During this phase, the losses fluctuated significantly as neither network achieved stable dominance, which corresponded with the generator’s inability to produce high-quality synthetic images. After epoch 95, the training stabilized considerably, with both losses converging smoothly toward equilibrium values around 0.5–0.6, indicating that the generator had learned to consistently fool the discriminator with realistic crack patterns. The reconstruction loss demonstrated a complementary convergence pattern, starting at 2.0 and declining steeply through the first 50 epochs before continuing gradual improvement until epoch 110. Beyond epoch 110, the reconstruction loss plateaued at approximately 0.2, suggesting the model had reached its capacity for accurate latent space inversion and further training yielded diminishing returns in reconstruction quality.
Optimizer selection and convergence
The Adam optimizer was chosen for both Generator and Discriminator owing to its adaptive learning characteristics and demonstrated success in adversarial training. Other optimizers, such as RMSprop and SGD, were also tested in preliminary experiments; however, these resulted in slower convergence and greater sensitivity to initialization, especially in latent space inversion. Adam's smooth convergence with reduced gradient oscillations makes it more suitable for stable crack pattern generation. The convergence behavior of the Generator and Discriminator losses over the training epochs is shown in Figure 2.
Figure 2. Convergence behavior of CrackGAN generator and discriminator.
After an initial fluctuation phase typical of adversarial learning, both losses stabilized, indicating a balance between the competing networks. This balance is crucial to avoid the dominance of either network and guarantee meaningful latent representations.
Latent space projection stability
Convergence behavior during latent space optimization was monitored through the reconstruction loss during inversion. Figure 3 shows the decay of the combined reconstruction loss over iterations, indicating consistent convergence within the fixed number of iterations. An early stopping criterion, which halted optimization once the reduction in loss became negligible, was used to avoid overfitting to pixel-level noise.
Figure 3. CrackGAN latent space inversion convergence.
The reconstruction loss demonstrated a complementary convergence pattern that provides insight into the latent space learning dynamics. Starting at approximately 2.20, the loss exhibited a steep exponential decline during the initial epochs, dropping rapidly as the encoder-decoder architecture learned efficient latent representations of crack patterns. This aggressive initial improvement continued through the first 50 epochs, after which the rate of decrease became more gradual but sustained. The model continued to refine its reconstruction capabilities until approximately epoch 110, at which point the loss plateaued at around 0.2. This plateau indicates that the model reached its architectural capacity for accurately inverting the latent space representations back to the original image domain. The stabilization suggests that further training beyond epoch 110 yielded diminishing returns in reconstruction quality, as the network had fully optimized its ability to encode and decode crack features within the constraints of its architecture and the dataset characteristics.
Overall, these analyses show that the proposed CrackGAN framework results in stable training and converges reliably, making it suitable for predictive visualization of crack propagation rather than generating separated images.
Crack detection framework
The crack detection framework consists of following stages:
Stage 1: Crack detection & segmentation
The CrackGAN model was introduced as a dedicated generative model for synthesizing the spread of concrete cracks in a realistic and controllable manner. The model is a variant of the Deep Convolutional GAN (DCGAN) architecture and is trained on binary segmentation masks of cracks, each of which was resized to 256 × 256 pixels. The architecture includes two major parts: (1) a Generator responsible for generating new crack patterns from a latent space, and (2) a Discriminator that determines the realism of the generated patterns. Details of the network architecture are shown in Figure 4.
Figure 4. DCGAN architecture.
The Generator is responsible for transforming a 100-dimensional latent vector into a realistic 256 × 256 × 3 image representing a crack mask. The process begins with a fully connected dense layer, which projects the input latent vector into an 8 × 8 × 256 feature map. This projection forms the base for the subsequent upsampling operations. The upsampling is carried out using five transposed convolutional layers (also known as deconvolution layers), each with a stride of 2, doubling the spatial resolution at every stage. The kernel size of each transposed convolution is set to 4 × 4, and each layer is followed by a Leaky ReLU activation function with a negative slope coefficient of 0.2. This choice mitigates the “dying ReLU” problem while maintaining non-linearity. The final layer of the Generator uses a 3 × 3 convolutional layer followed by a Tanh activation function, which constrains pixel values in the output image to the range [−1, 1]. This normalization improves training stability and keeps the output consistent with the pre-processing applied to the input data.
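The resolution arithmetic of the Generator can be checked with a short sketch. It uses the standard transposed-convolution output-size relation, out = (in − 1) × stride − 2 × padding + kernel; the kernel size (4) and stride (2) come from the description above, while the padding of 1 and the function names are assumptions made for this example.

```python
def transposed_conv_output(size, kernel=4, stride=2, padding=1):
    """Spatial output size of a transposed convolution (no output padding)."""
    return (size - 1) * stride - 2 * padding + kernel


def generator_resolutions(start=8, layers=5):
    """Feature-map side length after each of the upsampling layers."""
    sizes = [start]
    for _ in range(layers):
        sizes.append(transposed_conv_output(sizes[-1]))
    return sizes


resolutions = generator_resolutions()
```

With these settings each layer exactly doubles the side length, so five layers take the 8 × 8 feature map to the 256 × 256 output resolution.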
The Discriminator is a standard convolutional neural network (CNN) tasked with distinguishing between real crack masks and those produced by the Generator. It receives input images of 256 × 256 × 3 dimensions and processes them through multiple strided convolutional layers that progressively downsample the spatial resolution while increasing feature abstraction. Each convolutional layer is followed by a Leaky ReLU activation to maintain non-linearity and ensure gradient flow during backpropagation. After the convolutional processing, a flattening layer transforms the feature map into a one-dimensional vector. A Dropout layer with a rate of 0.4 is then applied to reduce overfitting. The final classification is performed by a sigmoid-activated neuron, which outputs a probability indicating whether the input is real or fake.
Stage 2: CrackGAN training & latent space learning
The learning process was an adversarial game between the Generator and the Discriminator. The Discriminator was updated to maximize the probability of distinguishing real crack segmentation masks from generated ones, and the Generator was updated to maximize the probability of producing segmentation masks similar to real ones. The Discriminator’s weights were kept constant in the joint model during the Generator’s training, so that only the Generator was updated to enhance its output quality. The Adam optimizer with a learning rate of 0.0002 and momentum of 0.5 was used. The comparatively low learning rate helped to ensure stability during adversarial training. Separate Binary Cross-Entropy (BCE) loss functions were used for the Generator and the Discriminator. For the Discriminator, the BCE loss compared predictions to known “truth” values. During Discriminator training, two sets of data were used: actual crack masks, labeled 1.0, and generated masks, labeled 0.0. This loss was computed using equation (1).
Since the Generator’s training was more nuanced, its BCE loss was computed through the combined model. During Generator training, the generated images were labeled as 1.0 (real) rather than 0.0 (fake). The aim of this approach was to deceive the Discriminator by synthesizing fake images that it would classify as real. The Generator’s effective BCE loss was calculated using equation (2).
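The labeling scheme described above can be illustrated with the standard binary cross-entropy form. The sketch below is not a reproduction of equations (1) and (2); it assumes the usual BCE definition, averaged over a batch, and the function names and the clipping constant `eps` are chosen for this example. The inputs are Discriminator output probabilities.

```python
import math


def bce(prediction, target, eps=1e-7):
    """Binary cross-entropy for one probability and one 0/1 label."""
    p = min(max(prediction, eps), 1.0 - eps)  # clip to avoid log(0)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))


def discriminator_loss(d_real, d_fake):
    """Real masks are labeled 1.0 and generated masks are labeled 0.0."""
    losses = [bce(p, 1.0) for p in d_real] + [bce(p, 0.0) for p in d_fake]
    return sum(losses) / len(losses)


def generator_loss(d_fake):
    """Generated masks are labeled 1.0, so fooling the Discriminator lowers the loss."""
    return sum(bce(p, 1.0) for p in d_fake) / len(d_fake)


# Hypothetical Discriminator outputs for two real and two generated masks.
d_loss = discriminator_loss(d_real=[0.9, 0.8], d_fake=[0.2, 0.1])
g_loss = generator_loss(d_fake=[0.2, 0.1])
```

In this example the Discriminator is mostly correct, so its loss is small while the Generator's loss is large; the two move in opposite directions as the Generator improves.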
The process created a minimax game in which the Discriminator attempted to minimize its BCE loss by correctly identifying real and fake images, and the Generator attempted to minimize its BCE loss by creating images that maximize the Discriminator’s prediction error. CrackGAN was trained on 8,200 binary crack segmentation masks of varying resolutions, which were scaled to a width and height of 256 pixels to meet the model’s requirements. In addition, the images were normalized to the range [−1, 1] for optimal learning by the Generator.
Stage 3: Latent space projection & direction estimation
The latent space of a Generative Adversarial Network (GAN) is a high-dimensional mathematical space from which the Generator transforms randomly sampled latent vectors into meaningful image representations. The CrackGAN model in this work uses a 100-dimensional latent space intended to learn and reproduce the heterogeneous morphological characteristics of concrete cracks. Although the latent space is not directly interpretable, it contains implicit patterns learned by the model during training. These structures are not directly present in the training data but become embedded in the relationships the Generator has acquired. When a latent vector is sampled and passed through the Generator, the result is a generated image that reflects the internalized distribution of crack patterns in the data. Notably, the structure and semantics of this latent space vary with each training session, making the latent space non-deterministic and model-dependent. To generate cracks, random latent vectors (z-vectors) are sampled from a standard normal distribution, that is, a Gaussian with mean 0 and standard deviation 1. These are points in a 100-dimensional hypersphere, which is the space from which the Generator draws seeds for crack generation. An important feature of such a latent space is that it supports semantic interpolation. By interpolating between two latent vectors, CrackGAN can generate intermediate crack patterns that smoothly connect the two original inputs. This interpolation ability is particularly useful for predicting future crack evolution and observing how micro-structural changes could develop with time. In this study, spherical interpolation (Slerp) was used instead of linear interpolation, giving a smoother transition across the curved surface of the latent hypersphere.
This approach allows the latent space to serve not only as a source of variability but also as a mechanism for extrapolating crack patterns beyond those observed in the original data. A sample illustration of the latent space and sampling process is shown in Figure 5.
Figure 5. Latent space hypersphere illustration.
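The hypersphere picture of the latent space can be checked numerically. The short Python sketch below samples 100-dimensional standard normal vectors and reports how tightly their lengths cluster around the square root of the dimension; the sample count and random seed are arbitrary choices for illustration, not settings from this study.

```python
import numpy as np

LATENT_DIM = 100  # latent dimensionality stated in the text

rng = np.random.default_rng(0)  # seed chosen only for reproducibility
z = rng.standard_normal((10_000, LATENT_DIM))  # each row is one z ~ N(0, I)

# Lengths of the sampled vectors: in high dimensions they cluster tightly.
norms = np.linalg.norm(z, axis=1)
print(f"mean norm = {norms.mean():.2f}, std of norm = {norms.std():.2f}")
print(f"sqrt(LATENT_DIM) = {np.sqrt(LATENT_DIM):.2f}")

# Projecting onto the unit hypersphere keeps the direction and discards the length.
z_unit = z / norms[:, None]
```

Because the lengths vary so little relative to their mean, interpolating along the sphere rather than along a straight chord keeps intermediate points in the region the Generator saw during training.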
Latent space projection is the process of taking a real image and finding its corresponding point in the Generator’s latent space. An image of a real crack was acquired, and the point at which it fits in the latent space had to be identified. This was a crucial step because the Generator had learned to operate within this space during training and to create realistic crack patterns from points in it. Accurate projection was particularly important because it showed where the current crack pattern resided in relation to the other patterns the model had learned. In addition, it provided a starting point for predicting how the crack might grow and ensured that newly synthesized crack patterns retained the essential characteristics of the original crack while showing realistic progression. Projection can be achieved with two approaches: GAN inversion and latent space optimization. The first approach used an Inverse Generator (encoder) network, as shown in Figure 6.
Figure 6. Inverse generator architecture.
It required training a specialized neural network to recognize crack patterns and estimate the latent code that would generate them. The encoder was trained by taking many crack images generated by the GAN, passing them through the encoder to obtain latent codes, passing those codes through the original Generator, and comparing the results with the original images. Through this training process, the encoder learned to map images directly to their corresponding latent codes, as shown in Figure 7. The advantage of this approach was speed: once trained, the encoder could convert new crack images to latent codes almost instantly. However, it might not always estimate the optimal latent code, especially for crack patterns significantly different from those it was trained on. The optimization approach, on the other hand, was a systematic trial-and-error process. Instead of training a separate network, it started with a random latent code and gradually adjusted it to improve the match between the Generator’s output and the target crack image, as illustrated in Figure 8.
Figure 7. Inverse generator training procedure.
Figure 8. Latent space optimization approach.

The optimization approach consisted of the following steps:
Step-01 Generator model and latent space initialization
The process was started with a pre-trained CrackGAN Generator,
The process initialized the latent vector
Step-02 Preprocessing of the input image
The input image
Step-03 Multi-term loss function
A comprehensive loss function was defined to measure the reconstruction error between the input image (2) Structural Similarity Index Measure (SSIM) that quantified perceptual similarity by comparing the luminance, contrast, and structure of image patches through equation (5). (3) Perceptual Loss that measured high-level semantic differences using feature representations extracted from a pre-trained VGG19 network. Given the feature maps
The total loss, a weighted combination of these components, was calculated using equation (7).
Step-04 Gradient-based optimization
The latent vector
Step-05 Iterative refinement
The optimization proceeded for a fixed number of iterations or until the convergence criteria were met. Early stopping was applied when no significant improvement in the loss was observed for a predefined number of iterations, reducing computational overhead.
Step-06 Selection of the best reconstruction
Given the stochastic nature of the process, multiple attempts were performed with different initializations of
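The six steps above can be sketched in a few lines of Python. The example is a toy under explicit assumptions: the pre-trained Generator is replaced by a hypothetical stand-in G(z) = tanh(Wz) with random weights, the multi-term loss is reduced to pixel-wise MSE (the SSIM and VGG19 perceptual terms are omitted), and plain gradient descent with hand-picked settings replaces whatever optimizer was used in this study. It is intended only to show the structure of the procedure: random initialization, gradient updates of the latent vector, early stopping, and selection of the best of several restarts.

```python
import numpy as np

rng = np.random.default_rng(0)
LATENT_DIM, N_PIXELS = 100, 1024  # 100-d latent as in the text; 32 x 32 toy "image"

# Hypothetical stand-in for the pre-trained Generator: G(z) = tanh(W z).
W = rng.standard_normal((N_PIXELS, LATENT_DIM)) / np.sqrt(LATENT_DIM)

def generator(z):
    return np.tanh(W @ z)

def loss_and_grad(z, target):
    """Pixel-wise MSE between G(z) and the target, with its gradient w.r.t. z."""
    g = generator(z)
    residual = g - target
    loss = float(np.mean(residual ** 2))
    grad = (2.0 / N_PIXELS) * (W.T @ (residual * (1.0 - g ** 2)))
    return loss, grad

def project(target, lr=20.0, max_iters=500, patience=20, tol=1e-8):
    """One optimization run from a random start, with early stopping."""
    z = rng.standard_normal(LATENT_DIM)
    best_loss, best_z, stale = np.inf, z.copy(), 0
    for _ in range(max_iters):
        loss, grad = loss_and_grad(z, target)
        if loss < best_loss - tol:
            best_loss, best_z, stale = loss, z.copy(), 0
        else:
            stale += 1
            if stale >= patience:
                break  # no meaningful improvement for `patience` iterations
        z = z - lr * grad  # gradient step on the latent vector only
    return best_loss, best_z

# Toy target image produced by the stand-in generator itself.
target = generator(rng.standard_normal(LATENT_DIM))
initial_loss, _ = loss_and_grad(rng.standard_normal(LATENT_DIM), target)

# Several restarts; keep the reconstruction with the lowest loss.
runs = [project(target) for _ in range(3)]
best_loss, best_z = min(runs, key=lambda r: r[0])
print(f"loss at a random latent: {initial_loss:.4f}")
print(f"best loss after optimization: {best_loss:.6f}")
```

Only the latent vector is updated; the generator weights stay fixed throughout, which is what distinguishes projection from further training.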
To find the direction of crack propagation in the Generator’s latent space, a unit direction vector was constructed using two methods applied in sequence. First, a synthetic binary crack dataset was generated with the CrackGAN Generator, and its instances were manually labelled as ‘crack’ or ‘non-crack’. These samples were then used to train a Residual Network (ResNet50) Convolutional Neural Network (CNN) as a crack/non-crack classifier, as shown in Figure 9. ResNet50 is built from residual blocks whose skip connections act as bridges between layers, allowing information to flow more easily from bottom to top. With these connections the network learns “residual” functions; that is, each block learns what additional information needs to be added to its input to improve the output. The architecture of ResNet50 is divided into distinct stages: a conventional convolutional layer followed by four stages of residual blocks. Each stage contains multiple residual blocks, and the spatial dimensions of the data decrease while the number of filters increases from one stage to the next. This design gradually distils the essential features of cracks into a more compact but information-rich representation. ResNet50 is particularly effective for crack detection because its depth allows it to learn features at multiple levels of abstraction. Early layers may detect simple edges and textures, middle layers may combine these into crack segments and patterns, and deeper layers may capture complex crack characteristics such as branching patterns or crack network topology. The final stage of ResNet50 consists of an average pooling layer that consolidates the learned features, followed by a fully connected layer that produces the final classification output.
For crack detection, this output was a binary crack/no-crack decision accompanied by a confidence score. When used to analyze crack images, ResNet50 therefore did not give only a yes-or-no answer; it also provided confidence scores that indicated how certain it was about the presence and characteristics of the cracks. These scores carry valuable information about crack patterns and their progression. For instance, given two images of the same crack taken at different times, the change in confidence score indicates how the crack has evolved. The final stage of direction estimation required a model that generates a line of demarcation. To train a Support Vector Machine (SVM), paired vectors were created that combined two pieces of information: the latent space representations of crack images (from GAN inversion) and their corresponding ResNet50 confidence scores. These paired vectors formed the training data, where each pair represented a “before” and “after” state of crack progression. The SVM learned the relationship between these states and identified the consistent patterns by which cracks tended to grow. The SVM was considered particularly suitable for this task because of its ability to find optimal separating hyperplanes in high-dimensional spaces. It analyzed the patterns in the paired vectors and determined a unit vector that best represented the typical direction of crack progression. This directional unit vector serves as a “compass” in the latent space. When the propagation of a new crack had to be predicted, the vector was used to guide movement through the latent space, as represented in Figure 10. Taking steps in this direction with spherical interpolation, which maintains smooth transitions, generated a sequence of points that, when passed through the Generator, showed the predicted progression of the crack.
Figure 9. ResNet50 architecture.
Figure 10. Directional unit vector.
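The idea of reading a propagation direction off a separating hyperplane can be illustrated with a simplified Python sketch. Everything in it is an assumption made for illustration: the latent codes are synthetic, the “earlier” and “later” labels stand in for the ResNet50 confidence information, and the linear soft-margin SVM is fitted with a hand-written subgradient descent on the hinge loss rather than with the SVM implementation used in this study. The point is only that the normalized weight vector of a linear SVM is the unit normal of its hyperplane and can serve as a direction in latent space.

```python
import numpy as np

rng = np.random.default_rng(0)
LATENT_DIM, N = 100, 2000

# Hypothetical data: latent codes of "earlier" (-1) and "later" (+1) crack states
# that differ by a shift along a hidden direction.
true_dir = rng.standard_normal(LATENT_DIM)
true_dir /= np.linalg.norm(true_dir)
y = rng.choice([-1.0, 1.0], size=N)
Z = rng.standard_normal((N, LATENT_DIM)) + 2.0 * y[:, None] * true_dir

def train_linear_svm(X, y, lam=0.1, lr=0.1, epochs=300):
    """Soft-margin linear SVM fitted by full-batch subgradient descent on the hinge loss."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        active = margins < 1.0  # samples that violate the margin
        grad_w = lam * w - (y[active][:, None] * X[active]).sum(axis=0) / len(y)
        grad_b = -y[active].sum() / len(y)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

w, b = train_linear_svm(Z, y)
direction = w / np.linalg.norm(w)  # unit normal of the separating hyperplane
accuracy = np.mean(np.sign(Z @ w + b) == y)
cosine = float(direction @ true_dir)
print(f"train accuracy = {accuracy:.3f}, cosine with hidden direction = {cosine:.3f}")
```

In this synthetic setting the recovered unit normal can be compared with the hidden direction that generated the data; with real latent codes no such ground truth exists, so the direction has to be judged by the crack sequences it produces.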

Stage 4: Crack propagation via slerp
Spherical interpolation (slerp) was applied in CrackGAN’s latent space to determine subsequent crack propagation states. Slerp is a method for interpolating between two points on the surface of a hypersphere. It preserves the manifold structure of the hypersphere, ensuring that interpolated points remain equidistant from the origin. Here it was applied along the normal of the identified hyperplane. As described above, the latent space behaves as a high-dimensional sphere in which each point represents a possible crack pattern. To generate a sequence that shows how a crack might progress, the latent code was moved smoothly between different points on this sphere.
This is where spherical interpolation became crucial. When switching from one crack state to another in latent space, the natural curvature of the latent space should be followed, which ensured that all the generated intermediate crack patterns looked realistic and physically plausible. Mathematically, if two points that were lying on the surface of a unit sphere
These interpolated vectors
Figure 11. Iteration of propagations.
In the proposed framework, spherical interpolation generates intermediate crack states along a learned latent-space propagation direction. The interpolation step size influences the temporal resolution of the predicted crack evolution but not the overall propagation trend. Smaller angular steps produce smoother and more finely resolved intermediate crack states; larger steps result in coarser but qualitatively consistent propagation patterns. In this study, a moderate, fixed number of interpolation steps, chosen through empirical observation, was adopted to balance visualization smoothness and computational efficiency. A systematic sensitivity analysis with respect to interpolation density and angular step size is left as future work.
The directional guidance is necessary to ensure ordered and physically plausible crack state transitions in latent space. Without explicit propagation direction, latent traversal reduces to unguided interpolation, which may yield visually plausible yet non-monotonic variations of cracks that do not preserve the propagation direction, crack length evolution, or stability of the crack tip in a consistent manner. By learning a directional unit vector from ResNet-50 confidence evolution and SVM-based separation of sequential crack states, the proposed framework constrains latent traversal to follow a coherent crack growth trajectory. This directional mechanism is thus crucial for predictive consistency, as opposed to unconstrained image generation.
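A compact Python sketch of guided spherical interpolation is given below. It uses the standard slerp definition, slerp(z0, z1; t) = [sin((1 − t)Ω) z0 + sin(tΩ) z1] / sin Ω with Ω the angle between z0 and z1. The starting code, the unit direction, the displacement `alpha`, and the number of steps are hypothetical placeholders rather than values from this study, and the construction of the end point (a move along the direction followed by rescaling to the starting radius) is one plausible reading of the procedure, not a confirmed detail.

```python
import numpy as np

def slerp(z0, z1, t):
    """Spherical linear interpolation between two vectors of equal norm."""
    cos_omega = np.dot(z0, z1) / (np.linalg.norm(z0) * np.linalg.norm(z1))
    omega = np.arccos(np.clip(cos_omega, -1.0, 1.0))
    if np.isclose(omega, 0.0):
        return (1.0 - t) * z0 + t * z1  # nearly parallel vectors: fall back to lerp
    return (np.sin((1.0 - t) * omega) * z0 + np.sin(t * omega) * z1) / np.sin(omega)

rng = np.random.default_rng(0)
z_start = rng.standard_normal(100)      # stand-in for the projected latent code
direction = rng.standard_normal(100)
direction /= np.linalg.norm(direction)  # stand-in for the learned unit direction

# Hypothetical end point: move along the direction, then rescale to the starting radius.
alpha = 5.0
z_end = z_start + alpha * direction
z_end *= np.linalg.norm(z_start) / np.linalg.norm(z_end)

n_steps = 8  # more steps give a finer sequence along the same arc
path = [slerp(z_start, z_end, t) for t in np.linspace(0.0, 1.0, n_steps)]

radii = [float(np.linalg.norm(z)) for z in path]
progress = [float(z @ direction) for z in path]
print(np.round(radii, 4))
print(np.round(progress, 3))
```

Every point on the path keeps the same distance from the origin, and the component along the direction vector grows from step to step, which is the ordered, monotonic behaviour that unguided interpolation does not guarantee. Changing `n_steps` alters only how finely the same arc is sampled.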
Stage 5: Experimental validation
Structural Similarity Index Measure (SSIM) (Wang et al., 2004) and Learned Perceptual Image Patch Similarity (LPIPS) (Zhang et al., 2018) were selected as the criteria for evaluating the predicted propagation against the actual propagation. Unlike traditional metrics such as Mean Squared Error (MSE), which operate on a pixel-to-pixel basis, SSIM attempts to model image quality in a way that aligns with human visual perception. It does this by analyzing three key components: luminance, contrast, and structure. The SSIM calculation evaluates these components within local windows of the images being compared, typically of size 8 × 8 or 11 × 11 pixels. For each window, it computes the luminance comparison from the mean intensity, the contrast comparison from the standard deviation, and the structure comparison from the cross-correlation. These components are then combined multiplicatively to give the final SSIM score, which ranges from −1 to 1, where 1 indicates perfect structural similarity.
Mathematically, SSIM is calculated through equation (11).
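For reference, the standard single-window form given by Wang et al. (2004) is SSIM(x, y) = [(2 μ_x μ_y + C_1)(2 σ_xy + C_2)] / [(μ_x² + μ_y² + C_1)(σ_x² + σ_y² + C_2)], with C_1 = (k_1 L)² and C_2 = (k_2 L)², where L is the dynamic range of the pixel values. The Python sketch below evaluates this expression over non-overlapping 8 × 8 windows and averages the result. The non-overlapping uniform windows, the constants k_1 = 0.01 and k_2 = 0.03, and the toy masks are simplifying assumptions; library implementations typically slide a weighted window over every pixel, so their values will differ somewhat from this sketch.

```python
import numpy as np

def ssim_window(x, y, data_range=1.0, k1=0.01, k2=0.03):
    """SSIM of two equally sized windows (constants as in Wang et al., 2004)."""
    c1, c2 = (k1 * data_range) ** 2, (k2 * data_range) ** 2
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    return ((2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))

def mean_ssim(img_a, img_b, win=8):
    """Average SSIM over non-overlapping win x win windows (a simplification)."""
    h, w = img_a.shape
    scores = [ssim_window(img_a[i:i + win, j:j + win], img_b[i:i + win, j:j + win])
              for i in range(0, h - win + 1, win)
              for j in range(0, w - win + 1, win)]
    return float(np.mean(scores))

# Toy binary "crack" masks with values in [0, 1].
a = np.zeros((64, 64))
a[30:34, 8:40] = 1.0   # shorter crack
b = np.zeros((64, 64))
b[30:34, 8:56] = 1.0   # same crack, grown longer

print("SSIM(a, a):", mean_ssim(a, a))
print("SSIM(a, b):", mean_ssim(a, b))
```

Because most windows of a sparse crack mask are identical background, the average stays high even when the crack itself differs, which is worth keeping in mind when interpreting SSIM values on binary crack images.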
To evaluate the predicted crack propagation, images of cracks were captured on an RC slab tested in the laboratory under uniform load. Images before and after loading were captured from the top and bottom surfaces of the slab, as shown in Figure 12. A similar procedure was adopted for an RC beam, whose images were captured from the front and back, as shown in Figure 13. The observed crack propagation in the slab and beam was compared with the propagation generated by the proposed method, and the respective SSIM and LPIPS scores were measured, as shown in Figures 14 and 15.
Figure 12. Concrete slab testing: (a) unloaded top, (b) loaded top, (c) unloaded back, (d) loaded back.
Figure 13. Reinforced concrete beam testing: (a) front crack propagation 1, (b) front crack propagation 2, (c) back crack propagation 1, (d) back crack propagation 2.
Figure 14. Comparison of crack propagation in RC slab: actual versus predicted.
Figure 15. Comparison of crack propagation in RC beam: actual versus predicted.



As observed in Figures 14 and 15, for the slab, Top Crack 1 had an SSIM score of 0.9236 and an LPIPS score of 0.123, Top Crack 3 had an SSIM score of 0.9554 and an LPIPS score of 0.0613, Bottom Crack 3 had an SSIM score of 0.9388 and an LPIPS score of 0.1069, and Bottom Crack 4 had an SSIM score of 0.9266 and an LPIPS score of 0.1295. Similarly, for the beam, Front Crack 1 had an SSIM score of 0.9485 and an LPIPS score of 0.1014, Front Crack 2 had an SSIM score of 0.9677 and an LPIPS score of 0.0796, and Back Crack 2 had an SSIM score of 0.9417 and an LPIPS score of 0.1557. For all cracks, the SSIM scores are greater than 0.9 and the LPIPS scores are lower than 0.2, which supports the accuracy of the crack propagation predicted by the developed methodology. The results in Figures 14 and 15 were produced using the gradient-based latent space optimization approach. This approach was chosen because of its higher reconstruction fidelity and the more accurate alignment between real crack images and their corresponding latent representations, which is crucial for reliable prediction of crack propagation. Encoder-based inversion was considered as an alternative because it allows much faster inference; however, its reconstruction accuracy was significantly lower for crack patterns far from the dominant distribution of the Generator’s training data. Encoder-based inversion is therefore better suited to fast approximation, while gradient-based optimization gives higher accuracy at a much larger computational cost. A systematic quantitative comparison between the two approaches is left for future work, especially for applications that demand a trade-off between speed and reconstruction precision.
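The reported scores can be tabulated and checked against the stated thresholds with a few lines of Python. The values are copied from the text above; the dictionary layout and variable names are illustrative choices.

```python
# (SSIM, LPIPS) pairs as reported for each crack.
scores = {
    "Slab, Top Crack 1":    (0.9236, 0.1230),
    "Slab, Top Crack 3":    (0.9554, 0.0613),
    "Slab, Bottom Crack 3": (0.9388, 0.1069),
    "Slab, Bottom Crack 4": (0.9266, 0.1295),
    "Beam, Front Crack 1":  (0.9485, 0.1014),
    "Beam, Front Crack 2":  (0.9677, 0.0796),
    "Beam, Back Crack 2":   (0.9417, 0.1557),
}

ssim_vals = [s for s, _ in scores.values()]
lpips_vals = [l for _, l in scores.values()]

avg_ssim = sum(ssim_vals) / len(ssim_vals)
avg_lpips = sum(lpips_vals) / len(lpips_vals)

print(f"SSIM:  min = {min(ssim_vals):.4f}, mean = {avg_ssim:.4f}")
print(f"LPIPS: max = {max(lpips_vals):.4f}, mean = {avg_lpips:.4f}")
print("all SSIM > 0.9:", all(s > 0.9 for s in ssim_vals))
print("all LPIPS < 0.2:", all(l < 0.2 for l in lpips_vals))
```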
Physically interpretable crack geometry metrics
Crack geometry metrics.
Computational cost and efficiency analysis
Hardware configuration.
Training time and memory requirements.
Conclusion and recommendations
The methodology developed in this study for predicting the propagation of cracks represents a novel approach to assessing the structural health of infrastructure based on its current condition. The predictive visualization of crack propagation achieved through the integration of generative modeling provides a comprehensive understanding of how cracks may evolve over time, capturing variations in their path, length, and width. This capability is highly beneficial for developing data-driven infrastructure maintenance strategies and supports informed decision-making, reducing the reliance on speculative assessments of a structure’s future state. The methodology was validated through a detailed experimental study, which demonstrated its potential for real-world application in solving maintenance challenges within the civil infrastructure domain. However, the efficacy of deep learning (DL) algorithms, including those used in this research, is significantly influenced by two key factors: the volume and diversity of training data and the availability of computational resources.
This study utilized an existing dataset comprising approximately 8,200 crack images. While the results were promising, the accuracy and generalizability of the model could be further improved with larger, more diverse datasets, which would, in turn, require higher computational power. Moreover, the current study focused exclusively on the propagation of single, unbranched cracks. For future work, it is recommended to expand the model to include multiple classes of cracks such as branched, intersecting, or networked cracks for broader applicability in real-world scenarios. Additionally, latent space optimization techniques should be further refined to allow more manual control and semantic guidance during the prediction process, enabling more realistic and interpretable visualization of crack evolution.
In summary, this study lays the foundation for an intelligent, data-driven crack monitoring and forecasting system. With further improvements in data availability, model robustness, and latent space control, the proposed methodology has the potential to transform current infrastructure maintenance practices through predictive, non-invasive assessment strategies.
The quantitative evaluations demonstrate the robustness of the model in making high-fidelity predictions, with SSIM > 0.90 and an LPIPS perceptual distance < 0.20 across the slab and beam examples. These values indicate a good fit between the predictions and the crack propagation observed in the laboratory. The geometric checks support this finding as well: cracks are generated in a physically meaningful manner, with a monotonic increase in total crack length, a maximum width around the tip regions of the cracks, and crack tip directions that agree between the predicted and actual crack geometries. By concentrating on a single, unbranched crack with a fixed propagation direction, which serves as a qualitative indicator of agreement with the major tensile directions, the research maintains stable latent space learning at low computational expense and with a clear interpretation of the results.
The training dataset comprised approximately 8,200 crack images sourced from multiple databases, combining real and simulated images. The datasets from different sources were harmonized through consistent preprocessing, resolution normalization, and controlled sampling to mitigate domain bias. Empirically, the model demonstrated stable convergence and consistent predictive behavior across cracks originating from both real-world and synthetic datasets, as reflected by uniformly high SSIM and low LPIPS values during experimental validation. These observations suggest that the proposed CrackGAN framework is not overly sensitive to dataset source, but rather benefits from dataset diversity by learning a more generalizable latent representation of crack morphology. Future research could also explore systematic ablation and sensitivity analyses with additional datasets to assess the robustness, and thus the scalability, of the model.
Although the images came from various sources and therefore carried a significant domain bias, all images were normalized and sampling was controlled to reduce this bias and to ensure that the input space covers the domain variation expected in real-world deployment. During the experiments the model produced stable results regardless of the source of the cracks, which indicates that image variety within the dataset helps the model represent the expected image structure in its latent space.
All crack images were resized to a fixed 256 × 256 pixels. This maintains consistency between the Pix2Pix and CrackGAN model architectures and makes adversarial training feasible without overloading computing resources. Higher image resolutions represent more detailed crack features and longer spans; however, they require more memory and can destabilize GAN training. To avoid losing spatial accuracy, each crack was treated separately and scaled with the same normalization before processing, so that the most geometrically important traits remain intact for propagation modeling. Patch-based, multi-scale, or higher-resolution latent modeling are promising avenues for capturing longer or more intricate crack patterns in future work.
Future extensions of the developed framework will focus on modeling branching and interacting cracks by including graph-based crack representations and multi-path latent space traversal, which will enable prediction of the more complex crack topologies commonly observed in aged reinforced concrete structures. Future work can also investigate the sensitivity of latent-space crack propagation to interpolation density and angular step size to support adaptive resolution selection.
Footnotes
Acknowledgements
The support provided by the Department of Civil Engineering and Virtual Reality Center at NED University of Engineering & Technology, Karachi, Pakistan is duly acknowledged by the authors.
Ethical considerations
The authors confirm that neither this paper nor any part of it has been submitted or published elsewhere in any form. Human and/or animal subjects were not involved in the research reported here, and ethical approval was not needed.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was funded by the Higher Education Commission (HEC), Pakistan [CPEC-Project # 178].
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Data Availability Statement
The authors confirm that the data supporting the findings of this study are available and/or referenced within the article.
Declaration of generative AI and AI-assisted technologies in the writing process
No Generative AI and AI-assisted technologies have been used in the writing process.
