Abstract
To improve the quality of printed concrete structures, more refined and efficient detection methods are needed for construction monitoring. This paper proposes a target detection model for quantifying the extrudability and buildability of printed concrete. This model combines the squeeze-excitation attention mechanism with the YOLOv8 target detection model, thereby enhancing the target detection capability. The quantification of extrudability is achieved by detecting the number and size of two common defects in the concrete printing process: cracks and notches. The quantification of buildability is achieved by calculating the overall height deviation of concrete printing based on the height of the extrusion height detection box. Within the investigated case, detection results show that the proposed model improves the mean average precision (mAP) by about 0.15 compared to the original YOLOv8 model in the detection of cracks, notches, and extrusion height, reaching 0.94. Most inference times are under 39 milliseconds per image, demonstrating real-time detection capability. For extrudability, detection relative errors for notch widths within 1.5 mm are generally controlled within 10%. For buildability, underprinting and overprinting states can be determined based on the overall height deviation in concrete printing. The proposed method addresses two limitations of previous monitoring approaches for concrete 3D printing: low real-time performance and difficulty in quantifying the printing status.
Introduction
3D concrete printing (3DCP) is an additive manufacturing technology that can design and print complex geometric shapes of buildings (Paul et al., 2018). This technology offers advantages such as speed, cost-effectiveness, and environmental friendliness, and can achieve automated construction of buildings without formwork (Liu et al., 2022). Currently, the most common 3DCP method is layer-by-layer extrusion, which decomposes a three-dimensional model into multiple horizontal layers and extrudes concrete layer by layer to construct the structure (Nair et al., 2022). Recently, successful applications of 3DCP technology have included the construction of bridges, offices, and other buildings (Abou Yassin et al., 2020; Lim et al., 2018). However, as the scale of 3DCP applications expands, the demand for precision in 3DCP increases, and construction monitoring challenges become more severe (Rill-García et al., 2022).
Surface defects and deformations of 3DCP structures are usually triggered by factors like the rheological properties of concrete, printing parameters, and material elasticity, ultimately influencing the extrudability and buildability (Arunothayan et al., 2023). Extrudability is assessed by the continuity and level of defects on the surface of printed filaments. If rheological properties are substandard, it manifests as surface defects like cracks and tears; if the print parameters for the filament curvature radius are adjusted too small, it may also cause surface defects (Wan et al., 2022). These issues weaken structure durability and increase failure risks, such as material plastic yielding or structural elastic buckling (Chang et al., 2022b). Buildability is assessed by measuring the deformation in the print height direction. Parameters such as printing path, layer height, and filament width affect the dimensional accuracy of the final printed filament (Panda et al., 2019; Roussel et al., 2020). The gravity load of the upper concrete also causes compression deformation of the lower layer, which is more pronounced at high printing speeds when concrete elastic modulus growth is insufficient (Sanjayan et al., 2021). When 3DCP is applied to prefabricated component production, strict dimensional inspections are necessary. Therefore, during 3DCP, the extrudability and buildability of concrete structures should be monitored and evaluated in real time.
In recent years, the development of structural health monitoring (SHM) and construction process detection systems has become a focal point in research for important infrastructure fields such as dams, bridges, and buildings (Davtalab et al., 2022). In 3DCP, researchers employ various emerging technologies to implement automatic inspection systems both during and after printing. For instance, Kazemian et al. (2019) proposed a method based on image processing technology that uses edge detection algorithms to acquire the contour information of filaments in real time and measure their width precisely. Davtalab et al. (2022) applied deep convolutional neural networks for semantic segmentation to separate the background of concrete printing and inspect the printing defects using edge detection algorithms. Kazemian et al. (2021) discussed four real-time monitoring techniques for 3D printing: computer vision, power consumption measurement, extrusion force measurement, and resistivity measurement, among which computer vision is the most reliable and accurate technique. Shojaei Barjuei et al. (2022) monitored the width of concrete filaments printed by robotic arms in real time using edge detection algorithms and employed proportional integral controllers to control the width of the printed filaments. Chang et al. (2022) utilized U-net convolutional neural networks to establish the relationship between 3DCP crack morphology and microstructure, and predicted crack morphology and stress-crack width curves. Nair et al. (2022) used a three-dimensional scanner to obtain point clouds and quantified the degree of mismatch between concrete design and actual printing by morphological analysis. Rill-García et al. (2022) used U-VGG19 neural networks to capture 3DCP interlayer lines and calculate layer thicknesses, and used CNNs to classify dry and wet concrete textures.
The aforementioned methods can monitor the size and surface defects of concrete to evaluate its printing quality. A further summary shows that:
• Computer vision methods are reliable and accurate for 3DCP inspection.
• Detection mainly involves determining the width of the printed filament and recognizing defects and textures.
• Early computer vision methods for 3DCP inspection primarily relied on edge detection; later, applications based on artificial intelligence (AI) methods, such as image classification and semantic segmentation, became more prevalent.
To address the problem of defect localization and detection in concrete 3D printing, some methods have been proposed to roughly determine the defect region. However, accurately locating and quantifying defect sizes in real-time remains challenging, making it difficult to comprehensively fulfill the quantification of extrudability and buildability during the printing process. In recent years, target detection methods, as a deep learning-based AI approach, have been widely applied in various fields, such as face detection (Ming et al., 2022), bridge detection in remote sensing images (Sun et al., 2022), and road feature detection (Huang et al., 2022). These methods locate the position of the object within a rectangular bounding box and demonstrate strong adaptability and accuracy, with excellent performance in practical applications (Qiu et al., 2023). Therefore, target detection methods are expected to become a powerful tool for addressing the quantification issues of extrudability and buildability in 3DCP.
This paper proposes a target detection model embedded with an attention mechanism to achieve real-time detection of the extrudability and buildability of concrete printing. Based on the target detection method YOLOv8 (YOLOv8, 2023), this study introduces the squeeze-and-excitation (SE) attention mechanism (Hu et al., 2018) and proposes the SE-YOLOv8 model. This model can effectively utilize the weights of different channels in the feature map, enhance feature expression, and improve target detection performance. Then, the extrudability is quantified by the SE-YOLOv8 model, which detects the number of cracks and the width of the notches. The buildability is quantified by the overall printing height deviation calculated from the extrusion height detection box.
The detection framework for this paper is shown in Figure 1. The methods of the SE-YOLOv8 model and the model evaluation methods are introduced, and then the training data processing is explained. Subsequently, the model performance metrics are analyzed. Then, the extrudability and buildability of concrete printing are quantified. Finally, the main contributions of this study are summarized, and future improvements are proposed.
Figure 1. Detection framework.
Methodology
Proposed SE-YOLOv8 model
YOLOv8 model
The YOLOv8 model is an advanced target detection algorithm consisting of three parts. The first part is the backbone network, which extracts features from the input image. The second part is the neck, which fuses different levels of features to enhance feature expression. The third part is the head, which predicts the category and location of the target based on the fused features. The structure of the YOLOv8 model (without the attention part) is shown in Figure 2. The output dimension of adjacent modules is marked only once if it is the same. More details on the model structure are in MMYOLO’s documentation (2023).
Figure 2. Model architecture.
The YOLOv8 model has multi-scale resolution capability, processing different sizes of feature maps through three detection heads. The feature map size gradually decreases from 80 × 80 to 20 × 20 from top to bottom, as shown in Figure 2. This is conducive to detecting various sizes of targets. However, as the feature map size decreases, the channel numbers increase and the YOLOv8 model lacks a mechanism to effectively utilize these rich features. To address this problem, this paper proposes the SE-YOLOv8 model, which can fully exploit channel feature information.
SE-YOLOv8 model
To effectively utilize the feature information obtained from visual resources, the channel features are weighted according to the principle of the attention mechanism. This paper proposes the SE-YOLOv8 model, which incorporates the SE mechanism into the YOLOv8 model. The SE attention mechanism is an approach that introduces the attention mechanism in the channel dimension, which can adaptively adjust the feature response of each channel and effectively enhance feature representation ability and classification performance (Hu et al., 2018).
The mechanism principle is shown in Figure 3, which includes two main operations: squeeze and excitation. The squeeze operation compresses the H × W dimension of the input feature map to the 1 × 1 dimension using global average pooling. The excitation operation generates weights for each channel, using 2D convolution and the sigmoid activation function to normalize channels into weight coefficients between 0 and 1. Finally, the weight coefficients are multiplied by the original input to obtain the feature map with channel weighting.
Figure 3. The SE attention mechanism.
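The squeeze, excitation, and scaling steps can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: it assumes the two-layer bottleneck (reduce, ReLU, expand, sigmoid) of the original SE design by Hu et al. (2018), omits biases, and treats a 1 × 1 convolution on the pooled 1 × 1 map as a matrix product. The function name `se_block` and the weight matrices `w_reduce` and `w_expand` are hypothetical.

```python
import math


def se_block(feature_map, w_reduce, w_expand):
    """Apply squeeze-and-excitation to a feature map stored as [C][H][W] nested lists.

    w_reduce ([C_r][C]) and w_expand ([C][C_r]) are hypothetical weight matrices;
    on a 1 x 1 pooled map, a 1 x 1 convolution reduces to this matrix product.
    """
    # Squeeze: global average pooling collapses each H x W channel to one number.
    squeezed = [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]
    # Excitation, step 1: reduce the channel dimension and apply ReLU.
    hidden = [
        max(0.0, sum(w * z for w, z in zip(row, squeezed))) for row in w_reduce
    ]
    # Excitation, step 2: restore the channel dimension and squash to (0, 1).
    weights = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w_expand
    ]
    # Scale: multiply every element of a channel by that channel's weight.
    scaled = [
        [[weights[c] * value for value in row] for row in channel]
        for c, channel in enumerate(feature_map)
    ]
    return scaled, weights
```

The output has the same [C][H][W] shape as the input, which is why such a block can be placed between two stages of a network without changing the dimensions passed between them.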
This paper adds the SE attention mechanism between the head and neck parts, as shown in the attention part of Figure 2. The SE mechanism does not change the dimension of the data passed from the neck part to the head part, only the weights of channels. Therefore, feature maps with 128, 256, and 768 channels are processed by the SE attention mechanism, effectively extracting channel features.
Loss function
The loss function is key to optimizing neural networks because it guides the direction of parameter optimization. In both the YOLOv8 and SE-YOLOv8 models, it comprises two parts: classification and regression. The classification part aims to differentiate categories and adopts binary cross-entropy loss as its loss function. Its formula is as follows:
The respective loss is defined by the sigmoid operator σ as:
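For reference, the standard form of binary cross-entropy on raw network outputs can be written as below. The symbols are assumptions introduced for illustration and may differ from the authors' notation: x_n is the raw score (logit) for sample n, y_n is its target label, N is the number of samples, and the mean over samples is one common reduction.

```latex
\begin{aligned}
L_{\mathrm{cls}} &= \frac{1}{N}\sum_{n=1}^{N} l_n, \\
l_n &= -\left[\, y_n \log \sigma(x_n) + (1 - y_n)\log\bigl(1 - \sigma(x_n)\bigr) \right], \\
\sigma(x) &= \frac{1}{1 + e^{-x}}
\end{aligned}
```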
The regression part uses two loss functions: distribution focal loss (DFL) and complete IoU (CIoU) loss. DFL represents the position of the bounding box as a general distribution, enabling the network to focus on candidate regions close to the real bounding box and thereby improving regression accuracy. Specifically, DFL achieves this by increasing the probability of
Parameters
CIoU loss considers not only the intersection over union (IoU) between the predicted box and the ground truth (GT) box but also introduces two geometric factors: center distance and aspect ratio, thereby improving the localization accuracy of the bounding box. Figure 4 shows the calculation method of IoU, which is the intersection area of two rectangular boxes divided by the union area.
Figure 4. IoU calculation for two bounding boxes.
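The IoU calculation described above can be sketched as follows. The corner format `(x1, y1, x2, y2)` and the function name `iou` are assumptions made for this illustration.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    # Corners of the overlapping rectangle (if the boxes overlap at all).
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # A negative extent means no overlap, so clamp it to zero.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

For two 2 × 2 boxes offset by one unit in each direction, the intersection is 1 and the union is 7, so the IoU is 1/7.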
The formula of CIoU loss is (Zheng et al., 2020):
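As defined by Zheng et al. (2020), the CIoU loss can be written as below. The symbol names follow that reference and are assumed here rather than taken from this text: b and b^{gt} are the centers of the predicted and GT boxes, ρ is the Euclidean distance between them, c is the diagonal length of the smallest box enclosing both, and w, h are box width and height.

```latex
\begin{aligned}
L_{\mathrm{CIoU}} &= 1 - \mathrm{IoU} + \frac{\rho^{2}(b, b^{gt})}{c^{2}} + \alpha v, \\
v &= \frac{4}{\pi^{2}}\left(\arctan\frac{w^{gt}}{h^{gt}} - \arctan\frac{w}{h}\right)^{2}, \\
\alpha &= \frac{v}{(1 - \mathrm{IoU}) + v}
\end{aligned}
```

The second term penalizes the center distance and the third term penalizes aspect-ratio mismatch, which are the two geometric factors named above.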
The final loss function is obtained by weighted summation of the above three losses according to a certain weight ratio. In this paper, weight values recommended by the YOLOv8 model were set to balance the influence degree of each task, as follows: the weights of
Evaluation indicators
The evaluation metrics for model prediction commonly include precision, recall, and the trade-off metric of average precision (AP). The basic elements of these indicators are as follows:
• True Positive (TP): Positive samples correctly classified as positive.
• True Negative (TN): Negative samples correctly classified as negative.
• False Positive (FP): Negative samples incorrectly classified as positive (false alarm).
• False Negative (FN): Positive samples incorrectly classified as negative (missed detection).
Precision indicates the percentage of correctly predicted positives in the total predicted positives, measuring the overall accuracy of the prediction. The formula is as follows:
$$\mathrm{Precision} = \frac{TP}{TP + FP}$$
Recall reflects the proportion of GT boxes that the algorithm correctly detects. The formula for recall is as follows:
$$\mathrm{Recall} = \frac{TP}{TP + FN}$$
The higher the recall, the less likely it is for the model to miss GT boxes. However, there is often a trade-off between recall and precision; that is, improving recall may reduce precision, and vice versa. Therefore, these two indicators need to be considered comprehensively, and AP is introduced as a basis for balancing them. AP is the area under the precision-recall (PR) curve. The PR curve is drawn as follows: Change the confidence threshold from 1 to 0, calculate the precision and recall corresponding to each confidence threshold, and plot points and lines with recall as the horizontal axis and precision as the vertical axis.
According to the PR curve, the formula for the AP indicator is:
$$\mathrm{AP} = \int_{0}^{1} P(R)\,\mathrm{d}R$$
where P(R) is the precision at recall R.
The mean average precision (mAP) indicator is the average of the APs of all categories. The formula is as follows:
$$\mathrm{mAP} = \frac{1}{N}\sum_{i=1}^{N} \mathrm{AP}_i$$
where N is the number of categories and AP_i is the AP of category i.
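The procedure for building the PR curve and integrating it can be sketched as follows. This is a simplified illustration: it sweeps the confidence threshold by ranking detections from high to low confidence and sums the area with a rectangle rule, whereas common evaluation toolkits use interpolated variants of AP. The function names and the input format (one confidence score and one true-positive flag per detection, plus the number of GT boxes) are assumptions.

```python
def average_precision(scores, is_true_positive, num_gt):
    """Approximate the area under the precision-recall curve for one category."""
    if num_gt == 0:
        return 0.0
    # Lowering the confidence threshold admits detections in order of confidence.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = fp = 0
    ap = 0.0
    previous_recall = 0.0
    for i in order:
        if is_true_positive[i]:
            tp += 1
        else:
            fp += 1
        precision = tp / (tp + fp)
        recall = tp / num_gt
        # Rectangle rule: precision times the recall gained at this threshold.
        ap += precision * (recall - previous_recall)
        previous_recall = recall
    return ap


def mean_average_precision(ap_per_category):
    """Average the per-category AP values."""
    return sum(ap_per_category) / len(ap_per_category)
```

For example, three detections with confidences 0.9, 0.8, and 0.7, of which the first and third are correct, evaluated against two GT boxes give an AP of 0.5 + 1/3.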
Results processing
In the target detection task, two situations are commonly encountered: (1) some generated bounding boxes have poor quality; and (2) some targets are covered by multiple prediction boxes. The former is typically addressed by setting a confidence threshold to filter out prediction boxes with low confidence, and the latter by applying the non-maximum suppression (NMS) algorithm, as shown in Figure 5. The basic idea of this algorithm is: for multiple prediction boxes in the same category, keep the one with the highest confidence and remove other prediction boxes that highly overlap with it. The specific steps are as follows:
(1) The bounding box with the highest confidence in a set of boxes A of a certain category is selected, removed from set A, and retained in set B.
(2) The IoU between this box and each remaining box in set A is calculated.
(3) Any box whose IoU exceeds the set IoU threshold is deleted from set A.
(4) The above steps are repeated until set A is empty.
Figure 5. Redundant bounding box filtering process.

The final result after processing consists of the target detection boxes of set B, whose number is affected by the IoU threshold set in the NMS algorithm. The lower the threshold, the fewer the results retained.
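The confidence filtering and NMS steps for a single category can be sketched as follows. The default thresholds of 0.35 and 0.45 are the values reported later in this paper; the function names and the `(x1, y1, x2, y2)` box format are assumptions for illustration.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, conf_threshold=0.35, iou_threshold=0.45):
    """Return the indices of the boxes kept for one category (set B)."""
    # Set A: boxes that pass the confidence filter, highest confidence first.
    set_a = sorted(
        (i for i in range(len(boxes)) if scores[i] >= conf_threshold),
        key=lambda i: scores[i],
        reverse=True,
    )
    set_b = []
    while set_a:
        best = set_a.pop(0)  # step 1: move the most confident box to set B
        set_b.append(best)
        # Steps 2 and 3: drop boxes that overlap the kept box too strongly.
        set_a = [i for i in set_a if iou(boxes[best], boxes[i]) <= iou_threshold]
    return set_b
```

Because a box is deleted only when its IoU with a kept box exceeds the threshold, lowering the threshold deletes more boxes and leaves fewer results in set B.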
Data processing
The image acquisition device is shown in Figure 6, where the mounting bracket is fixed on the printing nozzle and follows the movement of the printing nozzle. The capture area includes the printing nozzle and several layers of print filaments below. The captured images used are stored on GitHub as a public 3DCP dataset (3DCP CV Monitoring, 2022). The details of the experimental setup and concrete material can be found in the reference (Rill-García et al., 2022). The dataset consists of 627 images of concrete surfaces taken under different extrusion conditions, with high complexity and diversity. Here, 60 images were randomly extracted from the dataset and manually annotated with bounding boxes for the three target types: concrete crack, notch, and extrusion, using LabelImg software (LabelImg, 2018).
Figure 6. Photo shooting scene.
Detection targets
The detection targets in the 3DCP construction process are concrete cracks, notches, and extrusion, as shown in Figure 7. Cracks and notches are two common defects on the concrete surface. Cracks can occur anywhere in the concrete layer, whereas notches only occur on the upper part of the concrete layer. They have different geometric characteristics: cracks are nearly linear and have a darker gap color, while notches are V-shaped and taper from top to bottom. The identification of cracks is easily disturbed by the complex and variable texture of the dry zone of concrete. Therefore, during manual annotation, only those cracks that were clearly distinguishable from textures were selected as targets, so that erroneous or uncertain information was kept out of the detection model as far as possible.
Figure 7. Detection targets: cracks, notches, and extrusion.
The extrusion target covers the printing nozzle during the extruding and printing operation, which has obvious geometric features. Figure 8 illustrates these geometric features of the extrusion: the two red circles indicate the lower left and upper right corner points of the extrusion box. The lower left corner point is the most recent point of contact between the current printed layer and the previous layer. The upper right corner point is where the printing nozzle and the extruded concrete on the printing direction side intersect at the edge.
Figure 8. Geometric characteristics of extruding.
Data augmentation
The quality of target detection is affected by factors such as illumination and camera shake. To mitigate these issues, data augmentation methods can be adopted. Data augmentation is a common approach for enhancing the performance of deep learning models. It transforms the original training data to increase the diversity and scale of the dataset. Data augmentation can effectively decrease the model’s reliance on irrelevant features, strengthen the model’s ability to identify key features, mitigate the overfitting problem caused by data scarcity, and improve the model’s generalizability. Data augmentation is widely studied and applied in computer vision. This paper employs several effective image data augmentation methods to process the 3DCP dataset.
Image HSV model
The HSV color model converts the RGB color space into three components: hue, saturation, and value. In this paper, during training, the hue, saturation, and value of each image are randomly altered within ±10%, ±20%, and ±40%, respectively. Through this approach, the model’s dependence on the color of images is reduced.
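A per-pixel sketch of this HSV perturbation is given below, using only the Python standard library. It assumes that "within ±10%, ±20%, and ±40%" means each component is multiplied by a random factor in [1 − gain, 1 + gain]; the text does not state whether the change is multiplicative or additive. The function name `jitter_hsv` and its parameters are hypothetical, and a real training pipeline would apply the same random factors to a whole image rather than to one pixel.

```python
import colorsys
import random


def jitter_hsv(rgb, h_gain=0.10, s_gain=0.20, v_gain=0.40, rng=random):
    """Randomly perturb one 8-bit RGB pixel in HSV space and return 8-bit RGB."""
    r, g, b = (channel / 255.0 for channel in rgb)
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    # Assumed interpretation: scale each component by a factor in [1 - gain, 1 + gain].
    h = (h * (1.0 + rng.uniform(-h_gain, h_gain))) % 1.0  # hue wraps around
    s = min(1.0, max(0.0, s * (1.0 + rng.uniform(-s_gain, s_gain))))
    v = min(1.0, max(0.0, v * (1.0 + rng.uniform(-v_gain, v_gain))))
    return tuple(round(c * 255.0) for c in colorsys.hsv_to_rgb(h, s, v))
```

Passing a seeded generator, for example `jitter_hsv((200, 100, 50), rng=random.Random(0))`, makes the perturbation reproducible.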
Scaling and translation
Scaling can increase the model’s recognition ability for objects of different scales, and translation can improve the model’s ability to recognize objects in different locations. In this paper, during training, each image was randomly scaled and translated within ±10%, respectively. This mitigated the effect of object displacement due to camera shake during printing.
Label smoothing
Label smoothing is a regularization strategy that softens the original labels, rendering the class probabilities in classification tasks no longer 0 or 1 but values approaching 0 or 1. The goal is to decrease the impact of the true sample label category when computing the loss function, thereby suppressing overfitting and effectively enhancing the model’s generalizability and robustness (Zhang et al., 2021). The category probabilities without label smoothing are:
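As an illustration of the idea, a widely used form of label smoothing replaces a one-hot label y with (1 − ε)·y + ε/K, where K is the number of categories. The sketch below assumes this form; the smoothing factor ε = 0.1 is a hypothetical value, and the exact formulation and value used in this paper may differ.

```python
def smooth_labels(one_hot, epsilon=0.1):
    """Soften a one-hot label vector so that no probability is exactly 0 or 1."""
    num_classes = len(one_hot)
    # Move a share epsilon of the probability mass to a uniform distribution.
    return [(1.0 - epsilon) * y + epsilon / num_classes for y in one_hot]
```

With three categories and ε = 0.1, the label [0, 1, 0] becomes approximately [0.033, 0.933, 0.033], which still sums to 1.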
Mosaic
This approach is to concatenate four images into a new image by randomly cropping, scaling, and arranging them, and then adjusting the corresponding GT box coordinates according to the positions of the four images post-concatenation. Mosaic offers several benefits: it enables the model to adapt to various detection backgrounds, handle objects with different scales and locations, achieve better convergence with a smaller batch size, and improve the performance of small object detection (Chen et al., 2022).
Mixup
Mixup employs linear interpolation to generate new images and adjust label information. It combines two images based on randomly sampled weight ratios and adjusts the bounding box locations and class probabilities accordingly. Its formula is as follows (Liang et al., 2018):
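In its standard form, the linear interpolation used by mixup can be written as below. The notation is assumed for illustration: (x_i, y_i) and (x_j, y_j) are two training images with their labels, and λ is the randomly sampled mixing weight. For detection tasks, the bounding boxes of both images are typically kept in the mixed image.

```latex
\begin{aligned}
\tilde{x} &= \lambda x_i + (1 - \lambda)\, x_j, \\
\tilde{y} &= \lambda y_i + (1 - \lambda)\, y_j, \qquad \lambda \in [0, 1]
\end{aligned}
```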
Computational setup
The computing environment of this paper comprised a computer with an AMD Ryzen 7 5800H processor, an NVIDIA GeForce RTX 3060 GPU, and 16 GB of system memory, running the Windows 10 operating system. GPU acceleration was utilized for both training and testing, and the PyTorch software (PyTorch, 2022) was employed as the development platform.
This study chose the YOLOv8s model variant for experiments. YOLOv8 has multiple variants, including YOLOv8n, YOLOv8s, and YOLOv8x, which differ in parameter size and output speed. The parameter size increases and the output speed decreases in turn. These variants have similar structures, but they adjust the channel number and bottleneck layer number of the model to control the parameter size and depth of the model. During training, the number of iterations was set to 600, the batch size was set to 4, the optimizer was selected as Adam, and the learning rate was set to
Model training
The experimental dataset consisted of 60 images, divided into training, validation, and test sets in an 8:1:1 ratio. The model used the training set to optimize its parameters, the validation set to adjust its hyperparameters, and the test set to evaluate its generalization ability on unknown data.
To obtain the optimal training model, the recall for the validation set was calculated at each iteration during training, and the model with the highest recall was selected as the training result. Recall is the ratio of positive examples correctly identified by the model to all positive examples, reflecting the model’s ability to retrieve information comprehensively. Compared with precision, which represents the ability to check accurately, this paper prioritizes the ability to retrieve defects and aims for the model to find as many positive samples as possible. The method for saving the model with the highest recall rate was as follows: after each iteration, the current model parameters were applied to detect the validation set and calculate the corresponding recall. Then, the current recall was compared with the previously saved maximum recall, and if the current recall was higher, the maximum recall was updated and the current model parameters were saved. In this way, when the training was finished, a model with the highest recall on the validation set was obtained as the training result.
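The selection rule described above can be sketched as follows. The function name and the input format, a sequence of (model parameters, validation recall) pairs with one pair per iteration, are assumptions; in practice the parameters would be written to disk whenever the maximum is updated.

```python
def select_best_checkpoint(history):
    """Return (params, recall) for the iteration with the highest validation recall.

    history is an iterable of (params, recall) pairs, one per training iteration.
    """
    best_params, best_recall = None, float("-inf")
    for params, recall in history:
        # Strict comparison keeps the earliest iteration when recalls tie.
        if recall > best_recall:
            best_params, best_recall = params, recall
    return best_params, best_recall
```

Tracking only the running maximum means a single set of parameters needs to be stored at any time, regardless of the number of iterations.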
Following the principle of maximal recall without redundant bounding boxes, the result-processing hyperparameters were tuned on the validation set. The key parameters are the confidence threshold, set to 0.35, and the IoU threshold, set to 0.45. The same parameters are used for the test set.
Model performance
Model indicators
To assess the influence of the channel attention mechanism on model performance, this section initially compares the loss changes during the training process of models with and without this mechanism, then calculates the recall, precision, and other indicators of both models on validation and test sets, demonstrating the advantages of the new model in enhancing performance.
Figure 9 illustrates the loss fluctuations of models with and without the channel attention mechanism during the training process. The upper three labels signify the YOLOv8 model without the channel attention mechanism, and the lower three labels represent the SE-YOLOv8 model with the channel attention mechanism. It can be found that: (1) A comparison of (2) A comparison of (3) A comparison of
Figure 9. Comparison of training loss between YOLOv8 and SE-YOLOv8 models.

This indicates that the model with the channel attention mechanism is superior to the original model in three kinds of losses, with the most significant enhancement observed in the classification loss.
The next step is the computation of the model indicators. Figure 10 shows the comparison of indicators between the YOLOv8 model and the SE-YOLOv8 model under different data sets. The upper four labels signify the AP of the original YOLOv8 model under different categories, and the lower four labels represent the AP of the SE-YOLOv8 model under different categories.
Figure 10. Model indicators on different data sets. (a) Validation set. (b) Test set.
Figure 10(a) illustrates the comparison of the two models on the validation set. The AP values of all categories except the extruding category improved, with the notch category showing the largest increase, 0.372. Furthermore, the SE-YOLOv8 model achieves a higher mAP than the original YOLOv8 model on this dataset, by 0.154. These results indicate that the SE-YOLOv8 model can effectively improve the overall performance of object detection. However, the contribution of the SE-YOLOv8 model to reducing the training loss is not substantial, as the channel attention mechanism increases the complexity and computation of the model, making the training process more difficult.
The comparison of different models on the test set is illustrated in Figure 10(b). Similar to the validation set, the SE-YOLOv8 model has the largest enhancement in AP for notches, increasing by 0.327. However, the AP for cracks has slightly declined, from 0.863 to 0.830. This is because the precision of the SE-YOLOv8 model is lower than that of the YOLOv8 model when recall is between 0.3 and 0.7, resulting in a decrease in the area under the PR curve. But when recall is between 0.7 and 1, the precision of the SE-YOLOv8 model is higher than that of the YOLOv8 model; in this case, the performance of the SE-YOLOv8 model is still superior to that of the YOLOv8 model. Overall, the SE-YOLOv8 model has a significant improvement in AP for most categories, particularly for notches. This indicates that the SE-YOLOv8 model can effectively utilize the attention mechanism to improve target detection ability.
Analysis of misses and false alarms in detection
The AP metric can assess the performance of a model in a general manner. However, it fails to indicate missed detections and false alarms. These errors can be examined with the confusion matrix. Figure 11 shows the confusion matrices for the YOLOv8 model and the SE-YOLOv8 model on the test set. The values on the predicted background represent missed detections; the values on the main diagonal represent recall; other values represent false alarms. The values for each true category sum to 1.
Figure 11. Confusion matrices for detected targets using different models. (a) YOLOv8. (b) SE-YOLOv8.
As shown in Figure 11(a), the YOLOv8 model produces missed detections for the extruding target, false alarms for the crack target, and both types of errors for the notch target. The extruding target exhibits the highest rate of missed detections. Figure 11(b) shows that the SE-YOLOv8 model produces only false alarms for the crack target among all targets. Neither model generates inter-category false alarms. Comparison shows that the SE-YOLOv8 model can effectively decrease both missed detections and false alarms.
Inference time
This paper evaluates the responsiveness of the proposed model in real-time object detection tasks by analyzing the 627 images in the dataset and recording the inference time of each image. To display the distribution of inference time more intuitively, this paper adopted a violin plot. The violin plot combines box and kernel density plots, displaying statistical data and illustrating the overall distribution of data, especially for multi-peak distribution data. In Figure 12, the blue violin plot on the left shows the distribution of inference time, and the orange violin plot on the right shows the distribution of frames per second (FPS). FPS is calculated as 1000 divided by the inference time in milliseconds.
Figure 12. Inference time distribution.
Figure 12 presents the quartiles, delineated by a middle dashed line, as well as upper and lower straight lines. The wider parts of the figure denote higher probabilities for detection time to fall within those intervals. Overall, the inference time and FPS have a multi-peak distribution, not the common Gaussian distribution. The left half of the figure reveals that the widest part is around the 75th percentile, which is 39 ms, corresponding to the widest part of the right half of the figure at 25 FPS. This indicates that most images are processed at 25 FPS or faster. Therefore, this model has a fast detection speed and can satisfy the requirements of real-time monitoring.
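The relationship between inference time and FPS, and the share of images that meet a frame-rate target, can be checked with a short sketch. The function names are hypothetical, and the 25 FPS default simply mirrors the value discussed above.

```python
def fps_from_inference_time(time_ms):
    """Convert a per-image inference time in milliseconds to frames per second."""
    return 1000.0 / time_ms


def share_at_or_above(times_ms, fps_target=25.0):
    """Fraction of images whose inference speed reaches the target frame rate."""
    fast = sum(1 for t in times_ms if fps_from_inference_time(t) >= fps_target)
    return fast / len(times_ms)
```

An inference time of 40 ms corresponds to exactly 25 FPS, so the reported 39 ms lies slightly above that rate.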
Extrudability quantification results
The extrudability of concrete printing is quantified in this section by utilizing the number of cracks and width of notches based on the information from bounding boxes. The number of cracks is equal to the number of corresponding bounding boxes, and the width of the notch is equal to the width of the corresponding bounding boxes. Given that the conversion ratio of image pixels to sizes is 40 px/mm, this enables the conversion of pixel numbers from these bounding boxes to actual sizes.
The detection of cracks and notches is shown in Figure 13. Blue bounding boxes mark cracks, and red bounding boxes mark notches. The number next to the category label of each bounding box denotes its confidence value; the number adjacent to the right side of each red bounding box represents its width. The image contains 23 blue boxes, so 23 cracks are counted. The width values next to the red bounding boxes range from 0.7 mm to 2.4 mm. These are the quantifiable indicators of the extrudability of concrete printing.
Figure 13. Detection of notch width.
The model may generate a bounding box with a position deviation, as the notch’s geometric features are unclear. This paper evaluates the accuracy of notch width detection by manually measuring the pixel number and converting it to size. The notch width in Figure 13 was used for this purpose. Figure 14 shows the relative error of the measurement results. It can be seen that if the notch width is less than 1.5 mm, the relative error essentially stays within 10%. However, when the notch width is greater than 1.5 mm, the maximum relative error reaches nearly 30%. The error is larger in the latter case because larger notches have more complex shapes and unclear boundaries, which makes accurate measurement challenging. Therefore, the model effectively detects notches with a width of 1.5 mm or less.
Figure 14. Notch width detection error.
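The conversion from bounding boxes to the extrudability indicators, together with the relative error used above, can be sketched as follows. The 40 px/mm ratio is the value stated in the text; the function names, the category labels "crack" and "notch", and the `(x1, y1, x2, y2)` pixel box format are assumptions for illustration.

```python
PX_PER_MM = 40.0  # pixel-to-size conversion ratio stated in the text


def box_width_mm(box, px_per_mm=PX_PER_MM):
    """Width in millimetres of a box given as (x1, y1, x2, y2) in pixels."""
    x1, _, x2, _ = box
    return (x2 - x1) / px_per_mm


def quantify_extrudability(detections):
    """Return (number of cracks, list of notch widths in mm).

    detections is a list of (label, box) pairs produced by the detector.
    """
    crack_count = sum(1 for label, _ in detections if label == "crack")
    notch_widths = [box_width_mm(box) for label, box in detections if label == "notch"]
    return crack_count, notch_widths


def relative_error(detected_mm, measured_mm):
    """Relative error of a detected width against a manual measurement."""
    return abs(detected_mm - measured_mm) / measured_mm
```

For instance, a notch box spanning 60 pixels horizontally corresponds to a width of 1.5 mm.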
Buildability quantification results
The overall height deviation of concrete printing can be conveniently monitored by measuring the height of the extruding target box. The top of the extruding target box aligns with the extrusion nozzle, and the bottom aligns with the interlayer line of the topmost concrete layer, as shown in Figure 15. Figure 15(a) shows an ideal model without settlement, where the height of the target box is the same as the design printing height. The design printing height is the design distance between the nozzle and the bottom of the current printing layer. Figure 15(b) shows a situation where a height deviation exists. Since the printing head height remains unchanged, the deviation occurs at the bottom of the target box, so the height of the target box equals the sum of the design printing height and the height deviation.
Figure 15. Schematic of print height deviation. (a) Ideal model. (b) Model with height deviation.
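Because the height of the target box equals the design printing height plus the height deviation, the printing height deviation is calculated as:
$$\Delta h = h_{\mathrm{box}} - h_{\mathrm{design}} \tag{16}$$
where $\Delta h$ is the printing height deviation, $h_{\mathrm{box}}$ is the detected height of the extruding target box, and $h_{\mathrm{design}}$ is the design printing height.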
According to the dataset information, the expected height of each printed filament layer was 6 mm. However, the actual layer height ranged from 4.5 mm to 7.5 mm because the printing parameters were intentionally adjusted, which caused a deviation in the overall printing height. This deviation can be quantified from the height of the extruding bounding box, as presented in Figure 16. The bounding box height, shown to the right of the box, is given in mm. Using equation (16), the printing height deviation calculated from Figure 16(a) is −0.12 mm; the negative value indicates that the actual printing height exceeds the design value. The deviation calculated from Figure 16(b) is 0.13 mm; the positive value indicates that the actual printing height falls short of the design value.
Figure 16. Print height deviation detection. (a) Under-printing. (b) Over-printing.
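The sign convention used in this section can be sketched in Python as follows. The sketch assumes, as stated above, that the target box height equals the design printing height plus the deviation. The function names, the optional `tolerance_mm` dead band, and the 10.0 mm design height in the example are hypothetical; only the deviations of −0.12 mm and 0.13 mm come from the text.

```python
def height_deviation(box_height_mm, design_height_mm):
    """Deviation = detected target-box height minus design printing height."""
    return box_height_mm - design_height_mm


def printing_state(deviation_mm, tolerance_mm=0.0):
    """Interpret the sign of the deviation as described in the text.

    A negative deviation means the box is shorter than designed, so the
    printed material sits higher than the design value; a positive
    deviation means the printed height falls short of the design value.
    tolerance_mm is a hypothetical dead band, not part of the paper.
    """
    if deviation_mm < -tolerance_mm:
        return "actual height exceeds design"
    if deviation_mm > tolerance_mm:
        return "actual height falls short of design"
    return "on target"


# Hypothetical design height of 10.0 mm, chosen only so that the example
# reproduces the deviations reported for Figure 16(a) and 16(b).
design_mm = 10.0
for box_mm in (9.88, 10.13):
    dev = height_deviation(box_mm, design_mm)
    print(f"box {box_mm:.2f} mm -> deviation {dev:+.2f} mm: {printing_state(dev)}")
```

A small nonzero tolerance would keep detection noise of a few hundredths of a millimetre from being reported as a printing fault.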
Conclusions and future work
This study proposes an SE-YOLOv8 target detection model that adds the SE channel attention mechanism to YOLOv8. The model monitors the extrudability of concrete printing by detecting two types of defects, cracks and notches, and quantifies buildability by calculating the printing height deviation. Both quantifications are derived from the number and size of the bounding boxes generated by object detection, and both run in real time. The main conclusions of the case study are as follows: (1) The proposed SE-YOLOv8 model exhibited high detection performance, with an mAP of about 0.94. Compared with the original YOLOv8 model, it improved mAP by approximately 0.15, with the largest AP gain for notches, and it reduced missed detections and false alarms. (2) The proposed model demonstrates real-time recognition capability: the recognition time follows a multi-peak distribution, and most images are processed at more than 25 FPS. (3) For extrudability quantification, the relative error of width detection is mostly within approximately 10% when the notch width is less than 1.5 mm, suggesting satisfactory detection performance. (4) The buildability quantification method can calculate the total height deviation of all the concrete layers below the pre-printed layer, enabling the assessment of whether the actual overall height of the printed concrete is lower or higher than the design value.
The case study demonstrates that the proposed method enables real-time recording of the size and position of product defects. This record can serve as a quality report both during and after printing, providing evidence for product acceptance or rejection based on the specific printing quality. Depending on the severity of the defects, the operator can also correct or stop printing before completion to avoid material waste.
Future research could extend in two directions: quantitative rating and control. For quantitative rating, future work could explore how to define print quality grades based on the number of cracks, notch width, and printing height deviation. While this study quantifies extrudability and buildability using these indicators, it does not establish corresponding quality evaluation criteria. A feasible approach might be to establish a relationship model between these indicators and structural quality by combining the mechanical properties and failure modes of 3DPC. This would provide a basis for 3DPC printing quality evaluation and early warning.
In terms of print quality control, future studies can investigate closed-loop control driven by printing size deviation. Building on the dimension deviation detection in this study, a control algorithm that uses the size deviation as feedback could adjust the printing parameters in real time and thereby ensure product size accuracy.
Acknowledgments
The authors wish to acknowledge support from the National Natural Science Foundation of China (52127814, 52078280) for the work reported in this paper.
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the National Natural Science Foundation of China (No. 52127814 and 52078280).
