Abstract
Accurate defect detection is crucial for industrial quality control, yet challenges such as complex backgrounds and small targets hinder performance. This study presents DEMA-YOLO, an efficient neural network based on the YOLOv10 architecture that integrates a dual-stream edge detail enhancement module and a multiscale attention mechanism. These components enhance feature representation and fine-grained perception. An improved NWD loss further stabilizes small object detection. Extensive experiments on the PCB, NEU-DET, and mixed-type WM38 datasets show that DEMA-YOLO achieves mAP scores of 93.9%, 90.5%, and 98.7%, respectively, outperforming YOLOv10s by 6.7% and 0.9% on PCB and NEU-DET. On the mixed-type WM38 dataset, accuracy is comparable, while DEMA-YOLO reduces parameters by 0.3M and increases inference speed by 5.8 FPS. Inference speeds reach 119.2, 112.0, and 134.2 FPS on the three datasets, respectively. These results demonstrate the model’s effectiveness and efficiency for industrial defect detection.
Introduction
Industrial products are foundational to construction, infrastructure, and various manufacturing sectors, playing a critical role in ensuring sustainable development and driving economic growth (Hassan et al., 2024; ShiLong et al., 2023; Sun et al., 2024). However, surface defects—caused by limitations in production cost, equipment, and process stability—are prevalent in items such as steel plates, printed circuit boards, and wafers. These defects significantly compromise product quality, safety, and reliability. To prevent defective products from entering the market, effective surface defect detection across all stages of production is imperative.
In real-world manufacturing environments, challenges such as variable lighting, equipment inconsistencies, and material differences contribute to the diversity and complexity of defect types. Traditional visual inspection methods, particularly manual inspections, suffer from low efficiency, high false detection rates, and long-term health risks to human operators (Li et al., 2024; Zhang et al., 2023). These limitations highlight the urgent need for automated, robust, and scalable defect detection solutions.
With advances in machine vision, the field has shifted from manual inspection to algorithm-driven approaches. Machine learning methods, including support vector machines (SVM), random forests (RF), and K-means clustering, rely on handcrafted features such as texture, shape, and colour. For example, Damira et al. (2024) proposed a hybrid method combining non-destructive ultrasonic testing and machine learning to detect bonding defects in composite materials. While effective in specific scenarios, the generalization ability of these approaches is constrained by their reliance on manually engineered features, making them less adaptable to the variability of real-world defect patterns (Lin et al., 2024).
In contrast, deep learning, particularly convolutional neural networks (CNNs), has demonstrated remarkable success in visual recognition tasks (Nagata et al., 2020). CNN-based detectors can learn hierarchical and abstract representations from raw data, thus improving detection accuracy and reducing the reliance on domain-specific feature design. Numerous studies have applied deep learning to industrial defect detection (Haigang et al., 2023; Hongxin et al., 2024; Ming et al., 2024; Zekai et al., 2023), achieving notable performance improvements. However, many of these methods sacrifice the speed of inference for precision (Feifan et al., 2024; Shuwen et al., 2023; Yuanyuan et al., 2024; Zhuxi et al., 2023), making them unsuitable for real-time deployment in high-throughput industrial scenarios (Kim et al., 2024).
Therefore, it is crucial to design a detection algorithm that balances real-time inference speed with high accuracy, while also being lightweight enough for deployment in resource-constrained environments. To address these challenges, we propose DEMA-YOLO, a lightweight and efficient industrial surface defect detection model based on YOLOv10 (Wang, 2024). The main contributions of this work are as follows.
To improve detection efficiency and accuracy, we developed an efficient industrial defect detection model, termed DEMA-YOLO, which is based on YOLOv10. This model significantly reduces the number of parameters and enhances inference speed while maintaining a high level of detection accuracy. Additionally, we replaced the standard convolution in the model with a depthwise separable convolution, further improving the inference speed without compromising detection accuracy.
Upsampling operations in traditional detectors often result in the loss of fine details, particularly at the edges of small or irregular defects. These edge features are crucial for accurate defect boundary regression in industrial applications. To address this issue, we propose a dual-flow Detail Enhancement Upsampling (DDEU) module, which extracts and fuses key information from both low-level and high-level features to preserve edge integrity during upsampling. By facilitating interactions across different scales, the DDEU module aids in recovering the detailed structure surrounding defects.
In complex industrial environments, defect patterns often vary in appearance and are embedded within noisy backgrounds. Conventional attention mechanisms typically treat all spatial information uniformly, which limits their capacity to adaptively focus on key defect areas from diverse perspectives. To address this limitation, we propose a Multiscale Attention Mechanism of Different Perspectives (MAMDP) that assigns varying weights to feature maps derived from different scales and views. This allows the network to prioritize information that is both spatially and semantically significant. Consequently, this mechanism improves feature discrimination across different defect types and strengthens robustness against background interference.
Traditional localization loss functions often struggle to regress bounding boxes accurately for small objects because of scale imbalance and localization sensitivity. To address these limitations, we adopt the Normalized Wasserstein Distance (NWD) loss (Rezatofighi et al., 2019). NWD treats bounding boxes as 2D Gaussian distributions and computes a distance metric that is robust to scale variations, thereby enhancing the model’s sensitivity to small target defects. Furthermore, we improved this loss by incorporating adaptive temperature coefficients, which dynamically adjust the influence of the distillation term during training, and by applying high-order distance metrics to better capture spatial discrepancies. These enhancements enable the model to adapt to varying object sizes, thereby improving the stability and accuracy of bounding box regression for both small and large defects.
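To make the Gaussian view of a bounding box concrete, the following is a minimal sketch of the commonly used closed form of NWD, in which a box (cx, cy, w, h) is modelled as a Gaussian with mean (cx, cy) and covariance diag(w²/4, h²/4). It does not include the adaptive temperature coefficients or high-order distance metrics described above, since their exact form is not given in this text. The function names and the constant `c = 12.8` are assumptions chosen for illustration, not values taken from this paper.

```python
import math


def wasserstein2_sq(box_a, box_b):
    """Squared 2-Wasserstein distance between two boxes modelled as 2D Gaussians.

    Each box is (cx, cy, w, h); the Gaussian has mean (cx, cy) and
    covariance diag(w^2 / 4, h^2 / 4).
    """
    cxa, cya, wa, ha = box_a
    cxb, cyb, wb, hb = box_b
    return ((cxa - cxb) ** 2 + (cya - cyb) ** 2
            + ((wa - wb) / 2.0) ** 2 + ((ha - hb) / 2.0) ** 2)


def nwd(box_a, box_b, c=12.8):
    """Normalized Wasserstein Distance in (0, 1].

    c is a dataset-dependent normalising constant; 12.8 is an assumed
    placeholder, not a value reported in this paper.
    """
    return math.exp(-math.sqrt(wasserstein2_sq(box_a, box_b)) / c)


def nwd_loss(box_a, box_b, c=12.8):
    """Loss form: 0 for identical boxes, approaching 1 as boxes diverge."""
    return 1.0 - nwd(box_a, box_b, c)


if __name__ == "__main__":
    pred = (13.0, 14.0, 6.0, 6.0)    # hypothetical predicted box
    target = (10.0, 10.0, 6.0, 6.0)  # hypothetical ground-truth box
    print(nwd(pred, target), nwd_loss(pred, target))
```

Because the similarity decays smoothly with the centre offset and size difference, it stays non-zero even when two small boxes do not overlap at all, which is the property that makes it attractive for small defects.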
Related Work
Target Detection Methods Based on Deep Learning
Deep learning techniques for industrial defect detection are typically categorized into two types: two-stage and single-stage methods. Two-stage methods, such as Faster R-CNN (Ren et al., 2015) and Mask R-CNN (He et al., 2017), first generate candidate regions using a region proposal network (RPN), and subsequently classify and refine these regions through positional regression. Liyun et al. (2020) improved the Faster R-CNN method by optimizing anchor points, thus improving the accuracy and efficiency of engine surface defect detection and positioning. Shuhong et al. (2023) proposed a wheel hub defect detection model based on DS-Cascade R-CNN, which integrates spatial attention for multiscale positioning and deformable convolution for adaptive feature extraction to improve the accuracy of defect detection. Ping et al. (2024) improved the Faster R-CNN model by integrating ResNet50 and the feature pyramid network (FPN), and used K-Means++ to optimize the generation of proposal boxes, achieving better detection accuracy.
Single-stage detection methods, such as YOLO and SSD, perform target localization and classification simultaneously, offering high-speed detection that is well suited for real-time industrial inspection. Although these methods were initially less accurate than two-stage approaches, recent research has significantly improved their accuracy. Several studies have enhanced real-time detection by combining multiscale feature processing with coordinate attention (CA) (Dehua et al., 2023; Jingni et al., 2024; Yuxi et al., 2024).
Liu et al. (2025) improved YOLOv8s for surface defect detection in metal castings by introducing modules for small object detection, spatial feature preservation, and multi-scale attention, along with Wise-IoU loss. Their model achieves higher accuracy, faster inference, and smaller size, showing strong performance on both proprietary and public datasets and real-time capability on edge devices. Pang et al. (2025) proposed ASSM-YOLO, a lightweight and high-precision model for detecting multi-scale defects on cathode copper plates, which combines ADFEM and SimAM for feature enhancement, a Slim Neck for efficiency, and MPDSIoU loss for improved localization performance. Ma et al. (2025) proposed ELA-YOLO, a YOLOv8-based model for steel surface defect detection, which integrates linear attention, a selective feature pyramid, and a lightweight detection head to balance accuracy and efficiency under complex industrial conditions. Zhou and Zhao (2025) proposed MPA-YOLO, a YOLOv8-based model for steel surface defect detection, which integrates a multi-path convolution attention module (MPCA), partial self-attention, and an auxiliary detection head to improve feature representation and localization. It achieved notable gains in mAP, precision, and recall on the NEU-DET and VOC2007 datasets. Tao et al. (2024) proposed EFE-YOLO, an enhanced YOLOv5-based model for industrial small object detection. It introduces PSRFA upsampling, MSE downsampling, and AFAF modules to improve feature extraction under occlusion and complex backgrounds, achieving higher precision and recall on a custom dataset.
Technical Gaps
Although significant progress has been made in deep learning-based industrial defect detection, several critical challenges remain unsolved:
To address the aforementioned challenges, this study proposes a lightweight detection model named DEMA-YOLO, which is based on the YOLOv10 architecture. This model significantly enhances feature extraction and attention representation while maintaining a compact size. In comparison to traditional two-stage and conventional single-stage methods, DEMA-YOLO achieves superior accuracy-speed trade-offs and demonstrates strong adaptability in high-speed and variable industrial environments, thereby showcasing substantial practical value for real-world deployment.
YOLOv10 is a recent single-stage detector evolved from YOLOv8, designed to improve detection performance through architectural optimizations. Key innovations include: (1) a consistent dual assignment strategy that supports end-to-end training and inference without requiring Non-Maximum Suppression (NMS); (2) lightweight classification heads and spatial-channel decoupling downsampling (SCDown) for improved efficiency; and (3) partial self-attention (PSA) and large kernel convolution modules that enhance receptive field and feature refinement with minimal computational cost. Its backbone integrates improved C2f and SPPF modules to enhance multi-scale feature representation, while the head employs multi-scale feature maps via the v10Detect layer for effective detection across object sizes. Given its high accuracy, speed, and real-time inference capabilities, YOLOv10 serves as a suitable and efficient baseline for the detection of industrial defects. The overall architecture is shown in Figure 1.

YOLOv10 Architecture.
Efficient Industrial Defect Detection Model: DEMA-YOLO
To address the resource limitations encountered in actual industrial defect detection scenarios, we have developed an efficient detection model, DEMA-YOLO, based on YOLOv10. The primary objective of this model is to balance the efficiency and precision of industrial defect detection, achieving favourable detection results while reducing the number of parameters. The DEMA-YOLO framework is shown in Figure 2. DEMA-YOLO employs YOLOv10 as the backbone network. It replaces the standard convolutional layers in the backbone with depthwise separable convolution and average pooling (DWAConv) and substitutes the C2f module with the proposed MAMDP operation for feature extraction. This approach selects key industrial defect information from both spatial and channel perspectives, distinguishing the importance of various input features by weighting the original feature map. Consequently, it reduces computational complexity while enhancing feature extraction capabilities. The neck employs the DDEU structure for upsampling, ensuring that the upsampled features retain rich feature information. Finally, we improve the model’s capacity to detect industrial defects by integrating and optimizing the NWD loss.

DEMA-YOLO Architecture.
To reduce the computational requirements of the model, we adopt an enhanced YOLOv10 as our backbone network. YOLOv10 employs a more efficient model architecture and a novel training strategy that enhance both performance and efficiency. First, we replaced the conventional convolutional (Conv) module with the DWAConv module, which substantially reduces computational complexity. The structural diagram of DWAConv is shown in Figure 3. DWAConv consists primarily of depthwise separable convolutions (DWConv), an average pooling layer, and a squeeze-and-excitation (SE) block. The SE module comprises two components: Squeeze, which captures global information from the network, and Excitation, which reweights the extracted information. By employing global average pooling to reduce the spatial dimensions of feature maps, information is effectively concentrated in the channel dimension, leading to the generation of lower-dimensional feature vectors. During the excitation stage, the feature vector undergoes a

DWAConv Module.
Each Layer Parameter of SE Block.
Each Layer Parameter of DWAConv Module.
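The saving obtained by replacing a standard convolution with a depthwise separable one can be illustrated with a short parameter count. The sketch below is a worked arithmetic example under stated assumptions: bias terms are ignored, the channel sizes (64 in, 128 out) and the 3×3 kernel are hypothetical, and the average pooling layer and SE block of DWAConv are not counted, so the figures are not those of the paper's tables.

```python
def standard_conv_params(c_in, c_out, k):
    """Weights of a k x k standard convolution (bias ignored)."""
    return c_in * c_out * k * k


def depthwise_separable_params(c_in, c_out, k):
    """Weights of a k x k depthwise conv followed by a 1 x 1 pointwise conv."""
    depthwise = c_in * k * k   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


# Hypothetical layer sizes, chosen only for illustration.
std = standard_conv_params(64, 128, 3)
dws = depthwise_separable_params(64, 128, 3)
ratio = dws / std  # equals 1 / c_out + 1 / k^2 for these definitions
print(std, dws, round(ratio, 4))
```

For this layer the depthwise separable form needs roughly 12% of the weights of the standard convolution, which is where most of the reduction in computational complexity comes from.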
To enhance the focus of the model on salient features, the MAMDP module is integrated into the backbone network, effectively capturing critical information from both spatial and channel dimensions. Spatial attention is achieved by computing the mean and maximum values along the channel axis, which are then concatenated and subjected to convolution to emphasize spatial dependencies. Concurrently, multi-scale features are extracted through three parallel convolutional branches, which are subsequently fused and adaptively weighted back into the original feature map. Additionally, channel-wise attention is further refined using the squeeze-and-excitation (SE) block, with the final output derived from the combination of spatial and channel features based on their respective contributions. The structure of the MAMDP module is illustrated in Figure 4, while its architecture is detailed in Table 3.
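The spatial attention step described above (channel-wise mean and maximum, followed by a convolution) can be sketched in plain Python. This is a simplified illustration rather than the MAMDP implementation: the convolution over the two concatenated maps is reduced to a 1×1 kernel with hypothetical weights `w_mean`, `w_max`, and `bias`, a sigmoid is assumed as the gating function, and the three multi-scale branches and the SE block are omitted.

```python
import math


def channel_mean_max(x):
    """Collapse a [C][H][W] feature map into mean and max maps of shape [H][W]."""
    c, h, w = len(x), len(x[0]), len(x[0][0])
    mean_map = [[sum(x[k][i][j] for k in range(c)) / c for j in range(w)]
                for i in range(h)]
    max_map = [[max(x[k][i][j] for k in range(c)) for j in range(w)]
               for i in range(h)]
    return mean_map, max_map


def spatial_attention(x, w_mean=1.0, w_max=1.0, bias=0.0):
    """Weight every spatial location of x by a sigmoid-gated attention value.

    The 1 x 1 kernel (w_mean, w_max, bias) stands in for the convolution
    applied to the concatenated maps; its size and values are assumptions.
    """
    mean_map, max_map = channel_mean_max(x)
    attn = [[1.0 / (1.0 + math.exp(-(w_mean * m + w_max * mx + bias)))
             for m, mx in zip(mean_row, max_row)]
            for mean_row, max_row in zip(mean_map, max_map)]
    out = [[[value * a for value, a in zip(row, attn_row)]
            for row, attn_row in zip(channel, attn)]
           for channel in x]
    return attn, out


if __name__ == "__main__":
    # Two channels, one row, two columns: only the first location is active.
    features = [[[1.0, 0.0]], [[3.0, 0.0]]]
    attention, weighted = spatial_attention(features)
    print(attention, weighted)
```

Locations where both the mean and the maximum response are high receive attention values close to 1, so their features pass through almost unchanged, while weakly activated locations are damped.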

MAMDP Block.
Each Layer Parameter of MAMDP Module.
The neck network in YOLOv10 is designed for advanced feature fusion and multi-scale information extraction. It inherits the design philosophy of YOLOv8 and introduces a novel feature fusion module, C2fCIB, which replaces the original C2f module in semantically rich layers. By employing depthwise separable convolution in place of standard convolution, C2fCIB effectively reduces computational complexity while enlarging the receptive field, thereby achieving efficient feature aggregation with minimal computational overhead. However, the reliance on a basic upsampling module within the neck can result in the loss of fine-grained details, leading to frequent missed or false detections of small industrial defects. To address this limitation, the DDEU module is proposed. It comprises two parallel branches: one dedicated to upsampling the feature map to the target resolution, and the other designed to extract and preserve fine details prior to fusion. This structure enables the integration of high-resolution features containing shallow-layer details with semantically enriched deep features after upsampling, thereby enhancing spatial information reconstruction. In the detail recovery branch, a dual-stream enhancement mechanism is applied, incorporating

DDEU Module.
Each Layer Parameter of DDEU Module.
Each Layer Parameter of Focal Edge Module.
To enhance the detection performance of the model, we modified the loss function of DEMA-YOLO by incorporating NWD loss, which improves the stability of model training. The loss function of YOLOv10 is primarily divided into two components, comprising three loss functions: the position loss
The DEMA-YOLO model is built upon a YOLOv10 backbone and is further enhanced by the integration of the proposed DDEU and MAMDP modules. As depicted in Algorithm 1, the training process begins with the initialization of the model using pre-trained YOLOv10 weights, which aids faster convergence. During each epoch, input batches undergo standard data augmentation, such as resizing, flipping, and mosaic transformations. Subsequently, feature extraction is performed by the MAMDP-enhanced backbone, followed by multi-scale fusion in the neck, where the DDEU module performs upsampling. The network generates outputs that include classification scores, bounding box coordinates, and objectness predictions. A composite loss function governs the training process; it consists of BCE loss for classification, CIoU loss for regression, DFL loss for enhancing localization precision, and a modified NWD loss specifically designed for the detection of small objects.
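Of the terms in the composite loss, the CIoU regression term can be written down compactly. The sketch below follows the standard CIoU definition (IoU, a normalised centre-distance penalty, and an aspect-ratio penalty) for boxes given as (x1, y1, x2, y2). It is an illustration only: how the four loss terms are weighted against each other is not stated in this text, and the `eps` guard against division by zero is an assumption of this sketch.

```python
import math


def ciou_loss(box_p, box_g, eps=1e-9):
    """Standard CIoU loss between a predicted and a ground-truth box.

    Boxes are (x1, y1, x2, y2) with x2 > x1 and y2 > y1.
    """
    px1, py1, px2, py2 = box_p
    gx1, gy1, gx2, gy2 = box_g
    pw, ph = px2 - px1, py2 - py1
    gw, gh = gx2 - gx1, gy2 - gy1

    # Intersection over union.
    iw = max(0.0, min(px2, gx2) - max(px1, gx1))
    ih = max(0.0, min(py2, gy2) - max(py1, gy1))
    inter = iw * ih
    union = pw * ph + gw * gh - inter + eps
    iou = inter / union

    # Squared centre distance, normalised by the enclosing box diagonal.
    rho2 = ((px1 + px2 - gx1 - gx2) ** 2 + (py1 + py2 - gy1 - gy2) ** 2) / 4.0
    cw = max(px2, gx2) - min(px1, gx1)
    ch = max(py2, gy2) - min(py1, gy1)
    c2 = cw ** 2 + ch ** 2 + eps

    # Aspect-ratio consistency term and its trade-off weight.
    v = (4.0 / math.pi ** 2) * (math.atan(gw / (gh + eps))
                                - math.atan(pw / (ph + eps))) ** 2
    alpha = v / (1.0 - iou + v + eps)

    return 1.0 - iou + rho2 / c2 + alpha * v


if __name__ == "__main__":
    # Hypothetical boxes: same shape, shifted by one unit in x and y.
    print(ciou_loss((0.0, 0.0, 2.0, 2.0), (1.0, 1.0, 3.0, 3.0)))
```

Unlike plain IoU, the centre-distance term still provides a useful gradient when the boxes do not overlap, but the loss remains dominated by the IoU term, which changes sharply for small boxes.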
Experiments and Results
Experimental Datasets
To assess the effectiveness of our proposed approach, we evaluate our method using three datasets: the PCB dataset (Beijing University, 2024), the NEU-DET steel dataset (Northeast University, 2024), and the mixed-type WM38 dataset (Wang et al., 2020). We selected these three datasets because each has characteristics that strengthen the evaluation of our method. The PCB dataset is particularly valuable, as it contains a significant number of small target defects on circuit boards, facilitating the assessment of our method’s performance on small targets. The NEU-DET dataset was chosen for its abundance of steel defects that closely resemble the background, allowing us to evaluate the effectiveness of our proposed module in recognizing the edges of target defects. Lastly, we included the complex mixed-type WM38 dataset, which features a variety of mixed wafer defects. This dataset is instrumental in demonstrating the model’s capacity to accurately identify different defects within a single image. These highly representative industrial defect datasets enable a comprehensive comparison of the performance and robustness of our method, further supported by ablation experiments that validate the compatibility of each contribution.
Experimental Parameter Settings
To ensure a fair and consistent evaluation, all experiments in this study—including baseline comparisons, ablation studies, and variant analyses—were conducted in identical training settings. Specifically, we used the same dataset, number of epochs, learning rate, batch size, image resolution, AdamW optimizer, and data augmentation strategies across all models. This standardized configuration eliminates potential biases arising from inconsistent hyperparameters or preprocessing, thereby ensuring that performance differences can be attributed solely to the architectural design of the models. Table 6 shows the hardware and software environment of the experimental platform. Table 7 lists the default hyperparameter settings for the training procedure.
Experimental Platform.
Experimental Hyperparameter.
In this study, we adopt precision (P), recall (R), average precision (AP), mean average precision (mAP), and frames per second (FPS) as evaluation metrics, based on their widespread use and practical relevance in industrial defect detection tasks. These indicators demonstrate the effectiveness of the model in detecting and classifying defects. Precision and recall provide a clear measure of the model’s ability to correctly detect and localize defects while minimizing false alarms and missed detections, which are critical in high-risk industrial environments. Precision measures how many of the samples identified as positive by the model are truly positive. With TP denoting the number of true positives and FP the number of false positives, precision can be expressed as P = TP / (TP + FP).
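The evaluation metrics named above can be sketched with their standard definitions. The code below is a minimal illustration, not the evaluation script used in the paper: it assumes the usual definitions of precision and recall from true positives (TP), false positives (FP), and false negatives (FN), computes AP as the area under the precision-recall curve with all-point interpolation, and takes mAP as the mean of the per-class AP values. All function names and the sample numbers are hypothetical.

```python
def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def average_precision(recalls, precisions):
    """Area under the PR curve using all-point interpolation.

    recalls must be sorted in ascending order and paired with precisions.
    """
    r = [0.0] + list(recalls) + [1.0]
    p = [0.0] + list(precisions) + [0.0]
    # Make precision monotonically non-increasing from right to left.
    for i in range(len(p) - 2, -1, -1):
        p[i] = max(p[i], p[i + 1])
    return sum((r[i + 1] - r[i]) * p[i + 1] for i in range(len(r) - 1))


def mean_average_precision(per_class_ap):
    """mAP is the arithmetic mean of the per-class AP values."""
    return sum(per_class_ap) / len(per_class_ap)


if __name__ == "__main__":
    # Hypothetical counts: 90 correct detections, 10 false alarms, 30 misses.
    print(precision_recall(tp=90, fp=10, fn=30))
    # Hypothetical two-point PR curve.
    print(average_precision([0.5, 1.0], [1.0, 0.5]))
```

mAP@0.5, as reported in the tables, applies this computation with a detection counted as a true positive when its IoU with a ground-truth box is at least 0.5.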
The PCB Surface Defects Dataset is a synthetic, open-source dataset of 1386 images from Peking University. It covers six defect types: missing hole, mouse bite, open circuit, short, spur, and spurious copper. Each image has several defects of the same type. Figure 6 shows samples with various defects in the PCB dataset, with red boxes indicating the locations of the defects. It is evident that the PCB dataset features a complex background and contains a significant number of small targets. To address this, data augmentation techniques such as random rotation, flipping, and brightness adjustments were employed to improve the network’s generalizability. This augmentation process generated a total of 2420 images. The data were split into training, validation, and test sets in a ratio of 8:1:1. During the experiment, we adjusted the image pixels to

Visualization of a PCB Dataset Containing Six Defect Categories.

Distribution of the Number of Different Defect Categories in the Enhanced PCB Dataset.
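The 8:1:1 partition of the augmented PCB dataset can be reproduced with a seeded shuffle. The sketch below is an assumption-laden illustration: the function name, the seed, and the use of integer indices in place of image files are hypothetical, and the text does not state whether the authors split before or after augmentation.

```python
import random


def split_811(items, seed=0):
    """Shuffle items reproducibly and split them 8:1:1 into train/val/test."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n = len(items)
    n_train = n * 8 // 10  # integer arithmetic avoids float rounding
    n_val = n // 10
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]  # remainder goes to the test set
    return train, val, test


if __name__ == "__main__":
    # 2420 is the size of the augmented PCB dataset reported in the text.
    train, val, test = split_811(range(2420))
    print(len(train), len(val), len(test))
```

For 2420 images this gives 1936 training, 242 validation, and 242 test images. When augmented copies of one source image exist, splitting by source image rather than by file avoids near-duplicates appearing on both sides of the split.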
To verify the effectiveness of each module in DEMA-YOLO on small target defects, we conducted ablation experiments on three key components (MAMDP, DDEU, and NWD loss) on the PCB dataset. The main performance indicators, including P, R, and mAP@0.5, are reported in Table 8. The addition of the DDEU module resulted in a modest enhancement in the model’s precision, which can be attributed to the improved extraction of critical edge information during the reconstruction of the feature map for small target defects. This, in turn, enhances the model’s detection efficiency. Furthermore, the incorporation of the MAMDP module leads to an additional increase in the model’s precision. Experimental results demonstrate that the MAMDP module effectively assigns importance weights to feature maps from various perspectives, thereby bolstering the model’s capacity to identify significant features. Lastly, the integration of the NWD loss function yields the largest overall precision improvement. The reason is that the original CIoU loss is highly sensitive to small bounding boxes, which causes considerable fluctuations and slow convergence during training and ultimately impairs the model’s ability to detect small targets. The refined NWD loss function mitigates the large gradient effects on defects of varying sizes, thus facilitating better model convergence.
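The sensitivity of IoU-based losses to small boxes can be seen with a short numerical check. The sketch below applies the same 2-pixel localization error to a small and a large box and compares the resulting IoU. The box sizes and the shift are hypothetical values picked for illustration and are not taken from the paper's experiments.

```python
def iou(box_a, box_b):
    """Intersection over union for boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    ih = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = iw * ih
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def shifted(box, dx, dy):
    """Return the box translated by (dx, dy)."""
    return (box[0] + dx, box[1] + dy, box[2] + dx, box[3] + dy)


small = (0.0, 0.0, 6.0, 6.0)    # 6 x 6 pixel defect (hypothetical)
large = (0.0, 0.0, 60.0, 60.0)  # 60 x 60 pixel defect (hypothetical)

# Apply the same 2-pixel localization error to both boxes.
iou_small = iou(small, shifted(small, 2.0, 2.0))
iou_large = iou(large, shifted(large, 2.0, 2.0))
print(round(iou_small, 3), round(iou_large, 3))
```

The identical error drops the IoU of the small box to about 0.29 while the large box stays near 0.88, so an IoU-based loss swings much more strongly for small defects, which is consistent with the fluctuations described above.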
Ablation Study of Different Modules on the PCB Dataset.
To verify the effectiveness of the model, Table 9 compares the proposed DEMA-YOLO with several mainstream object detectors in terms of P, R, mAP, and FPS. The baselines include the two-stage models Faster R-CNN and Cascade R-CNN (Cai & Vasconcelos, 2018); the Transformer-based models EfficientViT (Xie & Liao, 2023), RT-DETR-R18 (Lv, 2023), and DETR (Carion, 2020), which integrate attention mechanisms; and the popular single-stage detectors YOLOv8s (Jocher et al., 2023), YOLOv9s (Wang et al., 2024), YOLOv11s (Jocher et al., 2023), and YOLOv10s. Our proposed DEMA-YOLO model outperforms the original YOLOv10s model in terms of precision, recall, and mAP, demonstrating superior detection performance. In addition, our model has the lowest number of parameters among the compared models. This reduction is attributed to the use of the lightweight YOLOv10 as the backbone network for feature extraction, which effectively decreases the model’s parameter count. We also designed a MAMDP structure based on spatial and channel attention mechanisms for feature fusion. This structure enhances multi-scale feature fusion while minimizing redundant calculations, further improving detection accuracy. Although DEMA-YOLO may be slightly inferior to other models in certain precision metrics, its efficiency offers significant advantages in resource-constrained environments.
Comparison of this Paper’s DEMA-YOLO Model with other Models on the PCB Dataset.
Table 10 shows the performance of DEMA-YOLO on the six defect categories of the PCB dataset. The model achieves an AP@.5 score exceeding 90% in the first five classes, with the missing hole class reaching 98.3%. Although spurious copper attained 88.6%, which is below 90%, DEMA-YOLO still demonstrates effective performance in detecting small-target defects. Figure 8 shows the PR curves and confusion matrices of DEMA-YOLO and YOLOv10s, reflecting the performance differences of the models across defect types. The P and R for missing_hole and short defects are both high, and their PR curves are stable in the high-P, high-R region. For mouse_bite and spur, P and R are relatively balanced but slightly lower, positioning their curves in a moderate region. For open_circuit and spurious_copper, P is high but R is low, causing the PR curve to lean toward the high-P, low-R region. Overall, however, our model outperforms the baseline model in detecting these defects.
Experiments on the NEU-DET Steel Dataset
The publicly available NEU-DET steel surface defects dataset from Northeastern University contains 1800 images that represent six types of defects: crazing, inclusion, patches, pitted_surface, rolled-in_scale and scratches. In addition, the steel images were rotated and brightness-adjusted to enhance the robustness of the model and better simulate real industrial defect scenarios. The category distribution of steel surface defects is imbalanced, that is, the samples are not independent and identically distributed (non-IID), and the inclusion and patches categories contain the most samples. This paper divides the dataset into training, validation, and test sets in a ratio of 8:1:1. During the experiment, we adjusted the image pixels to

PR Curve and Confusion Matrices of DEMA-YOLO and YOLOv10s on the PCB Dataset. (a) Represents DEMA-YOLO, (b) Represents YOLOv10s.

Visualization of a NEU-DET Dataset Containing Six Defect Categories.

Distribution of Six Types of Defects on the NEU-DET Dataset.
Various Defects in the DEMA-YOLO Model on the PCB Dataset.
To verify the effectiveness of each module in DEMA-YOLO when dealing with defects in complex backgrounds, we conducted ablation experiments on three key components (MAMDP, DDEU, and NWD loss) on the NEU-DET dataset. The main performance indicators, including P, R, FPS, and mAP@0.5, are reported in Table 11. The results of the ablation experiments indicate that the model performance improved most significantly with the addition of the DDEU module. This enhancement can be attributed to the DDEU module’s role as an upsampling component, which preserves a substantial number of detailed features within the upsampled feature map. This preservation enables the model to accurately identify defect locations and to better differentiate defects that closely resemble the background. The DDEU module is particularly robust in detection scenarios where the background and defects are highly similar.
Ablation Study of Different Modules on the NEU-DET Dataset.
Table 12 shows the performance comparison of DEMA-YOLO and other object detectors on the NEU-DET dataset. Compared with object detectors with similar numbers of parameters, DEMA-YOLO uses only 7.8 million parameters and outperforms YOLOv10s by 1.0%, 1.2% and 0.9% in P, R and mAP. It also performs better than other popular target detectors.
Comparison of this Paper’s DEMA-YOLO Model with other Models on the NEU-DET Dataset.
Table 13 shows the accuracy of DEMA-YOLO in six defect categories. The precision of the model exceeds 83% in all categories and exceeds 90% in three of them: crazing, rolled-in_scale, and scratches. DEMA-YOLO can therefore also achieve good results in detecting defects in complex backgrounds. Figure 11 shows the PR curves and confusion matrices for each defect type; they demonstrate strong performance in detecting crazing, rolled-in_scale, and scratches, with P and R values approaching 99%. This indicates high detection accuracy and minimal false positives or negatives. Inclusion and patches exhibit slightly lower P and R, yet still achieve mAP scores of 83.9% and 84.6%, respectively, reflecting a favourable balance between P and R. The pitted_surface class displays moderate performance, characterized by a trade-off between P (87.4%) and R (80.3%), resulting in a lower mAP of 83.1%. Overall, and in comparison with YOLOv10s, the model excels in identifying well-defined defects, such as crazing and rolled-in_scale, while still providing reliable results for other defect types. Our model thus performs well even in situations where industrial defects are very similar to the background.
Experiments on the Mixed-type WM38 Dataset
The mixed-type WM38 dataset comprises a total of 3,798 images covering eight types of defects: donut, center, loc, edge_loc, edge_ring, near_full, scratch and random, as shown in Figure 12. The dataset lacks both real-world noise and precise boundary labels. To mitigate these limitations and enhance the model’s generalization, accuracy, and robustness, we expanded the dataset using techniques such as rotation, flipping, brightness adjustment, and re-labeling. The final dataset comprises 5,602 images. Subsequently, the data were split into training, validation, and test sets in an 8:1:1 ratio. During the experiment, we adjusted the image pixels to

PR Curve and Confusion Matrices of DEMA-YOLO and YOLOv10s on the NEU-DET Dataset. (a) Represents DEMA-YOLO, (b) Represents YOLOv10s.

Visualization of a Mixed Type WM38 Dataset Containing Eight Defect Categories.

Distribution of Eight Types of Defects on the Mixed-Type WM38 Dataset.
Various Defects in the DEMA-YOLO Model on the NEU-DET Dataset.
Ablation experiments were performed using DEMA-YOLO on the mixed-type WM38 dataset, and the results are shown in Table 14. The model is essentially on par with YOLOv10s in terms of precision, but its inference speed is significantly faster, showing that it also performs well on mixed, complex defects. Adding any one of the three modules yields faster inference and fewer parameters without affecting precision. In the context of mixed industrial defect detection, the synergy of the three modules enables the model to achieve optimal detection performance.
Ablation Study of Different Modules on the Mixed Type WM38 Dataset.
Table 15 compares the proposed DEMA-YOLO with several mainstream object detectors, including Faster R-CNN, Cascade R-CNN, EfficientViT, RT-DETR-R18, DETR, YOLOv8s, YOLOv9s, YOLOv11s and YOLOv10s. DEMA-YOLO achieves an mAP of 98.7%, essentially the same as YOLOv10s, with only 7.8 million parameters. Relative to YOLOv10s, it uses 0.3M fewer parameters and runs 5.8 FPS faster, combining high accuracy with efficiency. These results confirm the effectiveness of the design and implementation of DEMA-YOLO.
Comparison of the Proposed DEMA-YOLO Model with Other Models on the Mixed Type WM38 Dataset.
Various Defects in the Mixed Type WM38 Dataset.
Table 16 demonstrates that the model exhibits the highest detection performance for center, donut, near_full and random defects. In contrast, the recall rates for edge_loc and loc defects are relatively low. Nevertheless, the model maintains strong performance, with AP@.5 exceeding 95%.
Figure 14 shows the PR curves and confusion matrices for the various types of defects detected by the DEMA-YOLO and YOLOv10s models on the mixed-type WM38 dataset. From these curves, we observe that the model achieves a high detection probability for most types of defects. However, the detection rates for edge_loc and scratch defects are lower than those for the other defect types. Despite this, the overall detection rate remains above 95%.
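Per-class precision and recall of the kind discussed above can be read directly off a confusion matrix. The following is a minimal sketch under stated assumptions: rows index the true class and columns the predicted class, the background row and column that detection confusion matrices usually include are omitted, and the counts are hypothetical values for three of the eight classes, not figures from the paper.

```python
from typing import Dict, List, Tuple


def per_class_precision_recall(
    matrix: List[List[int]], names: List[str]
) -> Dict[str, Tuple[float, float]]:
    """matrix[i][j] counts samples of true class i that were predicted as class j."""
    stats = {}
    for k, name in enumerate(names):
        tp = matrix[k][k]
        predicted_k = sum(row[k] for row in matrix)  # column sum: TP + FP
        actual_k = sum(matrix[k])                    # row sum: TP + FN
        precision = tp / predicted_k if predicted_k else 0.0
        recall = tp / actual_k if actual_k else 0.0
        stats[name] = (precision, recall)
    return stats


# Hypothetical counts (illustration only, not results from the paper).
names = ["center", "edge_loc", "scratch"]
matrix = [
    [98, 1, 1],   # true center
    [2, 90, 8],   # true edge_loc
    [0, 5, 95],   # true scratch
]
stats = per_class_precision_recall(matrix, names)
for name, (p, r) in stats.items():
    print(f"{name}: P = {p:.3f}, R = {r:.3f}")
```

In this toy matrix, most of the edge_loc errors are confusions with scratch, which lowers recall for edge_loc and precision for scratch at the same time. This is the kind of pairwise confusion a confusion matrix exposes and a PR curve alone does not.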

PR Curves and Confusion Matrices of DEMA-YOLO and YOLOv10s on the Mixed Type WM38 Dataset. (a) Represents DEMA-YOLO, (b) Represents YOLOv10s.

Visualization Showing Different Models on the PCB Dataset. (a) Represents DEMA-YOLO, (b) Represents YOLOv11s, (c) Represents YOLOv10s and (d) Represents YOLOv9s.

Visualization Showing Different Models on the NEU-DET Dataset. (a) Represents DEMA-YOLO, (b) Represents YOLOv11s, (c) Represents YOLOv10s and (d) Represents YOLOv9s.

Visualization Showing Different Models on the Mixed Type WM38 Dataset. (a) Represents DEMA-YOLO, (b) Represents YOLOv11s, (c) Represents YOLOv10s and (d) Represents YOLOv9s.
Figure 15 illustrates the inference results of DEMA-YOLO, YOLOv11s, YOLOv10s and YOLOv9s on the PCB dataset. Our proposed method detects small targets accurately, as evidenced by the precision of the predicted bounding-box positions and the confidence levels of the category classifications. In contrast, the other two models either missed or misdetected the objects. Figure 16 shows the inference results on the NEU-DET dataset. Despite large differences in defect size and high similarity between defects and background, our model performs well, although slightly below its performance on the other two datasets. Its visualization results also compare favourably with those of the other two models, with no false positives or missed detections. Overall, our model outperforms most models on all three datasets. In Figure 17, our model performs well on the mixed-type WM38 dataset, which shows that it can also effectively detect mixed-type industrial defects.
Feasibility in Real-World Deployment
The proposed DEMA-YOLO model is optimized not only for high detection performance but also for practical demands in real-world deployment. Featuring a compact architecture with only 7.8 million parameters, the model significantly reduces memory consumption and computational load. This lightweight design facilitates deployment on resource-limited devices, such as embedded systems and industrial edge computing platforms, without necessitating powerful GPUs or servers. In terms of inference efficiency, the model achieves an average speed of 125.1 FPS on an NVIDIA Tesla T4 GPU across the PCB, NEU-DET, and Mixed type WM38 datasets. This high-speed performance meets the real-time requirements of most industrial surface defect detection tasks, particularly in high-throughput manufacturing environments where timely feedback is critical for quality control and process optimization. Furthermore, the model’s strong performance across three diverse datasets demonstrates its robustness and generalization ability in various defect scenarios, enhancing its practical applicability. Compared to traditional or heavier deep learning models, DEMA-YOLO strikes a favourable balance between detection accuracy, model complexity, and inference speed. These advantages confirm that DEMA-YOLO is well-suited for deployment in real-world industrial inspection systems, especially those constrained by hardware resources, latency requirements, or energy efficiency concerns.
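The figures above translate into a rough deployment budget. The sketch below converts the reported average speed into a per-frame latency and estimates the storage needed for the weights alone. It makes several assumptions that are not stated in the paper: weights are stored as 32-bit or 16-bit floats, 1 MB is taken as 10^6 bytes, and activation memory and runtime overhead are ignored. The helper names are hypothetical.

```python
def frame_latency_ms(fps: float) -> float:
    """Average time spent on one frame, in milliseconds, at a given throughput."""
    return 1000.0 / fps


def weight_memory_mb(num_params: float, bytes_per_param: int) -> float:
    """Storage for the weights only, in MB (10**6 bytes); activations are excluded."""
    return num_params * bytes_per_param / 1e6


# Figures quoted in the text: 125.1 FPS average and 7.8 million parameters.
latency = frame_latency_ms(125.1)
fp32_mb = weight_memory_mb(7.8e6, 4)  # assumed 32-bit float storage
fp16_mb = weight_memory_mb(7.8e6, 2)  # assumed 16-bit float storage

print(f"latency per frame: {latency:.2f} ms")
print(f"weights: {fp32_mb:.1f} MB (FP32), {fp16_mb:.1f} MB (FP16)")
```

Under these assumptions the model spends about 8 ms per frame and needs roughly 31 MB for FP32 weights. Measured throughput on an embedded device would differ from the T4 figure and would have to be benchmarked on the target hardware.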
Discussion: Challenges in Model Development and Deployment
During the development of DEMA-YOLO, several key challenges were encountered. Firstly, severe class imbalance and label noise in industrial defect datasets hindered stable training and generalization. These issues necessitated the implementation of targeted data augmentation strategies and class-reweighting techniques. Secondly, the model encountered difficulties in preserving fine-grained features during upsampling and achieving efficient multi-scale representation, which were addressed through the design of the DDEU and MAMDP modules. Although deployment was not included in this study, future work will focus on real-world implementation to evaluate the robustness and efficiency of the model under practical industrial conditions.
Conclusion
To balance the efficiency and accuracy of industrial defect detection, we propose DEMA-YOLO, an efficient detection model based on YOLOv10 that is suitable for resource-constrained environments. Using YOLOv10 as the backbone network keeps the model at only 7.8M parameters. We design the DDEU module, which addresses the loss of detail during upsampling by exchanging critical edge information about industrial defects across scales. We also propose the MAMDP module, which significantly enhances the model’s feature extraction and fusion capabilities. Finally, we introduce NWD loss and extend it with adaptive temperature coefficients and high-order terms, which improves the stability of bounding-box regression for small targets and reduces large fluctuations during training.
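For reference, the baseline Normalized Wasserstein Distance (NWD) that this loss builds on can be sketched as follows. Each box is modelled as a 2D Gaussian, and the distance between two boxes is mapped to a similarity in (0, 1]. This is only the standard formulation as commonly described: the adaptive temperature coefficient and high-order terms of the improved loss are not reproduced here because their form is not given in this section, and the default value of the normalizing constant `c` is an arbitrary placeholder that would normally be tuned per dataset.

```python
import math
from typing import Tuple

Box = Tuple[float, float, float, float]  # (cx, cy, w, h) in pixels


def nwd(box_a: Box, box_b: Box, c: float = 12.8) -> float:
    """Baseline NWD similarity between two boxes modelled as 2D Gaussians."""
    cxa, cya, wa, ha = box_a
    cxb, cyb, wb, hb = box_b
    # Squared 2-Wasserstein distance between N([cx, cy], diag(w^2/4, h^2/4)) Gaussians.
    w2_squared = (
        (cxa - cxb) ** 2
        + (cya - cyb) ** 2
        + ((wa - wb) / 2) ** 2
        + ((ha - hb) / 2) ** 2
    )
    return math.exp(-math.sqrt(w2_squared) / c)


def nwd_loss(pred: Box, target: Box, c: float = 12.8) -> float:
    """Baseline NWD regression loss: 0 for identical boxes, approaching 1 as they diverge."""
    return 1.0 - nwd(pred, target, c)


# Two small, non-overlapping 4x4 boxes: IoU would be 0, but NWD still varies smoothly.
target = (10.0, 10.0, 4.0, 4.0)
shifted = (13.0, 14.0, 4.0, 4.0)  # centre moved by (3, 4), a distance of 5 pixels
print(f"loss = {nwd_loss(shifted, target):.3f}")
```

Unlike IoU, the value changes smoothly with centre distance even when the boxes do not overlap, which is why NWD-style losses are commonly used for small targets.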
Experimental results show that DEMA-YOLO has achieved a good balance between detection efficiency and accuracy and is suitable for resource-constrained industrial defect detection scenarios.
Footnotes
Funding
This work was supported by Dongguan Science and Technology of Social Development Program under Grant 20231800940532, and Songshan Lake Sci-Tech Commissioner Program under Grant 20234373-01KCJ-G.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
