Abstract
BACKGROUND:
The identification of infection in diabetic foot ulcers (DFUs) is challenging due to variability within classes, visual similarity between classes, reduced contrast with healthy skin, and presence of artifacts. Existing studies focus on visual characteristics and tissue classification rather than infection detection, critical for assessing DFUs and predicting amputation risk.
OBJECTIVE:
To address these challenges, this study proposes a deep learning model using a hybrid CNN and Swin Transformer architecture for infection classification in DFU images. The aim is to leverage end-to-end mapping without prior knowledge, integrating local and global feature extraction to improve detection accuracy.
METHODS:
The proposed model utilizes a hybrid CNN and Swin Transformer architecture. It employs the Grad CAM technique to visualize the decision-making process of the CNN and Transformer blocks. The DFUC Challenge dataset is used for training and evaluation, emphasizing the model’s ability to accurately classify DFU images into infected and non-infected categories.
RESULTS:
The model achieves high performance metrics: sensitivity (95.98%), specificity (97.08%), accuracy (96.52%), and Matthews Correlation Coefficient (0.93). These results indicate the model’s effectiveness in quickly diagnosing DFU infections, highlighting its potential as a valuable tool for medical professionals.
CONCLUSION:
The hybrid CNN and Swin Transformer architecture effectively combines strengths from both models, enabling accurate classification of DFU images as infected or non-infected, even in complex scenarios. The use of Grad CAM provides insights into the model’s decision process, aiding in identifying infected regions within DFU images. This approach shows promise for enhancing clinical assessment and management of DFU infections.
Introduction
Diabetic foot ulcers (DFUs) represent a significant challenge in diabetes care on a global scale, affecting a vast number of individuals. During the healing of DFUs, infection and ischemia are prevalent complications that can escalate to the point of requiring hospitalization and limb amputation [1]. After amputation, a patient’s quality of life often declines rapidly, and life expectancy typically falls below three years [2]. Amputation affects an individual’s overall well-being, including its physical, mental, and social aspects. However, factors such as prosthetic use, rehabilitation, psychological support, and health management can significantly influence quality of life after amputation, so individual experiences and outcomes vary, and healthcare professionals strive to provide the support patients need to cope with limb loss. This paper focuses on diabetic foot infection, a primary concern because it occurs in 40%–80% of DFU cases [3]. The presence of bacteria in the wound is the primary cause of infection, leading to cellular damage. Because they typically occur on the lower extremities, particularly the soles of the feet, DFUs are especially vulnerable to infection.
DFUs are a major global health concern, affecting an estimated 40–60 million people worldwide, with the potential to increase to 700 million by 2045 due to the rising number of individuals with diabetes [4]. This burden is disproportionately felt in developing nations, where access to proper foot care and specialized healthcare is limited and where, according to the International Diabetes Federation (2023), 80% of people with diabetes reside [5]. The economic impact of DFUs is significant, exceeding $10 billion annually in the US alone (American Diabetes Association, 2023) [6]. This includes the direct costs of treatment and the indirect costs of job losses and productivity declines (World Health Organization, 2023) [7]. Despite advancements in treatment, approximately 15% of diabetic foot ulcers progress to amputation, highlighting the need for improved strategies (International Wound Journal, 2023) [8].
Several factors increase the risk of developing diabetic foot ulcers, including:
Poor diabetes management: Uncontrolled blood sugar levels significantly increase the risk.
Peripheral neuropathy: Nerve damage in the feet reduces sensation, leading to unnoticed injuries.
Poor foot circulation: Reduced blood flow impairs healing and increases the risk of infection.
Smoking: Smoking significantly hinders wound healing and worsens foot ulcers.
Prevention is crucial in reducing the burden of diabetic foot ulcers. Regular self-inspections, good hygiene, and proper footwear are essential for preventing foot ulcers. Maintaining blood sugar control through medication, diet, and exercise is also key to reducing the risk of complications [9]. Regular check-ups with a healthcare professional for foot examinations are vital for early detection and treatment of any issues.
In recent years, advancements in medical imaging techniques have provided a promising avenue for the early detection and characterization of infection in DFU. Various imaging modalities, including but not limited to infrared thermography, Doppler ultrasound, magnetic resonance imaging (MRI), and positron emission tomography (PET), have shown potential in assessing blood flow, tissue perfusion, and oxygenation levels. Moreover, new paths for the automatic and precise identification of infection in imaging data have been made possible by the combination of artificial intelligence (AI) and machine learning algorithms [10]. Computer vision techniques are revolutionizing medical diagnosis, offering the potential to significantly improve both accuracy and speed within clinical practice [11]. Image processing plays a vital role in assisting physicians with disease identification, finding applications in surgery, biological imaging, and treatment planning [12]. With the use of both high-level analytical methods and low-level image processing techniques (such as edge detection, region growth, and line recognition), this cutting-edge technology tackles particular problems in healthcare [13].
Accurately extracting pertinent information from medical images is a crucial step in building these disease detection systems. Deep learning (DL) technologies have emerged as powerful tools, replacing traditional image processing methods [14]. These technologies enable computers to recognize and learn meaningful features associated with specific medical conditions, which has opened new doors for automated disease diagnosis [15]. In particular, CNN-based models have proven highly effective in advancing healthcare diagnostics, with applications such as neuropathic ulcer detection, breast tumor segmentation and classification, cancer cell diagnosis, genetic pattern analysis, and image segmentation [16]. The development of healthcare systems for identifying ailments such as infected DFUs, neuropathic foot ulcers, and abrasions relies heavily on deep learning. A computer vision-based system was developed for diagnosing patients with neuropathic ulcers, focusing on detecting infection and ischemia [17].
Earlier attempts to detect infection in wounds employed traditional machine learning techniques that relied on handcrafted features. These methods primarily focused on identifying signs of infection through visual examination of wounds and confirming diagnoses with wound cultures. However, these approaches have limitations, as they can be time-consuming and dependent on the clinician’s experience. Hsu et al. [18] used Support Vector Machines (SVM) with clustering to classify infections and demonstrated the method’s effectiveness in categorizing complex infection patterns. Wag et al. [19] proposed a deep learning method for wound segmentation and infection detection that jointly learns task-relevant visual characteristics, showing efficacy on a large-scale wound database. Alzubaidi et al. [20] introduced DFU QUTNet for neuropathic ulcer classification, achieving high precision (95.4%), recall (93.6%), and F1-score (94.5%). Al-Garaawi et al. [21] developed a novel CNN method for DFU classification using texture features from RGB images. Their technique achieved an F1-score of 95.2% and an AUC of 98.1%. EL Kady et al. [22] enhanced DFU diagnosis accuracy with a hybrid ResNet50-GAN model. This model outperformed ResNet50 alone, achieving an accuracy of 0.84. Cui et al. [23] created a CNN-based algorithm for ulcer segmentation. It surpassed U-net and SVM, achieving an accuracy of 93.4% and reducing false positives by 0.9%. Reyes-Luévano et al. [24] proposed DFU_VIRNet for classifying DFUs using visible and thermography images. The model obtained an F1-score of 0.83 and an AUC of 0.91. Hong S et al. [25] applied machine learning to assess the recurrence risk of DFUs in elderly diabetic patients, achieving 93% accuracy with SVM. Their findings underscore SVM’s effectiveness in predicting DFU recurrence, aiding targeted interventions for improved patient outcomes.
Building upon these methodologies, this study presents a novel DL model that combines a CNN with the Swin Transformer for reliable classification of infection in DFUs from patient foot images. This is the first application of a hybrid CNN and Swin Transformer model to infection detection in DFUs. We evaluated the proposed model’s efficacy by comparing it with several baseline models on a range of performance measures, including accuracy, recall, precision, F1-score, MCC, and AUC-ROC. The goal of this study was to create an automated deep learning system that would enable medical professionals to classify infections in DFUs quickly and accurately. With its attention-based architecture, the Shifted Window (Swin) Transformer [26] excels in image classification tasks, offering state-of-the-art results. As a Vision Transformer (ViT) network grounded in self-attention, its application in hyperspectral image (HSI) classification is limited. To adapt it for HSI classification, the structure was modified. By integrating the single-stage Swin Transformer into the network architecture, specifically leveraging its attention module, the aim is to enhance DFU classification accuracy and robustness. The experiments demonstrate the effectiveness of Transformer-based architectures in DFU image classification, even with small datasets and without complex model designs.
The notable contributions of this research endeavor are outlined as follows.
The Swin DFU network feeds the output of the block6a_expand_activation mid-layer of the EfficientNet-B0 model into the Swin Transformer blocks, removing the need for manual intervention and specialized knowledge.
Swin DFU-Net learns relevant features autonomously, showing superior accuracy compared to existing methods while maintaining operational efficiency.
The hybrid model streamlines the workflow by combining feature extraction and classification into a single process, ensuring high performance in classifying infection-positive and infection-negative images with minimal misclassification.
On a sizable DFU dataset, Swin DFU applies a delineation-free feature acquisition technique, extracting pertinent features directly from the full image without operator intervention or further segmentation procedures.
The remainder of this paper is organized as follows. Section 2 describes the Swin DFU framework in detail and covers the data preprocessing and model building techniques. Section 3 presents the experimental results together with a complete description of the dataset and the metrics employed. Section 4 discusses the results and summarizes the study’s limitations, and Section 5 concludes this research work.
For clinical purposes, creating an accurate diagnosis system to classify DFUs is a difficult undertaking. This section explains how we combined global and local context information using a Shifted Window Transformer to provide a hybrid Swin DFU framework for the infection classification problem. The proposed Swin DFU-Net framework is depicted in Fig. 1. The framework performs the following tasks: DFU sample extraction, label loading, sample pre-processing, sample splitting, data augmentation, training with the Swin Transformer (using the EfficientNet model’s output as input), and statistical parameter analysis on the test set. Algorithm 1 provides a step-by-step flow of the Swin DFU-Net classification task.
The DFU dataset, including images of infection, is accessible upon completion of a dataset release agreement [27]. It comprises two main classes, each with two subclasses: ischemia positive/negative and infection positive/negative. This study focuses on distinguishing infection from non-infection. The dataset contains 5890 images, evenly split between infection-positive and infection-negative, labeled by a senior physician. The data are divided into 3534 images for the training set (Tr), 1178 images for the validation set (Va), and 1178 images for the testing set (Te). Example images of the infection and non-infection classes are depicted in Fig. 2. Table 1 provides additional details about the dataset.
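The reported counts (3534, 1178, 1178 out of 5890) correspond to a 60/20/20 split. The sketch below reproduces that arithmetic with a simple shuffled split. The function name, the seed, and the use of a plain random shuffle are assumptions for illustration, since the text does not state how images were assigned to each subset.

```python
import random

def split_indices(n_items, fractions=(0.6, 0.2, 0.2), seed=0):
    """Shuffle indices 0..n_items-1 and cut them into train/val/test parts.

    The 60/20/20 fractions are inferred from the reported counts; the
    shuffling strategy and seed are hypothetical.
    """
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    n_train = round(n_items * fractions[0])
    n_val = round(n_items * fractions[1])
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # remainder goes to the test set
    return train, val, test

train, val, test = split_indices(5890)
print(len(train), len(val), len(test))
```

For a class-balanced dataset such as this one, a stratified split (splitting each class separately with the same fractions) would keep the positive/negative ratio identical in all three subsets.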
Pre-augmentation data overview table
Proposed Swin DFU-Net infection classification.
Sample infection (positive and negative) in DFU.
Several preprocessing steps were carried out before the DFU images were fed into the Swin DFU-Net model. Each image in the DFU dataset was downsized to 384
Augmented images using CutMix techniques.
Augmented images using MixUP techniques.
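The two augmentation techniques named in the captions above can be illustrated with a minimal NumPy sketch. It assumes one-hot labels and 384-pixel square images; the Beta parameter of 0.2 and the function names are hypothetical, and this is not necessarily the exact configuration used in the study.

```python
import numpy as np

def mixup(img_a, img_b, label_a, label_b, lam):
    """Blend two images and their one-hot labels with weight lam in [0, 1]."""
    image = lam * img_a + (1.0 - lam) * img_b
    label = lam * label_a + (1.0 - lam) * label_b
    return image, label

def cutmix(img_a, img_b, label_a, label_b, lam, rng):
    """Paste a random box from img_b into img_a; mix labels by pasted area."""
    h, w = img_a.shape[:2]
    cut_h = int(h * np.sqrt(1.0 - lam))
    cut_w = int(w * np.sqrt(1.0 - lam))
    cy, cx = int(rng.integers(h)), int(rng.integers(w))
    y1, y2 = max(cy - cut_h // 2, 0), min(cy + cut_h // 2, h)
    x1, x2 = max(cx - cut_w // 2, 0), min(cx + cut_w // 2, w)
    image = img_a.copy()
    image[y1:y2, x1:x2] = img_b[y1:y2, x1:x2]
    # Recompute the mixing weight from the box that was actually pasted,
    # because clipping at the border can shrink it.
    lam_adj = 1.0 - ((y2 - y1) * (x2 - x1)) / (h * w)
    label = lam_adj * label_a + (1.0 - lam_adj) * label_b
    return image, label

rng = np.random.default_rng(0)
a = np.zeros((384, 384, 3), dtype=np.float32)   # stand-in for one image
b = np.ones((384, 384, 3), dtype=np.float32)    # stand-in for another image
neg, pos = np.array([1.0, 0.0]), np.array([0.0, 1.0])
lam = float(rng.beta(0.2, 0.2))                 # hypothetical Beta(0.2, 0.2)
mixed_img, mixed_lab = mixup(a, b, neg, pos, lam)
cut_img, cut_lab = cutmix(a, b, neg, pos, lam, rng)
```

MixUp blends every pixel of both images, while CutMix keeps pixels intact and replaces a rectangular region; in both cases the label becomes a soft mixture in proportion to each image's contribution.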
EfficientNet-B0, a CNN design belonging to the EfficientNet family, was developed to achieve a harmonious balance between model performance and computational efficiency [30]. This is achieved by scaling the network breadth, depth, and resolution consistently using the compound scaling technique. Due to its success in obtaining a favorable trade-off between model size and accuracy, it has been widely employed in a variety of computer vision tasks. The model demonstrates superior performance with high accuracy and efficiency, outperforming existing models while using significantly fewer parameters. EfficientNet-B0 showcases state-of-the-art accuracy on transfer learning datasets, surpassing other models with a substantial reduction in parameters. Its scalability and balance in network width, depth, and resolution contribute to its success in achieving optimal performance across different applications.
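The compound scaling rule can be made concrete with a short sketch. The base coefficients below (1.2 for depth, 1.1 for width, 1.15 for resolution) are the values reported in the original EfficientNet paper, not values stated in this study, and the function name is illustrative only.

```python
# Base coefficients from the original EfficientNet paper (assumption: they
# are not given in this study). They satisfy ALPHA * BETA**2 * GAMMA**2 ~ 2.
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15

def compound_scale(phi):
    """Return depth, width, resolution multipliers and the FLOPs factor."""
    depth = ALPHA ** phi
    width = BETA ** phi
    resolution = GAMMA ** phi
    # FLOPs grow linearly with depth and quadratically with width/resolution.
    flops = depth * width ** 2 * resolution ** 2
    return depth, width, resolution, flops

for phi in range(3):
    d, w, r, f = compound_scale(phi)
    print(f"phi={phi}: depth x{d:.2f}, width x{w:.2f}, "
          f"resolution x{r:.2f}, FLOPs x{f:.2f}")
```

With phi = 0 all multipliers equal 1, which corresponds to the B0 baseline; each unit increase of phi roughly doubles the computational cost.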
The EfficientNet B0 model is integrated into the workflow, with particular focus on its mid layer, block6a_expand_activation, serving as the input to the Swin Transformer Blocks. The input shape of the EfficientNet B0 model is configured to be 384, resulting in an output size of 24 at the layer block6a_expand_activation. This output serves as the pivotal input for the subsequent Swin transformer processing. In essence, the features extracted by block6a_expand_activation layer, are harnessed as rich representations to be further refined and contextualized within the Swin Transformer architecture. This seamless integration ensures that the Swin transformer framework can effectively leverage the hierarchical feature representations derived from EfficientNet B0, thus enhancing the overall model’s performance and capacity to capture intricate patterns and nuances within the input data.
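The relation between the 384-pixel input and the 24-pixel feature map can be checked with simple stride arithmetic. The sketch assumes the standard EfficientNet-B0 layout, in which four stride-2 stages (the stem and the first blocks of stages 2, 3, and 4) precede block6a_expand_activation; the token count shown is only what a direct flattening would produce, since the text does not specify how the features are tokenized for the Swin blocks.

```python
# Stride-2 stages preceding block6a_expand_activation in EfficientNet-B0
# (assumption based on the standard architecture, not stated in the text).
STRIDES_BEFORE_LAYER = [2, 2, 2, 2]

def feature_map_side(input_side, strides):
    """Spatial side length after a sequence of strided, same-padded stages."""
    side = input_side
    for s in strides:
        side = -(-side // s)  # ceiling division
    return side

side = feature_map_side(384, STRIDES_BEFORE_LAYER)
print(side)         # side length of the feature map
print(side * side)  # number of spatial positions if flattened into tokens
```

A cumulative stride of 16 maps 384 to 24, which matches the output size reported above.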
Swin transformer model
The Swin Transformer (ST) is an advanced deep learning architecture with a distinctive hierarchical structure and attention mechanism. Unlike conventional vision transformers, which compute self-attention over all tokens at a single scale, the ST introduces a hierarchical organization by grouping tokens into patches and conducting multi-scale processing, enabling the model to capture both local and global dependencies efficiently. The ST integrates a self-attention mechanism based on shifted windows: attention is computed within local windows, and the windows are shifted between consecutive layers so that information can flow across window boundaries. This reduces the cost of attending to distant tokens and allows the ST to handle larger inputs without significantly escalating computational demands [31]. These features make the ST a potent tool for tasks such as imaging applications. It also distinguishes itself by its versatility in accommodating diverse input modalities and scaling to different input sizes, which makes it particularly suitable for tasks involving high-resolution images or sequences of varying lengths. Additionally, the ST exhibits robust generalization, owing to the ability of its self-attention mechanism to capture long-range dependencies and contextual information.
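The window-based attention described above relies on two array operations: splitting a feature map into non-overlapping windows, and cyclically shifting the map before the next split. The NumPy sketch below illustrates both on a 24 × 24 map; the window size of 6 is a hypothetical choice that divides 24 evenly, and the attention computation itself (including the mask normally applied to windows that wrap around the border) is omitted.

```python
import numpy as np

def window_partition(x, window):
    """Split an (H, W, C) map into non-overlapping (window, window, C) tiles."""
    h, w, c = x.shape
    x = x.reshape(h // window, window, w // window, window, c)
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, window, window, c)

def shifted_window_partition(x, window):
    """Cyclically shift the map by half a window, then partition it."""
    shift = window // 2
    shifted = np.roll(x, shift=(-shift, -shift), axis=(0, 1))
    return window_partition(shifted, window)

feature_map = np.arange(24 * 24, dtype=np.float32).reshape(24, 24, 1)
regular = window_partition(feature_map, window=6)
shifted = shifted_window_partition(feature_map, window=6)
print(regular.shape, shifted.shape)
```

Attention restricted to each tile scales with the number of tiles rather than with the square of the full token count, and alternating regular and shifted partitions lets neighbouring tiles exchange information from one layer to the next.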
Swin DFU network architecture overview: Layers, shapes, and connections
In this research, a detailed approach is presented for fine-tuning the Hybrid-EfficientNet-Swin-Transformer model, a state-of-the-art (SOTA) architecture known for its effectiveness in various computer vision tasks. The approach involves several steps to adapt the pre-trained model to a new domain. Initially, the target dataset is preprocessed and curated to ensure precise labeling and data integrity. Deep learning frameworks are used to load the model and adjust its final classification layer to suit the requirements of the research task. The model architecture is provided in Table 2. To facilitate effective learning, appropriate loss functions and optimizers are selected based on the dataset’s characteristics. During training, a combination of techniques is employed, including gradual layer unfreezing, adaptive learning rate scheduling, and careful data augmentation, to enhance model performance and prevent overfitting. Fine-tuning strategies such as regularization and transfer learning are explored to improve the model’s generalization capabilities. Through iterative experimentation and hyperparameter optimization, the approach is refined to improve performance. The fine-tuned model is evaluated on dedicated validation sets, using metrics such as accuracy, precision, recall, and F1-score to validate its effectiveness and robustness. This methodology not only delivers promising results but also provides insight into the process of fine-tuning complex deep learning architectures, thereby enhancing their practical utility in real-world applications.
Dataset exploration, metrics, and results evaluation
Dataset exploration
The DFU dataset, a collaborative effort between Manchester Metropolitan University and Lancashire Teaching Hospitals, focuses on DFU images capturing infection and ischemia cases [27]. It aims to enhance DFU detection through advanced methodologies. Images were taken using Kodak DX4530, Nikon D3300, and Nikon COOLPIX P100 cameras, with close-up shots of the foot from a standardized distance. DFU regions were extracted, and natural data augmentation techniques were applied. The dataset of 5890 images is split into training, testing, and validation sets as listed in Table 1.
Experimental framework
The Swin DFU framework was implemented in Python using the Keras library for building neural networks (NNs). Detailed specifications of the experimental setup, including the various parameters, are documented in Table 3.
Model experimental setup with parameters
The assessment of the Swin DFU framework entailed a comprehensive analysis of several statistical metrics: accuracy (ACC), the Kappa statistic, F1-score (FS), precision (PRE), Matthews correlation coefficient (MCC), specificity (SPE), and sensitivity (SEN), which is also known as recall (REC). These metrics were computed from the confusion matrix, which records the counts of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN). Here, TP denotes infected samples correctly classified as infection-positive, and TN denotes non-infected samples correctly classified as infection-negative. Conversely, FP denotes non-infected samples incorrectly classified as positive, and FN denotes infected samples incorrectly classified as negative. The formulas for these metrics are detailed in Table 4. They offer a quantitative measure of the Swin DFU framework’s performance in detecting the presence or absence of infection, enabling a comprehensive assessment of its efficacy in DFU infection classification and aiding its refinement for future applications in diabetic foot ulcer detection and diagnosis.
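As an illustration of how these metrics follow from the four confusion-matrix counts, the sketch below implements their standard textbook definitions. The function name and the example counts are hypothetical and are not taken from the study's results.

```python
import math

def classification_metrics(tp, tn, fp, fn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn)   # also called recall
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Cohen's kappa: agreement beyond what class frequencies alone predict.
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / total ** 2
    kappa = (accuracy - expected) / (1 - expected)
    return {"ACC": accuracy, "SEN": sensitivity, "SPE": specificity,
            "PRE": precision, "FS": f1, "MCC": mcc, "Kappa": kappa}

# Hypothetical counts, chosen only to exercise the formulas.
metrics = classification_metrics(tp=90, tn=95, fp=5, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Because MCC and Kappa use all four counts, they remain informative even when one class dominates, which is why they are often reported alongside accuracy.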
Classification performance measure for Swin-DFU framework
Configuring training parameters for the Swin-DFU framework
Appropriate hyperparameter settings were used to train the Swin DFU-Net. The optimizer and the loss function are two essential choices in the training process. We chose the Adam optimizer [32] for its adept handling of sparse gradients in sizable datasets, which it achieves by combining the advantageous traits of the RMSProp and AdaGrad optimizers [33]. Given the classification nature of our model, we adopted the categorical cross-entropy loss function. The research incorporates gradient accumulation to address the computational demands of transformer-based architectures, which often limit the batch size. The batch is partitioned into smaller mini-batches; loss and gradients are computed for each and aggregated, and the parameter update is postponed until multiple mini-batches have been processed. This alleviates memory constraints, allowing training that behaves like a larger batch size with reduced memory consumption. The Swin DFU framework uses a learning rate scheduler with hyperparameters such as decay epoch, warmup epoch, and decay factor to balance the trade-off between rapid convergence and model stability. The detailed parameters used in the Swin DFU framework are provided in Table 5.
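The following sketch illustrates the two training mechanisms described above on a toy one-parameter model. It shows that averaging the gradients of equally sized mini-batches reproduces the full-batch gradient, and it gives one possible shape of a warmup-then-decay schedule. All names and numeric values (base learning rate, warmup and decay epochs, decay factor) are hypothetical; the actual settings are those listed in Table 5.

```python
def grad_mse(w, xs, ys):
    """Gradient of the mean squared error of y = w * x with respect to w."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

def accumulated_grad(w, xs, ys, micro_batch):
    """Average gradients of equally sized mini-batches before one update."""
    starts = range(0, len(xs), micro_batch)
    grads = [grad_mse(w, xs[i:i + micro_batch], ys[i:i + micro_batch])
             for i in starts]
    return sum(grads) / len(grads)

def learning_rate(epoch, base_lr=1e-3, warmup_epochs=5,
                  decay_epochs=10, decay_factor=0.5):
    """Linear warmup followed by step decay (one possible schedule shape)."""
    if epoch < warmup_epochs:
        return base_lr * (epoch + 1) / warmup_epochs
    return base_lr * decay_factor ** ((epoch - warmup_epochs) // decay_epochs)

xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [2.0 * x for x in xs]
full = grad_mse(0.5, xs, ys)                           # one batch of 8
accum = accumulated_grad(0.5, xs, ys, micro_batch=2)   # four batches of 2
print(full, accum)
print([learning_rate(e) for e in (0, 4, 5, 15)])
```

Only one mini-batch needs to be held in memory at a time, yet the single update applied after accumulation matches what a larger batch would have produced.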
Training plot for proposed Swin DFU Net framework: Insights from training vs. validation accuracy and loss plots.
The performance measures of binary classification of infection by Swin-DFU Net and baseline models
Proposed Swin DFU Net confusion matrix.
In Fig. 5, the simulation results of the proposed Swin-DFU net framework are showcased, extracted from the training phase of the model. Additionally, it verifies that overfitting did not occur during the Swin-DFU model training process. The loss curve demonstrates a rapid and smooth convergence, with minimal oscillations. Figure 6 illustrates the comprehensive analysis of the proposed Swin DFU net model through the presentation of the confusion matrix. By harnessing the feature-rich representations extracted from the EfficientNet model, our approach adeptly discerns the nuanced presence of infection within DFU. The confusion matrix serves as a visual testament to the model’s discriminative prowess, showcasing its ability to accurately classify a considerable number of true positives and true negatives. Particularly noteworthy is the model’s remarkable capacity for correctly identifying instances of infection within the DFU dataset, underscoring its potential for clinical application.
Furthermore, the Swin DFU-Net deep learning model exhibits an admirable degree of consistency and stability, as evidenced by the achieved AUC (Area Under the Curve) value of 0.92, as demonstrated in Fig. 7. This elevated AUC score serves as a strong indicator of the model’s robust performance across a spectrum of classification thresholds, affirming its reliability in distinguishing between positive and negative cases of infection. In addition to the AUC metric, a comprehensive evaluation of the model’s performance encompasses a range of metrics including sensitivity, specificity, precision, accuracy, and the F1 score. Notably, Table 6 shows our model achieves a sensitivity of 95.98%, indicating its ability to accurately detect true positive cases, while maintaining a high specificity of 97.08% to effectively identify true negative cases. The precision of 97.12% further underscores the model’s precision in classifying positive cases, while the overall accuracy of 96.52% reaffirms its overall efficacy in classification tasks. Additionally, the F1 score of 96.55% provides a balanced measure of the model’s performance, considering both precision and recall. Lastly, the Matthews Correlation Coefficient (MCC) of 0.93 further validates the reliability and effectiveness of our model, providing a single, comprehensive metric that takes into account all four elements of the confusion matrix. This holistic assessment underscores the Swin DFU-Net model’s potential as a valuable tool in clinical decision-making processes related to the diagnosis and management of DFU.
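The reported metrics can be cross-checked against one another with a few lines of arithmetic. The sketch assumes an exactly balanced test set, which the dataset description suggests but does not guarantee for the test subset; under that assumption accuracy is the mean of sensitivity and specificity, and MCC has a closed form in those two quantities.

```python
import math

# Values as reported in the text.
sensitivity, specificity, precision = 0.9598, 0.9708, 0.9712

# F1 is the harmonic mean of precision and recall (sensitivity).
f1 = 2 * precision * sensitivity / (precision + sensitivity)

# Assumption: equal numbers of positive and negative test images.
balanced_accuracy = (sensitivity + specificity) / 2
mcc = (sensitivity + specificity - 1) / math.sqrt(
    (sensitivity + 1 - specificity) * (specificity + 1 - sensitivity))

print(f"F1 = {f1:.4f}, accuracy = {balanced_accuracy:.4f}, MCC = {mcc:.2f}")
```

Under these assumptions the computed F1 (about 0.9655), accuracy (about 0.9653), and MCC (about 0.93) agree with the reported 96.55%, 96.52%, and 0.93 up to rounding.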
ROC curve for the Swin DFU-Net model.
The prevalence of infection in DFUs and related complications can often be attributed to poor adherence to a healthy lifestyle and inadequate safety measures among individuals affected by diabetes. Proper management of diabetes, including adherence to a balanced diet and implementation of appropriate safety precautions, is crucial in mitigating the risk of infection in DFU patients. Furthermore, comprehensive guidance and support from caregivers plays a vital role in addressing these challenges effectively. Moreover, advancements in technology have paved the way for innovative techniques in the diagnosis, treatment, and prediction of infection in DFU cases. By leveraging technological solutions such as machine learning algorithms and imaging modalities, healthcare professionals can enhance their ability to accurately classify infection conditions, leading to timely interventions and improved patient outcomes. Proper management of diabetes, coupled with proactive patient education and the integration of advanced technological tools, holds immense promise for the effective classification and management of infection in DFUs.
This multifaceted approach can significantly contribute to reducing the burden of infection complications and improving the overall quality of care for DFU patients. The proposed framework is a hybrid that combines a CNN with the Swin Transformer model, and it can be leveraged across other deep learning applications. By utilizing this framework, healthcare professionals can make better decisions in medical screening. As a result, this network will be essential to improving the efficacy and efficiency of clinical research and healthcare systems. In comparison to previous models, the suggested model’s results show that it is more resilient and reliable. The performance of the suggested Swin DFU-Net model is compared to earlier studies that used the same dataset but had different topologies, depths, and parameter values (see Table 6). By collecting global features rather than just local ones, this method, which feeds the output of the EfficientNet B0 mid-layer into the transformer block, offers a number of benefits. It effectively mitigates overfitting and minimizes the possibility of noise impacting the minority label. Table 6 compares the performance of various deep learning models for infection classification in DFU images.
The Swin DFU-Net framework proposed in this study demonstrates remarkable performance in classifying DFU images as infected or non-infected. The framework achieved an accuracy of 96.52%, precision of 97.12%, specificity of 97.08%, sensitivity of 95.98%, and F1-score of 96.55%. These metrics indicate its high effectiveness, surpassing other models such as Res7Net, ResNet50, and the Colour (RGB)-texture coded based CNN. It is also closely comparable to the EfficientNet B0 model proposed by Liu et al. [34], which achieved an accuracy of 96.80%. Goyal et al. [35] introduced the SuperPixel Colour Descriptor, a feature descriptor that uses a carefully designed machine learning (ML) technique. An Ensemble CNN model was subsequently implemented to improve ischemia and infection detection, achieving a 73% accuracy for ischemia classification, surpassing traditional ML methods. Das et al. [36] developed the DFU-SPNet model, which specializes in classifying DFU images rather than healthy ones. The model utilizes multiple kernel sizes and incorporates three tiers of convolutional layers, enabling it to capture both global and local features. Trained with the SGD optimizer, the model attained an AUC of 97.4%. Similarly, Yogapriya et al. [37] presented an efficient diabetic foot infection classification model using deep learning techniques. The proposed DFINET model employs an architecture with convolutional filters of various sizes and three tiers of convolutional layers, achieving an AUC of 97.4%. Ahsan et al. [38] introduced various end-to-end CNN-based deep learning architectures for categorizing infection and ischemia using the DFU2020 dataset. Through weight fine-tuning and the use of affine transforms for input data augmentation, the ResNet50 model achieved accuracies of 99.49% for ischemia and 84.76% for infection classification.
These findings collectively underscore the potential of advanced deep learning methods, such as the Swin DFU-Net framework, in improving the accuracy and reliability of DFU image classification. The performance of Swin DFU-Net, particularly in comparison to several established models, highlights its potential for clinical application in DFU diagnosis and treatment planning. By effectively distinguishing between infected and non-infected DFUs, this framework can significantly aid healthcare professionals in making informed decisions, ultimately improving patient outcomes. Future research should focus on further validating these models with larger and more diverse datasets, exploring the integration of additional clinical parameters, and assessing the real-world impact of these advanced diagnostic tools in clinical settings. Despite the minor difference in metrics relative to the EfficientNet B0 model, the Grad-CAM analysis indicates that the proposed Swin DFU-Net framework is more effective in identifying and highlighting the infected regions in DFU images.
Overlaying activation maps of CNN model and Swin DFU Net model onto DFU images.
Grad-CAM [41] is a widely used method that offers insight into the decision-making processes of deep CNNs. By analyzing gradient information, Grad-CAM localizes the image regions that are important for predicting the target class. Its interpretability and versatility have made it a valuable tool for understanding model behavior and guiding network optimization. As shown in Fig. 8, the Swin DFU-Net exhibits a notable capacity for globally refining feature activation across the pertinent object, showing a propensity for holistic information processing. This stands in contrast to the EfficientNet model, which predominantly operates at a local level. This behavior underscores the inherent strengths of transformer architectures in capturing long-range dependencies and contextual information within the image data, thereby contributing to enhanced object recognition and understanding. In conclusion, the proposed Swin DFU-Net framework is a promising deep learning model for DFU infection classification, demonstrating high accuracy, precision, specificity, sensitivity, and F1-score. The proposed framework has the potential to significantly improve the accuracy and efficiency of DFU infection diagnosis, leading to better patient outcomes.
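The core Grad-CAM computation can be sketched in a few lines once the feature maps of a chosen layer and the gradients of the class score with respect to them are available. In the sketch below both arrays are random stand-ins with a 24 × 24 spatial size and 8 channels (hypothetical values); obtaining the real arrays requires a forward and backward pass through the trained network, which is not shown.

```python
import numpy as np

def grad_cam(activations, gradients):
    """Grad-CAM heatmap from (H, W, K) activations and matching gradients."""
    # Channel importance: gradient averaged over the spatial dimensions.
    weights = gradients.mean(axis=(0, 1))                      # shape (K,)
    # Weighted sum of feature maps, followed by ReLU.
    cam = np.maximum((activations * weights).sum(axis=-1), 0.0)
    peak = cam.max()
    # Normalize to [0, 1] so the map can be overlaid on the image.
    return cam / peak if peak > 0 else cam

rng = np.random.default_rng(0)
acts = rng.random((24, 24, 8))             # stand-in for layer activations
grads = rng.standard_normal((24, 24, 8))   # stand-in for class-score gradients
heatmap = grad_cam(acts, grads)
print(heatmap.shape, float(heatmap.min()), float(heatmap.max()))
```

The resulting low-resolution map is typically upsampled to the input size and blended with the original image to produce overlays like those in the figure above.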
The proposed approach can determine whether a given sample shows infection or not. However, the study does not facilitate real-time assessments of pain intensity or complexity levels, and it has further limitations. First, the datasets primarily include cases of visible infection that have undergone debridement, a process involving the removal of dead skin. In some cases, infection may be present but not immediately visible, or may need to be analyzed without debridement. Moreover, although infection was analyzed separately, it can coexist with other conditions, such as ischemia, within the same wound. Thus, there is potential value in curating a dataset that includes images with multiple conditions occurring simultaneously within a single DFU image, thereby facilitating investigation into multi-label classification. Second, the infection images utilized in the study were captured professionally with meticulous care. In the envisioned real-world scenario, however, these images will likely be captured by nurses, who may not have advanced photography skills. Consequently, many infection images may not be perfectly centered, could be blurry, or might be captured from arbitrary angles. Incorporating such imperfect images into the dataset would make the model more robust and enable it to perform effectively in practical clinical settings.
Conclusion
In this study, we introduced the Swin DFU-Net framework, a hybrid model combining the Efficient Net and Swin Transformer architectures, for the binary classification of infection and non-infection in DFU images. We conducted a comprehensive evaluation of the proposed framework and compared it to several baseline models, including Efficient Net B0, Res7Net, DFINET, ResNet50, and Color (RGB)-texture coded based CNN. The Swin DFU-Net framework surpassed all baseline models in infection classification accuracy. This superior performance can be attributed to its ability to leverage the strengths of both architectures: the Efficient Net component provides high-resolution feature extraction, while the Swin Transformer component enables efficient modeling of long-range dependencies in the input data. This hybrid approach allows the Swin DFU-Net framework to accurately classify DFU images as infected or non-infected, even in complex cases where traditional classification methods may struggle. The promising outcomes of this study suggest that the Swin DFU-Net framework has the potential to aid doctors in efficiently diagnosing DFU infections, leading to improved patient outcomes.
Funding
The authors did not receive any funding.
Author contributions
All authors contributed to the design and methodology of the study, assessment of the outcomes and writing of the manuscript.
Data availability
No datasets were generated or analyzed during the current study.
Footnotes
Conflict of interest
The authors do not have any conflicts of interest to report.
