SwinDFU-Net: Deep learning transformer network for infection identification in diabetic foot ulcer

Abstract

BACKGROUND:

The identification of infection in diabetic foot ulcers (DFUs) is challenging due to variability within classes, visual similarity between classes, reduced contrast with healthy skin, and presence of artifacts. Existing studies focus on visual characteristics and tissue classification rather than infection detection, critical for assessing DFUs and predicting amputation risk.

OBJECTIVE:

To address these challenges, this study proposes a deep learning model using a hybrid CNN and Swin Transformer architecture for infection classification in DFU images. The aim is to leverage end-to-end mapping without prior knowledge, integrating local and global feature extraction to improve detection accuracy.

METHODS:

The proposed model utilizes a hybrid CNN and Swin Transformer architecture. It employs the Grad-CAM technique to visualize the decision-making process of the CNN and Transformer blocks. The DFUC Challenge dataset is used for training and evaluation, emphasizing the model’s ability to accurately classify DFU images into infected and non-infected categories.

RESULTS:

The model achieves high performance metrics: sensitivity (95.98%), specificity (97.08%), accuracy (96.52%), and Matthews Correlation Coefficient (0.93). These results indicate the model’s effectiveness in quickly diagnosing DFU infections, highlighting its potential as a valuable tool for medical professionals.

CONCLUSION:

The hybrid CNN and Swin Transformer architecture effectively combines strengths from both models, enabling accurate classification of DFU images as infected or non-infected, even in complex scenarios. The use of Grad-CAM provides insights into the model’s decision process, aiding in identifying infected regions within DFU images. This approach shows promise for enhancing clinical assessment and management of DFU infections.

Keywords

Diabetic foot ulcer, infection classification, convolutional neural network, Swin Transformer, Grad-CAM

1. Introduction

Diabetic foot ulcers (DFUs) represent a significant challenge in diabetes care on a global scale, affecting a vast number of individuals. During the healing of DFUs, infection and ischemia are prevalent complications that can escalate to the point of necessitating hospitalization and limb amputation [1]. After amputation, a patient’s quality of life often declines rapidly, with life expectancy typically falling below three years [2]. Amputation affects an individual’s overall well-being, including its physical, mental, and social aspects. However, factors such as prosthetic use, rehabilitation, psychological support, and health management can significantly influence quality of life after amputation. Individual experiences and outcomes therefore vary, and healthcare professionals strive to provide the support patients need to navigate the challenges associated with limb loss. This paper focuses on diabetic foot infection, a primary concern because it occurs in 40%–80% of DFU cases [3]. The presence of bacteria in the wound is the primary cause of infection and leads to cellular damage. Given their typical location on the lower extremities, particularly the soles of the feet, DFUs are especially vulnerable to infection.

DFUs are a major global health concern, affecting an estimated 40–60 million people worldwide, with the potential to increase to 700 million by 2045 due to the rising number of individuals with diabetes [4]. This burden is disproportionately felt in developing nations, where access to proper foot care and specialized healthcare is limited and where 80% of people with diabetes reside, according to the International Diabetes Federation (2023) [5]. The economic impact of DFUs is significant, exceeding $10 billion annually in the US alone (American Diabetes Association, 2023) [6]. This includes direct costs of treatment and indirect costs due to job losses and productivity declines (World Health Organization, 2023) [7]. Despite advancements in treatment, approximately 15% of diabetic foot ulcers progress to amputation, highlighting the need for improved strategies (International Wound Journal, 2023) [8].

Several factors increase the risk of developing diabetic foot ulcers, including:

• Poor diabetes management: Uncontrolled blood sugar levels significantly increase the risk.
• Peripheral neuropathy: Nerve damage in the feet reduces sensation, leading to unnoticed injuries.
• Poor foot circulation: Reduced blood flow impairs healing and increases the risk of infection.
• Smoking: Smoking significantly hinders wound healing and worsens foot ulcers.

Prevention is crucial in reducing the burden of diabetic foot ulcers. Regular self-inspections, good hygiene, and proper footwear are essential for preventing foot ulcers. Maintaining blood sugar control through medication, diet, and exercise is also key to reducing the risk of complications [9]. Regular check-ups with a healthcare professional for foot examinations are vital for early detection and treatment of any issues.

In recent years, advancements in medical imaging techniques have provided a promising avenue for the early detection and characterization of infection in DFU. Various imaging modalities, including but not limited to infrared thermography, Doppler ultrasound, magnetic resonance imaging (MRI), and positron emission tomography (PET), have shown potential in assessing blood flow, tissue perfusion, and oxygenation levels. Moreover, new paths for the automatic and precise identification of infection in imaging data have been made possible by the combination of artificial intelligence (AI) and machine learning algorithms [10]. Computer vision techniques are revolutionizing medical diagnosis, offering the potential to significantly improve both accuracy and speed within clinical practice [11]. Image processing plays a vital role in assisting physicians with disease identification, finding applications in surgery, biological imaging, and treatment planning [12]. With the use of both high-level analytical methods and low-level image processing techniques (such as edge detection, region growth, and line recognition), this cutting-edge technology tackles particular problems in healthcare [13].

Accurately extracting pertinent information from medical images is a crucial step in building these disease detection systems. Deep learning (DL) technologies have emerged as powerful tools, replacing traditional image processing methods [14]. These technologies empower computers to recognize and learn meaningful features associated with specific medical conditions. This breakthrough has opened new doors for automated disease diagnosis [15]. In particular, CNN-based models have proven highly effective in advancing healthcare diagnostics, with applications spanning neuropathic ulcer detection, breast tumor segmentation and classification, cancer cell diagnosis, genetic pattern analysis, and image segmentation [16]. The development of healthcare systems for identifying conditions such as infected DFUs, neuropathic foot ulcers, and abrasions relies heavily on deep learning. A computer vision-based system was developed for diagnosing patients with neuropathic ulcers, focusing on detecting infection and ischemia [17].

Earlier attempts to detect infection in wounds employed traditional machine learning techniques that relied on handcrafted features. These methods primarily focused on identifying signs of infection through visual examination of wounds and confirming diagnoses with wound cultures. However, these approaches have limitations, as they can be time-consuming and dependent on the clinician’s experience. Hsu et al. [18] used Support Vector Machines (SVM) with clustering to classify infections and demonstrated the method’s effectiveness in categorizing complex infection patterns. Wag et al. [19] proposed a deep learning method for wound segmentation and infection detection. This method jointly learns task-relevant visual characteristics and showed efficacy on a large-scale wound database. Alzubaidi et al. [20] introduced DFU QUTNet for neuropathic ulcer classification, achieving high precision (95.4%), recall (93.6%), and F1-score (94.5%). Al-Garaawi et al. [21] developed a novel CNN method for DFU classification using texture features from RGB images. Their technique achieved an F1-score of 95.2% and an AUC of 98.1%. EL Kady et al. [22] enhanced DFU diagnosis accuracy with a hybrid ResNet50-GAN model, which outperformed ResNet50 alone and achieved an accuracy of 0.84. Cui et al. [23] created a CNN-based algorithm for ulcer segmentation. It surpassed U-net and SVM, achieving an accuracy of 93.4% and reducing false positives by 0.9%. Reyes-Luévano et al. [24] proposed DFU_VIRNet for classifying DFUs using visible and thermography images. The model obtained an F1-score of 0.83 and an AUC of 0.91. Hong et al. [25] applied machine learning to assess the recurrence risk of DFUs in elderly diabetic patients, achieving 93% accuracy with SVM. Their findings underscore the effectiveness of SVM in predicting DFU recurrence, aiding targeted interventions for improved patient outcomes.

Building upon these methodologies, this study presents a novel DL model that combines a CNN with a Swin Transformer for reliable categorization of infection in DFUs from patient foot images. To our knowledge, this is the first application of a hybrid CNN and Swin Transformer model to infection detection in DFUs. We evaluated the proposed model’s efficacy by comparing it with several baseline models on a range of performance measures, including accuracy, recall, precision, F1-score, MCC, and AUC-ROC. The goal of this study was to create an automated deep learning system that enables medical professionals to classify infections in DFUs quickly and accurately. With its attention-based architecture, the Shifted Window (Swin) Transformer [26] excels in image classification tasks, offering state-of-the-art results. As a Vision Transformer (ViT) network grounded in self-attention, its application in hyperspectral image (HSI) classification is limited. To adapt it for HSI classification, the structure was modified. By integrating the single-stage Swin Transformer into the network architecture, specifically leveraging its attention module, the aim is to enhance DFU classification accuracy and robustness. The experiments demonstrate the effectiveness of Transformer-based architectures in DFU image classification, even with small datasets and without complex model designs.

The notable contributions of this research endeavor are outlined as follows.

• Swin DFU-Net feeds the output of the block6a_expand_activation mid-layer of the EfficientNet-B0 model into the Swin Transformer blocks, removing the requirement for manual intervention and specialized knowledge.
• Swin DFU-Net autonomously acquires relevant features, showcasing superior accuracy compared to existing methods while maintaining operational efficiency.
• The hybrid model streamlines the workflow by combining feature extraction and classification into a cohesive process, ensuring high performance in classifying infection-positive and infection-negative images with minimal misclassification.
• On a sizable DFU dataset, Swin DFU-Net applies a delineation-free feature acquisition technique, extracting pertinent features directly from the full image without operator intervention or further segmentation procedures.

The remainder of this paper is organized as follows: Section 2 describes the Swin DFU framework in detail and covers data preprocessing and model building techniques. Section 3 presents the experimental results together with a description of the dataset and the metrics employed. Section 4 discusses the findings and the study’s limitations, and Section 5 concludes this research work.

2. Methodology

For clinical purposes, creating an accurate diagnosis system to classify DFUs is a difficult undertaking. This section explains how we combined global and local context information using a Shifted Window Transformer to provide a hybrid Swin DFU framework for the infection classification problem. The proposed Swin DFU-Net framework is depicted in Fig. 1. The framework performs the following steps: DFU sample extraction, label loading, sample pre-processing, sample splitting, data augmentation, training with the Swin Transformer (using the EfficientNet model’s output as input), and statistical parameter analysis on the test set. Algorithm 1 provides a step-by-step flow of the Swin DFU-Net classification task.

Algorithm 1: Transformer network for Infection classification and detection in DFU
Input: DFU infection training dataset $\gamma_1$ (80%), validation dataset $\gamma_2$ (10%), and testing dataset $\gamma_3$ (10%)
$\beta$ represents the batch size
$\sigma$ represents the number of epochs
$\lambda$ represents the optimizer
$\eta$ represents the learning rate
$\varepsilon$ represents the number of samples grouped into a mini-batch
Output: $\Omega$ = weights of the Swin Transformer model
Begin:
• Resize each image in the training dataset to a size of 384 $\times$ 384.
• Apply augmentation techniques to the training data.
• Use the EfficientNet model for feature extraction.
• Provide patch extraction and embedding to the Swin Transformer.
• Apply patch merging to the Swin Transformer.
• Set the training parameters: $\beta$, $\sigma$, $\lambda$, $\eta$, and $\varepsilon$.
• Train the Swin DFU-Net framework and calculate the primary weights.
• For each epoch from 1 to $\sigma$, do the following:
– Select a mini-batch of size $\varepsilon$.
– Perform forward propagation and evaluate the categorical loss function.
– Perform backpropagation and update the weights.
End

2.1 DFU infection dataset

The DFU dataset, including images of infection, is accessible upon completion of a dataset release agreement [27]. It comprises two main classes, each with two subclasses: ischemia positive/negative and infection positive/negative. This study focuses on distinguishing infection from non-infection. The dataset contains 5890 images, evenly split between infection positive and infection negative, labeled by a senior physician. The data is divided into 3534 images for training (Tr), 1178 images for validation (Va), and 1178 images for testing (Te). Example images of the infection and non-infection classes are depicted in Fig. 2. Table 1 provides additional details on the dataset.

Table 1
Pre-augmentation data overview table

Dataset	Label	Tr	Va	Te
DFU	Infection positive	1767	589	589
	Infection negative	1767	589	589
	Total image samples	3534	1178	1178

Figure 1.

Proposed Swin DFU-Net infection classification.

Figure 2.

Sample infection (positive and negative) in DFU.

2.2 Dataset preprocessing

Several preprocessing steps were carried out before the DFU images were fed into the Swin DFU-Net model. Each image in the DFU dataset was resized to 384 $\times$ 384 pixels while maintaining the RGB channel format, in accordance with transfer learning principles. The images were converted into NumPy arrays in order to maximize training performance and minimize memory use. Furthermore, the dataset was shuffled so that samples were presented in random order during training. Given the large number of parameters in deep neural networks, a substantial amount of training data is essential. To bolster the dataset, data augmentation techniques were employed. These methods alleviate overfitting, strengthen model robustness, and improve the Swin DFU-Net framework’s performance. For augmentation, a vectorized implementation of CutMix [28] and MixUP [29] was utilized. This augmentation process substantially expanded the DFU training set, increasing its size from 3534 to 10310 samples.

Figure 3.

Augmented images using CutMix techniques.

Figure 4.

Augmented images using MixUP techniques.

• CutMix involves randomly selecting a patch from one image and pasting it onto another image, while simultaneously adjusting the labels in proportion to the area of the patch. This encourages the model to learn robust features across different regions of the input images. Augmented images using CutMix are shown in Fig. 3.

• MixUP blends pairs of images and their corresponding labels in a weighted manner, generating new synthetic data points. By interpolating between samples, MixUP promotes smoother decision boundaries and improves the generalization ability of the model. Augmented images using MixUP are shown in Fig. 4.
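
The two augmentations can be illustrated with a minimal NumPy sketch. This is not the vectorized implementation used in the study; the function names, the Beta(1, 1) sampling of the mixing weight, and the toy all-ones and all-zeros images are assumptions chosen only to make the label arithmetic easy to follow.

```python
import numpy as np

def mixup(x1, y1, x2, y2, lam):
    """Blend two images and their one-hot labels with weight lam in [0, 1]."""
    return lam * x1 + (1.0 - lam) * x2, lam * y1 + (1.0 - lam) * y2

def cutmix(x1, y1, x2, y2, lam, rng):
    """Paste a random box from x2 into x1 and mix labels by the pasted area."""
    h, w = x1.shape[:2]
    cut_h = int(h * np.sqrt(1.0 - lam))
    cut_w = int(w * np.sqrt(1.0 - lam))
    cy, cx = rng.integers(h), rng.integers(w)          # random box centre
    top, bottom = np.clip([cy - cut_h // 2, cy + cut_h // 2], 0, h)
    left, right = np.clip([cx - cut_w // 2, cx + cut_w // 2], 0, w)
    mixed = x1.copy()
    mixed[top:bottom, left:right] = x2[top:bottom, left:right]
    # The box may be clipped at the border, so use the area actually pasted.
    area = (bottom - top) * (right - left) / (h * w)
    return mixed, (1.0 - area) * y1 + area * y2, area

# Toy inputs (assumed): an all-ones "positive" and an all-zeros "negative" image.
rng = np.random.default_rng(0)
positive = np.ones((384, 384, 3), dtype=np.float32)
negative = np.zeros((384, 384, 3), dtype=np.float32)
y_pos, y_neg = np.array([1.0, 0.0]), np.array([0.0, 1.0])

lam = rng.beta(1.0, 1.0)
mix_img, mix_label = mixup(positive, y_pos, negative, y_neg, lam)
cut_img, cut_label, pasted_area = cutmix(positive, y_pos, negative, y_neg, lam, rng)
```

In both cases the soft label still sums to one, so the categorical cross-entropy loss can be applied unchanged.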

2.3 EfficientNet-B0 model

EfficientNet-B0, a CNN design belonging to the EfficientNet family, was developed to achieve a harmonious balance between model performance and computational efficiency [30]. This is achieved by scaling the network breadth, depth, and resolution consistently using the compound scaling technique. Due to its success in obtaining a favorable trade-off between model size and accuracy, it has been widely employed in a variety of computer vision tasks. The model demonstrates superior performance with high accuracy and efficiency, outperforming existing models while using significantly fewer parameters. EfficientNet-B0 showcases state-of-the-art accuracy on transfer learning datasets, surpassing other models with a substantial reduction in parameters. Its scalability and balance in network width, depth, and resolution contribute to its success in achieving optimal performance across different applications.

The EfficientNet-B0 model is integrated into the workflow, with particular focus on its mid-layer, block6a_expand_activation, which serves as the input to the Swin Transformer blocks. The input shape of the EfficientNet-B0 model is configured to be 384 $\times$ 384, resulting in a 24 $\times$ 24 feature map at the block6a_expand_activation layer. This output is the input for the subsequent Swin Transformer processing. In essence, the features extracted by the block6a_expand_activation layer are used as rich representations to be further refined and contextualized within the Swin Transformer architecture. This integration allows the Swin Transformer to leverage the hierarchical feature representations derived from EfficientNet-B0, enhancing the overall model’s performance and its capacity to capture intricate patterns within the input data.

2.4 Swin transformer model

The Swin Transformer (ST) is a deep learning architecture with a distinctive hierarchical structure and attention mechanism. Unlike conventional vision transformers, which compute attention over all tokens at a single scale, the ST groups tokens into patches and processes them at multiple scales, enabling the model to capture both local and global dependencies efficiently. The ST computes self-attention within local windows that are shifted between successive layers (Shifted Windows), which reduces the cost of attending to distant tokens and scales well with input size. This design allows the ST to handle larger inputs without significantly escalating computational demands [31]. These features make the ST a potent tool for tasks such as imaging applications. It distinguishes itself by its versatility in accommodating diverse input modalities and scaling to different input sizes. Unlike conventional CNNs, which are tailored for fixed-size inputs, the ST processes inputs of variable sizes without necessitating resizing or cropping. This flexibility renders it particularly suitable for tasks involving high-resolution images or sequences of varying lengths. Additionally, the ST generalizes robustly, owing to the ability of its self-attention mechanism to capture long-range dependencies and contextual information.
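
The window mechanism can be illustrated with a short NumPy sketch. It only shows how tokens are grouped into regular and shifted windows; the attention computation and the masking of wrapped-around tokens are omitted. The 12 x 12 grid with 64 channels mirrors the 144 x 64 token shape reported for the model, while the window size of 4 and the shift of 2 are assumptions, since the paper does not state them.

```python
import numpy as np

def window_partition(x, window):
    """Split an (H, W, C) token grid into groups of shape (window*window, C)."""
    h, w, c = x.shape
    x = x.reshape(h // window, window, w // window, window, c)
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, window * window, c)

def shifted_window_partition(x, window, shift):
    """Cyclically shift the grid first, so new windows straddle the old borders."""
    return window_partition(np.roll(x, (-shift, -shift), axis=(0, 1)), window)

# 144 tokens with 64 channels arranged on a 12 x 12 grid.
tokens = np.arange(12 * 12 * 64, dtype=np.float32).reshape(12, 12, 64)

regular = window_partition(tokens, window=4)                  # layer l
shifted = shifted_window_partition(tokens, window=4, shift=2) # layer l + 1
```

Attention restricted to each group costs far less than attention over all 144 tokens, and alternating the two partitions lets information flow between neighbouring windows.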

Table 2
Swin DFU network architecture overview: Layers, shapes, and connections

Layer (type)	Output shape	Param #	Link to
Input layer (A)	(None, 384, 384, 3)	0	–
EfficientNet (B)	(None, 24, 24, 672), (None, 12, 12, 1280)	4049571	(A)
Patch extract (C)	(None, 144, 2688)	0	(B)
Patch embedding (D)	(None, 144, 64)	181312	(C)
tf.cast TFOpLambda (E)	(None, 144, 64)	0	(D)
Swin blocks sequential (F)	(None, 144, 64)	33544	(E)
Patch merging (G)	(None, 36, 128), (None, 12, 12, 64)	32768	(F)
Swin head sequential (H)	(None, 128)	512	(G)
Conv head sequential (I)	(None, 1280)	0	(B)
tf.concat TFOpLambda (J)	(None, 1408)	0	(H), (I)
Dense layer (K)	(None, 2)	2818	(J)
Total params: 4300525
Trainable params: 4258246
Non-trainable params: 42279
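
The shapes and parameter counts in Table 2 can be cross-checked with plain arithmetic. The sketch below is a reading of the table, not the authors' code: the 2 x 2 patch size, the biased projection with a learned position embedding, and the bias-free linear layer in patch merging are assumptions inferred from the reported numbers.

```python
# Shape and parameter bookkeeping for Table 2 (pure arithmetic, no framework).
image_side = 384
feat_side, feat_channels = 24, 672          # EfficientNet mid-layer output, row (B)
patch = 2                                   # assumed patch size (gives 144 tokens)
tokens = (feat_side // patch) ** 2
patch_dim = patch * patch * feat_channels
embed_dim = 64

# Assumed layout: dense projection with bias plus a learned position embedding.
embedding_params = patch_dim * embed_dim + embed_dim + tokens * embed_dim

# Assumed layout: 2 x 2 merging concatenates 4 tokens, then a bias-free dense layer.
merged_tokens = tokens // 4
merging_params = (4 * embed_dim) * (2 * embed_dim)

concat_dim = 128 + 1280                     # Swin head output + conv head output
dense_params = concat_dim * 2 + 2           # two-class output layer with bias

# Rows (B), (F), and (H) are taken from the table as reported.
total_params = (4049571 + embedding_params + 33544
                + merging_params + 512 + dense_params)
print(tokens, patch_dim, embedding_params, merging_params, dense_params, total_params)
```

Under these assumptions the computed values agree with the corresponding rows of the table and with the reported total.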

2.5 Model tuning

In this research, a detailed approach is presented for fine-tuning the hybrid EfficientNet and Swin Transformer model, which builds on state-of-the-art (SOTA) architectures known for their effectiveness in various computer vision tasks. The approach involves several steps to adapt the pre-trained model to the new domain. Initially, the target dataset is carefully preprocessed and curated with precise labeling and data integrity. Deep learning frameworks are used to load the model and adjust its final classification layer to suit the research task. The model architecture is provided in Table 2. To facilitate effective learning, appropriate loss functions and optimizers are selected based on the dataset’s characteristics. During training, a combination of techniques is employed, including gradual layer unfreezing, adaptive learning rate scheduling, and careful data augmentation, to enhance model performance and prevent overfitting. Fine-tuning strategies such as regularization and transfer learning are explored to improve the model’s generalization capabilities. The approach is refined through iterative experimentation and hyperparameter optimization. The fine-tuned model is evaluated on dedicated validation sets using metrics such as accuracy, precision, recall, and F1-score to validate its effectiveness and robustness. This methodology also provides insights into the process of fine-tuning complex deep learning architectures, enhancing their practical utility in real-world applications.

3. Dataset exploration, metrics, and results evaluation

3.1 Dataset exploration

The DFU dataset, a collaborative effort between Manchester Metropolitan University and Lancashire Teaching Hospitals, focuses on DFU images capturing infection and ischemia cases [27]. It aims to enhance DFU detection through advanced methodologies. Images were taken using Kodak DX4530, Nikon D3300, and Nikon COOLPIX P100 cameras, with close-up shots of the foot from a standardized distance. DFU regions were extracted, and natural data augmentation techniques were applied. The dataset of 5890 images is split into training, validation, and testing sets as shown in Table 1.

3.2 Experimental framework

The Keras library’s capabilities were used in the conception of the Swin DFU framework, allowing for the smooth integration of Python with neural networks (NNs). Detailed specifications of the experimental setup, including various parameters, are meticulously documented in Table 3.

Table 3
Model experimental setup with parameters

Object	Hardware configuration
CPU	12th Gen Intel Core i5-1240P
GPU	NVIDIA Tesla P100
RAM	16 GB
Storage	500 GB SSD

3.3 Swin-DFU metrics

The assessment of the Swin DFU framework entailed a comprehensive analysis of various statistical metrics, including accuracy (ACC), Kappa statistic, F1-score (FS), precision (PRE), Matthews correlation coefficient (MCC), specificity (SPE), and sensitivity (SEN), also known as recall (REC). These metrics were computed from the confusion matrix, which records the counts of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN). Throughout the evaluation, TP denotes infection-positive images correctly identified as positive, while TN denotes infection-negative images correctly identified as negative. Conversely, FP denotes negative images incorrectly classified as positive, and FN denotes positive images incorrectly classified as negative. The metrics are defined in Table 4. These equations offer a quantitative measure of the Swin DFU framework’s performance in detecting the presence or absence of infection, enabling a comprehensive assessment of its efficacy in infection classification in DFUs and aiding its refinement for future applications in diabetic foot ulcer detection and diagnosis.

Table 4
Classification performance measure for Swin-DFU framework

Performance measure	Formula
Sensitivity (SEN)	$\frac{TP}{(TP+FN)}$
Specificity (SPE)	$\frac{TN}{(FP+TN)}$
Precision (PRE)	$\frac{TP}{(TP+FP)}$
Accuracy (ACC)	$\frac{(TP+TN)}{(TP+TN+FP+FN)}$
F1 Score (FS)	$\frac{2TP}{(2TP+FP+FN)}$
Matthews correlation coefficient (MCC)	$\frac{(TP\times TN-FP\times FN)}{\sqrt{(TP+FP)\times(TP+FN)\times(TN+FP)\times(TN+FN)}}$
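
The formulas in Table 4 translate directly into code. The sketch below is illustrative only; the confusion-matrix counts are hypothetical values chosen for easy checking and are not results from this study.

```python
from math import sqrt

def classification_metrics(tp, tn, fp, fn):
    """Compute the Table 4 measures from confusion-matrix counts."""
    return {
        "SEN": tp / (tp + fn),
        "SPE": tn / (fp + tn),
        "PRE": tp / (tp + fp),
        "ACC": (tp + tn) / (tp + tn + fp + fn),
        "FS": 2 * tp / (2 * tp + fp + fn),
        "MCC": (tp * tn - fp * fn)
        / sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
    }

# Hypothetical counts, not taken from the paper.
metrics = classification_metrics(tp=90, tn=80, fp=20, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Unlike accuracy, MCC uses all four cells of the confusion matrix in both numerator and denominator, which is why it is often reported alongside the other measures.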

Table 5

Configuring training parameters for the Swin-DFU framework

Parameter	Value
Optimizer	Adam
Learning rate	0.01
Learning scheduler	Cosine
Epoch	20
Grad accumulation	8
Learning rate decay epoch	2.4
Learning rate warmup epoch	5
Learning rate decay factor	0.97
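
One common way to combine a warmup phase with a cosine schedule is sketched below, using the learning rate, epoch count, and warmup length from Table 5. The exact schedule used in the study is not specified, so the linear warmup shape and the decay to zero are assumptions, and the decay epoch and decay factor from the table are not modeled here.

```python
import math

def learning_rate(epoch, base_lr=0.01, warmup_epochs=5, total_epochs=20):
    """Linear warmup to base_lr, then cosine decay towards zero (assumed form)."""
    if epoch < warmup_epochs:
        return base_lr * (epoch + 1) / warmup_epochs
    progress = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))

schedule = [learning_rate(e) for e in range(20)]
for epoch, lr in enumerate(schedule):
    print(f"epoch {epoch:2d}: lr = {lr:.5f}")
```

The warmup keeps early updates small while the newly added layers are still poorly initialized, and the cosine phase lowers the step size smoothly as training converges.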

3.4 Model training and parameter optimization

The Swin DFU-Net was trained with carefully chosen hyperparameter settings. The optimizer and the loss function are two essential components of the training process. The Adam optimizer [32] was chosen for its adept handling of sparse gradients on sizable datasets, which it achieves by combining the advantageous traits of the RMSProp and AdaGrad optimizers [33]. Because the model performs classification, the categorical cross-entropy loss function was adopted. Gradient accumulation was incorporated to address the computational demands of transformer-based architectures, which often limit the batch size. The method partitions a batch into smaller mini-batches, computes the loss and gradients for each, and postpones the parameter update until multiple mini-batches have been processed. This alleviates memory constraints, allowing training that behaves like a larger batch size while consuming less memory. The Swin DFU framework uses a learning rate scheduler with hyperparameters such as decay epoch, warmup epoch, and decay factor to balance the trade-off between rapid convergence and model stability. The detailed parameters used in the Swin DFU framework are provided in Table 5.
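
The idea behind gradient accumulation can be checked on a toy problem. The sketch below uses a linear model with a mean squared error loss purely for illustration (the study uses a deep network and cross-entropy); the data, the model, and the plain gradient step are assumptions. Only the accumulation count of 8 comes from Table 5.

```python
import numpy as np

def mse_gradient(w, x, y):
    """Gradient of the mean squared error of a linear model y ~ x @ w."""
    return 2.0 * x.T @ (x @ w - y) / len(x)

rng = np.random.default_rng(0)
x = rng.normal(size=(64, 5))     # toy batch of 64 samples, 5 features
y = rng.normal(size=64)
w = np.zeros(5)

accum_steps = 8                  # "Grad accumulation" value from Table 5
full_grad = mse_gradient(w, x, y)

# Process 8 mini-batches of 8 samples and average their gradients
# before performing a single parameter update.
accumulated = np.zeros_like(w)
for xb, yb in zip(np.split(x, accum_steps), np.split(y, accum_steps)):
    accumulated += mse_gradient(w, xb, yb) / accum_steps

w_updated = w - 0.01 * accumulated   # one update after all mini-batches
```

With equally sized mini-batches the accumulated gradient equals the full-batch gradient, while only one mini-batch needs to be held in memory at a time.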

Figure 5.

Training plot for proposed Swin DFU Net framework: Insights from training vs. validation accuracy and loss plots.

Table 6

The performance measures of binary classification of infection by Swin-DFU Net and baseline models

Paper	Classification method	Accuracy (%)	Precision (%)	Specificity (%)	Sensitivity (%)	F1-Score (%)	MCC
Proposed framework	Swin DFU-Net	96.52	97.12	97.08	95.98	96.55	0.93
Liu et al. [34]	Efficient Net B0	96.80	96.00	97.00	96.00	96.80	–
Goyal et al. [35]	Efficient Net B0	72.70	73.50	74.40	70.90	85.00	0.45
Das et al. [36]	Res7Net	80.00	79.70	80.20	80.40	79.80	0.60
Yogapriya et al. [37]	DFINET	91.98	93.72	93.49	90.57	92.12	0.84
Ahsan et al. [38]	ResNet50	84.76	83.27	85.71	89.80	85.00	0.94
Nora et al. [39]	Color (RGB)-texture coded based CNN	74.00	74.00	75.00	74.00	–	–
Huong et al. [40]	EfficientNet-B0	91.00	92.00	92.00	89.00	–	–

Figure 6.

Proposed Swin DFU Net confusion matrix.

In Fig. 5, the simulation results of the proposed Swin-DFU net framework are showcased, extracted from the training phase of the model. Additionally, it verifies that overfitting did not occur during the Swin-DFU model training process. The loss curve demonstrates a rapid and smooth convergence, with minimal oscillations. Figure 6 illustrates the comprehensive analysis of the proposed Swin DFU net model through the presentation of the confusion matrix. By harnessing the feature-rich representations extracted from the EfficientNet model, our approach adeptly discerns the nuanced presence of infection within DFU. The confusion matrix serves as a visual testament to the model’s discriminative prowess, showcasing its ability to accurately classify a considerable number of true positives and true negatives. Particularly noteworthy is the model’s remarkable capacity for correctly identifying instances of infection within the DFU dataset, underscoring its potential for clinical application.

Furthermore, the Swin DFU-Net deep learning model exhibits an admirable degree of consistency and stability, as evidenced by the achieved AUC (Area Under the Curve) value of 0.92, as demonstrated in Fig. 7. This elevated AUC score serves as a strong indicator of the model’s robust performance across a spectrum of classification thresholds, affirming its reliability in distinguishing between positive and negative cases of infection. In addition to the AUC metric, a comprehensive evaluation of the model’s performance encompasses a range of metrics including sensitivity, specificity, precision, accuracy, and the F1 score. Notably, Table 6 shows our model achieves a sensitivity of 95.98%, indicating its ability to accurately detect true positive cases, while maintaining a high specificity of 97.08% to effectively identify true negative cases. The precision of 97.12% further underscores the model’s precision in classifying positive cases, while the overall accuracy of 96.52% reaffirms its overall efficacy in classification tasks. Additionally, the F1 score of 96.55% provides a balanced measure of the model’s performance, considering both precision and recall. Lastly, the Matthews Correlation Coefficient (MCC) of 0.93 further validates the reliability and effectiveness of our model, providing a single, comprehensive metric that takes into account all four elements of the confusion matrix. This holistic assessment underscores the Swin DFU-Net model’s potential as a valuable tool in clinical decision-making processes related to the diagnosis and management of DFU.

Figure 7.

ROC curve for the Swin DFU-Net model.

4. Discussion

The prevalence of infection in DFUs and related complications can often be attributed to a lack of adherence to a healthy lifestyle and inadequate safety measures among individuals affected by diabetes. Proper management of diabetes, including adherence to a balanced diet and implementation of appropriate safety precautions, is crucial in mitigating the risk of infection in DFU patients. Furthermore, comprehensive guidance and support from caregivers play a vital role in addressing these challenges effectively. Moreover, advancements in technology have paved the way for innovative techniques in the diagnosis, treatment, and prediction of infection in DFU cases. By leveraging technological solutions such as machine learning algorithms and imaging modalities, healthcare professionals can enhance their ability to accurately classify infection conditions, leading to timely interventions and improved patient outcomes. Proper management of diabetes, coupled with proactive patient education and the integration of advanced technological tools, holds immense promise for the effective classification and management of infection in DFUs.

This multifaceted approach can significantly contribute to reducing the burden of infection complications and improving the overall quality of care for DFU patients. The proposed framework combines a CNN with the Swin Transformer in a hybrid architecture, and it can be leveraged across other deep learning applications. By utilizing this framework, healthcare professionals can make better decisions on medical screening, which in turn can improve the efficacy and efficiency of clinical research and healthcare systems. In comparison to previous models, the results of the proposed model show that it is more resilient and reliable. The performance of the proposed Swin DFU-Net model is compared to earlier studies that used the same dataset but had different topologies, depths, and parameter values (see Table 6). By collecting global features rather than just local ones, this method, which feeds the output of an EfficientNet b0 mid-layer into the transformer block, offers a number of benefits. It effectively mitigates overfitting and minimizes the possibility of noise impacting the minority label. Table 6 compares the performance of various deep learning models for infection classification in DFU images.

The Swin DFU-Net framework proposed in this study demonstrates strong performance in classifying DFU images as infected or non-infected. The framework achieved an accuracy of 96.52%, precision of 97.12%, specificity of 97.08%, sensitivity of 95.98%, and F1-Score of 96.55%. These metrics surpass those of other models such as Res7Net, ResNet50, and the Colour (RGB)-texture coded based CNN. They are also closely comparable to those of the EfficientNet B0 model proposed by Liu et al. [34], which achieved an accuracy of 96.80%. Goyal et al. [35] introduced the SuperPixel Colour Descriptor, a feature descriptor used with a carefully designed machine learning (ML) technique. An Ensemble CNN model was subsequently implemented to improve ischemia and infection detection, achieving a 73% accuracy for ischemia classification and surpassing traditional ML methods. Das et al. [36] developed the DFU-SPNet model, which focuses on distinguishing DFU images from images of healthy skin. The model utilizes multiple kernel sizes and incorporates three tiers of convolutional layers, enabling it to capture both global and local features. Trained with the SGD optimizer, the model attained an AUC of 97.4%. Similarly, Yogapriya et al. [37] presented a diabetic foot infection classification model based on deep learning. The proposed DFINET model employs an architecture with convolutional filters of various sizes and three tiers of convolutional layers, achieving an AUC of 97.4%. Ahsan et al. [38] introduced various end-to-end CNN-based deep learning architectures for categorizing infection and ischemia using the DFU2020 dataset. Through weight fine-tuning and the use of affine transforms for input data augmentation, the ResNet50 model achieved accuracies of 99.49% for ischemia and 84.76% for infection classification.

These findings collectively underscore the potential of advanced deep learning methods, such as the Swin DFU-Net framework, in improving the accuracy and reliability of DFU image classification. The performance of Swin DFU-Net, particularly in comparison to several established models, highlights its potential for clinical application in DFU diagnosis and treatment planning. By effectively distinguishing between infected and non-infected DFUs, this framework can aid healthcare professionals in making informed decisions, ultimately improving patient outcomes. Future research should focus on further validating these models with larger and more diverse datasets, exploring the integration of additional clinical parameters, and assessing the real-world impact of these diagnostic tools in clinical settings. Although the Efficient Net B0 model reports a slightly higher accuracy, the Grad CAM plot analysis indicates that the proposed Swin DFU-Net framework is more effective than the Efficient Net B0 model in identifying and highlighting the infected regions in DFU images.

Figure 8.

Overlaying activation maps of CNN model and Swin DFU Net model onto DFU images.

Grad-CAM [41] is a widely used method for examining the decision-making processes of deep CNNs. By analyzing gradient information, Grad-CAM localizes the image regions that are important for predicting the target class. Its interpretability and versatility have made it a valuable tool for understanding model behavior and guiding network optimization. As Fig. 8 shows, the Swin DFU-Net refines feature activation globally across the pertinent object, which reflects holistic information processing. This stands in contrast to the Efficient Net model, which predominantly operates at a local level. This behavior underscores the strengths of transformer architectures in capturing long-range dependencies and contextual information within the image data, thereby contributing to enhanced object recognition and understanding. In conclusion, the proposed Swin DFU-Net framework is a promising deep learning model for DFU infection classification, demonstrating high accuracy, precision, specificity, sensitivity, and F1-Score. The proposed framework has the potential to improve the accuracy and efficiency of DFU infection diagnosis, leading to better patient outcomes.
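
The core Grad-CAM computation described above can be sketched in a few lines of plain Python. The sketch assumes that the feature maps of one layer and the gradients of the class score with respect to those maps have already been extracted from the network; the function name `grad_cam` and the toy inputs are hypothetical, and this is not the visualization code used to produce Fig. 8.

```python
from typing import List

Matrix = List[List[float]]


def grad_cam(activations: List[Matrix], gradients: List[Matrix]) -> Matrix:
    """Combine feature maps into a Grad-CAM heatmap.

    activations[k] is the k-th feature map of the chosen layer.
    gradients[k] is the gradient of the class score with respect to that map
    (same shape). Both are assumed to be precomputed.
    """
    height = len(activations[0])
    width = len(activations[0][0])

    # Channel weights: global average of each channel's gradients.
    weights = [sum(sum(row) for row in g) / (height * width) for g in gradients]

    # Weighted sum of the feature maps.
    heatmap = [[0.0] * width for _ in range(height)]
    for weight, feature_map in zip(weights, activations):
        for i in range(height):
            for j in range(width):
                heatmap[i][j] += weight * feature_map[i][j]

    # ReLU keeps only regions with a positive influence on the class score.
    heatmap = [[max(0.0, value) for value in row] for row in heatmap]

    # Scale to [0, 1] so the map can be overlaid on the input image.
    peak = max(max(row) for row in heatmap)
    if peak > 0:
        heatmap = [[value / peak for value in row] for row in heatmap]
    return heatmap


# Toy 2x2 example with two channels (hypothetical values).
toy_activations = [[[1.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 2.0]]]
toy_gradients = [[[1.0, 1.0], [1.0, 1.0]], [[-1.0, -1.0], [-1.0, -1.0]]]
print(grad_cam(toy_activations, toy_gradients))
```

In practice the heatmap is upsampled to the input resolution before it is overlaid on the DFU image, a step omitted here for brevity.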

4.1 Limitations

The proposed model can determine only whether a given sample shows infection or not; it does not support real-time assessment of pain intensity or complexity levels. The datasets primarily include cases of visible infections that have undergone debridement, a process involving the removal of dead skin. In some cases, however, infection may be present but not immediately visible, or may need to be analyzed without debridement. Moreover, although the analysis of infection was carried out separately, infection can coexist with other conditions within the same wound. Thus, there is potential value in curating a dataset that includes images with multiple conditions occurring simultaneously within a single DFU image, thereby facilitating investigation into multi-label classification. In addition, the infection images used in the study were captured professionally with meticulous care. In the envisioned real-world scenario, however, these images will likely be captured by nurses, who may not possess advanced photography skills. Consequently, many infection images may not be perfectly centered, could be blurry, or might be captured from arbitrary angles. Incorporating such imperfect images into the dataset would make the model more robust and better suited to practical clinical settings.

5. Conclusion

In this study, we introduced the Swin DFU-Net framework, a hybrid model combining the Efficient Net and Swin Transformer architectures, for the binary classification of DFU images as infected or non-infected. We evaluated the proposed model and compared it to several baseline models, including Efficient Net B0, Res7Net, DFINET, ResNet50, and the Color (RGB)-texture coded based CNN. The Swin DFU-Net model demonstrated strong performance, surpassing most of these baselines in infection classification accuracy and remaining closely comparable to Efficient Net B0. This performance can be attributed to the model's ability to leverage the strengths of both architectures: the Efficient Net component provides high-resolution feature extraction, while the Swin Transformer component enables efficient modeling of long-range dependencies in the input data. This hybrid approach allows the framework to classify DFU images accurately as infected or non-infected, even in complex cases where traditional classification methods may struggle. These outcomes suggest that the Swin DFU-Net framework has the potential to aid doctors in efficiently diagnosing DFU infections, leading to improved patient outcomes.

Funding

The authors did not receive any funding.

Author contributions

All authors contributed to the design and methodology of the study, assessment of the outcomes and writing of the manuscript.

Data availability

No datasets were generated or analyzed during the current study.

Footnotes

Conflict of interest

The authors do not have any conflicts of interest to report.

References

1. Mills Sr, Conte, Armstrong, Pomposelli, Schanzer, Sidawy, Andros, Society for Vascular Surgery Lower Extremity Guidelines Committee. The society for vascular surgery lower extremity threatened limb classification system: Risk stratification based on wound, ischemia, and foot infection (WIfI). Journal of Vascular Surgery. 2014 Jan 1; 59(1): 220-34.

2. Bradley, Roxburgh, McMillan, Guthrie. A systematic review of the neutrophil to lymphocyte and platelet to lymphocyte ratios in patients with lower extremity arterial disease. Vasa. 2024 Apr 2.

3. Richard, Sotto, Lavigne. New insights in diabetic foot infection. World J Diabetes. 2011.

4. Jodheea-Jutton, Hindocha, Bhaw-Luximon. Health economics of diabetic foot ulcer and recent trends to accelerate treatment. The Foot. 2022 Sep 1; 52: 101909.

5. International Diabetes Federation. IDF Diabetes Atlas, 10th ed. Brussels, Belgium: International Diabetes Federation. 2023.

6. American Diabetes Association. Economic costs of diabetes in the U.S. in 2023. Diabetes Care. 2023; 46(3): 364-372.

7. World Health Organization. Global report on diabetes. Geneva: World Health Organization. 2023.

8. International Wound Journal. Advances in diabetic foot ulcer treatment: A review. Int Wound J. 2023; 20(2): 250-259.

9. 6 Top Tips to Prevent Diabetic Foot Ulcers: Midwest Institute for Non-Surgical Therapy: Vascular and Interventional Radiologists [Internet]. Available from: https://www.mintstl.com/blog/6-top-tips-to-prevent-diabetic-foot-ulcers.

10. Zhu, Huang, Chen, Xie, Zhang. Deep learning algorithms for COVID-19 detection based on chest CT and radiographs: A systematic review and meta-analysis. Acad Radiol. 2020; 28(1): 49-61.

11. Gao. Intelligent point of care testing for medicine diagnosis. Interdisciplinary Medicine. 2024 Jan; 2(1): e20230031.

12. Hosseini, Asadi, Emami, Ebnali. Machine learning applications for early detection of esophageal cancer: A systematic review. BMC Medical Informatics and Decision Making. 2023 Jul 17; 23(1): 124.

13. Zolfaghari, Suravee, Riboni, Yordanova. Sensor-based locomotion data mining for supporting the diagnosis of neurodegenerative disorders: A survey. ACM Computing Surveys. 2023 Aug 26; 56(1): 1-36.

14. Khan, Kader, Islam, Rahman, Kamal, Toha, Kwak. Machine learning and deep learning approaches for brain disease diagnosis: Principles and recent advances. IEEE Access. 2021 Feb 26; 9: 37622-55.

15. Pyun, Kwon, Yoo, Kim, Gong, Yeo, Han. Machine-learned wearable sensors for real-time hand-motion recognition: Toward practical applications. National Science Review. 2024 Feb; 11(2): nwad298.

16. Kugunavar, Prabhakar. Convolutional neural networks for the diagnosis and prognosis of the coronavirus disease pandemic. Visual Computing for Industry, Biomedicine, and Art. 2021 May 5; 4(1): 12.

17. Joy, Oommen, Appukuttan, Thomas. Reliability of the university of Texas classification in predicting the outcomes of diabetic foot in a tertiary Center in Kerala: A prospective observational study. Asian Journal of Medical Sciences. 2023 Nov 1; 14(11): 174-80.

18. Hsu, Shih, Chang, Lai. Automatic wound infection interpretation for postoperative wound image. In Eighth International Conference on Graphic and Image Processing (ICGIP 2016). SPIE. Vol. 10225, 2017 Feb 8. pp. 438-443.

19. Wang, Yan, Smith, Kochhar, Rubin, Warren, Wrobel, Lee. A unified framework for automatic wound segmentation and analysis with deep convolutional neural networks. In 2015 37th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC). IEEE. 2015 Aug 25. pp. 2415-2418.

20. Alzubaidi, Fadhel, Oleiwi, Al-Shamma, Zhang. DFU_QUTNet: Diabetic foot ulcer classification using novel deep convolutional neural network. Multimedia Tools and Applications. 2020 Jun; 79(21): 15655-77.

21. Al-Garaawi, Ebsim, Alharan, Yap. Diabetic foot ulcer classification using mapped binary patterns and convolutional neural networks. Computers in Biology and Medicine. 2022 Jan 1; 140: 105055.

22. El-Kady, Abbassy, Ali. Advancing diabetic foot ulcer detection based on RESNET and GAN integration. Journal of Theoretical and Applied Information Technology. 2024 Mar 31; 102(6): 2258-68.

23. Cui, Thurnhofer-Hemsi, Soroushmehr, Mishra, Gryak, Domínguez, Najarian, López-Rubio. Diabetic wound segmentation using convolutional neural networks. In 2019 41st Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC). IEEE. 2019 Jul 23. pp. 1002-1005.

24. Reyes-Luévano, Guerrero-Viramontes, Romo-Andrade, Funes-Gallanzi. DFU_VIRNet: A novel Visible-InfraRed CNN to improve diabetic foot ulcer classification and early detection of ulcer risk zones. Biomedical Signal Processing and Control. 2023 Sep 1; 86: 105341.

25. Hong, Chen, Lin, Xie, Chen, Xie. Personalized prediction of diabetic foot ulcer recurrence in elderly individuals using machine learning paradigms. Technology and Health Care. 2024 Apr 25 (Preprint); 1-2.

26. Liu, Lin, Cao, Wei, Zhang, Lin, Guo. Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 2021. pp. 10012-10022.

27. Yap, Cassidy, Pappachan, O’Shea, Gillespie, Reeves. Analysis towards classification of infection and ischaemia of diabetic foot ulcers. In 2021 IEEE EMBS International Conference on Biomedical and Health Informatics (BHI). IEEE. 2021 Jul 27. pp. 1-4.

28. Liu, Wang, Tan. Harnessing hard mixed samples with decoupled regularizer. Advances in Neural Information Processing Systems. 2024 Feb 13; 36.

29. Abhishek, Brown, Hamarneh. Multi-sample ζ-mixup: Richer, more realistic synthetic samples from a p-series interpolant. Journal of Big Data. 2024 Mar 23; 11(1): 43.

30. Deepak, Bhat. Deep learning-based CNN for multiclassification of ocular diseases using transfer learning. Computer Methods in Biomechanics and Biomedical Engineering: Imaging & Visualization. 2024 Dec 31; 12(1): 2335959.

31. Yang, Liu, Zheng, Liu, Lou, Zhang. Insulator Defect Detection Based on YOLOv8s-SwinT. Information. 2024 Apr 6; 15(4): 206.

32. Arshad, Alsamhi, Qiao, Lee. IOTM: Iterative Optimization Trigger Method a Runtime Data-Free Backdoor Attacks on Deep Neural Networks. IEEE Transactions on Artificial Intelligence. 2024 Apr 4.

33. Kabiri, Ghanou, Khalifi, Casalino. AMAdam: adaptive modifier of Adam method. Knowledge and Information Systems. 2024 Feb 27; 1-32.

34. Liu, John, Agu. Diabetic foot ulcer ischemia and infection classification using efficientnet deep learning models. IEEE Open Journal of Engineering in Medicine and Biology. 2022 Nov 21; 3: 189-201.

35. Goyal, Reeves, Rajbhandari, Ahmad, Wang, Yap. Recognition of ischaemia and infection in diabetic foot ulcers: Dataset and techniques. Computers in Biology and Medicine. 2020 Feb 1; 117: 103616.

36. Das, Roy, Mishra. DFU_SPNet: A stacked parallel convolution layers based CNN to improve Diabetic Foot Ulcer classification. ICT Express. 2022 Jun 1; 8(2): 271-5.

37. Yogapriya, Chandran, Sumithra, Elakkiya, Shamila Ebenezer, Suresh Gnana Dhas. Automated detection of infection in diabetic foot ulcer images using convolutional neural network. Journal of Healthcare Engineering. 2022; 2022(1): 2349849.

38. Ahsan, Naz, Ahmad, Ehsan, Sikandar. A deep learning approach for diabetic foot ulcer classification and recognition. Information. 2023 Jan 6; 14(1): 36.

39. Al-Garaawi, Ebsim, Alharan, Yap. Diabetic foot ulcer classification using mapped binary patterns and convolutional neural networks. Computers in Biology and Medicine. 2022 Jan 1; 140: 105055.

40. Huong, Tay, Jumadi, Mahmud, Ngu. DFU infection and ischemia classification: PSO-optimized deep learning networks. Applications of Modelling and Simulation. 2023 Oct 1; 7: 111-21.

41. Selvaraju, Cogswell, Das, Vedantam, Parikh, Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE International Conference on Computer Vision. 2017. pp. 618-626.
