Real-time skin cancer detection: Optimizing YOLOv8 with CLEO for enhanced performance

Abstract

Skin cancer is one of the most common types of cancer, and early identification is key to better patient outcomes. This paper presents an optimized YOLOv8 method for detecting skin cancer that uses the CLEO (Chimpanzee Leader Election Optimization) optimizer to enhance performance. CLEO fine-tunes the hyperparameters of YOLOv8, improving the model's efficacy and precision in identifying skin lesions. By combining YOLOv8's advanced object detection capabilities with CLEO's optimization strengths, the approach offers an effective real-time solution for automated skin cancer screening, potentially aiding dermatologists in early diagnosis and treatment planning. Extensive experimentation on the publicly available ISIC (International Skin Imaging Collaboration) skin cancer dataset demonstrates that the proposed model improves precision, accuracy, recall, and mAP (mean Average Precision) by 2.1%, 2.1%, 6.5%, and 1.1%, respectively, compared with the unoptimized algorithm. Further, YOLOv8 integrated with the CLEO optimizer attains superior detection accuracy and quicker inference times than other cutting-edge methods for skin cancer detection. The analysis can set the stage for practical use.

Keywords

artificial intelligence, deep learning, optimization, YOLOv8

1 Introduction

Melanoma and non-melanoma are two forms of skin cancer.¹ While melanoma is extremely invasive, non-melanoma rarely spreads to other areas.² Skin cancer, particularly melanoma, has seen a significant rise globally, with early diagnosis being critical for successful treatment. Traditional diagnostic methods, such as biopsy and dermatoscopy, are time-consuming, often reliant on expert interpretation, and prone to variability. With advances in artificial intelligence (AI), particularly in the field of deep learning, automated detection systems offer an opportunity to assist dermatologists in diagnosing skin cancer more accurately and efficiently. In this context, object detection models like YOLO (You Only Look Once) have proven efficient in recognizing different kinds of skin lesions. The development of YOLO versions can be seen in the timeline depicted in Figure 1. However, several challenges are faced by various current methods for detecting skin cancer. The model generalization is limited by the scarcity of large, annotated datasets with diverse lesion types. The model predictions can be biased due to class imbalance, where malignant cases are under-represented. Variations in imaging conditions, such as lighting, resolution, and acquisition methods, can hinder consistent detection. Finally, ensuring real-time processing while maintaining high accuracy poses computational challenges. Addressing these issues is essential for robust and reliable skin cancer detection systems.

Figure 1.

Timeline of YOLO versions.

YOLOv8, the latest version in the YOLO family, introduces substantial improvements in speed, accuracy, and real-time detection capabilities, making it an ideal candidate for medical image analysis. However, despite its powerful architecture, the performance of YOLOv8 can be further enhanced by optimizing its hyperparameters to suit specific medical tasks like skin cancer detection. Optimization is significant because medical images, particularly those of skin lesions, are complex, with high variability in color, texture, and shape, which makes the detection task challenging.

This article uses a YOLO-based methodology to overcome the shortcomings of current skin cancer identification techniques. It proposes an optimized YOLOv8 model for skin cancer detection that uses the CLEO optimizer. The CLEO optimizer is designed to fine-tune the model's parameters systematically, leading to improvements in both detection accuracy and computational efficiency. By integrating CLEO with YOLOv8, we aim to maximize the model's capability to differentiate between malignant and benign skin lesions, ensuring reliable detection across different types of non-cancerous and cancerous lesions. Although skin cancer detection is a challenging task, no study has compared the performance of the YOLOv5, YOLOv7, and YOLOv8 algorithms for it. Furthermore, this paper is the first to fine-tune the hyperparameters of YOLOv8 with the CLEO optimizer to identify skin lesions.

The primary contribution of this work is to develop an enhanced skin cancer detection system by leveraging the strengths of YOLOv8 for real-time detection in conjunction with the CLEO's ability to optimize model performance. Through comprehensive experimentation and evaluation on publicly available skin cancer datasets, we demonstrate that the proposed approach not only improves detection accuracy but also significantly reduces inference time, making it a viable solution for clinical application. This optimized YOLOv8 model offers a promising step toward automated skin cancer detection systems that can assist in early diagnosis, reduce workload, and support decision-making in dermatology. The major contributions of the proposed study are as follows:

Optimized YOLOv8 model for skin cancer detection: The CLEO algorithm is applied to optimize the YOLOv8 model for skin cancer detection. CLEO fine-tunes the hyperparameters of YOLOv8 to improve the model's performance.

Detection in real-time: The proposed framework is oriented towards real-time application because it achieves inference times as low as 8 ms. The model combines high detection accuracy with real-time processing speed, making it well suited to clinical applications.

Contribution to medical artificial intelligence: This work contributes to the field of artificial intelligence for medical imaging by demonstrating that optimization algorithms such as CLEO can be applied successfully to deep learning frameworks. These methods can be used to enhance the performance of diagnostic tools in medical imaging.

The remainder of this manuscript is structured as follows: Section 2 reviews the relevant literature, while Section 3 outlines the proposed approach for skin cancer detection. Section 4 presents the experiments conducted and discusses the resulting findings. Finally, Section 5 offers conclusions drawn from the study.

2 Related literature

This section provides an overview of advanced deep learning techniques for skin cancer detection, focusing on methods used to localize skin lesions.³ These techniques identify potentially cancerous areas in an image, aiding early detection and treatment. In dermatology, Convolutional Neural Networks (CNNs) are the cornerstone of image analysis. Classification is performed with traditional CNN architectures such as ResNet, AlexNet,^4–7 and VGGNet.⁸ These models extract features that differentiate benign from malignant lesions exceptionally well. VGGNet has achieved impressive accuracy in classifying skin lesions by using small convolutional filters and a deep architecture.^9–12 ResNet mitigates the vanishing gradient problem by employing residual connections, so deeper networks can be constructed to capture intricate patterns in skin images.^13–16 Recent developments in CNNs have substantially improved the accuracy of skin lesion detection.^17–20 Segmentation models have been used to locate lesions precisely. U-Net^21–24 and Mask R-CNN^25,26 have gained traction for segmenting lesions from the surrounding skin. These architectures are explicitly designed for pixel-wise classification, and their encoder-decoder structures capture both fine details and high-level features, making them apt for accurate localization tasks.

Another prevalent approach involves transfer learning. A few authors have taken pre-trained models (e.g., VGG16 and ResNet) and fine-tuned them on dermatology datasets.^9,14,20 Pre-training on large-scale image classification tasks enables the models to learn robust features that can then be adapted to detect skin lesions. This approach significantly reduces training time and improves model generalization. Models such as InceptionV3²⁷ and EfficientNet²⁸ have shown notable results for skin cancer detection on public datasets such as the ISIC archive. Further, ensemble methods that combine predictions from multiple models have produced very good results, as they increase overall precision and reduce false positives.^29–31 These architectures also incorporate attention mechanisms, which sharpen the focus on regions of interest and suppress noise, further refining detection capabilities.^32–34 CNN architectures use attention mechanisms to focus on the most relevant features of an image.³⁵ This capability is especially helpful in detecting skin lesions, where malignancy is indicated by subtle visual cues. A U-Net variant that incorporates an attention mechanism to concentrate on significant areas improves segmentation performance.^36,37

Recent developments in YOLO have paved the way for accurate and efficient skin lesion detection. The YOLO framework has gained popularity because it is capable of real-time object detection with high accuracy. YOLOv2 has shown good performance on binary classification tasks, including distinguishing melanoma from benign lesions.³⁸ YOLOv3 extends the capabilities of its predecessors by enabling multi-class recognition; it can classify different types of skin lesions, such as squamous cell carcinoma, basal cell carcinoma, and melanoma, in a single framework.³⁹ Both YOLOv4 and YOLOv5 have shown improved detection rates across various skin lesion types.^40,41 Experiments show that YOLOv7 outperforms earlier versions in skin cancer detection tasks, attaining high mAP on standard datasets.⁴² More recent advances in deep learning and object detection have improved both accuracy and decision-making in skin cancer detection. Veeramani and Jayaraman⁴³ introduced YOLOv7-XAI, which combines YOLOv7 with explainable AI for multi-class skin lesion diagnosis to ensure fair and transparent decisions in clinical settings. Similarly, Aishwarya et al.⁴⁴ proposed YOLOSkin, a fusion framework that uses optimized YOLO detectors for deployment on edge devices such as the Nvidia Jetson Nano, attaining improved performance and accessibility for real-time applications. These studies highlight the potential of YOLO-based models to advance dermatological diagnostics. This section has reviewed various approaches that use YOLO for skin cancer detection, highlighting their methodologies, performance, and challenges.

The ongoing development of YOLO models holds promise for improving early detection and treatment outcomes for skin cancer, ultimately benefiting patient care.

3 Methodology

3.1 Architecture of YOLOv8

YOLOv8 is the latest iteration in the YOLO (You Only Look Once) family of object detection models, known for its efficiency in real-time object detection tasks. Building on the advancements of earlier versions (YOLOv3, v4, v5, and v7), YOLOv8 introduces several architectural improvements designed to enhance performance in terms of speed, accuracy, and versatility. The input layer of YOLOv8 processes images of varying sizes, typically resized to a square shape (e.g., 640 × 640 pixels). This preprocessing step ensures that the input is standardized for the model to detect objects at multiple scales, which is particularly useful in dealing with medical images such as skin lesion datasets, where lesions vary in size and shape. The architecture of YOLOv8 is shown in Figure 2.
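The square resizing described above can be illustrated with a small calculation. The sketch below assumes aspect-preserving resizing with padding (often called letterboxing); the paper does not state which resizing strategy was used, and the function names `letterbox_params` and `map_box_to_letterbox` are hypothetical.

```python
def letterbox_params(width, height, target=640):
    """Return (scale, new_w, new_h, pad_left, pad_top) for fitting an image
    into a target x target square without distorting its aspect ratio."""
    scale = min(target / width, target / height)
    new_w, new_h = round(width * scale), round(height * scale)
    # Remaining space is split evenly between the two sides (assumption).
    pad_left = (target - new_w) // 2
    pad_top = (target - new_h) // 2
    return scale, new_w, new_h, pad_left, pad_top


def map_box_to_letterbox(box, scale, pad_left, pad_top):
    """Map an (x1, y1, x2, y2) lesion box from original to resized coordinates."""
    x1, y1, x2, y2 = box
    return (x1 * scale + pad_left, y1 * scale + pad_top,
            x2 * scale + pad_left, y2 * scale + pad_top)


# A 1024 x 768 dermoscopic image shrinks to 640 x 480 with 80 px bands above and below.
scale, new_w, new_h, pad_left, pad_top = letterbox_params(1024, 768)
resized_box = map_box_to_letterbox((100, 200, 300, 400), scale, pad_left, pad_top)
```

Keeping the aspect ratio means lesion shapes are not stretched, at the cost of some padded pixels that carry no information.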

Figure 2.

The architecture of YOLOv8, displaying its modular design with backbone, neck, and head components.

The backbone of YOLOv8 is responsible for feature extraction from the input image. It consists of multiple convolutional layers, the C2F (CSP bottleneck with 2 convolutions and Fused) module, activation functions (such as Leaky ReLU), and batch normalization, which together capture low-level to high-level features such as edges, textures, and objects of interest. The C2F module improves feature extraction efficiency while reducing computational complexity. It builds on previous versions of YOLO by integrating CSPNet (Cross Stage Partial Network) concepts and residual connections to better capture diverse and fine-grained features, enhancing the model's object detection ability. The key components of the C2F module are the CSPNet structure, the bottleneck block, and two convolutions. The CSPNet structure splits the input feature map into two parts: one passes through a series of convolutional layers for feature extraction, while the other bypasses these convolutions and is later combined (fused) with the processed feature map. This strategy reduces computational overhead while preserving rich information, improving both the model's learning capacity and its efficiency. The “bottleneck” part of C2F uses a small number of convolutional filters in intermediate layers to temporarily reduce the dimensionality of the feature map. The bottleneck structure thus keeps computation efficient by curtailing the number of parameters while preserving the ability to learn significant patterns in the data. The module contains two consecutive convolutional layers that extract high-level features (such as object boundaries and shapes) and low-level features (such as textures and edges). Each convolution is followed by batch normalization, which stabilizes the learning process, and an activation function (such as SiLU or Leaky ReLU), which introduces non-linearity.
In the C2F stage, the outputs of the bypass path and the convolutional path are fused, combining information from both the processed and unprocessed data streams. This fused output enhances the model's capability to learn more diverse features without adding needless complexity. The C2F module also introduces residual (skip) connections, similar to ResNet's residual blocks, which let the input of certain layers skip them and be added directly to their output. This mitigates the vanishing gradient problem and improves the flow of gradients during backpropagation, leading to a more stable and efficient training process.

By splitting the input into two paths, one that performs processing and one that bypasses the convolutions, C2F reduces the overall computational load, making the model faster without losing accuracy. Merging the two paths at the end combines processed and unprocessed data, so the model captures rich features for detecting objects at various scales, shapes, and contexts. Residual connections improve gradient flow during training, which accelerates convergence, stabilizes training, and ensures that deeper layers receive more informative gradient updates. The C2F module's ability to handle both low-level and high-level features is extremely valuable in skin cancer detection, where fine-grained details like texture, color variations, and lesion boundaries are vital. The module efficiently captures subtle variations between benign and malignant lesions while maintaining the computational efficiency that makes the model suitable for real-time applications. The C2F module is portrayed in Figure 3.

Figure 3.

The C2F module, representing its structure within the YOLOv8 architecture.
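The split, process, and fuse behaviour of the C2F module can be sketched in PyTorch, the framework used in this study. This is a simplified illustration and not the exact module used in the paper: the class names (`ConvBlock`, `Bottleneck`, `C2fSketch`), the channel ratio of one half, and the choice of SiLU are assumptions made for the example.

```python
import torch
from torch import nn


class ConvBlock(nn.Module):
    """Convolution followed by batch normalization and a SiLU activation."""

    def __init__(self, c_in, c_out, k=1):
        super().__init__()
        self.conv = nn.Conv2d(c_in, c_out, k, stride=1, padding=k // 2, bias=False)
        self.bn = nn.BatchNorm2d(c_out)
        self.act = nn.SiLU()

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))


class Bottleneck(nn.Module):
    """Two consecutive 3x3 convolutions with an optional residual connection."""

    def __init__(self, channels, shortcut=True):
        super().__init__()
        self.cv1 = ConvBlock(channels, channels, 3)
        self.cv2 = ConvBlock(channels, channels, 3)
        self.shortcut = shortcut

    def forward(self, x):
        y = self.cv2(self.cv1(x))
        return x + y if self.shortcut else y


class C2fSketch(nn.Module):
    """Split the feature map, process one half, then fuse all intermediate outputs."""

    def __init__(self, c_in, c_out, n=1, shortcut=True):
        super().__init__()
        self.hidden = c_out // 2  # assumed channel ratio
        self.cv1 = ConvBlock(c_in, 2 * self.hidden, 1)
        self.blocks = nn.ModuleList([Bottleneck(self.hidden, shortcut) for _ in range(n)])
        self.cv2 = ConvBlock((2 + n) * self.hidden, c_out, 1)

    def forward(self, x):
        # Two halves: the first bypasses the bottlenecks, the second feeds them.
        parts = list(self.cv1(x).chunk(2, dim=1))
        for block in self.blocks:
            parts.append(block(parts[-1]))
        # Fusion: concatenate bypassed and processed features, then mix with a 1x1 conv.
        return self.cv2(torch.cat(parts, dim=1))


module = C2fSketch(64, 128, n=2).eval()
with torch.no_grad():
    features = module(torch.zeros(1, 64, 32, 32))
```

The spatial size is unchanged (32 × 32 here); only the channel dimension is transformed, which is why such a block can be stacked freely inside the backbone.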

The neck of YOLOv8 further refines the feature maps generated by the backbone before passing them to the detection head. YOLOv8 typically uses a Path Aggregation Network (PANet) or a Feature Pyramid Network (FPN) to handle multi-scale feature representation. This is essential because objects in medical images can appear at different scales, and the model needs to detect both small and large skin lesions effectively. FPN combines low-resolution, semantically rich features with high-resolution features to improve detection at different scales. PANet enhances the information flow from the lower layers to the higher layers, improving the model's ability to detect small lesions or subtle variations in skin patterns.

Final object detection takes place in the head of the YOLOv8 model, where each object is classified and its bounding box is predicted. Multiple convolutional layers in the head refine the output of the neck and generate predictions at three distinct scales, which correspond to small, medium, and large objects. YOLOv8 supports anchor-free detection, in which object locations are predicted independently of predefined anchor boxes. This reduces computational complexity and improves speed, which is beneficial for medical imaging, where object shapes and sizes can differ substantially. For each detected object, the YOLO head predicts a bounding box, a confidence score, and class probabilities. The prediction process in YOLOv8 can be decoupled into classification and localization heads; this separation can improve performance because object classification and bounding box regression are specialized independently. The flexibility of YOLOv8 in terms of model size and complexity is useful for skin cancer detection, where a balance between speed and accuracy is essential. When YOLOv8 is tailored for skin cancer detection, its multi-scale detection capability identifies skin lesions of various sizes and shapes. The backbone and neck effectively extract features such as lesion edges, color variations, and textures, which are key indicators of skin cancer. The anchor-free design adapts well to the irregular nature of skin lesions, leading to improved prediction accuracy.
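To make the anchor-free idea concrete, the following sketch decodes a box from a grid cell and four predicted distances (left, top, right, bottom) instead of from a predefined anchor shape. The strides of 8, 16, and 32 for the three scales and the function names are assumptions for illustration; the paper does not list them.

```python
STRIDES = (8, 16, 32)  # assumed strides for the small, medium, and large scales


def grid_sizes(image_size=640, strides=STRIDES):
    """Number of grid cells per side at each prediction scale."""
    return [image_size // s for s in strides]


def decode_anchor_free(cell_x, cell_y, distances, stride):
    """Turn per-cell distances (in grid units) into an (x1, y1, x2, y2) pixel box.

    The box is measured outward from the centre of the grid cell, so no
    predefined anchor width or height is needed.
    """
    left, top, right, bottom = distances
    cx, cy = cell_x + 0.5, cell_y + 0.5
    return ((cx - left) * stride, (cy - top) * stride,
            (cx + right) * stride, (cy + bottom) * stride)


# A lesion predicted from cell (10, 5) on the finest scale (stride 8).
box = decode_anchor_free(10, 5, (1.5, 2.0, 2.5, 1.0), stride=8)
```

Because the four distances are free to differ, the predicted box can follow an asymmetric or elongated lesion without relying on a matching anchor template.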

Therefore, the integration of advanced techniques such as CSPNet, PANet, and anchor-free detection in YOLOv8's architecture yields a model that is both fast and highly accurate, making it an ideal candidate for medical image analysis tasks such as skin cancer detection.

3.2 CLEO

CLEO is an optimization algorithm inspired by the behavior of chimpanzees in nature and is used to improve performance when fine-tuning complex models. Its primary objective is to optimize hyperparameters effectively, enhancing convergence speed and accuracy in a variety of applications, including deep learning and machine learning tasks. CLEO starts by initializing a population of potential solutions. The population is modeled as a troop of chimpanzees, where each member represents a candidate solution to the optimization problem, that is, a set of hyperparameters (or model weights) to be optimized. These candidates are usually initialized randomly within a predefined search space, ensuring diversity in the initial population. The troop is split into two groups: followers and leaders. The leaders are the individuals with the best fitness values, and they guide the troop's movement in the solution space. The dynamics between the troop members are designed to simulate how chimpanzees elect and follow leaders based on social interactions and environmental cues. The “leader election” process is based on the principle that the best solution (or the top few solutions) is selected as leader. These leaders influence the rest of the troop in exploring the search space. Each candidate solution in the population is evaluated with a fitness function that measures its performance in the optimization task. The positions of all individuals are updated in each iteration. The algorithm of CLEO is depicted in Figure 4.

Figure 4.

The CLEO algorithm.

Similar to other evolutionary algorithms, CLEO focuses on balancing exploration (searching for global solutions in the broader search space) and exploitation (refining existing solutions to find the local optima). Chimpanzees in the troop dynamically adjust their behavior based on the leader's performance. If the leader's fitness improves, the troop follows closely (exploitation). If the leader fails to improve, the troop may explore other regions of the search space to elect a new leader (exploration).
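The leader election and the exploration/exploitation switch described above can be sketched as a generic leader-follower population search. This is an illustrative approximation only: the text does not give CLEO's update equations, so the movement rule, the noise level, the `patience` stall counter, and all parameter values below are assumptions, not the published algorithm.

```python
import random


def leader_follower_search(fitness, bounds, troop_size=20, n_leaders=3,
                           iterations=50, patience=5, seed=0):
    """Maximize `fitness` over box-shaped `bounds` with a leader/follower troop."""
    rng = random.Random(seed)

    def random_member():
        return [rng.uniform(lo, hi) for lo, hi in bounds]

    def move_toward(follower, leader):
        # Step part of the way to the leader, plus small noise for local refinement.
        step = rng.random()
        return [min(hi, max(lo, f + step * (lead - f) + rng.gauss(0.0, 0.05 * (hi - lo))))
                for f, lead, (lo, hi) in zip(follower, leader, bounds)]

    troop = [random_member() for _ in range(troop_size)]
    best, best_fit, stalled = None, float("-inf"), 0
    for _ in range(iterations):
        # Leader election: rank the troop and keep the fittest members as leaders.
        ranked = sorted(troop, key=fitness, reverse=True)
        leaders = ranked[:n_leaders]
        top_fit = fitness(leaders[0])
        if top_fit > best_fit:
            best, best_fit, stalled = list(leaders[0]), top_fit, 0
        else:
            stalled += 1
        if stalled >= patience:
            # Exploration: the leaders stopped improving, so followers scatter.
            followers = [random_member() for _ in ranked[n_leaders:]]
            stalled = 0
        else:
            # Exploitation: each follower moves toward a randomly chosen leader.
            followers = [move_toward(f, rng.choice(leaders)) for f in ranked[n_leaders:]]
        troop = [list(member) for member in leaders] + followers
    # Account for members created in the final iteration.
    final = max(troop, key=fitness)
    if fitness(final) > best_fit:
        best, best_fit = list(final), fitness(final)
    return best, best_fit


def toy_fitness(position):
    """Toy objective with its maximum (0.0) at (0.3, -0.2)."""
    return -((position[0] - 0.3) ** 2 + (position[1] + 0.2) ** 2)


best_position, best_value = leader_follower_search(toy_fitness, [(-1.0, 1.0), (-1.0, 1.0)])
```

In a hyperparameter-tuning setting, `toy_fitness` would be replaced by a function that trains and validates the detector for a given candidate, which is far more expensive per evaluation than this toy objective.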

In the context of YOLOv8 for skin cancer detection, CLEO is used to optimize various hyperparameters, such as learning rate, batch size, anchor sizes, and model architecture components (e.g., depth and width). CLEO searches for the set of hyperparameters that maximizes detection accuracy while minimizing computational cost. By using CLEO to fine-tune YOLOv8, the overall model can achieve better performance in detecting subtle skin lesions, balancing high detection rates with efficient resource use.
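One way to connect a population-based optimizer to these hyperparameters is to let each candidate be a vector in the unit cube and decode it into concrete values. The sketch below is a hypothetical encoding: the ranges, the batch-size options, and the way accuracy and cost are combined into one score (`time_weight`) are assumptions, since the paper does not report its search space or fitness function.

```python
import math

# Hypothetical search space; the ranges are illustrative, not taken from the paper.
SEARCH_SPACE = {
    "learning_rate": ("log", 1e-5, 1e-1),
    "batch_size": ("choice", [4, 8, 16, 32]),
    "depth_multiple": ("linear", 0.33, 1.0),
    "width_multiple": ("linear", 0.25, 1.0),
}


def decode(position):
    """Map a candidate position in [0, 1]^d to concrete hyperparameter values."""
    params = {}
    for u, (name, spec) in zip(position, SEARCH_SPACE.items()):
        u = min(1.0, max(0.0, u))
        kind = spec[0]
        if kind == "log":
            # Learning rates are searched on a logarithmic scale.
            lo, hi = math.log10(spec[1]), math.log10(spec[2])
            params[name] = 10 ** (lo + u * (hi - lo))
        elif kind == "linear":
            params[name] = spec[1] + u * (spec[2] - spec[1])
        else:
            options = spec[1]
            params[name] = options[min(int(u * len(options)), len(options) - 1)]
    return params


def fitness(map50, inference_ms, time_weight=0.001):
    """Scalar score rewarding accuracy and penalizing inference time (assumed form)."""
    return map50 - time_weight * inference_ms


candidate = decode([0.5, 0.3, 0.0, 1.0])
score = fitness(map50=0.91, inference_ms=8)
```

Each fitness evaluation would require training and validating the detector with the decoded settings, so the size of the troop and the number of iterations directly determine the tuning budget.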

3.3 Dataset used

The Skin Cancer ISIC dataset is one of the largest and most comprehensive publicly available datasets for skin cancer research. It is widely used in dermatology and computer vision research for developing and testing algorithms aimed at detecting skin cancer, particularly melanoma, which is one of the deadliest forms of skin cancer. The dataset consists of high-quality dermoscopic images of different types of skin lesions, including both benign and malignant conditions. The images are collected from various clinical sources, research institutions, and dermatology practices worldwide, ensuring diversity in skin types, lesion types, and other real-world variations like lighting and image quality. The dataset covers a wide range of skin lesion categories, such as melanoma, nevus (moles), basal cell carcinoma, squamous cell carcinoma, seborrheic keratosis, actinic keratosis, and dermatofibroma. The images have been annotated by expert dermatologists. These annotations are further supported by histopathological validation when available, ensuring high-quality labels.

The ISIC dataset has evolved over time with the introduction of multiple annual challenges (since 2016), expanding in size. For example, the ISIC 2020 challenge dataset had over 33,000 images. The images in the ISIC dataset are primarily dermoscopic, meaning they are taken using a dermatoscope, which allows for more detailed and magnified views of the skin's surface compared to standard clinical images. This improves the visibility of patterns, textures, and colors in the lesion, which is crucial for diagnosing melanoma and other skin conditions. The ISIC dataset is regularly updated and expanded to include new images and annotations, making it a dynamic and valuable resource for advancing skin cancer detection technologies. Its rigorous curation ensures that it serves as a reliable standard for training and evaluating deep learning models.

3.4 Workflow process

Figure 5 outlines the complete end-to-end pipeline for skin cancer detection using the YOLOv8 model optimized by the CLEO algorithm. The pipeline starts with the initialization and data preparation phases. The criteria for including or excluding images depend on factors such as resolution and diagnostic clarity, to ensure the dataset's quality and relevance. A minimum resolution threshold was established to retain fine details critical for accurate lesion analysis. Images below this resolution threshold were rescaled during pre-processing if the original quality allowed it without loss of diagnostic information. Images with clear, histopathologically validated diagnoses were prioritized to maintain ground-truth accuracy for supervised learning. Duplicate images or those showing the same lesion multiple times under similar conditions were excluded to avoid redundancy in the dataset. Images that did not align with the study's focus on dermoscopic skin cancer detection (e.g., non-dermoscopic photographs or unrelated skin conditions) were excluded. Images lacking reliable diagnostic labels or with conflicting annotations were excluded to prevent model confusion during training.
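The inclusion and exclusion criteria above can be expressed as a simple filter. The sketch below is illustrative only: the `ImageRecord` fields, the `min_side` threshold of 450 pixels, and the use of a byte-level hash for duplicates are assumptions. An exact hash only catches identical files, whereas the criterion described above (the same lesion under similar conditions) is broader and would need manual or perceptual checks.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass
class ImageRecord:
    image_id: str
    width: int
    height: int
    modality: str            # e.g. "dermoscopic" or "clinical"
    label: Optional[str]     # None when no reliable diagnosis is available
    content: bytes           # raw file bytes, used only for duplicate detection


def select_images(records, min_side=450):
    """Return (record, needs_rescale) pairs for images that pass the criteria."""
    selected, seen = [], set()
    for rec in records:
        if rec.modality != "dermoscopic":
            continue  # outside the study's dermoscopic focus
        if rec.label is None:
            continue  # no reliable diagnostic label
        digest = hashlib.sha256(rec.content).hexdigest()
        if digest in seen:
            continue  # exact duplicate of an image already kept
        seen.add(digest)
        # Low-resolution images are flagged for rescaling rather than dropped.
        needs_rescale = min(rec.width, rec.height) < min_side
        selected.append((rec, needs_rescale))
    return selected


records = [
    ImageRecord("a", 1024, 768, "dermoscopic", "melanoma", b"A"),
    ImageRecord("b", 1024, 768, "clinical", "nevus", b"B"),
    ImageRecord("c", 300, 300, "dermoscopic", "nevus", b"C"),
    ImageRecord("d", 1024, 768, "dermoscopic", None, b"D"),
    ImageRecord("e", 1024, 768, "dermoscopic", "melanoma", b"A"),
]
kept = [(rec.image_id, flag) for rec, flag in select_images(records)]
```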

Figure 5.

Workflow algorithm.

The data were sampled to obtain a balanced and representative dataset, reducing bias and thus enabling reliable and equitable detection of various types of skin lesions. Stratified sampling was carried out on the images based on the type of lesion (such as melanoma, benign keratosis, and nevus), so that common and rare skin cancer types are equally represented, which improves the generalizability of the model. Augmentation techniques (e.g., rotation, flipping, and zooming), implemented with the Albumentations library, were used to synthetically expand the dataset, specifically for minority classes. This not only addresses imbalance but also helps the model generalize better to unseen data. The dataset was split into training, validation, and testing subsets while maintaining the same class distribution in each subset. The pipeline proceeds through training and evaluation of the model, and ends with visualizing the performance metrics. The evaluation results can then be used to fine-tune the model further. The process of hyperparameter optimization is depicted in Figure 6. CLEO helps the model converge towards optimal weights during training, potentially leading to better accuracy and generalization. This workflow aims to build a highly accurate and optimized model for identifying skin cancer lesions, using advanced object detection and optimization techniques.
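A stratified split of the kind described above can be sketched in a few lines. The 70/15/15 ratio and the function name `stratified_split` are assumptions for illustration; the paper does not report the split proportions.

```python
import random
from collections import defaultdict


def stratified_split(labels, fractions=(0.7, 0.15, 0.15), seed=0):
    """Split sample indices into train/val/test, preserving class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    train, val, test = [], [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        n_train = round(fractions[0] * len(indices))
        n_val = round(fractions[1] * len(indices))
        train += indices[:n_train]
        val += indices[n_train:n_train + n_val]
        test += indices[n_train + n_val:]  # remainder goes to the test subset
    return train, val, test


# 20 melanoma and 100 nevus images: every subset keeps the 1:5 class ratio.
labels = ["melanoma"] * 20 + ["nevus"] * 100
train_idx, val_idx, test_idx = stratified_split(labels)
```

Splitting per class before merging ensures that a rare class such as melanoma appears in all three subsets, which a purely random split does not guarantee.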

Figure 6.

Workflow of CLEO in optimizing YOLOv8.

3.5 Implementation environment

The study was performed on a computer system with an Intel Core i7 10th Generation processor, a 2 TB SSD, 64 GB of DDR4 3200 MHz RAM, and an Nvidia RTX 3090 GPU with 24 GB of GDDR6X VRAM. The operating system was Ubuntu 20.04 LTS (64-bit). The code was developed in Python 3.10 with the PyTorch 1.13.1 framework. OpenCV was used for pre-processing the images and Albumentations was used for image augmentation. The images used for training were 640 × 640 pixels, and the batch size was set to 8. The scikit-learn library was used for computing metrics like precision, recall, and mAP.

4 Results and discussion

Figure 7 shows four performance metrics plotted over the training process of the YOLOv8 model optimized by the CLEO algorithm for skin cancer detection. These graphs present the model's improvement as training progresses, reflecting its ability to detect and classify skin lesions from a dataset like ISIC.

Figure 7.

Results of YOLOv8 with CLEO on ISIC dataset.

Figure 8.

Precision-recall curve of YOLOv8 with CLEO on ISIC dataset.

Precision and recall curves approaching 1.0 indicate a model that makes very few false positives and false negatives, meaning it is both accurate and sensitive in detecting skin cancer lesions. The top-left graph shows the precision of the proposed model. Precision starts low and increases as training progresses, indicating that the model becomes better at predicting true positives (skin cancer lesions) while reducing false positives. Toward the end of training, precision stabilizes near 0.97 (97%), which indicates that the model is highly accurate in its predictions, with very few false positives. Recall, shown in the top-right graph, measures the model's ability to detect all relevant instances of skin cancer (i.e., its sensitivity). Recall also starts relatively low and steadily increases with more training epochs. By the end of training, recall approaches 0.98, indicating that the model detects almost all true positives, i.e., it captures most skin cancer lesions with very few missed detections (false negatives). The mAP50 is shown in the bottom-left graph. The mAP50 score rises quickly during the initial epochs, suggesting that the YOLOv8 model rapidly learns to correctly predict the locations of skin cancer lesions. It eventually stabilizes at a high value (0.84 or 84%), indicating that the model is highly accurate in predicting the presence and location of lesions, with good overlap between predicted and true bounding boxes. The mAP50-95 is shown in the bottom-right graph. The mAP50-95 score also starts low but increases steadily, indicating that the model becomes better at accurately localizing lesions at various IoU (Intersection over Union) thresholds over time. By the end of training, the mAP50-95 approaches 0.48, which is a reasonable value considering the difficulty of consistently achieving near-perfect overlap (95%) in object detection tasks.
The graphs show that the YOLOv8 model, optimized using the CLEO algorithm, improves steadily across all four metrics during training.
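The relationship between IoU, true and false positives, and the precision and recall values discussed above can be illustrated with a small sketch. It uses a greedy one-to-one matching of predictions to ground-truth boxes in descending confidence order, which is a common convention and an assumption here; the function names and example boxes are hypothetical.

```python
def iou(a, b):
    """Intersection over Union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def precision_recall(predictions, ground_truths, iou_threshold=0.5):
    """predictions: list of (confidence, box). Returns (precision, recall)."""
    matched, true_positives = set(), 0
    for _, box in sorted(predictions, key=lambda p: p[0], reverse=True):
        best_index, best_iou = -1, iou_threshold
        for index, truth in enumerate(ground_truths):
            if index in matched:
                continue  # each ground-truth lesion can be matched only once
            overlap = iou(box, truth)
            if overlap >= best_iou:
                best_index, best_iou = index, overlap
        if best_index >= 0:
            matched.add(best_index)
            true_positives += 1
    false_positives = len(predictions) - true_positives
    false_negatives = len(ground_truths) - true_positives
    precision = true_positives / (true_positives + false_positives) if predictions else 0.0
    recall = true_positives / (true_positives + false_negatives) if ground_truths else 0.0
    return precision, recall


truths = [(0, 0, 10, 10), (20, 20, 30, 30)]
detections = [(0.9, (0, 0, 10, 10)), (0.8, (21, 21, 31, 31)), (0.3, (50, 50, 60, 60))]
loose = precision_recall(detections, truths, iou_threshold=0.5)
strict = precision_recall(detections, truths, iou_threshold=0.75)
```

With the looser threshold both lesions are found and only the third detection is a false positive; with the stricter threshold the slightly shifted box no longer counts, so both precision and recall drop. This is why mAP50-95 is lower than mAP50 for the same detector.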

The Precision-Recall curve for the YOLOv8 model optimized with the CLEO algorithm is illustrated in Figure 8. The curve remains high and consistent, suggesting that the model maintains high precision as recall increases and demonstrating a strong capability to identify cancerous lesions while reducing false positives. At higher recall levels there is a significant drop in precision, indicating a trade-off in which the model starts to produce more false positives in an attempt to capture every true positive case. Overall, the PR curve indicates that the CLEO optimization contributes significantly to the efficacy of the model in real-time skin cancer detection, striking a balance between accuracy and sensitivity in a medical setting. These results suggest that the optimized YOLOv8 model performs well both in identifying the presence of skin cancer lesions and in localizing them accurately in the images, which qualifies the model for use in assisting automated skin cancer diagnosis.
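The average precision (AP) summarizes a precision-recall curve as the area under it, and mAP50-95 averages AP over ten IoU thresholds from 0.50 to 0.95. The sketch below uses all-point interpolation of the precision envelope, which is one common convention and an assumption here; evaluation toolkits may use a different interpolation, and the example values are illustrative rather than taken from Figure 8.

```python
def average_precision(recalls, precisions):
    """Area under the PR curve; `recalls` must be sorted in ascending order."""
    r = [0.0] + list(recalls) + [1.0]
    p = [0.0] + list(precisions) + [0.0]
    # Make precision monotonically non-increasing from right to left (the envelope).
    for i in range(len(p) - 2, -1, -1):
        p[i] = max(p[i], p[i + 1])
    # Sum the rectangles between consecutive recall points.
    return sum((r[i + 1] - r[i]) * p[i + 1] for i in range(len(r) - 1))


IOU_THRESHOLDS = [0.50 + 0.05 * i for i in range(10)]  # 0.50, 0.55, ..., 0.95


def map50_95(ap_at_threshold):
    """Average a per-threshold AP function over the ten IoU thresholds."""
    return sum(ap_at_threshold(t) for t in IOU_THRESHOLDS) / len(IOU_THRESHOLDS)


# Precision stays at 1.0 up to recall 0.5, then falls to 0.5 at full recall.
example_ap = average_precision([0.5, 1.0], [1.0, 0.5])
# Toy AP that decreases linearly as the IoU requirement tightens.
example_map = map50_95(lambda threshold: 1.0 - threshold)
```

The drop in precision at high recall described above reduces the area under the right-hand part of the curve, which is how that trade-off shows up in the AP value.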

The technique used to capture the input data plays a critical role in the performance of the model. The key factors that influence the model's training and inference outcomes are imaging modality, resolution, and pre-processing methods.

Imaging Modalities: Different imaging methods, such as dermoscopy, clinical photography, and histopathological imaging, capture different features of a skin lesion. Dermoscopic images highlight structural and pigmentation details, whereas clinical photographs highlight contextual skin information. The types of features that YOLOv8 integrated with CLEO learns depend on the choice of modality, which potentially affects its generalization to other modalities.

Resolution: High-resolution images contain the finer details required to differentiate between subtle features of skin lesions and hence to detect them accurately. However, pre-processing and inference on high-resolution images increase the computational load. On the other hand, low-resolution images lose vital information, which reduces detection accuracy. Resolution must therefore be balanced against computational efficiency in real-time applications.

Pre-processing Techniques: Normalization, noise reduction, and data augmentation are some of the pre-processing techniques that greatly affect the robustness of the model. Effective pre-processing enhances feature extraction, minimizes artifacts, and ensures consistency across images. For instance, augmentation of the dataset with cropped, rotated or brightness-adjusted images improves the ability of the model to identify lesions under distinct conditions, thus accentuating its real-world applicability.
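When an augmentation changes the image geometry, the bounding-box labels must be transformed in the same way. The sketch below shows this for a horizontal flip together with a simple intensity normalization. It assumes YOLO-style normalized labels `(class_id, x_center, y_center, width, height)` and images stored as nested lists; both are assumptions made to keep the example self-contained, and a library such as Albumentations handles these steps in practice.

```python
def horizontal_flip(image, boxes):
    """Flip an image (list of pixel rows) and its normalized YOLO-style boxes."""
    flipped_image = [row[::-1] for row in image]
    # Only the x-centre changes; width, height, and y-centre are unaffected.
    flipped_boxes = [(cls, 1.0 - x, y, w, h) for cls, x, y, w, h in boxes]
    return flipped_image, flipped_boxes


def normalize(image, max_value=255):
    """Scale pixel intensities to the range [0, 1]."""
    return [[pixel / max_value for pixel in row] for row in image]


image = [[0, 128, 255],
         [10, 20, 30]]
boxes = [(0, 0.25, 0.5, 0.2, 0.4)]  # one lesion in the left half of the image

flipped_image, flipped_boxes = horizontal_flip(image, boxes)
scaled = normalize([[0, 255]])
```

After the flip the lesion sits in the right half of the image (x-centre 0.75), so the label still points at the same lesion.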

Table 1.

Performance metrics of various models.

Model	Accuracy	Precision	Recall	mAP50	mAP50-95	Inference time (ms)
YOLOv5	0.92	0.94	0.90	0.84	0.45	12
YOLOv7	0.94	0.95	0.91	0.88	0.46	10
YOLOv8 (Without CLEO)	0.95	0.96	0.92	0.90	0.47	8
YOLOv8 (With CLEO)	0.97	0.98	0.98	0.91	0.48	8
EfficientDet	0.90	0.91	0.88	0.83	0.44	13
Faster R-CNN	0.89	0.90	0.87	0.82	0.42	150
Mask R-CNN	0.91	0.92	0.89	0.85	0.45	200

Therefore, to achieve reliable and generalizable skin cancer detection results, it is essential to ensure diversity and quality in data acquisition and pre-processing.

4.1 Comparative evaluation

This section compares the YOLOv8 model integrated with the CLEO optimizer against several deep learning-based and conventional methods that have been employed in medical image analysis, specifically for skin cancer identification. The comparison focuses on detection accuracy, precision, recall, mAP (mAP50 and mAP50-95), and inference time. Table 1 summarizes these metrics for each model. The YOLOv8 model optimized with CLEO demonstrates the highest overall performance, with an accuracy of 0.97, a precision of 0.98, and a recall of 0.98. It achieves a mAP of 0.91 at an IoU threshold of 0.50 and 0.48 on the more challenging mAP50-95. Its inference time of 8 ms is the lowest in the comparison, matched only by YOLOv8 without CLEO, which makes it well suited to real-time skin cancer detection and classification. YOLOv7 and YOLOv8 (without CLEO) also perform well, although they lag somewhat behind the optimized model, and YOLOv5 shows slightly lower accuracy and mAP values than YOLOv7. EfficientDet, Faster R-CNN, and Mask R-CNN achieve reasonable accuracy and precision but fall short of the YOLOv8 models, particularly in recall and mAP. Their mAP50-95 values (0.44 for EfficientDet, 0.42 for Faster R-CNN, and 0.45 for Mask R-CNN) are below those of YOLOv8 (0.47 without CLEO and 0.48 with CLEO), indicating that they are less effective across varying levels of overlap between predicted and ground-truth boxes. The YOLO models are also the fastest, with inference times of 8 to 12 ms. EfficientDet is close behind at 13 ms, whereas Faster R-CNN and Mask R-CNN are far slower at 150 ms and 200 ms, respectively.
Figure 9 provides a graphical comparison of the performance metrics across the different models for skin cancer detection.

Figure 9.

Comparative analysis of various algorithms.
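The gap between mAP50 and mAP50-95 in Table 1 follows from how each metric counts a detection as correct. The sketch below computes the Intersection over Union (IoU) of two boxes and lists which of the ten thresholds from 0.50 to 0.95 (in steps of 0.05) the match would satisfy. The box coordinates are hypothetical, and the corner format (x_min, y_min, x_max, y_max) is an assumption of this sketch.

```python
def iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x_min, y_min, x_max, y_max)."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    intersection = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


# The ten IoU thresholds averaged over in mAP50-95: 0.50, 0.55, ..., 0.95.
IOU_THRESHOLDS = [round(0.50 + 0.05 * i, 2) for i in range(10)]


def thresholds_passed(iou_value):
    """Thresholds at which a detection with this IoU counts as a true positive."""
    return [t for t in IOU_THRESHOLDS if iou_value >= t]


# Hypothetical predicted and ground-truth lesion boxes.
predicted = (10, 10, 50, 50)
ground_truth = (15, 10, 55, 50)

overlap = iou(predicted, ground_truth)
print(f"IoU = {overlap:.3f}")
print("counts as correct at:", thresholds_passed(overlap))
```

A prediction with an IoU of about 0.78 is a true positive for mAP50 but fails the 0.80 to 0.95 thresholds, so averaging over all ten thresholds rewards only tightly localized boxes and yields lower scores than mAP50 alone.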

The CLEO optimizer likely helped tune the hyperparameters of the YOLOv8 model effectively, allowing for better generalization and faster convergence, as suggested by the higher precision, recall, mAP50, and mAP50-95 values relative to the unoptimized model. Overall, YOLOv8, especially when optimized with CLEO, stands out for its balance between detection accuracy and speed, making it the most suitable of the compared models for real-time skin cancer detection tasks.
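The effect of CLEO can be quantified directly from the two YOLOv8 rows of Table 1. The sketch below computes the relative gain for each metric, the F1 score implied by the reported precision and recall, and the throughput implied by the 8 ms inference time. The function names are assumptions of this sketch, and the F1 and frames-per-second figures are derived quantities that the paper does not report.

```python
# Values copied from the YOLOv8 rows of Table 1.
baseline = {"accuracy": 0.95, "precision": 0.96, "recall": 0.92,
            "mAP50": 0.90, "mAP50-95": 0.47}
optimized = {"accuracy": 0.97, "precision": 0.98, "recall": 0.98,
             "mAP50": 0.91, "mAP50-95": 0.48}


def relative_gain_percent(before, after):
    """Relative improvement of `after` over `before`, in percent."""
    return 100.0 * (after - before) / before


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def frames_per_second(inference_ms):
    """Throughput implied by a per-image inference time in milliseconds."""
    return 1000.0 / inference_ms


gains = {name: round(relative_gain_percent(baseline[name], optimized[name]), 1)
         for name in baseline}
print(gains)
print(f"F1 without CLEO: {f1_score(0.96, 0.92):.3f}")
print(f"F1 with CLEO:    {f1_score(0.98, 0.98):.3f}")
print(f"Throughput at 8 ms per image: {frames_per_second(8):.0f} images/s")
```

The relative gains come out to 2.1% for accuracy, 2.1% for precision, 6.5% for recall, and 1.1% for mAP50, which is consistent with the improvements quoted in the abstract if those are read as relative rather than absolute changes.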

4.2 Limitations

The proposed study presents significant advancements in accuracy, precision, and inference speed for skin cancer detection. However, the study has certain limitations that should be addressed:

Dataset Generalizability: The model's performance is evaluated on a specific dataset, potentially limiting its generalizability to other datasets or real-world scenarios where image characteristics may vary (e.g., lighting, resolution, or imaging modality).

Focus on Single Modality: The study primarily uses dermoscopic images. Real-world scenarios often involve other modalities, such as clinical photographs or histopathological images, which are not considered in this work.

Clinical Validation: While the model demonstrates strong performance in experimental settings, clinical validation is needed to assess its robustness and reliability in real-world clinical workflows.

Computational Resource Requirements: Despite achieving real-time inference speeds, the system relies on high-end hardware (e.g., NVIDIA RTX 3090), which may not be accessible in resource-limited settings.

Explainability Challenges: The study does not address the interpretability of the model's predictions, which is crucial for clinical adoption, as healthcare practitioners require insights into why specific predictions are made.

Potential Overfitting: Optimizing the YOLOv8 model with CLEO might lead to overfitting on the training data, especially if the dataset is not sufficiently diverse or balanced across all lesion types.

Future work could address these limitations by incorporating diverse datasets, including multiple imaging modalities, validating the model in clinical settings, exploring lightweight implementations, and integrating explainable AI techniques to enhance trust and usability.
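One simple safeguard related to the overfitting and class-balance limitations is to hold out a validation set that preserves the proportion of each lesion class, so that a gap between training and validation metrics is visible for minority classes as well. The sketch below is a generic stratified split and not the procedure used in this study. The class names, the 20% validation fraction, and the function name are hypothetical.

```python
import random
from collections import defaultdict


def stratified_split(labels, val_fraction=0.2, seed=0):
    """Split sample indices into train/validation sets, class by class.

    Each class contributes about val_fraction of its samples (at least one)
    to the validation set, so rare classes are not left out by chance.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    train, val = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        n_val = max(1, round(len(indices) * val_fraction))
        val.extend(indices[:n_val])
        train.extend(indices[n_val:])
    return sorted(train), sorted(val)


# Hypothetical imbalanced label list: 16 benign and 4 malignant samples.
labels = ["benign"] * 16 + ["malignant"] * 4
train_idx, val_idx = stratified_split(labels)

print("train size:", len(train_idx), "validation size:", len(val_idx))
print("malignant in validation:",
      sum(labels[i] == "malignant" for i in val_idx))
```

With such a split, per-class recall on the validation set can be compared against the training set. A large gap on the under-represented class would be an early sign of the overfitting risk noted above.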

5 Conclusions and future research

In this work, an optimized YOLOv8 model enhanced by the CLEO algorithm was presented for skin cancer detection. Combining YOLOv8's strong object detection capabilities with CLEO's hyperparameter optimization improved the detection accuracy for skin cancer lesions. Experiments were conducted on the ISIC dataset. The optimized model achieved a precision of 0.98 and a recall of 0.98, indicating few false positives and few missed lesions. The mAP values are 0.91 for mAP@0.5 and 0.48 for mAP@0.5:0.95. The results indicate that the model accurately detects and localizes skin lesions over a wide range of IoU thresholds. The CLEO optimizer played a key role in refining the training process of the model, facilitating faster convergence and better generalization. The optimized model has considerable potential for clinical applications, assisting dermatologists with automated skin cancer detection at early stages. Owing to its high detection accuracy and its ability to handle complex lesion structures, the model could support non-invasive diagnostic methods and reduce the need for unnecessary biopsies. This framework can be integrated into clinical workflows for real-time lesion analysis during routine check-ups or deployed in mobile health applications (using the nano version of YOLOv8), enabling individuals to monitor skin lesions using smartphone cameras and enhancing accessibility to dermatological care. Such capabilities are particularly valuable in remote or underserved areas with limited access to dermatology specialists.

Additionally, the system can act as a clinical decision support tool, providing dermatologists with objective, rapid analyses to complement their expertise, thereby reducing diagnostic errors and improving efficiency. Its high accuracy and efficiency make it an essential tool for advancing skin cancer detection in diverse healthcare settings.

Although the proposed YOLOv8 integrated with CLEO model performs well, several avenues for future work could further improve its applicability and accuracy. Training on a larger and more varied dataset that covers a wide range of skin tones, lighting conditions, and lesion types would improve the generalization of the model across diverse populations and imaging conditions. To detect and diagnose skin cancers that may not be apparent with standard imaging techniques, the model could be combined with other medical imaging technologies, such as multispectral imaging. Incorporating multi-modal data, such as histopathological images and patient metadata, could lead to more comprehensive diagnostic capabilities. Lightweight implementations of the model could optimize the framework for deployment on resource-constrained devices, enabling its use in remote or low-resource settings. Explainable AI techniques could improve the interpretability of the model's predictions, offering clinicians more transparent and understandable diagnostic insights and fostering greater trust in AI-assisted diagnosis, particularly in clinical environments. Lastly, integrating this framework with telemedicine platforms and longitudinal tracking systems could pave the way for continuous monitoring of skin lesions, facilitating proactive healthcare and personalized treatment strategies. With these extensions, the optimized YOLOv8 model could become an even more powerful tool for early skin cancer detection, ultimately improving patient outcomes through faster and more accurate diagnostics.

Footnotes

ORCID iDs

Priyanka Nandal

Navdeep Bohra

Funding

The authors received no financial support for the research, authorship, and/or publication of this article.

Declaration of conflicting interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

References

1. Dwivedi. Analyzing recent trends and public sentiment for internet of healthcare things and its impact on future health crisis. Internet Healthc Things: Mach Learn Secur Priv 2022: 95–112.

2. Toğaçar, Cömert, Ergen. Intelligent skin cancer detection applying autoencoder, MobileNetV2 and spiking neural networks. Chaos Solitons Fractals 2021; 144: 110714.

3. Zafar, Sharif, et al. Skin lesion analysis and cancer detection based on machine/deep learning techniques: a comprehensive survey. Life 2023; 13: 146.

4. Medhat, Abdel-Galil, Aboutabl, et al. Iterative magnitude pruning-based light-version of AlexNet for skin cancer classification. Neural Comput Appl 2024; 36: 1413–1428.

5. Pavan, Kumar Reddy, Mohan Ghantasala, et al. Enhanced skin cancer classification with AlexNet and transfer learning. In: 2023 international conference on sustainable communication networks and application (ICSCNA), 2023 Nov 15. IEEE, pp.1020–1024.

6. Bhuvaneswari, Raman. Enhancing skin cancer diagnosis with AlexNet convolutional neural networks. In: 2023 international conference on integrated intelligence and communication systems (ICIICS), 2023 Nov 24. IEEE, pp.1–5.

7. Quishpe-Usca, Cuenca-Dominguez, Arias-Viñansaca, et al. The effect of hair removal and filtering on melanoma detection: a comparative deep learning study with AlexNet CNN. PeerJ Computer Science 2024; 10: e1953.

8. Tabrizchi, Parvizpour, Razmara. An improved VGG model for skin cancer detection. Neural Process Lett 2023; 55: 3715–3732.

9. Ezzat, Mobtasem, Moustafa, et al. Early skin cancer detection based on MobileNet & VGG-16. In: 2024 intelligent methods, systems, and applications (IMSA), 2024 Jul 13. IEEE, pp.384–389.

10. Faghihi, Fathollahi, Rajabi. Diagnosis of skin cancer using VGG16 and VGG19 based transfer learning models. Multim Tools Appl 2024; 83: 57495–57510.

11. Alshehri. Skin-NeT: skin cancer diagnosis using VGG and ResNet-based ensemble learning approaches. Trait du Signal 2024; 41: 1689–1705.

12. Ingle, Shaikh. Skin cancer recognition using CNN, VGG16 and VGG19. In: International conference on information and communication technology for intelligent systems, 2023 Apr 27. Singapore: Springer Nature Singapore, pp.131–144.

13. Singh, Banerjee, Chakraborty, et al. Classification of melanoma skin cancer using inception-ResNet. In: Frontiers of ICT in healthcare: proceedings of EAIT 2022, 2023 Apr 25. Singapore: Springer Nature Singapore, pp.65–74.

14. Gouda, Amudha. Skin cancer classification using ResNet. In: 2020 IEEE 5th international conference on computing communication and automation (ICCCA), 2020 Oct 30. IEEE, pp.536–541.

15. Sambyal, Gupta. Skin cancer detection using Resnet. In: Proceedings of the international conference on innovative computing & communication (ICICC), 2022 Feb 21.

16. Mehra, Bhati, Kumar, et al. Skin cancer classification through transfer learning using ResNet-50. In: Emerging technologies in data mining and information security: proceedings of IEMIS 2020, Vol. 2, 2021 May 5. Singapore: Springer Nature Singapore, pp.55–62.

17. Adegun, Viriri. Deep learning techniques for skin lesion analysis and melanoma cancer detection: a survey of state-of-the-art. Artif Intell Rev 2021; 54: 811–841.

18. Jeyakumar, Jude, Priya, et al. A survey on computer-aided intelligent methods to identify and classify skin cancer. Informatics 2022; 9: 99. MDPI.

19. Gilani, Marques. Skin lesion analysis using generative adversarial networks: a review. Multim Tools Appl 2023; 82: 30065–30106.

20. Shah, Pandya, et al. A comprehensive study on skin cancer detection using artificial neural network (ANN) and convolutional neural network (CNN). Clinical eHealth 2023; 6: 76–84.

21. Miradwal, Mohammad, Jain, et al. Lesion segmentation in skin cancer detection using UNet architecture. In: Computational intelligence and data analytics: proceedings of ICCIDA 2022, 2022 Sep 2, pp.329–340. Singapore: Springer Nature Singapore.

22. Rajinikanth, Kadry, Damaševičius, et al. Skin melanoma segmentation using VGG-UNet with Adam/SGD optimizer: a study. In: 2022 third international conference on intelligent computing instrumentation and control technologies (ICICICT), 2022 Aug 11. IEEE, pp.982–986.

23. Bindhu, Thanammal. Segmentation of skin cancer using Fuzzy U-network via deep learning. Meas: Sens 2023; 26: 100677.

24. Yin, Zhou, Nie. DI-UNet: dual-branch interactive U-net for skin cancer image segmentation. J Cancer Res Clin Oncol 2023; 149: 15511–15524.

25. Pokhrel, Sanin, Sakib, et al. Improved skin disease classification with mask R-CNN and augmented dataset. Cybernet Syst 2023: 1–15. https://doi.org/10.1080/01969722.2023.2296254

26. Sugiharti, Arifudin, Efrilianda, et al. Mask region-based convolutional neural network for detection of skin cancer. In: AIP conference proceedings, 2023 Jun 16, Vol. 2614. AIP Publishing.

27. Mathina Kani, Parvathy, Maajitha Banu, et al. Classification of skin lesion images using modified inception V3 model with transfer learning and augmentation techniques. J Intell Fuzzy Syst 2023; 44: 4627–4641.

28. Kanchana, Kavitha, Anoop, et al. Enhancing skin cancer classification using efficient net B0-B7 through convolutional neural networks and transfer learning with patient-specific data. Asian Pac J Cancer Prev: APJCP 2024; 25: 1795.

29. Akilandasowmya, Nirmaladevi, Suganthi, et al. Skin cancer diagnosis: leveraging deep hidden features and ensemble classifiers for early detection and classification. Biomed Signal Process Control 2024; 88: 105306.

30. Mohanty, Das. Skin cancer detection from dermatoscopic images using hybrid fuzzy ensemble learning model. Int J Fuzzy Syst 2024; 26: 260–273.

31. Nandal, Pahal, Khanna, et al. Super-resolution of medical images using real ESRGAN. IEEE Access 2024; 12: 176155–176170.

32. Teodoro, Silva, Rosa, et al. A skin cancer classification approach using gan and roi-based attention mechanism. J Signal Process Syst 2023; 95: 211–224.

33. La Salvia, Torti, Marenzi, et al. Edge and cloud computing approaches in the early diagnosis of skin cancer with attention-based vision transformer through hyperspectral imaging. J Supercomput 2024; 80: 16368–16392.

34. Himel, Islam, Al-Aff, et al. Skin cancer segmentation and classification using vision transformer for automatic analysis in dermatoscopy-based noninvasive digital system. Int J Biomed Imag 2024; 2024: 3022192.

35. Reis, Turk. Fusion of transformer attention and CNN features for skin cancer detection. Appl Soft Comput 2024; 164: 112013.

36. Alahmadi. Multiscale attention U-net for skin lesion segmentation. IEEE Access 2022; 10: 59145–59154.

37. Cai, Hou, Zhou. Intelligent skin lesion segmentation using deformable attention transformer U-net with bidirectional attention mechanism in skin cancer images. Skin Res Technol 2024; 30: e13783.

38. Roy, Haque, Neubert. Automatic diagnosis of melanoma from dermoscopic image using real-time object detection. In: 2018 52nd annual conference on information sciences and systems (CISS), 2018 Mar 21. IEEE, pp.1–5.

39. Banerjee, Singh, Das, et al. Diagnoses of melanoma lesion using YOLOv3. In: Computational advancement in communication, circuits and systems: proceedings of 3rd ICCACCS 2020. Springer Singapore, 2022, pp.291–302.

40. Albahli, Nida, Irtaza, et al. Melanoma lesion detection and segmentation using YOLOv4-DarkNet and active contour. IEEE Access 2020; 8: 198403–198414.

41. Elshahawy, Elnemr, Oproescu, et al. Early melanoma detection based on a hybrid YOLOv5 and ResNet technique. Diagnostics 2023; 13: 2804.

42. Datta, Prakash, Singh. Skin cancer detection with edge devices using YOLOv7 deep CNN. In: International conference on data analytics & management, 2023 Jun 23. Singapore: Springer Nature Singapore, pp.55–63.

43. Veeramani, Jayaraman. YOLOv7-XAI: multi-class skin lesion diagnosis using explainable AI with fair decision making. Int J Imag Syst Technol 2024; 34: e23214.

44. Aishwarya, Kannaa, Seemakurthy. YOLOSkin: a fusion framework for improved skin cancer diagnosis using YOLO detectors on Nvidia Jetson Nano. Biomed Signal Process Control 2025; 100: 107093.