Abstract
In image-based diagnosis, machine learning has recently shown great potential, particularly in the detection of cancer, the identification of tumour cells, and the diagnosis of COVID-19. Similar methods could be used to detect monkeypox on human skin; however, no public dataset of monkeypox images is available for training and evaluating machine learning models. To address this, the dataset “Monkeypox2022” has been created and made available on GitHub (
Introduction
The COVID-19 pandemic in 2020 was a global alert and a call to action, and the emergence of monkeypox in several countries in 2022 highlighted another global threat [1]. Monkeypox is a viral disease caused by a zoonotic virus of the Orthopoxvirus genus in the Poxviridae family, closely related to the cowpox and smallpox viruses. The virus primarily spreads from monkeys and rodents to humans, but transmission between humans is also common [2]. Monkeypox was first discovered in a monkey in a laboratory in Copenhagen, Denmark, in 1958 [3]. The first human case of monkeypox was reported in 1970 [4], and it was speculated that people living near tropical rainforests are at risk of contracting monkeypox through close contact with infected individuals, animals, or objects [5]. Symptoms of monkeypox include fever, body pain, and fatigue, followed later by the appearance of red bumps on the skin [6].
Although monkeypox is less contagious than COVID-19, its incidence is on the rise. In the 1990s, only a few dozen cases were reported annually in West and Central Africa [7]. By 2020, however, the number of cases had risen to 5,000. Previously thought to occur only in Africa, monkeypox has now been reported in other parts of the world, causing growing fear and concern among the public, as seen in social media posts [8]. No treatment has been developed specifically for monkeypox, but two oral medications, Brincidofovir and Tecovirimat, are recommended for its treatment and prevention [9, 10]. In other countries, smallpox vaccines are given to those with monkeypox [11]. Diagnosing monkeypox requires examination of skin lesions and of the patient’s exposure history, with electron microscopy being the definitive method. Polymerase Chain Reaction (PCR) [12], commonly used in COVID-19 diagnosis, may also be used to confirm the presence of the virus [13].
We found no research demonstrating the potential of Machine Learning (ML) for detecting monkeypox through image processing. The main reason is that no public data set is available for ML model training and evaluation. As the virus spreads to more countries, the availability of such a data set could enable the development of an appropriate ML method.
Taking all this into consideration, we concluded that there is a pressing need to compile a data set containing images of patients suffering from monkeypox. Dr. Joseph Cohen’s creation of a COVID-19 data set at the beginning of the COVID-19 outbreak served as an inspiration for our monkeypox data set. Dr. Cohen assembled a data set of 98 chest X-ray samples from a variety of sources, including websites and journals [23]. Following this example, we compiled a dataset of 164 images of patients suffering from monkeypox, which, after data augmentation, resulted in 553 total samples. The difference in size is due to the fact that our data set was created more recently. Several studies carried out by different research groups at the beginning of the COVID-19 pandemic relied on small datasets and emphasized the importance of transfer learning approaches [14, 15, 16, 17, 18]. Our dataset is expected to aid researchers and practitioners in constructing models for diagnosing monkeypox. The technical contributions are as follows:
The first publicly accessible collection of monkeypox images was created by combining images acquired from many sources (including websites and news organisations). It is stored in the GitHub repository:
A lightly modified VGG16 model was used to detect patients with monkeypox from photos.
Local Interpretable Model-agnostic Explanations (LIME) were used to explain the model’s predictions in order to validate the findings.
The manuscript is organized into six sections, including a review of deep learning, transfer learning, and LIME in Section 2, a condensed description of the experiment’s methodology in Section 3, findings in Section 4, discussion in Section 5, and a summary of results and potential avenues for further investigation in Section 6.
Machine Learning (ML) has proven useful in diverse fields, including medical imaging and disease diagnosis, where it can solve imaging problems effectively and efficiently. For example, Miranda and Felipe (2015) developed a computer-aided diagnostic system for breast cancer based on fuzzy logic, which mimics the reasoning of radiologists and saves time. The system outputs cancer detection results based on specific criteria such as contour, shape, and density [18].
Ardakani et al. (2020) assessed 10 distinct ML models using a small data set consisting of 108 COVID-19 samples and 86 non-COVID-19 samples, and attained an accuracy of 99 percent [19]. Wang et al. (2020) achieved 73.1 percent accuracy with a novel Inception-based model [20]. Sandeep et al. (2022) used a CNN model to detect a number of skin ailments and showed that an existing VGG Net version can diagnose skin diseases with an accuracy of 71%, whereas their proposed strategy performed better, reaching an accuracy of 78%. Velasco et al. proposed an app that reached an accuracy of about 94.4 percent in identifying people with chickenpox symptoms [21]. Roy et al. (2019) were able to recognize a number of skin diseases, including acne, chickenpox, cellulitis, candidiasis, and others, using a variety of segmentation algorithms [22].
The rapid spread of the monkeypox virus across multiple countries highlights the importance of detecting individuals with potential infection. The increasing pressure that the epidemic places on clinical diagnostics has led medical experts to believe that AI technology could help ease this burden by analysing visual data [13]. The aim is to enhance the level of medical care provided to COVID-19 patients in hospitals through these measures. At present, however, there is no publicly available data set for monkeypox. The potential advantages of using an AI-based method to rapidly detect and contain monkeypox are therefore difficult to assess [14, 15, 16, 17, 18].
A crucial step in addressing this research gap was collecting photographs of patients with monkeypox. The limited availability of such photographs is a constraint, but it should not pose a problem for initial test cases. The content of the database will also be continuously updated with new data contributed by organisations in many countries.
Methodology
This section describes the steps followed to collect the dataset and to apply the models.
Data collection
Various steps followed for data collection.
The data collection process consisted of the following steps.
The dataset was obtained from multiple online sources because no trustworthy medical dataset is available to the public. The initial dataset was constructed using the Google search engine, as illustrated in Fig. 1, which outlines the steps involved in the data search process. A more in-depth explanation of these steps follows. Non-monkeypox samples were gathered with a comparable method, by searching for terms such as “Blisters” and “Macules”, as well as for normal images such as faces, legs, and hands without any signs of illness. The dataset was further enlarged by manually collecting additional normal photos from individuals without skin disease symptoms, who gave their consent through a signed permission form.

Figure 2 shows sample images collected from the various sources.
Table 1 displays the features of the datasets generated in this study. Although there are only 21720 samples, classical machine learning and transfer learning can be used to create a disease diagnosis model, as demonstrated by earlier COVID-19 studies that used as few as 40–100 samples with deep learning models [13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24].
Total samples
For effective training, deep learning networks such as CNNs, RNNs, and GANs require very large data sets. However, transfer learning and conventional ML can produce good results with fewer samples. In this study, data augmentation is performed with the Keras ImageDataGenerator, which provides a variety of options, including rotation, scaling, and flipping [25], for enlarging image data. Table 2 lists the parameters that were employed, following the recommendations in [26].
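As a rough illustration of what flipping and rotation do to an image, the following sketch applies them to a tiny nested-list image. It deliberately does not call Keras; the function names and the 2×3 toy image are assumptions made for illustration only, and the parameters actually used in the study are those listed in Table 2.

```python
# Minimal, library-free sketch of three augmentation transforms.
# The toy image and all function names are hypothetical; the study itself
# relies on the Keras image generator rather than on this code.

def horizontal_flip(image):
    """Mirror each row left to right."""
    return [list(reversed(row)) for row in image]


def vertical_flip(image):
    """Mirror the image top to bottom."""
    return [list(row) for row in reversed(image)]


def rotate_90_clockwise(image):
    """Rotate the image by 90 degrees clockwise."""
    return [list(row) for row in zip(*reversed(image))]


def augment(image):
    """Return the original image followed by its three transformed variants."""
    return [
        image,
        horizontal_flip(image),
        vertical_flip(image),
        rotate_90_clockwise(image),
    ]


image = [[1, 2, 3],
         [4, 5, 6]]
variants = augment(image)
```

Each source image yields several training samples in this way, which is how a small collection can be expanded before training.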
Percentage of applied data augmentation
Proposed modified VGG16 model.
Initially, transfer learning was used in a pilot test to gauge how well machine learning models performed on the generated data set. This initial testing was conducted with a customized VGG16 model [27]. The core model consists of a pre-trained architecture, an updated layer, and a prediction class. The pre-trained architecture recognizes high-dimensional features, and the updated layer then processes the new data. The modified VGG16 model is depicted in Fig. 3. It comprises 16 CNN layers with unique filter widths and stride values [28]. The model starts with an input layer, followed by two 3
DeepSHAP is a SHAP extension made specifically for deep learning models. It uses a variation of the SHAP algorithm that takes into account the high dimensionality and often complex structure of deep neural network input spaces. DeepSHAP makes it easier to attribute importance estimates to individual input features, allowing researchers and practitioners to understand the role of each feature in a deep learning model. This transparency makes it easier to troubleshoot and improve these models and fosters trust in their practical use.
DeepLIFT is a deep learning interpretation method that tries to explain specific predictions by linking the model’s output to specific input features. This is done by comparing the forward pass activations of each neuron or network node to reference activations produced from a baseline input. DeepLIFT can emphasize the most important input data elements that significantly influence the prediction, making it simpler to understand how the model decides what to predict. This approach is very useful for understanding the importance of features in intricate deep learning models.
CXplain is another model-independent interpretability method that sheds light on machine learning model predictions. To give a thorough understanding of each prediction, it combines Shapley values and contrastive explanations. CXplain pinpoints the features that lead to a particular prediction by contrasting a specific instance with a comparable alternative. This technique helps in understanding the rationale behind particular decisions and in identifying potential biases or inaccuracies in the model’s predictions. CXplain is a useful tool for supporting the fairness and transparency of AI applications.
In the first phase of parameter tuning, the batch size, number of epochs, and learning rate are adjusted to optimize the performance of the proposed model. The initial choices for the study were made based on the experimental parameters given in Table 3.
Parameters for the VGG16 model
The Adam optimization algorithm, which often performs well compared with other optimizers, was used to minimize the model loss. Adam frequently gives strong results in binary image classification, making it a suitable choice for this study [30].
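The Adam update rule can be sketched in a few lines. The example below is a minimal illustration on a one-dimensional quadratic, not the training loop used in the study; the function name, the toy objective, and the hyperparameter values (the commonly used defaults for the two decay rates) are assumptions.

```python
import math


def adam_minimize(grad, x0, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=1000):
    """Minimize a 1-D function with Adam, given its gradient `grad`."""
    x, m, v = x0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(x)
        # Exponential moving averages of the gradient and the squared gradient.
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        # Bias correction compensates for the zero initialization of m and v.
        m_hat = m / (1 - beta1 ** t)
        v_hat = v / (1 - beta2 ** t)
        # Parameter update scaled by the running gradient magnitude.
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x


# Toy objective f(x) = (x - 3)^2, whose gradient is 2 * (x - 3).
x_min = adam_minimize(lambda x: 2.0 * (x - 3.0), x0=0.0)
```

In a deep learning framework the same update is applied to every weight of the network, with the gradient supplied by backpropagation of the loss.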
LIME (Local Interpretable Model-agnostic Explanations) is an effective technique that aids in the analysis of model predictions and offers insight into the “black box” of CNN models [31]. Its effectiveness in explaining image classification results has led to widespread use in recent years [32]. LIME relies on superpixels, segments of an image that carry a large amount of information, to explain individual predictions. Table 4 shows the LIME parameters used in this research to compute superpixel values. These parameters have proven beneficial in various image prediction studies [33].
Parameter used to identify super pixels
Experiment setup
The experiment was conducted on a standard laptop with Windows 10, 16 GB of RAM, and an Intel Core i7 processor. The entire experiment was run five times, and the results presented here are the averages over all five runs.
Dataset
Samples of images for dataset
The data set used for the investigation is shown in Table 5. Study One includes 43 samples of monkeypox and 47 samples of chickenpox, and Study Two includes 587 samples of augmented monkeypox and 1167 additional samples. The data samples for Chickenpox, Measles, and Normal were combined to form the “Other” class. The data were split into training and test sets in an 80:20 ratio, a common practice in machine learning [34, 35, 36].
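An 80:20 split of this kind can be sketched as follows. This is an illustrative shuffle-and-cut split, not necessarily the procedure used in the study; the function name, the seed, and the placeholder sample identifiers are assumptions, while the class counts are those quoted for Study One.

```python
import random


def train_test_split(samples, test_fraction=0.2, seed=42):
    """Shuffle the samples reproducibly and cut them into train and test lists."""
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Placeholder identifiers standing in for the Study One images.
samples = [("monkeypox", i) for i in range(43)] + [("chickenpox", i) for i in range(47)]
train, test = train_test_split(samples)
```

With 90 samples, this leaves 72 for training and 18 for testing, which illustrates how small the held-out set is in Study One.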
Key performance indicators used to assess the efficacy of a machine learning model include accuracy, recall, precision, F1-score, sensitivity, and specificity [37, 38, 39, 40, 41, 42].
Accuracy is the proportion of correct predictions among all predictions made by the model. Recall is the ratio of true positive predictions to all actual positive cases, while precision is the ratio of true positive predictions to all positive predictions generated by the model. The F1-score is the harmonic mean of precision and recall, giving a single number that summarizes the model’s overall performance. Sensitivity, which is identical to recall, measures the proportion of true positive predictions among all actual positive cases, whereas specificity measures the proportion of true negative predictions among all actual negative cases. Together, these measurements offer a thorough picture of a model’s effectiveness and classification accuracy.
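These definitions translate directly into code. The sketch below computes all of them from the four confusion-matrix counts; the function name and the example counts are hypothetical and are not taken from the study.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute standard binary classification metrics from confusion-matrix counts."""
    total = tp + fp + tn + fn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0  # same quantity as sensitivity
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "sensitivity": recall,
        "specificity": specificity,
        "f1": f1,
    }


# Hypothetical counts for illustration only.
metrics = classification_metrics(tp=8, fp=2, tn=7, fn=1)
```

Because sensitivity and recall are the same quantity, reporting both adds no information; specificity is the complementary measure for the negative class.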
Table 6 presents the performance metrics (accuracy, recall, precision, F1 score, sensitivity, and specificity) for Study One and Study Two, reported with 95% confidence intervals (CI) because of the limited number of samples. Study One outperforms Study Two, with up to 9% higher accuracy on the train set. The models exhibit high sensitivity but low specificity. Despite this, their overall performance is still noteworthy.
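The text does not state how the confidence intervals were obtained. One common choice for a proportion such as accuracy is the normal-approximation interval, sketched below purely as an assumption; the function name and the example values are hypothetical.

```python
import math


def proportion_ci(p_hat, n, z=1.96):
    """Normal-approximation confidence interval for a proportion (z=1.96 for 95%)."""
    if n <= 0 or not 0.0 <= p_hat <= 1.0:
        raise ValueError("n must be positive and p_hat must lie in [0, 1]")
    half_width = z * math.sqrt(p_hat * (1.0 - p_hat) / n)
    # Clip to [0, 1] because a proportion cannot leave that range.
    return max(0.0, p_hat - half_width), min(1.0, p_hat + half_width)


# Hypothetical example: 80% accuracy measured on 100 test samples.
low, high = proportion_ci(p_hat=0.8, n=100)
```

The half-width shrinks with the square root of the sample count, which is why small test sets lead to wide intervals.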
Results of performance metrics of studies one and two
Accuracy and loss curves for study one and two.
Visualization of confusion matrices.
Figure 4 depicts the degree of accuracy and loss that the modified VGG16 model had throughout each epoch when it was applied to (a) Study One and (b) Study Two. Additionally, the results indicate that the model in Study One performed better than the model in Study Two as the accuracy for Study One was higher and reached its maximum earlier. Overfitting can also be observed in Study Two as the accuracy on the validation set decreases after reaching its maximum value at 100 epochs. These results provide a visual representation of the model’s performance and help to highlight the strengths and weaknesses of each model.
The results of the performance evaluation of the proposed model are depicted in Fig. 5, which shows the confusion matrices of both Study One and Study Two. Study One has the lower error rate of the two (2.7%), while Study Two has the higher error rate (12.33%), which we attribute to the unbalanced data set. The ratio of monkeypox cases to other cases was 1:1.98, which could have contributed to the higher rate of misclassifications in Study Two. The figures provide a visual representation of the model’s accuracy and help to identify areas for improvement.
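The error rate is the share of off-diagonal entries in a confusion matrix, and the class ratio follows from the sample counts. The sketch below shows both computations; the confusion matrix is hypothetical, while the class counts are the Study Two figures quoted in the Dataset section.

```python
def error_rate(confusion):
    """Fraction of misclassified samples; confusion[i][j] counts true class i predicted as j."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return (total - correct) / total


# Hypothetical 2x2 matrix (rows: true monkeypox, true other).
confusion = [[100, 18],
             [25, 208]]
rate = error_rate(confusion)

# Class imbalance from the Study Two sample counts given in the text.
monkeypox, other = 587, 1167
ratio = other / monkeypox  # roughly two "Other" samples per monkeypox sample
```

With about twice as many samples in the majority class, a classifier can reach a deceptively high accuracy by favouring that class, so the error rate should be read alongside sensitivity and specificity.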
AUC curves for (a) study one, and (b) study two.
Classification results.
The ROC (Receiver Operating Characteristic) curve analysis results for the modified VGG16 models are shown in Fig. 6, which reports the AUC (Area Under the Curve), TPR (True Positive Rate), and FPR (False Positive Rate). Study One’s train set had the highest AUC at 0.972, while Study Two’s test set had the lowest at 0.748. An AUC close to 1 indicates good model performance, while an AUC close to 0.5 indicates performance no better than chance. The AUC of 0.748 on Study Two’s test set shows that the model’s predictions there are considerably less reliable.
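The AUC has a simple interpretation: it is the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one. The sketch below computes it that way by pairwise comparison; the function name and the example scores are hypothetical, and this is not necessarily how the values in Fig. 6 were produced.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = monkeypox) and predicted scores.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
auc = roc_auc(labels, scores)
```

A model that assigns the same score to every sample gets an AUC of 0.5 under this definition, which is the chance-level reference mentioned above.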
Figure 7 visualizes the classification results of the proposed models as explained by LIME.
In this research, we created a new dataset for classifying monkeypox infections using image analysis techniques. A modified version of VGG16 was evaluated in two separate studies, reaching an accuracy of roughly 0.83 ± 0.085 on a small dataset and 0.78 ± 0.022 on an unbalanced dataset. The model’s reliability was also analyzed using LIME, a popular explainable AI approach. The new dataset provides a valuable opportunity for further research and development of image-based tools for monkeypox diagnosis. However, there is currently no other accessible image dataset for monkeypox against which the model’s performance could be compared. The study has the following limitations.
Lack of large and diverse data sets: The modest size of the data sets used in the research can influence the model’s generalization capacity, which could cause it to perform poorly on larger and more varied data sets.
Limited generalizability: Since the study only considered the classification of monkeypox, other types of skin illnesses may not be included in the model’s scope of applicability.
Inadequate validation: The model’s validation was only conducted using the test set, which had a small sample size, and as a result, the outcomes might not accurately reflect the model’s genuine performance.
Explainability limitations: Although LIME was used to explain the model’s predictions, its interpretation might still be arbitrary and might not fully explain the model’s behavior.
As a result, additional study is required to overcome these constraints, which can be accomplished by gathering more varied data sets, verifying the model using larger and more varied data sets, and developing more sophisticated interpretability approaches.
The goal of the research was to address the shortage of photos of patients infected with the monkeypox virus by creating a new, publicly available dataset obtained from open sources. The modified VGG16 model was used in both studies, and the results showed that the model can differentiate between patients with and without monkeypox symptoms with an accuracy ranging from 78% to 97%. The explainable AI approach LIME was used to provide insights into the model’s predictions. The findings encourage future research to employ transfer learning in clinical diagnostics and demonstrate the promise of AI-based technology for the early detection and prevention of monkeypox. The strategy described here for working with scarce data, together with the methods of analysis, can be helpful to medical professionals. To further improve the model, we recommend continuous collection of new patient data, evaluation on imbalanced data, comparison with other studies, and development of a mobile-based tool.
Ethical approval
This article does not contain any studies with animals performed by any of the authors.
Funding
This research is not funded by a government or non-governmental organization.
Data availability
Data is available as dataset “Monkeypox2022” and is made available on GitHub (
Author contributions
All authors contributed equally to this research.
Footnotes
Conflict of interest
The authors declare that they have no conflict of interest.
