Abstract
Deep learning algorithms have become the most prominent methods for medical image analysis over the past years, leading to improved performance in various medical applications. In this paper, we focus on applying intelligent skin disease detection to face images, where the crucial challenge is the low availability of training data. To achieve high disease detection and classification success rates, we adapt the state-of-the-art StarGAN v2 network to augment images of faces and combine it with a transfer learning approach. The experimental results show that the classification accuracies of transfer learning models are in the range of 77.46–99.80% when trained on datasets that are extended with StarGAN v2 augmented data.
Introduction
Lack of available training data is one of the crucial challenges in designing deep learning-based methods for classification and segmentation. This paper addresses the classification of skin diseases in faces from image data. Even though some public datasets are available [1, 2, 3], they contain far less data than the datasets available for other standard computer vision problems, which restricts both the choice of models and their performance on this problem. This matters because larger training datasets, especially in the case of deep learning methods, lead to better performance [4, 5].
Generative adversarial networks (GANs) are a popular approach for generative modeling using deep learning and have been demonstrated to be effective in several tasks, such as image synthesis [6], image translation [7] and video generation [8]. They enable the generation of new output data.
The proposed approach improves the classification accuracy of facial skin disease classification, for which only small datasets are available. The classifier performance is enhanced by enlarging the input dataset with images generated by StarGAN v2.
Data generation, which is an unsupervised learning problem, is posed within the process of GAN training using two submodels: a generator model, which is trained to generate new output data that are similar to the input, and a discriminator model, which classifies outputs of the generator model as fake or real. In this way, training is performed in a supervised manner.
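The two-player setup described above can be illustrated with a minimal sketch of the losses involved. It assumes the standard binary cross-entropy formulation with a non-saturating generator loss, which is a common choice and not necessarily the exact objective used in this work; the function names are hypothetical.

```python
import math

def bce(prob, target):
    """Binary cross-entropy for one prediction; prob is clamped to avoid log(0)."""
    eps = 1e-12
    p = min(max(prob, eps), 1.0 - eps)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))

def discriminator_loss(d_real, d_fake):
    """Real images carry label 1, generated images label 0 (assumed convention)."""
    real_term = sum(bce(p, 1.0) for p in d_real) / len(d_real)
    fake_term = sum(bce(p, 0.0) for p in d_fake) / len(d_fake)
    return real_term + fake_term

def generator_loss(d_fake):
    """Non-saturating loss: the generator is rewarded when fakes are scored as real."""
    return sum(bce(p, 1.0) for p in d_fake) / len(d_fake)

# d_real / d_fake are the discriminator's "real" probabilities for each image.
print(discriminator_loss([0.9, 0.8], [0.2, 0.1]))
print(generator_loss([0.2, 0.1]))
```

The labels (real = 1, fake = 0) are what turn the unsupervised generation problem into a supervised one: the discriminator is an ordinary binary classifier, and the generator is trained through its output.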
The main advantage of GANs is their ability to generate realistic new examples based on input data such that humans are unable to distinguish whether they are fake or real [9]. Therefore, they are mostly used for knowledge translation across domains and to create high-resolution output images from input data. Since they provide outstanding performance in data generation and the capability to capture the underlying data structure, they have been shown to be a great choice for improving the accuracy of medical image segmentation [10]. There is a plethora of work that addresses the problems of data augmentation, image segmentation, detection and classification of medical images [2, 10, 4, 5] with GANs and their extensions.
Their ability to generate images that are realistic to the human eye additionally promises to overcome the challenges of data labeling in the medical field [5]. GANs are commonly used for skin-lesion analysis, segmentation, and classification [2, 11, 12]. Similarly, this paper tackles the problem of augmenting data to improve transfer learning classifiers for skin disease detection and recognition (Fig. 1).
Skin melanoma assessment [13] and skin disease classification are among the most commonly addressed issues in medical image processing [2]. Developing automated approaches for skin issue detection and recognition would reduce the time needed to obtain diagnoses and decrease the amount of resources that must be expended by both patients and health care institutions. Transfer learning is a common approach for addressing skin disease classification problems [11]. Transfer learning enables knowledge transfer from large datasets to a specified target task for which only a smaller dataset is available. This is done by first training a CNN model on a large dataset and then retraining this network to address the target problem.
One of the main challenges is obtaining adequate datasets for achieving high accuracy in skin disease detection and classification. Therefore, this paper’s primary goal is to generate high-quality augmented data to improve the classification accuracy for facial skin diseases.
The work presented in this paper extends StarGAN v2 [14], which is a powerful framework that can synthesize rich style images, to medical-domain image generation. We use the augmented data to train eight different classification models from the Transfer Learning Suite in PyTorch (https://pytorch.org/vision/stable/models.html).
The experiments are conducted with the Kaggle dataset “20+ Skin Disease Directories with Face Images” (https://www.kaggle.com/syedaun/20-skin-disease-directories-with-face-images/code).
The main contributions of this article are:
(1) We propose a GAN model for facial skin image generation based on the StarGAN v2 architecture. The experimental results show great efficiency in the generation of a rich diversity of skin images.
(2) We examine and verify transfer learning-based classifiers for skin disease detection and classification with newly generated data in combination with the original data.
(3) A framework for medical image analysis, generation and classification is created with the possibility of automatic diagnosis (Fig. 2).
The VGG19 model showed the highest overall validation accuracy of 99.85%.
Block diagram of the proposed approach for facial skin disease prediction.
Sample images from the five diagnostic categories of facial skin diseases, namely, Acne and Rosacea, Eczema, Systemic Disease, Urticaria Hives and Vascular Tumors, used in this paper.
This paper examines the problem of classifying the following skin diseases: Acne and Rosacea, Eczema, Systemic Disease, Urticaria Hives and Vascular Tumors. Examples from the “20+ Skin Disease Directories with Face Images” dataset are shown in Fig. 3.
Literature review
Transfer learning [15] and generative adversarial networks are prevalent deep learning methods in industry and academia. They have also received attention in the field of medical image processing [10, 2, 5].
Kazeminia et al. [5] analyzed applications of GANs in the medical field and provided a comprehensive overview of available approaches that use GANs. The main identified benefits of GANs are in enlarging datasets and exploitation in semi- and unsupervised settings. The main disadvantages based on their analysis are the trustworthiness of the generated data and the unstable training of standard GANs. Similarly, Xun et al. [10] identified instability, low repeatability and low interpretability as the main drawbacks based on an analysis of 120 GAN-based architectures for medical image segmentation.
Bissoto et al. [2] conducted a detailed study of GAN-based image synthesis for skin lesion detection and classification. Their work showed that data augmentation with GANs provides favorable results only on test sets with anomalous inputs, and they advise careful application for medical image processing.
Qin et al. [11] constructed a framework for the analysis of medical images and skin lesion classification. They also used GAN-based data synthesis. They proposed a deep learning-based classifier, which improved the overall classification results. However, their method encountered mode monotony for some categories in generated images. Similarly, we also address improving the performance of transfer learning models with augmented data.
Lei et al. [12] proposed GANs with dual discriminators for addressing the problem of data synthesis in the skin lesion segmentation task. Their novel GAN demonstrated superior performance on the skin lesion dataset from the ISIC 2017 challenge compared to state-of-the-art methods. Their approach’s main challenge was representing images with exceedingly obscure boundaries, low contrast, and noise.
Due to their remarkable performance, GANs have attracted much attention in the scientific community [8, 16, 17, 18, 19, 9].
Recent advances in GANs have overcome well-known problems such as training instability and mode collapse [8, 19]. The versatility and robustness of GANs have also been improved [18].
Armandpour et al. [19] introduced partition-guided GANs, which divide the data space into smaller regions to overcome the mode collapse problem. These partitions have simpler distributions and alleviate the mode collapse problem for parts of the data with nonexistent density. There have also been advances toward utilizing the latent space interpretation of GANs for semantic editing and useful direction determination [20].
Additionally, for image generation, GANs spontaneously learn multiple interpretable features in the latent space, e.g., the gender of a face or the lighting conditions in the case of scene synthesis.
Recent methods are capable of identifying versatile semantics from GAN networks of different types in an unsupervised learning setup [16]. Tritrong et al. [17] showed that GANs can be used not only for synthesis but also for few-shot semantic part segmentation.
Their approach uses GANs for unsupervised representation learning: the pixel-wise representations obtained from the GAN synthesis process are already discriminative and can be used for prediction in problems involving scene semantics and object part reasoning.
Lin et al. [9] introduced a scalable training process for training unconditional GANs. The main benefit of their approach is that it adapts to different latency requirements coming from hardware or a user.
Proposed approach
The proposed approach is presented in Fig. 2. In this section, we describe image synthesis with StarGAN v2 in detail and present an overview of the classifier construction.
Generate With StarGAN v2
Image synthesis
This section briefly introduces the concepts of StarGAN v2. Full details can be found in the paper introducing StarGAN v2 by Choi et al. [14].
A general GAN consists of two neural networks: a generator and a discriminator.
Face Skin Disease Prediction
The StarGAN v2 discriminator is a multitask discriminator where each output branch corresponds to one domain and decides whether an image is real or fake for that domain [14].
where
An adversarial objective
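For orientation, the adversarial objective can be written as it appears in the StarGAN v2 paper by Choi et al. [14]. The notation is taken from that paper and is an assumption with respect to the present text: $x$ is a source image with original domain $y$, $\tilde{y}$ is a target domain, $z$ is a latent code, $\tilde{s} = F_{\tilde{y}}(z)$ is the style code produced by the mapping network $F$, $G$ is the generator, and $D_{y}$ is the discriminator branch for domain $y$.

```latex
\mathcal{L}_{adv} =
  \mathbb{E}_{x,y}\left[\log D_{y}(x)\right]
  + \mathbb{E}_{x,\tilde{y},z}\left[\log\left(1 - D_{\tilde{y}}\left(G(x,\tilde{s})\right)\right)\right]
```

The first term rewards the branch $D_{y}$ for accepting real images of domain $y$; the second rewards the branch $D_{\tilde{y}}$ for rejecting images that $G$ synthesizes for the target domain $\tilde{y}$, while $G$ is trained to make that rejection fail.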
Overview of the transfer learning models
Additional data augmentation and preprocessing steps
We built classifiers using the transfer learning approach, which enables the utilization of knowledge from a previously trained source model to train a new model more rapidly and with better performance. This is especially beneficial when we have several domains of interest that potentially have different distributions and might lie in different feature spaces and we do not have sufficient training data in some of these domains. In these cases, knowledge transfer improves the classification performance. In our scenario of skin disease classification, knowledge was transferred from models pretrained on ImageNet [22]. We examined several classification models; an overview is presented in Table 1.
The newly obtained dataset, consisting of the original and synthesized data, was still relatively small, so additional classical computer vision data augmentation was necessary to increase the dataset size before training the classifiers. This included the following data augmentation and preprocessing steps:
StarGAN v2 network architecture details
An example of reference-guided image generation results. The source images are shown in the first row, and the reference images are shown in the left column.
The model was trained for 25,000 epochs with a batch size of eight. It was implemented in PyTorch with a learning rate of
Experimental results
This section describes the evaluation setup and conducted experiments.
Transfer learning models trained with StarGAN v2
Accuracy and loss over 25 epochs for the transfer learning classification model using StarGAN v2 augmented images.
Overall results of the StarGAN v2 generation of latent representations. In each row, the source image is on the left, while the remaining images are synthesized images.
Overall results of StarGAN v2 in reference-guided image generation. In the first row, the source and reference images are presented, and in the first column, the original images are presented, while the remaining images are synthesized images.
Conclusion
This article proposes a framework for medical image synthesis and classification. Face skin images are generated via the StarGAN v2 data augmentation technique, and a skin condition classifier is created based on the transfer learning approach. The generated images are used together with the original data to improve the classifier’s performance. Changes in the training and validation accuracy and loss of the classification network are visualized in Fig. 5.
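One way to combine generated and original images is sketched below. It is an illustration under explicit assumptions rather than the procedure used in the experiments: the validation split is drawn from original images only, the 20% validation fraction is a placeholder, and the file names and the helper `build_splits` are hypothetical.

```python
import random

def build_splits(original, generated, val_fraction=0.2, seed=0):
    """Return (train, val) lists of (path, label) pairs.

    Assumption: validation uses original images only, so the reported
    accuracy is not measured on synthesized faces.
    """
    rng = random.Random(seed)
    shuffled = list(original)
    rng.shuffle(shuffled)
    n_val = int(len(shuffled) * val_fraction)
    val = shuffled[:n_val]
    # Generated images extend the training portion of the original data.
    train = shuffled[n_val:] + list(generated)
    rng.shuffle(train)
    return train, val

original = [(f"original/eczema_{i}.jpg", "Eczema") for i in range(10)]
generated = [(f"generated/eczema_{i}.jpg", "Eczema") for i in range(5)]
train, val = build_splits(original, generated)
print(len(train), len(val))  # 13 2
```

Keeping synthesized images out of the validation split is a design choice that makes accuracy figures easier to interpret, since they then reflect performance on real photographs.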
The proposed approach using StarGAN v2 successfully renders distinctive styles. The style codes of the references are captured from face skin images representing a specific disease, and new realistic images are synthesized, as shown in the experimental results. High-level semantics, such as skin condition features, are inferred from the reference images, and the identity and pose are obtained from the source images. The VGG19 model showed the highest overall validation accuracy of 99.85%.
Future work will integrate trained models into a robotic system (Misty 2) for online skin disease detection and recognition. We will apply our attribution-based confidence metric [24] for detecting adversarial attacks on skin cancer detection systems.
Acknowledgments
This material is based upon work supported by the Henry Luce Foundation – Clare Boothe Luce Fund. The authors acknowledge the support provided by Dave Reed, Carol Zuegner, Catherine Baker, Joi Katskee, and all the faculty members from the Department of Computer Science, Design, and Journalism, Creighton University.
