Abstract
Alopecia Areata (AA) is one of the most widespread diseases and is generally classified and diagnosed with Computer Aided Diagnosis (CAD) models. Although CAD improves AA diagnosis, it has limited interoperability and requires skilled radiologists for medical image interpretation. This problem can be addressed by combining Deep Learning (DL) models with CAD to diagnose AA patients accurately. Many studies rely on a single DL model, such as a Convolutional Neural Network (CNN), for medical imaging; such models yield inconsistent results across datasets and involve many parameters, which limits their generalizability. To overcome this limitation, this work proposes an Ensemble Pre-Learned DL and Optimized Long Short-Term Memory (EPL-OLSTM) model for AA classification. First, healthy and AA scalp hair images are separately fed to pre-learned CNN structures, i.e., AlexNet, ResNet, and InceptionNet, to extract deep features. These features are then passed to the OLSTM, in which the Battle Royale Optimization (BRO) algorithm optimizes the LSTM’s hyperparameters. The output of the LSTM is classified by a fuzzy-softmax layer into the associated AA classes: mild, moderate, and severe. The model can thus increase the accuracy of differentiating between healthy scalp hair and multiple AA classes. Finally, extensive experiments on the Figaro1k (healthy scalp hair images) and DermNet (AA scalp hair images) datasets show that the EPL-OLSTM achieves 93.1% accuracy, outperforming state-of-the-art DL models.
Introduction
Hair is an essential feature of a person’s physical appearance. The keratin layer of hair becomes brittle and splits under environmental factors such as temperature and humidity, as well as physicochemical treatments, which damages hair quality or causes hair loss. A distinct cause of hair loss is Alopecia Areata (AA), an autoimmune disease involving non-scarring hair loss in well-defined patches that can spread over the whole scalp and lead to baldness [1]. AA affects millions of individuals worldwide, particularly those with a family history of AA. It arises when the body’s immune system targets the hair follicles, impeding their regular function and preventing hair growth. According to the World Health Organization (WHO), an estimated 1 in 1000 individuals is affected by AA. The lifetime risk of developing AA in the population is nearly 2% [2]. Data related to AA and its symptoms exhibit many distinct characteristics compared to other kinds of data. They include hair loss patterns, clinical features, treatment options, disease progression, psychological impact, genetic factors, and research and clinical trials. Such data help in understanding the clinical profile of affected individuals, as well as their response to treatment and management strategies. Over the past decades, AA has mostly been classified and diagnosed through trichoscopy and biopsies [3]. However, a drawback of these diagnostic methods is that the number of tests needed for a proper diagnosis is unpredictable.
As a result, there is a huge opportunity to develop novel models based on Artificial Intelligence (AI) algorithms for classifying and diagnosing AA [4–6]. Machine learning models, including Support Vector Machine (SVM), K-Nearest Neighbor (KNN), and decision trees, have shown effective performance in the classification and diagnosis of multiple diseases. These models adopt various computer algorithms that are able to learn and adapt. In dermatology, effective classification and diagnosis have been accomplished by various machine learning models [7]. For instance, SVM, KNN, and decision trees have been applied to analyze and categorize scalp images, which assists in classifying scalp conditions such as dandruff and AA. However, these models do not perform well on multi-class tasks, are sensitive to parameters such as the kernel function, and do not learn the correlation between samples.
To tackle these issues, DL models have been employed in recent medical diagnosis systems. In dermatology, a few researchers have developed different CNN models to classify and diagnose scalp hair problems, i.e., dandruff, AA, allergies, folliculitis, and oily scalp. These models can also predict the different levels of hair loss from human scalp or skin images [8, 9]. However, their performance varies across datasets because of the variation in the number of samples. This restricts the generalizability of these models and makes it challenging to set appropriate parameters for network learning when a variety of datasets is used.
Hence, to address these problems, in this manuscript, an EPL-OLSTM model is proposed for AA classification from both human healthy scalp hair and AA scalp hair images. Compared to the previous works in AA classification, the proposed model can extract deep features from the scalp hair images using pre-learned CNN structures and classify them into corresponding AA classes using the OLSTM network in an automated way. This alleviates the manual extraction of features and reduces the computational complexity. The main contributions of this model are the following: First, healthy and AA scalp hair images of various individuals are independently given to the AlexNet, ResNet, and InceptionNet models for deep feature extraction. Second, the extracted deep features are passed to the OLSTM network, wherein the LSTM’s hyperparameters are optimized by the BRO algorithm. Finally, the fuzzy-softmax function is applied to classify the resultant features from the LSTM network into the associated AA classes, including mild, moderate, and severe.
Based on this model, the accuracy of classifying and diagnosing AA can be improved significantly. It also enhances the model generalization for different kinds of medical images. The findings reveal the future potential of the ensemble DL model to differentiate AA classes and diagnose patients suitably.
The remaining article is prepared as follows: Section 2 discusses the works related to the classification and diagnosis of AA/scalp hair problems. Section 3 explains the EPL-OLSTM model and Section 4 illustrates its performance compared to the existing models. Section 5 summarizes the study and presents its future enhancement.
Literature survey
Nabahhin et al. [10] developed an expert system that suggests treatment for various probable hair loss disorders of different levels among individuals by asking yes-or-no questions. The system asks the user to choose the proper answer on each screen. At the end of the dialog session, the treatment and suggestions for the disorder are provided to the user. However, more characteristics related to hair loss were needed to improve the diagnosis.
Wang et al. [11] applied the DL models to hairy scalp images to identify the different scalp conditions. In this model, the ImageNet-VGG-f structure Bag-Of-Words (BOW) was executed with an SVM classifier and Histogram-Of-Gradients (HOG) or Pyramid HOG (PHOG) with an SVM classifier. But the number of scalp images for training was inadequate and the accuracy was limited to the small datasets.
Lee et al. [12] identified the topographic phenotypes of AA using cluster analysis and designed a grading model to stratify diagnosis. At first, clinical images of patients with AA were collected. Then, topographic phenotypes of AA were detected by hierarchical clustering with Ward’s method. Also, variances in clinical features and diagnosis were compared across the different clusters. But the statistical efficiency was degraded because of the limited number of patients with severe AA.
Seo and Park [13] presented a scheme to prevent hair loss and diagnose the scalp by capturing Alopecia Features (AFs) from scalp images. First, the scalp images were preprocessed to fine-tune the contrast of the microscopy input and reduce light reflection. Then, AFs such as the number of hairs, follicles, and density were extracted from the preprocessed images by gridline selection and eigenvalues to compute the growth level of alopecia. However, the scheme needs a massive quantity of scalp images and an AI model that automatically extracts several kinds of AFs to increase efficiency.
Fatima et al. [14] investigated clinical, dermoscopic, and histopathological findings in patients with AA. In this investigation, 50 consecutive patients attending the dermatology outpatient department of a tertiary care hospital over 2 years with clinical features suggestive of AA were chosen. A clinical analysis was then conducted by dermoscopy and by skin biopsy taken from the margin of an active lesion. The data were evaluated by determining the mean and standard deviation. However, an automated model is needed to identify and diagnose AA appropriately.
Ibrahim et al. [15] presented an analysis of the pre-trained categorization of scalp conditions with the help of image processing methods. At first, the scalp images were collected and preprocessed. Then, various characteristics like shape, color, and texture were obtained from all images to determine the Region-Of-Interest (ROI). The values of the pre-trained features were utilized as a reference during the categorization. The SVM was used to categorize the scalp conditions. But it takes more time and complexity due to the independent feature extraction and classification processes.
Zhang et al. [16] developed a rapid and simple technique to identify the level of hair damage based on the lightweight CNN model called Hair Diagnosis MobileNet (HDM-Net). In this technique, the HDM-Net was utilized to obtain and choose the features. Such features were then fed to the SVM to categorize hair damage images. Though it reduces the number of parameters, its accuracy was not effective.
Shakeel et al. [17] developed a model for the categorization of healthy hair and AA. First, hair images of healthy and AA conditions were collected and preprocessed for segmentation. Then, various features such as texture, shape, and color were extracted from each segment. SVM and KNN classifiers were then employed to classify those features into healthy and AA. However, these classifiers have a high computational complexity when more images are used.
Gao et al. [18] presented a deep learning model for automated trichoscopy scan evaluation and a quantitative framework to categorize male androgenetic alopecia. First, trichoscopy scans were obtained, and a deep learner was constructed based on a Fully Convolutional Network (FCN). Then, the relationships between fundamental and detailed categorization were examined, and a quantitative framework was applied to predict fundamental and detailed categorization through multiple ordinal logistic regressions. However, its performance was limited by the number of samples.
Jeong et al. [19] developed a deep learning-based intelligent scalp diagnosis and classification system called AI-ScalpGrader, which uses EfficientNet to diagnose and categorize scalp conditions. However, its accuracy values ranged only from 87.3% to 91.3%. Roy and Protity [20] presented a 2D CNN model to predict different kinds of hair loss and scalp-related diseases. The drawbacks of this framework were the unavailability of a proper dataset and the lack of variety among the images available on the internet.
Ying and Lin [21] developed a new self-learning fuzzy automaton with input and output fuzzy sets for system modeling, which can be used to solve issues in medical applications. Xing et al. [22] developed an efficient federated distillation learning system for multi-task time-series classification. It can be used for medical systems to analyze time-series data.
From the literature, it is observed that current studies focus on machine learning and DL models for scalp hair problem classification. However, such studies face many challenges in AA classification, such as limited and heterogeneous data, challenging scalp hair image analysis, inter- and intra-observer variability, lack of standardized classification criteria, and limited generalizability to diverse populations. These challenges reduce the accuracy of classification models and limit their generalizability. Therefore, this study develops an ensemble DL model for classifying and diagnosing AA in humans using both healthy scalp and AA scalp hair images.
Proposed methodology
In this section, the EPL-OLSTM model is described briefly for classifying and diagnosing AA. Figure 1 depicts the overview of this study. First, the deep features are extracted from both healthy scalp hair and AA scalp hair images using the pre-learned CNN structures. After that, the extracted features are given to the OLSTM network, followed by the fuzzy-softmax layer for AA classification.

Figure 1. Block diagram of the study.
In this study, two publicly available databases are used. Figaro1k database: an open database comprising 1050 healthy scalp hair images, equally distributed across various classes such as straight, wavy, and curly [23]; of these, 350 images of normal hair are considered for this study. DermNet database: an open database accessible on DermNet, containing 23 classes of dermatological disorders, including AA [24]; overall, 1050 images (350 for each AA type) are obtained for three distinct AA types: mild, moderate, and severe.
The healthy scalp hair and AA scalp hair images from these databases are processed by the EPL-OLSTM model for AA classification and diagnosis.
Deep feature extraction using pre-learned CNN model
In this study, three distinct pre-learned CNN structures are considered for deep feature extraction: AlexNet, InceptionNet-V1, and Residual Network (ResNet). The InceptionNet-V1 and ResNet structures each have a single Fully Connected (FC) layer, whereas the AlexNet structure has three FC layers (FC6, FC7, and FC8) whose features differ in discriminative power. The performance of each of these layers is evaluated separately, and FC6 is identified as the best-performing layer. Table 1 presents the characteristics of the pre-learned CNN structures, and Figures 2–4 illustrate their structures.
Details of different pre-learned CNN structures

Figure 2. Structure of AlexNet.

Figure 3. Structure of InceptionNet-V1.

Figure 4. Structure of ResNet.
So, these pre-learned CNN structures are separately used for extracting the deep features from both healthy scalp hair and AA scalp hair images. After completing the deep feature extraction, the extracted features are fed to the OLSTM network for further processing.
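The text does not state how the features from the three networks are combined before they reach the OLSTM. The following minimal sketch assumes simple concatenation of the per-network feature vectors. The function `fuse_features` and the two stand-in extractors are hypothetical and only illustrate the data flow; real extractors would return the activations of an FC layer of a pre-learned network.

```python
from typing import Callable, List, Sequence

Image = Sequence[Sequence[float]]            # hypothetical: a 2-D grayscale image
Extractor = Callable[[Image], List[float]]   # maps an image to a feature vector


def fuse_features(image: Image, extractors: Sequence[Extractor]) -> List[float]:
    """Run every extractor on the image and concatenate the results (assumed fusion)."""
    fused: List[float] = []
    for extract in extractors:
        fused.extend(extract(image))
    return fused


# Stand-ins for the pre-learned networks (assumption, not the actual CNNs).
def mean_intensity(image: Image) -> List[float]:
    pixels = [p for row in image for p in row]
    return [sum(pixels) / len(pixels)]


def min_max(image: Image) -> List[float]:
    pixels = [p for row in image for p in row]
    return [min(pixels), max(pixels)]


features = fuse_features([[0.0, 1.0], [0.5, 0.5]], [mean_intensity, min_max])
```

With real networks, each extractor would contribute a much longer vector, and the fused vector would form the input sequence element passed to the LSTM.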
The LSTM network includes three gates: the forget gate, the input gate, and the output gate. By controlling which information each LSTM unit retains, these gates efficiently mitigate the problems of gradient explosion and vanishing. Its architecture is depicted in Fig. 5.

Figure 5. Architecture of LSTM network.
The forget gate determines how much of the information carried over from the preceding LSTM unit is forgotten, as in Equation (1):

$$f_t = \sigma\left(W_f \cdot [h_{t-1}, x_t] + b_f\right) \tag{1}$$

In Equation (1), $W_f$ and $b_f$ are the weight vector and bias value of the forget gate, respectively, $\sigma$ is the sigmoid activation function, $x_t$ is the input feature, $f_t$ is the forget gate, and $h_{t-1}$ is the result of the previous hidden state.
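The input gate determines how much of the present data is added to the cell state, as in Equations (2) and (3):

$$i_t = \sigma\left(W_i \cdot [h_{t-1}, x_t] + b_i\right) \tag{2}$$

$$\tilde{C}_t = \tanh\left(W_C \cdot [h_{t-1}, x_t] + b_C\right) \tag{3}$$

In Equations (2) and (3), $W_i$ and $W_C$ are the weight vectors of the input gate and the neuron condition vector, respectively, $b_i$ and $b_C$ are the bias values of the input gate and the neuron condition vector, respectively, $\tanh$ is the hyperbolic tangent activation function, $i_t$ is the input gate, and $\tilde{C}_t$ is the candidate neuron condition vector.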
Once the data have passed through the forget and input gates, the LSTM updates its cell state, which is then passed to the next LSTM unit, as in Equation (4):

$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t \tag{4}$$

In Equation (4), $C_t$ is the current cell state, $C_{t-1}$ is the old cell state, $\tilde{C}_t$ is the candidate state produced at the input gate, and $\odot$ denotes element-wise multiplication. The output gate combines the present input and the cell state to compute the result of the present LSTM unit, as in Equations (5) and (6):

$$o_t = \sigma\left(W_o \cdot [h_{t-1}, x_t] + b_o\right) \tag{5}$$

$$h_t = o_t \odot \tanh(C_t) \tag{6}$$

In Equations (5) and (6), $h_t$ represents the hidden state that serves as the output of the block at time $t$, $o_t$ is the output gate, and $W_o$ and $b_o$ are the weight vector and bias value of the output gate, respectively.
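The gate computations described above can be traced with a small, dependency-free sketch of a single LSTM step. It assumes the standard LSTM formulation in which every gate acts on the concatenation of the previous hidden state and the current input; the names `lstm_step`, `gate`, and `params` are illustrative and do not come from the original implementation.

```python
import math
from typing import Callable, Dict, List, Tuple

Vector = List[float]
Matrix = List[Vector]
Params = Dict[str, Tuple[Matrix, Vector]]  # gate name -> (W, b), hypothetical layout


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def gate(params: Params, name: str, z: Vector,
         act: Callable[[float], float]) -> Vector:
    """Apply act(W z + b) row by row for the named gate."""
    W, b = params[name]
    return [act(sum(w * v for w, v in zip(row, z)) + b_k)
            for row, b_k in zip(W, b)]


def lstm_step(x_t: Vector, h_prev: Vector, c_prev: Vector,
              params: Params) -> Tuple[Vector, Vector]:
    """One LSTM step; returns the new hidden state and cell state."""
    z = h_prev + x_t                          # concatenation [h_{t-1}, x_t]
    f_t = gate(params, "f", z, sigmoid)       # forget gate
    i_t = gate(params, "i", z, sigmoid)       # input gate
    c_hat = gate(params, "C", z, math.tanh)   # candidate cell state
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, c_hat)]
    o_t = gate(params, "o", z, sigmoid)       # output gate
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]
    return h_t, c_t


# With all-zero weights every sigmoid gate outputs 0.5 and the candidate is 0,
# so the new cell state is half of the old one.
zero = ([[0.0, 0.0]], [0.0])
params = {name: zero for name in ("f", "i", "C", "o")}
h_t, c_t = lstm_step([1.0], [0.0], [2.0], params)
```

The zero-weight case is only a sanity check; in a trained network the weights and biases determine how much of the old cell state is kept and how much new information is written.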
The LSTM network is trained with the Adam optimizer using an initial learning rate of 0.001, 100 epochs, and a batch size of 15.
However, the major problem is tuning the LSTM network’s hyperparameters, such as the number of hidden layers, the number of hidden nodes in each layer, the batch size, the number of epochs, the learning rate, and the weight and bias values. To solve this problem and optimize the LSTM network’s hyperparameters, the BRO algorithm is adopted in this study.
The BRO algorithm is inspired by battle royale digital games. BRO is a population-based algorithm in which each individual is a warrior (a distinct set of the LSTM network’s hyperparameters) that tries to move to the safest location (the best hyperparameter set) and stay alive.
BRO starts with a random population that is evenly dispersed over the search area. Each warrior then fires at the warrior closest to it in an attempt to kill it, so warriors in stronger locations injure their nearest neighbors. Each time a warrior is wounded by another, its injury level rises by 1, i.e., $x_i.\text{injury} = x_i.\text{injury} + 1$, where $x_i.\text{injury}$ is the injury level of the $i$th warrior in the population. Additionally, warriors seek to change location as soon as they are injured in order to hit enemies from a different angle. Accordingly, to concentrate on exploitation, the injured warrior moves toward a location between its original location and the safest location found thus far (the leader), as in Equation (7):

$$x_{inj,d} = x_{inj,d} + r\left(x_{opt,d} - x_{inj,d}\right) \tag{7}$$
In Equation (7), $r$ denotes a random number uniformly distributed between 0 and 1, $x_{inj,d}$ represents the location of the injured warrior in dimension $d$, and $x_{opt,d}$ indicates the location of the optimal result obtained thus far. Moreover, when an injured warrior kills its enemy in the subsequent iteration, $x_i.\text{injury}$ is reset to 0. To concentrate on exploration, when the injury level of a warrior exceeds the fixed threshold value, the warrior dies and respawns at a random location in the feasible search area, and $x_i.\text{injury}$ is reset to 0. Based on trial and error, the threshold value is set to 3. This mechanism prevents premature convergence and provides better exploration. A warrior returning to the search area after being killed is placed according to Equation (8):

$$x_{inj,d} = r\left(ul_d - ll_d\right) + ll_d \tag{8}$$

In Equation (8), $ll_d$ and $ul_d$ are the minimum and maximum limits of dimension $d$ in the search area, respectively. Further, every $\Delta$ iterations, the feasible search area shrinks toward the optimal result. The initial value is $\Delta = \log_{10}(\text{MaxCircle})$; however, then
In Equations (9)
The computational complexity of BRO depends on the population size and the maximum number of iterations. Because every result must be compared with every other result to determine their Euclidean distances, the computational complexity for a population of size $n$ is $O(n^2)$. So, for the number of iterations $m$, the computational complexity of BRO is $O(n^3)$. Figure 6 illustrates the OLSTM for AA classification.

Figure 6. Flow diagram of OLSTM network model for AA classification.
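Algorithm 1: OLSTM using BRO algorithm
Arbitrarily initialize a population (sets of hyperparameters);
Initialize the maximum iteration ($Itr_{max}$);
Initialize Shrink = ceil($\log_{10}$(MaxCircle));
Itr = 1;
while Itr ≤ $Itr_{max}$ do
    for each warrior $x_i$ in the population do
        Find the warrior $x_j$ closest to $x_i$;
        inj = j; win = i;
        if $x_i$ has a worse fitness than $x_j$ then
            inj = i; win = j;
        end if
        if $x_{inj}.\text{injury}$ is below the threshold then
            Modify the location of the injured warrior depending on: $x_{inj,d} = r\,(\max(x_{inj,d}, x_{opt,d}) - \min(x_{inj,d}, x_{opt,d})) + \max(x_{inj,d}, x_{opt,d})$;
            $x_{inj}.\text{injury} = x_{inj}.\text{injury} + 1$;
            $x_{win}.\text{injury} = 0$;
        else
            $x_{inj,d} = r\,(ul_d - ll_d) + ll_d$;
            Modify $f(x_{inj})$;
            $x_{inj}.\text{injury} = 0$;
        end if
    end for
    Every $\Delta$ iterations, modify $(ul - ll)$ depending on Equation (9);
    When $ll_d$ or $ul_d$ surpasses the actual minimum/maximum limit, it is assigned to the actual $ll_d$ or $ul_d$;
    Itr = Itr + 1;
end while
Choose the best warrior (optimal hyperparameters) as the result;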
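The search loop described for BRO can be illustrated with a simplified, dependency-free sketch. Several points are assumptions made only for illustration: the objective is minimized, the injured warrior moves a random fraction of the way toward the best location found so far, the shrinking of the search limits is omitted, and `toy_loss` is a hypothetical stand-in for the validation loss that would be obtained by training the LSTM with a given hyperparameter vector.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple


def bro_minimize(
    objective: Callable[[List[float]], float],
    bounds: Sequence[Tuple[float, float]],
    pop_size: int = 20,
    max_iter: int = 50,
    threshold: int = 3,
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Simplified BRO loop; lower objective values are treated as safer."""
    rng = random.Random(seed)

    def respawn() -> List[float]:
        # Uniform draw inside the search limits: r * (ul_d - ll_d) + ll_d.
        return [ll + rng.random() * (ul - ll) for ll, ul in bounds]

    pop = [respawn() for _ in range(pop_size)]
    fit = [objective(x) for x in pop]
    injury = [0] * pop_size
    leader = min(range(pop_size), key=lambda k: fit[k])
    best_x, best_f = list(pop[leader]), fit[leader]

    for _ in range(max_iter):
        for i in range(pop_size):
            # Closest other warrior by Euclidean distance.
            j = min(
                (k for k in range(pop_size) if k != i),
                key=lambda k: math.dist(pop[i], pop[k]),
            )
            # The warrior with the worse objective value is injured.
            inj, win = (j, i) if fit[i] < fit[j] else (i, j)
            if injury[inj] < threshold:
                # Assumed move: random fraction of the way to the best location.
                pop[inj] = [x + rng.random() * (b - x)
                            for x, b in zip(pop[inj], best_x)]
                injury[inj] += 1
                injury[win] = 0
            else:
                # Too many injuries: the warrior respawns at a random location.
                pop[inj] = respawn()
                injury[inj] = 0
            fit[inj] = objective(pop[inj])
            if fit[inj] < best_f:
                best_x, best_f = list(pop[inj]), fit[inj]
    return best_x, best_f


def toy_loss(h: List[float]) -> float:
    # Hypothetical stand-in for a validation loss over two hyperparameters.
    return (h[0] - 1.0) ** 2 + (h[1] + 2.0) ** 2


best_h, best_loss = bro_minimize(toy_loss, [(-5.0, 5.0), (-5.0, 5.0)])
```

In the setting of this study, each entry of the location vector would encode one LSTM hyperparameter, and evaluating the objective would require training and validating the network, which dominates the running time. The nearest-neighbor search inside the loop is the pairwise comparison that the complexity discussion refers to.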
According to this BRO, the optimal hyperparameters utilized in the LSTM network model are chosen for model training. Moreover, the output of the LSTM network (FV) is provided to the FC layer followed by the fuzzy-softmax layer for AA classification.
If the LSTM network provides n features, then the LSTM network layer has the output in the form:
The output of the LSTM layer is passed through the softmax layer, which converts the raw output to class probabilities. The LSTM layer provides a vector with 4 scores, each score is associated with a different scalp hair condition. By the softmax classifier, the final result of the scalp hair conditions is calculated by
In Equation (11), $T_{L_j}(FV)$ is the probability of choosing class $L_j$ as the scalp hair condition (i.e., healthy, mild AA, moderate AA, or severe AA).
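As a brief illustration of the conversion from raw scores to class probabilities, the sketch below applies the standard softmax function to a vector of four scores. The score values are hypothetical, and the sketch covers only the plain softmax, not the fuzzy extension.

```python
import math
from typing import List


def softmax(scores: List[float]) -> List[float]:
    """Standard softmax; subtracting the maximum keeps exp() numerically stable."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


CLASSES = ["healthy", "mild AA", "moderate AA", "severe AA"]
scores = [0.2, 1.5, 0.3, -0.8]          # hypothetical raw scores for one image
probs = softmax(scores)
predicted = CLASSES[probs.index(max(probs))]
```

The predicted class is the one with the largest probability, which is always the class with the largest raw score because softmax preserves the ordering of its inputs.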
Yu [25] showed that combining the softmax function with fuzzy inference can increase the discrimination ability of this classification function. Therefore, a new fuzzy-softmax function based on intuitionistic fuzzy sets is applied, which considers both the membership and non-membership values of the LSTM state values $x$ with respect to the correct classes.
In this fuzzy-softmax classifier, $T_{L_j}(FV)$ is calculated according to the fuzzy membership and fuzzy non-membership degrees of all the $FV$ input vectors with respect to the associated classes.
In Equation (12), $s_{L_j}(x_i)$ is the fuzzy significance of feature $x_i$ associated with output class $L_j$, $v_{L_j}(x_i)$ is the weight value between feature $x_i$ and output class $L_j$, and $f(x)$ is an activation function. The value of $s_{L_j}(x_i)$ is determined from the fuzzy membership value $\mu_{L_j}(x_i)$ and the fuzzy non-membership value $\vartheta_{L_j}(x_i)$ of feature $x_i$ for output class $L_j$ as:
The values of the factors $\mu_{L_j}(x_i)$ and $\vartheta_{L_j}(x_i)$ are calculated from the weight vector $U$ connecting each LSTM feature to the appropriate class, where $u_{nj}$ is the weight between the $n$th feature in the LSTM layer and output class $L_j$.
The degree of significance of the non-membership term in the fuzzy significance value in Equation (13) is controlled by the variable $\lambda$, which is set to 0.7.
Thus, the fuzzy-softmax layer classifies the features from the healthy scalp hair and AA scalp hair images into different classes.
This section evaluates the EPL-OLSTM model, implemented in MATLAB 2017b, on the Figaro1k and DermNet databases (discussed in Section 3.1). In this experiment, a total of 1400 images (350 from Figaro1k and 1050 from DermNet) are used. Of these, 1120 images (280 from Figaro1k, i.e., the normal hair class, and 840 from DermNet, i.e., 280 mild, 280 moderate, and 280 severe AA) are used for training. The remaining 280 images (70 from Figaro1k, i.e., normal hair, and 210 from DermNet, i.e., 70 mild, 70 moderate, and 70 severe AA) are used for testing. Figure 7 shows sample scalp hair images from the considered databases for the various classes.
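The split sizes above correspond to an 80/20 division within each class. The short sketch below reproduces the arithmetic; the 80% training share is inferred from the reported counts (280 of 350 per class), and the names are illustrative.

```python
from typing import Dict, Tuple

# Images per class as reported: 350 healthy (Figaro1k) and 350 per AA type (DermNet).
CLASS_COUNTS: Dict[str, int] = {
    "healthy": 350,
    "mild AA": 350,
    "moderate AA": 350,
    "severe AA": 350,
}
TRAIN_PERCENT = 80  # inferred from 280 training images out of 350 per class


def split_counts(total: int, train_percent: int) -> Tuple[int, int]:
    """Return (training, testing) image counts using integer arithmetic."""
    n_train = total * train_percent // 100
    return n_train, total - n_train


splits = {name: split_counts(n, TRAIN_PERCENT) for name, n in CLASS_COUNTS.items()}
train_total = sum(train for train, _ in splits.values())
test_total = sum(test for _, test in splits.values())
```

Splitting within each class keeps the four classes equally represented in both the training and testing sets.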

Figure 7. Healthy scalp hair and different kinds of AA scalp images.
The classical models, including HDM-Net [16], KNN [17], FCN [18], and CNN [20], are also tested on the considered datasets to verify the effectiveness of the proposed EPL-OLSTM model. The performance evaluation metrics are defined as:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

$$\text{Precision} = \frac{TP}{TP + FP}$$

$$\text{Recall} = \frac{TP}{TP + FN}$$

$$\text{F-measure} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$

In these equations, TP is the number of healthy images correctly identified as healthy, while TN is the number of AA images correctly identified as AA. In addition, FP is the number of AA images identified as healthy, whereas FN is the number of healthy images identified as AA.
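For illustration, the accuracy, precision, recall, and F-measure used in this evaluation can be computed from the four counts with their standard definitions, as in the sketch below. The counts in the example are hypothetical and are not taken from the experiments; the function name is illustrative.

```python
from typing import Dict


def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> Dict[str, float]:
    """Standard binary metrics; assumes the denominators are non-zero."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_measure = 2 * precision * recall / (precision + recall)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }


# Hypothetical counts, chosen only to exercise the formulas.
metrics = classification_metrics(tp=8, tn=9, fp=1, fn=2)
```

For a four-class problem such as the one studied here, these quantities would typically be computed per class from the confusion matrix and then averaged; the averaging scheme is not stated in the text.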
Table 2 presents the confusion matrices for the EPL-OLSTM on the considered test images.
Confusion matrix for existing and proposed AA classification and diagnosis models during testing phase
*Note: 1 –Healthy; 2 –Mild AA; 3 –Moderate AA; 4 –Severe AA.
Table 3 shows the performance values for existing and proposed AA classification and diagnosis models during the testing phase.
Performance analysis of existing and proposed AA classification and diagnosis models
Figure 8 illustrates the values of the performance metrics for the existing and proposed AA classification and diagnosis models. The proposed EPL-OLSTM model achieves higher performance than the existing models. Its accuracy is 18.57%, 14.43%, 11.54%, and 6.18% higher than that of the KNN, HDM-Net, FCN, and CNN models, respectively. Its precision is 19.09%, 14.48%, 12.55%, and 5.72% higher than that of the KNN, HDM-Net, FCN, and CNN models, respectively. Its recall is 18.72%, 14.44%, 11.94%, and 6.57% higher than that of the KNN, HDM-Net, FCN, and CNN models, respectively. Also, its F-measure is 18.9%, 14.46%, 12.25%, and 6.15% higher than that of the KNN, HDM-Net, FCN, and CNN models, respectively.

Figure 8. Comparison of proposed and existing AA classification and diagnosis models.
This reveals that the EPL-OLSTM can classify both healthy scalp hair and AA scalp hair images efficiently, in contrast with the other existing models.
Table 4 shows the computational complexity of existing and proposed AA classification and diagnosis models.
Computational complexity of existing and proposed AA classification and diagnosis models
*n: number of training data, k: nearest neighbor,k: convolution kernel size, d: input dimension.
The limitations of the proposed study include: (i) the deep-learning model needs a huge quantity of training samples, but this study considers limited samples, (ii) the availability of more well-annotated images representing different stages of AA can impact the model’s generalizability, (iii) the pre-learned CNN models cannot capture more discriminative features from scalp hair images, which may require additional preprocessing steps like segmentation.
Conclusion
In this study, the EPL-OLSTM model was designed to classify healthy and AA scalp hair images. First, AlexNet, ResNet, and InceptionNet-V1 were applied for deep feature extraction. Then, the OLSTM network with the fuzzy-softmax classifier was developed for classification. Finally, the test results on the Figaro1k and DermNet datasets showed that the EPL-OLSTM model achieves an accuracy of 93.1%, which is higher than that of the existing models. As a result, it can support physicians in diagnosing patients who suffer from AA earlier. Future work will acquire more images from various sources for model training, validate the approach using other pre-learned CNN models, and develop advanced image segmentation models for improved feature extraction.
