Abstract
Type 2 diabetes mellitus (T2DM) is a chronic disease caused by an insulin disorder. Decreased insulin secretion raises the blood glucose level, and the body cannot respond adequately to the high glucose level. People with T2DM either do not produce enough insulin or are resistant to insulin. The symptoms of T2DM include increased hunger, thirst, fatigue, frequent urination and blurred vision, although in some cases there are no symptoms. Commonly used treatments for T2DM are exercise, diet, insulin therapy and medication. In this paper, a Competitive Multi-Verse Rider Optimizer (CMVRO)-based hybrid deep learning scheme is devised for T2DM detection. The hybrid deep learning scheme involves two classifiers, namely the Rider-based Neural Network (RideNN) and the Deep Residual Network (DRN). T2DM detection performance is compared across several feature selection approaches, namely Tanimoto similarity, Chi-square (Chi-2), Fisher Score (FS), Linear Discriminant Analysis (LDA), Random Forest (RF) and Support Vector Machine recursive feature elimination (SVM-RFE). Among these, the Tanimoto similarity feature selection approach attained the best performance, with a testing accuracy, sensitivity and specificity of 0.932, 0.932 and 0.914, respectively.
Introduction
T2DM is a metabolic disorder characterized by an increased blood sugar level. Generally, the blood sugar level rises because of decreased insulin secretion or insulin dysfunction, such that the body is unable to respond to insulin properly. This kind of insulin disorder may produce no symptoms while still affecting the body directly, which makes the disease difficult to identify. The number of diabetic patients continues to increase day by day [1, 2]. According to a 2015 report of the International Diabetes Federation (IDF), 415 million people worldwide were affected by diabetes, T2DM accounted for 95 percent of diabetes cases, and more than 10 million people with diabetes lived in Indonesia. More than 60 percent of people with diabetes worldwide are unaware of their condition, so T2DM patients are often identified only after complications occur. Late recognition of T2DM therefore has serious consequences for the health of patients [3], and detecting T2DM at an earlier stage provides a better solution.
Feature selection is one of the efficient ways to reduce the dimensionality of sample data. In applications involving high-dimensional samples, such as face images, dimension reduction approaches have been used extensively [4, 5]. Researchers have used various feature selection techniques to select powerful features, such as feature similarity [4], mutual information [6], the loss margin of nearest neighbor classification [7], genetic algorithms [8] and ant colony optimization [9]. Some researchers also integrate a classifier with optimization algorithms to select the appropriate features [10, 11, 12]. Although SVM is an efficient machine learning technique, its classification accuracy needs further improvement when classifying data in a multidimensional space and datasets with interacting feature variables. To address these difficulties, feature selection schemes can be applied to reduce the complexity of the data structure and to identify the significant feature variables as a separate group of test instances. Feature selection schemes filter out redundant, irrelevant and noisy data, which reduces the computation time and improves the detection accuracy. Other effective feature selection strategies include backward feature selection (BFS), ranker and forward feature selection (FFS) [13].
The main aim of this research is to compare various feature selection techniques for T2DM detection using CMVRO-based hybrid deep learning. T2DM detection is performed by a hybrid deep learning classifier tuned by the CMVRO algorithm. The CMVRO algorithm is formed by combining the Competitive Multi-Verse Optimizer (CMVO) and the Rider Optimization Algorithm (ROA), and the hybrid deep learning classifier involves two classifiers, namely RideNN and DRN. T2DM detection is compared using various feature selection techniques, namely Tanimoto similarity, Chi-2, FS, LDA, RF and SVM-RFE.
The main contribution of this paper is as follows.
CMVRO-based hybrid deep learning for T2DM detection: T2DM detection is performed by a hybrid deep learning classifier whose weights and biases are tuned by the CMVRO algorithm, which integrates CMVO and ROA. Furthermore, an effective feature selection method is identified through a comparative analysis of T2DM detection with various feature selection approaches.
The rest of this paper is organized as follows. Section 2 presents the motivation for the topic and a literature survey of T2DM detection techniques, Section 3 describes T2DM detection based on various feature selection techniques, Section 4 presents the results and discussion of T2DM detection, and Section 5 concludes the paper.
T2DM is one of the most common metabolic disorders, and its progression is primarily caused by two factors: defective insulin secretion and the failure of insulin-sensitive tissues to respond properly to insulin. Late identification of diabetes increases the risks associated with treatment. This motivated the choice of T2DM detection as the research topic.
Literature survey
The T2DM detection techniques in the literature are reviewed below. Nilamyani et al. [3] modeled a recursive feature extraction technique for T2DM detection. In this method, features were ranked with Chi-square, information gain and random forest, and feature selection was then performed with LDA, RF and SVM to choose the appropriate features. However, the computational complexity was high. To reduce the computational complexity, Sun et al. [14] modeled an RF method that searches for the significant attributes for T2DM detection. SVM and logistic regression (LR) were used to compare against the performance of the RF method, and the selected attributes were then used for further disease prediction. However, the disease detection effectiveness was poor. To improve disease detection, Hou et al. [15] devised three classifiers, namely RF, LR and SVM, for T2DM detection. This method selects the optimal features using FS, RFE and a decision tree to achieve better performance. However, the processing time of this system was high. To reduce the processing time, Alshamlan [16] devised an LR approach to detect T2DM. Here, the data were pre-processed and features were selected using the Chi-square and FS approaches, and SVM and LR were employed to calculate the detection accuracy.
Challenges
Figure 1: Block diagram of T2DM detection with various feature selection techniques.
The challenges of various T2DM detection techniques are given below.
In [14], an RF technique was devised for selecting the appropriate features; this method could also process missing information. However, it failed to extract the more significant features. To extract more features, the scheme devised in [15] was employed to prevent diabetes through examination-based interventions, aiming to improve the early identification of diabetes and reduce the burden on the health system. However, it did not attain a good classification outcome. To attain better classification outcomes, feature selection with LR classification was used for T2DM detection, but its detection accuracy was not always satisfactory [16]. Thus, the challenge lies in enhancing the detection accuracy by implementing an optimized T2DM detection technique. Identifying the disease from complex datasets is very challenging, and the major challenge of T2DM detection is improper feature selection, which reduces the detection accuracy.
This section presents the comparative analysis of several feature selection techniques for T2DM detection. Here, pre-processing is performed with a data transformation technique, and T2DM detection is performed using the CMVRO-based hybrid deep learning scheme. The main intention is to compare various feature selection approaches, namely Tanimoto similarity, Chi-2, FS, LDA, RF and SVM recursive feature elimination, for T2DM detection. Figure 1 portrays the block diagram of T2DM detection with various feature selection techniques.
Let us assume the input data with
where,
The input of pre-processing is
where,
Once pre-processing is done, the various feature selection methods are applied to select the significant features, and the efficiency of T2DM detection is assessed for each of them. The input to feature selection is denoted as
Tanimoto similarity
The first feature selection technique used to analyze the efficiency of T2DM detection is Tanimoto similarity. Its advantage is that it reduces the evaluation cost by reducing the number of input variables. This method estimates the similarity between features and the already selected features, which is given by,
where,
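Because the Tanimoto expression itself did not survive extraction, the sketch below uses the standard continuous Tanimoto coefficient, T(a, b) = a·b / (|a|² + |b|² − a·b), and a hypothetical greedy selector built on it: features are ranked by their Tanimoto similarity to the 0/1 class-label vector, and a candidate is skipped when it is too similar to a feature already selected. The ranking rule, the function names and the `redundancy_threshold` value are illustrative assumptions, not the paper's exact procedure.

```python
def tanimoto(a, b):
    """Continuous Tanimoto coefficient between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    denom = sum(x * x for x in a) + sum(y * y for y in b) - dot
    # Two all-zero vectors are treated as identical.
    return dot / denom if denom else 1.0


def select_by_tanimoto(X, y, k, redundancy_threshold=0.9):
    """Hypothetical greedy selector: relevance = Tanimoto(feature, labels),
    redundancy = Tanimoto(feature, already-selected feature)."""
    n_features = len(X[0])
    columns = [[row[j] for row in X] for j in range(n_features)]
    relevance = [tanimoto(col, y) for col in columns]
    order = sorted(range(n_features), key=lambda j: relevance[j], reverse=True)
    selected = []
    for j in order:
        if len(selected) == k:
            break
        # Keep the candidate only if it is not redundant with any chosen feature.
        if all(tanimoto(columns[j], columns[s]) < redundancy_threshold for s in selected):
            selected.append(j)
    return selected


# Toy data: 4 samples x 3 features, binary labels.
X = [[1, 0, 1], [1, 0, 1], [0, 1, 0], [0, 1, 1]]
y = [1, 1, 0, 0]
print(select_by_tanimoto(X, y, k=2))  # [0, 2]
```

Feature 0 matches the labels exactly (similarity 1.0), feature 2 is partly relevant (2/3) and not redundant with feature 0, and feature 1 is unrelated to the labels, so the sketch keeps features 0 and 2.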
The second feature selection approach is chi-square [3], which is used to select the relevant features. The chi-square test is employed to compute the weight of each attribute, and it is given by,
where,
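The chi-square weight of an attribute can be illustrated with Pearson's statistic on the contingency table of feature value versus class, χ² = Σ (O − E)² / E, where O is an observed count and E is the count expected if the feature and the class were independent. The sketch assumes discretized (categorical) feature values; continuous expression values would first have to be binned, which is an assumption not stated in the source. The function names are hypothetical.

```python
from collections import Counter


def chi_square_score(feature, labels):
    """Pearson chi-square statistic of the (feature value x class) contingency table."""
    n = len(labels)
    value_counts = Counter(feature)
    class_counts = Counter(labels)
    observed = Counter(zip(feature, labels))
    score = 0.0
    for value, n_value in value_counts.items():
        for cls, n_cls in class_counts.items():
            expected = n_value * n_cls / n  # count expected under independence
            score += (observed[(value, cls)] - expected) ** 2 / expected
    return score


def rank_by_chi_square(X, y, k):
    """Indices of the k features with the highest chi-square score."""
    n_features = len(X[0])
    scores = [chi_square_score([row[j] for row in X], y) for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)[:k]


X = [[1, 1], [1, 0], [0, 1], [0, 0]]
y = [1, 1, 0, 0]
print(rank_by_chi_square(X, y, k=1))  # [0]: feature 0 matches the labels exactly
```

Feature 0 scores 4.0 (perfect dependence on the class), while feature 1 scores 0.0 because its values are spread evenly across both classes.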
FS [17, 15] is another feature selection scheme used in this paper. FS is widely used to reduce the dimensionality of data, and it effectively selects features based on the feature subset. The FS of
where,
Eq. (6) gives the between-class scatter of
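The Fisher score of a single feature is commonly written as F = Σₖ nₖ(μₖ − μ)² / Σₖ nₖσₖ², i.e. the between-class scatter divided by the within-class scatter, where nₖ, μₖ and σₖ² are the size, mean and variance of class k and μ is the overall mean. Since the paper's own equations were lost, the sketch follows this common definition and may differ in detail from the authors' formulation; the function names are illustrative.

```python
def fisher_score(feature, labels):
    """Between-class scatter divided by within-class scatter for one feature."""
    n = len(feature)
    overall_mean = sum(feature) / n
    between, within = 0.0, 0.0
    for cls in set(labels):
        values = [x for x, lab in zip(feature, labels) if lab == cls]
        n_cls = len(values)
        mean_cls = sum(values) / n_cls
        var_cls = sum((x - mean_cls) ** 2 for x in values) / n_cls
        between += n_cls * (mean_cls - overall_mean) ** 2
        within += n_cls * var_cls
    if within == 0:
        # No within-class spread: perfect separation, or a constant feature.
        return float("inf") if between > 0 else 0.0
    return between / within


def rank_by_fisher(X, y, k):
    """Indices of the k features with the highest Fisher score."""
    n_features = len(X[0])
    scores = [fisher_score([row[j] for row in X], y) for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)[:k]


X = [[1.0, 1.0], [2.0, 5.0], [5.0, 2.0], [6.0, 6.0]]
y = [0, 0, 1, 1]
print(rank_by_fisher(X, y, k=2))  # [0, 1]
```

Feature 0 has well separated class means and small within-class variance (score 16.0), whereas feature 1 has overlapping classes (score 0.0625).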
This section explains the process of feature selection using the LDA algorithm [12]. Let us consider
where,
Let
where,
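One common way to turn LDA into a feature ranker, assumed here because the paper's LDA equations were lost, is to compute the two-class Fisher discriminant direction w = S_w⁻¹(μ₁ − μ₀), where S_w is the within-class scatter matrix, and to rank features by |wⱼ|. The small ridge term added to S_w and the helper names are assumptions for numerical stability and illustration; the features are assumed to be on comparable scales (for example, standardized), otherwise |wⱼ| is not comparable across features.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def lda_feature_weights(X, y, ridge=1e-3):
    """Two-class Fisher LDA direction w = (S_w + ridge * I)^-1 (mu_1 - mu_0)."""
    classes = sorted(set(y))
    if len(classes) != 2:
        raise ValueError("this sketch handles exactly two classes")
    d = len(X[0])
    means = {}
    for c in classes:
        rows = [x for x, lab in zip(X, y) if lab == c]
        means[c] = [sum(r[j] for r in rows) / len(rows) for j in range(d)]
    # Within-class scatter matrix, regularized with a small ridge on the diagonal.
    Sw = [[ridge if i == j else 0.0 for j in range(d)] for i in range(d)]
    for x, lab in zip(X, y):
        diff = [x[j] - means[lab][j] for j in range(d)]
        for i in range(d):
            for j in range(d):
                Sw[i][j] += diff[i] * diff[j]
    mean_diff = [means[classes[1]][j] - means[classes[0]][j] for j in range(d)]
    return solve(Sw, mean_diff)


def rank_by_lda(X, y, k):
    """Indices of the k features with the largest |w_j|."""
    w = lda_feature_weights(X, y)
    return sorted(range(len(w)), key=lambda j: abs(w[j]), reverse=True)[:k]


X = [[1.0, 0.0], [2.0, 1.0], [5.0, 1.0], [6.0, 0.0]]
y = [0, 0, 1, 1]
print(rank_by_lda(X, y, k=1))  # [0]
```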
RF [14] is a supervised learning approach, which means that every instance or sample is labeled with an outcome. RF comprises
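RF-based feature selection usually ranks features by their mean decrease in Gini impurity across the trees of the forest. To keep the sketch short and dependency-free, it grows a forest of depth-one trees (decision stumps), each on a bootstrap sample and a random subset of features, and accumulates the Gini gain of each chosen split per feature. This is a simplified stand-in for a full random forest, not the configuration used in the paper; in practice, scikit-learn's `RandomForestClassifier` exposes the same kind of score through its `feature_importances_` attribute. The function names and defaults are illustrative assumptions.

```python
import random


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for lab in labels:
        counts[lab] = counts.get(lab, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_stump(X, y, features):
    """Best threshold split over the given features: (feature, Gini gain)."""
    n = len(y)
    parent = gini(y)
    best_feature, best_gain = None, 0.0
    for j in features:
        for threshold in sorted(set(row[j] for row in X)):
            left = [lab for row, lab in zip(X, y) if row[j] <= threshold]
            right = [lab for row, lab in zip(X, y) if row[j] > threshold]
            if not left or not right:
                continue  # the split must actually separate the samples
            gain = parent - (len(left) * gini(left) + len(right) * gini(right)) / n
            if gain > best_gain:
                best_feature, best_gain = j, gain
    return best_feature, best_gain


def stump_forest_importance(X, y, n_trees=50, max_features=None, seed=0):
    """Normalized Gini importance from a forest of bootstrapped decision stumps."""
    rng = random.Random(seed)
    d = len(X[0])
    m = max_features or max(1, int(d ** 0.5))  # about sqrt(d) features per tree
    importance = [0.0] * d
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # bootstrap sample
        Xb = [X[i] for i in idx]
        yb = [y[i] for i in idx]
        feature, gain = best_stump(Xb, yb, rng.sample(range(d), m))
        if feature is not None:
            importance[feature] += gain
    total = sum(importance)
    return [v / total for v in importance] if total else importance


# Feature 0 separates the classes, feature 1 is constant, feature 2 is noise.
X = [[0, 5, 1], [1, 5, 2], [2, 5, 1], [3, 5, 2],
     [10, 5, 1], [11, 5, 2], [12, 5, 1], [13, 5, 2]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
print(stump_forest_importance(X, y))  # feature 0 should dominate
```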
SVM recursive
SVM recursive [13] is an SVM-based feature selection approach that selects the key and significant feature sets. This method reduces the processing time and thereby improves the classification accuracy. It integrates feature selection with SVM-RFE to examine the classification accuracy on multiclass problems. In addition, the Taguchi method was combined with the SVM classifier to optimize its parameters and increase the classification accuracy for multiclass classification. Thus, the selected feature is represented as
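SVM-RFE repeatedly trains a linear SVM and removes the feature with the smallest squared weight wⱼ² until the desired number of features remains. The sketch below follows that loop but, to stay dependency-free, trains the linear SVM by plain full-batch subgradient descent on the L2-regularized hinge loss instead of a quadratic-programming solver; the learning rate, regularization strength and epoch count are illustrative assumptions, and the Taguchi parameter tuning mentioned above is not included. With scikit-learn available, `sklearn.feature_selection.RFE` wrapped around `sklearn.svm.LinearSVC` implements the same idea.

```python
def train_linear_svm(X, y, lam=0.01, epochs=200, lr=0.1):
    """Linear SVM via full-batch subgradient descent on the L2-regularized
    hinge loss; labels y must be -1 or +1. Returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]
        grad_b = 0.0
        for x, t in zip(X, y):
            margin = t * (sum(wj * xj for wj, xj in zip(w, x)) + b)
            if margin < 1:  # hinge loss is active for this sample
                for j in range(d):
                    grad_w[j] -= t * x[j] / n
                grad_b -= t / n
        w = [wj - lr * g for wj, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def svm_rfe(X, y, k):
    """Recursively drop the feature with the smallest squared SVM weight."""
    remaining = list(range(len(X[0])))
    while len(remaining) > k:
        X_sub = [[row[j] for j in remaining] for row in X]
        w, _ = train_linear_svm(X_sub, y)
        weakest = min(range(len(remaining)), key=lambda i: w[i] ** 2)
        remaining.pop(weakest)  # eliminate one feature per iteration
    return remaining


# Feature 0 separates the classes; features 1 and 2 carry no class information.
X = [[2.0, 0.1, 0.0], [1.5, -0.1, 0.0], [-2.0, 0.1, 0.0], [-1.5, -0.1, 0.0]]
y = [1, 1, -1, -1]
print(svm_rfe(X, y, k=1))  # [0]
```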
After the feature selection, the selected features, such as
T2DM detection
In this research, T2DM is identified by the hybrid deep learning classifier, which is formed by combining RideNN and DRN. The weights and biases of the hybrid deep learning classifier are trained by the CMVRO algorithm. The input to the hybrid deep learning classifier is one of the selected features from
Structure of RideNN
The RideNN [18] classifier is used to determine T2DM. It imitates the function of the human brain to identify relationships within large data. RideNN involves four groups, namely the bypass rider, follower, overtaker and attacker. The algorithm is inspired by the behavior of riders who move towards a target location, and the leading rider is determined by the success rate. The classifier output is computed with the transfer function specified as,
where,
DRN [19] is used to make the final decision regarding T2DM detection. DRN is exploited to analyze visual imagery with improved accuracy; it aims to solve complex problems in computer vision, which come with their own set of difficulties. The linear classifier is employed to detect noisy pixels in the input data. DRN involves various layers, such as the convolutional layer, pooling layer, activation function layer, residual block layer, batch normalization layer, softmax layer and linear classifier layer. Here, the linear classifier generates the final classified outcome. Thus, the final output is obtained by combining the softmax function with the fully connected (FC) layer.
Here,
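The two ideas this paragraph relies on, an identity shortcut inside a residual block and a fully connected layer followed by softmax, can be sketched with dense layers on a plain feature vector. The actual DRN uses convolutional, pooling and batch normalization layers whose configuration is not given here, so the layer sizes, weights and function names below are illustrative assumptions.

```python
import math


def relu(v):
    return [max(0.0, x) for x in v]


def dense(W, b, v):
    """Fully connected layer: W v + b (W given as a list of rows)."""
    return [sum(w * x for w, x in zip(row, v)) + bias for row, bias in zip(W, b)]


def residual_block(v, W1, b1, W2, b2):
    """ReLU(F(v) + v) with F = dense -> ReLU -> dense and an identity shortcut."""
    f = dense(W2, b2, relu(dense(W1, b1, v)))
    return relu([fi + vi for fi, vi in zip(f, v)])


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(zi - m) for zi in z]
    total = sum(exps)
    return [e / total for e in exps]


def classify(v, block_params, W_fc, b_fc):
    """Residual block, then FC layer, then softmax class probabilities."""
    h = residual_block(v, *block_params)
    return softmax(dense(W_fc, b_fc, h))


# With zero block weights the residual block passes its (non-negative) input through.
zero2 = [[0.0, 0.0], [0.0, 0.0]]
params = (zero2, [0.0, 0.0], zero2, [0.0, 0.0])
probs = classify([1.0, 2.0], params, [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
print(probs)  # two probabilities summing to 1; class 1 is more likely
```

The zero-weight case shows why the shortcut helps training: a block whose residual branch contributes nothing still passes its input through unchanged instead of destroying it.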
DRN and RideNN are trained with CMVRO, which is formed by combining CMVO [20] and ROA [18]. CMVO [20] is inspired by competition among candidates to become the winner, and it is effective in solving global optimization problems. In CMVO, the population is randomly grouped into pairs through bi-competitions, producing two sets: winners and losers. Likewise, ROA [18] is inspired by groups of riders racing towards a common target location to become the winner. ROA operates with four rider groups: the bypass rider, follower, overtaker and attacker.
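The bi-competition step of CMVO described above can be sketched as follows: the population is shuffled and split into random pairs, and within each pair the candidate with the better (here, lower) cost joins the winners while the other joins the losers. Minimization, leaving an odd leftover candidate unpaired, the toy cost function and the function names are assumptions; the CMVRO update equations that subsequently move winners and losers were not recoverable and are not reproduced.

```python
import random


def bi_competition(population, fitness, rng):
    """Randomly pair candidates; in each pair the lower-cost one is the winner.

    Returns (winners, losers) as lists of population indices. With an odd
    population size, the last shuffled candidate is left unpaired.
    """
    order = list(range(len(population)))
    rng.shuffle(order)
    winners, losers = [], []
    for a, b in zip(order[0::2], order[1::2]):
        if fitness(population[a]) <= fitness(population[b]):
            winners.append(a)
            losers.append(b)
        else:
            winners.append(b)
            losers.append(a)
    return winners, losers


def sphere(x):
    """Toy cost function to minimize (hypothetical stand-in for training loss)."""
    return sum(v * v for v in x)


population = [[0.0], [1.0], [2.0], [3.0]]
winners, losers = bi_competition(population, sphere, random.Random(42))
print(winners, losers)
```

Whatever the random pairing, the best candidate (index 0) always wins its pair and the worst (index 3) always loses, so the split only decides the fate of the middle candidates.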
The final updated equation of CMVRO is specified as,
where,
Finally, the deep learning classifiers are fused by merging the outputs of RideNN and DRN using correlation, which is articulated as,
where,
This section discusses the experimental results of various feature selection approaches for T2DM detection using the GEO dataset. The experimental setup, dataset description and evaluation metrics are also explained in this section.
Setup of experiment
The comparative analysis of the various feature selection approaches is implemented in Python using the PyCharm IDE on Windows 10 with an Intel Core i3 processor and 8 GB RAM.
Dataset description
The feature selection approaches are evaluated on the GEO (Gene Expression Omnibus) dataset [21], which contains gene expression data (GED) of diabetic and non-diabetic patients. The dataset was produced by microarray analysis to measure the differences between the transcriptomes of T2DM patients and non-diabetic samples.
Evaluation metrics
Figure 2: Assessment by varying the training data percentage based on (a) testing accuracy, (b) sensitivity and (c) specificity.
The metrics utilized for evaluating the efficacy of feature selection approaches are testing accuracy, sensitivity and specificity.
Accuracy: the closeness of the predicted values to the actual values, expressed as,
where,
Figure 3: Assessment by varying the k-fold value based on (a) testing accuracy, (b) sensitivity and (c) specificity.
Sensitivity: the fraction of actual positives that are correctly recognized by the T2DM detection technique, expressed as,
Figure 4: Assessment by varying the number of selected features based on (a) testing accuracy, (b) sensitivity and (c) specificity.
Specificity: the fraction of actual negatives that are correctly detected by the devised model, expressed as,
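The three metrics are conventionally computed from the confusion-matrix counts, true positives (TP), true negatives (TN), false positives (FP) and false negatives (FN): accuracy = (TP + TN) / (TP + TN + FP + FN), sensitivity = TP / (TP + FN) and specificity = TN / (TN + FP). The sketch assumes binary labels with 1 marking a T2DM (positive) sample; the function names are illustrative.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, TN, FP, FN) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    return tp, tn, fp, fn


def evaluate(y_true, y_pred):
    """Testing accuracy, sensitivity and specificity from confusion counts."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


print(evaluate([1, 1, 1, 0, 0], [1, 1, 0, 0, 1]))
# accuracy 0.6, sensitivity 2/3, specificity 0.5
```

Sensitivity is undefined when a test fold contains no positive samples, and specificity when it contains no negatives; the sketch raises ZeroDivisionError in those cases rather than silently returning a value.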
The effectiveness of the feature selection approaches is analyzed by comparing Chi-2 [3], FS [17], LDA [12], RF [14], SVM recursive [13] and Tanimoto similarity.
Comparative analysis
The feature selection techniques are compared by varying the training data percentage, the k-fold value and the number of selected features.
Based on changing training data percentage
Figure 2a–c shows the comparative assessment of the feature selection techniques based on testing accuracy, sensitivity and specificity. Figure 2a displays the testing accuracy of the feature selection approaches with respect to the training data percentage. With 90% training data, the testing accuracy of Chi-2, FS, LDA, RF, SVM recursive and Tanimoto is 0.765, 0.798, 0.832, 0.854, 0.905 and 0.925, respectively. Figure 2b shows the sensitivity of the feature selection techniques: with 90% training data, the sensitivity of Chi-2 is 0.784, FS is 0.814, LDA is 0.851, RF is 0.875, SVM recursive is 0.914 and Tanimoto is 0.927. Figure 2c demonstrates the specificity of the feature selection techniques: with 90% training data, Chi-2, FS, LDA, RF, SVM recursive and Tanimoto achieved specificities of 0.741, 0.765, 0.801, 0.823, 0.874 and 0.895, respectively.
Based on changing the k-fold value
Figure 3a–c shows the assessment of the comparative techniques on the evaluation metrics with varying k-fold value. Figure 3a displays the testing accuracy: when the k-fold value is 8, the testing accuracy of Chi-2, FS, LDA, RF, SVM recursive and Tanimoto is 0.772, 0.814, 0.854, 0.874, 0.914 and 0.932, respectively. Figure 3b illustrates the sensitivity: when the k-fold value is 8, the sensitivity of Chi-2 is 0.798, FS is 0.825, LDA is 0.841, RF is 0.862, SVM recursive is 0.884 and Tanimoto is 0.932. Figure 3c shows the specificity: when the k-fold value is 8, Chi-2, FS, LDA, RF, SVM recursive and Tanimoto attain specificities of 0.762, 0.795, 0.832, 0.854, 0.874 and 0.914, respectively.
Based on changing the selected features
Figure 4a–c presents the assessment of the comparative techniques, namely Chi-2, FS, LDA, RF, SVM recursive and Tanimoto. Figure 4a shows the testing accuracy of the feature selection approaches in terms of feature size: with a feature size of 8000, the testing accuracy of Chi-2, FS, LDA, RF, SVM recursive and Tanimoto is 0.765, 0.798, 0.832, 0.854, 0.905 and 0.925, respectively. Figure 4b illustrates the sensitivity: with a feature size of 8000, the sensitivity of Chi-2 is 0.784, FS is 0.814, LDA is 0.851, RF is 0.875, SVM recursive is 0.914 and Tanimoto is 0.927. Figure 4c demonstrates the specificity: with a feature size of 8000, Chi-2, FS, LDA, RF, SVM recursive and Tanimoto achieved specificities of 0.741, 0.765, 0.801, 0.823, 0.874 and 0.895, respectively.
Comparative results
Table 1 presents the comparative results of Chi-2, FS, LDA, RF, SVM recursive and Tanimoto in terms of the evaluation metrics. The investigation considers three kinds of variation: training data percentage, k-fold value and number of selected features. The results show that the best outcomes are attained when the k-fold value is varied. In this setting, Chi-2, FS, LDA, RF, SVM recursive and Tanimoto achieve testing accuracies of 0.772, 0.814, 0.854, 0.874, 0.914 and 0.932, sensitivities of 0.798, 0.825, 0.841, 0.862, 0.884 and 0.932, and specificities of 0.762, 0.795, 0.832, 0.854, 0.874 and 0.914, respectively.
Conclusion
This paper shows the impact of various feature selection approaches on T2DM detection using the GEO dataset. The analysis compares the feature selection methods Chi-2, FS, LDA, RF, SVM recursive and Tanimoto in terms of testing accuracy, sensitivity and specificity, and each technique is tested separately with varying training data, varying k-fold value and varying number of selected features. T2DM detection involves three processing steps: pre-processing, feature selection and T2DM detection. T2DM detection is performed with the CMVRO-based hybrid deep learning approach, where the CMVRO algorithm integrates CMVO and ROA. The comparative analysis is performed by passing the output of each feature selection technique separately to the hybrid deep learning classifier for T2DM detection. The experimental results show that the Tanimoto similarity feature selection approach achieves the maximum testing accuracy, sensitivity and specificity of 0.932, 0.932 and 0.914, respectively, when the k-fold value is varied. The efficiency of the proposed work could be improved by including more features. In future work, other optimization algorithms and more advanced classifiers could be adopted for disease detection.
