Abstract
Diabetes is among the most common medical conditions people face today. It may cause physical incapacity or, in some cases, death. It has two core types, type I and type II. Both are chronic and affect how the human body regulates blood sugar. In the human body, glucose is the main fuel for cells, and insulin is the key that allows glucose to enter the cells, thereby controlling blood sugar. People with type I diabetes cannot produce insulin, whereas people with type II diabetes do not respond properly to insulin and frequently do not make enough of it. For adequate analysis of such a potentially fatal disease, techniques with a minimum error rate must be utilized. Different artificial neural network (ANN) models have been investigated in the literature to diagnose or predict the condition with a minimum error rate; however, there is still room for improvement. To further improve accuracy, a deep extreme learning machine (DELM) based prediction model is proposed and investigated in this research. With the DELM approach, a high level of reliability with a minimum error rate is achieved, and the results improve significantly on previous investigations. The proposed approach attains its highest accuracy of 92.8% with 70% of the data used for training (9500 samples) and 30% for testing and validation (4500 samples). Simulation results validate the prediction effectiveness of the proposed scheme.
Introduction
These days, diabetes is a common physiological health problem. A person is described as diabetic when the body cannot break down glucose because of an insulin shortage [45]. The pancreas is the organ responsible for producing insulin, the hormone that regulates blood sugar (glucose) levels in the body. The causes of diabetes include obesity, inappropriate diet, heredity, and lack of physical exercise [14]. Delays in the diagnosis and treatment of diabetes may lead to serious complications such as blindness, heart attacks, and kidney failure [11,27].
As living standards and lifestyles continue to evolve, diabetes is becoming increasingly common. Early diagnosis and precise analysis of diabetes are therefore a worthy subject of study in healthcare. Diabetes is mainly diagnosed by frequently monitoring the blood glucose level and glucose tolerance of patients [21].
The prevalence of diabetes has doubled worldwide over the last 10 years, with around 200 million people affected and an increase of about 6% in world diabetes prevalence per year, according to a report of the Diabetes Research Centre. Research shows that 7.7% of adults aged 25–64 suffer from glucose impairment, and almost 2 million people suffer from diabetes, of which 16.8% are adults. Most of those with glucose impairment will develop diabetes in the future [12]. As diabetes is a chronic disease that causes irreparable damage to the limbs and major organs of the body, intelligent instruments are used to improve detection methods, control the disease, and support doctors in decision making [5]. Research shows that 80% of the chronic complications of type II diabetes can be avoided or delayed with early diagnosis of patients at risk [32].
One of the most significant issues in healthcare is the timely and accurate diagnosis of diseases. It is generally a complex task requiring a high level of skill and experience. Timely diagnosis and treatment may reduce the patient's difficulties and therapeutic costs [56]. Fortunately, although diabetes is not curable, it can be treated [44]. In the area of medical prognosis, identifying and treating diseases accurately has become a top priority.
Diagnosing diabetes type II in particular is a complex task that requires a high level of skill and experience. The artificial neural network (ANN) is a promising approach to diagnosing this disease with high accuracy. In recent years, this method has been widely used in prediction, diagnosis, evaluation, and detection. An ANN trained with appropriate data helps disease prediction [44].
With a deep learning neural network (DLNN), it is possible to carry out diagnostic tasks on diverse diseases and thus tackle the problem of disease recognition with fewer human errors [18].
It is important to achieve a high level of accuracy when using ANNs to recognize diseases. The quality of the analysis depends on the training and testing of the dataset [13,18]. In this research, we focus on diabetes type II prediction using a DLNN approach. Before the diagnostic task starts, the patient datasets must be prepared for the proposed network models. After training and testing, the DLNN provides a means of achieving enhanced prediction accuracy with low errors.
Smith et al. used the early neural network model known as the adaptive learning routine (ADAP) to develop a prediction model for diabetes mellitus. The findings were then compared with results obtained from a linear perceptron model and logistic regression, respectively. Sensitivity and specificity were used as standard clinical benchmarks [51].
Machine learning methods are widely used for diabetes prediction and exhibit good results. The decision tree is a popular classification method in machine learning, and many decision trees together form a random forest (RF). In [57], the authors applied RF in contrast to principal component analysis (PCA). It was concluded that RF obtains almost 80% accuracy and outperforms PCA.
Many researchers are experimenting with different machine learning classification algorithms for diagnosing diseases, such as J48, support vector machine (SVM), Naive Bayes, decision tree, and decision table, as studies have shown that machine learning algorithms perform well in diagnosing various diseases [1,23,24]. The ability to combine data from several sources and to integrate background information into a study strengthens data mining [4,28] and machine learning algorithms [15].
Early diagnosis helps to control the disease. Machine learning can help people make a preliminary assessment of diabetes mellitus based on their routine physical examination data [3,30]. The most critical issue with machine learning methods is the selection of the correct features and classifiers.
In this paper, a deep extreme learning machine (DELM) for the prediction of diabetes type II is proposed to achieve the highest accuracy. For training and testing the deep learning based diagnosis, a dataset [17] with 15 thousand data instances is used, in which each instance includes distinct and diverse characteristics. The proposed approach is then compared with state-of-the-art techniques in the same field, and conclusions are drawn.
The rest of the paper is organized as follows. Section 2 briefly describes the related work. Section 3 presents the method to carry out a comprehensive evaluation for the prediction of diabetes type II. Section 4 discusses the simulation and results of the DELM approach. Section 5 outlines the conclusions.
Literature review
Research on the diagnosis of diabetes through ANN and data mining techniques has been extensive so far. We survey some of the prime studies in this section; in the results section, they are compared with the proposed work in terms of the training and testing accuracy of the diagnostic technique for type II diabetes. Sa'di et al. [45] utilized data mining methods for type II diabetes diagnosis, namely Naive Bayes, J48, and Radial Basis Function (RBF) artificial neural networks. They used a data collection of 768 instances, of which 230 were selected for the test phase. Naive Bayes, with 76.95% precision, performed better than J48 and RBF, which reached 76.52% and 74.34%, respectively.
Olaniyi et al. [36] used multilayer backpropagation artificial neural networks to identify type II diabetes, where backpropagation is a supervised, error-correcting learning algorithm. Hybrid algorithms are currently among the most active areas of computational intelligence. There are 4 branches of computational intelligence: fuzzy [7,22], swarm [6], evolutionary [26,54], and neural networks. Hybrid structures of these approaches play an essential role in fields such as wireless communication, cloud computing, image processing, and healthcare.
Al-Rofiyee et al. [2] utilized a multilayer perceptron (MLP) ANN for diabetes type II identification. Diabetes diagnosis is one of the problems where a single-layer perceptron may not suffice; a multilayer artificial neural network is needed to learn non-linear behavior from the data, and each layer may have a different number of neurons. The MLP consists of an input layer, hidden layers, and an output layer. About 20% of the instances are utilized for training purposes, the test collection set uses 60% and the training set uses 20%. Time and the number of neurons are 2 significant parameters in the hidden layer of the MLP model. In the training phase, significant diagnostic accuracy was obtained by varying the number of neurons in an MLP model with a minimum number of hidden layers, but at the cost of high time complexity.
Kumar et al. [29] used a dataset containing 250 diabetes instances, each consisting of 27 data characteristics. The patients' ages range from 25 to 78 years. For diagnosis, a multilayer feedforward ANN with backpropagation is used. Three training functions were used with back-propagation: Levenberg-Marquardt, Bayesian Regularization, and BFGS Quasi-Newton. Bayesian Regularization achieved the most accurate diagnosis, at 88.8%, ahead of Levenberg-Marquardt and BFGS Quasi-Newton. In addition, the Pima Indians diabetes dataset is used to identify diabetes [36].
Kayear et al. [25] used general regression neural networks (GRNN) and the Pima Indians diabetes dataset. The GRNN model is a 4-layer model with a single input layer for the 8 Pima Indians diabetes features and 2 layers of 16 and 32 neurons, respectively. From this dataset, 576 samples were used for training and 192 samples as the test set. The training and testing accuracies are 82.99% and 80.21%, respectively. A higher accuracy level was achieved during the training phase when compared with the other analyses in that study.
Sajida et al. [38] discuss the role of the machine learning ensemble methods Adaboost and Bagging [35], with J48 as the base learner, in classifying patients as diabetic on the basis of diabetes mellitus risk factors. Their experimental results show that the Adaboost ensemble technique is superior to both Bagging and a standalone J48 decision tree.
Orabi et al. [37] developed a diabetes prediction system whose main objective is to predict diabetes at a particular age. The proposed system is based on machine learning and designed around a decision tree. The results were satisfactory, as the designed system successfully forecast diabetes incidence at a given age, with the Decision Tree [41,49] yielding higher accuracy.
Pradhan et al. [39] used Genetic Programming (GP) to train and test a diabetes prediction model on the UCI Diabetes Data Set. The genetic programming results [40,48] show the best accuracy among the techniques implemented. Because it takes less time to generate a classifier, GP can improve accuracy significantly and is helpful for low-cost prediction of diabetes.
Rashid et al. [53] developed a prediction model for chronic diabetes that consists of two submodules. The first module uses an artificial neural network (ANN), and the second uses fasting blood sugar (FBS). A Decision Tree [19] is used to identify the patient's symptoms of diabetes.
Nongyao et al. [34] applied algorithms that classify the risk of diabetes mellitus. To this end, the authors used 4 well-known machine learning classifiers: Decision Tree, ANN, Naive Bayes, and Logistic Regression. Bagging and boosting were used to improve the robustness of the designed models. The experiments show that the Random Forest algorithm gives the best results among all algorithms used.

Fig. 1. Detailed diagram for the prediction of diabetes type II using the DELM approach.
In 2012, Nakhjavan et al. [33] used neural networks to predict albuminuria in type II diabetes after a comparison with conditional regression. In 2014, logistic regression analysis, controlling for population effects, was used to calculate the odds of an unusual neuroleptic version and of a diabetes diagnosis in the individual age groups [47].
System model
Predicting diabetes is extremely important because it helps patients manage their diet and control blood sugar effectively [42,46]. However, correct diabetes prediction is a challenging task. In this research, a system for accurate diabetes prediction based on DELM is proposed. The proposed method is divided into 3 main layers: the data acquisition layer, the pre-processing layer, and the application layer. The data acquisition layer deals with collecting appropriate data for investigation. In the pre-processing layer, standard data processing approaches are used to eliminate anomalies in the data. The application layer has 2 sub-layers, namely the prediction layer and the performance assessment layer. The proposed DELM is investigated in the application layer to improve the prediction accuracy for diabetes type II.
Figure 1 depicts the components and details of the proposed prediction approach. It shows that the data acquisition layer supplies the input parameters to the neural system, where a training algorithm is used to predict type II diabetes. Applications of artificial neural networks (ANNs) in various fields need no introduction. ANNs comprise a set of neurons, the fundamental units of information processing, arranged in layers: mainly input, hidden, and output layers [13,44,50].
Deep extreme learning machine (DELM)
The deep extreme learning machine (DELM) is a well-known technique used in various areas for predicting healthcare problems. It combines Deep Learning (DL) and Extreme Learning Machines (ELM). Traditional ANN algorithms require more samples, learn slowly, and can overfit the learning model [10]. The idea of DELM was first introduced in [20]. DELM is widely used for classification and regression because it learns faster and is efficient in terms of computational complexity.

Fig. 2. Structural diagram of a deep extreme learning machine.
DELM has been used in the proposed work to encapsulate the benefits of ELM as well as deep learning. In addition, it has been shown that DELM significantly improves the expected results. The proposed DELM model includes 3 kinds of layers: the input layer, multiple hidden layers, and an output layer. The structural model of a DELM is shown in Fig. 2, where
In the modelling of machine learning algorithms to increase predictability and to improve the training process, complete sample data have been standardized to fit in the interval
A training sample is taken at first, as,
Initially, the ELM has randomly selected the biases of the hidden layer nodes, as given in Eq. (7). ELM also chose the network activation function
Then we can obtain Eq. (10) by exploiting Eqs. (8) and (9). The outcome of the hidden layer is M and transposition of N as
Here, μ is the regularization term used to increase the network's overall stability [55].
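The single-hidden-layer ELM step described above can be illustrated with a minimal sketch. It follows the commonly used ELM formulation (random, untrained hidden weights and biases, a sigmoid activation, and a regularized least-squares solution for the output weights) and is not necessarily identical to Eqs. (7) to (11). The function names `train_elm` and `predict_elm`, the uniform initialization range, the value of `mu`, and the toy data scaled to [0, 1] are all assumptions made for illustration.

```python
import numpy as np


def sigmoid(z):
    """Logistic activation applied element-wise."""
    return 1.0 / (1.0 + np.exp(-z))


def train_elm(X, t, n_hidden=10, mu=1e-2, seed=0):
    """Fit a single-hidden-layer ELM (illustrative sketch, not the paper's exact equations).

    X: (n_samples, n_features) inputs, assumed already scaled.
    t: (n_samples,) targets.
    mu: regularization term added to the normal equations for stability.
    """
    rng = np.random.default_rng(seed)
    # Hidden weights and biases are drawn at random and never trained.
    W = rng.uniform(-1.0, 1.0, size=(X.shape[1], n_hidden))
    b = rng.uniform(-1.0, 1.0, size=n_hidden)
    H = sigmoid(X @ W + b)  # hidden-layer output matrix
    # Regularized least squares: (mu * I + H^T H) beta = H^T t
    A = mu * np.eye(n_hidden) + H.T @ H
    beta = np.linalg.solve(A, H.T @ t)
    return W, b, beta


def predict_elm(X, W, b, beta):
    """Network output for inputs X."""
    return sigmoid(X @ W + b) @ beta


# Hypothetical toy data: 4 features in [0, 1], binary target.
rng = np.random.default_rng(1)
X = rng.uniform(0.0, 1.0, size=(200, 4))
t = (X.sum(axis=1) > 2.0).astype(float)

W, b, beta = train_elm(X, t)
pred = (predict_elm(X, W, b, beta) >= 0.5).astype(float)
train_accuracy = float((pred == t).mean())
print(f"toy training accuracy: {train_accuracy:.3f}")
```

Because only the output weights are solved for, and in closed form, training needs no iterative gradient descent, which is the usual explanation for the fast learning attributed to ELM-type models.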
A deep learning system consists of a minimum of 4 layers with inputs and outputs that fulfill its needs. A deep neural network trains the neurons of every layer on a distinct set of parameters using the result of the previous layer [31]. This allows extensive datasets to be processed by deep learning networks. Deep learning has gained huge attention because of its effectiveness in solving real-world issues. The proposed DELM model comprises a single input layer with 4 neurons, 6 hidden layers with 10 neurons each, and 1 output layer with 1 neuron (Fig. 2). A trial-and-error method was used to select the number of hidden-layer nodes because there is no established mechanism for specifying them. The second hidden layer output is obtained as:
Where
In Eq. (13), the parameters
Updating the weight matrix μ between the second and third layers yields Eq. (16). Here
The
In the end, the resulting weight matrix is calculated as in Eq. (22) concerning the third hidden level and the final layer result. Equation (23) represents the outcome of the third hidden layer.
The desired output represented in Eq. (11) by
Finally, in Eq. (27), the output weight matrix from the 4th layer to the output layer is calculated. Equation (28) indicates the assessed outcome of the 5th layer. Equation (29) shows the required output of the DELM system.
We have examined the procedure for calculating the 4 hidden layers of the DELM system. A cyclic procedure was used to illustrate the DELM computation: Equations (18) and (22) can be recalculated to obtain the parameters of every hidden layer and, ultimately, the final result of the DELM network. If more hidden layers are added, the same process can be reused in the same way.
The well-known back-propagation algorithm (BPA) [9] comprises weight initialization, feedforward computation, backward error propagation, and weight and bias updates. An activation function such as the sigmoid is applied at each neuron in the hidden layer. The sigmoid function and the hidden layer input of the DELM can be written as follows:
Equation (31) gives the error to be backpropagated, computed as half the sum of the squared differences between the desired output and the calculated output.
To reduce the overall error, the weights must be adjusted. Equation (32) shows the rate of change in weight for the output layer.
Applying the chain rule to Eq. (33) yields:
The weight change is obtained by substituting the values into Eq. (34), as shown in Eq. (35).
From
The procedure for determining the appropriate change to the hidden-layer weights is shown below. It is more complex because each weighted connection can contribute to the error at all nodes.
From
Where
Where
The weights and biases between the output and hidden layers are updated as shown in Eq. (36).
Equation (37) shows how to update the weights and biases between the input and hidden layers.
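The back-propagation steps described in this subsection (sigmoid feedforward, half sum-of-squares error, chain-rule gradients, and weight and bias updates) can be sketched as follows. This is a generic single-hidden-layer illustration, not the authors' exact Eqs. (30) to (37); the variable names, the learning rate `lr`, the layer sizes, and the toy data are assumptions.

```python
import numpy as np


def sigmoid(z):
    """Logistic activation applied element-wise."""
    return 1.0 / (1.0 + np.exp(-z))


def half_sse(output, target):
    """Error: half the sum of squared differences between desired and calculated output."""
    return 0.5 * float(np.sum((target - output) ** 2))


# Hypothetical toy data: 4 inputs, 1 binary target.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(64, 4))
t = (X[:, 0] + X[:, 1] > 1.0).astype(float).reshape(-1, 1)

# Weight initialization (assumed sizes: 4 inputs, 10 hidden neurons, 1 output).
W1 = rng.normal(0.0, 0.5, size=(4, 10))
b1 = np.zeros(10)
W2 = rng.normal(0.0, 0.5, size=(10, 1))
b2 = np.zeros(1)
lr = 0.5  # assumed learning rate

errors = []
for _ in range(200):
    # Feedforward pass.
    h = sigmoid(X @ W1 + b1)
    o = sigmoid(h @ W2 + b2)
    errors.append(half_sse(o, t))

    # Backward pass via the chain rule: dE/do * do/dnet, with sigmoid' = s * (1 - s).
    delta_o = (o - t) * o * (1.0 - o)
    delta_h = (delta_o @ W2.T) * h * (1.0 - h)

    # Update between hidden and output layers, then between input and hidden layers.
    # Gradients are averaged over the batch, which only rescales the learning rate.
    W2 -= lr * (h.T @ delta_o) / len(X)
    b2 -= lr * delta_o.mean(axis=0)
    W1 -= lr * (X.T @ delta_h) / len(X)
    b1 -= lr * delta_h.mean(axis=0)

print(f"error before training: {errors[0]:.3f}, after: {errors[-1]:.3f}")
```

The hidden-layer term `delta_h` sums the contributions of every output node through `W2`, which is why the hidden-weight update is the more involved of the two.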
Results and discussion
According to Table 1, each data sample has 10 features. The first 9 features are inputs, and the last feature is the only output. The 10th feature is used to classify the 15 thousand data samples into two classes: class 0 (healthy) and class 1 (patient). For validation, the proposed DELM is applied to the dataset provided in [17]. In this regard, DELM was used to train and fit 15 thousand instances. The data is randomly divided for training and testing: 70% of the data is used for training (9500 samples) and 30% is used for validation and testing (4500 samples). Data cleaning was performed during the pre-processing phase, in which anomalies were removed from the dataset.
Table 1. Features of the dataset for diagnosing diabetes type II
We attempted to discover the best DELM configuration for diabetes prediction by varying the number of hidden layers, the number of hidden neurons, and the combination of activation functions [8,16]. To this end, we tried the same number of neurons with different types of activation functions in the hidden layers. To measure the performance of the proposed method, together with the counterpart algorithms, we used several statistical measures given in Eqs (38) to (47).
For the diabetes dataset [17], the DELM approach was used and the results obtained are listed in Table 2 and Table 3. As can be seen from Table 2, 70% of the data (9500 samples) from the dataset [17] is used for training. There are 2 expected outputs: negative (0), indicating that the patient has no diabetes, and positive (1), indicating that the patient has diabetes. Table 2 shows that, for the 9500 training samples, the expected output comprises 5034 negative samples and 4476 positive samples. After training on these samples, the proposed approach yields 4661 negative outputs and 4150 positive outputs. Comparing the expected output with the obtained result, Table 2 shows that the proposed approach is 92.7% accurate during training, with a miss rate of 7.2%.
Table 2. Training accuracy of the proposed DELM system with varying hidden layers for the prediction of diabetes type II
Table 3. Testing accuracy of the proposed DELM system with varying hidden layers for the prediction of diabetes type II
Table 4. Different statistical measures for the prediction of diabetes type II

Fig. 3. Performance comparison of the proposed DELM system with varying hidden layers for the prediction of diabetes type II in testing and training.
As can be seen from Table 3, 30% of the data (4500 samples) from the dataset [17] is used for testing and validation. Table 3 shows that, for these 4500 samples, the expected output comprises 2200 negative samples and 2300 positive samples. Applying the trained model to the 4500 samples yields 1951 negative outputs and 2108 positive outputs. Comparing the expected output with the obtained result, Table 3 shows that the proposed approach is 90.2% accurate during testing and validation, with a miss rate of 9.8%.
Sensitivity is the percentage of actual positives that are correctly recognized by the assessment, and specificity is the percentage of actual negatives that are correctly recognized. As is apparent from Table 4, for the sample studied, a test with high sensitivity recognizes most patients with the disease by testing positive, and a test with higher specificity has a lower false-positive rate. Sensitivity and specificity are ways of evaluating a test's diagnostic capacity. However, in clinical practice only the test outcome is known, so we also want to understand how well the test predicts the condition.
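As a worked check of the figures reported for Table 3, the sketch below recomputes accuracy, miss rate, sensitivity, and specificity from the testing counts quoted in the text. It assumes that the 1951 negative and 2108 positive outputs are the correctly classified samples, and that the "miss rate" is one minus accuracy, as the reported 9.8% suggests. The variable names are illustrative, and the sensitivity and specificity values derived here are not taken from Table 4.

```python
# Testing and validation counts quoted in the text (Table 3).
expected_negative = 2200
expected_positive = 2300
true_negative = 1951   # assumed: correctly predicted negatives
true_positive = 2108   # assumed: correctly predicted positives

# Misclassified samples follow from the expected totals.
false_positive = expected_negative - true_negative
false_negative = expected_positive - true_positive
total = expected_negative + expected_positive

accuracy = (true_positive + true_negative) / total
miss_rate = 1.0 - accuracy  # assumed definition: share of misclassified samples
sensitivity = true_positive / (true_positive + false_negative)
specificity = true_negative / (true_negative + false_positive)

print(f"accuracy:    {accuracy:.3f}")
print(f"miss rate:   {miss_rate:.3f}")
print(f"sensitivity: {sensitivity:.3f}")
print(f"specificity: {specificity:.3f}")
```

Under these assumptions the accuracy and miss rate come out at 90.2% and 9.8%, matching the values stated for the testing and validation phase.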
The overall performance of the proposed method (DELM) in training was 92.8% accurate, as shown in Fig. 3, with a training miss rate of 7.2%. In testing and validation, the overall performance of the proposed method was 90.2% accurate, as shown in Fig. 3, with a testing miss rate of 9.8%. Fig. 3 further shows that the results in the training phase are more accurate, with a lower miss rate, than those in the testing and validation phase.
Table 5. MSE comparison of the proposed DELM system with PNN [52]

Fig. 4. MSE comparison of the proposed scheme with [52].
According to Table 5, PNN [52] utilized the Pima Indians diabetes dataset [43], consisting of 768 data samples, and its mean square error decreases in each round. In the PNN [52] approach, there is only 1 hidden layer with an increasing number of neurons. Table 5 shows that during training the mean square error of the PNN [52] methodology with 25, 75 and 200 neurons is 1.9205, 1.6885 and 1.1025, respectively. It is shown in Table 5 that MSE of the proposed DELM approach with the same number of neurons is

Fig. 5. Comparison of diagnosing diabetes type II between the proposed and other approaches.
Further, it is observed that in [52], with a single hidden layer, the performance of the system increases as the neuron count increases, as shown in Table 5. In the proposed DELM solution, in contrast, performance increases significantly as the number of hidden layers increases with the same number of neurons (10) per layer. This means that increasing the number of neurons improves performance, but not as much as increasing the number of hidden layers does in the proposed DELM system.
This performance comparison in terms of MSE is shown in Fig. 4. The proposed scheme exhibits better performance than [52] with a significantly reduced and affordable computational complexity (number of layers × number of neurons per layer). The x-axis of Fig. 4 shows 3 values: 25, 75 and 200 neurons for [52], and 4, 5 and 6 hidden layers for the proposed scheme. It is also apparent that, although the number of layers increases only linearly (proposed) while the number of neurons increases sharply [52], the MSE of the proposed scheme is lower.
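The complexity measure named above (number of layers × number of neurons per layer) can be worked through for the configurations plotted in Fig. 4. This is only an arithmetic sketch of that stated measure; the helper name `complexity` is hypothetical, and the measure counts neurons rather than weights or training time.

```python
def complexity(n_hidden_layers: int, neurons_per_layer: int) -> int:
    """Complexity as stated in the text: layers times neurons per layer."""
    return n_hidden_layers * neurons_per_layer


# Proposed DELM: 4, 5 and 6 hidden layers with 10 neurons each.
delm = [complexity(layers, 10) for layers in (4, 5, 6)]

# PNN [52]: a single hidden layer with 25, 75 and 200 neurons.
pnn = [complexity(1, neurons) for neurons in (25, 75, 200)]

for d, p in zip(delm, pnn):
    print(f"DELM: {d:3d} hidden neurons   PNN: {p:3d} hidden neurons")
```

By this measure the proposed scheme grows from 40 to 60 hidden neurons across the three settings, while the PNN configurations grow from 25 to 200.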
However, it is worth mentioning that the datasets used by the proposed scheme and by [52] are different. The diabetes dataset for the proposed scheme is taken from Kaggle [17], while [52] utilizes the Pima Indian diabetes dataset [43]. The approaches therefore differ not only in structure but also in the datasets used.
In Fig. 5, all conventional approaches other than Bayesian Regularization [29] predict diabetes type II using data from the Pima Indian diabetes set. The proposed DELM approach outperforms the other models in terms of accuracy, namely backpropagation [36], Bayesian Regularization [29], ANN [55], GRNN [25] and PNN [52]. Among these approaches, the weakest is GRNN [25], with an accuracy of 82%. PNN [52] is the best of them during the training phase, with 89.56% accuracy, ahead of backpropagation [36], and the accuracy of ANN [55] is quite close to that of PNN [52]. The overall result for PNN [52] was 89.56%, whereas the proposed DELM system reaches 92.8%, higher than the previously proposed methods in terms of accuracy rate. The values of the statistical measures suggest that DELM performs better than the other approaches, so the proposed DELM is a viable choice for diabetes type II prediction.
Conclusion
Modeling, analysis, and prediction of diabetes type II is a challenging task. In this research, a model for diabetes type II prediction has been proposed to improve the prediction accuracy. The proposed model is an expert system based on an artificial neural network (ANN) with a deep extreme learning machine (DELM), which has high potential for predicting diabetes type II. Various numbers of hidden layer neurons, diverse activation functions, and different features were tried in order to arrange the DELM parameters into an optimized structure.
To measure the performance of the proposed approach, various statistical measures have been used. These measures show that the proposed DELM achieves better accuracy than other algorithms (though with different datasets and structures). Compared to past approaches, the proposed DELM technique produces attractive results. The proposed technique exhibits 92.8% accuracy, which is better than that of the existing techniques in their own settings. Moreover, it is observed that the proposed approach exhibits an affordable computational complexity. DELM has been used in the proposed work to encapsulate the benefits of ELM as well as deep learning. We are confident in these initial results and intend to expand this work in the future by investigating different datasets, learning machines, structures, and algorithms.
