Abstract
Retinal image analysis has received significant attention from researchers because of the pressing need for early detection systems that aid in the screening and treatment of disease. Several studies on automated retinal disease detection have been carried out as part of retinal image processing. Here, an Improved Ensemble Deep Learning (IEDL) model is proposed to detect various retinal diseases with higher accuracy by performing multiclass classification across several deep learning algorithms. The model incorporates deep learning algorithms that automatically extract features from the training data, a capability that traditional machine learning approaches lack. The Retinal Fundus Multi-Disease Image Dataset (RFMiD) is used for evaluation. First, image augmentation is applied to the existing images, followed by upsampling and normalization. The proposed IEDL model then processes the normalized images, a computationally intensive step, using several ensemble learning strategies: heterogeneous deep learning models, bagging through 5-fold cross-validation over four models (ResNet, Bagging, DenseNet, and EfficientNet), and a stacked logistic regression for the final prediction. The method achieves an accuracy of 97.78%, a specificity of 97.23%, a sensitivity of 96.45%, a precision of 96.45%, and a recall of 94.23%. Its accuracy is 1.7% higher than that of traditional machine learning methods.
Keywords
Introduction
According to data gathered by the World Health Organization (WHO), an estimated 285 million people worldwide are affected by visual impairment [1]. Of these, about 39 million are blind and about 246 million have poor vision. There are different types of retinal disease, including age-related macular degeneration (ARMD), diabetic retinopathy, retinal detachment, retinitis pigmentosa, and glaucoma. Glaucoma is a degenerative eye disease that progressively destroys the optic nerve owing to increased Intraocular Pressure (IOP). ARMD can include both non-neovascular abnormalities (such as drusen and retinal pigment epithelium changes) and neovascular abnormalities (such as new blood vessels) [2].
Proper diagnosis and early detection are important for slowing disease progression and avoiding blindness. However, more extensive in-person eye screening is hindered by the shortage of ophthalmologists. Portable imaging equipment is available to extend medical services to remote areas, but the automated analysis of retinal images is still a developing technique. Deep learning based automatic detection of ocular pathology is one way to close this coverage gap [3]. Fundus images are now the most reliable and widely used diagnostic tool for retinal disorders. Analyzing the retinal vasculature allows one to track the course of diabetic retinopathy, hypertension, and cardiovascular disorders [4].
Figure 1 shows the various diseases and abnormalities represented in the RFMiD dataset. Diabetic Retinopathy (DR) [5] affects both eyes. Even for a well-trained practitioner, early detection of DR can be a time-consuming process, which can lead to delayed therapy, misunderstandings, and other problems. Only about 10,000 ophthalmologists are available to care for India’s total population, 70% of which lives in rural regions (a ratio of about 1 : 100,000 people) [6].

Fig. 1. Retinal disease annotations present in the RFMiD dataset.
Neural network technologies have been used in the majority of eye clinical labs throughout the country in an effort to diagnose diabetic retinopathy at an earlier stage and to make the work of ophthalmologists more manageable [7]. The existing literature and its limitations are discussed below.
Because of the various obstacles that need to be overcome, fully automated processing is necessary to detect retinal diseases. In [8], Lv Yan, et al. introduced a technique that integrates the strengths of two different classifiers, Random Forest and CNN. The CNN was used for trainable hierarchical feature extraction, and an in-depth experiment was carried out on retinal image databases such as STARE and DRIVE. The system works on colour images stored in RGB space, with the green channel taken from these images. Histogram equalization and Gaussian filtering are two of the steps used to enhance an image. In [9], Alekhya Bhupati, et al. used the ResNet architecture, which pioneered the use of shortcut links to solve the vanishing gradient problem in very deep networks. In [10, 11], deep learning techniques were introduced to automate the detection of retinal disease using the STARE dataset. In [12, 13], Khan, et al. discussed how, once a residual connection was established, the neighbourhood of the disc retained information in greater detail, at a higher resolution and with a very fine representation. However, the optic disc restricts a significant amount of the contextual information included in the patch, so the approach ignores the global information contained in the fundus image. In [14], Malaya Kumar Nath, et al. introduced a disc-aware ensemble network to build a screening model for glaucoma. Here, the clinical assessment of glaucoma was determined by first analysing the fundamental structure of the disease. In [15–18], a fully convolutional neural network is used to extract blood vessels in order to determine the severity of the disease.
The obstacles include overlap between the intensity range of vessels and that of other areas, inter- and intra-subject variance, non-uniform illumination during acquisition, non-uniform contrast, and the blending of thin vessels with the background texture. The intensity range of retinal blood vessels often falls within that of other anatomical structures, so it can be difficult to differentiate vessels from background. Retinal images have also been found to differ between people from different regions.
In [19, 20], fundus images are obtained by detecting the light rays reflected from the layers of the retina. The illumination and contrast of the image are not uniform because the intensity of the reflected light differs across the retina. Because the texture of the background matches that of the thin vessels, separating the two remains an open problem. In [21], artery and vein classification, in a manner similar to retinal blood vessel extraction, was required to solve some pressing problems. The difficulties that can arise include variations in intensity level, inter-image variation, central reflex variation within the same vessel, the changing appearance of arteries and veins away from the optic disc, and variation in vessel thickness. In [22–25], ResNet is used to build a multi-scale representation of fundus images with the help of scale-space approximation layers, using down-sampling and up-sampling. Skip-connection-based residual deep learning networks are an example of a strategy that combines elements from a variety of other methods. Bypassing the connections to retrieve the information directly is left to future work.
In [26, 27], Hu Ki, et al. introduced a Cross-connected CNN model built specifically for vessel segmentation, which was shown to increase overall segmentation performance and to provide a reliable segmentation technique. VC-Net, which uses vessel constraints for vessel segmentation, was also introduced and applied to two newly created multicenter datasets, HRF and LES. Both advances were made possible by the improved Cross-connected CNN model. Performance was measured by accuracy, prediction speed, and specificity. In [28, 29], the Dense-UUNet model was proposed for segmenting the retinal vessels, with a training strategy based on random transformations. The random transformations improve image quality in order to make training more effective.
Although these deep learning methods perform well in multiclass disease classification, they have some limitations: they are prone to overfitting, their architectures are complex, and they are not suited to every task. In addition, higher performance generally requires a larger number of layers.
The main contribution of the proposed work is the analysis of multiclass disease detection in retinal images using an ensemble deep learning algorithm. The proposed work comprises the Improved Ensemble Deep Learning model with 5-fold cross-validation and a stacked logistic regression for predicting the disease; performance is analysed using metrics such as accuracy, sensitivity, specificity, and recall.
The remainder of the paper is organized as follows: Section 2 describes the materials and methodology of the proposed work, Section 3 presents the experimental results and analysis, and Section 4 gives the conclusion and future work.
Proposed methodology
Materials
The Retinal Fundus Multi-Disease Image Dataset (RFMiD), shown in Fig. 2, is a freely accessible public dataset of 3200 fundus images, of which 1920 form the training set, 640 the evaluation set, and 640 the test set [30]. The fundus images were captured by three different fundus cameras with resolutions of 4288×2848 (277 images), 2048×1536 (150 images), and 2144×1424 (1493 images), respectively, and cover 46 retinal ailments, some of which are very uncommon and difficult to diagnose [31, 32]. The dataset was released in conjunction with the Retinal Image Analysis for Multi-Disease Detection (RIADD) challenge presented at ISBI 2021. The objective was to perform multi-label classification on retinal fundus images of varying sizes. The challenge was created to bring together members of the medical image analysis community in order to develop techniques for the automated categorization of ocular disorders, including both common and uncommon pathologies.

Fig. 2. Retinal Fundus Multi-disease Image Dataset (RFMiD).
In contrast to past efforts, which concentrated on the detection of particular disorders, this challenge will make it possible to construct generalizable models for screening the retina.
The images used here may have poor contrast in some cases because they were captured under different lighting conditions [33], so they need to be preprocessed before the model is trained. In this model, Contrast Limited Adaptive Histogram Equalization (CLAHE) is used, which divides an image into small rectangular regions called “tiles” and applies the histogram equalization algorithm to each tile.
The pre-processed images are then used as the input images for this model.
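The per-tile step of CLAHE can be sketched as follows. This is a minimal illustration under stated assumptions, not the preprocessing code used in this work: it assumes 8-bit gray levels, a tile given as a flat list of pixel values, and the textbook mapping in which each gray level is replaced by (levels - 1) times the cumulative histogram fraction up to that level. The function name `equalize_tile` and the `clip_limit` parameter are hypothetical, and the bilinear blending between neighbouring tiles that full CLAHE performs is omitted.

```python
def equalize_tile(pixels, levels=256, clip_limit=None):
    """Histogram-equalize one tile given as a flat list of integer gray levels."""
    # Count how often each gray level occurs in the tile.
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1

    if clip_limit is not None:
        # Contrast limiting: cap every bin at clip_limit and
        # spread the clipped excess evenly over all bins.
        excess = sum(max(h - clip_limit, 0) for h in hist)
        hist = [min(h, clip_limit) + excess / levels for h in hist]

    # Cumulative distribution of the (possibly clipped) histogram.
    total = sum(hist)
    cdf, running = [], 0.0
    for h in hist:
        running += h
        cdf.append(running / total)

    # Map each pixel through the scaled cumulative distribution.
    return [round((levels - 1) * cdf[p]) for p in pixels]


tile = [50, 50, 51, 52, 52, 52, 53, 200]
plain = equalize_tile(tile)
clipped = equalize_tile(tile, clip_limit=2)
```

In this sketch, clipping reduces how far the dominant gray levels are stretched, which is the property that keeps CLAHE from amplifying noise in nearly uniform regions.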
The images in the dataset were taken under different lighting conditions. Several image augmentation methods exist [34]. In this model, a Generative Adversarial Network (GAN) is used for image augmentation because it generates more realistic and diverse data. Through up-sampling, the number of training images was increased from 1920 to 3354.
Ensemble deep learning model
In recent years, deep learning, one of the most important techniques in artificial intelligence, has played a vital role in medical imaging and diagnostics. In particular, deep learning methods are often used for medical image classification [35]. This subfield of pattern recognition and automated illness detection falls within the broad category of deep learning applications. Fig. 3 shows the proposed work: to improve the performance of the deep networks, the retinal image database is first passed through image augmentation, and the images are then up-sampled to double their original proportions.

Fig. 3. Proposed work architecture.
Residual Network (ResNet) is one of the deep learning architectures most commonly used for multiclass retinal disease detection. Here, the input is resized to 246×246 and processed through a ResNet architecture with a depth of 50 layers. Skip-connection-based residual deep learning networks are an example of a strategy that combines elements from a variety of other methods [36]. Skip connections bypass layers so that the necessary information is retrieved directly from the fundus photos, and a 5-fold validation technique is used. In a skip connection block (SCB), a fully connected layer follows the ReLU activation. Both the feed-forward pass and the weight updates of back propagation benefit from its use.
The concatenated DenseNet architecture avoids the vanishing gradient problem across its blocks. A stride of two is used in all layers of the architecture. The image is resized to 380×380 before being passed to the DenseNet architecture. The same input size is used for the EfficientNet B4 architecture.
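The two connection patterns described above can be contrasted on plain vectors. The sketch below is only an illustration and assumes the usual formulations: a residual block adds its input to the transformed output before the activation, while a dense block concatenates the output of each layer to all earlier features. The `transform` callables are hypothetical stand-ins for convolutional layers, not the layers used in this work.

```python
def relu(values):
    """Element-wise ReLU on a list of floats."""
    return [max(v, 0.0) for v in values]


def residual_block(x, transform):
    """Skip connection: output = ReLU(F(x) + x)."""
    fx = transform(x)
    return relu([a + b for a, b in zip(fx, x)])


def dense_block(x, transforms):
    """Dense connectivity: each layer sees all earlier features, concatenated."""
    features = list(x)
    for transform in transforms:
        features = features + transform(features)
    return features


# Toy stand-ins for learned layers (assumptions for illustration only).
halve = lambda v: [0.5 * a for a in v]
res_out = residual_block([1.0, -2.0], halve)
dense_out = dense_block([1.0, 2.0], [lambda f: [sum(f)], lambda f: [max(f)]])
```

Because the input reaches the output through an identity path (addition or concatenation), gradients have a short route back to early layers, which is why both patterns help with vanishing gradients.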
Bootstrap Aggregating (Bagging) uses random forest and bagged decision tree algorithms to classify the images. Using the Bagging algorithm yields a higher accuracy rate.
These four algorithms work together, producing different models that are trained on different subsets of the training data.
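One way to obtain models trained on different subsets is a 5-fold split of the training set, where each fold is held out once for validation. The following is a minimal sketch of such a split, assuming a simple shuffled (non-stratified) partition; the helper `k_fold_indices` and the fixed seed are hypothetical and not taken from this work.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, validation_indices) for each of k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        train = [j for f in range(k) if f != i for j in folds[f]]
        yield train, validation


# 1920 is the size of the RFMiD training set described earlier.
splits = list(k_fold_indices(1920, k=5))
```

Each base model would then be fitted five times, once per split, so that every training image receives a prediction from a model that never saw it.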
Class weighting is used to counter the class imbalance in the training data. Here, class weighting is applied to each individual model in the ensemble, and the predictions of the models are combined to obtain the updated weights. The weight values were initialised with random numbers and then optimised to achieve the best results; back propagation was used to obtain the outcome, and the optimised weight values were fixed accordingly. In the neural network algorithm, the properties of the optic cup image are taken as input data. The evolutionary algorithm was responsible for generating candidate solutions and keeping the performance of the proposed model up to date. The output is produced by computing quantities such as the sum and product of the inputs and their weights. The primary benefit of deep learning techniques is the high accuracy they can attain, which comes from several layers that perform automatic feature extraction. The network is self-teaching and does not need manually constructed features. The precision gained is very high, although the computational cost of the system is also high. For real-time applications, it is generally accepted that a system has to be effective in terms of both accuracy and complexity. The deep learning neuron model had five layers, each of which was used for categorization.
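The text does not state how the class weights are chosen. A common choice, assumed here purely for illustration, is inverse-frequency ("balanced") weighting, in which each class receives the weight n_samples / (n_classes × class_count) so that rare classes contribute as much to the loss as frequent ones. The function name and the toy label counts below are hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * count of that class)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Toy example: 90 samples of class 0 and 10 samples of class 1.
weights = balanced_class_weights([0] * 90 + [1] * 10)
```

With these weights, a misclassified sample from the minority class is penalised nine times more heavily than one from the majority class in this toy example.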
The output developed up to this point becomes active from the layer after the one obtained for the implementation. The forward activation flow of the deep learning neural network model, run on the data, adjusted the weights used for both backward error propagation and forward activation. With back propagation, the ultimate goal is to reduce the error to as small a value as possible. Because the evolutionary algorithm is readily available, the approach has been improved by adding a wide range of distinct weights.
Here, binary logistic regression is used on top of the deep CNNs; it combines the predictions of all models into a single class prediction. This model is also trained with the above-mentioned 5-fold cross-validation of the training data set to avoid overfitting, which ensures consistency in the final prediction.
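The stacking step can be sketched as a logistic regression fitted on the probabilities produced by the base models. This is a minimal sketch under stated assumptions: one binary label, four base-model outputs per image, plain batch gradient descent, and made-up probabilities; `fit_stacker` and `predict` are hypothetical names, and the learning rate and epoch count are arbitrary choices.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, base_probs):
    """Stacked probability for one sample from its base-model probabilities."""
    return sigmoid(sum(w * p for w, p in zip(weights, base_probs)) + bias)


def fit_stacker(base_probs, labels, lr=0.5, epochs=2000):
    """Fit logistic-regression weights on base-model outputs by gradient descent."""
    n_models = len(base_probs[0])
    n = len(labels)
    weights, bias = [0.0] * n_models, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_models, 0.0
        for x, y in zip(base_probs, labels):
            error = predict(weights, bias, x) - y  # gradient of the log loss
            for j in range(n_models):
                grad_w[j] += error * x[j]
            grad_b += error
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


# Made-up out-of-fold probabilities from four base models (illustration only).
oof = [[0.9, 0.8, 0.7, 0.85], [0.2, 0.1, 0.3, 0.15], [0.8, 0.6, 0.9, 0.7],
       [0.1, 0.3, 0.2, 0.25], [0.7, 0.9, 0.6, 0.8], [0.3, 0.2, 0.1, 0.2]]
y = [1, 0, 1, 0, 1, 0]
w, b = fit_stacker(oof, y)
```

Training the stacker on out-of-fold predictions, rather than on predictions for images the base models were fitted on, is what keeps the second-level model from simply learning the base models' overfitting.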
Training of deep learning model
As noted before, the features extracted at this stage form the input to the IEDL-based CNN classifier. Each image is assigned to one of four groups according to its characteristics: Diabetic Retinopathy, CNV, Age-related Macular Degeneration, and Normal (NR) fundus. A CNN is based on the concept of deep learning, in which many diverse representations are layered one on top of another. Owing to the complexity of its structure, the network has shown an unparalleled capacity to extract relevant features from data. However, the default classifier, the Softmax classifier, has only a limited capacity for generalisation, whereas the IEDL offers a slightly improved performance. It is therefore important to study these two methods, and the IEDL classifier is presented as a consequence. The purpose of the convolutional approach is to make the system more efficient in terms of the amount of data it can handle in a given time. A CNN relies on two interrelated concepts, “local connection” and “parameter sharing.”
Under local connection, every neuron in a feature map is linked to a cluster of neighbouring neurons in the preceding layer. This region of the preceding layer is referred to as the neuron’s receptive field. The procedure may be repeated several times until the desired outcome is achieved. Under parameter sharing, the same kernel is applied to every spatial location of the input when producing a given feature map. The complete set of feature maps is obtained from a number of distinct kernels. Mathematically, the feature value in the l-th feature map of the m-th layer is computed by applying the shared kernel of that feature map, plus a bias, to the input patch at each location.
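Local connection and parameter sharing can be seen directly in a small convolution routine. The sketch below assumes a single-channel input, a "valid" convolution without padding or stride, and the common form in which each feature value is the kernel-weighted sum of one input patch plus a bias; the function name `conv2d_valid` and the toy kernel are hypothetical.

```python
def conv2d_valid(image, kernel, bias=0.0):
    """Compute one feature map by sliding a single shared kernel over the image."""
    k_h, k_w = len(kernel), len(kernel[0])
    out_h = len(image) - k_h + 1
    out_w = len(image[0]) - k_w + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Local connection: only the k_h x k_w patch at (i, j) is read.
            value = bias
            for u in range(k_h):
                for v in range(k_w):
                    # Parameter sharing: the same kernel is used at every (i, j).
                    value += kernel[u][v] * image[i + u][j + v]
            row.append(value)
        feature_map.append(row)
    return feature_map


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
diagonal_difference = [[-1, 0],
                       [0, 1]]
fmap = conv2d_valid(image, diagonal_difference)
```

A layer with several kernels simply repeats this computation once per kernel, giving one feature map each.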
Average pooling and max pooling are the two most common pooling operations [38]. This model uses max pooling, which captures the most important feature in each feature map. The pooled features are then fed into the IEDL neural network as input. In their realised form, IEDL networks are represented by Gaussian functions. The fundamental operating principle of the IEDL is as follows:
The input is represented as a vector of real values, $L \in \mathbb{R}^n$. The output is then represented as a scalar function of the input vector, $\varphi : \mathbb{R}^n \to \mathbb{R}$, as shown in Equation 5.
That is to say, changing the parameters of a single neuron has a negligible influence on the output for input values that lie far from the neuron center.
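This locality property can be checked numerically. The sketch below assumes the standard Gaussian radial basis form, in which the scalar output is a weighted sum of terms exp(-beta × squared distance to a neuron center); this form, the function name `rbf_output`, and the toy centers and weights are assumptions for illustration and are not taken from Equation 5.

```python
import math


def rbf_output(x, centers, weights, beta=1.0):
    """Scalar output of a Gaussian radial-basis network for input vector x."""
    total = 0.0
    for center, weight in zip(centers, weights):
        squared_distance = sum((xi - ci) ** 2 for xi, ci in zip(x, center))
        total += weight * math.exp(-beta * squared_distance)
    return total


centers = [[0.0, 0.0], [5.0, 5.0]]
near_first = [0.0, 0.0]

before = rbf_output(near_first, centers, weights=[1.0, 2.0])
# Change only the weight of the distant second neuron by a factor of ten.
after = rbf_output(near_first, centers, weights=[1.0, 20.0])
```

Because the Gaussian term of the second neuron is vanishingly small at the first center, the two outputs are numerically indistinguishable.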
The experiments were carried out with Python version 3.10 on a personal computer with an i5 central processing unit. The performance of this method is assessed by calculating the Negative Predictive Value (NPV), the Positive Predictive Value (PPV), the sensitivity, the specificity, the accuracy, the precision, the f-score, the False Discovery Rate (FDR), the False Recognition Rate (FRR), the False Positive Rate (FPR), the False Negative Rate (FNR), and the Matthews Correlation Coefficient (MCC) [38].
The classification results produced with the help of the proposed technique are used to analyse the performance measures. When computing the outcomes of the proposed method, many metrics are taken into consideration, including sensitivity, specificity, accuracy, kappa index, precision, PPV, NPV, MCC, recall, f-score, FDR, FRR, FPR, and FNR [39]. Sensitivity: the proportion of actual positives that are correctly identified. It is the ratio of the number of true positives to the sum of the true positives and false negatives [40], as given by equation (8).
Specificity: Equation 8 gives the proportion of actual negatives that are correctly identified, that is, the ratio of the number of true negatives to the sum of the true negatives and false positives [41].
Accuracy: the degree to which the measured or calculated value matches the true value [27]. It is the proportion of correctly predicted observations among the total number of observations.
Kappa index: It measures the degree of deviation between the values observed and those expected for the categorization [42], where the observed accuracy is the accuracy rate and the expected accuracy is the hypothetical probability of agreement by chance, computed from the observed data.
In addition, the Matthews Correlation Coefficient (MCC), Positive Predictive Value (PPV), and Negative Predictive Value (NPV) are assessment metrics suited to detecting both abnormality and normalcy when screening for retinal disease in high-rate populations.
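All of the metrics listed above follow from the four confusion-matrix counts. The sketch below assumes the standard textbook definitions for a binary decision and nonzero denominators; the function name and the example counts are illustrative and are not results from the experiments.

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Standard binary metrics from true/false positive/negative counts."""
    n = tp + fp + tn + fn
    sensitivity = tp / (tp + fn)          # also called recall
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / n
    ppv = tp / (tp + fp)                  # also called precision
    npv = tn / (tn + fn)
    f_score = 2 * ppv * sensitivity / (ppv + sensitivity)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Cohen's kappa: observed accuracy corrected for chance agreement.
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    kappa = (accuracy - expected) / (1 - expected)
    return {
        "sensitivity": sensitivity, "specificity": specificity,
        "accuracy": accuracy, "ppv": ppv, "npv": npv, "f_score": f_score,
        "fdr": fp / (fp + tp), "fpr": fp / (fp + tn), "fnr": fn / (fn + tp),
        "mcc": mcc, "kappa": kappa,
    }


# Illustrative counts only.
m = classification_metrics(tp=90, fp=20, tn=80, fn=10)
```

For a multiclass problem, the same counts would be collected per class (one class against the rest) and the resulting metrics averaged.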
The ROC curves in Fig. 4 illustrate the performance of each model as the trade-off between the true positive rate and the false positive rate. The model achieves an AUROC score of 0.95 for the classification of abnormal retinal data in the assessment conducted independently by the challenge organisers. The multi-label score, calculated as the average of the macro-averaged AUROC and the mean average precision (mAP), came out to be 0.70. It is thus clear that the ensemble model achieves a better AUROC than the single deep learning models.
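The AUROC values reported here can be understood through the rank interpretation: the area under the ROC curve equals the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one. The sketch below assumes that interpretation, with ties counted as one half; the function names and the toy scores are illustrative and are not the challenge's evaluation code.

```python
def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def macro_auroc(label_columns, score_columns):
    """Unweighted mean of the per-label AUROC values."""
    per_label = [auroc(y, s) for y, s in zip(label_columns, score_columns)]
    return sum(per_label) / len(per_label)


# Toy scores for a single label.
single = auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
# Two labels: the first as above, the second perfectly separated.
macro = macro_auroc([[0, 0, 1, 1], [0, 1, 0, 1]],
                    [[0.1, 0.4, 0.35, 0.8], [0.2, 0.9, 0.1, 0.7]])
```

The pairwise loop is quadratic in the number of samples, which is acceptable for a sketch but would normally be replaced by a sort-based computation.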

Fig. 4. ROC curve of the proposed improved ensemble classifier.
The experiment is carried out on the image set, and the results collected from the experiments are tabulated. The dataset contains the images used for training and testing the classification methods. To perform an experiment, both normal and abnormal images must go through training and testing. Classification is carried out on both the test and training images, and the results are used to compute the performance metrics. True positives are vessel pixels that are correctly identified as vessels. False positives are non-vessel pixels that are mistakenly identified as vessel pixels. True negatives are non-vessel pixels that are correctly identified as non-vessel. False negatives are vessel pixels that are incorrectly identified as non-vessel. These fundamental counts and the performance indicators derived from them, such as specificity, precision, sensitivity, recall, f-score, and accuracy, are shown in Table 2.
Analysis of the proposed IEDL model performance with the existing CNN, KNN, and SVM classifiers with respect to precision, recall, F-Measure, Accuracy, specificity, sensitivity
Comparison of performance for different databases using different classifier algorithms
As seen from the table, the proposed IEDL classifier has an accuracy of 97.89% and provides higher overall performance on the other metrics as well, compared with the other methods in Table 2. The proposed IEDL system therefore offers a superior retinal disease detection system, as shown visually in Fig. 5.

Fig. 5. Performance metrics of the proposed system.
Segmentation was performed with the correlation segmentation method using SAS. A hybrid feature extraction method using CNN, GLCM, LBP, and intensity features was then used to obtain the best possible features. After the unnecessary feature vectors were removed, the remaining ones were divided into two categories: retinal blood vessels and non-vessels. The feature values were input into classifiers such as SVM, ELM, and Random Forest, which were used to differentiate between abnormal and normal blood vessels. The technique presented for retinal disease detection performs effectively in terms of specificity, sensitivity, accuracy, NPV, PPV, and Matthews Correlation Coefficient, and is superior to the methods currently in use.
The proposed IEDL model combines four deep learning algorithms with a stacked logistic regression model, using bagging through 5-fold cross-validation, to predict anomalies; this combination improves the performance of the proposed model compared with other state-of-the-art models. The model is applied to the RFMiD dataset, and its performance parameters are analysed and compared. Through the use of skip connections and the attention method, the proposed model achieves a higher level of performance on both low- and high-resolution fundus images. It achieves an accuracy of 97.78%, a specificity of 97.23%, a sensitivity of 96.45%, a precision of 96.45%, and a recall of 94.23%, which demonstrates its improvement over the existing models. In future research, the model will be combined with other modified deep learning algorithms and a transfer learning algorithm to improve the performance metrics on low- and high-resolution fundus images from available datasets and on real-time images.
