Abstract
Diabetes is among the major threats to human health and remains incurable despite significant scientific and medical advances. Because it affects all parts and organs of the body, there are ways to identify its presence before it causes serious harm. Diabetes also affects the retina, rupturing blood vessels there and ultimately leading to irreversible blindness through complications. This study proposes an enhanced activation function for automatically diagnosing diabetic retinopathy (DR) from fundus images, with the aim of lowering processing time and loss. The proposed system design is built on a stacking-based explainable AI model. The CNN models with the enhanced activation function were trained and tested on the “Asia Pacific Tele-Ophthalmology Society 2019 Blindness Detection” (APTOS 2019) dataset. After data collection, the pre-processing phase comprised data segmentation and augmentation. A deep learning prediction approach is used, and a robust stacked model is generated by combining CNN, VGG, and custom Versed models. The classification of DR and the operation of the explainable AI algorithms are also described. The LIME and SHAP approaches are used to explain the stages of DR. The proposed figure illustrates the procedure for the moderate, severe, and proliferative phases of DR. The accuracy of the current model was 76.78%. The study contributes a novel hybrid-stacked model architecture tailored for diabetic retinopathy detection. Future work could integrate additional explainability techniques, such as attention mechanisms or saliency maps, to provide more detailed insight into the model's decision-making process.
Keywords
Introduction
Diabetic retinopathy (DR) is a disease of the retinal blood vessels that can lead to blindness (Yang et al., 2023). Diabetes is currently one of the fastest-growing diseases (Bhandari et al., 2023). Diabetes mellitus (DM), which affects roughly 382 million people globally, is expected to affect 592 million people by 2025 (Shelke & Subasi, 2023). Data mining-based disease prediction techniques are increasingly employed in real diagnostic settings, where they can considerably aid in predicting and diagnosing DR. Both structured and unstructured visual data are present in the diabetic retina (Sahoo et al., 2023). Data mining techniques vary depending on the datasets employed in the DR task (Maaliw et al., 2023). Most DR patients are asymptomatic in the early stages, yet neuronal retinal damage and clinically undetectable microvascular alterations progress during this period (Yousefi, 2023). As a result, people with diabetes should undergo routine eye exams, because prompt identification and treatment of the problem are crucial (Diao et al., 2023). Numerous other AI applications, including those for radiology, screening, and disease diagnosis, have been used in the healthcare industry (Nguyen et al., 2023). Recent initiatives and studies have shown that “Deep Learning” (DL) models, with their multiple hidden layers, provide the most accurate results across several AI applications, particularly in medical image processing (Ali, 2023). Systems rooted in deep learning models can categorise disorders and enhance medical decision-making (Singh & Malhotra, 2023). High-performance identification and the introduction of AI techniques, such as machine learning and deep learning, have made it feasible to classify retinal images and to identify and segment the affected areas of the retina (ElSayed et al., 2023).
Machine learning techniques are frequently used for DR categorization and grading (Alsuhibany & Abdel-Khalek, 2023). According to the available literature in this field, DR identification and evaluation from fundus images is now performed using deep learning (DL) (Nazir et al., 2023). In DL, a subfield of AI, high-level features are extracted progressively from the data using artificial neural networks with many processing layers. Clinicians can use computers to help them identify and categorise DR (Fayyaz et al., 2023). Significant research has used several techniques, including support vector machines, k-NN classifiers, and convolutional neural networks (CNN) (Gu et al., 2023). The method described in this research enables ophthalmologists to identify four forms of DR in patient images, alongside healthy retinas. To find and classify potential lesions for DR assessment, the method uses a multi-class detection technique. It can also find the lesion's boundaries and the damaged area and produce a heat map of the lesion. This research is motivated by the imperative need for transparent and interpretable AI systems in medical diagnostics, particularly for diabetic retinopathy, a leading cause of blindness. By integrating explainable AI techniques into a hybrid-stacked model architecture, this study aims to give clinicians insight into the decision-making process of AI models, fostering trust in and understanding of their predictions. The contributions of this research are the development of a novel model architecture tailored for diabetic retinopathy detection, the incorporation of explainable AI methods to enhance interpretability, and the provision of insights that support clinicians in more informed patient care.
The remaining sections are arranged as follows: Section 2 reviews the literature, Section 3 defines the research problem and motivation, Section 4 describes the proposed technique, Section 5 discusses the results, and Section 6 concludes the paper.
Literature Survey
Beevi (2023) combined SqueezeNet and a Deep Convolutional Neural Network (DCNN) in two stages to offer a novel multilevel severity categorization of DR. DR is one of the disorders responsible for retinal lesions that impede vision. Recent deep learning techniques proposed for diabetic retinopathy classification are computationally expensive and time-consuming, and they often fail to extract effective nonlinear features and to classify the different stages of the disease accurately. Mutawa et al. (2023) examined the effectiveness of four transfer learning models on each of four independent datasets. DR, classified as a silent disease due to its lack of early symptoms, poses a challenge in medical diagnosis because of the diverse retinal features present in different datasets. On the EyePACS test set, DenseNet121 achieved the highest accuracy and recall, both 89.10%, whereas on the ODIR data DenseNet121 achieved an accuracy of 75.82% and a recall of 77.43%. Saranya et al. (2023) aim to develop a machine-learning algorithm for detecting the initial phase of DR from red lesions visible in retinal images. Manual diagnosis of DR requires time-consuming and costly physical tests such as visual acuity, pupil dilation, and tonometry, which cost patients time, money, and effort. If DR is left untreated, it may eventually cause vision loss; in its initial stages, it may not manifest any symptoms or may cause only minor vision problems. The classifier proposed by Weifeng and Luo (2023) is evaluated on three datasets: Messidor, DIARETDB1, and IDRiD. Their study develops a GO-DBN-WKELM algorithm to address the feature extraction problem for large-scale datasets and automatic diabetic retinopathy grading.
The outcome demonstrates that automated techniques are attracting more attention in the medical field due to their capacity to examine unstructured clinical notes precisely. Inamullah et al. (2023) proposed an ensemble of three CNN models for limited and unbalanced data, employing data diversity. A limitation of this CNN-based ensemble model for diabetic retinopathy is its limited exploration of diverse datasets representing various demographic groups and disease severity levels, which may lead to biased or incomplete model training and evaluation. The results are evaluated in the form of confusion matrices (CM): (a) the CM assesses the CNN-1 model's output; (b) Class 1 and Class 3 achieve a significant outcome; (c) the CNN-3 model balances out the differences that remain in CNN-2; (d) Class 4 results got worse in the previous model. Alshayeji and Sindhu (2023) suggested a fully automated, comprehensive conventional machine learning system to carry out DR diagnosis and accurate disease-stage screening. Furthermore, DL classification of the pre-processed fundus images was carried out using transfer learning, and the outcomes were compared with those of other ML models. By adopting a global average pooling layer to extract features, Vij and Arora (2023) improved the DITL model, which aims to prevent overfitting and minimise losses by utilising leaky ReLU. A remaining gap is the lack of transfer learning methods specifically tailored to the challenges posed by imbalanced datasets in diabetic retinopathy severity classification.
The experimental results show progress in diagnostic performance, and the updated DITL yields solid and trustworthy computer-assisted diagnosis systems that assist professionals in correctly identifying DR severity stages while minimising human errors and expenses. To assess fundus images and automatically distinguish between controls with no DR, mild DR, moderate DR, severe DR, and proliferative DR, Sundari et al. (2023) suggested a deep CNN model. Early detection allows early therapy, protecting the patient from developing blindness. There remains a need for further investigation into enhancing the model's interpretability and explainability to facilitate better clinical decision-making and patient care. The network proposed by Khanna et al. (2023) is built entirely from scratch, followed by an ensemble of the top five networks and the Convolutional Neural Network-Long Short Term Memory (CNN-LSTM) model. A limitation is the limited exploration of explainable AI techniques to enhance the interpretability and transparency of the grading system's predictions. The results of the study demonstrated that the presented model performs better in terms of categorization than many of the current models. To study and forecast diabetes from retinal input images, a computer vision-based approach has been suggested (Anil et al., 2022; Anil et al., 2023; Anil & Dayananda, 2018; Anil & Dayananda, 2023; Bahety et al., 2020; Nethravathi et al., 2020; Rajee et al., 2023). There is a need for an efficient and optimized approach to enhancing retinal images that is specifically tailored for diabetic retinopathy diagnosis while leveraging deep learning models. Such an approach reduces procedure complexity and produces better assessment metrics, which makes it appropriate for use in the diagnosis of DR through retinal image analysis (Mahajan et al., 2022; Mohapatra et al., 2022; Pattanaik et al., 2022; Siddique, 2019; Siddique & Panda, 2019).
The objectives of this study were achieved through a systematic approach that involved several key steps (Chandan et al., 2023). Firstly, we conducted a comprehensive review of existing literature to understand the current landscape of diabetic retinopathy detection methods and identify gaps in explainability. Next, we developed a hybrid-stacked model combining various deep learning techniques to enhance both accuracy and interpretability in diabetic retinopathy detection. We then collected and pre-processed a diverse dataset of retinal images to train and validate our model effectively. Through rigorous experimentation and validation, we demonstrated the model's effectiveness in accurately detecting diabetic retinopathy while providing interpretable explanations for its decisions (Chandra & Kumari 2023). Finally, we evaluated the model's performance against established benchmarks and real-world clinical scenarios, ensuring its practical relevance and reliability. Overall, our approach enabled us to achieve the objective of developing an explainable AI model for diabetic retinopathy detection, enhancing both accuracy and interpretability in clinical applications (Ravish & Kumari, 2021).
Research Problem Definition and Motivation
Diabetic retinopathy (DR) is a diabetes-related complication that results from damage to the blood vessels of the light-sensitive tissue at the back of the eye. Elevated blood sugar levels obstruct the eye's blood vessels, cutting off their blood supply (Ravish & Singh, 2017). The initial stage, known as non-proliferative DR, involves weakening of the blood vessel walls, causing bulges and occasional leaks into the retina. As the disease progresses, larger vessels dilate, resulting in uneven vascular diameter, blockage, and retinal expansion. Abnormal new blood vessels then form, leading to leakage, scar tissue development, and potential complications such as retinal detachment and glaucoma. The purpose of this study is to identify DR and automate severity assessment using high-resolution retinal images (Ravish et al., 2009). An artificial intelligence (AI) detection system assigns images to the five categories of DR, supported by the Versed model, the stacked model, and deep learning-based DR detection models. Early detection is crucial, as DR is among the principal causes of blindness in working-age adults, which emphasizes the significance of timely screening and intervention (Reddy et al., 2023). Retinal photography, often assisted by AI for identifying haemorrhagic and vascular lesions, primarily diagnoses severe forms of retinopathy after vision loss. However, it struggles to detect inner retinal ischemic defects, mid-retinal vascular leakage, and abnormalities in the retinal pigment epithelium. Recent research emphasizes that retinal neurodegeneration, which precedes microangiopathy, plays a pivotal role in the disease's progression. Before visible microangiopathic lesions appear, all retinal layers exhibit altered function, challenging the notion of diabetic retinopathy as solely a microvascular complication. Limiting DR classification to microvascular changes overlooks early retinal neuronal injury and hinders comprehensive understanding and treatment options (Reddy et al. 2022a).
Similarly, peripheral neuropathy in diabetes is often not microvascular in origin but stems from direct neuronal injury. Likewise, the traditional markers of diabetic nephropathy result from cellular damage rather than microvascular issues. Acknowledging that DR affects not only microvessels but also the cells and tissues that make up the neurovascular system underscores the importance of studying broader retinal damage for improved clinical management (Reddy et al. 2022b).
Proposed Research Methodology
This section outlines several methods investigated to develop a strong and reliable framework for DR screening, in addition to a deep learning-based approach (Shankar et al. 2017). The dataset released for the APTOS 2019 DR detection competition marks a turning point for research in this area. The competition focused on the problem of severity categorization, whereas here we analyse and provide binary judgements of referability. Although there is a fairly straightforward mapping between the two tasks from a human perspective, restricting machine learning to binary classification provides both theoretical and practical benefits (Shankar & Ravibabu 2019). The current study addresses diabetic eye disease identification using deep learning, with a robust stacked model built from the Versed model, CNN, and VGG methods for the detection of DR. LIME and SHAP are employed to explain the classification of the various phases of diabetic retinopathy.
The three stages of the suggested model are depicted in Figure 1: pre-processing, prediction techniques, and classification. After the data were collected, data augmentation and data segmentation were performed during the pre-processing stage. The suggested work employs a deep learning prediction method. The custom Versed model, CNN, and VGG algorithms are combined to produce the robust stacked model. This section also explains how DR is classified and how the explainable AI algorithms work. The LIME and SHAP techniques are used to explain the phases of DR. The proposed figure shows the methods for the moderate, severe, and proliferative phases of DR.

Figure 1. Process Workflow Diagram.
The ‘Asia Pacific Tele-Ophthalmology Society 2019 Blindness Detection’ (APTOS 2019) dataset is employed for this experimental study. It includes 3662 samples collected from a range of rural Indian residents. The Aravind Eye Hospital organised the data collection. The images were collected over an extended period and under a variety of conditions and environments. The samples were then examined by a group of skilled doctors, who labelled them following the International Clinical Diabetic Retinopathy Disease Severity Scale (ICDRSS) recommendations. As with any real-world data collection, there is some noise in both the images and the annotations. The photographs may lack sharpness, be blurred, be inappropriately exposed, or show all of these problems at once. The images were taken with several cameras at numerous clinics over an extended time, adding to their diversity. The 3662 samples are distributed across train, validation, and test datasets. Within the training set of 2930 images, the class distribution is 48.9% non-diabetic retinopathy, 10.2% mild DR, 27.8% moderate DR, 8% severe DR, and 5.3% proliferative DR. The validation and test datasets each contain 366 images.
Exploratory Data Analysis
Exploratory data analysis (EDA) typically employs data visualisation techniques to analyse and explore datasets and to highlight their key characteristics. By working out the most effective ways to manipulate the data sources, researchers can more readily spot patterns, detect outliers, test hypotheses, and verify assumptions. We used the APTOS 2019 dataset for this analysis, and EDA revealed several facts about it. The dataset is made up of three subsets: ‘Train’, ‘Validation’, and ‘Test’. Most of the samples in the training dataset (48.9%) are non-diabetic retinopathy samples. The four DR severity levels make up the rest of the dataset. The ‘Moderate’ DR class accounts for 27.8% of the training labels. The ‘Mild’, ‘Severe’, and ‘Proliferative’ classes account for 10.2%, 8%, and 5.3%, respectively.
Image Pre-Processing
Various pre-processing techniques were employed in this study to identify DR. A future-proof, robust DR detector built on the stacked and Versed models was evaluated using several popular methods. Data segmentation and data augmentation are common pre-processing steps that help to prepare the data for DR detection.
Figure 2 displays screening samples from the dataset and illustrates the pre-processing procedure, including image extraction, resizing, and image pre-processing. It shows the data before and after pre-processing, and the segmentation is described using the train, validation, and test values shown.

Figure 2. Pre-Processing and Data Segmentation.
The complete DR dataset used in the study is broken down into three sub-datasets: train, validation, and test. The training dataset has 2930 images, the validation dataset has 366 images, and the test dataset has 366 images. To reduce noise and asymmetry, only 80% of the train and validation datasets were used in this experiment. Following the data augmentation process, the training dataset consists of 2344 validated image filenames classified into five categories. Similarly, the remaining subsets yield 366 and 73 validated image filenames, respectively. The filenames are modified by appending the file extension so that they can be read as images afterwards. The batch size and image size are fixed at 16 and 224, respectively.
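The reported counts can be checked with a few lines of arithmetic. The sketch below is only an illustration: the helper `split_counts` is hypothetical, and reading the figure of 73 as the 20% share of a 366-image subset is an assumption that happens to match the numbers, not something the text states.

```python
def split_counts(n_images, held_out_fraction):
    """Return (kept, held_out) sizes, truncating the held-out share to an int."""
    held_out = int(n_images * held_out_fraction)
    return n_images - held_out, held_out

# Subset sizes reported for APTOS 2019 in this study
train_total, val_total, test_total = 2930, 366, 366
assert train_total + val_total + test_total == 3662

# Keeping 80% of the 2930 training images
train_kept, train_held_out = split_counts(train_total, 0.2)

# Assumption: 73 is the 20% share of a 366-image subset
val_kept, val_held_out = split_counts(val_total, 0.2)

print(train_kept, val_held_out)  # 2344 73
```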
Data-Augmentation
Data augmentation techniques play a critical role in enhancing the robustness and generalization ability of machine learning models, particularly where additional data is required to achieve the desired outcomes. These techniques help mitigate overfitting by exposing the model to a wider range of data variations, such as blur, low quality, and deformation, that are commonly encountered in real-world scenarios. In our study, we employed various augmentation methods using Keras’ ImageDataGenerator class, which dynamically generates augmented data during model training without the need to store the augmented images separately. By configuring the ImageDataGenerator with different parameters, we expanded the “train,” “test,” and “validation” subsets of the dataset, thereby increasing the overall dataset size for model development. Different augmentation parameters were applied to the “train” and “validation” subsets to ensure that the model remains unfamiliar with the test image patterns, which facilitates better generalization. This approach supports clarity and reproducibility and enhances the model's ability to learn from the diverse data distributions encountered during training.
Diabetic Retinopathy Detection Model
The swift progress of artificial intelligence has spurred a surge of interest in advanced deep-learning techniques for medical image processing. Neural networks, fundamental to deep learning algorithms, mimic the signalling of biological neurons. Artificial neural networks comprise layers of nodes (an input layer, one or more hidden layers, and an output layer) and operate based on each node's weight and threshold, which determine its responsiveness to input. A node transmits data to the next layer only when specific criteria are met, which governs the flow of information within the network. Each node can be thought of as a linear regression system, with its own set of inputs, weighting parameters (weights), a threshold, and an output.
In this framework, input values represented by the vector x are crucial for the neural network's functioning. Upon selecting input neurons, weights are assigned, indicating the relative importance of each characteristic. Heavier weights hold greater significance in influencing the final result. Each input undergoes multiplication by its respective weight, and its outcomes are aggregated. The activation function then processes this output, determining its nature based on a predefined threshold. Upon activation, the node transmits data to the subsequent layer, integrating the output into the input of the next neuron. This neural architecture serves as the foundation, particularly in constructing Convolutional Neural Networks for image classification. Stacking convolutional layers becomes imperative for discerning intricate patterns. The suggested model incorporates various CNN architectures, like VGG or ResNet, allowing flexibility in designing and combining predictive models. Experimenting with different combinations, such as feature concatenation or weighted averages, enhances the model's effectiveness, subject to performance evaluation on the validation set.
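The behaviour of a single node described above can be sketched as a weighted sum followed by an activation. This is a minimal illustration rather than the authors' implementation; the function names, the bias term, and the example numbers are assumptions chosen for the sketch.

```python
def weighted_sum(inputs, weights, bias):
    """Multiply each input by its weight, add the products, then add the bias."""
    return sum(w * x for w, x in zip(weights, inputs)) + bias

def step_activation(z, threshold=0.0):
    """Fire (1) when the aggregated input reaches the threshold, else stay silent (0)."""
    return 1 if z >= threshold else 0

def relu(z):
    """Return the input if it is positive and 0 otherwise."""
    return max(0.0, z)

# Hypothetical input vector x, weights w, and bias b
x = [0.5, 1.0, -1.5]
w = [0.8, 0.2, 0.4]
b = 0.1

z = weighted_sum(x, w, b)      # 0.4 + 0.2 - 0.6 + 0.1, roughly 0.1
fired = step_activation(z)     # the node passes its output to the next layer
print(round(z, 6), fired, relu(-2.0))
```

Larger weights make the corresponding input count for more in `z`, which is the sense in which heavier weights have greater influence on the result.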
To enhance performance, neural networks demand training data, yet precise calibration is crucial for their utility. Addressing underfitting and overfitting challenges, a unique stacking model is devised. Model stacking employs multiple systems for predictions within a meta-level framework, proving effective with diverse learning models. There's no singular “best” stacking method; complexity increases with additional levels, weights, averages, etc. In an experimental study, a new model is constructed by stacking three existing ones—CNN, CNN-VGG hybrid, and a customized model. Keras’ average function blends these models to predict DR and its absence in input image data, forming a robust ensemble meta-model.
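Averaging the outputs of several models, as described above, can be illustrated without any deep learning library. The sketch below assumes each base model returns a probability vector over the five DR classes; the three probability vectors are made-up values, and the plain-Python average stands in for whatever Keras averaging operation the study used.

```python
def average_predictions(prob_lists):
    """Element-wise mean of the class-probability vectors from several models."""
    n_models = len(prob_lists)
    n_classes = len(prob_lists[0])
    return [sum(p[c] for p in prob_lists) / n_models for c in range(n_classes)]

def predicted_class(probs):
    """Index of the class with the highest probability."""
    return max(range(len(probs)), key=probs.__getitem__)

# Hypothetical outputs of the three base models for one image
cnn_probs = [0.10, 0.20, 0.50, 0.15, 0.05]
vgg_probs = [0.05, 0.40, 0.35, 0.15, 0.05]
versed_probs = [0.10, 0.15, 0.55, 0.10, 0.10]

ensemble = average_predictions([cnn_probs, vgg_probs, versed_probs])
print(predicted_class(ensemble))  # 2
```

Because each input vector sums to 1, the averaged vector also sums to 1 and can be read directly as class probabilities.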
Versed Model
The model was built using the Keras Sequential API. It consists of five convolution layers, two dropout layers, two max-pooling layers, and six dense layers. The first dropout is a Gaussian dropout, which closely resembles Gaussian noise: it multiplies the inputs by random normal values with a mean of one, so all of the input elements are used in the process. Traditional dropout, by contrast, sets some input elements to 0 while scaling the others. We started by developing this custom model from individual neural network layers.
The custom Versed Model structure is presented in Figure 3. Each convolution layer uses the “relu” activation function, as do five of the dense layers. Piecewise linear functions such as “relu” return the input value if it is positive and 0 otherwise. Equation (3) gives the mathematical formula for “relu”.
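The difference between the two dropout variants can be made concrete with a small sketch. It is written in plain Python rather than Keras; the rate of 0.25 is a hypothetical value, and the noise standard deviation sqrt(rate / (1 - rate)) follows the usual Gaussian dropout convention, which is assumed here rather than taken from the paper.

```python
import math
import random

def standard_dropout(values, rate, rng):
    """Zero each element with probability `rate`; scale survivors by 1 / (1 - rate)."""
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in values]

def gaussian_dropout(values, rate, rng):
    """Multiply every element by Gaussian noise centred at 1; no element is zeroed out."""
    stddev = math.sqrt(rate / (1.0 - rate))
    return [v * rng.gauss(1.0, stddev) for v in values]

rng = random.Random(0)
activations = [1.0] * 10000

dropped = standard_dropout(activations, 0.25, rng)
noisy = gaussian_dropout(activations, 0.25, rng)

# Both variants keep the expected activation close to the original value of 1.0
print(sum(dropped) / len(dropped), sum(noisy) / len(noisy))
```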

Figure 3. Custom Versed Model Architecture.
$$f(x) = \max(0, x) \qquad (3)$$
The input value, in this case, is $x$.
A nine-layer fully convolutional model was built as the second model. This CNN contains nine convolutional layers, one global average pooling layer, and one activation layer. Convolutional neural networks, a type of feed-forward neural network commonly applied to image processing, use a grid-like structure to evaluate input. Figure 4 shows the complete CNN model described in this article.

Figure 4. Complete CNN Model.
The convolution layer is the most significant structural element of a CNN and carries the bulk of the network's computational load. After the convolution layers, global average pooling was employed. Global average pooling is a pooling technique intended to replace the fully connected layers of conventional CNNs. This layer produces one feature map for each category of the classification task. An activation layer follows.
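Global average pooling reduces each feature map to a single number, its mean. A minimal sketch in plain Python follows; the two small 2 × 2 feature maps are made-up values used only to show the operation.

```python
def global_average_pooling(feature_maps):
    """Collapse each 2D feature map (one per channel) to the mean of its values."""
    pooled = []
    for fmap in feature_maps:
        total = sum(sum(row) for row in fmap)
        count = sum(len(row) for row in fmap)
        pooled.append(total / count)
    return pooled

# Two hypothetical 2 x 2 feature maps, one per category
maps = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[0.0, 0.0], [0.0, 8.0]],
]

print(global_average_pooling(maps))  # [2.5, 2.0]
```

With one feature map per category, the pooled vector has one entry per class and can be passed straight to the final activation, which is why no fully connected layer is needed.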
VGG stands for Visual Geometry Group, and its design is a standard deep CNN with many layers. The term “deep” refers to the number of layers: VGG-16 has 16 weight layers, of which 13 are convolutional and 3 are fully connected. The design of VGG nets builds on the essential properties of CNNs, which serve as their cornerstone.
The model described in this work consists of 13 convolutional layers, five max-pooling layers, and two fully connected (dense) layers. The model's input size is 224 × 224. ‘relu’ is the activation function of the VGG net's subsequent layers. Figure 5 shows the construction of this specific model. The number of filters doubles each time one moves up to the next stack of convolution layers.
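The effect of the five max-pooling layers on a 224 × 224 input can be traced with simple arithmetic. The sketch below assumes 2 × 2 pooling with stride 2, convolutions that preserve spatial size, and the filter counts of the standard VGG-16 design (64, 128, 256, 512, 512); none of these details are stated for the model in this work, so they are illustrative only.

```python
def vgg_feature_shapes(input_size=224, filters=(64, 128, 256, 512, 512)):
    """Shape (height, width, channels) after each convolution stack and its 2 x 2 pooling."""
    shapes = []
    size = input_size
    for n_filters in filters:
        size //= 2  # each max-pooling layer halves height and width
        shapes.append((size, size, n_filters))
    return shapes

for shape in vgg_feature_shapes():
    print(shape)
# (112, 112, 64)
# (56, 56, 128)
# (28, 28, 256)
# (14, 14, 512)
# (7, 7, 512)
```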

Figure 5. VGG + CNN Model Design.
The recently developed discipline of explainable artificial intelligence (XAI) attempts to explain machine learning (ML) models in terms that people can comprehend. There are two options for creating an explainable AI model. First, one may explain every image individually. Second, by combining LIME and SHAP, one can build a comprehensive explanation system for the images. Here, using the prediction data gathered from the proposed model, we developed two explanatory systems.
Explainer 1: LIME
LIME was used in the design of Explainer 1. Local Interpretable Model-agnostic Explanations (LIME) is a popular method for deriving meaning from machine learning model predictions, particularly in image classification. The inputs to Explainer 1 are the test instance and an explainable model. Explainer 1 is composed of three parts: segment creation, boundary creation, and heat map generation. The first step is to separate the supplied image into informative sections, or segments. Any segmentation algorithm, including superpixel segmentation, can be used.
Each segment plays a distinct role in the overall image. Boundaries are constructed around the segments to isolate the influential parts that affect predictions, and outlining each segment distinguishes it visually and aids precise analysis. A model is also required, either a pre-trained model or one trained on the provided dataset, such as a convolutional neural network. Custom LIME then helps describe how the model arrives at its predictions. Segments are altered by removal or obscuration while the rest of the image is kept unchanged, so that changes in the output can be tracked. Predicting on the perturbed images and comparing the results with the original prediction yields each segment's relevance. A heat map then visualises the segment contributions, with colour intensity indicating significance.
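The perturbation step described above can be sketched with a toy example. This is a simplified one-segment-at-a-time occlusion, not the LIME library itself, which fits a local surrogate model over many random perturbations; `toy_model`, the 4 × 4 image, and the segment map are all hypothetical stand-ins.

```python
def toy_model(image):
    """Hypothetical classifier score: mean brightness of the top-left quadrant."""
    h, w = len(image), len(image[0])
    region = [image[r][c] for r in range(h // 2) for c in range(w // 2)]
    return sum(region) / len(region)

def occlude(image, segments, segment_id, fill=0.0):
    """Copy of the image with one segment obscured and the rest unchanged."""
    return [[fill if segments[r][c] == segment_id else image[r][c]
             for c in range(len(image[0]))]
            for r in range(len(image))]

def segment_relevance(image, segments, predict):
    """Drop in the model score when each segment is obscured in turn."""
    base = predict(image)
    ids = sorted({s for row in segments for s in row})
    return {sid: base - predict(occlude(image, segments, sid)) for sid in ids}

image = [[0.9, 0.8, 0.1, 0.1],
         [0.7, 0.9, 0.1, 0.2],
         [0.1, 0.1, 0.2, 0.1],
         [0.1, 0.2, 0.1, 0.1]]
segments = [[0, 0, 1, 1],
            [0, 0, 1, 1],
            [2, 2, 3, 3],
            [2, 2, 3, 3]]

relevance = segment_relevance(image, segments, toy_model)
most_important = max(relevance, key=relevance.get)
print(most_important)  # 0
```

The relevance values per segment are what a heat map would colour: segment 0 carries the whole score here, while obscuring the other segments leaves the prediction unchanged.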
Explainer 2: SHAP
To develop and implement Explainer 2, SHAP was used with data from the stacking model's output. With the “blur” option, the top 4 predicted classes were identified for the test images. SHAP is a computational framework rooted in cooperative game theory, and Shapley values were computed to elucidate each feature's contribution to the predictions. In the context of machine and deep learning, Shapley values offer insight into how individual components affect a prediction. The method quantifies the impact of each attribute by averaging the difference in score between coalitions with and without that attribute, weighted by how often each coalition occurs. Equation (4) gives the Shapley value $\phi_i$ in its standard form:
$$\phi_i(v) = \sum_{S \subseteq N \setminus \{i\}} \frac{|S|!\,(|N| - |S| - 1)!}{|N|!}\,\bigl(v(S \cup \{i\}) - v(S)\bigr) \qquad (4)$$
Here, $N$ is the set of all features, $S$ is a subset of features that excludes feature $i$, and $v(S)$ is the score obtained using only the features in $S$.
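For a small number of features, the averaging of marginal contributions behind the Shapley value can be computed exactly. The sketch below uses the standard Shapley formula on a made-up three-player payoff function; `toy_value` and its numbers are assumptions for illustration and have nothing to do with the SHAP library's image explainer.

```python
from itertools import combinations
from math import factorial

def shapley_values(players, value):
    """Exact Shapley value of each player for the payoff function `value`."""
    n = len(players)
    phi = {}
    for i in players:
        others = [p for p in players if p != i]
        total = 0.0
        for k in range(n):  # coalition sizes 0 .. n-1 drawn from the other players
            for subset in combinations(others, k):
                s = frozenset(subset)
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                total += weight * (value(s | {i}) - value(s))
        phi[i] = total
    return phi

def toy_value(coalition):
    """Hypothetical payoff: additive contributions plus a bonus when a and b cooperate."""
    base = {"a": 1.0, "b": 2.0, "c": 3.0}
    payoff = sum(base[p] for p in coalition)
    if "a" in coalition and "b" in coalition:
        payoff += 1.0
    return payoff

phi = shapley_values(["a", "b", "c"], toy_value)
print({k: round(v, 6) for k, v in phi.items()})  # {'a': 1.5, 'b': 2.5, 'c': 3.0}
```

The interaction bonus is split equally between a and b, and the values sum to the payoff of the full coalition (7.0). Exact enumeration grows exponentially with the number of features, which is why practical tools rely on approximations.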
DRNet13 is a deep neural model designed for detecting DR within explainable artificial intelligence (AI) systems. The model is based on a hybrid stacked architecture that combines multiple deep learning techniques, with the aim of achieving high accuracy in DR detection while providing interpretable results for healthcare professionals. The model's architecture includes several convolutional neural network layers followed by recurrent neural network layers. The retinal images are fed into the DRNet13 model and pass through the following layers:
Convolutional layer 1: initiates feature extraction with 64 filters.
Convolutional layer 2: enhances feature detection with 128 filters.
Convolutional layer 3: enhances feature detection with 256 filters.
Normalization layer: normalizes the outputs from the preceding convolutional layer.
Flatten layer: transforms the 3D tensor from the normalization layer into a 1D vector.
Fully connected layer 1: uses the flattened vector for initial classification, reducing dimensions to 1024 nodes.
Dropout layer: randomly drops neurons to prevent overfitting while maintaining the dimensionality.
Fully connected layer 2: continues the classification process, decreasing nodes from 1024 to 512.
Output layer: determines the class probabilities using a softmax function.
Its robust architecture and effective training contribute to the accurate identification of DR, offering potential advancements in early diagnosis and treatment. The model's performance underscores its significance as a valuable tool in the realm of medical image analysis for improved healthcare outcomes.
Table 1 displays preprocessed retinal images for assessing the probability of Diabetic Retinopathy using CNN. The CNN comprises multiple layers, starting with a Convolutional layer employing 64 filters and ReLU activation for detecting low-level features. A subsequent Pooling layer reduces feature map dimensions to 112 × 112 × 64, aiding computational efficiency. Another Convolutional layer with 256 filters and ReLU activation seeks complex features, followed by normalization for better model training. The flattened output is processed through Fully Connected (FC) layers, reducing dimensionality and learning abstract representations. To prevent overfitting, a Dropout layer is applied, and the Softmax function ensures interpretable class probabilities, summing to 1.
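The softmax step mentioned above turns the raw scores of the output layer into probabilities that sum to 1. A minimal sketch follows; the five logits are made-up values, and the class names follow the five DR severity levels used in this study.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtracting the maximum keeps exp() numerically stable
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

labels = ["No DR", "Mild", "Moderate", "Severe", "Proliferative"]
logits = [2.0, 0.5, 1.0, -1.0, 0.0]  # hypothetical output-layer scores

probs = softmax(logits)
best = labels[probs.index(max(probs))]
print(best, round(sum(probs), 6))  # No DR 1.0
```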
DRNet13: A Deep Neural Model for DR Detection.
This section presents the DR detection results. It demonstrates the DR explainer, which generates LIME segments, boundaries, and heatmaps, and it also demonstrates the SHAP approach. The data generation parameters are listed together with the types of DR and their severity levels. The section then compares the accuracy of the various models and their F1 score, precision, and recall for each class. No additional image processing or analysis was done for this investigation, and it was extremely difficult to obtain high accuracy with such a small number of features.
The parameters applied for the Image Data Generator function for the “train” and “validation” datasets are displayed in Table 2. The first parameter is the rescaling factor, by which each pixel in the supplied image is multiplied. Here, rescaling was specified as “1./255”. Since 255 is the maximum pixel value that can be represented, scaling by 1./255 maps all pixel values from [0, 255] to [0, 1]. A shear transformation distorts an image. Unlike rotation, a shear fixes one axis and stretches the image along the other at the shear angle, which produces an elongation of the image that does not occur under rotation. The shear range parameter expresses the shear angle in degrees.
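The rescaling and shear operations described above can be sketched in plain Python. The pixel values and the 20 degree shear angle are illustrative assumptions, and the shear shown is the generic textbook form (one axis fixed, the other shifted in proportion to it), which may differ in sign or convention from the transform applied by a particular image library.

```python
import math

def rescale(pixels, factor=1.0 / 255):
    """Multiply every pixel by `factor`, mapping [0, 255] to [0, 1]."""
    return [[value * factor for value in row] for row in pixels]

def shear_point(x, y, angle_deg):
    """Horizontal shear: y stays fixed, x is shifted in proportion to y."""
    return (x + math.tan(math.radians(angle_deg)) * y, y)

image = [[0, 128, 255], [64, 32, 16]]   # hypothetical 2 x 3 grayscale patch
scaled = rescale(image)
print(scaled)

# Points on the fixed axis (y = 0) do not move; points farther from it move more.
for point in [(1.0, 0.0), (1.0, 1.0), (1.0, 2.0)]:
    print(point, "->", shear_point(*point, angle_deg=20))
```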
Parameters Required for Data Generation.
The zoom range parameter produces a random zoom factor. Zoom levels below 1.0 result in magnification, while zoom levels above 1.0 result in demagnification. A zoom range of 0.1 was used in this case. When the rotation range parameter is supplied, the generated image is randomly rotated by an angle between -rotation range and +rotation range. The brightness range parameter allows a brightness shift to be selected at random from a range of values. A brightness of 0.0 denotes complete darkness, whereas a brightness of 1.0 denotes the maximum possible brightness. Here, the brightness range is [0.3, 1]. Four fill modes are available: “constant”, “nearest”, “reflect”, and “wrap”. The “nearest” mode is the default. The chosen mode is used to fill points that lie outside the boundaries of the input. Two kinds of flips were applied in the generator, a horizontal flip and a vertical flip. With these parameters, the input images are randomly flipped along the corresponding axis.
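The augmentation settings stated above can be collected in a configuration and sampled as follows. This is a minimal sketch in plain Python: the dictionary keys mirror the argument names of the Keras `ImageDataGenerator`, the "nearest" fill mode is assumed because the text only names it as the default, and the shear and rotation values are omitted because the text does not state them. The helper functions are hypothetical and only imitate the sampling behaviour.

```python
import random

# Values taken from the text; keys follow Keras ImageDataGenerator argument names.
augmentation_config = {
    "rescale": 1.0 / 255,
    "zoom_range": 0.1,
    "brightness_range": [0.3, 1.0],
    "fill_mode": "nearest",       # assumed: the text only says this is the default
    "horizontal_flip": True,
    "vertical_flip": True,
    # shear_range and rotation_range are used in the study, but their values
    # are not given in the text, so they are left out here.
}

def zoom_interval(zoom_range):
    """A single float z is expanded to the interval [1 - z, 1 + z]."""
    return (1.0 - zoom_range, 1.0 + zoom_range)

def sample_augmentation(config, rng):
    """Draw one random set of augmentation parameters from the configuration."""
    low, high = zoom_interval(config["zoom_range"])
    b_low, b_high = config["brightness_range"]
    return {
        "zoom": rng.uniform(low, high),            # < 1.0 magnifies, > 1.0 shrinks
        "brightness": rng.uniform(b_low, b_high),
        "flip_horizontal": config["horizontal_flip"] and rng.random() < 0.5,
        "flip_vertical": config["vertical_flip"] and rng.random() < 0.5,
    }

def flip_horizontal(image):
    """Mirror each row (left-right flip)."""
    return [row[::-1] for row in image]

def flip_vertical(image):
    """Reverse the row order (up-down flip)."""
    return image[::-1]

rng = random.Random(42)
print(sample_augmentation(augmentation_config, rng))
print(flip_horizontal([[1, 2], [3, 4]]), flip_vertical([[1, 2], [3, 4]]))
```

With a working Keras installation, the same dictionary could be passed as keyword arguments to the generator constructor, although that step is not shown here.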
Table 3 presents a comparison of diabetic retinopathy severity classification models. It lists the severity levels, ranging from no DR to proliferative DR, together with the corresponding diabetic retinopathy types and the number of data samples. Key performance metrics such as ROC (Receiver Operating Characteristic), precision, and recall scores are also provided for each severity level, reflecting the models’ ability to classify diabetic retinopathy cases accurately. These metrics indicate how effective the classification models are at each severity level and help in assessing their overall performance and reliability in clinical settings.
Comparison of Diabetic Retinopathy Severity Classification Models.
Figure 6(a) and (b) show a sample of predicted data along with the corresponding original images. The LIME technique identifies the image regions that most influence a model's image classification, providing insight into its decision-making process. Focusing on these regions reveals how the model interprets images and makes complex feature-model interactions more transparent. LIME is therefore important for understanding and validating image classification outcomes, and it uses heatmaps to select the areas of interest. The generated heatmap visualizes pixel importance using a colour gradient. A correct prediction relies on the model's ability to identify the crucial regions. To refine the explanation, a heatmap of relative component importance can be produced, which further clarifies how the image classification system works.
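The idea behind LIME can be sketched on a toy example: segments of the image are switched on and off, the classifier is queried on each perturbed image, and a locally weighted linear surrogate is fitted whose coefficients rank the segments. The sketch below is an assumption-laden illustration in plain Python, not the `lime` library and not the study's pipeline; the 4 × 4 image, the four quadrant segments, the `toy_predict` classifier, the kernel width, and the exhaustive enumeration of masks (instead of random sampling) are all hypothetical choices.

```python
import itertools
import math

def solve(matrix, rhs):
    """Solve a linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x

def lime_segment_weights(image, segments, predict, n_segments,
                         kernel_width=0.75, ridge=1e-9):
    """Fit a weighted linear surrogate over on/off segment masks."""
    rows, targets, weights = [], [], []
    for mask in itertools.product([0, 1], repeat=n_segments):
        perturbed = [[px if mask[segments[r][c]] else 0.0
                      for c, px in enumerate(row)]
                     for r, row in enumerate(image)]
        distance = 1.0 - sum(mask) / n_segments   # fraction of segments removed
        rows.append([1.0] + [float(v) for v in mask])  # leading 1.0 = intercept
        targets.append(predict(perturbed))
        weights.append(math.exp(-(distance ** 2) / kernel_width ** 2))
    dim = n_segments + 1
    # Weighted normal equations with a tiny ridge term for numerical stability.
    xtwx = [[sum(w * x[i] * x[j] for x, w in zip(rows, weights))
             + (ridge if i == j else 0.0) for j in range(dim)]
            for i in range(dim)]
    xtwy = [sum(w * x[i] * y for x, y, w in zip(rows, targets, weights))
            for i in range(dim)]
    return solve(xtwx, xtwy)[1:]   # drop the intercept

def toy_predict(img):
    """Hypothetical classifier score driven mainly by the top-left quadrant."""
    top_left = sum(img[r][c] for r in range(2) for c in range(2)) / 4
    bottom_right = sum(img[r][c] for r in range(2, 4) for c in range(2, 4)) / 4
    return 0.6 * top_left + 0.1 * bottom_right

image = [[1.0] * 4 for _ in range(4)]
segments = [[0, 0, 1, 1], [0, 0, 1, 1], [2, 2, 3, 3], [2, 2, 3, 3]]
importance = lime_segment_weights(image, segments, toy_predict, n_segments=4)
print([round(w, 3) for w in importance])
```

The resulting coefficients play the role of the heatmap values: the segment with the largest weight is the region the surrogate considers most influential for the prediction.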

Diabetic Retinopathy Explainer LIME. (a) LIME: Segments Creation; (b) LIME: Boundary Creation; (c) LIME: Heatmap Creation.
Figure 7 displays the plot for five different images, explaining five outputs (the five levels of DR, 0-4). The influence of individual pixels is shown through a colour-coded scheme: blue pixels decrease the model's output, while red pixels increase it. The input images, shown on the left, appear dark due to the prevalence of pixels exceeding 0. These images also serve as translucent grayscale backgrounds behind the corresponding explanations. The sum of the SHAP values equals the difference between the expected model output and the actual model output for the image. Notably, labels with a low-confidence prediction exhibit a pink area comparable to that of the correct label. This visual representation supports a nuanced understanding of pixel impact, helps in following the model's decision-making process, and highlights areas of uncertainty, thereby contributing to the transparency and interpretability of the model's predictions.
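The additive property of SHAP values mentioned above can be verified on a small example by computing exact Shapley values. The sketch below is plain Python and does not use the `shap` library; the three-feature `toy_model`, the input, and the all-zero baseline are hypothetical, and exact enumeration is only feasible for a handful of features, whereas image explainers rely on approximations.

```python
import itertools
import math

def shapley_values(value_fn, n_features):
    """Exact Shapley values by enumerating every coalition of the other features."""
    phi = [0.0] * n_features
    for i in range(n_features):
        others = [j for j in range(n_features) if j != i]
        for k in range(len(others) + 1):
            weight = (math.factorial(k) * math.factorial(n_features - k - 1)
                      / math.factorial(n_features))
            for subset in itertools.combinations(others, k):
                coalition = set(subset)
                gain = value_fn(coalition | {i}) - value_fn(coalition)
                phi[i] += weight * gain
    return phi

def toy_model(x):
    """Hypothetical model with one interaction term between features 0 and 2."""
    return 2.0 * x[0] - x[1] + x[0] * x[2]

x = [0.8, 0.2, 0.5]          # hypothetical input ("pixels")
baseline = [0.0, 0.0, 0.0]   # hypothetical reference input

def value_fn(coalition):
    """Model output when only the features in `coalition` take their real value."""
    masked = [x[j] if j in coalition else baseline[j] for j in range(len(x))]
    return toy_model(masked)

phi = shapley_values(value_fn, n_features=3)
print([round(p, 4) for p in phi])
print(sum(phi), toy_model(x) - toy_model(baseline))
```

Positive values correspond to the red pixels in the figure (they push the output up) and negative values to the blue pixels; their sum matches the gap between the output for the input and the output for the baseline.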

Diabetic Retinopathy Explainer SHAP Result.
The comparative analysis contrasts the accuracy of four models: CNN + VGG, robust stacked, CNN, and versed. For each of classes 0, 1, 2, 3, and 4, the models are also compared in terms of F1 score, precision, and recall, and the results are shown graphically. An overview of the comparative evaluation of the four models is given below. Several standard evaluation metrics are used to assess model performance. Four of them are employed here: accuracy, precision, recall, and F1-score.
The recall value for a capable classification model should ideally be high. For a good classification model, precision and recall should both be equal to 1, which means that the numbers of false positives (FP) and false negatives (FN) should both be zero. A measure that takes precision and recall into account simultaneously is therefore needed, and the F1-score serves this purpose.
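The four metrics can be computed from the predicted and true labels as in the following sketch. The label lists are hypothetical and are not the study's data; the definitions are the standard ones, with precision = TP / (TP + FP), recall = TP / (TP + FN), and F1 as the harmonic mean of the two.

```python
def per_class_metrics(y_true, y_pred, label):
    """Return (precision, recall, f1) for one class, treating it as 'positive'."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)

# Hypothetical labels for the five DR classes (0 to 4).
y_true = [0, 0, 0, 0, 1, 1, 2, 2, 3, 4]
y_pred = [0, 0, 0, 1, 1, 0, 2, 2, 2, 2]

print("accuracy:", accuracy(y_true, y_pred))
for label in range(5):
    p, r, f1 = per_class_metrics(y_true, y_pred, label)
    print(f"class {label}: precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

In this toy example classes 3 and 4 are never predicted, so their precision, recall, and F1 are all zero even though the overall accuracy is moderate, which is why per-class metrics are reported alongside accuracy.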
The accuracy of the various predictive models is compared in Table 4. The accuracy of the custom-built versed model is 71.31%. The accuracies of the CNN and CNN-VGG net models were 73.50% and 78.14%, respectively. The accuracy of the stacked model was 76.78%, which is lower than that of the CNN-VGG hybrid model. Accuracy may not be the best indicator to use if the classes in the dataset are not evenly distributed. Consequently, accuracy alone may not identify the most effective of these four approaches. The versed, CNN, and VGG net models achieve good accuracy values, but their model efficiency is very low, which suggests that these models can appear accurate while still making incorrect predictions. Additional evaluation metrics were therefore employed.
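The effect of an uneven class distribution on accuracy can be illustrated with a deliberately trivial classifier. The class counts below are hypothetical and do not reproduce the APTOS 2019 distribution; the point of the sketch is only that a model which always predicts the majority class can score a high accuracy while failing on every other class.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)

def recall_for(y_true, y_pred, label):
    """Share of the samples of `label` that were predicted as `label`."""
    relevant = [p for t, p in zip(y_true, y_pred) if t == label]
    return sum(1 for p in relevant if p == label) / len(relevant)

# Hypothetical imbalanced test set: 80 samples of class 0, few of the others.
y_true = [0] * 80 + [1] * 5 + [2] * 10 + [3] * 3 + [4] * 2
y_pred = [0] * len(y_true)          # trivial model: always predict "no DR"

acc = accuracy(y_true, y_pred)
recalls = [recall_for(y_true, y_pred, label) for label in range(5)]
macro_recall = sum(recalls) / len(recalls)

print("accuracy:", acc)              # looks acceptable
print("per-class recall:", recalls)  # only class 0 is ever detected
print("macro recall:", macro_recall)
```

Averaging recall over classes gives every severity level the same weight, which exposes the failure that the accuracy figure hides.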
Accuracy Comparison of Different Models.
Figure 8 compares the performance of the models on the input data. Figure 8(a) shows that nearly all models correctly predict class 0, or “no-DR,” in most cases. The highest precision value, 0.9650, was attained by the VGG net model. The results of the other models for class 0 were also encouraging. Figure 8(b) shows that the F1-score and recall are lower than the precision, indicating that CNN fared poorly in class 1 prediction. A model with poor recall but high accuracy and precision returns fewer results, but its predictions are more likely to be correct when compared with the training labels.

Comparison of Various Models with F1 Score, Precision, and Recall for Class 0 and Class 1.
Figure 9 compares the models in terms of F1 score, precision, and recall. Figure 9(b) and (c) show that Class 3 and Class 4 could not be predicted by the CNN or versed models, respectively. Classes 3 and 4 are challenging for CNN and similar models because of their nuanced characteristics: they exhibit intricate pathological variations that traditional models struggle to discern accurately. This complexity hinders precise prediction and classification of these conditions. Figure 9(a) displays the results of the four DR detection methods for Class 2. Applied to the test data, the CNN-VGG net and robust stacked models showed good precision values, both achieving a precision of 0.40, whereas the other two models failed to predict this class effectively. Class 4 showed a similar pattern: only the stacked model successfully identified the class, while the other three models performed poorly. Despite its lower accuracy than the CNN-VGG hybrid model, the stacked model is considered reliable for detecting retinopathy because it tolerates both underfitting and overfitting and shows good precision and recall.

Comparison of Various Models with F1 Score, Precision, and Recall for Class 2, Class 3, and Class 4.
Table 5 reports the per-class accuracy of the CNN-VGG model in detecting diabetic retinopathy. Accuracy varies across classes, reflecting how well the model distinguishes the individual severity levels. Class 0 achieves the highest accuracy at 82.2%, indicating robust performance in identifying cases with no diabetic retinopathy. Classes 1 and 2 also reach notable accuracies of 79.3% and 76.8%, respectively. Classes 3 and 4 reach 75.1% and 77.2%, respectively, suggesting room for model refinement in identifying the more advanced stages of retinopathy.
Diabetic retinopathy (DR) poses a severe threat to individuals with diabetes, often leading to irreversible blindness by damaging retinal blood vessels and causing fluid leakage, which results in vision impairment. This study focuses on enhancing the interpretability and accuracy of a diabetic retinopathy detection system. A method was developed to identify five stages of diabetic retinopathy, supported by an explainable AI model designed for boundary, segment, and heatmap determination in retinal images. The combined model achieved an accuracy of 76.78%, addressing concerns of over- or under-fitting and improving prediction effectiveness. The highest precision value, 0.9650, was attained by the VGG net model. Class 0, which covers instances without diabetic retinopathy, is detected with the highest accuracy of 82.2%. Classes 1 and 2 also show noteworthy accuracies of 79.3% and 76.8%, respectively.
By employing a stacked model, which combines multiple AI models for enhanced performance, and incorporating explainable AI techniques, the study aims to provide transparent insights into the decision-making process of the detection system. By elucidating feature importance, interactions, and decision rules, the model becomes more trustworthy and acceptable to medical professionals, fostering collaboration between AI algorithms and healthcare practitioners in diagnosing and treating diabetic retinopathy. In summary, a stacked model coupled with explainable AI enhances interpretability, fosters trust, and facilitates collaborative advances in the diagnosis of diabetic retinopathy within the medical field.
Accuracy of each Class.
Footnotes
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
