Abstract
Diabetic Retinopathy (DR) is a leading cause of vision impairment worldwide. Early detection through automated analysis of retinal images is critical. Nonetheless, the centralized training of deep learning models presents privacy concerns owing to the sensitive nature of medical data. This paper investigates the use of the Federated Proximal (FedProx) algorithm for collaborative DR classification across distributed data sources, addressing the challenges of data heterogeneity. We enhance FedProx with momentum-based optimization techniques, including Adam, Shuffling SGD, SGD with momentum, and RMSProp, in order to accelerate convergence and improve generalization. Experimental results demonstrate that incorporating momentum techniques with FedProx significantly improves accuracy and reduces loss on a diabetic retinopathy dataset, with Shuffling SGD achieving the best performance at an accuracy of 92.63% and a loss of 0.22. These results highlight the potential of integrating proximal regularization and momentum strategies for privacy-preserving, robust, and effective DR detection in federated learning environments.
Introduction
Automated technologies have been shown to provide significant gains in accuracy in the detection of diabetic retinopathy, whereas older approaches have been shown to be significantly less dependable. These systems incorporate cutting-edge technologies such as computer vision, artificial intelligence, machine learning, and deep learning, with the aim of improving performance measures such as precision and accuracy. Automated grading methods can process vast amounts of data exceptionally quickly and with a high degree of accuracy.
Convolutional neural networks (CNNs) and capsule networks are two examples of the advanced techniques that have been incorporated by the researchers. These techniques have achieved remarkable performance in terms of accuracy and other parameters, which has enabled them to detect and classify diabetic retinopathy lesions from fundus images. An example of this would be a study that utilized a capsule network in their research work and reported an accuracy of 97.98% for the diagnosis of healthy retinas and approximately 98.64% for the stages of diabetic retinopathy (Kalyani et al., 2023).
The early identification of diabetic retinopathy is of utmost importance because of the rapid advancement of the condition towards later stages, which frequently results in blindness in diabetes patients, particularly in developed nations (Atwany et al., 2022). As a result of the degenerative nature of this condition, which is caused by high blood glucose levels that cause damage to the retina, patients’ vision can be preserved by early detection measures. In the beginning stages of the disease, when symptoms are less severe, the manual method of diagnosis is particularly expensive because it involves a significant amount of resources and is extremely challenging. It has been demonstrated that automated image analysis that makes use of a variety of deep learning techniques is more effective (Hemanth et al., 2018).
The advancements in telemedicine are tackling a variety of accessibility issues in the screening of diabetic retinopathy, which has resulted in increased screening rates and decreased visual loss. There are several programs that have been developed to enable primary care settings to do efficient screening of the retina using telemedicine. These programs can detect diabetic retinopathy, which can be a vision-threatening condition, and they also increase the number of patients who go in for annual retinal examinations (Cuadros & Bresnick, 2009). In spite of this, there are obstacles to the widespread adoption of these initiatives that are caused by financial constraints (Zimmer-Galler et al., 2015).
With regard to the administration of data, centralized databases raise a number of issues, despite the fact that they offer major benefits for diabetic retinopathy monitoring. A Pittsburgh system, for instance, showed how centralized patient data can considerably reduce the number of errors that occur during blood transfusions (Macivor et al., 2009). Centralization is thus a strategy that helps prevent errors. However, centralized systems also carry privacy and security difficulties. Integrating blockchain technology with conventional wireless sensor networks (WSNs) can increase data traceability and security against unwanted modifications. Blockchain technologies improve such systems by offering data integrity, privacy, and data longevity, whereas traditional WSN models suffer from problems such as error recurrence and a lack of traceability (Ramasamy et al., 2021).
When it comes to centralized medical data, one of the most significant challenges is striking a balance between preserving security and privacy standards while also fostering cooperation and data exchange among different entities. The challenges that are presented by intrusion detection systems are evident. Centralized models that are utilized for the management of security data have the potential to put systems at risk if they are not safe and structured in the appropriate manner (Frincke, 2000). Because of the constraints imposed by decentralized data sharing, the majority of businesses struggle with the human labor that is necessary to defend against malicious intent.
Reliable frameworks in machine learning are required for automated data analysis for secure centralized systems (Azam et al., 2023). This is necessary in order to effectively manage potential risks. It is important that these systems be constructed in such a way that they are able to manage massive datasets and enable easy interaction with pre-existing artificial intelligence frameworks. Additionally, they should guarantee data security by monitoring and analyzing it in real time.
The management of diseases and the early detection of diabetic retinopathy both bring substantial benefits to the healthcare domain. Centralization of data in healthcare also provides considerable benefits. On the other hand, they face a great deal of difficulty, particularly with regard to the privacy and security of their data. It appears that an integrated system that makes use of recent developments in artificial intelligence, telemedicine, and data exchange frameworks like blockchain technology would produce promising outcomes.
Federated Learning in Healthcare
A decentralized machine learning technique called federated learning allows multiple clients to train an algorithm collectively without sharing the underlying data. This is achieved via a collaborative training method. This is pertinent in the healthcare industry, which places a high priority on the privacy and security of patient information. Using this method, the algorithm is trained across several servers, each of which maintains its own local data without sharing it with any other servers. In order to maintain the confidentiality of sensitive patient information while also contributing to the overall effectiveness of the model, only model updates are provided for sharing.
The application of Federated Learning in the field of medical imaging offers a great deal of possible applications. A significant amount of sensitive information is involved in medical imaging, and this information is often stored in separate datasets across a number of different healthcare organizations. These institutions are able to work together to construct generalizable models by utilizing federated learning, which allows them to share model parameters rather than raw data. For a variety of medical imaging applications, such as illness detection and diagnosis, where substantial and varied datasets are used to increase the accuracy of the model, this is an extremely important consideration.
In the realm of healthcare, there is an increasing movement toward the Internet of Medical Things (IoMT), and federated learning is a technology that works well with this trend. Privacy and security of data are critical concerns (Al Khatib et al., 2024). By storing data locally, federated learning provides a solution to these difficulties, so organizations can reap the benefits of a shared model that has been trained on a wide range of data. While scalability is a known challenge, established works (Huba et al., 2022) have demonstrated scalable federated learning with reduced communication and computation overhead under diverse conditions. Additionally, successful federated learning deployments in the financial domain (Azzedin et al., 2023; Nevrataki et al., 2023) confirm its effectiveness in privacy-sensitive environments. In (Bhulakshmi & Rajput, 2024), the authors used a federated differential optimization strategy for diabetic retinopathy detection. New technologies such as edge computing and fog computing, which are helpful in real-time medical imaging applications, are also contributing to the improvement of federated learning (Shah et al., 2024).
In conclusion, Federated Learning offers a very promising approach that makes use of medical imaging data held across various institutions without compromising the privacy of patients. It makes it simpler to create more precise and robust models that can assist in imaging, which represents a significant advance for healthcare.
Why Momentum Techniques?
Within the realm of federated learning, momentum approaches hold significant importance due to their capacity to improve optimization and increase the speed at which convergence occurs. In the context of federated learning, where data privacy and decentralization present some especially challenging difficulties, these strategies are of the utmost relevance. The importance of optimization in Federated Learning cannot be overstated due to the fact that it has a direct influence on the efficiency and efficacy of machine learning models that have been trained from a variety of data sources.
A significant number of research voids have been discovered in the existing body of literature. Concerning the effective application of these methods in a federated learning environment, there is a recurrent theme of a lack of full comprehension. When it comes to heterogeneous environments, the strategies involve selecting the optimal parameters and configurations to use. With regard to the robustness of federated learning systems, particularly in the face of adversarial attacks or disruptions in communication networks, there is a large gap that has not been adequately explored.
Further empirical investigations are required in order to examine the effectiveness of different momentum strategies across diverse federated setups. Integrating momentum approaches with other optimization procedures could yield more robust solutions and hence improve the performance of federated learning. Federated learning also suffers from client drift due to heterogeneous local data, which can slow convergence. Momentum techniques help stabilize local updates by accumulating past gradients, smoothing the trajectory towards the global optimum.
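To make the idea of accumulating past gradients concrete, the following is a minimal sketch of a heavy-ball (SGD with momentum) update on a toy quadratic. The function names, the learning rate `lr`, the momentum coefficient `beta`, and the toy objective are illustrative assumptions and are not taken from the experiments in this paper.

```python
def sgd_momentum_step(weights, grads, velocity, lr=0.1, beta=0.9):
    """One heavy-ball update on flat lists of floats.

    The velocity is an exponentially weighted sum of past gradients,
    which smooths the direction of successive updates.
    """
    new_velocity = [beta * v + g for v, g in zip(velocity, grads)]
    new_weights = [w - lr * v for w, v in zip(weights, new_velocity)]
    return new_weights, new_velocity


def grad_quadratic(w):
    # Gradient of the toy objective f(w) = 0.5 * (w0^2 + 10 * w1^2).
    return [w[0], 10.0 * w[1]]


# Hypothetical starting point; the minimizer of f is the origin.
w = [1.0, 1.0]
v = [0.0, 0.0]
for _ in range(200):
    w, v = sgd_momentum_step(w, grad_quadratic(w), v, lr=0.02, beta=0.9)
```

In a federated setting, an update of this kind would be applied inside each client's local training loop; whether the velocity is reset or carried over between communication rounds is a design choice that this sketch does not address.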
Contributions
Carried out a comparative analysis of momentum-based techniques on a diabetic retinopathy dataset.
Provided novel insights into the effectiveness of momentum techniques in federated optimization.
Conducted a systematic comparison of momentum-based optimization in a federated environment.
Related Work
Federated Learning for Medical Image Analysis
Within the realm of medical image analysis, Federated Learning has emerged as a new and developing approach. When it comes to data privacy and security, it specifically addresses the difficulties that are presented. It is essential to make use of strategies that are based on momentum in order to improve the effectiveness and performance of FL frameworks in this particular domain.
Federated Learning allows multiple medical institutions to collaboratively train a model without sharing sensitive patient data, thus preserving privacy. This is highly relevant in medical image analysis, where data is distributed across various healthcare facilities. The federated learning approach leverages local model parameters, which in turn contribute to a global model, ensuring that data remains decentralized (Zhou et al., 2024).
The heterogeneity of data, which refers to the non-independent and identically distributed nature of data across numerous clients, is one of the most significant challenges that Federated Learning for medical imaging must overcome. Failure to effectively address the heterogeneous nature of the data may adversely impact the model's performance. To enhance the effectiveness of federated learning in medical imaging, various distinctive strategies have been developed alongside optimization techniques. While keeping the benefits of a federated global model, customized federated learning makes it possible to train a model in a manner that is tailored to the client, hence boosting the model's performance (Wicaksana et al., 2022). Some other approaches incorporate causal learning, blockchain and explainable AI to enhance model efficiency and interpretability (Mu et al., 2024). Methods that are based on reinforcement learning have been proposed in order to dynamically alter hyperparameters that are specific to the client. This solves the problems of instability that are caused by heterogeneous data distributions (Guo et al., 2022). These improvements demonstrate that the incorporation of momentum-based techniques and other sophisticated methods can make a substantial contribution to the process of overcoming problems in federated learning for medical image processing.
Momentum-Based Optimizers in Centralized and FL Settings
There has been a significant rise in the significance of momentum-based optimizers in the process of improving learning algorithms in both centralized and federated learning environments. Momentum optimizers, such as Nesterov Accelerated Gradient (NAG), are utilized in centralized contexts with the purpose of enhancing the convergence speed and robustness of model training. Nevertheless, the application of these optimizers in federated learning, where data is dispersed and privacy is a major concern, presents a distinct set of obstacles and potential.
Conventional optimizers might not be able to meet the requirements of federated learning because of its inherent characteristics, which include the heterogeneity of data and the overhead of communication. The incorporation of momentum-based strategies into federated learning frameworks has been the focus of recent research that has targeted this issue. One example of a method that exhibits the utilization of NAG among distributed workers and the aggregator is the Federated NAG approach. When compared to traditional approaches, the overall amount of time spent training has been cut by 11–70%, and the accuracy of learning has enhanced by 3–24 percent (Yang et al., 2022).
Within a federated learning environment, the momentum federated learning approach applies momentum gradient descent in the local update phases. Compared with conventional federated learning methods, this method not only establishes global convergence but also speeds up the convergence process, as demonstrated by the tests presented in (Liu et al., 2020). Momentum-based approaches also improve representation learning and clustering in unsupervised learning scenarios, as demonstrated by the Federated Momentum Contrastive Clustering framework. Additionally, this framework demonstrates flexibility to centralized settings by employing momentum contrastive clustering, which achieves a high level of accuracy on datasets (Miao & Koyuncu, 2024).
When momentum-based optimizers are incorporated into federated learning, these developments represent substantial strides toward improving the efficiency and scalability of the process, despite the fact that challenges still exist. One of the most important areas for continuing study is the maintenance of model accuracy across a wide range of data. Other important concerns include striking a balance between the speed of convergence and the efficiency of communication.
DR Detection Models
An innovative advance in the field of medical image processing is the utilization of diabetic retinopathy detection models which include CNNs and EfficientNet inside a federated learning environment. These models are effective because they improve accuracy while also protecting individuals’ privacy. The processing of visual data, such as fundus images, by CNNs makes them exceptionally well-suited for the identification of DR. This allows for more accurate and exact diagnosis. This is of the utmost importance for the early diagnosis of diabetic retinopathy, which can help avert blindness in diabetic patients. It has been possible to attain a high level of accuracy by developing a number of CNN-based models that have enhanced activation functions. The accuracy of a model called ResNet152 was 99.41% when it was applied to the Kaggle dataset (Bhimavarapu & Battineni, 2023).
Federated learning offers considerable advantages by making it possible to train models across a number of different distributed datasets. This technique ensures that sensitive medical data remains localized within each participating institution, while the global model is enhanced through collaborative training. In particular, when it comes to addressing privacy problems connected with centralized patient data, numerous studies have demonstrated that integrating CNN architectures with FL frameworks is both useful and feasible (Zhou et al., 2024).
Many federated learning environments have incorporated EfficientNet, which is well known for its scalable architecture. Both in handling data heterogeneity and in obtaining high accuracy in medical image analysis, EfficientNet-B0 with federated learning methods has demonstrated superior performance compared to other models such as ResNet. In addition, federated learning frameworks extend outside the medical area to utilize the benefits of CNNs in a variety of applications. These frameworks make it possible to conduct robust training across a wide variety of data sources without requiring data centralization, which helps to maintain privacy while also improving model generalization.
Limitations in Current FL Optimization Strategies
Federated learning is a collaborative method that allows several clients to train a shared model without transferring raw data, hence protecting the confidentiality of the data. On the other hand, this strategy presents a number of important challenges in terms of optimization strategies. The reliance on centralized optimization approaches, which might result in a single point of failure, is one of the most significant limitations. The use of decentralized approaches presents challenges in terms of the synchronization of models, the effectiveness of communication, and the management of data distributions. When using a decentralized approach, it is necessary to adapt the communication protocol in order to guarantee that nodes are able to work together without the need for a central coordinator (Gao et al., 2023).
The heterogeneity that results from the presence of non-IID data across a variety of devices is another challenge that Federated Learning needs to deal with. It makes it harder to ensure that the trained model performs well on all the different data distributions. Heterogeneity-Aware Clustered Client Selection, an approach suggested in (Wolfrath et al., 2022), clusters clients according to similarities in their data distributions in order to improve convergence. Such methods still need to handle privacy concerns when estimating data distributions, and they require effective techniques to manage device dropout.
In federated systems, one of the most significant issues is the preservation of privacy, which requires the system to strike a compromise between the correctness of the model and the privacy of the data. Several methods, including multiparty computation and privacy-preserving aggregation, have been included into federated systems in order to achieve computation that is both efficient and secure.
Another significant challenge is the efficiency of communication between nodes. Since model updates are communicated regularly, the system needs to ensure that this exchange does not significantly reduce training efficiency. A key focus is therefore the development of efficient communication architectures (Shanmugam et al., 2023). While federated learning offers a promising approach to distributed model training, issues of centralization, client selection strategies, privacy preservation, and communication efficiency also need to be addressed. Table 1 presents gaps in existing work and our contributions.
Proposed Methodology
Problem Definition
The purpose of this research is to investigate the process of automating the classification of DR from a collection of fundus photos by utilizing the Federated Proximal (FedProx) algorithm with momentum techniques for federated learning. The following configuration is used:
Configuration: The system is configured with five clients and a central server that is responsible for coordinating the training process.
Data Distribution and Model Architecture: A subset of the dataset is held by each client. For the purpose of DR classification, a CNN is utilized as the global model. Proliferative, Mild, Moderate, and Severe are some of the prediction classes that are utilized.
Techniques utilized for optimization: Momentum-based optimization techniques are incorporated into the research, namely the Adam optimizer, the RMSProp optimizer, the SGD with momentum optimizer, and the Shuffling SGD optimizer.
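Of the four optimizers listed, Shuffling SGD is the least standardized name. A minimal sketch is given below under the assumption that it refers to SGD with random reshuffling, in which each epoch draws a fresh permutation of the local samples and visits every sample exactly once. The function `shuffling_sgd`, the toy regression data, and all hyperparameters are hypothetical and are not the configuration used in the experiments.

```python
import random


def shuffling_sgd(samples, w0, grad_fn, lr=0.05, epochs=20, seed=0):
    """SGD with random reshuffling on a scalar parameter.

    Each epoch draws a new permutation and visits every sample once
    (sampling without replacement), unlike plain SGD with replacement.
    """
    rng = random.Random(seed)
    w = w0
    order = list(range(len(samples)))
    for _ in range(epochs):
        rng.shuffle(order)  # fresh permutation for this epoch
        for i in order:
            w -= lr * grad_fn(w, samples[i])
    return w


def squared_error_grad(w, sample):
    # Gradient of (w * x - y)^2 with respect to w.
    x, y = sample
    return 2.0 * (w * x - y) * x


# Hypothetical noiseless data generated from y = 3 * x.
data = [(x, 3.0 * x) for x in (0.5, 1.0, 1.5, 2.0)]
w_hat = shuffling_sgd(data, 0.0, squared_error_grad)
```

On this toy problem the estimate `w_hat` approaches the true slope of 3; in the federated setting, the same per-epoch reshuffling would be applied to each client's local data.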
FedProx Algorithm and Mathematical Modeling
FedProx (Figures 1 and 2) is an extension of FedAvg that modifies the local client objective. It is designed to address the issues that arise in distributed machine learning when data is heterogeneous across clients. As part of the local training process, each client solves the optimization problem given below rather than minimizing the typical empirical risk. A proximal term serves as a regularizer, ensuring that the local models do not diverge from the global model by a significant margin. This proximal term helps to stabilize training and accelerate convergence.
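In the basic federated learning setup, we have:
N devices/clients, indexed by n = 1, 2, …, N.
Each client n has local data $D_n$ and a local objective $F_n(w)$, the empirical risk on $D_n$. The goal is to minimize the global objective function
$$\min_{w} F(w) = \sum_{n=1}^{N} p_n F_n(w),$$
where the weights $p_n \ge 0$ sum to one and are typically proportional to $|D_n|$.
Unlike Federated Averaging (FedAvg), FedProx adds a proximal term to stabilize training when data is non-IID. Each client solves
$$\min_{w} h_n(w; w^t) = F_n(w) + \frac{\mu}{2} \lVert w - w^t \rVert^2,$$
where $w^t$ is the global model at round $t$ and $\mu \ge 0$ controls the strength of the proximal term.
This means each client performs local optimization not just on its loss, but also on staying close to the global model.
The training procedure is as follows:
Server initialization: the server initializes the global model $w^0$.
For each round $t = 0, 1, \dots, T-1$:
The server selects a subset $S_t$ of clients.
The server sends $w^t$ to each selected client.
Each client $n \in S_t$ approximately solves $\min_{w} h_n(w; w^t)$ using SGD or other local solvers, yielding $w_n^{t+1}$.
The server aggregates the values: $w^{t+1} = \frac{1}{|S_t|} \sum_{n \in S_t} w_n^{t+1}$.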
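The local proximal update and the server aggregation described above can be sketched as follows for a scalar model. This is an illustration under stated assumptions rather than the implementation used in the experiments: the toy least-squares loss, the names `fedprox_local_update` and `fedprox_round`, the values of `mu`, `lr`, and `steps`, and the use of an unweighted average with full participation are all assumed for the example.

```python
def fedprox_local_update(w_global, data, mu=0.1, lr=0.05, steps=50):
    """Approximately minimize F_n(w) + (mu / 2) * (w - w_global)^2.

    F_n is a toy mean-squared-error loss for the scalar model y = w * x.
    """
    w = w_global
    for _ in range(steps):
        grad_loss = sum(2.0 * (w * x - y) * x for x, y in data) / len(data)
        grad_prox = mu * (w - w_global)  # gradient of the proximal term
        w -= lr * (grad_loss + grad_prox)
    return w


def fedprox_round(w_global, client_data, mu=0.1):
    """One communication round: all clients train, the server averages."""
    local_models = [fedprox_local_update(w_global, d, mu) for d in client_data]
    return sum(local_models) / len(local_models)


# Two hypothetical clients whose local optima differ (slopes 2 and 4),
# standing in for heterogeneous data.
clients = [
    [(1.0, 2.0), (2.0, 4.0)],
    [(1.0, 4.0), (2.0, 8.0)],
]
w = 0.0
for _ in range(30):
    w = fedprox_round(w, clients)
```

Setting `mu=0.0` in this sketch removes the proximal pull, so each client converges to its own local optimum; a larger `mu` keeps the local model closer to the global model it started from.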
Dataset Description
For the purpose of our research, we made use of a publicly available dataset on Kaggle. It is made up of high-resolution fundus images of both eyes of patients. The primary purpose of the dataset is to make it easier to perform automated detection and categorization of diabetic retinopathy. The severity of diabetic retinopathy is indicated by a grade ranging from 0 to 4 that is assigned to each individual photograph. There are five severity levels: 0 (no DR), 1 (mild), 2 (moderate), 3 (severe), and 4 (proliferative DR). The initial dataset is imbalanced. In an effort to enhance the performance of the model, we applied a number of preprocessing steps, including CLAHE, and performed data augmentation.
Experimental Setup
In order to evaluate the performance of momentum-based techniques within the FedProx framework, we conducted experiments using a publicly available diabetic retinopathy dataset on Kaggle. All the images were resized, and augmentation and preprocessing techniques were applied. A federated environment was constructed with five clients. The use of five clients was intended to simulate a multi-institutional clinical setting under controlled experimental conditions and to ensure manageable computational complexity. To reflect realistic deployment, the dataset was partitioned using a non-IID strategy based on label distribution skew: each client was assigned a distinct class distribution and dataset size. FedProx is employed to handle this non-IID setting because it introduces a proximal constraint that stabilizes local updates. This setting is designed to simulate a collaborative clinical environment in which sensitive data remain localized while enabling collective model training. The federated training is carried out with each client performing local epochs with a batch size of 32. Model updates are aggregated at the server following the FedProx algorithm. For the local model, we used a CNN-based architecture in which the final classification layer is replaced with a softmax output for five-class prediction. The CNN consists of three convolution layers with a kernel size of 3×3 and ReLU activation, each followed by batch normalization and max pooling. These are followed by two fully connected layers with dropout regularization and a softmax output layer for five-class DR classification. To study the impact of momentum-based optimizers, we tested each technique individually. The experiments were conducted on the Google Colab platform. The learning rate was 1e-5 to ensure stable convergence, especially under federated optimization. The dataset was split into 80% for training and 20% for testing to evaluate the performance of the model.
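One way to realize a label-distribution-skew partition of the kind described above is sketched below. The exact partitioning procedure is not specified in the text, so the function `label_skew_partition`, the per-class client probabilities (0.6 for a dominant client and 0.1 for the others), and the synthetic labels are hypothetical choices made only for illustration.

```python
import random
from collections import defaultdict


def label_skew_partition(labels, client_class_probs, seed=0):
    """Assign each sample index to a client using per-class probabilities.

    client_class_probs[c][k] is the assumed probability that a sample of
    class c is given to client k; each row should sum to 1.
    """
    rng = random.Random(seed)
    num_clients = len(next(iter(client_class_probs.values())))
    parts = defaultdict(list)
    for idx, label in enumerate(labels):
        weights = client_class_probs[label]
        k = rng.choices(range(num_clients), weights=weights)[0]
        parts[k].append(idx)
    return [parts[k] for k in range(num_clients)]


# Synthetic labels for five DR grades (0 to 4), 200 samples per grade.
labels = [i % 5 for i in range(1000)]
# Hypothetical skew: client k receives most of the samples of grade k.
probs = {c: [0.6 if k == c else 0.1 for k in range(5)] for c in range(5)}
parts = label_skew_partition(labels, probs)
```

Each client ends up with a different class mix and a different number of samples, which mirrors the setting where every institution sees its own distribution of DR grades.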
Full client participation is employed in communication rounds. All five clients are selected in each round to ensure stable convergence. After each communication round, clients train locally and send their updates to the server. The server then aggregates these updates to update the global model.
This study evaluates the performance of proposed federated models using key metrics like accuracy and loss. Accuracy measures the proportion of correctly predicted DR classes and it serves as the primary indicator of the performance of the model. It is calculated after every communication round to observe how the global model improves over time. We have also used loss as an objective function during training which reflects how well the predicted class probabilities align with true classes.
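The two metrics can be computed from predicted class probabilities as in the sketch below. The text does not name the loss function, so the use of categorical cross-entropy is an assumption, and the probability vectors and labels are made-up values for illustration.

```python
import math


def accuracy(probs, labels):
    """Fraction of samples whose highest-probability class is the true class."""
    correct = sum(1 for p, y in zip(probs, labels) if p.index(max(p)) == y)
    return correct / len(labels)


def cross_entropy(probs, labels, eps=1e-12):
    """Mean negative log-probability assigned to the true class."""
    total = sum(math.log(max(p[y], eps)) for p, y in zip(probs, labels))
    return -total / len(labels)


# Hypothetical softmax outputs over the five DR grades for three images.
probs = [
    [0.70, 0.10, 0.10, 0.05, 0.05],
    [0.20, 0.50, 0.10, 0.10, 0.10],
    [0.10, 0.10, 0.20, 0.50, 0.10],
]
labels = [0, 1, 2]
acc = accuracy(probs, labels)
loss = cross_entropy(probs, labels)
```

Here the first two images are classified correctly and the third is not, so the accuracy is two thirds, while the loss penalizes the low probability given to the true class of the third image.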

Figure 1. FedProx architecture.

Figure 2. Flow diagram.

Using RMS prop.

Using SGD momentum.

Using shuffling SGD.

Accuracy curve.

Loss curve.

Mean Accuracy
This section presents the experimental findings for momentum-based optimization techniques combined with the FedProx algorithm for diabetic retinopathy classification. Momentum-based optimizers are employed to enhance convergence stability and training efficiency. Their selection is motivated by their success in deep learning and federated optimization tasks. The optimizers evaluated are Adam, RMSProp, SGD with momentum, and Shuffling SGD. Each technique is applied in a federated environment with five clients. The global model is evaluated after each communication round using accuracy and loss as performance metrics. Across all the experiments, Shuffling SGD with the FedProx algorithm outperforms the others, achieving an accuracy of 92.63% and the lowest loss value of 0.22. To account for the stochastic nature of Federated Learning, all experiments were repeated over five independent runs with different random seeds; the reported accuracy of 92.63% is the average over these runs.
Gaps in Existing Work and Contributions.
The proposed framework assumes a trusted server and a set of clients which may expose it to privacy threats such as model inversion and Byzantine behavior. Communication efficiency is not explicitly optimized. Frequent model exchanges could incur bandwidth overhead in large scale deployments. Evaluation is limited to a small number of clients and a single dataset. This configuration reflects a controlled experimental setting adopted in federated learning research to facilitate systematic evaluation. Scalability to larger networks and robustness against adversarial behavior are not addressed in this study.
Conclusion and Future work
The suggested method illustrates the efficacy of augmenting the FedProx algorithm with momentum-based optimization strategies for diabetic retinopathy classification in a federated setting. By including optimizers such as Adam, Shuffling SGD, SGD with momentum, and RMSProp, we tackle the issues of data heterogeneity and enhance convergence across remote data sources. Among the assessed approaches, Shuffling SGD achieved the highest accuracy of 92.63% and a loss of 0.22, indicating its efficacy and resilience. These findings highlight the potential of integrating proximal regularization with momentum techniques to facilitate an accurate and privacy-preserving environment. Future studies will investigate the integration of communication-efficient federated learning techniques, the adoption of robust aggregation mechanisms, and the extension of the current configuration to a large-scale device environment.
Footnotes
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
