Abstract
Background
Proliferative diabetic retinopathy (PDR) seriously affects the vision of patients. Exploring the key genes of retinal neovascularization is crucial for developing new biomarkers and therapeutic targets.
Objective
This study aimed to identify key genes associated with retinal neovascularization in Proliferative Diabetic Retinopathy (PDR), intending to develop new biomarkers and therapeutic targets. This would further our understanding of the progression of diabetic retinopathy and improve patient prognosis.
Methods
The gene data from 36 diabetic retinopathy patient samples and 45 samples from healthy volunteers or diabetic patients were selected from the GEO DataSets (Gene Expression Omnibus), specifically datasets GSE102485 and GSE160310. Utilizing the SVA algorithm to merge datasets and the limma package in R to identify differentially expressed genes (DEGs), we conducted a bioinformatic analysis of diabetic retinopathy. Functional insights were gained through DAVID database analyses, while STRING database-derived Protein-Protein Interaction (PPI) networks visualized in Cytoscape provided further context. Key genes were identified through LASSO regression and SVM analyses, with ROC curves assessing their diagnostic value. Single gene set enrichment analysis (GSEA) enhanced our understanding of the perturbed biological processes and pathways, advancing knowledge of diabetic retinopathy at the genomic level.
Results
Bioinformatic analysis yielded 1139 differentially expressed genes (DEGs), from which six key genes (KDM5D, AC007040.11, AC015688.3, NLRP2, GYPC, and TMSB4Y) were identified. Each of the six genes showed good diagnostic accuracy, with an area under the receiver operating characteristic (ROC) curve (AUC) exceeding 0.75. Gene Ontology (GO) enrichment analysis showed that the DEGs were mainly involved in inflammatory and immune responses, T-cell activation, cell apoptosis, and angiogenesis. Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment showed that the DEGs were mainly involved in cytokine-cytokine receptor interaction, cell adhesion molecules, PI3K-Akt signaling, and hematopoietic cell lineage, all pathways relevant to the pathogenesis of diabetic retinopathy.
Introduction
Proliferative diabetic retinopathy (PDR) is a severe complication of diabetes mellitus, marked by aberrant neovascularization within the retina and the vitreous humor. This pathological process leads to substantial visual impairment and, in severe cases, irreversible blindness. As a manifestation of advanced diabetic retinal disease, PDR presents a substantial challenge in diabetes management and affects a significant portion of the diabetic population worldwide.1
With diabetes prevalence soaring globally, the incidence of diabetic retinopathy and its advanced form, PDR, has become a pressing public health issue. Epidemiological data indicate that nearly one-third of over 460 million individuals with diabetes show signs of retinopathy, with a notable fraction progressing to the more severe PDR.2,3 This trend underscores the importance of understanding and managing diabetic eye diseases within the broader context of the diabetes epidemic.
Recent advances in molecular biology and genetics have illuminated the pathogenesis of PDR. Hyperglycemia-induced retinal damage is central to the onset of PDR, triggering a cascade of events including retinal ischemia, neovascularization, and inflammation.4–6 A key factor in this process is the upregulation of vascular endothelial growth factor (VEGF), which serves as a major promoter of neovascularization. 7 Additionally, inflammatory cytokines, oxidative stress markers, and advanced glycation end products (AGEs) are significantly elevated in PDR patients, contributing to the progression of the disease.8–10
Genetic predisposition is an important factor in the development of PDR. Genome-wide association studies (GWAS) have identified multiple genetic loci that elevate the risk of PDR, underscoring the interplay between genetic makeup and environmental factors, notably glycemic control.11 The discovery of biomarkers for PDR has also opened the way to earlier diagnosis and more precisely guided therapy. Specifically, elevated concentrations of circulating biomarkers, including Intercellular Adhesion Molecule 1 (ICAM-1), Lipoprotein-Associated Phospholipase A2 (Lp-PLA2), and Pigment Epithelium-Derived Factor (PEDF), have been associated with PDR, providing insights into its pathogenesis.12–14
In terms of treatment, while traditional approaches like laser photocoagulation and vitrectomy remain standard, intravitreal injections of anti-VEGF agents have transformed the management of PDR.15,16 The focus is increasingly shifting towards early detection and prevention, with research exploring novel therapeutic targets based on identified biomarkers. In summary, PDR is a complex diabetic complication with a multifactorial etiology, encompassing metabolic, genetic, and environmental factors. Ongoing research is pivotal in expanding our understanding of PDR, enhancing early detection methods, and refining treatment strategies to mitigate vision loss in diabetic patients.
In this research, we analyzed gene expression in patients with PDR compared to those without the disease, to discover genes whose expression differed significantly between these groups. We analyzed the functions of these genes, identified key biomarkers linked to PDR, and evaluated their diagnostic accuracy. Furthermore, through gene set enrichment analysis (GSEA), we identified several biological pathways that might play a role in the progression of PDR. Our findings contribute to a deeper understanding of the molecular dynamics of PDR and offer potential avenues for improved prognostication and therapeutic intervention. However, it is important to acknowledge the limitations due to the heterogeneous nature of the sample groups in this study. The insights gained here may catalyze more extensive clinical research and fundamental experimental studies aimed at validating and expanding upon these initial discoveries, especially given that the pathogenesis of PDR is still not fully understood.
Materials and methods
Data download and correction
Data for proliferative diabetic retinopathy (PDR) was acquired from the GEO DataSets (Gene Expression Omnibus), specifically from datasets GSE102485 and GSE160310, which are chip data types. The GSE102485 dataset contained 25 PDR disease samples and 5 control samples, while GSE160310 comprised 11 PDR disease samples and 40 control samples. These datasets were merged and batch effects were corrected using the SVA algorithm, with subsequent analyses conducted on the corrected data.
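The cohort composition described above can be tallied in a few lines. The following is a minimal Python sketch for illustration only (the study itself used R); the dictionary layout and function names are our own assumptions, while the sample counts are those reported in the text. It reproduces the merged totals and shows how unevenly the two groups are distributed across the two series, which is the reason batch correction matters before any group comparison.

```python
# Sample counts as reported in the text for the two GEO series.
cohorts = {
    "GSE102485": {"PDR": 25, "control": 5},
    "GSE160310": {"PDR": 11, "control": 40},
}


def totals(cohorts):
    """Sum the number of samples per group across all series."""
    out = {}
    for counts in cohorts.values():
        for group, n in counts.items():
            out[group] = out.get(group, 0) + n
    return out


def pdr_fraction(cohorts):
    """Fraction of PDR samples within each series (a rough balance check)."""
    return {
        name: c["PDR"] / (c["PDR"] + c["control"])
        for name, c in cohorts.items()
    }


merged = totals(cohorts)
fractions = pdr_fraction(cohorts)

print(merged)
for name, frac in fractions.items():
    print(f"{name}: {frac:.2f} of samples are PDR")
```

Because most PDR samples come from one series and most controls from the other, series membership and disease status are partly confounded; this sketch only makes that imbalance visible and does not perform any correction.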
Identification of differentially expressed genes (DEGs)
Utilizing the limma package within the R programming language (version 3.6), we systematically screened for DEGs between the normal and diabetic retinopathy cohorts within the corrected dataset. This selection was guided by rigorous statistical criteria, specifically a P-value threshold of < 0.05 and an absolute Log2 fold change (|Log2FC|) exceeding 1. Our analysis yielded a comprehensive list of 1139 DEGs, comprising 950 genes that exhibited upregulation and 189 genes displaying downregulation (Figure 1).

Research roadmap.
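The DEG selection rule stated above (P < 0.05 and |Log2FC| > 1) can be expressed as a simple filter. This is a hedged Python sketch, not the limma workflow used in the study: the gene names and statistics are hypothetical placeholders, and the "up"/"down" labels assume the contrast is PDR minus control.

```python
def classify_deg(log2fc, p_value, fc_cutoff=1.0, p_cutoff=0.05):
    """Label a gene as 'up', 'down', or 'ns' (not selected).

    A gene is selected only if P < p_cutoff AND |log2FC| > fc_cutoff.
    Direction assumes the contrast is PDR minus control (an assumption).
    """
    if p_value < p_cutoff and abs(log2fc) > fc_cutoff:
        return "up" if log2fc > 0 else "down"
    return "ns"


# Hypothetical (gene, log2FC, P-value) rows for illustration only.
results = [
    ("GENE_A", 2.3, 0.001),    # passes both thresholds, upregulated
    ("GENE_B", -1.4, 0.020),   # passes both thresholds, downregulated
    ("GENE_C", 0.6, 0.0005),   # significant but fold change too small
    ("GENE_D", 1.8, 0.200),    # large fold change but not significant
    ("GENE_E", 1.0, 0.010),    # |log2FC| equals the cutoff, so not selected
]

labels = {gene: classify_deg(fc, p) for gene, fc, p in results}
counts = {k: sum(1 for v in labels.values() if v == k) for k in ("up", "down", "ns")}

print(labels)
print(counts)
```

Applied to the full corrected dataset, the same rule gives the reported split of 950 upregulated and 189 downregulated genes, which sum to the 1139 DEGs.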
To gain insights into the biological functions and regulatory pathways of the pivotal genes among the 1139 differentially expressed genes (DEGs), we leveraged the DAVID database (https://david.ncifcrf.gov/home.jsp) for comprehensive annotation and visualization. This analysis encompassed Gene Ontology (GO) enrichment and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis, tailored to specific genes of interest. Additionally, we constructed Protein-Protein Interaction (PPI) networks using the STRING 12.0 database, and employed Cytoscape 3.10.1 software to intuitively visualize these intricate networks.
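The idea behind GO and KEGG over-representation analysis can be illustrated with a one-sided hypergeometric test. The sketch below is a simplified stand-in and not the statistic DAVID reports (DAVID uses a more conservative modified Fisher exact test, the EASE score); the function name and the toy numbers are assumptions chosen for illustration.

```python
from math import comb


def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) for a hypergeometric draw.

    N: number of genes in the background
    K: background genes annotated to the term
    n: genes in the query list (e.g. the DEGs)
    k: query genes annotated to the term
    """
    total = comb(N, n)
    upper = min(n, K)
    favorable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favorable / total


# Toy example: 20 background genes, 5 annotated to a term,
# 6 genes in the query list, 3 of which carry the annotation.
p = hypergeom_upper_tail(k=3, n=6, K=5, N=20)
print(f"enrichment P = {p:.4f}")
```

A small P-value indicates that the term appears in the query list more often than expected by chance; in practice the P-values for all tested terms would also be corrected for multiple testing.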
Identification of key genes
To pinpoint the genes that are crucially implicated in retinal neovascularization in Proliferative Diabetic Retinopathy (PDR), we implemented a rigorous two-pronged approach. First, Lasso regression analysis utilizing the “glmnet” software package was conducted. Subsequently, an SVM feature selection algorithm from the “e1071” software package was applied. The intersection of genes identified by both methods was deemed as the key genes associated with retinal neovascularization in PDR, narrowing down the list from the initial 1139 DEGs.
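The final step of this two-pronged approach, taking the intersection of the two selected gene sets, can be sketched as follows. This Python snippet is illustrative only: the six key gene names are those reported in this study, whereas the entries named LASSO_ONLY_* and SVM_ONLY_1 are hypothetical placeholders standing in for genes chosen by just one method, since the full per-method lists are not given here.

```python
# Genes retained by each method. Names ending in *_ONLY_* are hypothetical
# placeholders; the other six are the key genes reported in this study.
lasso_selected = {
    "KDM5D", "AC007040.11", "AC015688.3", "NLRP2", "GYPC", "TMSB4Y",
    "LASSO_ONLY_1", "LASSO_ONLY_2",
}
svm_selected = {
    "KDM5D", "AC007040.11", "AC015688.3", "NLRP2", "GYPC", "TMSB4Y",
    "SVM_ONLY_1",
}

# Key genes are those selected by BOTH methods.
key_genes = sorted(lasso_selected & svm_selected)

print(key_genes)
print(len(key_genes), "genes shared by both methods")
```

Requiring agreement between a penalized regression and a margin-based classifier is a conservative choice: a gene that only one method favors is dropped.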
Functional enrichment analysis of key genes
Gene set enrichment analysis (GSEA) was conducted using predefined gene sets (based on the KEGG database), categorizing genes based on expression differences between two sample groups, and determining if predefined gene sets were enriched at the top or bottom of the ranked list. GSEA software was used for gene set enrichment analysis to study the differences in KEGG pathways between high and low-expression groups of key genes.
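The notion of a gene set being "enriched at the top or bottom of the ranked list" corresponds to a running-sum enrichment score. The following Python sketch implements the simple unweighted form of that score as an illustration; it is not the GSEA software used in the study, which additionally weights hits by the ranking metric and assesses significance by permutation. The gene identifiers are hypothetical, and the ranked list is assumed to be given (for single-gene GSEA it would come from comparing the high- and low-expression groups of a key gene).

```python
def enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum enrichment score.

    Walk down the ranked list, stepping up by 1/n_hit at each gene in the
    set and down by 1/n_miss otherwise. The score is the running sum with
    the largest absolute deviation from zero.
    """
    hits = [g in gene_set for g in ranked_genes]
    n_hit = sum(hits)
    n_miss = len(ranked_genes) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("gene set must overlap the ranked list only partially")

    running = 0.0
    best = 0.0
    for is_hit in hits:
        running += 1.0 / n_hit if is_hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


# Hypothetical ranked list: g1 is most up in the high-expression group,
# g10 is most up in the low-expression group.
ranked = [f"g{i}" for i in range(1, 11)]

es_top = enrichment_score(ranked, {"g1", "g2", "g4"})       # set near the top
es_bottom = enrichment_score(ranked, {"g8", "g9", "g10"})   # set near the bottom

print(f"top-enriched set:    ES = {es_top:.3f}")
print(f"bottom-enriched set: ES = {es_bottom:.3f}")
```

A positive score indicates enrichment in the high-expression group and a negative score indicates enrichment in the low-expression group.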
Diagnostic effectiveness of key genes
The diagnostic efficacy of the identified key genes was quantified through the construction of Receiver Operating Characteristic (ROC) curves, with the area under the curve (AUC) serving as a metric to assess their performance in distinguishing between the normal and diabetic retinopathy states.
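The AUC has a direct interpretation: it is the probability that a randomly chosen disease sample has a higher expression value than a randomly chosen control sample, with ties counted as one half. The Python sketch below computes it from that definition. The expression values are hypothetical, and the function is an illustration rather than the ROC routine used in the study.

```python
def auc(scores_pos, scores_neg):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties contribute 0.5. Equivalent to the Mann-Whitney U statistic
    divided by the number of pairs.
    """
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical expression values of one gene in each group.
pdr_expr = [2.1, 3.4, 1.9, 2.8]
control_expr = [1.2, 2.0, 1.5, 2.8, 0.9]

value = auc(pdr_expr, control_expr)
print(f"AUC = {value:.3f}")
```

For a gene that is lower in disease than in controls, this quantity falls below 0.5, and the discriminative ability is then read as one minus the value.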
Data statistics and analysis
Bioinformatics analysis in this study was performed using R language (version 4.3.1). All statistical tests were two-sided, and a p-value of < 0.05 was considered statistically significant.
Results
Selection of differentially expressed genes
Post-correction of batch effects between chips using the SVA algorithm showed a reduction in variability (Figure 2). The analysis of differential gene expression yielded 1139 DEGs, with 950 genes being upregulated and 189 downregulated (Figure 3).

Evaluation of the batch effect before (a) and after (b) merging through the principal component analysis.

Differentially expressed genes: volcano plot (a) and heatmap (b).
Pathway analysis of the DEGs indicated significant enrichment in physiological processes, including inflammatory response, immune response, T-cell activation, cell apoptosis, and angiogenesis. In terms of cellular components, the enriched pathways were primarily in the plasma membrane, cell membrane, cytoplasm, and extracellular regions (Figure 4). At the molecular function level, enriched pathways included protein kinase activity, transcription factor activity, oxidoreductase activity, signaling receptor activity, and ion channel activity. KEGG pathway enrichment analysis revealed that the DEGs were mainly involved in cytokine-cytokine receptor interaction, cell adhesion molecules, the PI3K-Akt signaling pathway, and the hematopoietic cell lineage. The PPI network of the DEGs is shown in Figure 5.

Functional analysis of differentially expressed genes. (a) Biological process, (b) Cellular component, (c) Molecular function, and (d) KEGG.

PPI network of DEGs.
To explore biomarkers associated with retinal neovascularization in PDR, feature selection was performed using LASSO regression, with the regression coefficient path showing each gene's coefficient (Figure 6(a)). The optimal threshold for gene coefficients was determined based on the cross-validation curve as 10 (Figure 6(b)). Concurrently, key genes were evaluated using the SVM-RFE algorithm (Figure 6(c)), revealing six intersecting genes common to both methods: KDM5D, AC007040.11, AC015688.3, NLRP2, GYPC, and TMSB4Y. These six genes were selected for further investigation (Figure 6(d)).

(a) Distribution of LASSO coefficients in LASSO regression; (b) confidence intervals for each lambda; (c) SVM feature selection algorithm; (d) Venn diagram.
To evaluate the diagnostic efficiency of the six genes KDM5D, AC007040.11, AC015688.3, NLRP2, GYPC, and TMSB4Y, ROC curves were plotted using the merged and corrected dataset, and the area under the curve (AUC) was subsequently calculated. The results indicated that the AUCs for these genes were 0.7944, 0.8926, 0.8299, 0.7809, 0.8463, and 0.8494, respectively, all exceeding 0.75, demonstrating their high diagnostic efficiency (Figure 7).

Diagnostic efficiency of PDR-related biomarkers. (a) TMSB4Y; (b) NLRP2; (c) KDM5D; (d) GYPC; (e) AC015688.3; (f) AC007040.11.
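The reported AUC values can be checked against the 0.75 threshold and ranked with a short script. This Python sketch simply restates the numbers given in the text, paired with the genes in the order listed there; the variable names are our own.

```python
# AUC values as reported in the text, paired with genes in the listed order.
reported_auc = {
    "KDM5D": 0.7944,
    "AC007040.11": 0.8926,
    "AC015688.3": 0.8299,
    "NLRP2": 0.7809,
    "GYPC": 0.8463,
    "TMSB4Y": 0.8494,
}

threshold = 0.75
all_above = all(value > threshold for value in reported_auc.values())

# Rank genes from highest to lowest AUC.
ranked = sorted(reported_auc, key=reported_auc.get, reverse=True)

print("all above threshold:", all_above)
for gene in ranked:
    print(f"{gene}: {reported_auc[gene]:.4f}")
```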
To further explore the signaling pathways influenced by key genes and their potential molecular mechanisms impacting retinal neovascularization in PDR, GSEA was conducted. The results showed that the high expression group of GYPC was primarily enriched in 11 pathways, including
The low expression group of NLRP2 was mainly enriched in pathways like
The low expression group of AC007040.11 showed enrichment in pathways including
The high expression group of AC015688.3 was mainly enriched in pathways

GYPC single gene GSEA.

NLRP2 single gene GSEA.

AC007040.11 single gene GSEA.

AC015688.3 single gene GSEA.
Discussion
This study identified 1139 differentially expressed genes (DEGs) in PDR, with a predominance of upregulated genes, which aligns with the current understanding of PDR as a condition characterized by significant molecular alterations. The majority of these DEGs are involved in processes such as inflammation, immune response, and angiogenesis, consistent with recent studies. For instance, Semeraro et al. highlighted the role of chronic low-grade inflammation and immune responses in the pathogenesis of DR and PDR, emphasizing microangiopathy and endothelial dysfunction. 17
Our pathway analysis revealed significant enrichment in cytokine-cytokine receptor interaction, cell adhesion molecules, and the PI3K-Akt signaling pathway, all of which are critical in cellular interactions and immune signaling in PDR. These pathways have been corroborated by previous studies, indicating their relevance in the disease process. For example, the PI3K-Akt pathway is known to play a crucial role in cell survival and angiogenesis, which are vital processes in retinal neovascularization. The enrichment of cell adhesion molecules highlights the importance of cell-cell interactions in maintaining retinal vascular stability, as also reported in the literature. 9
The enrichment of these immune signaling and cell adhesion pathways suggests a complex interplay of molecular mechanisms in PDR. This supports findings by Petrovič et al., who discussed the involvement of immune mechanisms and the regulation of neovascularization in DR and PDR.18
The identification of six key genes (KDM5D, AC007040.11, AC015688.3, NLRP2, GYPC, and TMSB4Y) provides promising targets for PDR research. Their high diagnostic efficiency, demonstrated by ROC curve analysis, suggests potential utility in early detection and monitoring of PDR. The GSEA results for these genes showed enrichment in pathways such as actin cytoskeleton regulation, natural killer cell-mediated cytotoxicity, and focal adhesion, integral to angiogenesis and immune responses in PDR. This aligns with studies by Xu and Chen, who showed the importance of prolonged immune response and DAMPs in DR development. 19
Comparing with existing literature underscores similarities and differences in our findings. The role of chronic inflammation and immune cell infiltration in PDR development is a recurring theme in both our study and previous research.20–22 However, our study extends this understanding by identifying specific genes and pathways pivotal in these processes, offering new insights into PDR pathophysiology.
In this study, we employed a rigorous bioinformatics approach to analyze gene expression profiles associated with proliferative diabetic retinopathy (PDR). Utilizing advanced computational tools, including the SVA algorithm for batch effect correction and the limma package for differential expression analysis, we identified 1139 differentially expressed genes (DEGs). Further functional annotation through DAVID and STRING databases allowed us to construct detailed Protein-Protein Interaction (PPI) networks. The integration of LASSO regression and SVM feature selection provided a robust framework for pinpointing six key genes with high diagnostic accuracy, each demonstrating an area under the ROC curve exceeding 0.75. Additionally, Gene Set Enrichment Analysis (GSEA) elucidated critical biological pathways influenced by these key genes, highlighting their involvement in angiogenesis and immune responses. These technical methodologies not only underpin the identification of potential biomarkers but also offer valuable insights into the molecular mechanisms driving PDR progression, thus providing a foundation for future therapeutic strategies.
Translating these findings into clinical practice involves several steps. First, the identified key genes and pathways can be explored as potential targets for therapeutic intervention. For instance, the PI3K-Akt signaling pathway, which plays a crucial role in cell survival and angiogenesis, could be targeted to inhibit abnormal neovascularization in PDR patients. Developing inhibitors that specifically modulate this pathway may offer a novel therapeutic approach. Furthermore, the significant enrichment of inflammatory and immune response pathways suggests that anti-inflammatory therapies could be beneficial in managing PDR. Drugs targeting cytokine-cytokine receptor interactions or specific inflammatory mediators identified in our study could help reduce retinal inflammation and neovascularization. For example, agents that inhibit the activity of key inflammatory cytokines involved in the identified pathways may prove effective in slowing or halting disease progression. Lastly, the identified DEGs and their protein interactions can guide the development of biomarker-based diagnostic tools. Early detection of PDR through non-invasive tests measuring the expression levels of these key genes could enable timely intervention, potentially preventing severe visual impairment. Future research should focus on the clinical validation of these biomarkers and the development of diagnostic assays that can be easily implemented in a clinical setting.
However, our study has limitations that need to be addressed. The primary limitation is the heterogeneity of the sample groups, as the PDR and control samples were derived from distinct patient cohorts. This could introduce variability that might affect the generalizability of our findings. To mitigate this, we applied the SVA algorithm to correct batch effects and harmonize the datasets, ensuring a more reliable analysis. Additionally, while our in-silico approach provides valuable insights, further experimental validation is necessary to confirm the biological relevance of the identified genes and pathways. Future research should focus on validating these results in larger, more diverse populations and exploring the therapeutic potential of the identified genes through in vivo studies and clinical trials.
In conclusion, our study contributes to the growing body of evidence on the molecular mechanisms underlying PDR and opens up new possibilities for its diagnosis and treatment. Further research in this area is essential to fully harness the potential of these findings. In addition to these findings, it is crucial to consider the broader context of diabetic retinopathy research, which has witnessed significant advancements in understanding the molecular and cellular mechanisms underlying the disease. For example, the role of oxidative stress and the contribution of genetic factors to the susceptibility and progression of DR have been areas of active research.23,24 Studies have shown that oxidative stress plays a crucial role in the pathogenesis of DR, leading to the activation of various signaling pathways that contribute to the disease's progression.9,25,26
Furthermore, the impact of systemic factors such as blood glucose control, hypertension, and lipid levels on the progression of DR and PDR has been well-documented.27,28 Tight glycemic control has been shown to reduce the risk of onset and progression of PDR, underscoring the importance of systemic management in conjunction with local ocular treatments.29,30
The potential of emerging therapies targeting specific molecular pathways in PDR is also noteworthy. Anti-VEGF therapy has revolutionized the treatment of PDR, and ongoing research is exploring other targets, such as inflammatory cytokines and growth factors, to develop more effective treatments with fewer side effects.15,31
This study identified novel candidate biomarkers (KDM5D, AC007040.11, AC015688.3, NLRP2, GYPC, and TMSB4Y) for Proliferative Diabetic Retinopathy (PDR) and demonstrated their diagnostic efficiency (AUC values >0.75), indicating their potential utility in early disease detection and personalized treatment. Additionally, pathway-level insights gained through gene set enrichment analysis (GSEA) provide a deeper understanding of the molecular mechanisms driving PDR, including pathways such as cytokine-cytokine receptor interaction and PI3K-Akt signaling. These findings offer both clinical and research communities new avenues for improving patient outcomes and developing targeted therapies. By bridging molecular insights with potential clinical applications, this study contributes knowledge for future research and clinical practice.
Conclusion
In our investigation of Proliferative Diabetic Retinopathy (PDR), a severe complication of diabetes, we conducted a comparative analysis of gene expression between patients with diabetic retinopathy and healthy or non-retinopathic diabetic individuals. Through this process, we identified differentially expressed genes (DEGs), explored their functional roles, identified key PDR biomarkers, assessed their diagnostic efficacy, and used GSEA to identify pathways potentially involved in PDR progression. These insights enrich our understanding of the pathophysiology of PDR. However, it is important to acknowledge the limitations inherent in our study, primarily stemming from the fact that the PDR and control samples were derived from distinct patient cohorts. This aspect, to some extent, weakens the robustness of our conclusions. Recognizing these constraints, we view our current research as a foundational step that lays the groundwork for more targeted clinical investigations and fundamental experimental studies. Such future work is crucial given the complex and still not fully elucidated mechanisms underpinning PDR. Our ultimate aim is to use these findings to improve the treatment and management of this challenging diabetic complication, thereby enhancing patient outcomes and quality of life.
Footnotes
Consent for publications
The authors read and approved the final manuscript for publication.
Ethics approval and consent to participate
No humans or animals were used in the present research.
Informed consent
The authors declare that no patients were used in this study.
Authors’ contributions
Shengnan Zhang and Chao Sun designed the study and performed the experiments, Wenqi Song collected the data, Bingyao Huang analyzed the data, Shengnan Zhang and Chao Sun prepared the manuscript. All authors read and approved the final manuscript.
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Availability of data and material
The data that support the findings of this study are available from the corresponding author upon reasonable request.
