Abstract
Background
Alzheimer's disease (AD) is a progressive neurodegenerative disorder characterized by significant cognitive decline and memory impairment. This condition imposes a considerable economic burden on healthcare systems worldwide.
Objective
Current diagnostic approaches often lack specificity and sensitivity, necessitating innovative methods to identify potential biomarkers that could facilitate earlier intervention and improved patient outcomes.
Methods
In this study, we aimed to elucidate the role of the BSCL2 gene in the pathogenesis of AD by employing various bioinformatics techniques, including transcriptomic analysis and single-nucleus RNA sequencing (snRNA-seq). We also integrated machine learning algorithms to identify potential biomarkers associated with AD. A weighted gene co-expression network was constructed to uncover co-expression modules linked to BSCL2. BSCL2, AGPAT1, and EHD2 each demonstrated promising diagnostic potential, with area under the curve (AUC) values exceeding 0.7.
Results
Our analyses revealed significant alterations in immune cell profiles across different AD subtypes, providing insights into personalized immunotherapy approaches. Furthermore, pathway enrichment analyses highlighted key biological processes involved in AD, including oxidative phosphorylation, neuroactive ligand-receptor interactions, and Notch signaling pathways. Notably, snRNA-seq data indicated that BSCL2-related gene activity exhibited significant changes in neural lineages, suggesting its influence on neurodegenerative mechanisms.
Conclusions
This study underscores the potential of BSCL2, AGPAT1, and EHD2 as novel biomarkers for early detection of Alzheimer's disease. The insights gained from the co-expression analyses and immune profiling pave the way for personalized therapeutic strategies aimed at modulating the immune response in AD. Future research should focus on the clinical validation of these biomarkers and their role in the development of targeted interventions, ultimately enhancing our understanding of AD pathophysiology and improving patient management.
Introduction
Alzheimer's disease (AD) is a progressive neurodegenerative disorder marked by a gradual deterioration in cognitive function and memory, significantly impacting not only the diagnosed individuals but also their families and society as a whole.1–3 As the prevalence of AD continues to increase, it is accompanied by rising healthcare costs and a substantial economic burden.4,5 Current diagnostic methods, such as clinical evaluations and neuroimaging techniques, have notable limitations.6–8 They often lead to diagnoses only in the later stages of the disease. These challenges underscore the urgent need for innovative research aimed at identifying potential biomarkers and therapeutic targets to improve both the diagnosis and treatment of AD.
Recent studies have suggested that the BSCL2 gene may play a pivotal role in the molecular mechanisms underlying AD.9 Evidence indicates that BSCL2 is involved in lipid metabolism and neuroinflammation,10,11 both of which are critical elements of AD pathology.12,13 These findings underscore the necessity for further investigation into BSCL2 and its associated biological pathways.
The integration of bioinformatics, single-nucleus RNA sequencing (snRNA-seq), and machine learning represents a significant advance in understanding complex diseases such as AD.14–16 By leveraging these technologies, researchers can analyze large-scale genomic and transcriptomic datasets, which facilitates the identification of potential biomarkers in AD. This approach also enables the exploration of gene interactions and the construction of regulatory networks, thereby offering deeper insight into the biological processes associated with AD.
This study seeks to address the urgent demand for enhanced diagnostic and therapeutic strategies for AD through an investigation of the BSCL2 gene and its associated pathways. To this end, the research employs advanced bioinformatics and machine learning methodologies to elucidate the molecular mechanisms underlying AD and to identify biomarkers that facilitate earlier diagnosis and more effective treatment.
Materials and methods
Data acquisition and processing
The overall analytical workflow is illustrated in Figure 1. We sourced the transcriptome data related to AD from the Gene Expression Omnibus database (GEO, https://www.ncbi.nlm.nih.gov/geo/), specifically obtaining the dataset GSE132903, which includes 97 AD samples and 98 control samples. To maintain consistency and ensure comparability across the dataset, all samples underwent normalization before any further analysis. We utilized the “limma” package to identify differentially expressed genes (DEGs) between the AD and control samples within the cohort, applying the criteria of p < 0.05 and |log2FC| > 0.25.
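The DEG thresholds described above can be illustrated with a short Python sketch. It is not the limma workflow itself: it assumes that per-gene log2 fold changes and p-values have already been computed, and the gene names and numbers below are hypothetical placeholders rather than values from GSE132903.

```python
from dataclasses import dataclass


@dataclass
class GeneStat:
    gene: str
    log2fc: float
    p_value: float


def filter_degs(stats, p_cutoff=0.05, lfc_cutoff=0.25):
    """Keep genes with p < p_cutoff and |log2FC| > lfc_cutoff."""
    return [s for s in stats
            if s.p_value < p_cutoff and abs(s.log2fc) > lfc_cutoff]


# Hypothetical statistics, for illustration only.
toy_stats = [
    GeneStat("GENE_A", -0.40, 0.001),   # passes both criteria
    GeneStat("GENE_B", 0.10, 0.0001),   # significant, but effect too small
    GeneStat("GENE_C", 0.60, 0.20),     # large effect, but not significant
    GeneStat("GENE_D", 0.30, 0.04),     # passes both criteria
]

degs = filter_degs(toy_stats)
deg_names = [s.gene for s in degs]
print(deg_names)
```

Both criteria must hold at the same time, so a gene with a very small p-value but a negligible fold change is excluded.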

Study flowchart.
Construction of protein-protein interaction (PPI) network interacting with BSCL2
The Search Tool for the Retrieval of Interacting Genes (STRING) database (https://string-db.org/) was employed to investigate the proteins that interact with BSCL2 and to develop protein-protein interaction (PPI) networks. We began by entering “BSCL2” in the database's search feature under the “Protein by name” category, selecting “Homo sapiens” as the organism, and setting the minimum interaction score to “Medium confidence (0.400).” Additionally, we specified a first shell with a maximum of 15 interactors, while keeping the other parameters at their default settings. Using the results obtained, we constructed a PPI network for further examination. Subsequently, we utilized the “VennDiagram” package to visualize the overlaps between “BSCL2-Associated Genes” and DEGs from the dataset GSE132903. Furthermore, we employed the “circlize” package for gene chromosome localization, which allowed us to create a circular layout graphic that effectively displays the positions of genes on chromosomes.
LASSO-RF-XGBoost integrated framework identifies AD's diagnostic genes
To identify optimal diagnostic biomarkers for AD, we selected feature genes with three machine learning techniques: least absolute shrinkage and selection operator (LASSO) regression, random forest (RF), and eXtreme Gradient Boosting (XGBoost). LASSO is a penalized regression method that improves both the interpretability and the prediction accuracy of a model. Its shrinkage penalty pushes the coefficients of less important predictors towards zero, which performs variable selection, simplifies the model, and clarifies the relationships between the variables involved.17 We executed this process using the “glmnet” R package, determining the minimal lambda as the optimal value.18 We used the “randomForest” package in R to identify critical feature genes. This ensemble method builds multiple decision trees by bootstrap aggregation, which mitigates overfitting by averaging the outcomes of many trees and thereby improves predictive accuracy. It also evaluates the importance of each gene by the mean decrease in the Gini index.18,19 XGBoost, by contrast, uses gradient-boosted decision trees (GBDT) augmented with regularization. The regularization controls model complexity and mitigates overfitting, improving the model's ability to generalize to new data, while the algorithm's feature ranking supports variable selection by prioritizing the most influential variables.20,21 A Venn diagram was used to identify the consensus genes selected by all three methods (LASSO, RF, and XGBoost), and the overlapping candidates were carried forward to the validation stages. Lastly, we evaluated diagnostic accuracy using the “pROC” package by calculating ROC curves and the corresponding area under the curve (AUC) values.
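The consensus step amounts to a set intersection across the three feature selectors. The following Python sketch shows the idea only. The function name and the gene labels (G1 to G6) are hypothetical and do not reproduce the selections made by glmnet, randomForest, or XGBoost in this study.

```python
def consensus_features(*selections):
    """Return the features chosen by every selector, in sorted order."""
    if not selections:
        return []
    sets = [set(s) for s in selections]
    return sorted(set.intersection(*sets))


# Hypothetical selections, for illustration only.
lasso_selected = ["G1", "G2", "G3", "G4"]
rf_selected = ["G1", "G2", "G3", "G5"]
xgb_selected = ["G1", "G2", "G3", "G6"]

hub_candidates = consensus_features(lasso_selected, rf_selected, xgb_selected)
print(hub_candidates)
```

Requiring agreement among all selectors is a conservative choice: a gene favoured by only one algorithm is dropped even if that algorithm ranks it highly.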
WGCNA reveals BSCL2-linked gene modules in AD
BSCL2 scores for AD samples and controls were calculated using the “GSVA” package. The “WGCNA” (weighted gene co-expression network analysis) package was then applied to dataset GSE132903, using the BSCL2 score as trait data to identify module genes strongly correlated with it. Samples were also clustered to show overall correlation within the dataset. First, the similarity between gene pairs was assessed using Pearson correlation coefficients, which were then transformed into a weighted adjacency matrix. To preserve the scale-free topology of the network, a soft-thresholding power (β) was applied. The weighted adjacency matrix was then converted into a topological overlap matrix (TOM), which quantifies the connectivity among genes within the network. Using the TOM-based dissimilarity measure (1 - TOM), genes were clustered into distinct modules through hierarchical clustering, with a minimum module size of 50. After the identification and merging of modules, nine distinct co-expression modules were established. To investigate the association between these modules and clinical characteristics, Pearson correlation coefficients were computed between the module eigengenes and the clinical traits. The green-yellow module, which exhibited the strongest correlation with the clinical features, was selected for further analysis.
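The first steps of the network construction (correlation, soft thresholding, and topological overlap) can be sketched in plain Python. This is a toy illustration, not the WGCNA package: it assumes an unsigned network, where adjacency is |r| raised to the power β, together with the commonly used unsigned TOM formula. The expression vectors and the value β = 6 are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length vectors (non-constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def adjacency(expr, beta):
    """Unsigned weighted adjacency: a_ij = |cor(i, j)| ** beta, a_ii = 0."""
    g = len(expr)
    adj = [[0.0] * g for _ in range(g)]
    for i in range(g):
        for j in range(i + 1, g):
            a = abs(pearson(expr[i], expr[j])) ** beta
            adj[i][j] = adj[j][i] = a
    return adj


def topological_overlap(adj):
    """TOM_ij = (sum_u a_iu * a_uj + a_ij) / (min(k_i, k_j) + 1 - a_ij)."""
    g = len(adj)
    k = [sum(row) for row in adj]  # connectivity of each gene
    tom = [[1.0 if i == j else 0.0 for j in range(g)] for i in range(g)]
    for i in range(g):
        for j in range(i + 1, g):
            shared = sum(adj[i][u] * adj[u][j]
                         for u in range(g) if u != i and u != j)
            t = (shared + adj[i][j]) / (min(k[i], k[j]) + 1.0 - adj[i][j])
            tom[i][j] = tom[j][i] = t
    return tom


# Hypothetical expression vectors (one per gene, four samples each).
toy_expr = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],    # perfectly correlated with the first gene
    [1.0, -1.0, 1.0, -1.0],  # weakly correlated with the others
]
adj = adjacency(toy_expr, beta=6)
tom = topological_overlap(adj)
dissimilarity = [[1.0 - t for t in row] for row in tom]  # input to clustering
```

Raising correlations to a power suppresses weak correlations far more than strong ones, which is why the weakly related third gene contributes almost nothing to the overlap between the first two.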
Unsupervised consensus clustering based on hub genes for AD subtype
Based on the expression profiles of genes in the green-yellow module linked to AD, we conducted a consensus clustering analysis using the R package “ConsensusClusterPlus” to categorize AD samples from the GSE132903 cohort into distinct molecular subtypes. To determine the optimal number of subtypes, we evaluated the consensus matrix (CM), the cumulative distribution function (CDF) curve, and trajectory plots. We selected k = 2 as the optimal number of clusters, guided by the CDF index. To validate this classification, we employed principal component analysis (PCA), which allowed us to visualize the distinct gene expression patterns across the identified subtypes. We then identified DEGs between the two AD subtypes using a threshold of p < 0.05 and |log2FC| > 0.5, with the results illustrated in a volcano plot. The cluster-specific DEGs were further analyzed for Gene Ontology (GO) biological functions and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment using the R packages “clusterProfiler” and “org.Hs.eg.db,” with an adjusted p-value of <0.05 considered significant. The enriched terms were then visualized using the “ggplot2” and “GOplot” packages.
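Over-representation analysis of the kind used for GO and KEGG enrichment is typically based on a hypergeometric test followed by multiple-testing correction. The Python sketch below illustrates that calculation under two assumptions that the text does not state explicitly: a one-sided hypergeometric test and Benjamini-Hochberg adjustment. All counts and p-values are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(n_background, n_in_term, n_degs, n_overlap):
    """P(X >= n_overlap) when n_degs genes are drawn from n_background
    genes, of which n_in_term are annotated to the term."""
    upper = min(n_in_term, n_degs)
    total = comb(n_background, n_degs)
    tail = sum(comb(n_in_term, i) * comb(n_background - n_in_term, n_degs - i)
               for i in range(n_overlap, upper + 1))
    return tail / total


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvals[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical example: 10 background genes, 5 in the term,
# 5 DEGs, all 5 of which fall in the term.
p_term = hypergeom_enrichment_p(10, 5, 5, 5)
adj_p = benjamini_hochberg([0.01, 0.04, 0.03])
significant = [p < 0.05 for p in adj_p]
```

The choice of background gene set strongly affects the resulting p-values, so it should match the set of genes that could have been detected as DEGs.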
Immune infiltration analysis through CIBERSORT and ssGSEA
To investigate differential immune cell infiltration between the two AD subtypes, the CIBERSORT algorithm was used to quantify the abundance of 22 distinct immune cell types, with 100 permutations. The infiltration patterns were visualized using stacked histograms to depict relative abundances and clustered heatmaps to highlight subtype-specific immune signatures. Furthermore, the “psych,” “ggcorrplot,” and “ggplot2” packages were employed to analyze the correlations between the three hub genes and immune infiltration.
In addition to the CIBERSORT-based quantification of immune cells, we further elucidated the immune microenvironment by employing single-sample gene set enrichment analysis (ssGSEA) with the “GSVA” R package.22 This approach allowed for the simultaneous evaluation of 28 immune signatures, encompassing both cellular infiltration levels and immune-related functional activity clusters. Differential immune infiltration between Cluster 1 and Cluster 2 was depicted using box plots, while correlations between the three key genes and immune signatures were analyzed and visualized through heatmaps utilizing the “ggcorrplot” package. Additionally, we examined the associations between the three hub genes and immune cells using the “psych” and “ggplot2” packages, with the findings illustrated in lollipop and scatter plots.
Development and validation of a BSCL2/AGPAT1/EHD2-based nomogram for predicting AD occurrence
Nomograms have emerged as valuable clinical tools for multivariate diagnostic assessment and disease prognosis prediction.23 Therefore, we constructed a nomogram model based on BSCL2, AGPAT1, and EHD2 to predict the occurrence of AD using the “rms” package in R. Each predictor variable was assigned a weighted score, with the summation yielding a total predictive score. The calibration curve was used to evaluate the consistency between the model's predicted values and the actual observed values. We also assessed whether decisions based on the model could provide a net benefit for patients by conducting a decision curve analysis (DCA) and plotting a clinical impact curve. Finally, we utilized the “pROC” package to construct receiver operating characteristic (ROC) curves to further evaluate the diagnostic accuracy of our model.
Single-gene GSEA analysis of hub genes
Gene Set Enrichment Analysis (GSEA) identifies significantly enriched biological pathways within gene expression datasets by analyzing predefined gene sets.24 Following the identification of hub genes, we conducted single-gene GSEA using the “clusterProfiler” package to examine the biological pathways associated with BSCL2, AGPAT1, and EHD2. This analysis aimed to elucidate the biological functions, signaling pathways, and regulatory roles of these hub genes, thereby clarifying their molecular mechanisms and functional significance. The enriched pathway components were visualized using the “enrichplot” R package.
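The core statistic of GSEA is a running-sum enrichment score computed over a ranked gene list. The Python sketch below shows that statistic in isolation, assuming the standard weighted formulation with exponent 1. It omits the permutation-based significance testing and normalization that clusterProfiler performs, and the gene names, ranking scores, and gene sets are hypothetical.

```python
def enrichment_score(ranked_genes, scores, gene_set, weight=1.0):
    """Running-sum enrichment score for a gene set.

    ranked_genes: gene names sorted by decreasing ranking metric
    scores: ranking metric for each gene, in the same order
    """
    in_set = [g in gene_set for g in ranked_genes]
    n_hits = sum(in_set)
    n_miss = len(ranked_genes) - n_hits
    if n_hits == 0 or n_miss == 0:
        raise ValueError("gene set must overlap the ranked list partially")
    hit_norm = sum(abs(s) ** weight for s, h in zip(scores, in_set) if h)
    if hit_norm == 0:
        raise ValueError("all gene-set members have a zero ranking metric")

    running, best = 0.0, 0.0
    for s, hit in zip(scores, in_set):
        if hit:
            running += abs(s) ** weight / hit_norm  # step up at a hit
        else:
            running -= 1.0 / n_miss                 # step down at a miss
        if abs(running) > abs(best):
            best = running                          # maximum deviation from 0
    return best


# Hypothetical ranked list: positive scores mean higher expression in the
# group with high expression of the gene of interest.
toy_genes = ["a", "b", "c", "d"]
toy_scores = [2.0, 1.0, -1.0, -2.0]
es_top = enrichment_score(toy_genes, toy_scores, {"a", "b"})
es_bottom = enrichment_score(toy_genes, toy_scores, {"c", "d"})
```

A positive score indicates that the gene set is concentrated at the top of the ranking, and a negative score indicates concentration at the bottom, which corresponds to the high-expression and low-expression enrichments reported for each hub gene.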
Construction of gene-miRNA-lncRNA-transcription factor network of the key genes
To enhance our understanding of the molecular mechanisms underlying AD, we developed an extensive network comprising hub gene-miRNA, miRNA-lncRNA, and hub gene-transcription factor (TF) interactions. Initially, the ENCORI database was employed to predict miRNAs that interact with key genes. Similarly, miRNA-lncRNA interactions were identified using the ENCORI database. Additionally, the ChIPBase database facilitated the identification of TFs interacting with these hub genes. All network visualizations were created using Cytoscape, an open-source bioinformatics platform designed for the integrative analysis of complex molecular interaction networks.25
GSEA/GSVA analysis of AD subtype function
To elucidate the molecular mechanisms differentiating AD subtypes, we employed Gene Set Variation Analysis (GSVA) utilizing hallmark pathways from the Molecular Signatures Database (MSigDB).26 The analysis was conducted using the R package “GSVA,” and we evaluated the variations in GSVA pathway activity between the two identified clusters. Furthermore, we performed GSEA on cluster-specific pathways using the R package “clusterProfiler.” A p-value of less than 0.05 was considered the threshold for statistical significance.
Computational analysis of single-nucleus RNA sequencing (snRNA-Seq) data reveals AD cell signatures
The snRNA-seq dataset GSE157827, linked to AD, was retrieved from the GEO database (https://www.ncbi.nlm.nih.gov/geo/). For the subsequent analysis, one control sample (GSM4775574) and one AD sample (GSM4775562) were selected. The snRNA-seq analyses were performed in R using the Seurat package (version 5.1.0). The raw data were converted into Seurat objects, and the Harmony package was used to minimize potential batch effects. Cells with more than 8000 or fewer than 500 detected genes were discarded, as were cells in which over 20% of the unique molecular identifiers (UMIs) originated from the mitochondrial genome. Data normalization was performed using the “NormalizeData” function. Principal component analysis (PCA) was applied to the single-cell data, and further dimensionality reduction was achieved with the Uniform Manifold Approximation and Projection (UMAP) algorithm. Cell type identification was carried out using the SingleR package.
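The quality-control thresholds above can be written as a simple predicate. The Python sketch below mirrors the stated cut-offs and is not Seurat code. The function name, field names, and per-cell values are hypothetical.

```python
def passes_qc(n_genes, pct_mito,
              min_genes=500, max_genes=8000, max_pct_mito=20.0):
    """Keep a cell unless it has fewer than min_genes or more than
    max_genes detected genes, or more than max_pct_mito percent
    mitochondrial UMIs."""
    return min_genes <= n_genes <= max_genes and pct_mito <= max_pct_mito


# Hypothetical per-cell metrics, for illustration only.
toy_cells = [
    {"id": "cell_1", "n_genes": 2500, "pct_mito": 3.2},   # kept
    {"id": "cell_2", "n_genes": 320, "pct_mito": 1.0},    # too few genes
    {"id": "cell_3", "n_genes": 9100, "pct_mito": 2.0},   # too many genes
    {"id": "cell_4", "n_genes": 1800, "pct_mito": 35.0},  # high mitochondrial
]

kept_ids = [c["id"] for c in toy_cells
            if passes_qc(c["n_genes"], c["pct_mito"])]
print(kept_ids)
```

Cells with very few detected genes are often empty droplets or damaged nuclei, unusually high gene counts can indicate doublets, and a high mitochondrial fraction is a common sign of low-quality cells.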
Integrated single-cell assessment of BSCL2-associated signaling: AUCell-based scoring and CellChat network analysis
Using the R package AUCell, each cell was scored against a curated set of six BSCL2-associated genes (BSCL2, AGPAT1, EHD2, PNPLA2, CA10, EHD3). AUCell ranks genes by expression within each cell and computes an AUC value that estimates the fraction of the gene set found among the highly expressed genes, providing a quantitative measure of gene set activity in each individual cell. Cells expressing more genes from the set therefore received higher AUC scores. The FeaturePlot and VlnPlot functions were then used to illustrate the distribution of AUC scores across single cells. Subsequently, we investigated the interactions between different cell clusters using the “CellChat” package.
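The idea behind rank-based gene set scoring can be sketched as follows. This Python example is a simplified, AUCell-like score written for illustration and does not reproduce the AUCell implementation: it ranks genes within one cell, accumulates a recovery curve over the top-ranked genes, and normalizes by the best achievable area. The `top_n` cut-off, gene names, and expression values are hypothetical.

```python
def aucell_like_score(expression, gene_set, top_n):
    """Normalized area under the gene-set recovery curve for one cell.

    expression: dict mapping gene name to expression in this cell
    gene_set: set of gene names to score
    top_n: number of top-ranked genes considered
    """
    ranked = sorted(expression, key=lambda g: expression[g], reverse=True)
    ranked = ranked[:top_n]
    members = gene_set & set(expression)
    if not members or not ranked:
        return 0.0

    hits, area = 0, 0
    for gene in ranked:
        if gene in members:
            hits += 1
        area += hits  # height of the recovery curve at this rank

    # Best case: all gene-set members occupy the very top ranks.
    n_best = min(len(members), len(ranked))
    max_area = sum(min(i, n_best) for i in range(1, len(ranked) + 1))
    return area / max_area


# Hypothetical single-cell expression profile.
toy_cell = {"A": 5.0, "B": 4.0, "C": 3.0, "D": 2.0, "E": 1.0}
score_high = aucell_like_score(toy_cell, {"A", "B"}, top_n=4)
score_low = aucell_like_score(toy_cell, {"D", "E"}, top_n=4)
```

Because the score depends only on within-cell ranks, it is insensitive to differences in sequencing depth between cells, which is one reason rank-based scores are popular for single-cell data.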
The research flowchart of this study is summarized in Figure 1.
Results
Integrated analysis reveals BSCL2 PPI networks in AD pathogenesis
To gain insights into the interacting partners of BSCL2, we constructed protein-protein interaction (PPI) regulatory networks for BSCL2-associated proteins using the STRING database. Through this analysis, we identified 39 interacting proteins (Figure 2A). This network elucidates the complex interactions involving BSCL2 and provides a comprehensive framework for understanding its potential molecular mechanisms in AD. To further investigate the molecular regulatory mechanisms of BSCL2 in AD, we performed a systematic intersection analysis between the 39 BSCL2-interacting genes and the AD-related differentially expressed genes from the GSE132903 dataset. This integrated analysis identified six overlapping genes: PNPLA2, AGPAT1, BSCL2, CA10, EHD2, and EHD3 (Figure 2B). Independent samples t-tests were conducted to validate the expression differences of these six genes between the AD and control groups. The results demonstrated significant downregulation of five genes (PNPLA2, AGPAT1, BSCL2, CA10, and EHD3) and upregulation of EHD2 in AD patients (Figure 2C), supporting their association with disease pathology. Furthermore, a map illustrating the chromosomal locations of these genes was developed to facilitate detailed visualization (Figure 2D). Collectively, these findings implicate networks centered around BSCL2 in the pathophysiology of AD.

BSCL2-centered protein interaction networks in AD. (A) BSCL2 PPI network analysis by the STRING database. (B) Venn diagram analysis between the 39 BSCL2-interacting genes and the AD-related DEGs. (C) The boxplot shows the expression levels of 6 overlapping genes between the control and AD groups. (D) Chromosomal localization map of 6 overlapping genes.
Machine learning and ROC validation of BSCL2/AGPAT1/EHD2 as AD diagnostic biomarkers
To identify diagnostic markers for AD, we analyzed the six overlapping genes using three machine learning techniques: Least Absolute Shrinkage and Selection Operator (LASSO) regression, Random Forest (RF), and Extreme Gradient Boosting (XGBoost). LASSO regression identified four key genes, as illustrated in Figures 3A and 3B. The RF algorithm evaluated candidate genes based on their Mean Decrease Gini and Mean Decrease Accuracy scores, ranking the genes in descending order of relative importance (Figures 3C–3E). XGBoost models were also developed using the six overlapping genes, and the importance of each gene within the models was evaluated (Figures 3F–3H). Using a Venn diagram (Figure 3I), we combined the findings from the three algorithms and identified three hub genes (BSCL2, AGPAT1, and EHD2) as candidate biomarkers.

Machine learning validation of BSCL2/AGPAT1/EHD2 as AD diagnostic biomarkers. (A, B) Construction of a LASSO regression model. (C-E) Utilization of the RF algorithm for the development of screening and diagnostic markers. (F-H) Application of the XGBoost algorithm for biomarker screening. (I) Venn diagram analysis of hub genes identified across LASSO, RF, and XGBoost methodologies.
To further evaluate the diagnostic efficacy of the BSCL2, AGPAT1, and EHD2 genes in AD, we performed receiver operating characteristic (ROC) curve analyses utilizing the GSE132903 dataset. The analyses revealed that EHD2 demonstrated the highest diagnostic accuracy, with an AUC of 0.754, followed by BSCL2 with an AUC of 0.735, and AGPAT1 with an AUC of 0.717 (Figure 4). These findings suggest that the three genes possess significant potential as diagnostic biomarkers for AD.
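An AUC value can be read as the probability that a randomly chosen AD sample is ranked above a randomly chosen control sample by the marker, with ties counted as one half. The Python sketch below computes the AUC from that definition. It is not the pROC implementation, and the expression values are hypothetical. For a marker that is lower in cases, the raw value falls below 0.5, and the direction-independent helper shown here is an illustrative convention, not a description of how the reported values were obtained.

```python
def roc_auc(case_values, control_values):
    """AUC as P(case > control) + 0.5 * P(case == control)."""
    if not case_values or not control_values:
        raise ValueError("both groups must contain at least one value")
    wins = 0.0
    for c in case_values:
        for k in control_values:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_values) * len(control_values))


def direction_free_auc(case_values, control_values):
    """AUC regardless of whether the marker is higher or lower in cases."""
    auc = roc_auc(case_values, control_values)
    return max(auc, 1.0 - auc)


# Hypothetical expression values, for illustration only.
toy_cases = [3.0, 4.0, 5.0]
toy_controls = [1.0, 2.0, 3.0]
auc_up = roc_auc(toy_cases, toy_controls)
auc_down = roc_auc(toy_controls, toy_cases)
auc_any = direction_free_auc(toy_controls, toy_cases)
```

An AUC of 0.5 corresponds to a marker with no discriminative ability, so single-gene values in the 0.7 range indicate moderate rather than strong separation.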

ROC analysis validates BSCL2, AGPAT1, and EHD2 as potential diagnostic biomarkers for AD. (A) ROC analysis for BSCL2 in dataset GSE132903. (B) ROC analysis for AGPAT1 in dataset GSE132903. (C) ROC analysis for EHD2 in dataset GSE132903.
Identifying BSCL2-associated co-expression modules in AD using WGCNA
To identify co-expression modules associated with BSCL2, AGPAT1, and EHD2 in AD, we constructed a weighted gene co-expression network using WGCNA on the GSE132903 dataset. We began by clustering the samples and removing outliers, which resulted in a final set of 195 samples for our analysis (Supplemental Figure 1A). Next, we established a scale-free network by applying a soft threshold of β = 19 (Supplemental Figure 1B). We then utilized the dynamic hybrid cutting method to create a hierarchical clustering tree, merging similar modules and assigning them different colors, which led to the identification of 9 consensus modules (Supplemental Figure 1C and 1D). Notably, the green-yellow module exhibited the strongest correlation and the most significant p-value with BSCL2 in the context of AD (Figure 5A). This module contained 149 genes, which were selected for further investigation. The scatter plot shows the relationship between module membership and gene significance for the green-yellow module (Figure 5B).

Identification of BSCL2-associated gene co-expression networks in AD using WGCNA. (A) The module-trait relationships are illustrated in a heatmap where each cell displays the corresponding correlation coefficient and p-value. (B) The scatter plot is provided to depict the correlation between module membership and gene significance in the green-yellow module.
Identification of BSCL2-associated subtypes in AD
To explore the molecular mechanisms of BSCL2-associated genes in the pathogenesis of AD, we performed unsupervised consensus clustering on 97 brain tissue samples from AD patients, sourced from GSE132903, utilizing the ConsensusClusterPlus R package and focusing on the genes within the green-yellow module. We began by systematically assessing potential cluster numbers ranging from k = 2 to k = 10. Our thorough evaluation involved analyzing the consensus matrix heatmap (Figure 6A), the cumulative distribution function (CDF) curve (Figure 6B), and the delta area plot (Figure 6C), which collectively indicated that k = 2 was the most suitable number of clusters. Consequently, we identified two distinct molecular clusters: cluster 1 comprised 56 samples, while cluster 2 included 41 samples. To further substantiate the robustness of our AD subtyping, we conducted principal component analysis (PCA), which revealed clear differences in transcriptional profiles between the two clusters. The PCA plot demonstrated a distinct separation between Cluster 1 and Cluster 2, reinforcing the reliability of our clustering results (Figure 6D).

Identification of two molecular phenotypes through unsupervised learning in AD. (A-C) Two molecular clusters were identified through unsupervised clustering and illustrated using various visualization techniques, including a heatmap, CDF plot, and delta plot. (D) The three-dimensional PCA scatter plot further highlights the presence of two distinct clusters that emerged from the unsupervised consensus clustering process. (E) A volcano plot was generated to display the DEGs between the identified subtypes. (F) Bubble plots and chord diagrams were employed to visualize the results of the Gene Ontology (GO) functional enrichment analysis conducted on the DEGs across the subtypes. (G) KEGG pathway analysis was performed to investigate the pathways associated with the DEGs between the subtypes.
We conducted a differential expression analysis between Cluster 1 and Cluster 2, identifying 394 DEGs, with 127 upregulated and 267 downregulated genes (p-value < 0.05 and |log2FC| > 0.5), as visualized in the volcano plot (Figure 6E), followed by GO and KEGG pathway enrichment analyses to explore their biological functions (Figure 6F). GO analysis showed distinct DEG distributions across categories, highlighting significant Biological Processes like synaptic transmission and neurotransmitter regulation, while KEGG revealed involvement in neural pathways such as GABAergic and Glutamatergic synapses (Figure 6G).
Differentiation of immune characteristics between Cluster 1 and Cluster 2
To gain a more comprehensive understanding of the distinct biological and immunological characteristics of the immune microenvironment subtypes, we employed the CIBERSORT algorithm to analyze and compare the infiltration levels of 22 immune cell subsets between Cluster 1 and Cluster 2. The visualizations, including a boxplot (Figure 7A) and a stacked plot (Figure 7B), demonstrated significant differences in immune cell infiltration between the clusters. Notably, Cluster 2 exhibited a higher infiltration of regulatory T cells (Tregs) and resting natural killer (NK) cells, while showing lower levels of activated NK cells and eosinophils compared to Cluster 1.

Assessment of immune cell infiltration and its correlation with hub genes. (A) Box plot of 22 types of infiltrating immune cells between cluster 1 and cluster 2. (B) Stacked histogram of immune cell distributions between cluster 1 and cluster 2 using the CIBERSORT algorithm. (C) Heatmap of correlation between three hub genes and 22 types of immune cells. (D–F) The correlations of hub genes (BSCL2, AGPAT1, and EHD2) with immune cells were determined using p < 0.05 as the screening criterion. (G) Correlation analysis of BSCL2 and monocytes. (H) Correlation analysis of AGPAT1 and eosinophils. (I) Correlation analysis of EHD2 and M1 macrophages.
We conducted an in-depth analysis of the complex interactions between the three hub genes and immune cells. The heatmap presented in Figure 7C delineates the correlations between these hub genes and various immune cell types. Specifically, BSCL2 demonstrated a positive correlation with activated NK cells, activated dendritic cells, eosinophils, and follicular helper T cells, while exhibiting a negative correlation with monocytes, resting CD4 memory T cells, resting NK cells, and M0 macrophages, as depicted in Figure 7D. AGPAT1 showed a negative correlation with eosinophils and a positive correlation with CD8 T cells, as illustrated in Figure 7E. Furthermore, EHD2 was negatively correlated with resting NK cells and M1 macrophages, but positively correlated with plasma cells and activated dendritic cells, as shown in Figure 7F. We conducted additional validation of these associations using scatter plot visualizations, which highlighted the most significant correlations between genes and immune cells. Specifically, BSCL2 displayed the most pronounced negative correlation with monocytes (Figure 7G), AGPAT1 exhibited a notable negative correlation with eosinophils (Figure 7H), and EHD2 showed the strongest positive correlation with M1 macrophages (Figure 7I). These gene-specific immune signatures may contribute to subtype-specific pathological mechanisms in the progression of AD.
The BSCL2-AGPAT1-EHD2 triad distinctly regulates effector memory CD4+ T cells, identifying immunometabolic subtypes in AD
To further elucidate the functional implications of two subtypes and their distinct immune-gene interaction patterns, we conducted single-sample gene set enrichment analysis (ssGSEA) on samples from the GSE132903 dataset. This analysis characterized immune cell infiltration patterns across 28 immune cell types within Cluster 1 and Cluster 2. The findings, illustrated through box plots, revealed that activated CD8 T cells, activated dendritic cells, central memory CD4 T cells, effector memory CD8 T cells, immature B cells, macrophages, myeloid-derived suppressor cells (MDSCs), memory B cells, monocytes, plasmacytoid dendritic cells, and type 17 T helper cells were significantly more prevalent in Cluster 2 compared to Cluster 1. Conversely, CD56bright natural killer cells and effector memory CD4 T cells were significantly less prevalent in Cluster 2 relative to Cluster 1 (Figure 8A).

Correlation analysis of hub genes and immune cell infiltration. (A) Box plot showing the immune cell heterogeneity of the two subtypes by ssGSEA algorithm. (B) Representative correlation heatmap between the result of ssGSEA algorithm and hub genes. (C-E) Correlations between the hub genes (BSCL2, AGPAT1, and EHD2) and infiltrating immune cells. (F-H) Scatter diagram of correlation analysis between the hub genes (BSCL2, AGPAT1, and EHD2) and effector memory CD4 T cells.
We analyzed correlations between the expression of BSCL2, AGPAT1, and EHD2 and 28 immune cell types (Figure 8B). BSCL2 correlated positively with effector memory CD4 T cells but negatively with several others, including gamma delta T cells and neutrophils (Figure 8C). AGPAT1 showed positive correlations with memory B cells and macrophages, but negative correlations with type 2 T helper cells and effector memory CD4 T cells (Figure 8D). EHD2 showed positive correlations with various immune cells, including plasmacytoid dendritic cells and natural killer cells, but negative correlations with regulatory T cells and several other T cell subsets (Figure 8E). All three hub genes had significant yet differing correlations with effector memory CD4+ T cells and plasmacytoid dendritic cells. Specifically, BSCL2 was positively correlated with effector memory CD4+ T cells, while AGPAT1 and EHD2 were negatively correlated (Figure 8F–H).
The diagnostic nomogram for BSCL2-AGPAT1-EHD2 trio in AD
To improve the prediction of AD risk, we created a diagnostic nomogram that incorporates three key genes (BSCL2, AGPAT1, and EHD2) using the “rms” R package (Figure 9A). This nomogram provides a quantitative estimate of an individual's likelihood of having the disease based on these molecular markers. The calibration curve showed that the estimated risk probabilities closely matched the actual incidence of AD, which indicates that the model is well calibrated (Figure 9B). Additionally, the decision curve analysis (DCA) revealed that the nomogram offers a net benefit across a range of threshold probabilities from 0.125 to 0.875, suggesting that using this model for decision-making could be advantageous for patients at risk of AD (Figure 9C). The clinical impact curve analysis further supported the nomogram's predictive performance, as the curve representing the “high risk number” closely followed the curve for “high risk with event number” within the threshold range of 0.4 to 1 (Figure 9D), highlighting its potential clinical utility for stratifying AD risk. Furthermore, the receiver operating characteristic (ROC) analysis demonstrated the nomogram's strong predictive efficiency, achieving an AUC of 0.875 in the GSE132903 dataset (Figure 9E), which indicates that the model is effective for diagnosing AD.
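The quantity plotted in a decision curve is the net benefit at each threshold probability. The Python sketch below computes it under the standard definition, net benefit = TP/n - (FP/n) × pt/(1 - pt), which is assumed here because the text does not state the formula or the package used for DCA. The labels and predicted probabilities are hypothetical.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of treating everyone with predicted risk >= threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Reference strategy: classify every sample as high risk."""
    return net_benefit(y_true, [1.0] * len(y_true), threshold)


# Hypothetical labels (1 = AD, 0 = control) and model probabilities.
toy_labels = [1, 1, 0, 0]
toy_probs = [0.9, 0.6, 0.7, 0.2]

nb_model = net_benefit(toy_labels, toy_probs, threshold=0.5)
nb_all = net_benefit_treat_all(toy_labels, threshold=0.5)
```

A model is useful at a given threshold only if its net benefit exceeds both the treat-all strategy and the treat-none strategy, whose net benefit is zero by definition.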

Construction of the nomogram model. (A) Nomogram establishment for the diagnostic model of AD. (B) Calibration curves to evaluate the predictive power of the nomogram. (C) Decision curve analysis of the nomogram model, indicating potential benefit for AD patients. (D) Assessing the clinical impact of the nomogram model by clinical impact curves. (E) ROC curve analysis of the risk prediction model in the GSE132903 dataset.
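The two summary quantities reported for the nomogram, the AUC and the net benefit used in decision curve analysis, can be sketched as follows. This is a minimal illustration using the standard definitions (AUC as the probability that a case receives a higher predicted risk than a control, and net benefit as TP/n − FP/n × pt/(1 − pt) at threshold probability pt). The function names and the toy labels and risks are hypothetical and are not taken from the study data.

```python
def auc(labels, scores):
    """AUC as the fraction of case-control pairs ranked correctly (ties count 0.5)."""
    cases = [s for l, s in zip(labels, scores) if l == 1]
    controls = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(
        1.0 if c > k else 0.5 if c == k else 0.0
        for c in cases
        for k in controls
    )
    return wins / (len(cases) * len(controls))


def net_benefit(labels, scores, threshold):
    """Net benefit of treating everyone whose predicted risk is >= threshold."""
    n = len(labels)
    tp = sum(1 for l, s in zip(labels, scores) if s >= threshold and l == 1)
    fp = sum(1 for l, s in zip(labels, scores) if s >= threshold and l == 0)
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - fp / n * threshold / (1 - threshold)


# Hypothetical toy data: 1 = AD, 0 = control, with model-predicted risks.
labels = [1, 1, 1, 0, 0, 0]
risk = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]

print(f"AUC = {auc(labels, risk):.3f}")
print(f"Net benefit at pt = 0.5: {net_benefit(labels, risk, 0.5):.3f}")
```

A decision curve is obtained by evaluating the net benefit over a grid of threshold probabilities and comparing it with the "treat all" and "treat none" strategies.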
Single-gene GSEA analysis of hub genes
To further explore the potential functional mechanisms of BSCL2, AGPAT1, and EHD2 in the pathogenesis of AD, we conducted single-gene GSEA to identify significantly enriched pathways. The high-expression group of BSCL2 was primarily enriched in pathways related to the neurotransmitter release cycle and neurotransmitter receptors and postsynaptic signaling. In contrast, the low-expression group of BSCL2 showed significant enrichment in pathways associated with ribosome function and selenoamino acid metabolism (Figures 10A, B). For AGPAT1, high expression was linked to pathways involving cell adhesion molecules (CAMs), the Notch signaling pathway, and glutathione metabolism, while its downregulation was associated with the G2/M DNA damage checkpoint and various cell cycle checkpoints (Figures 10C, D). Regarding EHD2, its high expression was connected to the Notch and Hippo signaling pathways, whereas its downregulation was related to the glutamate neurotransmitter release cycle, aggrephagy, and amino acids that regulate mTORC1 (Figures 10E, F).

GSEA analysis of hub genes (BSCL2, AGPAT1, EHD2) in AD. (A, B) The GSEA analysis identified the three pathways associated with BSCL2 upregulation and downregulation. (C, D) The GSEA analysis identified the three pathways associated with AGPAT1 upregulation and downregulation. (E, F) The GSEA analysis identified the three pathways associated with EHD2 upregulation and downregulation.
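The logic of a single-gene GSEA can be sketched in three steps: split samples by the expression of the hub gene, rank all genes by their difference between the two groups, and compute a running-sum enrichment score for a gene set along that ranking. The sketch below assumes a median split and uses the unweighted (Kolmogorov-Smirnov-like) form of the enrichment score; both are simplifying assumptions, since the text above does not state the grouping rule, and standard GSEA uses a weighted statistic with permutation-based significance. All gene names and values are hypothetical.

```python
from statistics import mean, median


def split_by_median(hub_expression):
    """Return sample indices with hub-gene expression above / at-or-below the median."""
    cut = median(hub_expression)
    high = [i for i, v in enumerate(hub_expression) if v > cut]
    low = [i for i, v in enumerate(hub_expression) if v <= cut]
    return high, low


def rank_genes(expression, high, low):
    """Rank genes by mean(high group) - mean(low group), largest difference first."""
    diff = {
        gene: mean(values[i] for i in high) - mean(values[i] for i in low)
        for gene, values in expression.items()
    }
    return sorted(diff, key=diff.get, reverse=True)


def enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum statistic: maximum deviation from zero along the ranking."""
    hits = [gene in gene_set for gene in ranked_genes]
    n_hit = sum(hits)
    n_miss = len(hits) - n_hit
    running, best = 0.0, 0.0
    for is_hit in hits:
        running += 1 / n_hit if is_hit else -1 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


# Hypothetical toy data: hub-gene expression and four other genes in four samples.
hub = [1.0, 2.0, 3.0, 4.0]
expression = {
    "g1": [1, 1, 5, 5],
    "g2": [2, 2, 4, 4],
    "g3": [3, 3, 3, 3],
    "g4": [5, 5, 1, 1],
}
high, low = split_by_median(hub)
ranking = rank_genes(expression, high, low)
print(ranking, enrichment_score(ranking, {"g1", "g2"}))
```

A positive score indicates that the gene set is concentrated among genes higher in the high-expression group, and a negative score indicates enrichment in the low-expression group.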
Pathway enrichment analysis between the two clusters
To gain deeper insights into the functional mechanisms of BSCL2, AGPAT1, and EHD2 in AD, we conducted an integrated GSEA and GSVA to compare the molecular profiles of Cluster 1 and Cluster 2. The GSEA ridge plot analysis revealed that the DEGs between these two clusters were predominantly enriched in several key pathways, including oxidative phosphorylation, neuroactive ligand-receptor interactions, autoimmune thyroid disease, the Notch signaling pathway, and cell adhesion molecules (CAMs), as illustrated in Figure 11A. Following this, we performed GSVA enrichment analysis using the HALLMARK gene set to further clarify the functional differences between the two clusters at the pathway level. The results indicated that Cluster 1 showed significant enrichment in pathways related to the inflammatory response, IL6/JAK/STAT3 signaling, Wnt/beta-catenin signaling, and the Notch signaling pathway. Conversely, Cluster 2 exhibited significant enrichment in pathways associated with E2F targets, pancreatic beta cell functions, and the G2/M checkpoint, as shown in Figures 11B and C.

Identification of clinically actionable subtypes in AD through the convergence of neuroinflammatory and cell-cycle pathway dysregulation. (A) The results of KEGG enrichment analysis by the GSEA algorithm. (B) Pathway enrichment analysis based on the Hallmark database using GSVA. (C) The results of the GSVA analysis show the differences in signaling pathways between cluster 1 and cluster 2.
Interconnected networks of miRNAs, lncRNAs, and TFs regulate the expression of BSCL2, AGPAT1, and EHD2 in the context of AD
To further explore the regulatory mechanisms of BSCL2, AGPAT1, and EHD2 in AD, we began by examining the interactions between these three hub genes and their potential microRNAs (miRNAs) using the ENCORI database. This analysis revealed 16 miRNAs that may interact with these genes (Figure 12A), providing a foundation for further research. Building on this, we constructed a miRNA-long non-coding RNA (lncRNA) interaction network to identify potential regulatory lncRNAs, resulting in the prediction of 14 lncRNAs associated with 9 key miRNAs (Figure 12B). Furthermore, we used ChIPBase to identify transcription factors (TFs) that interact with these hub genes and visualized the TF-gene regulatory network using Cytoscape. This network, which includes 14 TFs (Figure 12C), illustrates the transcription factors that may influence the expression of BSCL2 and EHD2 within the context of AD.

miRNA-lncRNA-TF networks regulate BSCL2/AGPAT1/EHD2 in AD. (A) The key genes-miRNA interaction network. (B) The miRNA-lncRNA interaction network. (C) The key genes-TFs interaction network.
Single-nucleus transcriptomic dissection of BSCL2-associated gene activity across neural lineages in AD
To more accurately delineate the expression of BSCL2-associated genes in AD, we conducted a comprehensive analysis of single-nucleus RNA sequencing (snRNA-seq) data derived from the GSE157827 dataset. Following rigorous quality control and normalization procedures, we assessed various metrics, including the number of detected genes (nFeature), the total sequence count per cell (nCount), log10 of genes per unique molecular identifier (log10GenesPerUMI), and the mitochondrial ratio (mitoRatio), as illustrated in Figure 13A. Subsequently, we performed principal component analysis (PCA) incorporating cell cycle scores (G1/S/G2M), as shown in Figure 13B. Through unsupervised clustering of the single-cell transcriptomic data and visualization via Uniform Manifold Approximation and Projection (UMAP) for dimensionality reduction, we identified 18 distinct cell clusters (Figure 13C), which were subsequently annotated into three primary lineages (astrocytes, monocytes, and neurons) using established canonical markers (Figure 13D). Following the annotation of cell types, we quantified the lineage-specific activity of BSCL2-associated genes. By analyzing a curated set of six BSCL2-related genes, we derived single-cell activity scores ranging from 0 to 0.3. This approach allowed BSCL2-related gene activation to be compared across different neural lineages (Figure 13E). Subsequent analysis revealed a preferential enrichment of BSCL2-related genes in neurons and astrocytes, whereas macrophage clusters exhibited relatively lower enrichment of these genes (Figure 13F).

Single-nucleus transcriptomics reveals lineage-specific enrichment of BSCL2-associated genes in AD neural subtypes. (A) Quality control of single-cell RNA sequencing data for AD and control samples. (B) PCA-based clustering using cell cycle scores (G1/S/G2M). (C) UMAP diagram showing the distribution of 18 cell subsets. (D) UMAP visualization of cell type annotation. (E) The distribution of the BSCL2-related genes in cell clusters. (F) Violin plots showing BSCL2-associated gene expression across neural cell types.
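The per-nucleus quality control metrics named above can be computed roughly as in the following sketch. It assumes the commonly used definitions, in which log10GenesPerUMI is log10(nFeature) divided by log10(nCount) and mitoRatio is the fraction of counts from mitochondrial genes; the text above does not spell out these formulas, so they are assumptions, and the function name and toy counts are hypothetical.

```python
import math


def qc_metrics(counts, mito_genes):
    """Compute QC metrics for one nucleus from a gene -> UMI count mapping."""
    n_count = sum(counts.values())                          # total UMI count
    n_feature = sum(1 for c in counts.values() if c > 0)    # genes with >= 1 count
    mito = sum(c for gene, c in counts.items() if gene in mito_genes)
    # Complexity is undefined when log10(n_count) is zero or negative infinity.
    complexity = (
        math.log10(n_feature) / math.log10(n_count)
        if n_count > 1 and n_feature > 0
        else float("nan")
    )
    return {
        "nCount": n_count,
        "nFeature": n_feature,
        "log10GenesPerUMI": complexity,
        "mitoRatio": mito / n_count if n_count > 0 else float("nan"),
    }


# Hypothetical toy nucleus: three placeholder genes and one mitochondrial gene.
toy_counts = {"GENE_A": 50, "GENE_B": 30, "GENE_C": 0, "MT-CO1": 20}
metrics = qc_metrics(toy_counts, mito_genes={"MT-CO1"})
print(metrics)
```

Nuclei are then typically filtered by applying thresholds to these metrics before normalization and clustering; the thresholds themselves are dataset-specific and are not given here.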
Cell-type-specific communication networks in AD
To investigate cell-type-specific communication and the variations in signal transduction across different cell types, we conducted heatmap analyses of outgoing and incoming signals utilizing the “CellChat” tool. The heatmaps elucidated distinct patterns of signal transduction for astrocytes, macrophages, and neurons. Specifically, astrocytes predominantly exhibited outgoing signals through NCAM, APP, CADM, LAMININ, SEMA6, PTN, JAM, MAG, and SEMA4, while their incoming signals were primarily mediated via NRXN, NRG, NCAM, CADM, SEMA6, VISTA, PTN, JAM, MAG, and SEMA4. Macrophages demonstrated outgoing signals involving TGFb, MPZ, VISTA, GALECTIN, CD39, and GAS, with incoming signals facilitated by APP, GALECTIN, CD39, MPZ, GAS, and TGFb. Neurons displayed outgoing signals through NRXN, NRG, NEGR, APP, CNTN, EPHA, and PTPRM, and incoming signals via NEGR, CNTN, EPHA, LAMININ, SEMA6, VISTA, and PTPRM (Figure 14A, B). These variations in signal transduction patterns underscore the intricate roles of astrocytes, macrophages, and neurons in AD.

Distinct ligand-receptor pathways influence astrocyte, macrophage, and neuron interaction. (A) The outgoing signaling patterns of annotated neural cell types. (B) The incoming signaling patterns of annotated neural cell types. (C) Dot plot presenting the distribution of distinctive signaling molecules in astrocytes, monocytes, and neurons.
Subsequently, we utilized dot plots to visualize intercellular communication patterns mediated by ligand-receptor pairs among neural cell types and conducted an analysis of their signal transduction characteristics under AD conditions. The analysis uncovered significant alterations in ligand-receptor signaling among various cell types, implicating several key pathways (Figure 14C). These findings underscore the complexity of intercellular communication networks in AD and offer novel insights into the understanding of the disease's pathological mechanisms.
Discussion
This study investigates the molecular mechanisms associated with the BSCL2 gene and its involvement in AD, with a particular emphasis on lipid metabolism and neuroinflammation. Previous research has indicated a potential link between BSCL2 and AD pathology,9 suggesting its viability as a biomarker or therapeutic target. Through the application of bioinformatics, machine learning, and network analysis, this research identifies key genes and biomarkers pertinent to AD. The findings underscore BSCL2, AGPAT1, and EHD2 as potential diagnostic markers, while also elucidating molecular pathways implicated in the disease.
Through protein-protein interaction (PPI) and differential expression analyses, the research identifies overlapping genes—BSCL2, AGPAT1, and EHD2—that are involved in lipid metabolism and neuroinflammation.27–30 Subsequent research is necessary to validate these associations and evaluate their viability as therapeutic targets. Machine learning methods have become important tools for identifying diagnostic biomarkers of AD.14,31,32 In this research, we used algorithms such as LASSO, Random Forest, and XGBoost to identify BSCL2, AGPAT1, and EHD2 as potential diagnostic markers. We applied receiver operating characteristic (ROC) analyses and obtained promising area under the curve (AUC) values. These results suggest that the biomarkers could enable earlier diagnosis and intervention, potentially improving patient outcomes.
The study's investigation into the immune response profiles across various AD subtypes has revealed significant variations in immune cell infiltration. Importantly, the research highlights the pivotal roles of regulatory T cells and resting natural killer cells within distinct clusters. Recent literature has emphasized the involvement of the immune system in AD, with specific immune profiles being associated with disease progression and severity.33–35 These observed differences in the immune landscape have substantial implications for our understanding of AD pathogenesis, suggesting that modulating the immune response could serve as a promising therapeutic approach. By correlating immune cell profiles with specific genotypes, this research advances the development of targeted therapies aimed at enhancing immune responses and reducing neuroinflammation associated with AD. Specifically, our findings indicate a positive correlation between BSCL2 and effector memory CD4+ T cells, whereas AGPAT1 and EHD2 exhibit a negative correlation with these cells.
Pathway enrichment analysis conducted using GSEA and GSVA identified significant enrichment of specific pathways, such as oxidative phosphorylation, neuroactive ligand-receptor interactions, and the Notch signaling pathway. These pathways are pivotal for comprehending the biological processes altered in AD and may inform future therapeutic strategies. Recent studies have underscored the importance of pathway analyses in drug development and therapeutic targeting, emphasizing the need to integrate pathway information with clinical data to enhance our understanding of AD heterogeneity.36–39 By elucidating these pathways, novel targets for intervention may be identified, potentially altering the trajectory of AD progression.
In addition to pathway enrichment analysis, single-nucleus RNA sequencing (snRNA-seq) represents an innovative approach for investigating the cellular components implicated in AD.40,41 This technique facilitates a more precise characterization of gene expression patterns across diverse neural cell types.42,43 Consequently, distinct cell types, including astrocytes, monocytes, and neurons, have been identified, each contributing uniquely to the pathology of AD.44–46 Notably, the preferential expression of BSCL2-associated genes in neurons and astrocytes, as revealed by snRNA-seq, underscores the pivotal roles these cells play in the context of AD and suggests potential pathways for therapeutic intervention.
This study recognizes several limitations: the restricted sample size may compromise the generalizability of the results, and the absence of clinical validation impedes the applicability of the findings in clinical settings. In summary, this study has elucidated critical biomarkers and molecular pathways linked to AD, highlighting the potential of BSCL2, AGPAT1, and EHD2 as both diagnostic markers and therapeutic targets. The findings from this research enhance our understanding of AD pathogenesis. Future research endeavors should focus on experimentally validating these results and further investigating the identified molecular interactions.
Supplemental Material
Supplemental material, sj-docx-1-alr-10.1177_25424823261423079 for Exploring BSCL2 and associated genes in Alzheimer's disease by integrative analysis of bioinformatics, sn-RNAseq and machine learning approach by Xiaoqiong An, Yijia Wang, Manni Cao, Zhenzhen Yi, Xiangguang Zeng, Wenfeng Yu and Zhenkui Ren in Journal of Alzheimer's Disease Reports
Footnotes
Acknowledgements
The authors particularly thank the participants and their contribution to this study.
Ethical considerations
Not applicable.
Consent to participate
Not applicable.
Consent for publication
Not applicable.
Author contribution(s)
Funding
This research was supported by grants from the National Natural Science Foundation of China (82160225), the Science and Technology Fund Project of Guizhou Health and Health Commission (grant no. gzwkj2021-356), the Basic Science Technology Project of Guizhou Province Qian Ke He Basic-MS [2025] 026, the Special Project of Academic New Seedling Cultivation and Free Exploration Innovation-Post-project subsidy of the National Natural Science Foundation of China, “Thousand Levels” of Guizhou Province High-level Innovative Talents (grant no. gzwjrs 2023-012), and the Key Advantageous Discipline Construction Project of Guizhou Provincial Health Commission in 2025.
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Data availability statement
The datasets used during the present study are available from the corresponding author upon reasonable request.
Supplemental material
Supplemental material for this article is available online.
References
