Abstract
BACKGROUND:
Lung cancer is one of the most common cancers worldwide, with the incidence increasing each year. It is crucial to improve the prognosis of patients who have lung cancer. Non-Small Cell Lung Cancer (NSCLC) accounts for the majority of lung cancer. Though its prognostic significance in NSCLC has not been often documented, Endoplasmic Reticulum (ER) stress has been identified to be implicated in tumour malignant behaviours and resistance to treatment.
OBJECTIVE:
This work aimed to develop a gene profile linked to ER stress that could be applied to predictive and risk assessment for non-small cell lung cancer.
METHODS:
Data from 1014 NSCLC patients were sourced from The Cancer Genome Atlas (TCGA) database, integrating clinical and Ribonucleic Acid (RNA) information. Diverse analytical techniques were utilized to identify ERS-associated genes associated with patients’ prognoses. These techniques included Kaplan-Meier analysis, univariate Cox regression, Least Absolute Shrinkage and Selection Operator regression analysis (LASSO) regression, and Pearson correlation analysis. Using a risk score model obtained from multivariate Cox analysis, a nomogram was created and validated to classify patients into high- and low-risk groups. The study employed the CIBERSORT algorithm and Single-Sample Gene Set Eenrichment Analysis (ssGSEA) to investigate the tumour immune microenvironment. We used the Genomics of Drug Sensitivity in Cancer (GDSC) database and R tools to identify medicines that could be responsive.
RESULTS:
Four genes – FABP5, C5AR1, CTSL, and LTA4H – were chosen to create the risk model. Overall Survival (OS) was considerably lower (
CONCLUSION:
Our study has produced a gene signature associated with ER stress that may be employed to forecast the prognosis and therapeutic response of non-small cell lung cancer patients.
Keywords
Introduction
Lung cancer is the most significant cause of cancer-related mortality worldwide among the most prevalent malignancies [1]. NSCLC is the most common kind of lung cancer, incorporating around 85% of cases [2]. Surgery is considered the mainstay of treatment for early-stage NSCLC, while some patients have recurrence within a few years [3]; for advanced NSCLC patients, chemotherapy, radiation, targeted medicines, and immunotherapy are indicated [2]. Personalized and adaptive strategies, such as genetic profiling, patient stratification, adaptive management, combination therapies, proactive side effect management, patient education, multidisciplinary teams, clinical trials, and psychosocial support, are essential for overseeing non-small cell lung cancer treatment plans. However, each patient’s reaction to these therapies is unique and unexpected. Furthermore, many patients have recurrence following standard treatment, burdening patients and society. Although current studies have focused on biomarkers related to NSCLC in clinical practice [4], which are used for monitoring and treatment, their reliability is controversial.
Despite the rapidly emerging new treatment methods for NSCLC in recent years, the overall 5-year survival rate is still low [5]. Scientists have raised their interest in tumour heterogeneity and believe it might correlate with resistance to cancer therapies [6]. Genetic variety, resistance development mechanisms, phenotypic plasticity, cell state transitions, mutation-driven resistance, epigenetic variability, and tumour microenvironment impact tumour heterogeneity in cancer treatment. Drug efflux, drug inactivation, changed drug targets, DNA repair, evasion of apoptosis, immunological evasion, and altered cell signaling are some of the processes that might result in the development of resistance to cancer therapies. Cancer is an evolving disease that gradually becomes more heterogeneous during progression [7]. Genetic mutations, epigenetic modifications, clonal development, and microenvironmental variables all contribute to tumour evolution, which generates a variety of cancer cell populations that affect the course of the illness and the effectiveness of treatment. Enhancing cancer therapy and patient survival requires tailored and flexible therapeutic approaches. The majority of the tumour may be made up of a diverse collection of cells with different molecular fingerprints and levels of treatment susceptibility as a result of this heterogeneity [8]. Heterogeneity lays the foundation for resistance to treatment; however, tumour heterogeneity is complex to investigate due to current technologies. A next-generation sequencing technique called single-cell sequencing offers the possibility of comprehending cellular variations and the function of a single cell in its surroundings [9]. Another cutting-edge technique that has a lot of promise for breaking down the intricate clonal structures of cancers is single-cell sequencing. Therefore, single-cell sequencing offers a way to understand tumours’ heterogeneity deeply.
With the investigation of the molecular mechanism of the tumours through rapidly developed technologies, scientists realize endoplasmic reticulum stress (ERS) might serve a role in the evolution of tumours. The ER is a large, central organelle involved in lipid metabolism, protein synthesis, and calcium storage, among other processes in the cell [10]. ERS is characterized as an activation of ER homeostasis disruption, encompassing the unfolded protein response and calcium perturbation [11]. Calcium perturbation is a critical component of endoplasmic reticulum (ER) stress that influences cell responses. Toxins and oxidative stress are two things that cause ER stress. Upon identifying ER stress, the Unfolded Protein Response (UPR) triggers corrective signalling pathways that improve protein degradation, restore calcium homeostasis, and enhance cell survival. Recent studies suggest that ERS might be necessary for the growth and metastasis of tumours [12].
Additionally, a connection has been shown between medication-induced apoptosis in lung cancer and ERS [13]. Nevertheless, only some single-cell sequence-based investigations have looked at the features of ERS-related genes and how useful they are for NSCLC patients. Managing data securely in medical cancer rehabilitation centres is vital, given the escalating mortality risks. IoT applications in healthcare, like sensors, present security challenges due to vulnerabilities in real-time data transmission. Deep Federated Collaborative Learning (DFCL) addresses these issues by preserving privacy while managing sensitive data. Recent studies underscore DFCL’s efficacy, demonstrating enhancements in accuracy (up to 19.8%), leading diagnoses (26%), and hospital dictionary analysis rates [14].
The problem statement and the main contribution of the paper are discussed as follows:
Lung cancer, a joint global disease, is gaining ground, particularly non-small cell lung cancer (NSCLC). Despite its increasing prevalence, endoplasmic reticulum stress has been linked to tumour malignancies and treatment resistance in NSCLC, emphasizing the need for a better prognosis. An ER stress-related gene profile has been developed as a result, and it may be utilized to estimate and forecast the risk of non-small cell lung cancer. Then, 1014 NSCLC patients’ clinical and RNA data were collected using the TCGA database. Several methods were employed to identify genes linked to ERS and prognosis, including Pearson correlation analysis, LASSO, Cox regression, and the Kaplan-Meier method. Multivariate Cox analysis was used to classify the patients in a developed and evaluated nomogram using the risk score model. Sensitive drug screening involved R tools and the Genomics of Drug Sensitivity in Cancer database. Using four genes – FABP5, C5AR1, CTSL, and LTA4H –, the study developed a risk prediction model for patients with non-small cell lung cancer. The model performed better than clinical factors in predicting outcomes and showed specific sensitivity to high-risk populations. This genetic signature can potentially forecast both the prognosis and outcomes of treatment.
This paper is organized as follows:
Section 2 details the methodology and the proposed The Cancer Genome Atlas-Lung Adenocarcinoma (TCGA-LUAD) and The Cancer Genome Atlas-Lung Squamous Cell Carcinoma (TCGA-LUSC) data for lung cancer prediction. Section 3 presents the results of our empirical analysis and discusses the findings. Section 4 addresses the practical implementation and integration challenges. Finally, Section 5 summarizes the key findings, conclusions, and recommendations.
Data sources and data processing
Using the accession IDs GSE117570 [15], GSE37745 [16], and GSE61676 [17], we ran a keyword search for NSCLC at the Gene Expression Omnibus (GEO) [18, 19, 20] (
Data set information
Data set information
In this work, we used the Seurat FindAllMarkers programme to identify marker genes distinctive to a subset. Using threshold parameters, multiple testing correction, pre-processing procedures, data subsetting, parallel processing, assay specification, slot usage, and other parameters, the Seurat FindAllMarkers function helps identify marker genes in single-cell RNA sequencing data. The FindAllMarkers function in Seurat is essential for locating differentially expressed marker genes in single-cell RNA sequencing data. This function also helps with disease diagnosis and treatment targeting by revealing functional differences and genes that drive specific biological processes. Moreover, the expression of subset-specific marker genes was visualized using the DotPlot and ViolinPlot tools. The DoHeatmap function was performed to illustrate each cell subset’s former 10 or 20 specific genes by heatmap.
AUCell: Analysis of ‘gene set’ activity in single-cell RNA-seq data
To score the cells with intersection genes of ERSR genes and differentially expressed genes (DEGs) of cell subsets, we utilized the AUCell_calcAUC algorithm of the AUCell package for the GSE117570 dataset [25]. One tool that grades cells according to gene set expression – such as DEGs and ERSR genes – is the AUCell_calcAUC algorithm. It quantitatively assesses gene set activity by calculating the Area Under the Curve (AUC) for each gene set inside individual cells. Cells with elevated ER stress may be identified using AUC score visualization. The high-scoring cell subsets were extracted for a subset analysis and generated with ten cell subsets.
Identification of differentially expressed genes (ERSRDEGs) associated with endoplasmic reticulum stress in TCGA
The ERSR gene expression profiling data from TCGA were analyzed to identify Endoplasmic Reticulum Stress-Related (ERSRDEGs). The study obtains RNA-seq data, normalizes expression, performs differential expression analysis, focuses on ER stress pathways, and validates the significance of the genes found to be differentially expressed about endoplasmic reticulum stress (ERSRDEGs) using data from The Cancer Genome Atlas (TCGA). Limma package in R was performed with the thresholds
Functional enrichment analysis
Using the Kyoto Encyclopaedia of Genes and Genomes (KEGG) pathway enrichment assessment and Gene Ontology (GO) performance, ClusterProfiler software 4.1.1 was used to examine the association between genes of DEGs from high-scored cell subgroups and ERSRDEGs with
Construction of the prognosis model
Univariate analysis was performed using Cox logistic regression models to find genes linked to prognosis. Univariate analysis of survival studies requires the use of Cox logistic regression models, sometimes referred to as Cox proportional hazards models. They simplify interpretation, manage suppressed data, and offer dynamic risk evaluations. They are adaptable to different forms of survival data and quantify the effect size of individual predictor factors on survival outcomes. The glmnet tool was used to perform LASSO analysis [32], which allowed patients to be classified as high- or low-risk. Essential methods for comprehending gene expression and survival outcomes include LASSO Analysis with glmnet and Cox Logistic Regression Models. They measure gene effects, offer hazard ratios, and pinpoint prognostic indicators. LASSO Analysis with glmnet minimizes complexity in high-dimensional data, streamlines models, and guards against overfitting. The predictive model’s accuracy was evaluated using the survival Receiver Operating Characteristic (ROC) tool to determine the Area Under the Curve (AUC) while generating survival curves using the Kaplan-Meier technique [33]. Simultaneously, the GSE37745 dataset was utilized to ensure the depth of the predictive model. With its reliable predictive models, comprehensive clinical annotations, gene expression profiles, and biological insights, the GSE37745 dataset is an essential resource for cancer research. The reproducibility and trustworthiness of results are ensured by its enormous sample size, public accessibility, and thorough analysis. Using Cox regression to create a predictive model, assign points to predictors, and summaries and compute scores for each patient is known as a nomogram. Clinical factors like age, stage, and therapy account for personalized forecasts, and the entire score predicts clinical outcomes like survival likelihood.
Independence analysis of prognosis model and construction of nomogram
According to the univariate and multivariate analysis findings, a nomogram was created to examine the prognostic usefulness of ERSRDEGs. The nomogram, which included crucial clinical characteristics and calibration plots, was produced using the rms R tool [34]. Model correctness was assessed using calibration [35, 36]. A unique assessment technique called Decision Curve Analysis (DCA) examined the clinical net advantages of predictive models [36]. Additionally, we assessed the precision of the Cox regression models using DCA. While waiting, we evaluated the achievement of Cox regression techniques using ROC curves and AUC statistics. A modified ROC curve, demonstrating a high AUC (
Immune infiltration analysis
Using the CIBERSORT approach, we calculated the amount of 22 apoptotic cells conquering tumours in tissue samples [37]. Using CIBERSORT, a technique for gene expression analysis, the study intends to examine immune infiltration in NSCLC tumours. The process entails taking gene expression data from microarrays or RNA-seq, decomposing it into proportions of immune cell types, and analyzing the resultant to link it with ER stress-related gene expression patterns and clinical outcomes. A risk score model for patients with NSCLC was validated using the GSE37745 dataset. The list had twenty-two distinct cell types: NK cells resting, active, plasma cells, dendritic cells activated, M1 and M2 macrophages, dendritic cells resting, mast cells activated, eosinophils, neutrophils, and T cells at rest. T cells include gamma delta, CD8, CD4 memory, follicular helper, CD4 naive, and regulatory cells. The 22 cell types were shown with box plots to show differences between high- and low-risk groups. Additionally, the immune infiltration analysis of lung cancer was performed by single sample gene set enrichment analysis (ssGSEA) using GSVA package to investigate the infiltration of the 28 types of immune cells [38].
Drug sensitivity
We evaluated each patient’s medication sensitivity using the Genomics of Medication Sensitivity in Cancer (GDSC) database (
Analysis of the correlation between critical genes and reaction to immunotherapy
To investigate the correlation between the four essential genes (C5AR1, CTSL, FABP5, and LTA4H) and response to immunotherapy, we collected 79 immune checkpoint genes from current studies [41] (Table S2). Among the 79 immune checkpoint genes, 69 genes in the TCGA data collection were chosen. Applying the Spearman correlation, the relationship between the four essential genes and the 69 immunological checkpoint genes was investigated; a
Statistical analysis
The statistical analysis used R (v.4.1.1). The link between immune cells and genes was discovered using the Spearman coefficient. The Spearman coefficient is a non-parametric metric for determining the relationships between immune cells and genes. It works well with noisy and adaptable biological data, working with various data formats such as immune cell counts and gene expression levels. Consistent trends are identified, facilitating the interpretation of results, particularly with high-throughput data. Multiple hypothesis assessments were addressed using the Benjamini-Hochberg (BH) approach. A reliable technique for evaluating many hypotheses, increasing statistical power, lowering false discovery rates, and boosting result interpretation is the Benjamini-Hochberg (BH) methodology. It is appropriate for high-throughput studies and ensures that meaningful discoveries are not mostly false positives by balancing discovery and error control. It entails figuring out the goal, the setting, the number of hypotheses, the uniform distribution of
Flowchart of the study.
Cell cluster analysis of GSE117570 dataset. (A) t-SNE was employed to envisage the distribution of the 14 cell subsets, and different colours represented different cell subsets. (B) Violin plots of the top 5 DEGs of each cell subset. The vertical axis with different colours represented different cell subsets. The upper horizontal axis represented the number of each cell subset, and the lower horizontal axis represented the expression level of each gene. (C) Bubble Map of the top 5 DEGs of all cell subsets. The larger the bubble, the larger the percent of differentially expressed genes. Gene expression levels increase with darker blue hues. (D) A heatmap displaying each cell subset’s top ten DEGs. Gene expression levels increase with a deeper red colour. Gene expression levels decrease with increasing purple hue. (E) A heatmap displaying all cell subgroups’ top 20 DEGs. Gene expression levels increase with a deeper red colour. Gene expression levels decrease with increasing purple hue. (F) Bar plots of the cell subsets distribution of 4 patients. Different colours represented different cell subsets.
DEGs of cell subsets
As shown in Fig. 1, we initiated our work with the cell subsets analysis of the GSE117570 dataset by Seurat package and generated 14 cell subsets (Fig. 2A). According to the cell subsets analysis, the differentially expressed genes (DEGs) for cell subsets were found. We extracted the top 5 DEGs of each cell subset. We visualized them with violin plots (Fig. 2B) and BubbleMap (Fig. 2C) and then figured out that the top 5 DEGs were explicitly highly expressed in each cell subset. The Seurat method is one approach for determining the top 5 differentially expressed genes (DEGs) in single-cell RNA sequencing data. Using the FindClusters and FindAllMarkers functions, it is pre-processed, normalized, and scaled before being grouped according to gene expression profiles. Furthermore, we visualized the top 10 (Fig. 2D) and top 20 (Fig. 2E) DEGs with heatmaps and also discovered that the top 10 and top 20 DEGs were specifically highly expressed in each cell subset. We elucidated the result with bar plots to determine the distribution of each cell subset in patients (Fig. 2F). We found that the proportion of each cell subset varied from patient to patient, which indicated heterogeneity within patients.
Darker blue in data visualizations indicate increased gene expression levels resulting from technological and biological causes. Technical techniques include colour gradients and normalization; biological mechanisms include transcription factor activity, epigenetic changes, gene amplification, post-transcriptional regulation, and signalling pathways. The development of non-small cell lung cancer and resistance to existing treatments can be better understood by researching genes related to endoplasmic reticulum stress pathways. Key regulatory nodes and therapeutic targets can be identified by analyzing these pathways using gene expression analysis, functional assays, protein interaction studies, pathway analysis, cellular and molecular phenotyping, in vivo models, and clinical correlation.
Expression analysis of ERSRDEGs of cell subsets
After analyzing the intersection genes of ERSR genes and DEGs of cell subsets, we generated the Venn diagram (Fig. 3A) and obtained 111 genes within the intersection. We elucidated the 111 genes with violin plots (Fig. 3B) and BubbleMap (Fig. 3C) and figured out those genes mainly in cell subsets 1, 5, 6, 7, 10, 11 and 13. Targeted cancer therapy is now possible because the study identifies 111 genes between ERSR genes and DEGs. These genes are relevant to tumour biology and might be therapeutic targets for cancer detection and treatment. Based on the analysis of the 111 genes, the majority of them are positively correlated, while some of them are negatively correlated (Fig. 3D) (
Distribution and correlation analysis of intersection genes from the GSE117570 dataset. (A) Venn diagram of ERSRDEGs and cell subsets DEGs. (B) Violin plots of intersection genes. The vertical axis with different colours represented different cell subsets. The lower horizontal axis showed every gene’s expression level, while the upper horizontal axis represented the total number of each cell subgroup. (C) Bubble Map of intersection genes of cell subsets. The larger the bubble, the larger the percent of differentially expressed genes. Darker purple hues indicate higher gene expression levels. (D) Gene intersection heatmap. Red and blue represented positive and negative connections, while the cross inside the circle showed no significant link.
Enrichment analysis of high-score cell subsets and DEGs of high-score cell subsets from the GSE117570 dataset. (A) Diagram of 1064 high-score cells with AUC cutoff value as 0.1. (B) Bar plot of high-score cell subsets. (C) In BP, GO saturation evaluation of high-score cell subsets’ DEGs. (D) DEGs of high-score cell subsets in Cellular Components (CC) were subjected to GO enrichment assessment. (D) The MF’s DEGs of high-score cell subgroups were subjected to GO enrichment evaluation. (F) DEGs of high-score cell subsets were analyzed using the KEGG pathway.
Reanalysis of high-score cell subsets. (A) t-SNE was utilized to visualize the distribution of the ten cell subsets, and different colours represented different cell subsets. (B) Bubble Map of top 5 DEGs of each cell subset. The percentage of genes with differential expression increases with the size of the bubble. Gene expression levels increase with a richer red colour. (C) Violin plots of the top 5 DEGs of each cell subset. The vertical axis with different colours represented different cell subsets. The lower horizontal axis showed every gene’s expression level, while the upper horizontal axis represented the number of each cell subgroup. (D) The ERSRDEGs Bubble Map. More genes with differential expression are represented in the giant bubble. Darker green hues indicate higher gene expression levels. (E) Violin plots of ERSRDEGs. The vertical axis with different colours represented different cell subsets. The lower horizontal axis showed every gene’s expression level, while the upper horizontal axis represented the number of each cell subgroup.
We used the AUCell package to analyze the GSE117570 dataset to select the cell subsets with high-score DEGs. The cutoff value for AUC was set as 0.1. According to the AUCell_calcAUC algorithm, 1064 cells were chosen as high-score (Fig. 4A). The 1064 cells were mainly distributed in cell subsets 1, 5, 7, and 13 (Fig. 4B). The cluster profile software was utilized to perform functional enrichment analysis for GO and KEGG. The results of the GO enrichment study were visualized employing BubbleMaps, taking into account three factors: molecular function (MF) (Fig. 4E), cellular parts (CC) (Fig. 4D), as well as biological processes (BP) (Fig. 4C). We found that the regulation of high-score DEGs involved the presentation of peptide or polysaccharide antigens and their antigen processing MHC protein complex binding, MHC class II, vacuolar lumen, antigen processing and presentation, secretory granule lumen, antigen processing and presentation of peptide antigen, MHC class II protein complex binding, tertiary granule, and cysteine-type peptidase activity. In the KEGG enrichment study, the pathways related to rheumatoid arthritis and tuberculosis were where DEGs were most enriched.
Reanalysis of high-score cell subsets
We used the criteria for screening cell subsets with high-score DEGs to reanalyze all the cell subsets and then generated ten subsets (Fig. 5A). The top 5 DEGs were elucidated by BubbleMap (Fig. 5B) and violin plots (Fig. 5C), and all the DEGs were presented in Table 2. In addition, we visualized the ERSRDEGs in the selected 10-cell subsets with BubbleMap (Fig. 5D) and violin plots (Fig. 5E), and the ERSRDEGs were presented in Table 5. High-score DEGs and ERSRDEGs are all highly expressed in cell subset 7.
Differentially expressed genes in high-score cell subsets
Differentially expressed genes in high-score cell subsets
GO enrichment analysis
Differentiation of ERSRDEGs between normal and NSCLC samples. (A) ERSRDEGs’ volcano plot. 1028 showed an increase in regulation, whereas 1034 showed a decrease. The heatmap displaying the top 58 ERSRDEGs is shown in (B). (C) GO enrichment analysis in BP for ERSRDEGs. (D) ERSRDEGs CC’s GO enrichment analysis. (E) GO evaluation of enrichment in MF using ERSRDEGs. (F) KEGG pathway study of ERSRDEGs.
Risk score system established using TCGA data. (A) The forest plot for univariate Cox regression analysis. (B) depicts LASSO coefficient profiles for the four ERSRDEGs linked with survival. (C) Use cross-validation to determine the number of variables in the LASSO regression framework. (D) Kaplan-Meier survival curves for the different training group risk groups. (E) The training group’s ROC curve for the risk score model. (F) Kaplan-Meier survival curves for the different risk groups in the testing population. (G) The evaluation group’s ROC curve for the risk score framework.
Nomogram created and validated using TCGA dataset. An examination of univariate Cox regression using the forest plot (A). The multivariate Cox regression analysis’s forest plot is shown in (B). A nomogram’s calibration curve (C). The nomogram that uses TCGA data is (D). The nomogram’s (E–G) ROC curve with a 3-, 5-, or 10-year OS. (H–J) DCA of Nomogram with 3-, 5- or 10-year OS.
After analyzing the TCGA data with the R Limma programme, 2062 ERSRDEGs with expression differences between tumour and normal tissues were identified (Fig. 6A). ERSRDEGs may be found by analyzing TCGA data using the Limma tool. RNA-seq expression data from tumour and standard samples are obtained and normalized. Then, a design matrix is made, a linear model is fitted, Bayes moderation is used, and differential expressions are found. These included 1028 and 1034, which were upregulated and downregulated, accordingly. Then, we selected the intersection of the 2062 ERSRDEGs and DEGs generated from high-score cell subsets reanalysis and got 58 genes. Those genes were shown in a heatmap (Fig. 6B). For the 58 genes, we performed GO enrichment analysis, and the results showed 15 enriched GO keywords (Table 3). The most enriched categories in the BP category were reaction to lipopolysaccharide, response to bacterial molecules, and positive regulation of soft tissue cell growth (Fig. 6C). Categorized by “cellular component” observed, they were mainly associated with specific granules, secretory granule membranes, and collagen-containing extracellular matrix.
KEGG enrichment analysis
KEGG enrichment analysis
Differentially expressed endoplasmic reticulum-related genes information
Furthermore, the “molecular function” category identified GO keywords mostly in transcription factor binding, fatty acid binding, and DNA binding unique to RNA polymerase II. According to KEGG enrichment, the TNF and IL-17 signalling pathways were the most highly enriched (Table 4). According to the KEGG enrichment analysis, TNF and IL-17 signalling pathways are significantly enriched when genes of interest are over-represented. These pathways are linked to autoimmune illnesses, cancer, infections, and inflammatory responses in conditions including psoriasis and rheumatoid arthritis.
Four genes were filtered using univariate Cox regression to calculate the risk score (Fig. 7A). The risk score model (Fig. 7B and C) was developed using Lasso-Cox analysis. Univariate Cox Regression and Lasso-Cox Analysis are steps in the process of creating a risk score model. Initially, genes linked to survival are screened for, significant genes are chosen, and relevant genes are chosen using LASSO with cross-validation. The chosen genes are then used to fit the final model. The formula is “Risk score
Construction and validation of the nomogram
We used univariate (Fig. 8A) and multivariate Cox regression (Fig. 8B) to examine the independence of the four discovered genes as prognostic indicators for NSCLC. A simple way to discuss how each gene affects survival is to use univariate Cox regression, although this technique may overstate results because it needs to account for other factors. In contrast, multivariate Cox Regression accounts for multicollinearity and confounding by simultaneously analyzing several genes. The results showed a relationship between the outcome of NSCLC patients and the risk score, N stage, and T stage (Table 6). A nomogram was generated using the risk score, N stage, and T stage to estimate the patients’ 3-, 5-, and 10-year OS. Patients’ 3-, 5-, and 10-year OS could be predicted using the nomogram, according to Fig. 8E–G’s ROC curve and Fig. 8H–J’s AUC. The calibration curves also supported the nomogram’s efficacy in prognosticating NSCLC patients (Fig. 8C). Calibration curves are necessary to evaluate how well nomograms predict patients with non-small cell lung cancer. They show that there is good calibration when they visually compare projected probability with actual outcomes. To ensure precision, deviations from the 45-degree line are measured. Models are validated for internal and external validation, biases are found, and calibration curves guide modifications.
Univariate and multivariate cox analysis
Univariate and multivariate cox analysis
We employed ssGSEA and CIBERSORT to evaluate each sample’s immune cell enrichment levels. According to CIBERSORT profiling, the proportion of T cells with gamma delta capabilities was the only variable showing significant variation (
Furthermore, as illustrated in Fig. 9B, the ssGSEA data revealed significant differences (
Employing ssGSEA and CIBERSORT to analyze immune cell infiltrates. (A) Box plot of 22 immune cells’ distribution (CIBERSORT analysis). The colour purple symbolized high-risk groups, while the colour pink represented low-risk groups. (B) A box plot was utilized to display the distribution of 28 immune cells’ spatial organization, with the analysis being done using ssGSEA. (C) Correlation analysis of the 22 immune cells in NSCLC. The purple color represented positive correlation; the green color represented negative correlation; the cross in the circle represented no significant correlation. (D) Correlation study of NSCLC immune cells, totalling 28. The purple color represented positive correlation; the green color represented negative correlation; the cross in the circle represented no significant correlation. (E) Heatmap of the correlation of the four selected genes and 22 immune cells. The pink color represented positive correlation; the purple color represented negative correlation. (F) Heatmap showing how the four chosen genes and the 28 immune cells are correlated. The pink color represented positive correlation; the purple color represented negative correlation. *: 
Drug sensitivity in high- and low-risk populations as determined by TCGA data. (A–L) Distribution plots of IC50 of 12 drugs with significant differences between high and low-risk groups. The blue colour represented high-risk groups, and the red colour represented low-risk groups.
The correlation analysis of the four key genes and reaction to immunotherapy based on the GSE117570 dataset. (A) The heatmap of the four essential genes and 69 immune checkpoint genes. Blue represents a negative association, while red represents a positive one. (B) Bar plots of the expression level of C5AR1 gene and reaction to immunotherapy. (C) Bar graphs showing the CTSL gene expression level and immunotherapy response. (D) Bar charts showing the FABP5 gene’s expression level and response to immunotherapy. (E) Bar graphs showing the LTA4H gene expression level and immunotherapy response. *: 
Analysis of the correlation between critical genes and reaction to immunotherapy
As per the findings of the correlation study, there was a substantial association (
Discussion
Owing to the ageing and expanding global population, cancer ranks among the leading causes of mortality [42]. Lung cancer remains the leading cause of death related to cancer [43]. As more targeted medications and immunotherapies have been developed, lung cancer patients have more treatment options, and their OS and quality of life have somewhat improved [44, 45]. However, the long-term survival of NSCLC patients is still poor [5], which leads to a deeper investigation of tumorigenesis and tumour progression that will benefit those patients. Although long-term use of targeted drugs and immunotherapies may result in toxicities and autoimmune disorders, they can increase the quality of life and survival rates for lung cancer survivors. Implications for long-term survivability include patient education, monitoring, supportive care, and research into novel treatments.
Different kinds of cells, including tumour, stromal, immune, and fibroblast cells, are controlled in tumours and closely interact with and impact one another. It takes a multifaceted approach to understand the relationships between tumour cells, stromal cells, immune cells, and fibroblasts [46]. This includes functional assays to detect the impact of ER stress, multiplex imaging, single-cell RNA-seq, and spatial transcriptomics. To ensure equal access, protect sensitive data, prevent genetic discrimination, and conduct ethical clinical trials, personalized medicine for tumour cells needs to take ethical factors like equity, privacy, data security, informed consent, genetic discrimination, clinical validity, psychological impact, autonomy, and ethical research practices into account. Therefore, it is essential to examine the cellular makeup of the tumour tissue cells and their interactions. Analyzing the cellular composition and interactions in tumour tissues is necessary to comprehend the tumour’s course and the therapy’s efficacy. It assists in identifying essential participants in the Tumour MicroEnvironment (TME), exposing novel targets for treatment and methods for adjusting immune responses. To predict prognosis and guide therapy, we built a 4-gene risk model and found ERSRDEGs between tumour and standard samples in our study. A 4-gene risk model for NSCLC becomes more accurate and predictive when ERSRDEGs are included. To improve patient classification and treatment plans, the model identifies high-risk patients with significant connections with outcomes and incorporates multifactorial analysis for nuanced risk forecasts. Gene signatures are now helpful for prognosis prediction and clinical decision support. There has been little research on ER stress-related predictive risk models for NSCLC, but there have been several on hepatocellular carcinoma [47], oesophagal cancer [48], as well as diffuse glioma [49]. The risk signature conducted in our present study was validated to present considerable prognostic accuracy and feasibility.
Furthermore, a nomogram was developed to predict the likelihood of a 1-, 3-, and 5-year survival for non-small cell lung cancer individuals. Risk ratings and clinical characteristics made up this nomogram. Additionally, we looked into how various risk groups’ immune systems infiltrated, providing guidance for immunotherapy in NSCLC. Additionally, we examined drug sensitivity and assessed prospective medications for use in NSCLC patients according to various risk categories. For NSCLC patients, our identified signature may be a unique biomarker for individualized care and the best possible follow-up.
Four ERSRDEGs – FABP5, C5AR1, CTSL, and LTA4H – were found in our study using LASSO Cox regression analysis. FABP5 may have a role in lipid homeostasis and tumour immunology, as evidenced by prior studies that have connected it to various tumours’ growth, metastasis, and proliferation [50, 51, 52]. Yang et al. found that FABP5 regulated the maturation of natural killer cells in the lung, which might control lung cancer metastasis [53]. Furthermore, FABP5 has been suggested as a possible therapeutic target for hepatocellular carcinoma as well as breast cancer [54, 55]. C5AR1 was found to be highly expressed in NSCLC patients [56]. C5AR1 was involved in the non-inflammatory tumour microenvironment and was reported to be related to immune evasion in gastric cancer [57], which might explain our finding regarding the reaction to immunotherapy. According to reports, glioma invasion and proliferation are inhibited by the downregulation of CTSL, which may present a unique approach to glioma treatment [58].
Similarly, our research suggests that CTSL may affect the prognosis of NSCLC. According to a recent study, LTA4H overexpression in laryngeal squamous cell carcinoma impacts the tumour’s growth, migration, and metastasis. LTA4H is also considered an exciting cancer therapy target [59]. All of FABP5’s and LTA4H’s HRs were less than 1, and our multivariate Cox regression demonstrated that their high expression levels were linked to better survival, indicating that they may be used to treat non-small cell lung cancer. Their precise involvement in NSCLC still needs to be determined by further research.
Molecularly classified tailored immunotherapy has been effective in treating NSCLC. A significant development in treating NSCLC is individualized immunotherapy, which offers individualized care based on the tumour’s molecular profile. Higher response rates and longer overall lifespan are possible with this strategy’s advantages, which include improved clinical outcomes, combination medicines, less toxicity, and greater effectiveness. This investigation examined the variations in 22 tumour-infiltrating immune cells among the risk categories. The more abundant of T cells, namely gamma delta cells, in the high-risk group may have contributed to their poorer OS despite contradictory findings from previous research [60]. Moreover, macrophages were present in the high-risk group and may be associated with unfavorable OS. As a result, a higher status of immune cells invading the tumour might indicate a poorer outcome.
Moreover, tumour-infiltrating immune cells could also predict the response to immunotherapy. Our risk profile may indirectly indicate the efficacy of immunotherapy. However, it is yet unclear how ER stress affects the anti-tumor immune response in NSCLC.
We also investigated potentially sensitive anti-NSCLC drugs. Notably, we excluded numerous small molecule medications responsive to the high-risk population. Some of the drugs, such as DMOG, KIN001.135, Nutlin.3a, and PD.0332991, were under investigation for druggability. Several medications have received approval for the treatment of certain tumours. Prostate cancer [61], recurrent ovarian epithelial carcinoma, fallopian tube, and primary peritoneal cancer [62] were among the indications for which AG.014699 was authorized. For some types of lymphomas, such as marginal zone lymphoma, follicular lymphoma, and diffuse large B-cell lymphoma [63], lenalidomide was authorized [64]. To treat kidney cancer [65], pancreatic neuroendocrine tumours [66], and gastrointestinal stromal tumours, sunitinib was authorized. Trials for other medications were being conducted. Verifying the synergistic impact of combination therapy techniques in non-small cell lung cancer requires more practice and clinical studies.
Our present study has several gaps. First, we needed to find a way to compile data on more precise clinical factors, such as tumour biomarkers and treatment modalities. Conducting extensive multicenter cohorts and obtaining outside verification of the signature in additional databases will be essential. In addition, due to the experimental tools for detecting and monitoring ER stress responses not being extensively utilized in cancer research [67], more functional experiments are essential to acknowledge the future interaction between the tumour immune microenvironment and ER stress.
Conclusion
In summary, by utilizing a well-validated nomogram based on the four ERS-related genes, NSCLC patients could more accurately predict their prognosis. Our findings also pointed out exciting references for NSCLC immunotherapy and drug treatment. In future research, we will use clinical samples and in vitro molecular investigations to better explore its predictive usefulness and the underlying mechanism.
Funding
There is no specific funding for this research.
Consent for publication
All authors have checked the outcomes, endorsed the final draft of the manuscript, and consented to its publication.
Data availability
The findings of this study can be supported by experimental data provided by the corresponding author upon request.
Supplementary data
The supplementary files are available to download from https://dx-doi-org.web.bisu.edu.cn/10.3233/THC-241059.
Footnotes
Acknowledgments
The authors would like to show sincere thanks to those techniques who have contributed to this research.
Conflict of interest
The authors stated that they have no competing interests related to this study.
