Abstract
Objective
A frequent post-translational alteration of proteins called glycosylation has been strongly linked to the development and progression of cancer. Targeting glycosylation may improve cancer treatment outcomes. This study intended to investigate relationship and prognostic significance of glycosylation-related gene set features with colon cancer survival, immunity, drug sensitivity, etc.
Methods
524 colon cancer patients were included from TCGA database as training cohort. GSE29621 was external validation cohort. Univariate analysis, LASSO, multivariate regression analysis, K-M survival curve, and ROC curve analysis were used to construct and validate a glycosylation-related gene-based riskscore prognostic model. The CIBERSORT method and TIDE algorithm were utilized to analyze and evaluate differences in immune levels between high- and low-risk groups and their response to immunotherapy. Based on the DSigDB database, potential drugs with potential targeting effects on the prognostic model genes were predicted.
Results
According to a series of regression analyses, we constructed a prognostic model with six glycosylation genes. The model showed favorable prognostic prediction ability in both training and validation sets. Relevant to high-risk group, low-risk group presented better survival rates, higher immune cell infiltration levels, lower TIDE scores, and a higher proportion of patients with potential response to immunotherapy. In addition, potential anti-tumor drugs such as 67526-95-8, verteportin, uracil, Pemetrexed disodium, and fisetin were screened through the DSigDB database.
Conclusion
In summary, a validated prognostic model for colon cancer was constructed with glycosylation genes. The model could act as an independent prognostic factor for colon cancer.
Introduction
One of the most frequent gastrointestinal malignant tumors, colon cancer claims many lives each year. 1 According to recent studies, there are up to 106,970 new cases of colon cancer every year. 2 Despite major advancements in its detection and treatment, the incidence and fatality rates for colon cancer remain high and steadily growing due to various complex reasons, such as high tumor heterogeneity, fewer effective treatment targets, and late diagnosis and screening.3–5 Many studies have shown that protein post-translational modifications are linked to cancer development, such as colon cancer, 6 pancreatic cancer, 7 and breast cancer. 8 Although treatments targeting post-translational modifications in cancer have been confirmed in breast cancer, 8 role of post-translational modifications in colon cancer is not yet fully understood. Therefore, further exploration of the relationship between post-translational modifications and colon cancer may help to identify potential prognostic indicators and guide clinical treatment research for colon cancer.
Glycosylation is a frequent and significant post-translational alteration of proteins. 9 In eukaryotes, most glycosylation processes occur under the action of glycosyltransferases or glycosidases, specifically by connecting or attaching polysaccharide structures (the key component of glycoconjugates) to proteins and lipids to form glycoconjugates, which play roles in structure support, protein and lipid modification, regulation of physical and biological properties and functions of proteins.10–13 Glycosylation can be classified into two categories based on types of glycosidic bonds: N- and O-linked. 14 The emergence and progression of cancers are often strongly correlated with aberrant elevations in branching N-glycans or truncated O-glycans (Tn and sialyl Tn antigens), as well as improper sialylation and fucosylation.10,15 For example, existing studies have found that frequent changes in N-glycan and O-glycan branching in cancer cells lead to overexpression of β1,6-branched N-glycans, which is resulted from enhanced β1,6-N-acetylglucosaminyltransferase activity modulated by RAS/RAF/MAPK signaling in cancer. 16 High level of glycosylation-related genes is linked to higher mortality rates in hepatocellular carcinoma patients. 14 Moreover, glycosylation also has a regulatory role in immune evasion, changes in immune checkpoint molecule structure and function, and other aspects.17,18 Huang et al. 19 found that highly glycosylated B7H3 abnormal expression promotes protein stability and immune suppression, thus driving immune evasion of triple-negative breast cancer cells. Lowering the level of B7H3 glycosylation may improve the patient's anti-tumor immune response. Similar investigations reported that targeting glycosylation of immune checkpoint B7-H4 can improve the immunogenicity of cancer cells, enhance the phagocytic function of dendritic cells, and induce the production of IFNγ by CD8+ T cells, thereby effectively reducing tumor growth. 20 In particular, recent studies in tumor immunotherapy have found that the glycosylation of asparagine 58 promotes the binding of the anti-PD-1 monoclonal antibody pembrolizumab to PD-1, represses binding of PD-1 to PD-L1, and thus inhibits tumor immune evasion. 21 These findings suggest that further research on glycosylation in tumors establishes a more comprehensive tumor modulatory network, identify more specific early diagnostic biomarkers, and develop new treatment targets.
We utilized public database data to assess glycosylation-related genes in colon cancer, and established and validated a new prognostic model. We analyzed differences in immune microenvironment, tumor mutations, and drug sensitivity between different risk groups. The findings of this work may offer fresh perspectives on the function of glycosylation in colon cancer, as well as fresh suggestions for prognostic prediction and the validation of novel therapy targets.
Materials and methods
Acquisition of publicly available gene expression and clinical data
We included gene expression profiling and clinical data of 524 colon cancer patients from The Cancer Genome Atlas (TCGA) (https://portal.gdc.cancer.gov/) as training set. We obtained an additional 65 colon cancer patient data from GEO database (https://www.ncbi.nlm.nih.gov/) (Accession number: GSE29621) as an external validation set. Based on previous studies, 14 we extracted 184 glycosylation-related gene sets from MSigDB (http://www.broadinstitute.org/gsea/index.jsp). The data was collected in October 2023.
Establishment and verification of a glycosylation-related prognostic model in colon cancer
Patients who were included in the TCGA database and followed for more than 30 days were used to construct a prognostic risk model. In order to screen genes associated with prognosis in colon cancer, univariate Cox regression analysis was used to identify survival genes in glycosylation-related genes by survival package (P < 0.05 was used as the screening cut-off value). Then, we further narrowed down the range of survival-related metabolic genes using LASSO penalty method and adjusted the optimal parameter λ through 10-fold cross-validation. We calculated the optimal cutoff value using the “survminer” package. After selecting six genes, we performed multivariate Cox regression analysis and utilized the following formula to build a new riskscore model:
ROC curve analysis 23 can be used to select the correct model, rule out a runner up model, or determine the optimal threshold within the same model, providing a complete description of the continuous variation representation of sensitivity and specificity. The closer the area under the curve (AUC) is to 1, the stronger the diagnostic impact. We used time-varying receiver characteristic ROC curves drawn by the timeROC R package to evaluate the sensitivity and specificity of the training set and external validation cohort. 24 We calculated proportion of various clinical factors (such as gender, age groups, tumor stage, T, N, and M stages) in different risk groups.
Construction and evaluation of a colon cancer nomogram
We utilized univariate and multivariate Cox regression analyses to assess independent prognostic value of riskscore. Then, the clinical factors identified by the prognostic model and multivariate regression analysis were integrated, and the prognostic nomogram was constructed using the “rms” R package. Calibration curves were generated to evaluate deviations between the nomogram and the ideal model. The “timeROC” R package was used to generate the ROC curve and calculate the AUC value to assess the prediction accuracy of age, TNM staging, and established models. We also plotted decision curve analysis (DCA) curves for 1-, 3-, and 5-year periods to display probability threshold range and clinical net benefit calculated by the predictive model, thereby evaluating the clinical benefit of the model.
Enrichment analysis
We utilized “edgeR” package to calculate differentially expressed genes (DEGs) (FDR < 0.05 and |log(FC)| > 1) between high-risk and low-risk groups. We conducted KEGG and GO enrichment analyses of genes in the PPI network using the “clusterProfiler” package to identify relevant functional pathways between high-risk and low-risk groups. “enrichplot” R package was for visualization. P < 0.05 were regarded as statistically significant for GO terms and KEGG pathways.
Immune cell infiltration analysis in tumors
CIBERSORT is a tool for deconvolution of human immune cell subtype expression matrix based on linear support vector regression principle for the abundance of tumor-infiltrating immune cells (TIIC). 25 Using CIBERSORT method, we compared differences in immune infiltration levels of colon cancer samples between different risk groups. We implemented TIDE algorithm to assess possibility of immunotherapy in high-risk and low-risk group patients and computed proportion of patients with immune response.
Tumor mutation analysis
We analyzed and plotted mutation status of high- and low-risk groups based on SNV mutation data for colon cancer with “maftools” package. We compared similarities and differences in mutation types, SNV classes, and mutation rates between high- and low-risk groups and selected top ten genes with the highest mutation rates.
Protein-drug interaction analysis
DSigDB is a free online resource containing 17,389 drugs and 19,531 genes. 26 Based on the Enrichr online gene set enrichment tool in DSigDB, we screened potential drugs that significantly interacted with model genes.
Results
Establishment and verification of riskscore model
We included 524 colon cancer samples from TCGA and obtained 184 glycosylation-related genes from MSigDB database (Supplementary Table 1). To develop robust risk features for clinical research, we used univariate Cox regression analysis to identify 17 survival-related genes from the 184 glycosylation-related genes, with a threshold of P < 0.05 (Supplementary Table 2). Next, we used the glmnet package to eliminate multicollinearity with LASSO penalty regression (Figure 1(a) and (b)). The candidate genes obtained from the LASSO analysis were utilized to construct a multivariate regression model using the survival package. Finally, we identified six genes with non-zero coefficients, among which MOGS and POFUT2 were prognostic risk factors, while DDOST, PMM2, RPN2, and B3GNT6 were prognostic protective factors (Figure 1(c)).

(a and b): Display of coefficient distribution (a) and partial likelihood deviation distribution (b) in LASSO analysis. (c): Multivariate regression shows model genes and their significance. *: P < 0.05. **: P < 0.01.
The formula

(a): Survival outcomes and corresponding score values for each sample in the TCGA training cohort. (b): Survival outcomes and corresponding score values for each sample in the GEO validation cohort. (c): Expression level statistics of model genes in each risk group of the training cohort. (d): Expression level statistics of model genes in each risk group of the validation cohort. (e): Survival curve for the training set. (f): Survival curve for the validation set. (g): ROC curve displaying 1-year, 3-year, and 5-year AUC values in the training cohort. (h): ROC curve displaying 1-year, 3-year, and 5-year AUC values in the validation cohort.
Age, gender, tumor stage, and T, N, M stages are important clinical factors that reflect the characteristics of cancer patients. No detectable differences were seen in age and gender composition between two risk groups of colon cancer patients, but there were notable differences in proportion of tumor stage and T, N, M stages. For example, the analysis of T-stage disclosed that the number of patients with T1, T2, and T3 stage in high-risk group was higher, while the number of patients with T1 stage was lower; similarly, similar events were noted in analysis of N-stage and M-stage (Figure 3(a)). In addition, in univariate and multivariate analyses of patients in various risk groups, we compared riskscore with known clinical prognostic indicators for cancer patients (i.e., age, gender, tumor stage, and T, N, M stages) to explore factors with independent prognostic value. In the univariate analysis, N-stage, M-stage, tumor stage, and riskscore were independent (Figure 3(b)). In multivariate analysis, age, tumor stage, and riskscore were independent (Figure 3(c)). Therefore, based on these two results, we believed that tumor stage and riskscore were independent prognostic factors that affected colon cancer patients. On the basis of regression coefficients of each influencing factor in the multivariate analysis, we established a nomogram that included riskscore, prognostic clinical characteristics, and total score (Figure 3(d)). Calibration curve illustrated that compared with ideal model, our constructed nomogram could better forecast 1, 3, and 5-year OS of colon cancer patients (Figure 3(e)). Furthermore, ROC curve containing clinical features, RiskScore, and the nomogram depicted that AUC was the highest, indicating that the nomogram we constructed had a better predictive effect on prognosis of colon cancer patients (Figure 3(f)). This was validated in the DCA curve (Figure 3(g)).

(a): relative proportion of clinical features in the two risk groups. (b): Univariate regression analysis of the independence of riskscore and each clinical factor in predicting patient prognosis. (c): Multivariate regression analysis of the independence of riskscore and each clinical factor in predicting patient prognosis. (d): Nomogram of various factors based on the independence test, which can predict patient 1/3/5-year survival rates. (e): Calibration curve shows the predictive ability of the nomogram. (f): ROC curve including clinical features, RiskScore, and nomogram. (g): DCA curve of 1/3/5-year survival rates.
To discover potential functional differences between groups, we first used edgeR package to compute DEGs between two groups (Supplementary Table 3). Subsequently, we utilized clusterProfiler software package to perform GO and KEGG to elucidate biological processes and signaling pathways that DEGs may be involved in. GO analysis presented that DEGs between groups were significantly associated with GO terms such as nucleosome assembly, DNA replication-dependent chromatin assembly, keratinization, and protein-DNA complex assembly (Figure 4(a)). We also found that DEGs were significantly associated with signaling pathways such as neutrophil extracellular trap formation, neuroactive ligand-receptor interaction, and viral carcinogenesis (Figure 4(b)). Overall, the activation or inhibition of these biological processes and signaling pathways may be the potential reasons for different survival rates among different risk groups.

(a): GO enrichment analysis explores the biological processes that DEGs in different risk groups may regulate. (b): KEGG enrichment analysis explores the signaling pathways that DEGs in different risk groups may participate in.

(a): Exploration of immune cell infiltration levels between high-risk and low-risk groups based on the CIBERSORT method. (b): Violin plot of TIDE score results between high-risk and low-risk groups. (c): Statistics of immune therapy response in TIDE between high-risk and low-risk groups. *: P < 0.05. **: P < 0.01. ***: P < 0.001.
To quantify differences in tumor immune microenvironment of each sample, we used CIBERSORT method to assess immune cell infiltration levels of patients in two groups. Notably, except for resting NK cells, resting memory CD4 T cells, activated memory CD4 T cells, resting dendritic cells, eosinophils, and neutrophils all had higher infiltration in low-risk group (Figure 5(a)). To predict likelihood of an immune therapy response, TIDE algorithm was used to reveal the TIDE scores for patients. Combined with TIDE algorithm analysis, we disclosed that low-risk group had a lower TIDE score compared to high-risk group (Figure 5(b)). Additionally, the statistical results of the number of immune response patients presented that low-risk group had a higher proportion of potential immune therapy responsive patients (Figure 5(c)). These findings implicated that immune levels of low-risk group patients were higher and may be more likely to benefit from immune therapy.
Comparison of mutation status
Analysis of mutation status in different groups revealed that missense mutations had the highest proportion in both groups. Secondly, we noted that SNPs had the highest proportion in both risk groups in the variant type, with C > T and C > A being the main SNP mutation types detected. The stacked bar charts showed the top ten mutation genes, with TTN, APC, and MUC16 having the highest mutation rates in high-risk group, and TTN, APC, and SYNE1 in low-risk group (Figure 6(a) and (b)). Taken together, these findings suggested that gene mutations occurred commonly in both groups, and these mutations may be important influencing factors in the development of colon cancer.

(a): Summary of mutation status in the high-risk group. (b): Summary of mutation status in the low-risk group.
To dissect clinical application of prognostic genes, we used DSigDB to investigate relationship between prognostic genes and drug sensitivity. Using the model genes as targets, we conducted drug target enrichment analysis on Enrichr (https://maayanlab.cloud/Enrichr/), based on DSigDB. Top five drugs ranked by P-value were: 67526-95-8 (thapsigargin), verteporfin, uracil, pemetrexed disodium, and fisetin (Figure 7(a)). Model gene and drug target analysis presented that RPN2 was enriched in multiple drug terms simultaneously (Figure 7(b)), suggesting that RPN2 may be a likely therapeutic target for multiple drugs.

(a): Correlation between model genes and drugs. (b): Statistics of targeted drug term enrichment for model genes.
Existing studies have shown that glycosylation is linked to occurrence, development, and immune therapy of colon cancer.27–30 We first utilized univariate regression analysis to screen a set of glycosylation genes significantly associated with colon cancer OS from TCGA training cohort. Then, we constructed a prognostic prediction model with LASSO and multivariate Cox regression. In addition, we validated prognostic predictive property of the model in TCGA training and GSE29621 validation cohorts. Furthermore, we assessed relationship between riskscore and clinicopathological features, immune infiltration, and immune cells. We discussed the independence of age, tumor stage, and riskscore as prognostic indicators for colon cancer.
With the expansion and improvement of databases and multi-omics sequencing techniques, more and more cancer prognostic models are built. Luo et al. 31 constructed an 8-gene prognostic risk model by analyzing expression profiles of RNA splicing regulatory genes in colon cancer and validated the effect of RNA splicing factor AKAP8L on colon cancer proliferation and migration. Huang et al. 32 reported a riskscore model based on prognostic co-stimulatory factor-related features, which reliably forecasts response rate of colon cancer patients to immune therapy. In this study, based on glycosylation genes, we established a 6-gene riskscore model to predict prognostic survival rate, response to immune therapy, and sensitivity to drug therapy of colon cancer patients in different risk groups. It is noteworthy that all six genes are pivotal in cancer. DDOST is linked to prognosis, immune microenvironment, and immune cell infiltration of hepatocellular carcinoma and glioma.33,34 Shapanis et al. 35 verified through proteomics analysis, immunohistochemistry, and mass spectrometry that increased DDOST is implicated in shortened metastasis time and poor prognosis of cutaneous squamous cell carcinoma. MOGS is a gene encoding mannosyl-oligosaccharide glucosidase, which is expressed in endoplasmic reticulum, participates in N-glycosylation, and is significantly upregulated in colon adenocarcinoma.36,37 POFUT2 is a fucosyltransferase responsible for transferring fucose to serine or threonine.38,39 Wang et al. 40 showed that POFUT2 can independently forecast prognosis of colon adenocarcinoma patients. In renal cell carcinoma, the high expression of PMM2 is considered to be significantly associated with poor prognosis. 41 RPN2 is correlated with the malignant phenotype of various tumors and in bladder cancer, it facilitates growth and metastasis of cancer cells by mediating PI3K-AKT pathway. 42 In colorectal cancer, the downregulation of B3GNT6 expression has been identified as a potential predictive indicator of poor outcomes in patients. 43 By reviewing previous studies, we emphasized the roles of MOGS, POFUT2, DDOST, PMM2, RPN2, and B3GNT6 in cancer. Our results suggested that MOGS and POFUT2 were prognostic risk factors, while DDOST, PMM2, RPN2, and B3GNT6 were prognostic protective factors. In summary, these results indicated that MOGS, POFUT2, DDOST, PMM2, RPN2, and B3GNT6 were pivotal in prognostic prediction of colon cancer.
Glycosylation is crucial in cell differentiation, tumor progression, and immune regulation in malignant tumors.44,45 Abnormal glycosylation can produce antigens, which can be important targets for T cells in tumors. 46 Therefore, targeting glycosylation may provide new treatment opportunities for cancer patients. We used CIBERSORT method and TIDE algorithm to compare and evaluate differences in immune infiltration and likelihood of immune therapy response and calculated proportion of patients with immune response. Patients in low-risk group had better prognosis, and infiltration degree of T cells (resting memory CD4 T cells and activated memory CD4 T cells) in tumors was higher. In addition, we found that compared with the high-risk group, low-risk patients had lower TIDE scores, and a higher proportion of patients who may respond to immune therapy. These findings suggested that patients in low-risk group had higher immune levels and may be more likely to benefit from immune therapy.
In addition to predicting immune therapy response, this study used the DSigDB database to predict potential new drugs that may have targeted effects on the prognostic model genes. 67526-95-8, verteporfin, uracil, Pemetrexed disodium, and fisetin were the top five drugs ranked by P-values predicted by the DSigDB database. 67526-95-8, also known as thapsigargin, is a widely used myocyte endoplasmic reticulum Ca2+-ATPase inhibitor, which facilitates migration of colorectal cancer by upregulating lncRNA malat1.47,48 Verteporfin is a small molecule compound that can inhibit the interaction between YAP-TEAD in tumors; in the tumor microenvironment, verteporfin inhibits PD-L1, which is closely related to enhanced T lymphocyte infiltration.49–51 Pemetrexed disodium, a folate antagonist, can be used to treat non-squamous non-small cell lung cancer and can also treat malignant pleural mesothelioma combined with cisplatin.52–54 Researchers are currently studying the use of Pemetrexed disodium to treat other types of cancer. Fisetin is a naturally occurring flavonoid with multiple biological effects, including anti-cancer activity.55,56 Xu et al. 57 found that fisetin can effectively inhibit the activation of mTOR kinase signaling in prostate cancer cells and induce autophagic programmed cell death. In conclusion, these medications may be able to stop cancer from progressing. These conclusions direct clinical therapeutic development study for colon cancer.
We first employed bioinformatics analysis to establish a riskscore prognostic model for colon cancer containing six glycosylation genes. Based on the model, colon cancer patients were well grouped, and there were substantial differences in prognosis, immune microenvironment, and immune therapy response after grouping. High-riskscore indicated a poorer prognosis and lower T cell immune infiltration levels. In addition, anti-tumor drugs such as 67526-95-8, verteporfin, and fisetin had stronger targeted effects on model genes. However, there are still some limitations in this study. First, this study only explored the prognostic significance of glycosylation-related genes as risk markers in colon cancer at the bioinformatics level, and further experimental verification is needed in the future. Secondly, although the prognostic model constructed by glycosylation-related genes has good predictive performance, it still needs to be further verified in clinical practice. Overall, our model may help predict prognosis and personalized therapy of colon cancer.
Supplemental Material
sj-pdf-1-thc-10.1177_09287329241296771 - Supplemental material for Prognostic significance of glycosylation-related genes as risk markers in colon cancer
Supplemental material, sj-pdf-1-thc-10.1177_09287329241296771 for Prognostic significance of glycosylation-related genes as risk markers in colon cancer by Hui Chen, Liang Luo, Chen Wu, Gang Qiang Ying, Yuan Wang, Jin Chen, Hang Yu Zhou and Fangxing Peng in Technology and Health Care
Supplemental Material
sj-pdf-2-thc-10.1177_09287329241296771 - Supplemental material for Prognostic significance of glycosylation-related genes as risk markers in colon cancer
Supplemental material, sj-pdf-2-thc-10.1177_09287329241296771 for Prognostic significance of glycosylation-related genes as risk markers in colon cancer by Hui Chen, Liang Luo, Chen Wu, Gang Qiang Ying, Yuan Wang, Jin Chen, Hang Yu Zhou and Fangxing Peng in Technology and Health Care
Supplemental Material
sj-pdf-3-thc-10.1177_09287329241296771 - Supplemental material for Prognostic significance of glycosylation-related genes as risk markers in colon cancer
Supplemental material, sj-pdf-3-thc-10.1177_09287329241296771 for Prognostic significance of glycosylation-related genes as risk markers in colon cancer by Hui Chen, Liang Luo, Chen Wu, Gang Qiang Ying, Yuan Wang, Jin Chen, Hang Yu Zhou and Fangxing Peng in Technology and Health Care
Footnotes
Acknowledgements
Not applicable.
Ethics approval and consent to participate
Not applicable.
Author contributions
Conceived and designed the study: FX P and H C. Collected the data: L L and C W. Analyzed and interpreted the data: GQ Y and Y W. Manuscript writing and editing: J C and HY Z.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Availability of data and materials
The data and materials in the current study are available from the corresponding author on reasonable request.
Supplemental material
Supplemental material for this article is available online.
References
Supplementary Material
Please find the following supplemental material available below.
For Open Access articles published under a Creative Commons License, all supplemental material carries the same license as the article it is associated with.
For non-Open Access articles published, all supplemental material carries a non-exclusive license, and permission requests for re-use of supplemental material or any part of supplemental material shall be sent directly to the copyright owner as specified in the copyright notice associated with the article.
