Abstract
Heterogeneity in patients with colorectal cancer (CRC) leads to different strategies in clinical decision making. Identifying distinctive subgroups in patients contributes to develop more individualized treatments. This study constructed a novel prediction model for the prognosis of CRC patients based on the value of risk score combining the expression status of immune-related genes and coefficients. In this study, we built an interactive network of prognosis-related immune genes and transcription factors and adopted several methods to verify the accuracy of model. Moreover, we assessed the correlation between risk score and immune infiltration. The results suggested that the model was well fit and the risk score could be an independent predictive factor for CRC patients. This model has high application value in the clinic.
Introduction
According to the estimated cancer statistics in 2018, colorectal cancer (CRC) was ranked third in terms of incidence and second for mortality, worldwide (Bray et al., 2018). Although the diagnosis and treatment of CRC have been well developed in recent years, it remains one of the most common malignancies. The 5-year relative survival rates of CRC patients were 91% for stage I and 82% for stage II, but the rate of patients with stage IV was as low as 12% (Miller et al., 2019). Patients whose prognoses in different stages are variable need individualized therapy, and doctors are required to reasonably assess the risk status of CRC patients to make an appropriate diagnosis and formulate appropriate treatment strategies (Brenner et al., 2014; Kalyan et al., 2018). As convenient and efficient tools, prediction models are regularly used in a wide range of diseases to estimate a patient's risk status in the form of a score so that the patient's risk can be visually evaluated (Fujii, 2018; Cho et al., 2019; Long et al., 2019). In malignant diseases, appropriate models which indicate patients' natural survival or prognosis after different treatment options to a reasonable extent provide accordance for medical personnel to take rational measures in the process of cancer prevention, treatment, and follow-up. Therefore, identification of a sensitive and specific model to assess risk status is in urgent demand for CRC patients.
Immune system, including immune cells and proteins expressed by immune-related genes, is believed to be closely associated with progression of cancer (Desrichard et al., 2016; Hale et al., 2016). Researchers have shown the association in CRC with evidence from disease data and in vivo or in vitro experiments (Becht et al., 2016; Guo et al., 2019; Wu et al., 2019). Lee et al. (2019) used immunoregulatory protein expression to calculate an immune score, which predicted recurrence in breast cancer patients. In gastric cancer, a molecular prediction model combining the value of immune–risk status and chemosensitivity was built (Duan et al., 2019). Compared with the analysis of clinical risk factors, immune–risk scores focus on interactions in immune system and preferably reflect the current physiological conditions of the body. Additionally, the immune–risk score itself might have a deep correlation with clinical factors. However, a model that can be practically applied as a clinical assistant in CRC patients is still lacking.
In this study, we hypothesized that there is a high correlation between the expression of immune-related genes and CRC patients. We measured the expression level of immune genes based on public databases and obtained key genes with significant prognostic value. Using these relative genes, we developed a prognostic model to predict the risk of CRC patients and verified the accuracy of risk score on the basis of clinical data. We constructed a new network of immune genes and transcription factors to explore the interactions. Through statistical approaches, a novel prediction model based on the value of immune–risk score was identified in this study.
Materials and Methods
Data extraction
Gene expression and clinical data of CRC cohorts were available at The Cancer Genome Atlas (TCGA). CRC patients with less than 30 days of overall survival (OS) were excluded from this study. Furthermore, patients without full clinical parameters on age, gender, and tumor node metastasis (TNM) stage were removed from correlation test of clinical factors. The lists of immune-related genes and cells were obtained from the Immunology Database and Analysis Portal (ImmPort) and Tumor Immune Estimation Resource (TIMER) (Li et al., 2017; Bhattacharya et al., 2018). To build a network with immune-related genes, transcription factors were downloaded from Cistrome Project, which was based on the integrative analysis of TCGA expression and public ChIP-seq profiles. Data were identified and downloaded on August 10, 2019. A total of 488 tumor tissues and 42 normal or paracancerous tissues were selected for gene expression analysis. A total of 445 CRC patients were included for survival analysis, and 395 patients had full clinical parameters.
Construction of an interactive network
Differentially expressed immune genes related to prognosis were selected. Similarly, differentially expressed transcription factors were identified. Correlativity was confirmed after correlation analysis, and then data were input into Cytoscape (3.7.1) to visualize the network (Shannon et al., 2003).
Generation and test of the prediction model and immune–risk score
Corresponding coefficients for different immune genes in the model were confirmed after statistical estimation. Moreover, immune–risk scores for CRC patients were calculated. According to the median value, the patients were divided into two cohorts, high risk and low risk, and the predictive value of the scores was validated based on survival curve, receiver operating characteristic (ROC) curve, and risk curve. The correlations of immune–risk score with clinical parameters and immune infiltration were also analyzed. The project was performed in accordance with the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD) statement (Collins et al., 2015) (Supplementary Table S1).
Statistical analyses
All statistical analyses were performed in R (version 3.6.0). Wilcoxon test was used to identify genes differentially expressed in tumor tissues and normal or paracancerous tissues at threshold of fold change of 2 and false discovery rate of 0.05. Pearson test was used to validate the correlation between immune genes and transcription factors at a threshold of correlation coefficient of 0.4. Univariate Cox proportional hazards regression was used to estimate hazard ratios (HRs). Coefficients of prediction model were generated by multivariable Cox analysis. According to the median value of risk score, patients were defined as high risk or low risk. The survival difference of OS was evaluated by log-rank test. Univariate Cox analysis and multivariable Cox analysis were carried out to determine whether the immune–risk score was an independent risk factor for the prognosis of CRC patients. Finally, we estimated the relativities of risk score with clinical features, which were divided into binary data through T test and with immune infiltration through Pearson test. Confidence intervals (CIs) were set at 95%, and p < 0.05 was considered significant in all statistical analyses.
Results
Screening of prognosis-related immune genes
In total, 2498 immune-related genes were obtained from ImmPort. After expression analysis, differentially expressed genes, including upregulated and downregulated genes in tumor tissues, were filtered based on the gene expression data of CRC from TCGA (p < 0.05, Fig. 1A). For further selection, 32 genes had marked prognostic significance according to the calculated HR values of CRC patients. Among them, 30 genes might be related to a worse prognosis compared with the other 2 genes (F2RL1 and NR3C2) in CRC patients (p < 0.01, Fig. 1B).

Selection of candidate genes.
Correlation between prognosis-related immune genes and transcription factors
We obtained 318 transcription factors from Cistrome Project in aggregate. Similarly, transcription factors that were not differentially expressed between tumor tissues and normal or paracancerous tissues were excluded, and others were selected for further analysis. Of the included transcription factors, 12 factors which positively regulated 16 prognosis-related immune genes, including one related to low-risk prognosis, were confirmed. Negative regulation, which met the requirements, was not found. All the interrelationships are shown in the network, which was visualized in Cytoscape (p < 0.05, Supplementary Fig. S1).
Constructing the prediction model
Above, we had already obtained candidate prognosis-related immune genes. Based on the selected genes, multivariable Cox analysis was performed to build the model and identify the coefficients. Sixteen genes were finally included in the model, and each coefficient numerically represented the weight of gene expression. The individual risk score was calculated in view of a combination of the expression status of defined genes and corresponding coefficients (p < 0.05, Table 1).
Prediction Model for Survival
CI, confidence interval; HR, hazard ratio.
Test of the model
To further confirm the reliability of model, we stratified the cohort into high- and low-risk groups according to the median of scores. In survival curve, CRC patients with low-risk scores had a better prognosis compared with high-risk patients, as the 5-year OS rates were 0.859 and 0.481, respectively (p < 0.001, Fig. 2A). The area under curve in ROC curve was 0.771, which indicated good effectiveness of the model (Fig. 2B). Compared with low-risk patients, the worse prognosis of patients in high-risk cohort was also validated in risk curve (Fig. 2C, D).

Validating the prediction model. Patients were stratified into high- and low-risk groups according to the median of scores.
Correlation between the scoring system and clinical parameters
We investigated the inner correlation between the immune–risk score and clinical parameters of CRC patients. It was shown that some gene expression was closely associated with clinical characteristics of patients, such as glucose regulated protein, which was closely related to TNM stage (p < 0.05, Supplementary Table S2). There was no significant relativity for risk score in prediction model with age, gender, and TNM stage of patients (p > 0.05, Supplementary Table S2).
Evaluating the prognostic value of the model
Considering the above correlation analysis results, immune–risk score and clinical parameters (age, gender, and TNM stage) were included in univariate analysis on patient survival to evaluate the instructive effect of model on prognosis. It was suggested that except for gender factor, the age, TNM stage, and risk score of CRC patients were correlated with disease prognosis (p < 0.05, Fig. 3A). The results of multivariable analysis showed that the risk score might be an independent predictive factor for CRC patients (HR = 5.505, 95% CI = 2.778–10.911, p < 0.001, Fig. 3B).

Assessing risk factors of prognosis.
Assessment of immune infiltration
To explore the correlation between this immune gene model and immune infiltration, we downloaded tumor infiltration data of CD4+ T cell, CD8+ T cell, B cell, neutrophil, dendritic cell, and macrophage in CRC patients from TIMER. Although the risk score was related to the infiltrated cells, all the coefficients were extremely low and the corresponding trends with the change in scores were not obvious; thus, this model was not suitable for the study of immune infiltration (p < 0.05, Fig. 4).

Evaluating potential correlation between model and tumor immune infiltration. Estimation of the relativities for risk score with CD4+ T cell
Discussion
Risk score model as an auxiliary diagnosis and treatment strategy is commonly used in clinic. Positive detection rate will be markedly increased and screening cost will be greatly reduced for CRC patients if more precise stratification is carried out and the high-risk group can thus receive further examination. At present, many CRC models like Harvard Cancer Risk Index and Asia-Pacific Colorectal Screening score were constructed based on patients' lifestyle and physical examination with the advantage of simple data acquisition and low cost (Colditz et al., 2000; Yeoh et al., 2011). Microsatellite analysis, which is closely connected with DNA mismatch repair gene and genetic genealogy of patients, was applied in some models to estimate individual hazard and familial inheritance (Chen et al., 2006). With the reduction of sequencing cost and the extension of gene screening, heterogeneity of individual gene expression has gradually been considered in medical decision-making process (Guo et al., 2017).
Patients' immune status and tumor immune microenvironment play roles in CRC growth and metastasis (Fakih et al., 2019). Immunotherapy can be an innovation point to prevent and cure CRC (Jager et al., 2016). In this study, we used multiple databases to determine the immune genes related to CRC patient survival and constructed a regulatory network with correlative transcription factors. After analysis, we established a prediction model combining the expression of 16 immune-related genes and corresponding coefficients, and the specific value of score reflected the risk of patient. Furthermore, the model was examined with several methods and confirmed to accurately simulate the prognosis of CRC patients. As an independent risk factor, the score had a strong guiding significance in prevention and treatment of CRC. However, our results showed that the risk score was incapable of sensitively changing with the fluctuations of single immune cell; hence, infiltrated immune cells could not be directly linked to the model.
Among the immune-related genes covered in model, CD22 had a high correlation coefficient and was independent of included clinical characteristics. CD22, a B cell-specific glycoprotein, is expressed on the surface of maturing B cell and negatively regulates signaling through B cell antigen receptor (Haas et al., 2018; Matsubara et al., 2018). Currently, disease models in most studies on CD22 focus on lymphoblastic leukemia, and well-expected results have been obtained (Ma et al., 2012; Fry et al., 2018). Uromodulin-like 1 (UMODL1) gene maps to chromosome 21q22.3 in the minimal critical region. Previous research suggested UMODL1 was associated with congenital high myopia (Nishizaki et al., 2009). It was also reported to be upregulated in non-small cell lung cancer, metastatic prostate cancer, acute myeloid leukemia, and other malignant diseases (Tan et al., 2016). In our study, the correlations between CD22 or UMODL1, which were independent of clinical factors and the prognosis of CRC patients, were revealed. Thus, the molecules may be directly involved in the regulation of CRC, but there are few reports on them. Our group will focus on their potential mechanisms as the next step for further confirmation.
Several factors might potentially limit the confirmation of the predictive model in our study. First, data used in the model construction came from TCGA database, and more cohorts of CRC patients should be included for verification. The validation in our study lacked the support of external data and assessed the applicability of the model based on original data. Limited by the constraint of sample size, internal validation that splitting data as evaluation part was not appropriately relied on the simplex TCGA database. Although the model adopted different methods for test, overfitting may still exist. Therefore, the bias of the model fitting may arise despite the autologous validation. Model, especially coefficients, need to be adjusted in conjunction with recent clinical data for continuous improvement. Second, some of the immune genes in the model have seldom been focused on in CRC and are suitable for exploration after a preliminary experiment. Third, models of the same type based on gene expression can be intercompared and make a clear difference. Despite these limitations, this model based on the risk score is practical in the clinic and allows the rapid evaluation of CRC patients in a stable setting.
Conclusion
We constructed a model based on the value of the immune–risk score combining coefficients and the expression status of prognosis-related immune genes with validation with several methods. This model can be applied to predict CRC prognosis, and optimized strategies in prevention and treatment can be implemented for patients.
Footnotes
Disclosure Statement
No competing financial interests exist.
Funding Information
This work was supported by grants from the National Natural Science Foundation of China (No. 81672361 and No. 81972702). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the article.
Supplementary Material
Supplementary Figure S1
Supplementary Table S1
Supplementary Table S2
References
Supplementary Material
Please find the following supplemental material available below.
For Open Access articles published under a Creative Commons License, all supplemental material carries the same license as the article it is associated with.
For non-Open Access articles published, all supplemental material carries a non-exclusive license, and permission requests for re-use of supplemental material or any part of supplemental material shall be sent directly to the copyright owner as specified in the copyright notice associated with the article.
