Abstract
This study aimed to investigate the association between polymorphisms in chondroitin sulfate proteoglycan 4 pseudogene 12 (CSPG4P12) and the risk of colorectal cancer (CRC). This case–control study involved 850 patients with CRC and 850 healthy controls. Three CSPG4P12 single-nucleotide polymorphisms (SNPs; rs2880765, rs6496932, and rs8040855) were genotyped using the TaqMan-MGB probe method. Logistic regression was used to evaluate the association of CSPG4P12 SNPs with CRC risk by calculating odds ratios (ORs) and 95% confidence intervals (CIs). CSPG4P12 showed lower expression in CRC tissues than in normal tissues. The rs6496932 variant was associated with increased CRC risk (CA vs. CC: p = 0.006; CA + AA vs. CC: p = 0.005), whereas the rs8040855 variant was associated with reduced CRC risk (CG vs. CC: p < 0.001; CG + GG vs. CC: p < 0.001). In analyses stratified by sex and age, the rs8040855 variant was associated with decreased CRC risk, whereas the rs6496932 variant was associated with increased CRC risk among males (CA vs. CC: p = 0.024; CA + AA vs. CC: p = 0.014) and younger individuals (CA vs. CC: p = 0.004; CA + AA vs. CC: p = 0.010). When stratified by smoking and drinking status, the rs8040855 variant was associated with decreased CRC risk among nonsmokers (CG vs. CC: p < 0.001; CG + GG vs. CC: p < 0.001) and nondrinkers (CG vs. CC: p < 0.001; CG + GG vs. CC: p < 0.001), and the rs6496932 variant was associated with increased CRC risk among nonsmokers (CA vs. CC: p = 0.016; CA + AA vs. CC: p = 0.036) and nondrinkers (CA vs. CC: p = 0.002; CA + AA vs. CC: p = 0.004). Haplotype analysis showed that the CSPG4P12 TCG haplotype (rs2880765 T, rs6496932 C, rs8040855 G) was associated with reduced CRC risk compared with the reference ACC haplotype (rs2880765 A, rs6496932 C, rs8040855 C) (OR = 0.46, 95% CI = 0.26–0.82, p = 0.049). These findings highlight the potential of these genetic variants as biomarkers of CRC susceptibility and may inform personalized prevention strategies.
Introduction
Colorectal cancer (CRC) is the second leading cause of cancer-related death worldwide (Sung et al., 2021). In China, an estimated 592,232 new CRC cases and 309,114 CRC deaths occurred in 2022, making CRC the fifth leading cause of cancer-related mortality (Xia et al., 2022). Both genetic and environmental factors contribute to CRC development (Baidoun et al., 2021; Thanikachalam and Khan, 2019), with ∼35% of CRC risk attributable to heritable factors (Lichtenstein et al., 2000). Numerous case–control studies have reported associations between genetic variants, particularly single-nucleotide polymorphisms (SNPs), and CRC risk (Zhang et al., 2014).
Pseudogenes are derived from functional parent genes, with which they share a high degree of sequence homology, and they can compete with their parent genes for miRNA binding (Sur et al., 2013). Chondroitin sulfate proteoglycan 4 (CSPG4) is widely expressed in various malignant tumors (Beard et al., 2014), including breast cancer, melanoma (Hu et al., 2022b; Price et al., 2011), and thyroid cancer (Egan et al., 2021). Chondroitin sulfate proteoglycan 4 pseudogene 12 (CSPG4P12) is a pseudogene-derived lncRNA that is homologous to its parent gene, CSPG4 (Wiest et al., 2006).
The expression and function of lncRNAs are likely influenced by SNPs (Kumar et al., 2013). For example, H19 rs2839698 is associated with the risk of bladder cancer, and SNPs in PCGEM1 (rs6434568 and rs16834898) are associated with prostate cancer risk in Chinese men (Verhaegh et al., 2008; Xue et al., 2013). Our previous study indicated that CSPG4P12 SNPs affect the expression of CSPG4P12 in lung cancer (Hu et al., 2022a). Given the potential role of CSPG4P12 in cancer development, we hypothesized that CSPG4P12 genetic variants may contribute to the risk of CRC development.
In the present study, we genotyped three SNPs (rs2880765, rs6496932, and rs8040855) in CSPG4P12 and evaluated their association with CRC risk. Our study adds to the limited evidence that CSPG4P12 polymorphisms are associated with CRC risk, here in a northern Chinese Han population.
Materials and Methods
Study population
A total of 850 patients with pathologically diagnosed CRC (410 with colon cancer and 440 with rectal cancer) were recruited at the Affiliated Tangshan Gongren Hospital of North China University of Science and Technology from 2008 to 2016. Controls were individuals who underwent routine health checkups at the same hospital during the same period and had no history or evidence of cancer. Controls were frequency-matched 1:1 to cases by sex and age (±5 years). Demographic and clinicopathological information was collected through medical record review and questionnaires, and informed consent was obtained from all participants. Peripheral venous blood (2 mL) was collected from each participant. This study was approved by the Ethics Committee of North China University of Science and Technology (No. 2022027).
SNP selection
In the initial study design, we selected SNPs from the National Center for Biotechnology Information dbSNP database with a minor allele frequency (MAF) >0.05 in the Han Chinese in Beijing (CHB) population. To evaluate the potential effects of these SNPs on CSPG4P12 expression in the colorectum, we queried the GTEx database and, on this basis, selected three SNPs (rs2880765, rs6496932, and rs8040855) for further investigation. However, after the databases were updated during the course of our research, rs6496932 no longer showed a significant effect on CSPG4P12 expression: a decreasing trend was observed in both the sigmoid and the transverse colon, but it was not statistically significant. We nevertheless retained rs6496932 to remain consistent with our original selection criteria.
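As a minimal illustration of the MAF criterion, the Python sketch below computes the minor allele frequency of a biallelic SNP from genotype counts and keeps candidates whose MAF exceeds 0.05. The SNP names and counts are hypothetical placeholders, not CHB data from dbSNP, and the sketch does not reproduce the GTEx expression filter that was also applied.

```python
from typing import Dict, List, Tuple


def minor_allele_frequency(n_hom_ref: int, n_het: int, n_hom_alt: int) -> float:
    """MAF of a biallelic SNP from genotype counts (two alleles per person)."""
    total_alleles = 2 * (n_hom_ref + n_het + n_hom_alt)
    if total_alleles == 0:
        raise ValueError("no genotypes supplied")
    alt_freq = (2 * n_hom_alt + n_het) / total_alleles
    # The minor allele is whichever allele is rarer in this sample.
    return min(alt_freq, 1.0 - alt_freq)


def select_by_maf(candidates: Dict[str, Tuple[int, int, int]],
                  threshold: float = 0.05) -> List[str]:
    """Return candidate SNP names whose MAF strictly exceeds the threshold."""
    return [name for name, counts in candidates.items()
            if minor_allele_frequency(*counts) > threshold]


# Hypothetical genotype counts (hom_ref, het, hom_alt), for illustration only.
candidates = {"snp_a": (80, 18, 2), "snp_b": (97, 3, 0), "snp_c": (30, 50, 20)}
print(select_by_maf(candidates))  # -> ['snp_a', 'snp_c']
```

Here snp_b is dropped because its MAF (3/200 = 0.015) falls below the threshold.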
DNA extraction and genotyping of CSPG4P12
DNA was extracted using a DNA extraction kit (Tiangen Biochemical Technology, China). Genotyping was performed using the TaqMan-MGB probe method. Primers and probes were custom designed and synthesized by Ranrun Jikang (China) Biotechnology as follows: (1) target sequence selection: target sequences for the CSPG4P12 SNPs (rs2880765, rs6496932, and rs8040855) were identified from available genomic data (Ensembl); (2) primer and probe design: primers and probes were designed from these sequences with Primer Express® software (Thermo Fisher Scientific, USA) to ensure specific and efficient amplification of the regions of interest; (3) validation: binding specificity was confirmed in silico with BLAST (Basic Local Alignment Search Tool) to avoid off-target effects, and Oligo Analyzer was used to predict and avoid secondary structures and primer-dimers; (4) synthesis: the validated primers and probes were synthesized by Ranrun Jikang. The PCR primers and probes used to detect the CSPG4P12 rs2880765, rs6496932, and rs8040855 variants are listed in Table 1. PCR was carried out on an ABI 7900HT Fast Real-Time PCR instrument (Thermo Fisher Scientific). Each 5 µL reaction contained 0.2 µL of each probe (2 µM), 0.15 µL of each forward and reverse primer (10 µM), 1 µL of genomic DNA (0.1–20 ng), and 2× PCR mix (TaqMan Universal Master Mix II, ABI, USA). Thermal cycling consisted of an initial denaturation step at 95°C for 10 min, followed by 50 cycles of denaturation at 95°C for 15 s and annealing/extension at 58°C for 1 min. Results were analyzed using ABI SDS 2.4 software. For quality control, negative controls and two randomly selected duplicate samples were included in each assay.
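The reaction composition above can be checked with simple C1V1 = C2V2 arithmetic. The sketch below assumes that the 2× master mix is added at 2.5 µL (half the reaction volume, so it ends at 1×; the text does not state this volume) and that nuclease-free water makes up the remainder. The two probes are treated as the usual pair of allele-specific TaqMan probes, and the dictionary keys are hypothetical labels.

```python
REACTION_UL = 5.0                # total reaction volume stated in the Methods
MASTER_MIX_UL = REACTION_UL / 2  # ASSUMPTION: 2x mix diluted to a final 1x
DNA_UL = 1.0                     # genomic DNA volume stated in the Methods

# (volume in uL, stock concentration in uM) as stated in the Methods;
# the key names are hypothetical labels.
OLIGOS = {
    "probe_allele_1": (0.2, 2.0),
    "probe_allele_2": (0.2, 2.0),
    "forward_primer": (0.15, 10.0),
    "reverse_primer": (0.15, 10.0),
}


def final_nm(volume_ul: float, stock_um: float, total_ul: float = REACTION_UL) -> float:
    """Final concentration via C1*V1 = C2*V2, converted from uM to nM."""
    return stock_um * volume_ul / total_ul * 1000.0


oligo_ul = sum(volume for volume, _ in OLIGOS.values())
# ASSUMPTION: water brings the reaction up to the final volume.
water_ul = REACTION_UL - MASTER_MIX_UL - DNA_UL - oligo_ul

for name, (volume, stock) in OLIGOS.items():
    print(f"{name}: {final_nm(volume, stock):.0f} nM")
print(f"water: {water_ul:.2f} uL")
```

Under these assumptions each probe ends up at about 80 nM and each primer at about 300 nM, leaving roughly 0.8 µL for water.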
Sequence Information for Fluorescent Probes and Companion Primers
Bioinformatics methods
The GEPIA web server (http://gepia.cancer-pku.cn) was used to compare CSPG4P12 expression between colorectal cancer tissues and adjacent normal tissues. The GTEx Portal (https://www.gtexportal.org) was used to assess the effects of rs2880765, rs6496932, and rs8040855 on CSPG4P12 expression.
Haplotype and linkage disequilibrium analyses
Haplotypes of the three SNPs were constructed using the SHEsis online software (http://analysis.bio-x.cn/myAnalysis.php). Linkage disequilibrium (LD) was analyzed, and the LD plot generated, using Haploview 4.2 (Broad Institute, USA).
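SHEsis infers haplotype frequencies from unphased genotypes, and its exact algorithm is not described here. As a hedged illustration of the general idea, the sketch below applies a standard expectation-maximization (EM) approach to two biallelic SNPs, where only double heterozygotes have ambiguous phase. Genotypes are coded as alternative-allele counts (0, 1, 2); the function name is hypothetical, and the three-SNP case analyzed in the study involves more phase configurations than this two-SNP version handles.

```python
from itertools import product
from typing import Dict, List, Sequence, Tuple

Haplotype = Tuple[int, int]  # (allele at SNP 1, allele at SNP 2); 0 = ref, 1 = alt
HAPLOTYPES: List[Haplotype] = list(product((0, 1), repeat=2))


def _alleles(dosage: int) -> List[int]:
    """Allele pair implied by an alternative-allele count of 0, 1, or 2."""
    return [[0, 0], [0, 1], [1, 1]][dosage]


def em_two_snp_haplotypes(genotypes: Sequence[Tuple[int, int]],
                          max_iter: int = 500,
                          tol: float = 1e-12) -> Dict[Haplotype, float]:
    """Hypothetical helper: EM estimate of two-SNP haplotype frequencies."""
    if not genotypes:
        raise ValueError("no genotypes supplied")
    n_chrom = 2 * len(genotypes)
    fixed = {h: 0.0 for h in HAPLOTYPES}  # counts from phase-unambiguous people
    n_double_het = 0
    for g1, g2 in genotypes:
        if g1 == 1 and g2 == 1:
            n_double_het += 1  # phase is 00/11 (cis) or 01/10 (trans)
        else:
            # At most one heterozygous site, so pairing alleles in order is unique.
            for hap in zip(_alleles(g1), _alleles(g2)):
                fixed[hap] += 1
    freqs = {h: 0.25 for h in HAPLOTYPES}
    for _ in range(max_iter):
        # E-step: posterior probability that a double heterozygote is 00/11.
        cis = freqs[(0, 0)] * freqs[(1, 1)]
        trans = freqs[(0, 1)] * freqs[(1, 0)]
        w = cis / (cis + trans) if cis + trans > 0 else 0.5
        counts = dict(fixed)
        counts[(0, 0)] += n_double_het * w
        counts[(1, 1)] += n_double_het * w
        counts[(0, 1)] += n_double_het * (1 - w)
        counts[(1, 0)] += n_double_het * (1 - w)
        # M-step: expected haplotype counts divided by the number of chromosomes.
        new = {h: counts[h] / n_chrom for h in HAPLOTYPES}
        converged = max(abs(new[h] - freqs[h]) for h in HAPLOTYPES) < tol
        freqs = new
        if converged:
            break
    return freqs


# Hypothetical data: the double heterozygotes are resolved toward 00/11 because
# all unambiguous individuals carry only 00 or 11 haplotypes.
example = [(0, 0)] * 40 + [(2, 2)] * 40 + [(1, 1)] * 20
print({h: round(f, 4) for h, f in em_two_snp_haplotypes(example).items()})
```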
Statistical methods
Data were analyzed using SPSS 23.0 (SPSS, Inc., Chicago, IL). Differences in the distributions of categorical variables (sex, age, smoking status, and drinking status) and of the genotypes of each SNP between cases and controls were assessed using the χ² test. ORs and 95% CIs were calculated by unconditional logistic regression, adjusted for sex, age, smoking status, and drinking status, to evaluate the association between genetic variants and CRC risk. Hardy–Weinberg equilibrium (HWE) was assessed in controls and cases using Pearson's χ² goodness-of-fit test. p-Values from the association analyses were corrected for multiple testing using the false discovery rate (FDR). All statistical tests were two-sided, and p < 0.05 was considered statistically significant.
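For a biallelic SNP, the Pearson goodness-of-fit test for HWE compares observed genotype counts with those expected from the estimated allele frequency (p², 2pq, q²) and has one degree of freedom. The sketch below is a minimal standard-library version, not the SPSS procedure the authors used; the genotype counts in the usage lines are hypothetical.

```python
import math
from typing import Tuple


def hwe_chi_square(n_aa: int, n_ab: int, n_bb: int) -> Tuple[float, float]:
    """Pearson chi-square goodness-of-fit test for HWE at a biallelic SNP (1 df)."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes supplied")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    # Skip cells with zero expectation (monomorphic SNP) to avoid division by zero.
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


print(hwe_chi_square(25, 50, 25))  # counts exactly at HWE -> (0.0, 1.0)
print(hwe_chi_square(40, 20, 40))  # strong heterozygote deficit, chi2 = 36
```

The closed-form erfc expression avoids a SciPy dependency and is exact only for the 1-df case that applies to a biallelic SNP.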
Results
Differential expression of CSPG4P12 in CRC and SNP selection
Bioinformatics analysis showed that the pseudogene CSPG4P12 was expressed at low levels in many types of cancer tissue, including colorectal cancer (Fig. 1a, b). To examine whether CSPG4P12 genetic variation affects its expression, we analyzed data from the GTEx database. Compared with the AA genotype, the rs2880765 TT or AT genotype was associated with significantly higher CSPG4P12 expression in the sigmoid colon (p = 0.024) but not in the transverse colon (p = 0.483) (Fig. 1c). For rs6496932, CSPG4P12 expression in carriers of the AA or CA genotype did not differ significantly from that in the CC genotype, although a decreasing trend was observed in both the sigmoid and the transverse colon (p = 0.434 and p = 0.761, respectively) (Fig. 1d). The rs8040855 GG or CG genotype was associated with significantly higher CSPG4P12 expression than the CC genotype in both the sigmoid and the transverse colon (p = 0.029 and p = 0.006, respectively) (Fig. 1e).

CSPG4P12 expression patterns in colorectal cancer.
Basic characteristics of subjects in case–control study
The basic characteristics of the 850 patients with CRC and 850 healthy controls are summarized in Table 2. Sex, age, smoking status, and drinking status did not differ significantly between the case and control groups (all p > 0.05, Table 2), indicating that the two groups were comparable with respect to these characteristics.
Distributions of Selected Characteristics in Cases and Control Subjects
Two-sided χ² test.
Association of CSPG4P12 variants with the risk of CRC
We tested all three SNPs for HWE in both the control and case groups and found no significant deviation in either group (all p > 0.05), consistent with the study subjects being representative of the population. We assessed the association between the CSPG4P12 SNPs (rs2880765, rs6496932, and rs8040855) and CRC risk under three genetic models (codominant, dominant, and recessive) (Table 3). The rs6496932 variant was associated with increased CRC risk in the codominant (CA vs. CC: OR = 1.36, 95% CI = 1.11–1.66, p = 0.006) and dominant models (CA + AA vs. CC: OR = 1.32, 95% CI = 1.09–1.60, p = 0.005) but not in the recessive model (AA vs. CC + CA: OR = 0.99, 95% CI = 0.72–1.35, p = 0.948). The rs8040855 variant was associated with decreased CRC risk in the codominant (CG vs. CC: OR = 0.47, 95% CI = 0.32–0.69, p < 0.001) and dominant models (CG + GG vs. CC: OR = 0.48, 95% CI = 0.33–0.70, p < 0.001). Because only one patient in the case group carried the rs8040855 GG genotype, this SNP could not be analyzed under the recessive model. For rs2880765, no significant association with CRC risk was observed under the codominant, dominant, or recessive model (all p > 0.05).
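To make the three genetic models concrete, the sketch below collapses genotype counts into a 2×2 table for each model and computes a crude OR with a Woolf (log-based) 95% CI. The genotype counts are hypothetical, and these crude ORs are not adjusted for sex, age, smoking, or drinking as the study's logistic regression estimates were, so they would not reproduce Table 3.

```python
import math
from statistics import NormalDist
from typing import Dict, Tuple

Z_975 = NormalDist().inv_cdf(0.975)  # about 1.96 for a two-sided 95% CI

# Each model: (genotypes counted as "exposed", genotypes used as reference).
MODELS: Dict[str, Tuple[Tuple[str, ...], Tuple[str, ...]]] = {
    "codominant, CA vs. CC": (("CA",), ("CC",)),
    "codominant, AA vs. CC": (("AA",), ("CC",)),
    "dominant, CA + AA vs. CC": (("CA", "AA"), ("CC",)),
    "recessive, AA vs. CC + CA": (("AA",), ("CC", "CA")),
}


def crude_or(cases: Dict[str, int], controls: Dict[str, int],
             exposed: Tuple[str, ...], reference: Tuple[str, ...]):
    """Crude OR with a Woolf 95% CI from a collapsed 2x2 table (no zero cells)."""
    a = sum(cases[g] for g in exposed)       # exposed cases
    b = sum(cases[g] for g in reference)     # reference cases
    c = sum(controls[g] for g in exposed)    # exposed controls
    d = sum(controls[g] for g in reference)  # reference controls
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - Z_975 * se_log)
    upper = math.exp(math.log(odds_ratio) + Z_975 * se_log)
    return odds_ratio, lower, upper


# Hypothetical genotype counts (850 per group); NOT the study's data.
cases = {"CC": 300, "CA": 400, "AA": 150}
controls = {"CC": 380, "CA": 330, "AA": 140}

for label, (exposed, reference) in MODELS.items():
    or_, lo, hi = crude_or(cases, controls, exposed, reference)
    print(f"{label}: OR = {or_:.2f}, 95% CI = {lo:.2f}-{hi:.2f}")
```

The same collapsing step is what a logistic regression encodes when the genotype is entered as a model-specific indicator variable; covariate adjustment then requires fitting the regression rather than using the 2×2 formula.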
Genotype Frequencies of SNPs in CSPG4P12 Genes and Their Association with CRC Risk
Data were calculated by unconditional logistic regression and adjusted for sex, age, smoking status, and drinking status.
p-Values were FDR-corrected.
CI, confidence interval; CRC, colorectal cancer; OR, odds ratio; SNP, single-nucleotide polymorphism.
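The tables report FDR-corrected p-values without naming the procedure. Assuming the widely used Benjamini–Hochberg step-up method, adjusted p-values can be computed as in the sketch below; the input p-values are hypothetical, and the text does not specify which tests were grouped into a single correction family.

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for one family of three tests.
print(benjamini_hochberg([0.01, 0.04, 0.03]))
```

With these inputs the adjusted values are approximately 0.03, 0.04, and 0.04: the middle-ranked test inherits the smaller adjusted value of the test ranked above it.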
Stratification analysis of the CSPG4P12 variants and CRC risk
The stratification analysis results are listed in Tables 4–6. In codominant and dominant models, the rs6496932 variant increased the risk of CRC in males (CA vs. CC: OR = 1.40, 95% CI = 1.08–1.83, p = 0.024; CA + AA vs. CC: OR = 1.41, 95% CI = 1.10–1.82, p = 0.014) and younger individuals (CA vs. CC: OR = 1.57, 95% CI = 1.18–2.10, p = 0.004; CA + AA vs. CC: OR = 1.49, 95% CI = 1.13–1.96, p = 0.010) (Table 4). Similarly, the rs6496932 variant increased the risk of CRC among nonsmokers (CA vs. CC: OR = 1.37, 95% CI = 1.09–1.73, p = 0.016; CA + AA vs. CC: OR = 1.31, 95% CI = 1.05–1.63, p = 0.036) and nondrinkers (CA vs. CC: OR = 1.50, 95% CI = 1.18–1.90, p = 0.002; CA + AA vs. CC: OR = 1.42, 95% CI = 1.13–1.78, p = 0.004) in codominant and dominant models (Table 4). However, the rs6496932 variant did not influence the risk of CRC in any of the subgroups in the recessive model (all p > 0.05) (Table 4).
Stratified Analysis of the Association Between CSPG4P12 rs6496932 and CRC Risk
Data were calculated by unconditional logistic regression and adjusted for sex, age, smoking status, and drinking status.
p-Values were FDR-corrected.
Stratified Analysis of the Association Between CSPG4P12 rs8040855 and CRC Risk
Data were calculated by unconditional logistic regression and adjusted for sex, age, smoking status, and drinking status.
p-Values were FDR-corrected.
Stratified Analysis of the Association Between CSPG4P12 rs2880765 and CRC Risk
Data were calculated by unconditional logistic regression and adjusted for sex, age, smoking status, and drinking status.
p-Values were FDR-corrected.
As shown in Table 5, carriers of at least one rs8040855 G allele had a lower risk of CRC in all sex and age subgroups under both the codominant and dominant models. When stratified by smoking or drinking status, G allele carriers had a lower risk of CRC among nonsmokers (CG vs. CC: OR = 0.40, 95% CI = 0.25–0.64, p < 0.001; CG + GG vs. CC: OR = 0.40, 95% CI = 0.25–0.64, p < 0.001) and nondrinkers (CG vs. CC: OR = 0.37, 95% CI = 0.24–0.59, p < 0.001; CG + GG vs. CC: OR = 0.37, 95% CI = 0.24–0.59, p < 0.001) but not among smokers or drinkers (all p > 0.05).
In addition, the rs2880765 variant was not associated with CRC risk in any subgroup under any of the three models (all p > 0.05) (Table 6).
LD and haplotype analysis
We further performed LD and haplotype analyses. The SNPs rs2880765 and rs6496932 were in incomplete LD (D′ = 0.648) with a low squared correlation (r² = 0.063) (Fig. 2). Haplotype analysis showed that the CSPG4P12 TCG haplotype (rs2880765 T, rs6496932 C, rs8040855 G) was associated with reduced CRC risk compared with the reference ACC haplotype (rs2880765 A, rs6496932 C, rs8040855 C) (OR = 0.46, 95% CI = 0.26–0.82, p = 0.049) (Table 7). No other haplotype was significantly associated with CRC risk (p > 0.05).
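The contrast between D′ = 0.648 and r² = 0.063 is expected, because the two statistics scale the disequilibrium coefficient D = p_AB - p_A*p_B differently: D′ divides |D| by its maximum possible value given the allele frequencies, whereas r² divides D² by the product of the allele-frequency variances. The sketch below computes both from two-SNP haplotype frequencies (returning the absolute value |D′|); the function name and the example frequencies are hypothetical, chosen to show that D′ can reach 1 while r² stays low when the two SNPs have different allele frequencies.

```python
from typing import Dict, Tuple


def ld_stats(hap_freqs: Dict[Tuple[int, int], float]) -> Tuple[float, float, float]:
    """Return (D, |D'|, r^2) for two biallelic SNPs from haplotype frequencies.

    Keys are (allele at SNP 1, allele at SNP 2), with 1 marking the tracked allele.
    """
    p_a = hap_freqs[(1, 0)] + hap_freqs[(1, 1)]  # frequency of allele 1 at SNP 1
    p_b = hap_freqs[(0, 1)] + hap_freqs[(1, 1)]  # frequency of allele 1 at SNP 2
    d = hap_freqs[(1, 1)] - p_a * p_b
    # D_max depends on the sign of D and on both allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2


# Hypothetical haplotype frequencies, for illustration only.
perfect = {(0, 0): 0.5, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 0.5}
skewed = {(0, 0): 0.6, (0, 1): 0.3, (1, 0): 0.0, (1, 1): 0.1}
print(ld_stats(perfect))
print(ld_stats(skewed))
```

With the skewed frequencies, D′ is 1 while r² is only about 0.17, an exaggerated version of the pattern reported for rs2880765 and rs6496932.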

LD plot for the three SNPs (rs2880765, rs6496932, and rs8040855) in CSPG4P12. Numbers in squares indicate D′ × 100 for each pair of SNPs. LD, linkage disequilibrium; SNPs, single-nucleotide polymorphisms.
Haplotype Frequencies of CSPG4P12 Among Cases and Controls and Their Association with CRC Risk
p-Values were FDR-corrected.
Discussion
Pseudogenes are involved in CRC development. For example, DUXAP8 promotes CRC cell proliferation, invasion, and migration (He et al., 2020), whereas CTNNAP1 inhibits CRC growth (Chen et al., 2016). CSPG4P12 is a pseudogene-derived lncRNA that is highly homologous to its parent gene, CSPG4 (Wiest et al., 2006). In cancer, SNPs located in different regions of a gene can act through distinct mechanisms (Deng et al., 2017). SNPs can disrupt lncRNA function directly, by interfering with transcription factor binding, or indirectly, by affecting the expression of regulatory factors (Abdi et al., 2022). In addition, SNPs in distal regions may exert long-range cis effects that alter gene transcription (He et al., 2015). Several studies have reported that polymorphisms in pseudogenes are significantly associated with CRC risk; for example, MYLKP1 variants (rs12490683 and rs12497343) were associated with a significantly increased risk of colon cancer in African Americans (Lynn et al., 2018).
Whether CSPG4P12 polymorphisms are associated with CRC risk has remained uncertain. In this case–control study, the CSPG4P12 rs6496932 variant was associated with increased CRC risk, whereas the rs8040855 variant was associated with significantly decreased CRC risk. This difference may be attributed, among other factors, to the distinct physiological characteristics and pathogenesis of colon and rectal cancers (Paschke et al., 2018; Tamas et al., 2015).
Epidemiological studies have shown that CRC development is influenced by demographic characteristics, genetic factors, and environmental factors, as well as by interactions among them (Archambault et al., 2022). In this study, the CSPG4P12 rs6496932 variant was associated with increased CRC risk in males and younger individuals, whereas the rs8040855 variant was associated with decreased CRC risk in sex- and age-stratified analyses. This is consistent with other studies showing that the relationship between genetic variants and CRC risk is influenced by sex and age (Leberfarb et al., 2020; Wu et al., 2019), and it highlights the importance of considering interactions among sex, age, and genetic variants in CRC development.
Additionally, environmental factors such as smoking, alcohol consumption, and bacterial infection are considered risk factors for CRC (Keum and Giovannucci, 2019). Ethanol, present in all types of alcoholic beverages, is a well-established risk factor for CRC, and cigarette smoke-induced dysbiosis of the gut microbiota alters intestinal metabolites and impairs the intestinal barrier, thereby promoting CRC (Bai et al., 2022; Grosse et al., 2019). CSPG4P12 may interact with these environmental factors in CRC development. In analyses stratified by smoking status, the CSPG4P12 rs8040855 G allele was associated with reduced CRC risk, and the rs6496932 A allele with increased CRC risk, among nonsmokers but not among smokers. Similarly, in analyses stratified by alcohol consumption, the rs8040855 G allele was associated with reduced CRC risk among nondrinkers but not among drinkers, and the rs6496932 A allele was associated with increased CRC risk among nondrinkers. These findings suggest that the CSPG4P12 rs6496932 and rs8040855 variants may influence CRC risk through gene–environment interactions.
Our haplotype analysis showed that the CSPG4P12 TCG haplotype (rs2880765 T, rs6496932 C, rs8040855 G) was significantly associated with reduced CRC risk, and no other haplotype showed a significant association. This finding is consistent with the single-SNP analysis, in which the rs8040855 G allele was associated with reduced CRC risk; the presence of the G allele in the protective haplotype further supports its potential role in lowering cancer risk. The protective association of the TCG haplotype may reflect the combined influence of these SNPs on CSPG4P12 expression.
Conclusions
CSPG4P12 polymorphisms, particularly the rs6496932 and rs8040855 variants, are significantly associated with colorectal cancer risk. These findings highlight the potential of these variants as biomarkers of CRC susceptibility and may inform personalized prevention strategies.
Footnotes
Acknowledgment
The authors thank the patients and their families for their involvement in the study.
Authors’ Contributions
X. Zhou: Conceptualization, methodology, formal analysis, investigation, and writing—original draft preparation. L.G.: Data curation, visualization, and investigation. Z.Y.: Visualization, data curation, and supervision. H.X.: Validation and formal analysis. Z.Z.: Investigation and resources. X. Zhang: Conceptualization, resources, supervision, project administration, writing, reviewing, editing, and funding acquisition. All authors revised the text and approved the final article.
Disclosure Statement
All the authors declare that they have no competing interests.
Funding Information
This study was supported by
Supplementary Material
Supplementary Data S1
References
