Abstract
Background
Buccal mucosa cancer (BMC) is an aggressive subtype of oral cancer. Conventional survival analysis cannot reflect the time-dependent nature of prognosis. This study aimed to assess dynamic survival probabilities in BMC using conditional survival (CS) analysis and to develop a CS-based nomogram for individualized prognostic evaluation.
Methods
Data of BMC patients diagnosed between 2004 and 2021 were obtained from the SEER database. CS analysis was applied to estimate dynamic survival, while annual hazard rate (AHR) analysis identified high-risk periods after diagnosis. A CS-nomogram was constructed using stepwise selection integrating best subsets regression, LASSO, and Cox analyses. Model performance was evaluated using calibration plots, area under the curve (AUC), decision curve analysis (DCA), and Kaplan–Meier stratification.
Results
A total of 2,310 patients were included. The 3-, 5-, and 10-year overall survival (OS) rates were 63%, 56%, and 41%, respectively. CS analysis showed a marked increase in 10-year survival probability, from 52% for 1-year survivors to 95% for 9-year survivors. AHR analysis revealed the highest mortality risk within the first year, which stabilized after three years. The CS-nomogram, incorporating key prognostic factors, achieved AUCs ranging from 0.797 to 0.835 for 3-, 5-, and 10-year predictions across the training and validation cohorts.
Conclusions
This study establishes a dynamic prognostic framework for BMC. The CS-nomogram provides accurate, interval-updated survival predictions that can help clinicians tailor follow-up and counseling for long-term survivors. However, additional external validation in geographically diverse cohorts is needed.
Introduction
Oral cancer remains a significant global health concern, with buccal mucosa cancer (BMC) representing a distinct and clinically relevant subtype.1–3 BMC arises from the mucosal lining of the inner cheek and is predominantly squamous cell carcinoma.4 Clinically, it is characterized by aggressive local invasion, a high propensity for lymph node metastasis, and a significant risk of recurrence, contributing to its poor prognosis.5,6 While numerous studies have focused on the prognosis of oral cancer as a whole, particularly tongue cancer, there is a notable paucity of research providing a comprehensive evaluation of BMC prognosis.7,8
Some studies have attempted to describe BMC outcomes using overall survival (OS) and cancer-specific survival (CSS) as primary endpoints.9–12 However, these traditional prognostic metrics fail to capture the dynamic nature of survival probabilities over time. This limitation underscores the necessity for further research aimed at refining dynamic prognostic assessments and optimizing therapeutic strategies for BMC. Conditional survival (CS) analysis, in contrast, offers a dynamic perspective by assessing survival probabilities at different time points after diagnosis.13–15 This approach is particularly valuable for malignancies with initially poor prognosis, as it can identify critical time periods with the highest mortality risk and provide updated survival probabilities for long-term survivors.15 By applying CS analysis to BMC, we can better characterize its survival patterns and refine prognostic predictions, ultimately aiding clinical decision-making.
The development of accurate survival estimation tools is crucial for individualized treatment planning and patient counseling in clinical practice. Nomograms have become widely utilized tools for personalized survival prediction, integrating multiple prognostic factors into a user-friendly graphical format.9,10,16 Traditional nomograms, however, do not account for time-dependent changes in survival probability.14 To address this limitation, we propose a CS-based nomogram (CS-nomogram) that integrates CS analysis into predictive modeling. This approach is intended to let clinicians reassess a patient’s prognosis at successive post-diagnosis time points and adapt treatment strategies accordingly.
In this study, we utilized the Surveillance, Epidemiology, and End Results (SEER) database to conduct a comprehensive CS analysis of BMC. We further developed and validated a CS-nomogram to enhance prognostic accuracy and support clinical decision-making. Our findings provide insights into the evolving survival trends of BMC patients and highlight the importance of time-adjusted prognostic modeling in improving individualized care.
Methods
This study used data from the SEER database, maintained by the U.S. National Cancer Institute. The SEER database is publicly accessible; all authors obtained the necessary access permissions, and data were extracted using SEER*Stat software (version 8.4.5). This study was reviewed and approved by the Institutional Review Board (IRB) of Shaoxing Central Hospital (Approval No.: 2026-030-001). Given the retrospective nature of the study and the use of de-identified data, the requirement for informed consent was waived by the IRB.
We retrospectively identified and analyzed patients with BMC diagnosed between 2004 and 2021 from the SEER database. Tumor cases were selected using ICD-O-3 codes C06.0 (cheek mucosa) and C06.1 (vestibule of mouth), with inclusion restricted to primary malignancies. Cases with unknown tumor characteristics or incomplete treatment information were excluded from the analysis. We extracted demographic variables (sex, age, race, marital status, and household income), tumor-related features (histological type, differentiation grade, primary site, and TNM stage), therapeutic interventions (surgical resection, radiotherapy, and chemotherapy), and patient survival data. Because the study period spans multiple TNM staging editions, staging variables were harmonized using the standardized SEER staging definitions to ensure consistency across different diagnostic periods. The detailed screening process and the step-by-step selection of the study cohort are presented in Figure S1. The primary outcome of this study was OS, defined as the time interval from initial diagnosis to either death or the last recorded follow-up.
Conditional survival and annual hazard analysis
CS provides a dynamic prognostic assessment by estimating the probability that a cancer patient who has already survived x years after diagnosis will survive an additional y years. It is expressed mathematically as CS(y|x) = OS(x+y)/OS(x), where OS(x+y) is the probability of being alive x+y years post-diagnosis and OS(x) is the probability of having survived the first x years.14,15 Annual hazard rate (AHR) analysis estimates the instantaneous risk of mortality at each time point post-diagnosis, providing dynamic prognostic insights that complement traditional survival measures by identifying critical periods of elevated risk.14,17
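As a minimal illustration of the CS formula above, the following Python sketch computes CS(y|x) from a small table of OS estimates. The function name and the dictionary layout are assumptions made for illustration and are not part of the study’s R workflow. The inputs are the rounded 5- and 10-year OS rates reported in the Results, so the output is only approximate.

```python
def conditional_survival(os_by_year, x, y):
    """Return CS(y|x) = OS(x + y) / OS(x).

    os_by_year maps years since diagnosis to overall survival probability.
    Hypothetical helper for illustration; not the authors' code.
    """
    os_x = os_by_year[x]
    if os_x <= 0:
        raise ValueError("OS(x) must be positive to condition on survival to year x")
    return os_by_year[x + y] / os_x


# Rounded OS rates reported for the whole cohort (3-, 5-, and 10-year);
# OS at diagnosis is 1 by definition.
reported_os = {0: 1.00, 3: 0.63, 5: 0.56, 10: 0.41}

# Probability of reaching year 10 for a patient already alive at year 5.
cs_5_given_5 = conditional_survival(reported_os, x=5, y=5)
print(f"CS(5|5) = {cs_5_given_5:.2f}")
```

With these rounded inputs, the estimate is about 0.73, compared with an unconditional 10-year OS of 0.41 at diagnosis.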
CS-Nomogram construction
Patients were randomly assigned to the training and validation cohorts in a 7:3 ratio using the random sampling function in R. No additional stratification variables were applied. Baseline characteristics of the two cohorts were compared to confirm balance. To identify potential prognostic factors, three statistical approaches were applied independently: best subsets regression (BSR), least absolute shrinkage and selection operator (LASSO) regression, and univariable Cox regression (Uni-Cox). Significant predictors were further refined via stepwise backward regression using the “MASS” package.18 To minimize the risk of overfitting and coefficient bias, we adhered to a split-sample validation framework: all variable selection strategies (BSR, LASSO, and Uni-Cox followed by stepwise backward selection) were executed solely within the training cohort. The candidate models were then compared using the Akaike Information Criterion (AIC) and the area under the receiver operating characteristic curve (AUC), and the predictor set with the lowest AIC and highest AUC was selected. These variables were incorporated into the final nomogram to predict 3-, 5-, and 10-year OS as well as 10-year CS.
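The unstratified 7:3 random split described above can be sketched as follows. The study performed this step in R; the Python version below is only an analogue, and the function name, the seed, and the use of integer patient identifiers are assumptions made for illustration. Fixing the seed is shown because it makes the split reproducible, not because the source reports one.

```python
import random


def split_cohort(patient_ids, train_fraction=0.7, seed=2024):
    """Randomly split patient IDs into training and validation sets.

    No stratification is applied, mirroring the description in the text.
    The seed value is an arbitrary illustrative choice.
    """
    rng = random.Random(seed)
    shuffled = list(patient_ids)
    rng.shuffle(shuffled)
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


# 2,310 patients were included in the study; the IDs here are placeholders.
train_ids, valid_ids = split_cohort(range(2310))
print(len(train_ids), len(valid_ids))
```

All variable selection would then be run on `train_ids` only, with `valid_ids` held back for evaluation.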
Nomogram evaluation and application
Model performance was assessed in both training and validation cohorts. We used calibration curves to examine the concordance between model-predicted probabilities and observed outcomes, and receiver operating characteristic (ROC) curves with AUC values to assess the model’s discrimination. For internal validation, we implemented a bootstrapping approach with 1000 resampling iterations. To evaluate clinical applicability, we conducted decision curve analysis (DCA) with the “stdca.R” package.19 We also used the nomogram to calculate an individualized risk score for each patient and evaluated the association between these scores and clinical outcomes. Individual risk scores were calculated by summing the weighted regression coefficients of each variable derived from the multivariable Cox model. The optimal cutoff value was determined using maximally selected rank statistics, which identify the threshold that most effectively partitions patients into prognostically distinct risk subgroups. Survival outcomes in the resulting subgroups were then compared using Kaplan-Meier analysis and log-rank tests.
Statistical analyses were conducted using R software version 4.3.5, and a p value of less than 0.05 was considered statistically significant.
Results
Baseline demographic and clinical characteristics
Patients’ detailed data.
RT, radiotherapy; CT, chemotherapy.
Other histological types (comprising 14.2% of the cases) included Carcinoma, NOS (8010/3), Pseudosarcomatous carcinoma (8033/3), Small cell carcinoma, NOS (8041/3), Verrucous carcinoma, NOS (8051/3), Papillary squamous cell carcinoma (8052/3), Lymphoepithelial carcinoma (8082/3), Basaloid squamous cell carcinoma (8083/3), Basaloid carcinoma (8123/3), Adenocarcinoma, NOS (8140/3), Basal cell adenocarcinoma (8147/3), Adenoid cystic carcinoma (8200/3), Neuroendocrine carcinoma, NOS (8246/3), Merkel cell carcinoma (8247/3), Papillary adenocarcinoma, NOS (8260/3), Oxyphilic adenocarcinoma (8290/3), Clear cell adenocarcinoma, NOS (8310/3), Mucoepidermoid carcinoma (8430/3), Cystadenocarcinoma, NOS (8440/3), Signet ring cell carcinoma (8490/3), Infiltrating duct carcinoma, NOS (8500/3), Secretory carcinoma of breast (8502/3), Polymorphous low grade adenocarcinoma (8525/3), Acinar cell carcinoma (8550/3), Adenosquamous carcinoma (8560/3), Epithelial-myoepithelial carcinoma (8562/3), Mixed tumor, malignant, NOS (8940/3), Carcinoma in pleomorphic adenoma (8941/3), and Malignant myoepithelioma (8982/3).
Conditional survival and annual hazard rate analysis
Kaplan-Meier analysis revealed 3-, 5-, and 10-year OS rates of 63%, 56%, and 41%, respectively, in BMC patients (Figure 1(a) and (b)). CS analysis demonstrated progressively improving 10-year survival probabilities with each additional year survived post-diagnosis, increasing from 52% for 1-year survivors to 95% for those surviving 9 years (Figure 1(a) and (b)). AHR analysis showed a non-linear temporal pattern, with peak mortality risk occurring during the first year post-diagnosis, followed by gradual risk reduction and stabilization beyond the second year, reflecting the dynamic evolution of prognosis in BMC (Figure 1(c)).
Figure 1. Conditional survival analysis and annual hazard rate analysis of BMC. Conditional survival curves (a) and their updated survival data adjusted for survived time (b); annual hazard rate curve adjusted for survived time (c). BMC, buccal mucosa cancer.
Variable selection and nomogram construction
During variable selection, we systematically compared three approaches: BSR, LASSO regression, and Uni-Cox modeling. Using BSR with the maximum adjusted R-squared criterion, we initially identified six variables (age, histology, N stage, primary surgery, chemotherapy, and household income), which were subsequently refined to five through stepwise backward regression (age, histology, N stage, primary surgery, and household income; Figure 2(a) and (b)). LASSO regression with the 1-standard-error criterion selected six variables (age, histology, T stage, N stage, primary surgery, and marital status; Figure 2(c) and (d)). The Uni-Cox model (p<0.05 threshold) initially captured eleven variables, which stepwise backward regression reduced to nine (age, grade, histology, T stage, N stage, surgery, radiotherapy, marital status, and household income; Figure 2(e)). Model performance evaluation using AIC and AUC metrics demonstrated that the Uni-Cox approach outperformed the others, achieving superior model fit (AIC: BSR, 9937.86; LASSO, 9894.73; Uni-Cox, 9884.9) and discriminative ability (Figure 3). Finally, we developed a CS-nomogram based on the variable subset identified via Cox proportional hazards regression, capable of predicting 3-, 5-, and 10-year OS as well as 10-year CS. This integrative model incorporates nine independent prognostic factors: age, grade, histology, T stage, N stage, surgery, radiotherapy, marital status, and household income. By summing the points assigned to each variable, the nomogram predicts traditional 3-, 5-, and 10-year OS at the time of diagnosis. It also provides updated CS probabilities, such as the 10-year CS(5|5), enabling clinicians to estimate the likelihood of surviving an additional 5 years for patients who have already achieved 5-year survival. Consequently, this model facilitates dynamic prognostic updates, reflecting the clinical reality that survival probability typically improves as patients successfully reach successive post-diagnosis milestones (Figure 4).
Figure 2. Predictor screening. Best subsets regression (a and b), least absolute shrinkage and selection operator regression (c and d), and Uni-Cox analysis (e) for screening predictors.
Figure 3. The performance of the BSR, LASSO, and Uni-Cox models was compared using receiver operating characteristic analysis with area under the curve values. BSR, best subsets regression; LASSO, least absolute shrinkage and selection operator regression.
Figure 4. A conditional survival (CS)-based nomogram for predicting dynamic prognosis in BMC patients. This integrative model incorporates nine independent prognostic factors: age, grade, histology, T stage, N stage, surgery, radiotherapy, marital status, and household income. By summing the points assigned to each variable, the nomogram predicts traditional 3-, 5-, and 10-year overall survival (OS) at the time of diagnosis. The lower axes provide updated CS probabilities, such as “10-yr CS(5|5)”, enabling clinicians to estimate the likelihood of surviving an additional 5 years for patients who have already achieved 5-year survival. This tool facilitates dynamic prognostic updates, reflecting how survival probability typically improves as patients reach post-diagnosis milestones.
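The AIC-based part of the model comparison reported in this section can be made explicit with a short Python sketch. The AIC values are those given in the text; the dictionary and variable names are illustrative assumptions. The AUC comparison is not reproduced here because the per-model AUC values appear only in Figure 3.

```python
# AIC values reported for the three candidate predictor sets.
reported_aic = {"BSR": 9937.86, "LASSO": 9894.73, "Uni-Cox": 9884.90}

# Lower AIC indicates a better trade-off between fit and model complexity.
best_model = min(reported_aic, key=reported_aic.get)

# Difference of each candidate from the best-fitting model.
delta_aic = {name: aic - reported_aic[best_model] for name, aic in reported_aic.items()}

for name in sorted(delta_aic, key=delta_aic.get):
    print(f"{name}: AIC = {reported_aic[name]:.2f}, delta = {delta_aic[name]:.2f}")
```

On these values the Uni-Cox set has the lowest AIC, with LASSO about 9.8 points and BSR about 53 points higher.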


Nomogram evaluation and application
Model performance was assessed in both training and validation cohorts. Calibration plots showed close agreement with the 45-degree reference line, indicating good calibration (Figure 5(a) and (b)). Bootstrap validation with 1000 resamples further supported model stability. Time-dependent ROC analysis showed good discrimination, with AUC values of 82.9% (95% CI 80.7%-85.1%), 82.7% (80.3%-85.0%), and 81.7% (78.5%-85.0%) for 3-, 5-, and 10-year predictions in the training set (Figure 5(c)), and 79.7% (76.1%-83.4%), 82.4% (78.8%-85.9%), and 83.5% (79.2%-87.8%), respectively, in the validation set (Figure 5(d)). Decision curve analysis supported clinical utility, showing superior net benefit across all threshold probabilities compared to alternative approaches in both cohorts (Figure 5(e) and (f)). Together, these evaluations indicate that the nomogram predicts BMC patient outcomes accurately and reliably.
Figure 5. The CS-nomogram’s predictive performance was assessed through calibration plots (a and b), time-dependent ROC curves (c and d), and decision curve analysis (e and f). CS, conditional survival; ROC, receiver operating characteristic.
Furthermore, we calculated individual risk scores for each patient using the nomogram model and established an optimal prognostic cutoff value of 167 (Figure 6(a)). Based on this threshold, patients were stratified into distinct low-risk and high-risk groups. Kaplan-Meier analysis with log-rank testing confirmed statistically significant survival differences between these risk strata (p<0.001), validating the clinical relevance and discriminative power of our risk stratification approach (Figure 6(b) and (c)).
Figure 6. CS-nomogram based risk stratification. The CS-nomogram enabled effective risk stratification by identifying optimal cutoff values (a) that distinguished distinct prognostic groups (b and c). CS, conditional survival.
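The stratification step can be sketched in Python under stated assumptions. The cutoff of 167 is the value reported above. Everything else is hypothetical: the per-variable points are invented for two made-up patients, the score is assumed to be on the nomogram’s total-points scale with higher totals indicating worse prognosis, and scores exactly at the cutoff are assigned to the low-risk group because the text does not say how ties are handled.

```python
CUTOFF = 167  # optimal cutoff reported in the Results


def total_points(points_by_variable):
    """Sum the points assigned to each predictor for one patient."""
    return sum(points_by_variable.values())


def risk_group(score, cutoff=CUTOFF):
    """Assign a baseline risk group.

    Assumption: scores above the cutoff are high-risk; tie handling
    is not stated in the source.
    """
    return "high-risk" if score > cutoff else "low-risk"


# Hypothetical points for two made-up patients (not read from the nomogram).
patient_a = {"age": 30, "grade": 10, "histology": 0, "T stage": 20, "N stage": 0,
             "surgery": 0, "radiotherapy": 15, "marital status": 5,
             "household income": 10}
patient_b = {"age": 80, "grade": 20, "histology": 10, "T stage": 40, "N stage": 35,
             "surgery": 30, "radiotherapy": 0, "marital status": 10,
             "household income": 15}

for label, patient in (("A", patient_a), ("B", patient_b)):
    score = total_points(patient)
    print(f"Patient {label}: {score} points, {risk_group(score)}")
```

Because the points are derived from baseline variables only, the group assignment in this sketch is fixed at diagnosis and does not change as follow-up accrues.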
Discussion
BMC is an uncommon and aggressive neoplasm of the oral cavity, characterized by a high propensity for local invasion, nodal metastasis, and poor long-term survival.9,11 Prognostic estimations for BMC have traditionally relied on static survival metrics, which fail to reflect the evolving nature of survival probabilities over time.9,11,12 In this study, we conducted a comprehensive CS analysis of BMC using a large, population-based dataset and constructed a CS-nomogram to provide individualized, time-adjusted prognostic estimations. Our results revealed a dynamic improvement in survival probability with increasing post-diagnosis survival time, underscoring the necessity of incorporating CS analysis into prognostic modeling to optimize patient counseling and clinical decision-making.
Our study revealed that CS significantly improves as the time since diagnosis increases. Specifically, the 10-year survival probability increased progressively for patients who had already survived one or more years post-diagnosis, rising from 52% at year 1 to 95% at year 9. This trend highlighted the dynamic nature of BMC prognosis, where the highest mortality risk was observed within the first year following diagnosis. The AHR curve also demonstrated a clear early high-risk phase within the first three years following diagnosis, after which mortality risk appeared to stabilize. This temporal pattern suggests that the early post-diagnosis period represents a critical window requiring intensified surveillance and management. In contrast, patients who survive beyond this phase may enter a more stable risk state, which raises the possibility that follow-up strategies could be risk-adapted over time. However, any potential de-escalation of follow-up intensity should be interpreted with caution, as survival risk alone does not fully capture recurrence dynamics, treatment-related sequelae, or functional outcomes. Prospective studies incorporating detailed clinical and longitudinal follow-up data are needed before formalizing time-based surveillance de-intensification strategies. In addition, within this framework, we intentionally selected OS as the primary endpoint. OS offers a comprehensive and objective measure of total patient burden by capturing both cancer-related and non-cancer-related mortality, including the physiological impact of intensive therapies and underlying comorbid conditions. Given that patients are primarily concerned with overall life expectancy, OS represents the most pragmatic and clinically relevant endpoint for prognostic assessment and counseling. At the same time, we acknowledge the complementary value of CSS, which can better isolate disease-related mortality and mitigate the influence of competing risks. 
However, due to the inherent limitations of the SEER database—particularly the potential misclassification and limited granularity of cause-of-death data—OS was considered the more robust and reliable endpoint for model development in this study. Accordingly, the potential influence of competing risks should be taken into account when interpreting our results. Future prospective studies incorporating high-quality cause-of-death data and standardized comorbidity indices are warranted to enable integrated analyses of both OS and CSS, thereby further refining prognostic accuracy.
The development of an accurate prognostic model is essential for personalized treatment planning in BMC. Traditional nomograms provide survival predictions based on baseline characteristics at diagnosis, but they fail to account for time-dependent changes in survival probability.14 In contrast, our CS-nomogram integrated CS probabilities to offer more precise, dynamic prognostic estimations. This allows clinicians to dynamically reassess a patient’s prognosis at different time points post-diagnosis, facilitating more informed therapeutic decisions. Additionally, our risk stratification approach, based on individualized nomogram-derived scores, effectively differentiated patients into distinct prognostic subgroups, enabling tailored follow-up and intervention strategies. The robustness of our model was further reinforced by our rigorous variable selection process, which combined multiple statistical methods, ensuring the optimal inclusion of prognostically significant factors while minimizing overfitting and enhancing predictive accuracy.
Among the prognostic variables incorporated into our final model, age was a critical determinant of survival, with older patients exhibiting significantly poorer outcomes, consistent with previous studies. Histological subtype also played a role, as squamous cell carcinoma, the predominant histology, demonstrated distinct survival patterns. Advanced tumor stage, especially higher T classification and nodal metastasis (N stage), was a significant independent predictor of poor clinical outcomes, underscoring the critical influence of locoregional disease progression on patient survival.4,20,21 Notably, our prognostic model differs from previous nomograms for oral and tongue cancers by excluding M stage as a predictive factor.22 M stage was omitted to maintain the predictive reliability and stability of the model. From a statistical perspective, the extremely low prevalence of M1 disease in our cohort (1.4%) provided insufficient statistical power to derive a meaningful coefficient for this subgroup, which would have compromised the overall performance of the nomogram. Clinically, patients with M1 disease represent a distinct cohort whose biological behavior and therapeutic requirements differ from those of patients with localized or regional disease. These patients should be flagged for aggressive systemic therapy or palliative care rather than being evaluated through a population-level prognostic tool. Applying our nomogram to these rare metastatic cases may lead to inaccurate expectations; thus, our tool is best used for personalizing the long-term follow-up and counseling of patients with non-metastatic disease, where its predictions are expected to be most reliable.
With respect to treatment, surgery and radiotherapy were significantly associated with better survival outcomes, consistent with the role of surgery as the cornerstone of BMC treatment. However, considering the retrospective nature of the SEER data and potential confounding by indication, whereby patients in better health are more likely to receive aggressive treatment, these findings should be interpreted as prognostic associations rather than definitive evidence of treatment efficacy. Furthermore, socioeconomic factors, such as household income, were retained in the model, suggesting that disparities in healthcare access and treatment availability may influence survival outcomes. The inclusion of these variables may broaden the clinical applicability of our CS-nomogram.
In clinical practice, the CS-nomogram provides a dynamic prognosis by adjusting baseline risk estimates as follow-up progresses. Clinicians first derive a total point score from baseline clinicopathological variables to establish the initial survival probability S(t). For a patient who has already survived x years, the conditional probability of surviving an additional y years is determined by the ratio CS(y|x) = S(x+y)/S(x). This methodology offers a distinct clinical advantage over traditional static models by accounting for the survival benefit accrued with each year of survival. Importantly, the risk stratification in this study was based on baseline characteristics at diagnosis and therefore represents a static classification. The CS-nomogram does not reassign patients into new risk categories over time; rather, it quantifies the dynamic improvement in survival probability conditional on having survived a given duration. As survival time increases, patients initially classified as high risk may experience substantial gains in conditional survival, in some cases approaching the prognosis of lower-risk groups. This phenomenon reflects a progressive attenuation of baseline risk over time, highlighting the clinical value of conditional survival in providing more optimistic and individualized prognostic information for long-term survivors. Future studies incorporating time-updated covariates and dynamic modeling strategies are warranted to enable formal prognostic re-classification.
To enhance clinical applicability, the nomogram is designed for integration into clinical decision support systems within electronic health records. By embedding model coefficients into automated computational pipelines, individualized survival estimates can be dynamically updated during follow-up visits based on time since diagnosis. In addition, an open-access R Shiny-based web calculator is under development to facilitate external use, and full model coefficients are provided in the Supplementary Materials to ensure reproducibility and computability.
Several limitations must be acknowledged in our study. First, the SEER database lacks granular data on treatment (e.g., surgical margins, radiation doses), key pathological markers (e.g., perineural invasion [PNI], lymphovascular invasion [LVI], and HPV status), and tobacco/alcohol history, all of which are important prognostic drivers. Second, our U.S.-based cohort may not fully represent high-incidence regions such as South Asia, where betel nut use and distinct molecular profiles (such as TP53 mutations or p16 status) significantly influence disease behavior. Consequently, our tool is currently optimized for a Western demographic, and although the CS-nomogram was internally validated, external validation in independent, geographically diverse cohorts is a critical next step to establish its generalizability and clinical utility. Third, as a retrospective study, our findings are subject to inherent selection biases and unmeasured confounders. Future research should focus on updating prognostic variables, including molecular biomarkers and advancements in treatment strategies, to enhance prognostic accuracy and support the development of more targeted therapeutic approaches.
Conclusions
In conclusion, our study provides a dynamic perspective on BMC prognosis through CS analysis and the development of a CS-nomogram. By incorporating time-dependent survival probabilities, the model offers interval-updated survival estimates that traditional static models do not provide, enabling more personalized prognostication and risk stratification. These findings have implications for clinical decision-making, patient counseling, and the optimization of therapeutic strategies in BMC management.
Supplemental material
Supplemental material - Conditional survival analysis may enhance prognosis estimate in buccal mucosa carcinoma
Supplemental material for Conditional survival analysis may enhance prognosis estimate in buccal mucosa carcinoma by Shuiming He, Zhihao Yang, Weirong Sang, Fujiang Du, Pengna Zhu in Digital Health.
Footnotes
Ethical considerations
This study was reviewed and approved by the Institutional Review Board (IRB) of Shaoxing Central Hospital (Approval No.: 2026-030-001). Given the retrospective nature of the study and the use of de-identified data, the requirement for informed consent was waived by the IRB.
Consent to participate
Our study did not involve direct contact with human participants, and all data used were de-identified and obtained from a public database.
Author contributions
SH and PZ conceived the study; SH and ZY performed data extraction and analysis; WS and FD contributed to the interpretation of results; SH drafted the manuscript; PZ provided final approval.
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Supplemental material
Supplemental material for this article is available online.
References
