Abstract
Background:
The current American Joint Committee on Cancer 8th edition staging system for thyroid cancer describes outcomes for populations of patients with well-differentiated thyroid cancer (WDTC) rather than for individual patients. The aim of this study was to create a clinical nomogram that can be used to predict survival in individual patients with WDTC.
Methods:
A single institutional cohort of 8535 patients with WDTC treated with primary surgery at the Memorial Sloan Kettering Cancer Center was used to create a predictive nomogram for disease-specific survival (DSS) as a retrospective cohort study. The nomogram was created using DSS as the dependent variable, and the independent variables used were sex, age, pathology subtype, and TNM stage. An external validation cohort of 519 patients from three different international centers was used to assess the accuracy and generalizability of the nomogram.
Results:
Sex, age, pathology subtype, T stage, N stage, and M stage were significant predictors of DSS on univariable analysis. The nomogram created using all these variables showed an extremely high concordance index (0.963; SE 0.012). This nomogram was validated on the external patient cohort with a high concordance index (0.810; SE: 0.070).
Conclusions:
We describe a predictive nomogram that accurately predicts DSS in individual patients with WDTC. The external validation illustrates its generalizability. This nomogram will help in counseling individual patients on prognosis and may identify patients who could benefit from more aggressive therapy.
Introduction
In general, well-differentiated thyroid cancer (WDTC) has a good prognosis. The main factors predictive of outcome include sex, age, advanced T stage, advanced N stage, and distant metastases at presentation. 1–6 The American Joint Committee on Cancer (AJCC) staging traditionally uses the T stage, N stage, and M stage to categorize the survival of patients into four groups (stages I, II, III, and IV), with stage IV patients having the worst survival outcome. 2 This classification helps healthcare professionals communicate about cancer cases, make treatment decisions, predict outcomes, and compare the efficacy of different treatments. Nonetheless, although thyroid cancer is the only cancer for which age is also incorporated into the staging system, the system does not incorporate other tumor and host risk factors that could affect the biological behavior of the cancer. In addition, it predicts outcomes for a population of patients and does not predict the survival of individual patients.
Using regression analysis, it is now possible to create nomograms. 7 These statistical models expand beyond standard TNM anatomical criteria by considering other clinical and pathological variables in a systematic, unbiased manner. Nomograms have been proposed to more accurately predict outcomes in patients with multiple cancers, including WDTC. 1,8–12 They provide a more comprehensive method to assess an individual patient’s prognosis using multiple significant variables, each of which contributes to the prediction. 7,11,13,14 This allows an outcome prediction to be obtained for an individual patient. Patients with a worse prognosis could therefore be recognized more precisely, which would help in selecting appropriate treatment, follow-up, and counseling.
The aim of this study was to create a prognostic clinical nomogram using both tumor and host factors to better predict outcomes in an individual patient with WDTC.
Methods
Study population and variables of interest
After institutional review board approval (IRB 21-252), we performed a retrospective cohort study. The need for patient consent was waived. The main cohort was obtained from a database of 8556 patients with WDTC who received their primary surgery at Memorial Sloan Kettering Cancer Center (MSK) between 1986 and 2020. The tumor variables analyzed were pathology subtype (papillary vs. follicular vs. oncocytic) and TNM staging using the current AJCC 8th Edition. 2 The host variables analyzed were age and sex. A total of 21 patients were excluded due to missing data on TNM staging. The final cohort consisted of 8535 patients (see Supplementary Fig. S1).
An external validation dataset was created by combining patients from three different international centers: Instituto do Cancer do Estado de Sao Paulo (ICESP) in Brazil, and Astana Medical University and the Kazakh Institute of Oncology and Radiology (KazIOR), both in Kazakhstan. All centers had IRB approval, and the need for patient consent was waived. ICESP included patients with WDTC who received their initial treatment at that institution between January 2009 and December 2015. Astana Medical University and KazIOR selected patients from the national cancer registry with WDTC who received surgery at one of these institutions between 2013 and 2019. Patients with available data on all variables of interest were included. In total, 172 patients from ICESP, 262 patients from Astana Medical University, and 85 patients from KazIOR were included in the final external validation cohort (n = 519).
Statistical analysis
We created a nomogram with disease-specific survival (DSS) as the endpoint. DSS was calculated from the date of surgery to the last known follow-up date or the date of death from thyroid cancer. Baseline covariates, including age, sex, pathology subtype, and TNM staging, were evaluated for inclusion in the nomogram based on a multivariable model. Age was used as a continuous variable and modeled using restricted cubic splines. The internal validation of the nomogram was evaluated through two main methods. First, a calibration curve was constructed to visualize how well the predicted probabilities matched the observed probabilities of the outcome. Second, the discrimination of the nomogram was measured using the concordance index (C-index). 15 Discrimination is a model’s ability to distinguish between patients with different outcomes. The C-index ranges from 0.5 to 1.0, with 0.5 signifying random chance and 1.0 signifying perfect discrimination. In addition, for the main cohort, nomograms to predict regional and distant recurrence-free survival were created. For these two nomograms, patients with distant metastasis at diagnosis were excluded.
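To make the C-index concrete, the following is a minimal sketch of Harrell's concordance index for right-censored data in plain Python. It is not the code used in this study (the analyses were run in R and Stata). The function name `harrell_c_index` and the six-patient toy dataset are hypothetical, and pairs with tied follow-up times are simply skipped for brevity.

```python
from itertools import combinations


def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored survival data.

    times       -- follow-up time for each patient
    events      -- 1 if death from disease was observed, 0 if censored
    risk_scores -- model output; a higher score means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that patient `a` has the shorter follow-up.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # A pair is comparable only when the shorter time is an observed
        # event. Tied times are skipped in this simplified sketch.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0   # earlier death had the higher predicted risk
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5   # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data (months of follow-up, event flag, predicted risk).
toy_times = [12, 30, 45, 60, 84, 120]
toy_events = [1, 1, 0, 1, 0, 0]
toy_risk = [2.9, 2.1, 0.4, 1.1, 1.5, 0.2]

print(round(harrell_c_index(toy_times, toy_events, toy_risk), 3))  # 0.909
```

In this toy example, 10 of the 11 comparable pairs are ordered correctly. A model whose scores carry no information would score about 0.5.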
Patient, tumor, and treatment characteristics were compared between cohorts using the chi-square test for categorical variables. Survival curves were generated using the Kaplan–Meier method, and differences in survival were assessed with the log-rank test. Hazard ratios were computed using Cox’s proportional hazards regression model, which was employed for both univariable and multivariable analyses. A p-value of <0.05 was considered statistically significant. Statistical software R (version 3.6.2; R Foundation for Statistical Computing) and Stata v16 were used for analyses.
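As an illustration of the Kaplan–Meier method mentioned above, the sketch below computes the product-limit survival estimate in plain Python. It is a simplified teaching example rather than the study's analysis code. The function name `kaplan_meier` and the eight-patient dataset are hypothetical.

```python
def kaplan_meier(times, events):
    """Return the Kaplan-Meier curve as a list of (time, survival) pairs.

    times  -- follow-up time for each patient
    events -- 1 if the event (death from disease) was observed, 0 if censored
    """
    # The estimate only changes at times where at least one event occurred.
    event_times = sorted({t for t, e in zip(times, events) if e})
    survival = 1.0
    curve = []
    for t in event_times:
        # Patients still under observation just before time t.
        at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, e in zip(times, events) if u == t and e)
        # Multiply by the conditional probability of surviving past t.
        survival *= 1.0 - deaths / at_risk
        curve.append((t, survival))
    return curve


# Hypothetical toy cohort: follow-up in months and event indicator.
km_times = [6, 12, 12, 20, 30, 44, 60, 60]
km_events = [1, 0, 1, 0, 1, 0, 0, 0]

for month, surv in kaplan_meier(km_times, km_events):
    print(month, round(surv, 4))
```

Censored patients leave the risk set without lowering the curve, which is why the estimate remains usable when, as in this study, the large majority of patients are alive at last follow-up.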
Results
Characteristics of the cohorts
The characteristics of the main cohort (MSK) and the external validation cohort (External) are shown in Table 1. In the main cohort, most patients (68%) were younger than 55 years old, and 71% were female. The majority of cases were papillary carcinoma (95%) and had early-stage disease (89% Tx/T1/T2, 67% N0 and 99% M0). Additional clinicopathological variables for the main cohort can be found in Supplementary Table S1. The median follow-up time was 54 months (IQR 31–92). The number of events was 76 (<1%). The median survival time was 48 months (IQR 22–77).
Table 1. Patient and Tumor Characteristics for MSK and External Cohorts
Percents may not add up to 100 due to rounding.
Pearson’s Chi-squared test.
MSK, Memorial Sloan Kettering Cancer Center.
The external validation cohort showed a similar percentage of patients younger than 55 years old (60%), but a higher percentage of female patients (88%). The majority of patients had papillary carcinoma (96%). With regard to stage, the percentage of Tx/T1/T2 stage was lower (61%) and the percentage of N0 stage was higher (73%), with most of the cohort being M0 (97%). The median follow-up time was 42 months (IQR 12–78). The number of events was 19 (<5%). The median survival time was 52 months (IQR 38–78).
Prognostic factors in WDTC
We studied the influence of variables known to be associated with survival in thyroid cancer. These included age, sex, pathology subtype, pathological T stage, pathological N stage, and M stage. The Kaplan–Meier survival plots of the different variables are shown in Figure 1. We observed a poorer survival in male patients, older patients, nonpapillary pathologies, and advanced TNM stages. The same analyses for the external cohort are shown in Figure 2, showing similar results. Due to the small cohort size, significant differences were only achieved for pathological T stage, pathological N stage, and M stage in the external cohort.

Figure 1. Kaplan–Meier survival curves for prognostic factors (MSK cohort).

Figure 2. Kaplan–Meier survival curves for prognostic factors (external cohort).
Prognostic impact of tumor and host variables on outcomes using univariable and multivariable analyses
We carried out univariable and multivariable analyses in the MSK cohort with DSS as the dependent variable and the previously tested and discussed factors as independent variables (Table 2). All variables tested in the univariable analysis showed statistically significant differences and were included in the multivariable analysis. All variables, except sex, maintained significance in the multivariable analysis. Similar results were found when we carried out the univariable and multivariable analysis in the external validation dataset (Table 3), although only TNM staging showed significant differences in the univariable analysis. Nonetheless, all variables were included in the multivariable analysis to be consistent with that performed in the main cohort. In this case, only T4 and M1 categories maintained significance in the multivariable analysis.
Table 2. Factors Predictive of Disease-Specific Survival on Cox Proportional Hazards Model Among MSK Cohort
a: DSS, disease-specific survival; HR, hazard ratio; CI, confidence interval.
Table 3. Factors Predictive of Disease-Specific Survival on Cox Proportional Hazards Model Among External Cohort
a: DSS, disease-specific survival; HR, hazard ratio; CI, confidence interval.
b: NE, no events.
Creation of a nomogram to predict disease-specific survival
Given the large size and detailed long-term follow-up of the MSK cohort, we created a nomogram predictive of 10-year and 20-year DSS using age, sex, pathology subtype, pathological T stage, pathological N stage, and M stage (Fig. 3A). The c-index obtained was 0.963 (SE = 0.012). A graphic example showing how the nomogram works and how to calculate the individualized risk prediction is shown in Figure 3B.
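The way a nomogram turns patient characteristics into points and then into a survival probability can be sketched as follows. This assumes the nomogram is derived from a Cox proportional hazards model, which is a common construction but is an assumption here. Every coefficient and the baseline survival value below are hypothetical placeholders, not the fitted values from this study, and age (modeled with restricted cubic splines in the study) is left out to keep the example short.

```python
import math

# HYPOTHETICAL log-hazard coefficients, chosen only for illustration.
# They are NOT the estimates reported in this study.
COEFS = {
    "male": 0.3,
    "follicular": 0.5,
    "t3_or_t4": 1.2,
    "n1": 0.6,
    "m1": 2.5,
}

# HYPOTHETICAL 10-year DSS for a patient with every indicator equal to 0.
BASELINE_SURVIVAL_10Y = 0.995


def linear_predictor(patient):
    """Sum of coefficient * indicator over all variables."""
    return sum(COEFS[name] * patient.get(name, 0) for name in COEFS)


def nomogram_points(patient):
    """Rescale each contribution so the largest coefficient equals 100 points."""
    scale = 100.0 / max(COEFS.values())
    return {name: COEFS[name] * patient.get(name, 0) * scale for name in COEFS}


def predicted_survival(patient, baseline=BASELINE_SURVIVAL_10Y):
    """Cox model relationship: S(t | x) = S0(t) ** exp(linear predictor)."""
    return baseline ** math.exp(linear_predictor(patient))


# Hypothetical patient: male, papillary, T3/T4, N1, M0.
example = {"male": 1, "t3_or_t4": 1, "n1": 1}
print(sum(nomogram_points(example).values()))
print(predicted_survival(example))
```

Reading a printed nomogram is the graphical version of the same steps: look up the points for each variable, add them, and read the survival probability off the total-points axis.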

The calibration plot and receiver operating characteristic (ROC) curve are shown in Figure 4, demonstrating a good fit and area under the curve. When we tested the nomogram on the external validation cohort, a c-index of 0.810 was obtained (SE = 0.070).
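To give a feel for the precision of the two reported c-indices, the sketch below converts each estimate and its standard error into an approximate 95% interval. It assumes a normal (Wald) approximation, which is rough for values close to 1. These intervals are not reported in the study, and the helper name `wald_interval` is hypothetical.

```python
def wald_interval(estimate, standard_error, z=1.96):
    """Approximate 95% interval: estimate +/- z * SE, clipped to [0, 1]."""
    lower = max(0.0, estimate - z * standard_error)
    upper = min(1.0, estimate + z * standard_error)
    return lower, upper


# Values taken from the text: c-index and SE for each cohort.
internal = wald_interval(0.963, 0.012)   # MSK development cohort
external = wald_interval(0.810, 0.070)   # external validation cohort

print("internal:", tuple(round(x, 3) for x in internal))
print("external:", tuple(round(x, 3) for x in external))
```

The wider interval for the external cohort reflects its larger standard error, which is consistent with its smaller size (519 patients, 19 events).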

Creation of nomograms to predict regional recurrence-free survival and distant recurrence-free survival
Utilizing the large MSK cohort of patients (only including patients without distant metastasis at presentation; n = 8,433), we performed univariable and multivariable analyses both for regional and distant recurrence, which are shown in Supplementary Tables S2 and S3, respectively. The number of events for regional recurrence was 297 (3.5%), and for distant recurrence was 89 (1.1%).
Next, we created nomograms that predict regional recurrence-free survival (c-index = 0.786, SE = 0.013) and distant recurrence-free survival (c-index = 0.914, SE = 0.015). The nomograms for regional and distant recurrence-free survival are shown in Supplementary Figures S2 and S3, respectively. The corresponding calibration plots are shown in Supplementary Figure S4.
Discussion
The current method for staging patients with WDTC uses the AJCC staging system. However, this system does have many limitations. Patients with the same stage of cancer can have markedly different prognoses based on factors not captured by the TNM system. The TNM system primarily considers the size and extent of the tumor, lymph node involvement, and distant metastasis. However, it does not take into account pathology factors, which can significantly impact prognosis and treatment response. The system also does not consider patient-specific factors, which are crucial for individualized treatment planning. In addition, the system is less effective for staging rare cancers, where the understanding of disease progression and treatment response is limited compared to more common cancers.
Nomograms are a graphical way to assess patients, and by using statistical modeling and quantification of risk of multiple clinical and pathological variables, they can better predict survival outcomes for an individual patient. Nomograms and their associated online clinical outcome calculators are now widely used in multiple different cancer types, including melanoma, breast, and prostate cancer. 7,11,13,14 Their systematic approach also avoids bias from an individual physician or a single aberrant clinical variable. Well-designed nomograms have outperformed the projections of experienced clinicians, 16,17 and have been incorporated into clinical trial inclusion criteria and National Comprehensive Cancer Network (NCCN) guidelines. 18 In the literature, there are only a few studies that report nomograms in WDTC. Some predict the risk of metastasis or extrathyroidal extension, 19–22 and only a few predict outcomes. 1,23–25 The published nomograms that predict outcomes for WDTC are mainly based on data from The Cancer Genome Atlas (TCGA) or Surveillance, Epidemiology, and End Results (SEER) databases, and they include molecular variables that are not routinely available, so their results are not as generalizable to a wide population. In addition, they lack validation on external cohorts of patients.
Our group has previously reported a prognostic nomogram using clinical and pathological variables. 1 This nomogram used AJCC TNM 7th edition staging variables as well as age and sex. In our new study, we aimed to develop a nomogram to predict DSS utilizing AJCC 8th edition TNM variables and an updated cohort of over 8500 patients. In addition, this nomogram was validated externally with three different centers from distinct geographical areas.
We started by analyzing the different prognostic factors individually and then together in a multivariable analysis. By doing this, we confirmed the prognostic capacity of age, sex, histological subtype, and TNM staging. All of these except sex maintained significance in the multivariable analysis of the main cohort, indicating that they are independent prognostic factors. Some variables did not reach statistical significance in the univariable or multivariable analyses of the external validation cohort, most likely due to the limited number of events in that population.
Several authors have studied the prognostic capacity of age in WDTC, 1,5,26,27 and most classifications use age as a categorical variable. In our study, we analyzed the prognostic capacity of age as a categorical variable in the univariable and multivariable analyses and the corresponding Kaplan–Meier curve. However, we have previously reported that DSS in thyroid cancer progressively declines with advancing age, indicating that age should be used as a continuous prognostic variable. 1 Hence, in the nomogram, we used age as a continuous variable to capture the full spectrum of its prognostic value.
To analyze the accuracy of the nomogram, we plotted a calibration curve and an ROC curve. The calibration plot shows good consistency between the predicted and the observed survival, and the ROC curve has a good area under the curve. Another way to analyze accuracy is with the c-index. With the nomogram proposed in this study, we achieved an extremely high c-index of 0.963, which nearly reaches the optimal prediction (score of 1). By updating the TNM staging from the 7th to the 8th AJCC edition, we improved the prediction capacity from a c-index of 0.958 to a c-index of 0.963.
Using an external validation cohort from three international institutions, we also obtained a high c-index (c-index = 0.810; SE = 0.070). This allows us to conclude that our nomogram is highly accurate and can be generalized to patients with thyroid cancer in other countries around the world. Moreover, as previously discussed, all the variables are usually available and reported, which makes the nomogram even more generalizable. Therefore, we would advocate using this nomogram routinely, for all patients and in all settings.
The nomogram is of great practical utility as it is able to predict survival outcomes following treatment for individual patients (individualized risk prediction), which allows for better counseling. We have to highlight that all patients used to create these nomograms were treated according to ATA guidelines. This includes the use of postoperative adjuvant radioactive iodine (RAI). The nomogram is therefore not intended to guide the decision on whether or not RAI should be given, but to be useful in two different clinical scenarios. First, it can identify patients at low risk of death and low risk of recurrence (after being treated according to ATA guidelines with surgery with or without RAI). Such patients will need less postoperative follow-up and may be discharged earlier, thereby reducing health costs. Second, it can identify patients at the other end of the spectrum, such as patients at higher risk of death due to the risk of developing distant metastases (after being treated with surgery and RAI). If we can identify such patients, clinical trials using other types of adjuvant therapy, such as RTK inhibitors or immunotherapy, may be designed.
Our study does have inherent limitations due to its retrospective nature. There is the potential for selection bias associated with treatment of thyroid cancer due to physician as well as patient treatment preferences. However, we overcome this bias by having a very large patient cohort of more than 8500 patients, all of whom were treated in a similar fashion by experienced surgeons and endocrinologists with a unified treatment protocol following ATA guidelines. The inclusion of other pathological variables would likely improve the prediction of the model. However, due to the small number of events (death due to thyroid cancer), this was not possible. Future studies utilizing multicenter cohorts of patients with detailed pathological variables may allow for the inclusion of these other variables. Despite these limitations, we have created a predictive nomogram with a very high concordance index, validated on three external cohorts of patients from different countries. This nomogram is the most comprehensive reported so far in the literature. Most importantly, we were able to validate our results with an external cohort from different parts of the world, with different patient characteristics and different health systems, illustrating how generalizable the nomogram is.
Conclusion
Both tumor and host factors should be taken into account when assessing outcomes in cancer patients. We report an externally validated clinical nomogram that accurately predicted outcomes in individual patients with WDTC.
Footnotes
Authors’ Contributions
C.V.: Conception and design, collection and assembly of data, data analysis and interpretation, writing, editing, review; A.E.: Conception and design, assembly of data, data analysis and interpretation, editing, review; D.A.: Collection of data, data analysis and interpretation, editing, review; D.M.: Collection of data, data analysis and interpretation, editing, review; V.H.: Collection of data, review; A.R.S.: Supervision, review; J.P.S.: Supervision, review; R.M.T.: Supervision, review; D.A.: Collection of data, editing, review; R.A.P.: Collection of data, editing, review; L.L.M.: Collection of data, editing, review; L.P.K.: Supervision, review; G.A.: Collection of data, editing, review; R.K.: Collection of data, editing, review; S.G.P.: Conception and design, supervision, review; I.G.: Conception and design, supervision, writing, editing, review.
Author Disclosure Statement
The authors declare no conflicts of interest pertinent to this work. C.V. is listed as an inventor on IP owned by MSK on predictive models for immunotherapy response, unrelated to this study. S.G.P. has a pending patent (PCT/US2016/026717, Methods of Cancer Detection Using PARPI-FL); holds equity in Summit Biomedical Imaging; has a patent (US 10,016,238 B2, Apparatus, system, and method for providing laser steering and focusing for incision, excision, and ablation of tissue in minimally invasive surgery); holds equity in ColdSteel Laser Inc.; and has patents PCT/US2014/073053 (Systems, methods, and apparatus for multichannel imaging of fluorescent sources in real time), PCT/US2015/065816 (Cyclic peptides with enhanced nerve-binding selectivity, nanoparticles bound with said cyclic peptides, and use of same for real-time in vivo nerve tissue imaging), and PCT/US2016/066969 (Imaging systems and methods for tissue differentiation, e.g., for intraoperative visualization).
Funding Information
This study was partly funded by the NIH/NCI Cancer Center Support Grant P30 CA008748.
Supplementary Material
Supplementary Figure S1
Supplementary Figure S2
Supplementary Figure S3
Supplementary Figure S4
Supplementary Table S1
Supplementary Table S2
Supplementary Table S3
