Abstract
Background:
This study compared staging systems for predicting long-term disease status in patients with well-differentiated thyroid carcinoma (WDTC) and sought to identify the earliest postoperative predictor of long-term persistence/recurrence of disease.
Methods:
Patients with WDTC (n = 356; mean age = 41.5 ± 12.7 years) followed for at least five years (12.3 ± 5.0 years) after thyroidectomy and 131I remnant ablation at a tertiary regional hospital in Taiwan were retrospectively studied. Each patient was risk stratified using the American Joint Committee on Cancer (AJCC; stage I–IV) and American Thyroid Association (ATA; low, intermediate, and high risk) staging systems after operation and the first 131I remnant ablation, and using the response to initial therapy reclassification (RTR; excellent, indeterminate, biochemical incomplete, and structural incomplete response) system, which is determined 6–24 months after the first 131I ablation. The clinical outcome was defined as no evidence of disease (NED; suppressed thyroglobulin [Tg] <0.5 ng/mL, stimulated Tg <1 ng/mL, and no structural detectable disease), biochemical persistent disease (BPD; suppressed Tg ≥0.5 ng/mL or stimulated Tg ≥1 ng/mL in the absence of structural disease), structural persistent disease (SPD; locoregional or distant metastases with any Tg level), or recurrent disease (RD; biochemical or structural disease identified after a period of NED).
Results:
At the time of final follow-up, 78.4% (n = 279) of the patients had NED, 9.3% (n = 33) had BPD, 10.1% (n = 36) had SPD, and 2.2% (n = 8) developed RD. All three systems could predict the increasing trend of SPD and the decreasing trend of NED with advancing stage of disease. However, the ATA risk estimates could be significantly refined by the RTR system, especially for the ATA high-risk group, in which 29.2% developed SPD/RD during follow-up. The RTR system reduced the likelihood of finding SPD/RD to 3.7% in those demonstrating an excellent response to therapy, and increased the likelihood to 78.6% in those demonstrating a structural incomplete response. Among the earliest postoperative factors, only the Tg level at the first 131I ablation could predict long-term persistence/recurrence.
Conclusions:
The results strongly support incorporating the RTR system to modify the initial risk estimate during follow-up among Chinese patients with WDTC.
Introduction
Optimal management of thyroid cancer requires an individualized risk assessment in which the intensity of therapy and follow-up is tailored to patients with different risks of recurrence or persistent disease. Many physicians have traditionally relied on clinicopathologic anatomic staging systems such as the tumor-node-metastasis (TNM) staging system of the American Joint Committee on Cancer (AJCC) to classify patients into groups with differing risks of disease-specific survival (11). While the published guidelines of the American Thyroid Association (ATA) (12) endorse the use of the TNM staging system for predicting disease-specific mortality, they also outline an additional staging system designed to better predict the risk of persistent/recurrent disease. Moreover, a response to initial therapy reclassification (RTR) system that can change over time to modify initial risk estimates was recommended by the ATA in 2015 (13) because initial staging systems were not capable of using new data obtained during the course of follow-up to modify the initial risk estimate.
These risk-stratification systems have been validated in cohorts of differentiated thyroid cancer patients in Argentina (14), Brazil (15), Italy (8), and the United States (7), confirming their clinical applicability across a wide spectrum of patients and healthcare systems. However, there are few studies using these systems in Asian populations. Thus, the purpose of this study was to compare the AJCC, ATA, and RTR staging systems for the prediction of long-term disease status in Chinese patients with WDTC who received 131I ablation therapy after total thyroidectomy. Further, by examining the dynamic course of the patients during follow-up, the earliest postoperative risk factors that have long-term predictive value were investigated.
Materials and Methods
Subjects
Four hundred and eighty-two consecutive patients with WDTC, who were treated between October 1987 and December 2009 by a total or near-total thyroidectomy followed by 131I ablation therapy, were studied. As this was a long-term outcome study, only patients who had been followed for at least five years after ablation therapy were eligible. In this series, 102 patients were excluded from the study because the follow-up was less than five years, and 24 patients were excluded because of the presence of positive antithyroglobulin (anti-Tg) antibodies. Finally, 356 patients were included in the study.
This study was conducted according to the guidelines of the Declaration of Helsinki, and the research protocol was approved by the Ethics Committee of Chang Gung Memorial Hospital.
Treatment protocol
All patients had undergone total thyroidectomy. If cervical lymph node (LN) involvement was suspected clinically prior to surgery or based on an abnormal aspect at the time of surgery, they underwent simultaneous surgical dissection of central compartment (level VI) neck LNs, or combined ipsilateral jugulocarotid, supraclavicular, and supraomohyoid compartment (levels III and IV) dissection. A first dose of 131I after total thyroidectomy was administered for remnant ablation, treatment of known persistent disease in the neck, or metastatic sites with activities of 30–200 mCi (M = 93 mCi). At that time, the thyrotropin (TSH) level was >30 mIU/L, and serum Tg was measured (ablation-Tg).
Follow-up
Patients were usually followed every three months during the first year and at three- to six-month intervals thereafter based on the risk of the individual patient and the clinical course of the disease. During follow-up, all patients were treated with L-thyroxine (LT4) for TSH suppression, and serial Tg and TgAb were measured every three to six months. Serial physical examinations accompanied by neck ultrasound (US), 131I whole-body scans (WBS), radiography, and serum Tg assays were performed to detect possible persistent, recurrent, or metastatic lesions. Most patients (n = 261) included in the study had undergone diagnostic 131I scans during the 6–24 months after initial therapy (total thyroidectomy and radioiodine remnant ablation) with concurrent measurement of stimulated serum Tg and anti-Tg levels.
Risk stratification
Postsurgical risk assessment
All patients were staged in accordance with the TNM staging system of the AJCC 7th edition (11). After initial surgery and the first ablation 131I therapy, patients were stratified into three groups in terms of risk of recurrence (low, intermediate, and high risk) according to the 2009 revised ATA guidelines (12). Low-risk patients met the following criteria: no local or distant metastases, all macroscopic tumor resected, no tumor invasion of locoregional tissues or vascular invasion, no aggressive tumor histology (e.g., tall cell, insular, columnar cell carcinoma), and, if 131I was given, no 131I uptake outside the thyroid bed on the post-ablation WBS. The intermediate-risk patients had a tumor with aggressive histology or vascular invasion and either microscopic invasion of the tumor into the perithyroidal soft tissues at initial operation, cervical LN metastases, or 131I uptake outside the thyroid bed on the WBS performed after 131I remnant ablation. High-risk patients met the following criteria: macroscopic tumor invasion, incomplete tumor resection, distant metastases, and thyroglobulinemia disproportionate to what was seen on the post-treatment WBS. In this study, the cutoff value was set at 10 ng/mL.
Delayed risk stratification
The response to initial therapy was based on physical examination, TSH-stimulated serum Tg measurement with or without diagnostic WBS, suppressed Tg measurement (when stimulated Tg measurement was unavailable), and neck US and/or cross-sectional imaging at 6–24 months of follow-up. Patients with a suppressed serum Tg <0.5 ng/mL or a stimulated Tg <1 ng/mL and negative imaging were defined as having an excellent response. Patients were considered to have a biochemical incomplete response if they had negative imaging studies with a suppressed Tg ≥1 ng/mL or a stimulated Tg ≥10 ng/mL. Patients with any evidence of disease on structural or functional evaluation, regardless of the stimulated or suppressed serum Tg level, were defined as having a structural incomplete response. An indeterminate response was defined as nonspecific findings on imaging studies, faint uptake in the thyroid bed on radioactive iodine scanning, a suppressed Tg that was detectable but <1 ng/mL, or a stimulated Tg that was ≥1 and <10 ng/mL.
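The category definitions above can be sketched as a small decision function. This is a minimal illustration, not the authors' procedure: the function name and arguments are hypothetical, "detectable" suppressed Tg is assumed to mean ≥0.5 ng/mL (the stated functional sensitivity), and stimulated Tg is assumed to take precedence whenever it is available.

```python
from typing import Optional


def classify_rtr(suppressed_tg: Optional[float],
                 stimulated_tg: Optional[float],
                 structural_disease: bool,
                 nonspecific_imaging: bool = False) -> str:
    """Return an RTR category from Tg values (ng/mL) and imaging findings.

    Hypothetical helper; thresholds follow the text, precedence rules are assumed.
    """
    # Any structural or functional evidence of disease overrides Tg values.
    if structural_disease:
        return "structural incomplete"

    # Assumption: use stimulated Tg when available, otherwise suppressed Tg.
    if stimulated_tg is not None:
        if stimulated_tg >= 10:
            return "biochemical incomplete"
        if stimulated_tg >= 1 or nonspecific_imaging:
            return "indeterminate"
        return "excellent"

    if suppressed_tg is None:
        raise ValueError("at least one Tg value is required")
    if suppressed_tg >= 1:
        return "biochemical incomplete"
    # Assumption: 'detectable' means at or above 0.5 ng/mL.
    if suppressed_tg >= 0.5 or nonspecific_imaging:
        return "indeterminate"
    return "excellent"
```

For example, under these assumptions a patient with negative imaging and a stimulated Tg of 5 ng/mL would be labeled indeterminate, while the same patient with a stimulated Tg of 12 ng/mL would be labeled biochemical incomplete.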
Clinical endpoints definition
The clinical outcome at last follow-up was defined as no evidence of disease (NED; suppressed Tg <0.5 ng/mL, stimulated Tg <1 ng/mL, and no structural detectable disease), biochemical persistent disease (BPD; suppressed Tg ≥0.5 ng/mL or stimulated Tg ≥1 ng/mL in the absence of structural disease), structural persistent disease (SPD; locoregional or distant metastases with any Tg level), or recurrent disease (RD; biochemical or structural disease identified after a period of NED).
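As an illustration of how these endpoint definitions partition patients, the following sketch assigns one of the four outcomes. The function and its arguments are hypothetical, and a missing Tg value is assumed (for illustration only) to be non-elevated.

```python
from typing import Optional


def classify_outcome(suppressed_tg: Optional[float],
                     stimulated_tg: Optional[float],
                     structural_disease: bool,
                     prior_ned: bool) -> str:
    """Return NED, BPD, SPD, or RD using the thresholds in the text (ng/mL)."""
    # Biochemical disease: suppressed Tg >= 0.5 or stimulated Tg >= 1.
    # Assumption: a missing measurement is treated as not elevated.
    biochemical = (
        (suppressed_tg is not None and suppressed_tg >= 0.5)
        or (stimulated_tg is not None and stimulated_tg >= 1)
    )
    if not structural_disease and not biochemical:
        return "NED"
    # Any disease found after a documented period of NED is recurrence.
    if prior_ned:
        return "RD"
    # Structural disease counts as SPD at any Tg level.
    if structural_disease:
        return "SPD"
    return "BPD"
```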
Assays
Serum Tg levels were measured by immunoradiometric assay (Tg IRMA; CIS-Bio International, Bagnols-sur-Cèze, France). The inter-assay coefficients of variation were 13.8%, 7.6%, and 4.3% at low, median, and high concentrations, respectively. The intra-assay coefficients of variation were 7.2%, 2.6%, and 1.8% at low, median, and high concentrations, respectively. The detection limits of analytical and functional sensitivity were 0.2 and 0.5 ng/mL, respectively. Tg autoantibodies (TgAb) were measured by a competitive radioimmunoassay (TgAb RIA CT; Biocode, Liège, Belgium); the sensitivity was 15 IU/mL. Patients with positive TgAb were excluded from analysis.
Statistics
The clinical and laboratory data, which are reported as the mean ± standard deviation (SD) values or as absolute numbers and percentages (categorical variables), were compared using an unpaired Student's t-test or chi-square test, as appropriate. Clinical parameters significant at the 0.05 level on univariate analysis were entered into a multivariate logistic regression analysis in which the odds ratio (OR) and the confidence interval (CI) were determined to evaluate their contributions to SPD/RD. The Akaike information criterion (AIC) within a Cox proportional hazard regression model was used to demonstrate the discriminatory ability of each risk-classification system. A smaller AIC value indicates a more desirable model for predicting outcome. All analyses were performed using IBM SPSS Statistics for Windows v21 (IBM Corp., Armonk, NY), and a p-value of <0.05 was considered to be statistically significant.
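For readers unfamiliar with the AIC comparison, a minimal sketch follows. It uses the standard definition AIC = 2k − 2 ln L, where k is the number of estimated parameters and L is the maximized likelihood (the partial likelihood in a Cox model). The model names and log-likelihood values below are placeholders chosen for illustration, not results from this study, and the sketch does not reproduce the SPSS procedure.

```python
from typing import Dict, List, Tuple


def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike information criterion: 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_likelihood


def rank_by_aic(fits: Dict[str, Tuple[float, int]]) -> List[str]:
    """Order model names from smallest (preferred) to largest AIC."""
    return sorted(fits, key=lambda name: aic(*fits[name]))


# Placeholder fits: (maximized log partial likelihood, number of parameters).
# A staging system with m categories contributes m - 1 indicator parameters.
example_fits = {
    "model_a": (-300.0, 3),
    "model_b": (-295.0, 2),
    "model_c": (-280.0, 3),
}
ranking = rank_by_aic(example_fits)
```

Because the penalty term 2k grows with the number of categories, a system with more levels is preferred only if it improves the likelihood enough to offset the extra parameters.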
Results
Patient characteristics
Three hundred and fifty-six patients were finally included. Patient and tumor characteristics are presented in Table 1. The study population was largely composed of patients with papillary thyroid carcinoma (PTC; 93%; 331/356). The ratio of women to men was approximately 4:1, and the mean age was 41.5 years (SD = 12.7 years). The mean ablative dose of radioactive iodine was 93.0 mCi (SD = 26.0 mCi). Furthermore, most patients had thyroid bed uptake alone on the post-ablation WBS (92.1%; 328/356).
Data are presented as n (%) or mean ± standard deviation.
WBS, whole-body scan; AJCC, American Joint Committee on Cancer; ATA, American Thyroid Association.
The tumor stages according to the TNM classification of the AJCC were stage I for 240 (67.4%) patients, stage II for 63 (17.7%) patients, stage III for 12 (3.4%) patients, and stage IV for 41 (11.5%) patients. The risk of recurrence according to the three-level stratification suggested by the ATA was low risk for 182 (51.2%) patients, intermediate risk for 61 (17.1%) patients, and high risk for 113 (31.7%) patients.
The mean primary tumor size was 2.7 ± 1.2 cm, with those with follicular thyroid carcinoma (FTC) having tumor sizes larger than those with PTC (3.5 ± 1.6 vs. 2.6 ± 1.2 cm; p < 0.001). Ninety-two (25.8%) patients (91 with PTC and one with FTC) had pathologically proven LN metastasis. Sixty-five (18.3%) patients had multifocal disease, and 45 (12.6%) patients had advanced disease with local invasion. The patients with PTC and FTC had similar ratios of multifocal disease (18.9% vs. 12.5%; p = 0.59) and local invasion (11.9% vs. 25.0%; p = 0.10). Cervical LN dissection was not systematically done but was performed in 112 (31.5%) patients on the basis of clinical LN involvement or an abnormal aspect at the time of surgery.
Comparison of AJCC, ATA, and RTR systems
First, the ability of the AJCC staging system to predict the clinically relevant endpoints of NED, BPD, SPD, and RD in the entire group of patients was evaluated (Table 2). The AJCC stage IV patients were less likely to be NED (53.6%; p < 0.001) and more likely to have SPD (29.3%) than stage I–III patients (p < 0.001). The likelihood of developing RD detected after a period of NED was higher in patients with AJCC stage III (8.3%) than it was in patients with stages I and II (p < 0.001).
NED, no evidence of disease; BPD, biochemical persistent disease; SPD, structural persistent disease; RD, recurrent disease.
When classified based on the ATA risk stratification system, 0.5% of the low-risk patients, 4.9% of the intermediate-risk patients, and 28.3% of the high-risk patients (p < 0.001) had SPD during follow-up (Table 2). In addition, 3.9% of low-risk, 8.2% of intermediate-risk, and 18.6% of high-risk patients had BPD. The high-risk patients were less likely to be NED (52.2%) and were more likely to have SPD (28.3%) than low- or intermediate-risk patients (p < 0.001).
Re-stratification based on the RTR system showed a gradation of risk across the categories with regard to the likelihood of being NED at final follow-up, the likelihood of having SPD or BPD at final follow-up, and the likelihood of having RD during follow-up. The highest and lowest likelihood of being NED at final follow-up was seen in patients with excellent response (92.3%) and structural incomplete response (13.3%), respectively. The likelihood of having SPD ranged from 0.5% in excellent responders to 80.0% in structural incomplete responders.
Persistent or recurrent disease developed in 8.4% of patients classified as AJCC stage I, 3.8% of ATA low-risk patients, and 3.4% of RTR excellent-response patients. As shown in Table 2, most patients in the lowest category of each system were classified as NED at the end of follow-up: 83.3% of AJCC stage I patients, 92.3% of ATA low-risk patients, and 92.3% of patients with an excellent response to initial therapy in the RTR system. The AIC value was lowest in the RTR system, followed by the ATA stratification system and then the AJCC staging system. This indicates that the RTR system had the best discriminatory ability in these patients.
The ATA risk estimates could be significantly refined by the RTR system. The development of SPD/RD in the ATA high-risk group was 29.2% during follow-up, while the RTR system reduced the likelihood of finding SPD/RD to 3.7% in those demonstrating an excellent response to therapy, and increased the likelihood to 78.6% in those demonstrating a structural incomplete response (Table 3). The development of NED was 52.2% in ATA high-risk patients, while the RTR system increased the likelihood of NED to 85.2% in those demonstrating an excellent response to therapy, and reduced the likelihood to 14.3% in those demonstrating a structural incomplete response (Table 4).
Predictive factors for SPD/RD
Furthermore, the baseline characteristics were analyzed to identify early predictors of SPD/RD (Table 5). In univariate analysis, the following parameters were significantly associated with SPD/RD: sex, the presence of LN metastasis or local invasion, dissection of neck LN, and first ablation stimulated Tg. When the data were re-examined by multivariate analysis, only a stimulated Tg >10 ng/mL at ablation (OR = 9.13 [CI 4.10–20.36]; p = 0.001) remained as an independent prognostic factor.
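The reported odds ratio and interval can be related to a standard error and test statistic on the log-odds scale. The sketch below assumes the interval is a 95% Wald interval, which the text does not state; the function name is hypothetical, and the calculation is an approximation rather than the authors' analysis.

```python
import math
from typing import Tuple


def wald_from_ci(odds_ratio: float, lower: float, upper: float,
                 z_crit: float = 1.959964) -> Tuple[float, float, float]:
    """Recover SE(log OR), the Wald z statistic, and a two-sided p-value.

    Assumes a symmetric interval on the log scale: log(OR) +/- z_crit * SE.
    """
    log_or = math.log(odds_ratio)
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = log_or / se
    # Two-sided p-value under the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return se, z, p


# Values reported for stimulated Tg >10 ng/mL at ablation.
se, z, p = wald_from_ci(9.13, 4.10, 20.36)
```

Under this assumption the standard error of the log odds ratio is about 0.41 and z is about 5.4, so the corresponding two-sided p-value falls well below 0.001. The geometric mean of the interval bounds, sqrt(4.10 × 20.36), is approximately 9.14, which agrees closely with the reported point estimate, as expected for an interval that is symmetric on the log scale.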
OR, odds ratio; CI, confidence interval.
Discussion
To the best of the authors' knowledge, this is the first report using these risk-stratification systems for analysis in an Asian population. The data demonstrate that all three stratification systems could effectively predict long-term outcomes in patients with WDTC treated with total thyroidectomy and radioactive iodine ablation over a 12.3-year follow-up period. There was a progressive decrease of NED from stage I to IV in the AJCC classification, from the low- to the high-risk group in the ATA stratification system, and also from the excellent response to the structural incomplete response category in the RTR system. Also, there was an increased proportion of SPD from lower to higher stages in each of the classification systems. Thus, these risk-stratification systems could be effectively applied for predicting long-term outcomes in patients with thyroid cancer in Chinese populations. However, the RTR system proved superior, especially for patients initially classified as ATA high risk.
In clinical practice, the RTR system is used to modify initial risk estimates provided by the ATA risk-stratification system. As can be seen in Tables 3 and 4, the initial risk estimates were significantly modified based on response to therapy assessments using data obtained during the 6–24 months of follow-up. The change in risk estimates was most apparent when an excellent response to therapy significantly decreased the risk of having SPD/RD and increased the likelihood of NED in patients initially classified as ATA intermediate and high risk. Also, a structural incomplete response increased the risk of SPD/RD to 78.6% and decreased the likelihood of NED to 14.3% in high-risk patients. These findings are similar to those of previous studies (7,15).
This concept of “ongoing risk stratification” has been validated by the ATA (13) as a new tool to assess the individual risk of recurrent disease in WDTC patients treated with total thyroidectomy and radioiodine remnant ablation. While the initial staging systems (AJCC system and ATA risk-stratification system) can be informative in guiding therapeutic and early diagnostic follow-up strategy decisions, the RTR system that incorporates individual response to therapy during follow-up can provide an individualized approach to real-time and ongoing management. The present results are in agreement with and reinforce these recommendations.
Furthermore, the data in the present series show that patients with indeterminate response tend to have a better long-term outcome than patients with a biochemical incomplete response (Table 2). This result is similar to those of previous studies (15–17). Pitoia et al. (16) showed the stimulated Tg value would decrease to <1 ng/mL in all patients whose initial Tg levels were between 1 and 2 ng/mL. Lamartina et al. (17) also reported that negativity of the stimulated Tg value (defined as ≤1 ng/mL) was significantly more common in the subgroup with indeterminate response than in that with incomplete biochemical response.
This study found that an elevated stimulated serum Tg level >10 ng/mL, measured at the time of first ablation therapy after total thyroidectomy, was associated with SPD/RD in both univariate (p < 0.001) and multivariate (p < 0.001) analyses. Ronga et al. (18) reported that Tg levels 40 days after total thyroidectomy provide a useful early diagnostic indication of metastatic disease, despite the presence of a postsurgical remnant. Kim et al. (19) reported that serum Tg levels measured at the time of immediate postoperative 131I remnant ablation correlated well with serum Tg levels at the time of the initial diagnostic WBS and had a complementary role for predicting persistence or recurrence of disease.
In conclusion, the present data support the concept that dynamic risk assessment based on response to therapy is an effective tool to personalize the treatment plan for patients with WDTC. Furthermore, the stimulated serum Tg at the first 131I ablation is a good early predictor of SPD/RD for long-term prognosis.
Footnotes
Author Disclosure Statement
The authors have nothing to disclose. There is no conflict of interest in this study. No competing financial interests exist.
