Abstract
Background:
The 2015 American Thyroid Association (ATA) Risk Stratification System for differentiated thyroid cancer (DTC) is designed to predict recurring/persisting disease but not survival. Earlier studies of this system assessed the 2009 edition, included few ATA high-risk patients, included few patients with follicular thyroid cancer (FTC), or did not distinguish between papillary thyroid cancer and FTC. Therefore, we evaluated the prognostic value of the 2015 ATA Risk Stratification System in a large population of high-risk thyroid cancer patients, which included a substantial proportion of FTC patients.
Methods:
We retrospectively studied adult patients with DTC who were diagnosed and/or treated at a Dutch university hospital between January 2002 and December 2015. All patients fulfilled the 2015 ATA high-risk criteria. Overall survival and disease-specific survival (DSS) were analyzed using the Kaplan–Meier method. Logistic regression and Cox proportional hazards models were used to estimate the effects of DTC subtype and ATA high-risk criteria on response to therapy, recurrence, as well as survival.
Results:
We included 236 patients with high-risk DTC (32% FTC) with a mean age of 56 years. Median follow-up was 6 years. At final follow-up, 69 patients (29%) had excellent response, while 120 (51%) had structural disease. All high-risk criteria, except large pathologic lymph nodes, were inversely related to excellent response and positively related to structural disease at final follow-up. During follow-up, 14% of the 79 patients who achieved excellent response developed a recurrence. Finally, 10-year DSS was much higher in the initial excellent response than in the initial structural disease group (100% vs. 61%, respectively).
Conclusions:
In a population of high-risk DTC patients harboring a large subset of FTC patients, the 2015 ATA Risk Stratification System is not only an excellent predictor of persisting disease but also of survival. As many as 14% of the high-risk patients who had an excellent response upon dynamic risk stratification experienced a recurrence during follow-up. Clinicians should thus be aware of the relatively high recurrence risk in these patients, even after an excellent response to therapy.
Introduction
The worldwide incidence of differentiated thyroid cancer (DTC) has been steadily increasing over the last two decades (1,2). This seems partly due to increased diagnosis of indolent tumors, and since mortality has remained stably low for DTC, a less aggressive therapeutic approach seems more appropriate (1–3). To tailor the intensity of therapy and follow-up strategies, different systems that predict the risk of recurrence and survival in patients with DTC have been proposed. While the American Joint Committee on Cancer (AJCC)/Tumor Node Metastasis (TNM) Staging System has been designed to predict disease-specific survival (DSS) (4–8), the American Thyroid Association (ATA) and European Thyroid Association (ETA) Risk Stratification Systems have been designed to estimate the risk of disease recurrence (3,9). The ATA Risk Stratification System is widely used, and several studies have shown its usefulness in predicting disease recurrence (10–15) and even DSS (11,16). However, these studies comprised relatively small proportions of ATA high-risk patients (14,15), evaluated the previous 2009 edition (10–13), only comprised patients with papillary thyroid cancer (PTC) (13–15), or had low numbers of patients with follicular thyroid cancer (FTC). Furthermore, these studies did not distinguish between PTC and FTC (10–16), although FTC can manifest very differently from PTC in several ways; that is, lymph node metastases are less common in FTC, and patients are usually older and more often have distant metastases at initial presentation (17). The aim of our study was therefore to evaluate the prognostic value of the 2015 ATA Risk Stratification System in a large population of ATA high-risk thyroid cancer patients and to compare PTC and FTC.
Materials and Methods
Study population and clinical outcomes
We retrospectively included all patients, aged 18 years or older, who were diagnosed and/or treated for either PTC or FTC (including Hürthle cell carcinoma [HCC]) at the Erasmus Medical Center, Rotterdam, The Netherlands, between January 2002 and December 2015. All patients fulfilled the 2015 ATA high-risk criteria (3), that is, macroscopic invasion of the tumor into the perithyroidal soft tissues (gross extrathyroidal extension [ETE]), incomplete tumor resection, distant metastases or postoperative serum thyroglobulin level (Tg) suggestive of distant metastatic disease, any metastatic lymph node larger than 3 cm, or FTC with extensive vascular invasion. In addition, all patients underwent thyroid surgery. From patient records, we obtained demographic, disease, treatment, response to therapy, recurrence, and mortality characteristics. Demographic variables included age at diagnosis, sex, and year of diagnosis. Disease characteristics included disease type, TNM stage (8th edition), presence/absence of multifocal disease, and minor/gross ETE. Data regarding treatment consisted of extent of surgery, use of radioactive iodine (RAI), and use of other treatment modalities (e.g., external beam radiotherapy).
Response to therapy was defined according to the four 2015 ATA response to therapy categories and was continually assessed during follow-up (i.e., dynamic risk stratification [DRS]) (3). Patients were considered to have an excellent response to therapy, that is, no evidence of disease (NED), if they had a suppressed Tg <0.2 ng/mL or thyrotropin-stimulated Tg <1 ng/mL, no detectable antibodies, and no evidence of structural disease on imaging. Patients were considered to have a biochemical incomplete response if they had a suppressed Tg ≥1 ng/mL or stimulated Tg ≥10 ng/mL or rising anti-Tg antibody levels, and no evidence of structural disease on imaging. Patients were considered to have a structural incomplete response if they had structural evidence of disease on imaging. Finally, patients were considered to have an indeterminate response if they had a nonstimulated Tg <1 ng/mL or a stimulated Tg <10 ng/mL, or declining or stable anti-Tg antibody levels. Persistent disease was defined as either a structural or a biochemical incomplete response. Response to therapy was first recorded 6 to 18 months after the first therapy, and thereafter during and at the end of follow-up. A recurrence was defined as new biochemical, structural, or functional disease after more than 12 months of NED. Time to last follow-up, survival status, and date and cause of death were recorded. Survival was defined as the time from initial diagnosis to last date of follow-up, death, or end of study (December 2017), whichever occurred first. Cause of death was obtained from hospital or general practitioner records. Patients with extensive or rapidly progressive thyroid cancer and no other clear cause of death were classified as having died from thyroid cancer. The study protocol was approved by the Institutional Review Board of the Erasmus Medical Center.
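The response categories defined above can be illustrated with a minimal Python sketch. This is not the procedure used in the study, where responses were assessed clinically from patient records. The `Assessment` fields, the function names, the order in which the rules are checked, and the handling of missing Tg values are all assumptions made for illustration; only the Tg cutoffs are taken from the definitions in the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Assessment:
    """One follow-up assessment. Field names are hypothetical."""
    suppressed_tg: Optional[float] = None   # ng/mL, measured on TSH suppression
    stimulated_tg: Optional[float] = None   # ng/mL, thyrotropin-stimulated
    antibodies_detectable: bool = False     # anti-Tg antibodies detectable
    antibodies_rising: bool = False         # anti-Tg antibody levels rising
    structural_disease: bool = False        # structural evidence on imaging


def classify_response(a: Assessment) -> str:
    """Assign one of the four response categories (assumed rule order)."""
    # Structural evidence of disease is checked first, whatever the Tg level.
    if a.structural_disease:
        return "structural incomplete"
    # Biochemical incomplete: Tg at or above the upper cutoffs, or rising antibodies.
    if ((a.suppressed_tg is not None and a.suppressed_tg >= 1.0)
            or (a.stimulated_tg is not None and a.stimulated_tg >= 10.0)
            or a.antibodies_rising):
        return "biochemical incomplete"
    # Excellent: Tg below the lower cutoffs and no detectable antibodies.
    tg_excellent = ((a.suppressed_tg is not None and a.suppressed_tg < 0.2)
                    or (a.stimulated_tg is not None and a.stimulated_tg < 1.0))
    if tg_excellent and not a.antibodies_detectable:
        return "excellent"
    # Everything else (including missing Tg values) is treated as indeterminate.
    return "indeterminate"


def is_persistent_disease(category: str) -> bool:
    """Persistent disease: structural or biochemical incomplete response."""
    return category in ("structural incomplete", "biochemical incomplete")
```

In this sketch, a patient with a suppressed Tg of 0.5 ng/mL and no structural disease falls into the indeterminate category, because the value lies between the excellent (<0.2 ng/mL) and biochemical incomplete (≥1 ng/mL) cutoffs.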
Statistical analysis
For continuous variables, means and standard deviations, or medians with interquartile ranges were calculated. For categorical variables, absolute numbers with percentages were recorded. Differences in characteristics between PTC and FTC were assessed using the Student's t-test or χ2-test. For DTC, overall survival (OS) and DSS were analyzed using the Kaplan–Meier method, and compared across response to therapy categories using the log-rank test. The same analyses were also performed for PTC and FTC separately. In addition, we compared FTC and HCC based on evidence indicating that HCC is not a subtype of FTC (18,19). Univariate and multivariate logistic regression or Cox proportional hazards models were used to examine the effect of DTC subtype and the different ATA high-risk criteria on response to therapy, developing NED, recurrence, or survival. Data on ATA high-risk criteria were missing in 3% of the values. Because this percentage was low, patients with a missing value were excluded from the corresponding analysis. p-Values below 0.05 were considered significant. All analyses were performed using SPSS Statistics for Windows (version 24.0).
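For readers unfamiliar with the Kaplan–Meier method, the product-limit estimate can be sketched in a few lines of Python. The analyses in this study were performed in SPSS; the function below is only an illustrative sketch, and the function name, the input format, and the example follow-up times are hypothetical. For a DSS analysis, one common convention (assumed here, not stated in the text) is to code deaths from thyroid cancer as events and all other patients, including those who died of other causes, as censored.

```python
from typing import List, Tuple


def kaplan_meier(times: List[float], events: List[bool]) -> List[Tuple[float, float]]:
    """Return (time, survival probability) at each distinct event time.

    times:  follow-up time per patient (e.g., months since diagnosis)
    events: True if the event (e.g., death) occurred at that time,
            False if the patient was censored at that time
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    data = sorted(zip(times, events))
    at_risk = len(data)      # patients still under observation
    survival = 1.0           # running product-limit estimate
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients whose follow-up ends at the same time t.
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                deaths += 1
            removed += 1
            i += 1
        if deaths > 0:
            # Multiply by the conditional probability of surviving time t.
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        # Events and censored patients both leave the risk set after t.
        at_risk -= removed
    return curve


# Hypothetical follow-up data (months) for five patients, not study data.
example_curve = kaplan_meier([12, 24, 24, 36, 60], [True, False, True, False, True])
```

With these hypothetical inputs, the estimate drops at months 12, 24, and 60; the censored patients at months 24 and 36 do not lower the curve but reduce the number at risk for later events.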
Results
Population characteristics
During the study period, a total of 255 patients were eligible for the study. Nineteen patients were excluded because they had insufficient follow-up information. Therefore, the analyses presented here were performed in the remaining 236 patients.
Table 1 lists the characteristics of the study population. Mean age was 56.3 years, and 148 (63%) were women. PTC was present in 160 (68%) patients, and the remaining 76 patients (32%) had FTC, including 29 patients (38%) with HCC. Median follow-up time was 72 months, and during follow-up, 70 patients (30%) died, of whom 49 (70%) died of thyroid cancer. Total thyroidectomy or hemithyroidectomy was performed in all patients, and 227 patients (96%) received radioiodine therapy [68 (29%) once, 76 (32%) twice, and 82 (35%) more than twice]. Neck dissection was performed in 105 patients (45%). Patients with FTC were significantly older (64.1 years vs. 52.6 years; p < 0.001), had significantly larger tumors at presentation (5.0 cm vs. 3.0 cm; p < 0.001), and received a lower cumulative RAI dose (195 mCi vs. 298 mCi; p = 0.019) than those with PTC. In addition, patients with HCC were significantly more often male than those with FTC (Supplementary Table S1).
Table 1. Characteristics of the Study Population
Significant p-values displayed in bold.
Values are mean (±standard deviation), medians (25–75 IQR), or numbers (%).
p-Value comparing PTC and FTC.
AJCC, American Joint Committee on Cancer; DTC, differentiated thyroid cancer; FTC, follicular thyroid cancer; HT, hemithyroidectomy; IQR, interquartile ranges; PTC, papillary thyroid cancer; RAI, radioactive iodine; TKI, tyrosine kinase inhibitor; TNM, tumor–node–metastasis; TT, total thyroidectomy.
At diagnosis, 78 patients (33%) had distant metastases, with lung and bone as the most common sites. PTC patients significantly more often had gross ETE and large pathological lymph nodes than FTC patients (Supplementary Table S2). No differences between HCC and FTC were seen. Finally, the majority of patients had either 1 (41%) or 2 (31%) high-risk factors (Fig. 1).

Fig. 1. Number of American Thyroid Association high-risk criteria per disease type.
Response to therapy and survival
Seven patients (3%) died within six months after initial therapy, precluding assessment of initial response to therapy in these patients. After initial therapy, the majority of the remaining 229 patients continued to have structural disease (51%), while an excellent response was seen in only 38 patients (17%). The other patients had either a biochemical incomplete (7%) or an indeterminate (26%) response. These percentages were similar for PTC and FTC separately (Table 2), and also for HCC and FTC (Supplementary Table S3). For DTC in general, as well as for PTC and FTC separately, the initial response to therapy category was significantly related to OS and DSS (both p < 0.001). Patients with an initial excellent response had the best prognosis, followed by indeterminate, biochemical incomplete, and structural incomplete responses (Fig. 2). This same pattern was seen for PTC and FTC (Supplementary Figs. S1 and S2), and also for HCC and FTC separately. None of the patients with an initial excellent response died from thyroid cancer during follow-up.

Fig. 2. Kaplan–Meier curves for OS and DSS according to initial response to therapy category.
Table 2. Response to Therapy After First Therapy
Values are numbers (%).
Seven patients were excluded due to death precluding initial response to therapy assessment.
p-Value comparing PTC and FTC.
At the end of follow-up, 69 patients (29%) had an excellent response, while 120 patients (51%) still had structural disease (Table 3). The other patients had either a biochemical incomplete (4%) or an indeterminate (16%) response. In the majority of patients, structural disease was either confined to the neck region or present as lung or bone metastases. Patients with FTC more often had structural disease than those with PTC (63% vs. 45%; p = 0.01), while no differences between FTC and PTC were seen in the excellent response group. No differences between HCC and FTC were seen (Supplementary Table S4). Of the patients with structural disease after initial therapy, 74% still had structural disease at final follow-up, while 12 patients (10%) had an excellent response at final follow-up.
Table 3. Response to Therapy at End of Follow-Up
Significant p-values displayed in bold.
Values are numbers (%).
p-Value comparing PTC and FTC.
During follow-up, 79 patients (35%) achieved NED after a median of 22 months. In 11 patients (14%) who achieved NED, a recurrence occurred during follow-up after a median of 47 months. Both percentages were similar for PTC and FTC, and no differences were seen when time was taken into account (see Fig. 3 for recurrence). No differences between HCC and FTC were seen either. Of the 11 patients with a recurrence, 2 had elevated Tg levels only, 2 had local disease, 3 had distant disease, and 4 had both local and distant disease. Furthermore, 2 of these 11 patients with a recurrence died from thyroid cancer.

Fig. 3. Kaplan–Meier curves for recurrence for PTC and FTC separately. FTC, follicular thyroid cancer; PTC, papillary thyroid cancer.
Risk factors
In a univariate analysis, the presence of distant metastases or an elevated postoperative Tg increased the risk of having persistent disease and not having an excellent response after initial therapy (Supplementary Table S5). All ATA high-risk criteria, except any metastatic lymph node larger than 3 cm, increased the risk of having persistent disease and not having an excellent response at end of follow-up (Supplementary Table S6). The presence of gross ETE or distant metastases resulted in increased all-cause and thyroid cancer-specific mortality (Supplementary Table S7). Distant metastases or an elevated postoperative Tg also resulted in a lower likelihood of developing NED during follow-up (Supplementary Table S8), while gross ETE led to a higher risk of recurrence (Supplementary Table S9).
After adjusting for age, sex, and the other ATA high-risk criteria, the presence of distant metastases or an elevated postoperative Tg still increased the risk of having persistent disease and not having an excellent response after initial therapy (Supplementary Table S10). Gross ETE and elevated postoperative Tg still increased the risk of having persistent disease and not having an excellent response at end of follow-up in the multivariate analysis, while the other criteria were no longer significantly associated (Supplementary Table S11). None of the ATA high-risk criteria influenced OS or DSS in the multivariate analysis (Supplementary Table S12). Distant metastases and elevated postoperative Tg still resulted in a lower likelihood of developing NED during follow-up (Supplementary Table S8). Because of insufficient events, no multivariate analysis with respect to recurrence was performed.
Discussion
This study shows that, in a population of patients with high-risk DTC, the 2015 ATA Risk Stratification System is an excellent predictor for persisting disease as well as for survival. At the end of follow-up, half of the population still had structural disease, while one-third showed excellent response. Fourteen percent of the patients with an excellent response experienced a recurrence later during follow-up.
We observed that the majority of the patients still had structural disease (51%) after initial therapy. This percentage is similar to that in the study of Pitoia et al. (13), who used the 2009 version of the ATA Risk Stratification System in patients with PTC. However, two other studies found lower percentages of structural disease using the 2015 version (14,16). We show that distant metastases at presentation in particular increase the risk of having persistent structural disease. This might be the reason for these differences, as the percentage of patients with distant metastases in our study was more than twice that in the other two studies (33% vs. 15%) (14,16). An excellent response after initial therapy was seen in 17% of our patients, which is similar to earlier studies using the 2009 version (10,11). We found no differences between PTC and FTC, or between HCC and FTC, regarding the initial response to therapy categories.
One-third of the patients had NED at final follow-up, whereas half of them still had structural disease. These percentages are similar to those in the studies of Shah and Boucai (16) using the 2015 version, and Pitoia et al. (13) using the 2009 version. In contrast, both Vaisman et al. (11) and Tuttle et al. (10) showed lower numbers of NED and higher numbers of structural disease using the 2009 version. Explanations for these differences might be the use of different versions of the ATA Risk Stratification System in combination with disease characteristics, for example, gross ETE or elevated postoperative Tg. We show that these two factors increase the risk of having persistent disease and not having an excellent response. However, data on these factors were unfortunately unavailable for the other studies.
We found a recurrence rate of 14% in the high-risk patients who achieved an excellent response according to the DRS defined in the 2015 ATA Guidelines. In contrast, the 2015 ATA Guidelines cite a recurrence rate of 1–4% in DTC patients with an excellent response (3). However, this is predominantly based on recurrence rates in ATA low- and intermediate-risk patients, and previous studies on high-risk patients showed much higher recurrence rates of 14–30% (10,13,14,16). This indicates that the recurrence rates after an excellent response are much higher in high-risk patients than in those with low or intermediate risk, illustrating the importance of careful follow-up in ATA high-risk patients. Furthermore, in a univariate analysis, we show that the presence of gross ETE at initial presentation increases the risk of a recurrence; this might have been caused by the aggressive and invasive nature of these tumors. Due to the low number of events, we were unable to confirm this in a multivariate analysis.
The risk of dying from thyroid cancer in this high-risk population was 21%, which is in line with earlier studies (reporting 15–18%) (10,11,16). We also demonstrate that the response to therapy category determined after initial therapy is a strong predictor of survival in high-risk patients. Our 10-year DSS of 100% in excellent responders is the same as that found by Shah and Boucai (16). However, our 10-year DSS was 61% in patients with a structural incomplete response, whereas Shah and Boucai reported a DSS of 28% (16). This difference might be due to the fact that 21% of their patients had poorly differentiated thyroid cancer; those patients had a significantly higher mortality rate than the other patients in their study.
One of the main strengths of this study is the large number of patients with high-risk thyroid cancer compared with previous studies (10–12,20); only the recent study of Shah and Boucai (16) had a similar number of patients with high-risk DTC, but they included poorly differentiated thyroid cancer as well. Furthermore, the relatively high number of FTC patients enabled us to compare PTC with FTC. In addition, our follow-up of 72 months was comparable with earlier studies (10,13,14,16,21). A possible limitation of the study is that patients were recruited from a single tertiary university hospital, which might attract patients with more aggressive disease, especially FTC, because of the availability of advanced treatments. Another limitation might be the inability to perform a multivariate analysis for recurrence because of the low number of events. Finally, because of the retrospective character of the study, our data set was incomplete in 3% of the ATA high-risk criteria values. Furthermore, only 16 patients had insufficient information to determine their ATA risk category, and 19 patients had insufficient follow-up information. It is therefore highly unlikely that such a small proportion would have altered the overall results.
In conclusion, this study shows that, in a population of high-risk patients with DTC harboring a large subset of FTC patients, the 2015 ATA Risk Stratification System is not only an excellent predictor of persisting disease, but also of survival. In addition, as 14% of the high-risk patients with an excellent response to therapy experienced a recurrence during follow-up, clinicians should be aware of this substantial recurrence risk when treating and following up on these patients.
Footnotes
Author Disclosure Statement
No competing financial interests exist.
Supplementary Material
Supplementary Figure S1
Supplementary Figure S2
Supplementary Table S1
Supplementary Table S2
Supplementary Table S3
Supplementary Table S4
Supplementary Table S5
Supplementary Table S6
Supplementary Table S7
Supplementary Table S8
Supplementary Table S9
Supplementary Table S10
Supplementary Table S11
Supplementary Table S12
