Implementing the Modified 2009 American Thyroid Association Risk Stratification System in Thyroid Cancer Patients with Low and Intermediate Risk of Recurrence

Abstract

Objective:

The primary purpose of this study was to validate the proposed modified 2009 American Thyroid Association Risk Stratification System (M-2009-RSS) in patients with thyroid cancer and to compare its findings with those of the 2009 ATA Risk of Recurrence (2009 ATA-RR) classification and the ongoing risk of recurrence (ongoing RR) system. The secondary purpose was to assess which risk stratification system best predicted structural incomplete response or no evidence of disease (NED) status at the end of follow-up.

Subjects and Methods:

This retrospective review included 149 patients with differentiated thyroid cancer who had low and intermediate 2009 ATA-RR and were treated at a single experienced center and followed up for a median of 6 years (range 3–12 years). Each patient was risk stratified using both the 2009 ATA-RR and the M-2009-RSS. The primary endpoints were 1) the best response to initial therapy, defined as either excellent response, biochemical or structural incomplete response, or indeterminate response; 2) clinical status at final follow-up, defined as either NED, biochemical incomplete response, structural incomplete response, indeterminate response, or recurrence (biochemical or structural disease identified after a period of NED); and 3) ongoing RR, defined as low or high risk according to several outcomes after initial treatment.

Results:

Mean age of included patients was 45.3±13 years. Both the 2009 ATA-RR and the M-2009-RSS provided clinically meaningful graded estimates with regard to the status of NED at the end of follow-up in low-risk patients (84% for 2009 ATA-RR and 74% for M-2009-RSS) or the likelihood of having persistent structural disease (0% for 2009 ATA-RR and 3.6% for the M-2009-RSS). When patients were classified as low risk, the positive predictive value (PPV) and negative predictive value (NPV) to predict structural disease were 0% and 88.7% for the 2009 ATA-RR, 3.6% and 86.5% for the M-2009-RSS, and 1.6% and 68.2% for the ongoing RR (p=0.22 and 0.055 by chi-square test for PPV and NPV, respectively).

Conclusions:

Despite expanding the definition of low risk to include small-volume lymph node metastases, minor extrathyroidal extension, and minimally invasive follicular thyroid cancer, the M-2009-RSS predicts clinical outcomes (structural incomplete response and NED at the end of follow-up) that are very similar to the previously validated 2009 ATA RR classification system.

Introduction

Risk stratification in patients with differentiated thyroid cancer (DTC) has been proposed by the American Thyroid Association (ATA), the European Thyroid Association (ETA), and the Latin American Thyroid Society (LATS) (1–3). Risk stratification classifications have been validated in several cohorts of patients around the world (4–7). More recently, several groups have demonstrated that risk stratification can be further improved if the initial risk estimates obtained using either the ATA or ETA risk of recurrence systems are actively modified over time based on response to therapy and course of the disease (5–8).

After validation of the ATA risk of recurrence classification, new data indicated that patients with low-risk DTC could be a larger group than previously considered (9–11). Due to the current changes observed in the worldwide classifications of the risk of recurrence in patients with DTC, we aimed to recategorize patients according to the new variables that appeared after the validation of the 2009 ATA RR classification (Haugen et al., 2015 ATA Management Guidelines for Patients with Thyroid Nodules and Differentiated Thyroid Cancer, in review; 12,13). As an example, several studies have shown that patients with fewer than five metastatic lymph nodes and/or micrometastases (<2 mm) independently of the number of affected lymph nodes and/or minimal extrathyroidal extension (T3) have a probability of recurrence not higher than 5–8% (14–23). The 2015 ATA guidelines will probably propose that this group of patients be considered together as low risk of recurrence (Haugen et al., in review).

Although most of the published studies endorse this new classification (14–23), the modified 2009 ATA risk stratification system (M-2009-RSS) has not yet been validated. Therefore, the aim of the present study was to describe both early and late clinical outcomes in the same cohort of low and intermediate risk of recurrence DTC patients who were risk stratified according to the 2009 ATA RR classification compared to the modified 2009 ATA risk stratification system from the ATA 2015 guidelines (24). Secondarily, we aimed to evaluate the impact on prediction of the final status after using the ongoing risk of recurrence classification obtained after the initial response to treatment.

Materials and Methods

We retrospectively reviewed our database containing 563 file records of patients with DTC who had been followed from January 2001 to December 2013. To be included in the analysis, patients were required to have undergone total thyroidectomy with or without lymph node dissection, to have received remnant ablation with radioiodine (RAI) after thyroid hormone withdrawal (THW), to have been classified as low or intermediate risk of recurrence according to the 2009 ATA RR classification (Table 1), and to have had at least 3 years of follow-up after initial treatment (1). Of 288 DTC patients with low and intermediate risk of recurrence evaluated at our center, 139 were excluded because the follow-up was less than 3 years. With these criteria, 149 DTC patients were included in the study.

Table 1.

2009 American Thyroid Association Risk of Recurrence Classification

Risk level	Characteristics
Low	Any size, intrathyroidal
	N0 M0
	No aggressive histology
	No vascular invasion
	No uptake outside TB
Intermediate	N1
	Minor ETE
	Vascular invasion
	Aggressive histology
	Uptake outside TB
High	Gross ETE
	Incomplete tumor resection
	Distant metastases
	Inappropriate Tg

ETE, extrathyroidal extension; M0, absence of distant metastasis; N0, absence of lymph node metastasis; TB, thyroid bed; Tg, thyroglobulin.
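The criteria in Table 1 can be read as a precedence rule: any high-risk feature assigns high risk, otherwise any intermediate-risk feature assigns intermediate risk, otherwise the patient is low risk. The following is a minimal sketch of that reading. The `InitialFindings` container and its field names are hypothetical and introduced only for illustration; they are not part of the study database.

```python
from dataclasses import dataclass


@dataclass
class InitialFindings:
    """Hypothetical container for the Table 1 features of one patient."""
    n1: bool = False                          # lymph node metastases (N1)
    minor_ete: bool = False                   # minor extrathyroidal extension
    vascular_invasion: bool = False
    aggressive_histology: bool = False
    uptake_outside_thyroid_bed: bool = False  # RAI uptake outside the thyroid bed
    gross_ete: bool = False                   # gross extrathyroidal extension
    incomplete_resection: bool = False
    distant_metastases: bool = False
    inappropriate_tg: bool = False


def ata_2009_risk(f: InitialFindings) -> str:
    """Return 'high', 'intermediate', or 'low' following Table 1."""
    if (f.gross_ete or f.incomplete_resection
            or f.distant_metastases or f.inappropriate_tg):
        return "high"
    if (f.n1 or f.minor_ete or f.vascular_invasion
            or f.aggressive_histology or f.uptake_outside_thyroid_bed):
        return "intermediate"
    return "low"


print(ata_2009_risk(InitialFindings()))          # low
print(ata_2009_risk(InitialFindings(n1=True)))   # intermediate
```

Checking the high-risk features first means a patient with both N1 disease and distant metastases is assigned high risk, which is the usual reading of a tiered table.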

Risk of recurrence classifications

The 2009 ATA RR classification is summarized in Table 1. The M-2009-RSS for low and intermediate risk of recurrence, as described in the ATA draft guidelines (Haugen et al., in review) can be seen in Figure 1. In addition, we classified patients with minor extrathyroidal extension as low risk even though they are considered to be borderline between intermediate and low risk and will probably be classified as intermediate risk in the final version of the M-2009-RSS.

FIG. 1.

Proposed 2009 American Thyroid Association Modified Risk Stratification System compared to 2009 American Thyroid Association risk of recurrence classification (frequency of structural disease reported in the literature between parentheses). FTC, follicular thyroid cancer; LN, lymph node; PTC, papillary thyroid cancer.

Ablation protocol

Our ablation protocol used fixed RAI activities based on the extent of initial disease. Patients typically received 3.70 GBq (100 mCi) ¹³¹I for low risk (2009 ATA RR) disease or 5.55 GBq (150 mCi) for intermediate risk (2009 ATA RR) disease. A low-iodine diet was prescribed from 1 week before RAI administration through 2 days afterwards. THW comprised at least 3 weeks without thyroid hormone, starting from thyroidectomy or THW for the diagnostic studies. RAI was administered following that interval, in all cases with thyrotropin (TSH) levels above 50 mIU/L. A posttherapy whole-body scan (WBS) was performed 5–7 days after therapeutic RAI administration.

Thyroglobulin and thyroglobulin antibody measurement

Samples for thyroglobulin (Tg) and thyroglobulin antibody (TgAb) measurement were taken on the day of ablative RAI administration. Tg and TgAb levels were assessed in one of two reference laboratories from Argentina using one of two commercial immunometric assays; the same laboratory and assay were used throughout a patient's follow-up. Tg assays comprised the Elecsys Tg Electrochemiluminescence Immunoassay (Roche Diagnostics GmbH, Mannheim, Germany), which has a 0.5 μg/L detection limit, or the Immulite 2000 Tg Chemiluminescence Assay (Siemens Corp., Los Angeles, CA), with a 0.9 μg/L functional sensitivity. TgAb assays comprised the Elecsys Anti-Tg Electrochemiluminescence Immunoassay (RSR Ltd., Pentwyn, Cardiff, UK), or the Immulite 2000 Anti-TG Ab chemiluminescent immunometric assay method (Siemens). For both TgAb assays, values >20 IU/mL were considered to be positive and to render Tg measurements uninterpretable. These patients were excluded from the study.

Clinical management during follow-up

Clinical status in response to initial therapy was assessed using THW-stimulated (n=98) or recombinant human (rh)TSH-stimulated (n=51) Tg testing and neck ultrasonography (US) in all patients and diagnostic WBS in intermediate-risk patients (150 MBq [4 mCi] activity) performed 9–18 (mean 12±3) months after ablation. Neck US using an 11 MHz linear array transducer was performed every 6 months after ablation. Patients with measurable stimulated or unstimulated Tg, suspicious neck US findings, or both during follow-up underwent morphological or functional imaging or both, including computed tomography (CT) (n=19 [12%]) or [¹⁸F]fluorodeoxyglucose positron emission tomography (FDG-PET) (n=15 [10%]). All US suspicious nodules ≥1 cm in diameter underwent fine-needle aspiration biopsy (FNAB) with measurement of Tg in the washout of the aspirate.

After ablation, all patients were kept on a suppressed TSH level until January 2008 when all patients started thyroid hormone therapy according to the LATS recommendations for each risk of recurrence group (target TSH: <0.1 mIU/L for intermediate risk; 0.4–1 mIU/L for low risk; and thyroid hormone replacement for very low risk LATS classification) (2).

Clinical outcome definitions

The primary endpoint of the study was the best response to initial therapy (surgery+RAI ablation) assessed at the 12-month (±3 months) follow-up visit based on stimulated Tg values, neck US, diagnostic WBS, and risk-appropriate additional functional and cross-sectional imaging (5,6,10). Excellent response to therapy was defined as a stimulated Tg <1 μg/L in the absence of TgAb, plus absent or <0.1% thyroid bed uptake on diagnostic WBS (if done), with a normal postoperative neck ultrasound. Patients demonstrating a stimulated Tg value between 1 and 10 μg/L without structural evidence of disease, or having nonspecific findings on ultrasound or persistent measurable TgAb, were classified as having an indeterminate response. Those patients who showed a stimulated Tg level >10 μg/L or detectable Tg levels under thyroid hormone therapy without any findings on US were classified as having biochemical incomplete response. Patients with structural evidence of disease (with or without abnormal Tg values) were classified as having structural persistent disease.
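The response definitions above can be sketched as a decision function. This is an illustrative reading, not the authors' implementation. The function and parameter names are hypothetical, Tg values are in μg/L, and two points are assumptions that the text does not settle: the order in which overlapping criteria are checked, and the handling of thyroid bed uptake ≥0.1% on diagnostic WBS, which is treated here as indeterminate.

```python
from typing import Optional


def initial_response(stimulated_tg: float,
                     tgab_positive: bool,
                     structural_disease: bool,
                     nonspecific_us_findings: bool = False,
                     suppressed_tg_detectable: bool = False,
                     wbs_bed_uptake_pct: Optional[float] = None) -> str:
    """Classify the best response to initial therapy (Tg in ug/L)."""
    # Structural evidence of disease overrides any Tg value.
    if structural_disease:
        return "structural incomplete"
    # Stimulated Tg >10 ug/L, or detectable Tg under thyroid hormone therapy.
    if stimulated_tg > 10 or suppressed_tg_detectable:
        return "biochemical incomplete"
    # Stimulated Tg 1-10 ug/L, nonspecific US findings, or measurable TgAb.
    if stimulated_tg >= 1 or nonspecific_us_findings or tgab_positive:
        return "indeterminate"
    # Assumption: uptake >=0.1% on diagnostic WBS rules out an excellent response.
    if wbs_bed_uptake_pct is not None and wbs_bed_uptake_pct >= 0.1:
        return "indeterminate"
    return "excellent"


print(initial_response(0.4, tgab_positive=False, structural_disease=False))
print(initial_response(5.0, tgab_positive=False, structural_disease=False))
```

The two calls print `excellent` and `indeterminate`, respectively.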

The second endpoint of the study was clinical status at time of final follow-up (5,10). Patients were classified as having no evidence of disease (NED) if at the time of final follow-up the suppressed Tg was <1 μg/L, Tg antibodies were negative, neck US was free of suspicious signs, and there were no pathological findings on any other imaging studies performed for clinically indicated reasons (WBS, radiography, CT, FDG-PET, or any other modality) or in any biopsy specimen. Patients with persistent disease at the time of final follow-up were classified as either indeterminate or biochemical or structural incomplete response using the same definitions used in the evaluation of response to initial therapy. Patients who had structural or biochemical evidence of disease identified following a period of NED were classified as having recurrent disease. Disease sites were classified as local (thyroid bed), lymph node metastasis confirmed by FNAB with positive cytology, and/or distant metastasis confirmed by biopsy and/or imaging.

Ongoing risk of recurrence classification

After defining the initial response to treatment, we reclassified these patients according to the ongoing RR classification, using the following criteria that we created to define this variable. Patients were considered to have a low ongoing risk of recurrence if they had 1) an initial excellent response; 2) an indeterminate response without any suspicious US finding, with stable detectable stimulated Tg levels between 1 and 10 μg/L, or with stable or decreasing TgAb; or 3) a biochemical incomplete response with stable or decreasing stimulated Tg levels >10 μg/L or stable or decreasing Tg levels <5 μg/L under thyroid hormone therapy. Patients were considered to have a high ongoing risk of recurrence if they had 1) structural persistence as initial response to treatment; 2) an indeterminate response with suspicious US findings (suspicious lymph nodes <1 cm in the larger diameter, which were not evaluated with FNAB); 3) a biochemical incomplete response with detectable Tg levels >5 μg/L under thyroid hormone therapy or increasing Tg levels during follow-up (whether on thyroid hormone therapy or after TSH stimulation); or 4) a biochemical incomplete response due to increasing titers of TgAb during follow-up (Table 2).

Table 2.

Ongoing Risk of Recurrence Classification

Ongoing risk of recurrence	Characteristics
Low	1) Excellent response to treatment
	2) Indeterminate response without any suspicious US finding,
	3) Indeterminate response with stable detectable stimulated Tg levels between 1 and 10 ng/mL, or
	4) Indeterminate response with stable or decreasing TgAb
	5) Biochemically incomplete response with stable or decreasing stimulated Tg levels >10 ng/mL or stable or decreasing Tg levels <5 ng/mL under thyroid hormone therapy
High	1) Structural persistence as initial response to treatment
	2) Indeterminate response with US suspicious findings
	3) Biochemically incomplete response with detectable Tg levels >5 ng/mL under thyroid hormone therapy or increasing Tg levels during follow-up
	4) Biochemically incomplete response with increasing titers of TgAb

TgAb, thyroglobulin antibody; US, ultrasonography.
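Table 2 can be condensed into a short rule: a patient is high ongoing risk when one of the four high-risk conditions holds, and low ongoing risk otherwise. The sketch below follows that reading. The function name, the string labels, and the parameters are hypothetical. Treating every case that does not match a high-risk row as low risk is an assumption, since Table 2 does not cover every combination (for example, an indeterminate response with rising TgAb).

```python
def ongoing_risk(initial: str,
                 suspicious_us: bool = False,
                 suppressed_tg: float = 0.0,
                 tg_rising: bool = False,
                 tgab_rising: bool = False) -> str:
    """Return 'high' or 'low' ongoing risk of recurrence (Tg in ng/mL)."""
    # High 1) structural persistence as initial response to treatment.
    if initial == "structural incomplete":
        return "high"
    # High 2) indeterminate response with suspicious US findings.
    if initial == "indeterminate" and suspicious_us:
        return "high"
    # High 3) and 4) biochemical incomplete response with Tg >5 ng/mL under
    # thyroid hormone therapy, rising Tg, or rising TgAb titers.
    if initial == "biochemical incomplete" and (
            suppressed_tg > 5 or tg_rising or tgab_rising):
        return "high"
    # Assumption: everything else falls into the low-risk rows of Table 2.
    return "low"


print(ongoing_risk("excellent"))                                 # low
print(ongoing_risk("biochemical incomplete", suppressed_tg=7))   # high
```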

Statistical analysis

Epidemiological data are presented as the mean±SEM, with median and range when appropriate. To evaluate significant differences in data frequency, we analyzed two-way contingency tables by the Fisher exact test or 2×3 contingency tables by the chi-squared test.
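As an illustration of the 2×3 chi-squared comparison, the sketch below computes the Pearson statistic for the number of low-risk patients with NED at final follow-up under each of the three systems, using the counts in Table 7. It is a plain-Python reconstruction, not the SPSS procedure used in the study. It treats the three groups as independent samples, and it relies on the fact that a 2×3 table has 2 degrees of freedom, for which the upper-tail probability of the chi-squared distribution is exactly exp(−x/2). The function names are hypothetical.

```python
import math


def chi_square_2xk(events, totals):
    """Pearson chi-square statistic for a 2 x k table.

    events: number of patients with the outcome in each group
    totals: size of each group
    """
    non_events = [t - e for e, t in zip(events, totals)]
    n = sum(totals)
    stat = 0.0
    for row in (events, non_events):
        row_total = sum(row)
        for observed, col_total in zip(row, totals):
            expected = row_total * col_total / n
            stat += (observed - expected) ** 2 / expected
    return stat


def p_value_df2(stat):
    """Upper-tail probability of a chi-square variable with 2 degrees of freedom."""
    return math.exp(-stat / 2.0)


# Low-risk patients with NED at final follow-up (Table 7):
# 2009 ATA-RR 58/69, M-2009-RSS 83/112, ongoing RR 94/127.
ned = [58, 83, 94]
group_sizes = [69, 112, 127]

stat = chi_square_2xk(ned, group_sizes)
p = p_value_df2(stat)
print(round(stat, 2), round(p, 2))  # 2.96 0.23
```

The resulting p-value of about 0.23 agrees with the value reported for this comparison in the Results.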

The agreement between different risk stratification systems was calculated using Cohen's k coefficient; a value of 1 implies perfect agreement and a value <1 implies less than perfect agreement. It was evaluated using the Landis and Koch semiquantitative scale (poor agreement ≤0.20, fair agreement 0.21–0.40, moderate agreement 0.41–0.60, good agreement 0.61–0.80, and very good agreement 0.81–1.0) (24).
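A minimal sketch of the k calculation follows, applied to the agreement between the 2009 ATA-RR and the M-2009-RSS. The marginal totals (69 low and 80 intermediate by 2009 ATA-RR; 112 low and 37 intermediate by M-2009-RSS) come from Table 3. The individual cells are an assumption: the sketch supposes that reclassification only moved patients from intermediate to low risk, so that no 2009 ATA-RR low-risk patient became intermediate risk. The function names are hypothetical.

```python
def cohen_kappa(table):
    """Cohen's kappa for a square agreement table (rows: rater A, columns: rater B)."""
    k = len(table)
    n = sum(sum(row) for row in table)
    observed = sum(table[i][i] for i in range(k)) / n
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    expected = sum(row_totals[i] * col_totals[i] for i in range(k)) / n ** 2
    return (observed - expected) / (1 - expected)


def agreement_label(kappa):
    """Semiquantitative label following the scale quoted in the text."""
    if kappa <= 0.20:
        return "poor"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "good"
    return "very good"


# Rows: 2009 ATA-RR (low, intermediate); columns: M-2009-RSS (low, intermediate).
# Assumed cells: 69 low/low, 0 low/intermediate, 43 intermediate/low, 37 intermediate/intermediate.
table = [[69, 0],
         [43, 37]]

kappa = cohen_kappa(table)
print(round(kappa, 3), agreement_label(kappa))  # 0.443 moderate
```

Under this assumed cross-tabulation the coefficient is about 0.443, which matches the value reported in Table 6 for this pair of systems.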

Diagnostic accuracy was calculated according to Galen and was based on true-positive (TP), true-negative (TN), false-positive (FP), and false-negative (FN) results. The positive predictive value (PPV) was TP/(TP+FP) and the negative predictive value (NPV) was TN/(FN+TN); the 95% confidence interval [95% CI] of all estimates was also evaluated (25,26).
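The formulas above can be sketched in a few lines. The example uses the 2009 ATA-RR intermediate-risk category as the "positive test" and structural persistent disease at final follow-up as the outcome, with counts taken from Table 7: 9 of 80 intermediate-risk patients and 0 of 69 low-risk patients had structural persistent disease. The accuracy formula (TP+TN)/N is the standard definition and is an assumption here, since the text does not state it. The function name is hypothetical.

```python
def predictive_values(tp, fp, fn, tn):
    """Return (PPV, NPV, accuracy) from the four cells of a 2 x 2 table."""
    ppv = tp / (tp + fp)
    npv = tn / (fn + tn)
    accuracy = (tp + tn) / (tp + fp + fn + tn)  # assumed standard definition
    return ppv, npv, accuracy


# "Test positive" = classified as intermediate risk by the 2009 ATA-RR.
# TP: intermediate risk with structural disease (9)
# FP: intermediate risk without structural disease (80 - 9 = 71)
# FN: low risk with structural disease (0)
# TN: low risk without structural disease (69)
ppv, npv, accuracy = predictive_values(tp=9, fp=71, fn=0, tn=69)
print(round(ppv, 2), round(npv, 2), round(accuracy, 2))  # 0.11 1.0 0.52
```

These values agree with the intermediate-risk 2009 ATA-RR row of Table 8.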

Statistical analysis was performed using the SPSS statistical software (SPSS version 10.0, SPSS Inc., Chicago, IL). We considered p<0.05 to be statistically significant for all analyses.

Results

Each of the 149 low- and intermediate-risk patients had total thyroidectomy in a specialized center with subsequent RAI remnant ablation following traditional THW. The median follow-up in the whole cohort was 6 years (range 3–12 years; mean 7.7±3.5 years). As can be seen in Table 3, the majority of patients had classic papillary thyroid cancer (PTC) (71%), 86% were female, and 72% were AJCC stage I (27).

Table 3.

Characteristics of the 149 Patients with Differentiated Thyroid Cancer Included in the Study

Patient characteristics
Sex
Female	131 (86%)
Male	18 (14%)
Age, years
Mean±SEM	45.3±12.9
Median (range)	46 (15.7–75.3)
Histology and variant
Classic PTC	106 (71.1%)
Follicular variant PTC	24 (16.1%)
Oncocytic variant PTC	1 (0.7%)
Tall cell PTC	8 (5.1%)
FTC	5 (3.5%)
Hürthle cell thyroid cancer	5 (3.5%)
Bilaterality and multifocality
Bilateral tumor	33 (22.1%)
Multifocal tumor	28 (18.8%)
Neck dissection
No	61 (40.9%)
Central	49 (32.9%)
Central and lateral	39 (26.2%)
TNM stage AJCC
I	107 (71.8%)
II	11 (7.4%)
III	20 (13.4%)
IV	11 (7.4%)
ATA 2009 risk of recurrence
Low	69 (46.3%)
Intermediate	80 (53.7%)
ATA 2015 risk of recurrence
Low	112 (75.2%)
Intermediate	37 (24.8%)
Cumulative radioiodine activity, mCi
Mean±SEM	250±108
Median (range)	171 (100–350)

AJCC, American Joint Committee on Cancer; NED, no evidence of disease; RA, remnant ablation; SEM, standard error of the mean.

As expected, while only 46% of patients were classified as low risk in the 2009 ATA-RR system, 75% were classified as low risk by the M-2009-RSS. Conversely, 54% were classified as 2009 ATA-RR intermediate risk, while 25% were classified as intermediate risk by the M-2009-RSS (see Table 3).

For the entire cohort, the best response to initial therapy was excellent in 47%, structural persistent disease in 12% (lymph nodes in all 18 patients), biochemical incomplete response in 15%, and indeterminate response in 26%. At the time of final follow-up, 66% were classified as NED, 4% as having structural persistent disease, 9% as having biochemical persistent disease, 19% as having an indeterminate response, and 2% as having recurrent disease (Table 4). When the initial responses to treatment were compared between the respective low-risk categories of the 2009 ATA-RR and M-2009-RSS, no statistically significant differences were observed (excellent response 67% vs. 55%, respectively, p=0.18; and structural incomplete response 2.9% vs. 9.8%, respectively, p=0.14). Likewise, when the respective intermediate-risk categories of the 2009 ATA-RR and M-2009-RSS classifications were compared, no statistically significant differences were observed (excellent response 30% vs. 22%, p=0.47; and structural incomplete response 20% vs. 19%, p=0.91) (Table 5).

Table 4.

Best Response to Initial Therapy and Status at the End of Follow-Up for the Whole Cohort of Patients

Response or status	n (%)
Initial response to treatment
Excellent response	70 (46.9)
Indeterminate response	39 (26.2)
Biochemical incomplete	22 (14.8)
Structural incomplete	18 (12.1)
Clinical status at final follow-up
NED	98 (65.8)
Indeterminate	28 (18.8)
Biochemical incomplete	14 (9.4)
Structural incomplete	6 (4.0)
Recurrent disease	3 (2.0)

Table 5.

Initial Response to Treatment Comparing 2009 American Thyroid Association Risk of Recurrence and the Modified 2009 American Thyroid Association Risk Stratification System

	Initial response to treatment n (%)
Risk ^a	Excellent	Indeterminate	Biochemical incomplete	Structural incomplete
2009-ATA RR
Low risk (n=69)	46 (66.7)	15 (21.7)	6 (8.7)	2 (2.9)
Intermediate risk (n=80)	24 (30)	24 (30)	16 (20)	16 (20)
M-2009-ATA RSS
Low risk (n=112)	62 (55.4)	28 (25)	11 (9.8)	11 (9.8)
Intermediate risk (n=37)	8 (21.6)	11 (29.7)	11 (29.7)	7 (19)

2009-ATA RR, 2009 ATA Risk of Recurrence; M-2009-ATA RSS, Modified 2009 ATA Risk Stratification System.

Low-risk 2009-ATA RR versus low-risk M-2009-ATA RSS: excellent response (p=0.18) and structural incomplete response (p=0.14). Intermediate-risk 2009-ATA RR versus intermediate-risk M-2009-ATA RSS: excellent response (p=0.47) and structural incomplete response (p=0.91).

Agreement among classifications

When the ongoing RR was assessed, we classified 22/149 (14.8%) patients as high ongoing risk of recurrence and 127/149 (85.2%) patients as low ongoing risk of recurrence.

Using Cohen's k coefficient, the agreement between the 2009 ATA-RR and the M-2009-RSS was classified as moderate, whereas only fair agreement was found between either the 2009 ATA-RR or the M-2009-RSS and the ongoing RR stratification (Table 6).

Table 6.

Measurement of Agreement Among 2009 American Thyroid Association Risk of Recurrence Classification, the Modified American Thyroid Association 2009 Risk Stratification System and the Ongoing Risk of Recurrence Stratification

Agreement	k [95% CI] ^a	Landis and Koch semiquantitative scale
2009-ATA RR and M-ATA-2009 RSS	0.443 [0.33–0.58]	Moderate
2009-ATA RR and ongoing RR	0.234 [0.09–0.38]	Fair
M-ATA-2009 RSS and ongoing RR	0.314 [0.11–0.52]	Fair

RR, risk of recurrence.

Cohen's coefficient (k) and confidence intervals (CIs) are presented.

Comparison among risk stratification systems (2009 ATA-RR, M-2009-RSS, and ongoing RR) for different levels in predicting final outcome

To assess which risk stratification system had the best predictive value within the low-risk and the intermediate-risk (or high-risk, for the ongoing RR) categories, the 2009 ATA-RR, the M-2009-RSS, and the ongoing RR systems were correlated with the final outcome (NED status) (Table 7).

Table 7.

Status at the End of Follow-Up for the 2009 American Thyroid Association Risk of Recurrence Classification, for the Modified American Thyroid Association 2009 Risk Stratification System, and for the Ongoing Risk of Recurrence

	Final outcome, n (%)
	NED	Indeterminate response	Biochemical incomplete	Structural persistent disease
2009 ATA RR
Low risk (n=69)	58 (84.1)	7 (10.1)	4 (5.8)	—
Intermediate risk (n=80)	40 (50)	21 (26.3)	10 (12.5)	9 (11.2)
M-2009-RSS
Low risk (n=112)	83 (74.1)	17 (15.2)	8 (7.1)	4 (3.6)
Intermediate risk (n=37)	15 (40.5)	11 (29.7)	6 (16.2)	5 (13.6)
Ongoing RR
Low risk (n=127)	94 (74)	22 (17.3)	9 (7.1)	2 (1.6)
High risk (n=22)	4 (18.2)	6 (27.3)	5 (22.7)	7 (31.8)

As expected, 84% of the low-risk patients in the 2009 ATA-RR system, 74% in the low-risk M-2009-RSS, and 74% in the low-risk ongoing RR stratification had NED at final follow-up, without any statistically significant differences (p=0.23 for chi-square test). The frequency of patients with NED differed significantly between the ongoing high-risk category and the 2009 ATA-RR and M-2009-RSS intermediate-risk categories (50%, 40%, and 18% for 2009 ATA-RR, M-2009-RSS, and ongoing RR stratification, respectively; p=0.03 for chi-square test).

On the other hand, the rate of patients with structural incomplete response in the low-risk group was similar among the three classifications (p=0.22 for chi-square test), but with borderline significance for the intermediate risk group for the 2009 ATA-RR and M-2009-RSS with respect to the high-risk ongoing RR category: 11%, 13%, and 32%, for 2009 ATA-RR, M-2009-RSS, and ongoing RR stratification, respectively (p=0.06 for chi-square test).

PPV, NPV, and accuracy of the three risk stratification systems for different levels in predicting structural persistent disease as final outcome

We evaluated the ability of different risk stratification systems to predict structural persistent disease as final outcome by determining PPV, NPV, and accuracy (Table 8).

Table 8.

Positive Predictive Value, Negative Predictive Value, and Accuracy to Predict Structural Persistent Disease in Low and Intermediate Risk of Recurrence as Final Outcome According to the Different Risk Stratification Systems

	PPV [95% CI]	NPV [95% CI]	Accuracy
2009 ATA-RR
Low risk (n=69)	0.0 [0.0–0.0521]^a	0.89 [0.79–0.95]	0.48 [0.39–0.56]
Intermediate risk (n=80)	0.11 [0.05–0.20]	1.00 [0.95–1.00]	0.52 [0.44–0.61]
M-2009-RSS
Low risk (n=112)	0.04 [0.01–0.09]	0.86 [0.71–0.95]	0.24 [0.17–0.32]
Intermediate risk (n=37)	0.13 [0.04–0.29]	0.96 [0.91–0.99]	0.76 [0.68–0.82]
Ongoing RR
Low risk (n=127)	0.02 [0.00–0.06]	0.68 [0.45–0.86]	0.11 [0.07–0.18]
High risk (n=22)	0.32 [0.14–0.55]	0.98 [0.94–0.99]	0.88 [0.82–0.93]

There is uncertainty in only one direction; this range is the 97.5% CI.

NPV, negative predictive value; PPV, positive predictive value.

When patients were classified as low risk by the 2009 ATA-RR, by the M-2009-RSS, and by the ongoing RR, the PPV to predict structural persistent disease was 0%, 3.6%, and 1.6%, respectively (p=0.22 for chi-square test) and the NPV was 88.7%, 86.5%, and 68.2%, respectively (p=0.06 for chi-square test). The 2009 ATA-RR accuracy was significantly higher than the M-2009-RSS and ongoing RR accuracies: 48% [39%–56%], 24% [17%–32%], and 11% [7%–18%], respectively [95% CI] (Table 8).

When patients were classified as intermediate RR by the 2009 ATA-RR and by the M-2009-RSS and high risk by the ongoing RR, the PPV to predict structural persistent disease was 11%, 13%, and 32%, respectively (p=0.06 for chi-square test) and the NPV was 100%, 96%, and 98%, respectively (p=0.22 for chi-square test). The ongoing RR accuracy was significantly higher than the M-2009-RSS and the 2009 ATA-RR accuracies: 88% [82%–93%], 76% [68%–82%], and 52% [44%–61%], respectively [95% CI] (Table 8).

Discussion

By using the 2009 ATA RR, the M-2009-RSS, and the ongoing risk of recurrence prognostic systems to risk stratify the same cohort of 149 low- and intermediate-risk DTC patients treated with total thyroidectomy and RAI ablation at a single thyroid cancer specialty center, we have confirmed again the utility of the ATA 2009 system (5,7,11) and for the first time, demonstrated the clinical utility of the M-2009-RSS. The 2009 ATA RR system has already been validated in cohorts of DTC patients in Argentina (4), Brazil (11), Italy (7), and New York (5) confirming its clinical applicability across a wide spectrum of patients and health care systems.

Our data demonstrate that both the 2009 ATA RR and the M-2009-RSS effectively risk stratify patients with regard to a broad spectrum of clinical outcomes, even though the low-risk category in the M-2009-RSS was expanded to include tumors beyond intrathyroidal PTCs (see Fig. 1). As would be expected, the precise estimates for remission, persistent disease, and recurrence might vary between risk categories based on the specific criteria used to define low risk and intermediate risk. Therefore, for clear communication between clinicians and in research reports, it is important to specify both the risk being referred to and the classification system (with its specific definitions) being used when describing a patient as either intermediate risk or low risk.

With regard to predicting the NED status at the end of follow-up, the 2009 ATA RR and the M-2009-RSS demonstrated similar results (84% vs. 74%, respectively). As can be seen in Table 3, the 2009 ATA RR system classified a larger number of patients as intermediate risk (n=80) than the M-2009-RSS (n=37). This is primarily related to the classification of all N1 patients as intermediate risk in the 2009 ATA RR system. It appears that with inclusion of some of the N1 patients in the low-risk category, the use of the M-2009-RSS resulted in slightly lower rates of NED at final follow-up (74%). In this investigation, we confirm the data from a recent review that have demonstrated that the risk of structural disease recurrence can vary from 4% in patients with fewer than five metastatic lymph nodes, to 5% if all involved lymph nodes are <0.2 cm (9). Given the 0% risk of structural recurrence in the ATA 2009 low-risk category and the 3.6% risk of structural recurrence in the M-2009-RSS low-risk category, we confirm these previous findings.

The ongoing risk of recurrence, proposed initially by Tuttle et al. (5) and validated in 588 DTC patients stratified according to the response to therapy after 2 years of follow-up, was later confirmed by Castagna et al. (7), who showed that the reclassification of DTC patients on the basis of the results observed after initial therapy (total thyroidectomy and RAI ablation), particularly in the intermediate/high-risk patients, was an effective way of classifying patients. This ongoing risk of recurrence was designated as delayed risk of recurrence by these authors, and they concluded that this dynamic staging allowed a better plan to be established for adapting the subsequent follow-up. For example, an excellent response to treatment allows a significant number of patients to be excluded from unnecessary intensive work-up (5,7). The lower accuracy percentages observed in our study might be related to the absence of a high-risk 2009 ATA RR category in our cohort.

There are several other limitations in our study; for instance, each of the individual groups has a low number of patients when divided by risk of recurrence, which might impact the statistical results. As such, the results of this study remain to be validated by larger multicenter studies that can pool data in order to minimize confounders. In addition, every patient in our cohort, even those classified as low risk, was treated with surgery and RAI, a common practice in the past in most countries of Latin America (2). Currently, this practice of care may not apply to many institutions where low-risk patients do not receive RAI. Therefore, additional validation studies are required in patients treated without RAI ablation.

In conclusion, both the 2009 ATA RR and the M-2009-RSS appear to similarly risk stratify patients with regard to the main clinical outcomes (structural incomplete response and NED) even though the M-2009-RSS expanded the definition of low risk to include patients beyond those with intrathyroidal classical papillary thyroid cancer. Furthermore, ongoing RR is a better predictor of the structural incomplete response at final follow-up than static risk predictions established at the time of diagnosis.

Footnotes

Author Disclosure Statement

Fabián Pitoia and R. Michael Tuttle have been consultants for Genzyme-Sanofi. The other authors declare no competing financial interests.

References

1. Cooper, Doherty, Haugen, Kloos, Lee, Mandel, Mazzaferri, McIver, Pacini, Schlumberger, Sherman, Steward, Tuttle. 2009. Revised American Thyroid Association management guidelines for patients with thyroid nodules and differentiated thyroid cancer. Thyroid, 19:1167–1214.

2. Pitoia, Ward, Wohllk, Friguglietti, Tomimori, Gauna, Camargo, Vaisman, Harach, Munizaga, Corigliano, Pretell, Niepomnizcze. 2009. Recommendations of the Latin American Thyroid Society on diagnosis and management of differentiated thyroid cancer. Arq Bras Endocrinol Metabol, 53:884–887.

3. Pacini, Schlumberger, Dralle, Elisei, Smit, Wiersinga. 2006. European Thyroid Cancer Taskforce. European consensus for the management of patients with differentiated thyroid carcinoma of the follicular epithelium. Eur J Endocrinol, 154:787–803.

4. Pitoia, Bueno, Urciuoli, Abelleira, Cross, Tuttle. 2013. Outcomes of patients with differentiated thyroid cancer risk stratified according to the American Thyroid Association and Latin American Thyroid Society risk of recurrence classification systems. Thyroid, 23:1401–1407.

5. Tuttle, Tala, Shah, Leboeuf, Ghossein, Gonen, Brokhin, Omry, Fagin, Shaha. 2010. Estimating risk of recurrence in differentiated thyroid cancer after total thyroidectomy and radioactive iodine remnant ablation: using response to therapy variables to modify the initial risk estimates predicted by the new American Thyroid Association staging system. Thyroid, 20:1341–1349.

6. Vaisman, Shaha, Fish, Tuttle. 2011. Initial therapy with either thyroid lobectomy or total thyroidectomy without radioactive iodine remnant ablation is associated with very low rates of structural disease recurrence in properly selected patients with differentiated thyroid cancer. Clin Endocrinol (Oxf), 75:112–119.

7. Castagna, Maino, Cipri, Belardini, Theodoropoulou, Cevenini, Pacini. 2011. Delayed risk stratification, to include the response to initial treatment (surgery and radioiodine ablation), has better outcome predictivity in differentiated thyroid cancer patients. Eur J Endocrinol, 165:441–446.

8. Vaisman, Tala, Grewal, Tuttle. 2011. In differentiated thyroid cancer, an incomplete structural response to therapy is associated with significantly worse clinical outcomes than only an incomplete thyroglobulin response. Thyroid, 21:1317–1322.

9. Randolph, Duh, Heller, LiVolsi, Mandel, Steward, Tufano, Tuttle. 2012. American Thyroid Association Surgical Affairs Committee's Taskforce on Thyroid Cancer Nodal Surgery. The prognostic significance of nodal metastases from papillary thyroid carcinoma can be stratified based on the size and number of metastatic lymph nodes, as well as the presence of extranodal extension. Thyroid, 22:1144–1152.

10. Hugo, Robenshtok, Grewal, Larson, Tuttle. 2012. Recombinant human TSH-assisted radioactive iodine remnant ablation in thyroid cancer patients at intermediate to high risk of recurrence. Thyroid, 22:1007–1015.

11. Vaisman, Momesso, Bulzico, Pessoa, Dias, Corbo, Vaisman, Tuttle. 2012. Spontaneous remission in thyroid cancer patients after biochemical incomplete response to initial therapy. Clin Endocrinol (Oxf), 77:132–138.

12. Rosário, Ward, Carvalho, Graf, Maciel, Maia, Vaisman; Sociedade Brasileira de Endocrinologia e Metabologia. 2013. Thyroid nodules and differentiated thyroid cancer: update on the Brazilian consensus. Arq Bras Endocrinol Metabol, 57:240–264.

13. Pitoia, Califano, Vázquez, Faure, Gauna, Orlandi, Vanelli, Novelli, Mollerach, Fadel, San Martín, Figari, Cabezón. 2014. Inter Society Consensus for the Management of Patients with Differentiated Thyroid Cancer. Rev Arg Endocrinol Metab, 51:85–118.

14. Bardet, Malville, Rame, Babin, Samama, De Raucourt, Michels, Reznik, Henry-Amar. 2008. Macroscopic lymph-node involvement and neck dissection predict lymph-node recurrence in papillary thyroid carcinoma. Eur J Endocrinol, 158:551–560.

15. Cranshaw, Carnaille. 2008. Micrometastases in thyroid cancer. An important finding? Surg Oncol, 17:253–258.

16.

Leboulleux

, Rubino

, Baudin

, Caillou

, Hartl

, Bidart

, Travagli

, Schlumberger

. 2005. Prognostic factors for persistent or recurrent disease of papillary thyroid carcinoma with neck lymph node metastases and/or tumor extension beyond the thyroid capsule at initial diagnosis. J Clin Endocrinol Metab, 90:5723–5729.

17.

Ito

, Tomoda

, Uruno

, Takamura

, Miya

, Kobayashi

, Matsuzuka

, Kuma

, Miyauchi

. 2006. Minimal extrathyroid extension does not affect the relapse-free survival of patients with papillary thyroid carcinoma measuring 4 cm or less over the age of 45 years. Surg Today, 36:12–18.

18.

Ito

, Tomoda

, Uruno

, Takamura

, Miya

, Kobayashi

, Matsuzuka

, Kuma

, Miyauchi

. 2006. Prognostic significance of extrathyroid extension of papillary thyroid carcinoma: massive but not minimal extension affects the relapse-free survival. World J Surg, 30:780–786.

19.

Moon

, Kim

, Chung

, Yoon

, Kwak

. 2011. Minimal extrathyroidal extension in patients with papillary thyroid microcarcinoma: is it a real prognostic factor?. Ann Surg Oncol, 18:1916–1923.

20.

Nixon

, Ganly

, Patel

, Palmer

, Whitcher

, Tuttle

, Shaha

, Shah

. 2011. The impact of microscopic extrathyroid extension on outcome in patients with clinical T1 and T2 well-differentiated thyroid cancer. Surgery, 150:1242–1249.

21.

Hotomi

, Sugitani

, Toda

, Kawabata

, Fujimoto

. 2012. A novel definition of extrathyroidal invasion for patients with papillary thyroid carcinoma for predicting prognosis. World J Surg, 36:1231–1240.

22.

Shin

, Ha

, Park

, Ahn

, Kim

, Bae

, Kim

, Choi

, Kim

, Bae

, Kim

. 2013. Implication of minimal extrathyroidal extension as a prognostic factor in papillary thyroid carcinoma. Int J Surg, 11:944–947.

23.

Chéreau

, Buffet

, Trésallet

, Tissier

, Golmard

, Leenhardt

, Menegaux

. 2014. Does extracapsular extension impact the prognosis of papillary thyroid microcarcinoma?. Ann Surg Oncol, 21:1569–1664.

24.

Landis

, Koch

. 1977. The measurement of observer agreement for categorical data. Biometrics, 33:159–74.

25.

Galen

. 1982. Application of the predictive value model in the analysis of test effectiveness. Clin Lab Med, 2:685–699.

26.

Royston

, Sauerbrei

. 2008. Multivariable Model-Building: A Pragmatic Approach to Regression Analysis Based on Fractional Polynomials for Modelling Continuous Variables. Wiley, Chichester, UK.

27.

AJCC. 2010. Thyroid In: Edge

, Byrd

, Compton

, Fritz

, Greene

, Trotti

(eds) AJCC Cancer Staging Manual. 7th ed. Springer, New York, NY, pp. 87–96.