Abstract
Introduction:
The 2015 American Thyroid Association (ATA) guidelines recommend response to therapy (RTT) assessment 1–2 years after initial treatment in differentiated thyroid cancer (DTC) patients to guide thyrotropin (TSH) goals and long-term follow-up. We hypothesized that data collected during the first 2 years of follow-up may be sufficient to determine RTT without thyroglobulin (Tg) stimulation.
Materials and Methods:
Patients who were treated with total thyroidectomy and radioiodine for intermediate-risk DTC, were followed for >2 years, and had sufficient follow-up data were included. Data on Tg, ultrasound, scans, and long-term outcomes were collected.
Results:
One-hundred twenty patients met inclusion criteria, with 68% women and mean age 55 ± 15 years. Intermediate risk was due to lymph-node involvement (72%), extrathyroidal extension (51%), vascular invasion (12%), and high-risk histology (9%). At the end of follow-up of 7 ± 4 years, 26% had persistent disease (14% biochemical, 12% structural). According to the ATA RTT system (using stimulated-Tg), 56% had excellent RTT, of whom only 2% had disease at the end of follow-up. In the “nonstimulated” system (which includes basal Tg, post-131I therapy whole-body scan (TxWBS) for assessment of residual lymph-node metastases after surgery, and structural imaging studies), 57% had excellent response, of whom none had disease at the end of follow-up. Only eight patients (7%) were classified differently due to recombinant human thyrotropin stimulation (as either excellent or indeterminate response), with no difference in predictive value, with a receiver–operator characteristic area under the curve of 0.903 with Tg-stimulation and of 0.918 without.
Conclusions:
In patients with no evidence of disease during the first 2 years of follow-up, stimulated Tg adds little prognostic information. We suggest defining excellent RTT based on basal Tg together with TxWBS and structural imaging studies.
Introduction
Response to therapy (RTT) assessment during the first 1–2 years after initial therapy for differentiated thyroid cancer (DTC) is effective in predicting long-term disease recurrence, and was endorsed by the American Thyroid Association (ATA) 2015 guidelines (1). This assessment system, first proposed in 2010 by Tuttle et al. (2) and later validated in multiple studies (3–9), is based on data collected during treatment and follow-up (post-therapy 131I whole-body scan [TxWBS], neck ultrasound [US], basal thyroglobulin [Tg], Tg antibodies [TgAb]), together with recombinant human thyrotropin (rhTSH) stimulated Tg levels. The results of RTT assessment are categorized as excellent, indeterminate, biochemical incomplete, or structural incomplete, and have major implications for patients' follow-up and the need for TSH suppression.
While US and basal Tg + TgAb tests are routinely performed during follow-up, Tg stimulation requires either rhTSH injection or thyroid hormone withdrawal (THW). rhTSH stimulation consumes resources, including medical staff time, patients' lost work hours, and medication cost. Alternatively, THW has a significant impact on the patients' quality of life, and requires repeated blood tests until return to euthyroidism (10). It is therefore important to assess whether Tg stimulation is indeed essential for the accuracy of the RTT assessment tool.
Several previous studies evaluated the prognostic value of the individual tests performed during follow-up, including TxWBS (11), neck US (12), and basal Tg (13,14). However, in clinical practice these tests are evaluated in combination with the later addition of stimulated Tg. Given the good predictive value of the tests obtained during routine follow-up, it is important to evaluate the added value of rhTSH stimulation in predicting future recurrences. Specifically, we question whether patients with normal TxWBS, normal neck US, and undetectable Tg levels require rhTSH stimulation.
Given these uncertainties, we performed a study comparing the ATA RTT assessment tool with the same tool applied without stimulated Tg. We included patients with initial ATA intermediate risk of recurrence, in whom the risk of recurrence is significant and the impact of the RTT on the follow-up approach and the need for TSH suppression is most pronounced.
Materials and Methods
Data for this study were collected from the Rabin Medical Center Thyroid Cancer Registry. We reviewed the electronic medical records of 527 patients diagnosed with intermediate-risk DTC according to ATA initial-risk classification between the years 2000 and 2016, who had total thyroidectomy and RAI ablation (1). We included patients followed for >2 years, age >18 at diagnosis, and who had sufficient data on US, Tg, TgAb, stimulated Tg values, and TxWBS. All patients included in the study were treated with TSH suppressive therapy and had at least one neck US performed during the first year of follow-up and serum Tg and TgAb levels obtained on levothyroxine suppression during the first and second year of follow-up. A stimulated Tg value within the first or second year of follow-up was required for inclusion in the study. Patients with interfering TgAb were excluded.
The Rabin Medical Center is a tertiary referral center providing comprehensive care for patients with thyroid cancer in a multidisciplinary environment, including the ear, nose, and throat, endocrinology, and nuclear medicine departments. In 2005, we started a registry of all patients with nonmedullary thyroid carcinoma followed since 1973 at our Endocrine Institute. Patients entered the registry retrospectively if operated before 2005 and prospectively thereafter. Quality control of the registry data was performed by data verification on several occasions.
Each patient was stratified using the 8th edition of the AJCC/UICC staging system and the ATA risk of recurrence stratification system (only intermediate-risk patients were included). All clinical data obtained during the first and second year of follow-up were used to assess the response to initial therapy as either excellent, indeterminate, or structural/biochemical incomplete response according to the ATA 2015 management guidelines. In addition, a “nonstimulated” excellent RTT was defined as the combination of undetectable basal Tg levels, a normal TxWBS, and normal structural imaging (neck US and, if performed, CT and/or MRI studies) (Table 1). The TxWBS was performed as part of the initial therapy and was included as a tool to assess for residual lymph-node metastases after surgery. Undetectable Tg levels were reported as <0.5 ng/mL until 2003 (10% of included patients) and later as <0.2 ng/mL (90% of included patients).
Description of the Response to Therapy Systems
Parameters that differ between the systems are in bold.
Ninety percent of patients were analyzed using Tg assay with functional sensitivity of 0.2 ng/mL, and 10% with functional sensitivity of 0.5 ng/mL.
Post-therapy 131I whole-body scan performed as part of initial therapy, evaluating the presence of residual lymph-node metastases after surgery.
ATA, American Thyroid Association; RAI, radioiodine; RTT, response to therapy; Tg, thyroglobulin; TgAb, Tg antibodies; TSH, thyrotropin; TxWBS, post-therapy (131I) whole-body scan.
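The “nonstimulated” excellent-response definition given above can be sketched as a simple rule. This is a minimal illustration, not the registry's actual code: the class and field names are hypothetical, and the assay functional sensitivity (0.2 or 0.5 ng/mL, depending on the assay era) is passed in explicitly. TgAb is not checked because patients with interfering TgAb were excluded from the study.

```python
from dataclasses import dataclass


@dataclass
class FollowUpData:
    """Hypothetical container for data from the first 2 years of follow-up."""
    basal_tg_ng_ml: float            # basal Tg on levothyroxine suppression
    assay_sensitivity_ng_ml: float   # 0.2 (later assay) or 0.5 (until 2003)
    txwbs_normal: bool               # post-therapy 131I whole-body scan
    structural_imaging_normal: bool  # neck US, plus CT/MRI if performed


def is_nonstimulated_excellent(data: FollowUpData) -> bool:
    """True when all three 'nonstimulated' excellent-response criteria hold."""
    tg_undetectable = data.basal_tg_ng_ml < data.assay_sensitivity_ng_ml
    return (
        tg_undetectable
        and data.txwbs_normal
        and data.structural_imaging_normal
    )
```

A patient with a basal Tg of 0.1 ng/mL on the 0.2 ng/mL assay, a normal TxWBS, and normal imaging would be classified as excellent response; lymph-node uptake on the TxWBS alone would rule that classification out.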
Patients were usually followed every 6 months during the first year and at 6–12-month intervals thereafter, at the discretion of the treating physician. Follow-up included physical examination, laboratory tests (basal Tg levels, TSH, T4, T3, and TgAb), and imaging studies (US in all patients, and other imaging modalities as required). Tg measurements performed within 1 month of RAI ablation were excluded. Stimulated Tg was evaluated using THW in 27% of patients and rhTSH in 73% of patients. Our stimulation protocol includes basal Tg assessment before the first rhTSH injection or before THW. Patients were considered to have no evidence of disease (NED) at the end of follow-up if they had a basal serum Tg <0.2 ng/mL, normal TgAb levels, and no structural evidence of disease on imaging studies. Recurrence was defined as new biochemical (basal Tg >1 ng/mL and/or stimulated Tg >2 ng/mL) or structural evidence of disease after any period of NED.
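The outcome definitions in the preceding paragraph can be expressed in a few lines. The sketch below is illustrative only: the function and parameter names are hypothetical, and the cutoffs are the ones stated in the text.

```python
from typing import Optional

NED_BASAL_TG_NG_ML = 0.2         # basal Tg must be below this for NED
RECURRENCE_BASAL_TG_NG_ML = 1.0  # basal Tg above this counts as biochemical disease
RECURRENCE_STIM_TG_NG_ML = 2.0   # stimulated Tg above this counts as biochemical disease


def is_ned(basal_tg: float, tgab_normal: bool, structural_disease: bool) -> bool:
    """No evidence of disease: low basal Tg, normal TgAb, clean imaging."""
    return (
        basal_tg < NED_BASAL_TG_NG_ML
        and tgab_normal
        and not structural_disease
    )


def is_recurrence(
    had_ned_period: bool,
    basal_tg: float,
    stimulated_tg: Optional[float],
    structural_disease: bool,
) -> bool:
    """New biochemical or structural disease after any period of NED."""
    biochemical = basal_tg > RECURRENCE_BASAL_TG_NG_ML or (
        stimulated_tg is not None and stimulated_tg > RECURRENCE_STIM_TG_NG_ML
    )
    return had_ned_period and (biochemical or structural_disease)
```

Disease in a patient who never reached NED is not counted as recurrence under this rule, which matches the distinction the text draws between persistent and recurrent disease.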
TSH was determined by a solid-phase, two-site chemiluminescent immunometric assay, run on Immulite 2000 (Siemens Medical Solutions Diagnostics, Los Angeles, CA). Undetectable Tg levels were reported until 2003 as <0.5 ng/mL using a noncompetitive immunoradiometric assay (Sorin, Italy), and later as <0.2 ng/mL (90% of included patients) determined by a solid-phase, chemiluminescent immunometric assay, run on Immulite 2000 (CRM 457, molecular mass 660 kDa; Siemens Medical Solutions Diagnostics).
The study was approved by the local ethics committee.
Statistical analysis
All statistical analyses were performed with SPSS v.25.0 (IBM Corp., Armonk, NY). Associations between two categorical variables were examined using the χ² test and Fisher's exact test. Receiver–operator characteristic (ROC) curves were generated with the excellent, indeterminate, biochemical incomplete, and structural incomplete responses coded as 1, 2, 3, and 4, respectively. A two-sided p-value of <0.05 was considered statistically significant for all analyses.
Results
Patient characteristics
Of the 527 potentially eligible patients, 120 patients had adequate information to allow determination of clinical status throughout their follow-up. Mean age was 55 ± 15 years, with a mean follow-up period of 7 ± 4 years. The majority of patients had papillary thyroid carcinoma (PTC) (81%), were female (68%), and were classified as AJCC/UICC stage I (55%). Patients were classified as intermediate risk due to lymph-node involvement in 72%, extrathyroidal extension in 51%, vascular invasion in 12%, and high-risk histology in 9% (38% of patients had two or more of these features). All patients were treated with total thyroidectomy and radioiodine therapy with a mean activity of 135 ± 30 mCi (Table 2).
Patients' Characteristics
NED, no evidence of disease; SD, standard deviation.
For intermediate-risk patients, the rate of recurrent/persistent disease at the end of follow-up was 26%: 17 patients (14%) had biochemical disease (14 patients with persistent disease and 3 patients with recurrent disease), and 14 patients (12%) had structural disease. No patient died of thyroid cancer during follow-up.
Predictive value of the 2015 ATA RTT system
We first evaluated the predictive value of the 2015 ATA RTT system (Table 3). Among the included patients, 67 patients (56%) had excellent RTT, 12 patients (10%) had biochemical incomplete response, 25 patients (21%) had structural incomplete response, and 16 patients (13%) had indeterminate response (Table 3). Structural disease was detected by neck US in 24 of 25 patients (96%), and by PET/CT in one patient (4%) with level 6 disease.
Performance of the 2015 American Thyroid Association Response to Therapy System
After follow-up of 7 ± 4 years, 98% of patients with excellent RTT had NED, with only one case of biochemical disease (which was a recurrence). In the indeterminate response group, 88% had NED at the end of follow-up, whereas in the biochemical and structural incomplete response groups the rate of NED was low (0% and 36%, respectively); among them, 12 had additional radioiodine therapy and 4 underwent a second surgery. ROC analysis yielded an area under the curve (AUC) of 0.903 (p < 0.001) (Fig. 1).

ROC curves of response to therapy systems. AUC for the 2015 ATA system (solid line) 0.903 [95% CI 0.842–0.964] and for the “nonstimulated” system (dashed line) 0.918 [95% CI 0.868–0.968]. The difference between the systems was not significant. ATA, American Thyroid Association; AUC, area under the curve; CI, confidence interval; ROC, receiver–operator characteristic.
Predictive value of RTT system without Tg stimulation
We examined the performance of the “nonstimulated” system on two parameters: first, the proportion of patients defined as having excellent RTT, and second, the predictive value of the excellent, incomplete, and indeterminate categories. The mean TSH levels under suppression were 0.36 ± 0.63 mIU/L. Using the definitions detailed in Table 1, 68 patients (57%) had excellent RTT (compared with 67 patients [56%] in the ATA system), and 15 patients (12%) had an indeterminate response (compared with 16 patients [13%] in the ATA system) (Table 4). All patients in the biochemical and structural incomplete response groups were categorized identically in the two systems (with or without Tg stimulation).
Performance of “Nonstimulated” Response to Therapy System (Basal Tg, TxWBS, Imaging)
As detailed in Table 1.
At the end of follow-up, all patients with normal TxWBS, normal US, and undetectable basal Tg (excellent response group) were free of disease (Table 4). In the indeterminate response group, 80% had NED at the end of follow-up, 7% had biochemical persistence, and 13% had structural persistence. The results of the incomplete response groups were as expected based on the ATA system. ROC analysis yielded an AUC of 0.918 (p < 0.001) (Fig. 1).
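As a rough check on the two AUC values, the ROC area for an ordinal score can be computed by hand as the Mann-Whitney probability that a patient with disease at the end of follow-up has a higher response category than a patient with NED, with ties counted as one half. The sketch below is not the SPSS procedure used in the study. The per-category counts are an assumption: they are back-calculated from the rounded percentages reported in the text for each response group, not taken from the original tables.

```python
def ordinal_auc(disease_counts, ned_counts):
    """AUC of an ordinal score via the Mann-Whitney statistic.

    Both lists are ordered by increasing score (1 = excellent,
    2 = indeterminate, 3 = biochemical incomplete, 4 = structural incomplete).
    """
    concordant = 0.0
    for i, n_disease in enumerate(disease_counts):
        ned_with_lower_score = sum(ned_counts[:i])
        ned_with_same_score = ned_counts[i]
        # Each disease/NED pair scores 1 if the disease patient has the
        # higher category and 0.5 if both fall in the same category.
        concordant += n_disease * (ned_with_lower_score + 0.5 * ned_with_same_score)
    total_pairs = sum(disease_counts) * sum(ned_counts)
    return concordant / total_pairs


# Counts back-calculated from the reported percentages (assumption).
ata_disease, ata_ned = [1, 2, 12, 16], [66, 14, 0, 9]
nonstim_disease, nonstim_ned = [0, 3, 12, 16], [68, 12, 0, 9]

ata_auc = ordinal_auc(ata_disease, ata_ned)
nonstim_auc = ordinal_auc(nonstim_disease, nonstim_ned)
print(round(ata_auc, 3))      # 0.903
print(round(nonstim_auc, 3))  # 0.918
```

With these counts, both systems sum to 31 patients with disease and 89 with NED (120 in total), and the calculation reproduces the reported AUCs of 0.903 and 0.918.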
In comparison with the 2015 ATA system, 10 patients (8%) were categorized differently by the two systems (with or without rhTSH stimulation) (Table 5). Five patients categorized as excellent response before stimulation had a detectable stimulated Tg (2.5 ± 1.6 ng/mL, range 1.3–5.2 ng/mL), but <10 ng/mL, and were reclassified as indeterminate response according to the ATA system. None of these patients had disease at the end of follow-up. Another five patients who were classified as excellent response according to the ATA classification had an abnormal TxWBS (lymph-node uptake after the initial radioiodine ablation), and in the “nonstimulated” system were categorized as indeterminate response. One of these patients (20%) had structural recurrence during follow-up.
Patients with Discordant Classification Between the 2015 American Thyroid Association and “Nonstimulated” Systems
F, female; FVPTC, follicular variant PTC; LN, lymph-node metastases; M, male; mETE, minimal extrathyroidal extension; PTC, papillary thyroid carcinoma; TxWBS, post-therapy whole-body scan; US, ultrasound; VI, vascular invasion.
To assess whether a higher Tg threshold can be used, we analyzed basal Tg “undetectable” thresholds of <0.5 and <1 ng/mL in the excellent response group (in this analysis, Tg levels between 0.2 ng/mL and each of these thresholds were classified as undetectable). With an undetectable threshold of 0.5 ng/mL, there was no change in the number of patients defined as excellent RTT. With an undetectable threshold of <1 ng/mL, two additional patients were defined as excellent response (instead of indeterminate response). Both had NED at the end of follow-up, leading to a higher ROC analysis AUC of 0.92 (p < 0.001).
Discussion
The RTT assessment system endorsed by the 2015 ATA guidelines is effective in estimating risk of disease persistence or recurrence, and enables personalized adjustment of the follow-up strategy and TSH suppression (1). The most significant impact of the system is in patients with excellent response, who have a very low risk of recurrence, do not require TSH suppression, and require less intensive follow-up with Tg testing every 1–2 years and periodic US (1,15). The study presented here evaluated whether excellent RTT can be defined based on clinical data obtained during the first 2 years of follow-up, without the use of rhTSH stimulation. We demonstrate that in intermediate-risk patients this can be achieved with US, basal Tg, and TxWBS (to assess for residual lymph-node metastases after surgery), with a similar proportion of patients classified as excellent response and a predictive capability similar to that of the ATA system. In fact, all of the 68 patients (57% of the cohort) with “nonstimulated” excellent RTT were free of disease at the end of follow-up, despite a 26% risk of persistent disease in the entire cohort. This is in comparison with the ATA system, which includes stimulated Tg values, in which 67 patients (56%) were categorized as excellent response, with one case of recurrence (2%). Comparison of the two systems using ROC curve AUC showed similar predictive capabilities, with no added benefit of rhTSH stimulation.
The predictive value of stimulated Tg with either THW or rhTSH has been demonstrated in multiple studies (16–28). While most studies analyzed its role as an independent factor, several studies evaluated the combination with US or diagnostic whole-body scans, showing that the combination of stimulated Tg with US is superior to either test alone (19,21,29–32). However, the requirement for routine Tg stimulation has been challenged in recent years. Basal Tg levels measured by second-generation assays proved to be a sensitive tool for Tg detection (13,33–35). Spencer et al. (33) evaluated 1029 paired serum specimens of basal Tg and rhTSH-stimulated Tg values, and found a strong linear correlation between the two tests. When basal Tg was <0.1 ng/mL a positive stimulated Tg >2.0 ng/mL was very unlikely (0.3%), and with a basal Tg <0.2 ng/mL (used in 90% of our study population) only 2.5% had a positive stimulation test. These findings suggest that the routine use of rhTSH stimulation testing would change the classification of disease status in very few patients.
While previous studies suggested a basal Tg threshold as low as 0.15 ng/mL (14), several recent studies demonstrated that even with low-detectable basal Tg levels, the risk of recurrence is very low, and stimulation may not be necessary (34,36). Rosario et al. (34) evaluated the outcomes of 130 low-risk patients with detectable basal Tg of <0.3 ng/mL and normal US, and found only two structural recurrences after a median follow-up of 5 years. Eventual recurrences were detected by basal Tg elevation and/or US, leading the authors to suggest follow-up without Tg stimulation. A similar threshold of 0.27 ng/mL was suggested by Brassard et al. (28), who evaluated 715 patients over a median follow-up of 6.2 years. Verburg et al. (36) evaluated 3176 US examinations performed in 773 patients together with nonstimulated Tg levels. They showed that patients with a Tg <1 ng/mL have an extremely low rate of true-positive US findings, but have a considerable rate of false-positive findings. The authors suggested reserving US examinations only for patients with a Tg level >1 ng/mL, thus questioning the need for the more sensitive detection capability of stimulated Tg (or possibly of ultrasensitive Tg assays). This is supported by our data, in which a Tg threshold <1 ng/mL for excellent RTT was as predictive as a lower Tg threshold of <0.5 or <0.2 ng/mL, although our sample size is too small to suggest changing the Tg threshold from the currently recommended 0.2 ng/mL.
Our study evaluated patients with ATA intermediate risk of recurrence, in whom RTT assessment has the greatest impact on treatment and follow-up. In this group, excellent response decreases the estimated risk of recurrence from 20–30% to <5%, leading to less intensive follow-up and no need for TSH suppression. This is in comparison with patients with low risk who have <5% risk at baseline, and with patients with high risk of recurrence in whom excellent RTT decreases the risk from 60–70% to 15–20%, thus requiring routine follow-up and TSH suppression (2,5,7,8,37). In our study, the risk of recurrence decreased from 26% to 2% with the ATA system, and from 26% to 0% with the “nonstimulated” system. The difference in patient categorization between the two systems was small, with only 10 patients (8%) categorized differently: as indeterminate response based on either Tg stimulation (4%) or an abnormal TxWBS (4%), in patients who would otherwise be defined as excellent response. Of this group, only one patient with an abnormal TxWBS had disease recurrence, and was therefore better identified by the “nonstimulated” system. Indeed, the ROC curve AUC indicated similar predictive capabilities, with the limitation that the number of events was small. Patient categorization as biochemical or structural incomplete response was identical in both systems, as this status was evident based on imaging and/or basal Tg, with no added benefit to Tg stimulation.
The strengths of our study are the inclusion of patients with intermediate risk with an overall risk of recurrence of 26%, in whom the identification of those with excellent response is more challenging (in comparison with low-risk patients), the long follow-up of 7 ± 4 years, and full clinical data on TxWBS, US, and Tg levels. Our study has several limitations. It is a retrospective study using data from a single medical center, which is similar to previous studies evaluating RTT (2,6,8,38–40). Of 527 potentially eligible patients with intermediate-risk DTC, only 120 patients had adequate information for inclusion, which may result in selection bias for patients with better adherence to treatment and follow-up. Nevertheless, the characteristics of the included patients in our study are typical for intermediate-risk patients reported in previous studies, with a mean age of 55 years, a female-to-male ratio of 2.2:1, 81% classic PTC, a mean administered 131I activity of 135 mCi, and persistent/recurrent disease in 26% at the end of follow-up (Table 2). We therefore believe that our results can be generalized to DTC patients with similar characteristics. In addition, TSH stimulation was performed with either THW (27% of patients) or rhTSH (73% of patients), which may have an impact on the obtained Tg concentrations (22). Undetectable Tg levels were reported as <0.2 ng/mL in 90% of included patients, and as <0.5 ng/mL in 10% of included patients (up to 2003), but these two thresholds performed similarly in the RTT systems. These Tg levels were obtained during TSH suppression (mean 0.36 ± 0.63 mIU/L), but a few patients had nonsuppressed TSH levels (range <0.2 to 8.2 mIU/L). Although higher TSH levels in these few patients would potentially lead to fewer patients being classified as “nonstimulated” excellent RTT, this system performed as well as the 2015 ATA system, and might have performed even better had TSH been well suppressed in all patients.
In conclusion, intermediate-risk thyroid cancer patients with normal TxWBS, normal US, and undetectable basal Tg have a very low risk of recurrence. Tg stimulation changes the RTT classification in a very small percentage of patients, and adds little prognostic value. We suggest continued use of the RTT assessment tool, which is very effective in predicting clinical outcomes, using a definition of excellent RTT based on basal Tg levels <0.2 ng/mL.
Footnotes
Acknowledgment
The work was performed in partial fulfillment of the MD thesis requirements of the Sackler Faculty of Medicine, Tel Aviv University.
Author Disclosure Statement
No competing financial interests exist.
Funding Information
No funding was received for this article.
