Abstract
Background:
One of the most widely used systems for estimating individual patients' risk of persistent or recurrent differentiated thyroid cancer (DTC) is the risk stratification system of the American Thyroid Association (ATA) guidelines. The 2015 version of this system, which increased the number of patients considered at low or intermediate risk, has been validated in several retrospective, single-center studies. The aims of this study were to evaluate the real-world performance of the 2015 ATA risk stratification system in predicting the response to treatment 12 months after initial treatment and to determine the extent to which this performance is affected by the treatment center in which it is used.
Methods:
We analyzed a prospective cohort of DTC patients recorded in the Italian Thyroid Cancer Observatory web-based database. We reviewed all records in the database and selected consecutive cases that met the following inclusion criteria: (i) histological diagnosis of DTC, excluding noninvasive follicular thyroid neoplasm with papillary-like nuclear features; (ii) complete data on the initial treatment and pathological features; and (iii) results of the 1-year follow-up visit (6–18 months after the initial treatment), including all data needed to classify the response to treatment.
Results:
The final cohort consisted of 2071 patients from 40 centers. The ATA risk of persistent/recurrent disease was classified as low in 1109 patients (53.6%), intermediate in 796 (38.4%), and high in 166 (8.0%). Structural incomplete responses were documented in only 86 (4.2%) patients: 1.5% in the low-risk, 5.7% in the intermediate-risk, and 14.5% in the high-risk group. The baseline ATA risk class was a significant predictor of structural persistent disease for both the intermediate-risk (odds ratio [OR] 4.67; 95% confidence interval [CI] 2.59–8.43) and high-risk groups (OR 16.48; CI 7.87–34.5). The treating center did not significantly influence the prediction of 1-year disease status.
Conclusions:
The ATA risk stratification system is a reliable predictor of short-term outcomes in patients with DTC in real-world clinical settings characterized by center heterogeneity in terms of size, location, level of care, local management strategies, and resource availability.
Introduction
Most differentiated thyroid cancers (DTCs) diagnosed today behave indolently and are associated with very low mortality rates. A more conservative approach to DTC management, with less extensive surgery, more selective use of radioactive iodine, and less intensive follow-up protocols, is therefore recommended (1).
The aim is to avoid subjecting low-risk patients to unnecessary diagnostic procedures and overtreatment without reducing the chances of identifying the rare cancers that are likely to require more aggressive management (2). To facilitate this process, several scientific societies have developed tools for the prognostic stratification of patients with DTC. One of the most widely used was developed by the American Thyroid Association (ATA) to help clinicians estimate individual patients' risk of persistent or recurrent disease (3). Its usefulness in this setting has been demonstrated by several retrospective, single-center studies (4).
A revised version of this system was included in the 2015 ATA guidelines, published in 2016 (1). The criteria for classifying the risk of recurrence as high are substantially more restrictive in this version than in the previous one, so more patients are now considered at low or intermediate risk. The updated stratification system has also been validated in several retrospective cohort studies in different parts of the world. Most of these studies, however, were conducted at single health care facilities serving as referral centers for patients with thyroid cancer (5–7), and it is unclear whether their findings reflect the performance of the system in real-world, heterogeneous clinical settings. New data, ideally from prospective, multicenter studies, are needed to better define how the recent revisions affect the system's ability to predict post-treatment DTC evolution, information that is fundamental for developing cost-effective follow-up strategies.
In this prospective cohort study, we analyzed data on over 2000 DTC cases managed in 40 diverse health care settings in Italy. Our aims were (i) to evaluate the performance of the 2015 ATA risk stratification system in predicting the response to treatment documented ∼12 months after the initial treatment; and (ii) to determine the extent to which this performance is affected by the treatment center itself.
Methods
The Italian Thyroid Cancer Observatory (ITCO) web-based database was opened in 2013 at the Thyroid Cancer Center of the Sapienza University of Rome (the network's coordinating center). Since then, the network has expanded to include 49 other thyroid cancer centers across Italy (8). The database now includes prospectively collected data on nearly 7000 patients with histologically confirmed differentiated, medullary, poorly differentiated, or anaplastic thyroid cancer. Cases are entered in the database at the time of initial treatment in the reporting ITCO center, or when the patient begins follow-up in the reporting center within 12 months of initial treatment in a non-ITCO center.
Each case record contains information on patient demographics and biometrics, circumstances of the diagnosis, tumor pathology, surgical and radioactive iodine (RAI) treatments, and the results of periodic follow-up examinations. The ITCO provides no guidance or restrictions on patient management to the participating centers, since the database is designed to capture real-world practice. Sensitive patient data are encrypted, and the database is managed anonymously for statistical analysis.
For the purposes of the present study, we reviewed all records in the database and selected consecutive cases that met the following criteria: (i) histological diagnosis of DTC, i.e., papillary thyroid cancer (PTC), follicular thyroid cancer (FTC), or poorly differentiated thyroid cancer and their variants, excluding noninvasive follicular thyroid neoplasm with papillary-like nuclear features (NIFTP); (ii) availability of all information on the initial treatment and pathological characteristics of the tumor required for the ATA recurrence risk assessment; and (iii) results of the 1-year follow-up visit (carried out 6–18 months after the initial treatment), including all data needed to classify the response to treatment.
For each case, we recorded the following information.
Initial treatment
Treatment of the primary tumor was classified as thyroid lobectomy or total thyroidectomy. The latter category also included patients who underwent completion thyroidectomy after thyroid lobectomy. For all patients who had total thyroidectomy, we also recorded whether radioactive iodine remnant ablation (RRA) was performed. Cervical lymph node dissection, when performed, was described as central compartment dissection, lateral compartment dissection, or both.
Risk of persistent or recurrent disease
The estimated level of risk was determined by the study team according to the 2009 ATA guidelines (3) and the relevant modifications in the 2015 update (1). Classification was based on the data available immediately after the initial treatment.

Risk of persistent or recurrent disease according to the ATA risk stratification system. ATA, American Thyroid Association; ETE, extrathyroidal extension; PTC, papillary thyroid cancer; RAI, radioactive iodine.
Responses to the initial treatment
These were classified as excellent, biochemical incomplete, structural incomplete, or indeterminate on the basis of data collected during the clinical evaluation carried out at the 1-year follow-up visit. These data included imaging findings (cervical ultrasound in all patients, and RAI scintigraphy in selected individuals), basal or stimulated serum thyroglobulin (Tg) levels, and anti-Tg antibody (TgAb) levels. Additional imaging studies were performed at the clinicians' discretion. The results were classified as specified in the ATA guidelines (1) for patients who had undergone thyroidectomy followed by RRA, and as advocated by the European Society for Medical Oncology (9) for those whose initial treatment consisted of surgery alone (thyroidectomy or lobectomy) (Supplementary Table S1).
Cervical lymph nodes with highly suspicious features on ultrasonography, as defined by the European Thyroid Association guidelines (10), were considered imaging evidence of persistent disease; those displaying low-suspicion features were classified as nonspecific imaging findings (11). Suspicious findings of other imaging studies were classified by the treating physicians. The presence of structural disease at the 1-year evaluation was considered persistent disease (1,12).
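To make the response categories concrete, the sketch below encodes them as a small Python function for the total thyroidectomy + RRA setting only. It assumes the 2015 ATA cutoffs (suppressed Tg < 0.2 ng/mL or stimulated Tg < 1 ng/mL for an excellent response; suppressed Tg ≥ 1 ng/mL, stimulated Tg ≥ 10 ng/mL, or rising TgAb for a biochemical incomplete response). The study's exact criteria, including the ESMO-based criteria for patients treated with surgery alone, are given in Supplementary Table S1 and are not reproduced here. The function name `classify_response`, the string labels, and the order in which ambiguous combinations are resolved are hypothetical choices for illustration.

```python
from typing import Optional

IMAGING = {"negative", "nonspecific", "structural"}
TGAB_TRENDS = {"absent", "stable_or_declining", "rising"}


def classify_response(
    imaging: str,
    suppressed_tg: Optional[float] = None,
    stimulated_tg: Optional[float] = None,
    tgab_trend: str = "absent",
) -> str:
    """Hypothetical helper: 2015 ATA response category after TT + RRA.

    imaging: "structural" (highly suspicious findings / evidence of disease),
             "nonspecific" (low-suspicion findings), or "negative".
    suppressed_tg / stimulated_tg: serum Tg in ng/mL, None if not measured.
    tgab_trend: anti-Tg antibody status.
    """
    if imaging not in IMAGING or tgab_trend not in TGAB_TRENDS:
        raise ValueError("unknown imaging or TgAb category")
    if imaging == "structural":
        # Structural or functional evidence of disease, whatever the Tg level.
        return "structural incomplete"
    if imaging == "nonspecific":
        # Nonspecific imaging findings (e.g., low-suspicion lymph nodes).
        return "indeterminate"
    if suppressed_tg is None and stimulated_tg is None and tgab_trend == "absent":
        raise ValueError("need at least one Tg value or a TgAb trend")
    # From here on, imaging is negative.
    if tgab_trend == "rising":
        return "biochemical incomplete"
    if (suppressed_tg is not None and suppressed_tg >= 1.0) or (
        stimulated_tg is not None and stimulated_tg >= 10.0
    ):
        return "biochemical incomplete"
    if tgab_trend == "stable_or_declining":
        # Stable or declining TgAb without other abnormalities.
        return "indeterminate"
    # Assumption: every available Tg value must be below its excellent cutoff.
    below_excellent_cutoffs = [
        value < cutoff
        for value, cutoff in ((suppressed_tg, 0.2), (stimulated_tg, 1.0))
        if value is not None
    ]
    if below_excellent_cutoffs and all(below_excellent_cutoffs):
        return "excellent"
    # Detectable Tg below the biochemical-incomplete cutoffs.
    return "indeterminate"
```

For example, `classify_response("negative", suppressed_tg=0.5)` falls into the indeterminate category, whereas the same Tg level with highly suspicious lymph nodes would be classified as structural incomplete.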
Statistical analyses
In the descriptive analysis, continuous variables were expressed as medians with interquartile ranges and nominal variables as frequency counts and percentages. To model the response to treatment, we used a cumulative link model, which is routinely used for ordinal categorical response data. The ordinal response was ranked in descending order of desirability as excellent, indeterminate, biochemical incomplete, or structural incomplete. The link function was the logit, i.e., the log of the ratio between the probability of not exceeding a given category and the probability of exceeding it, and the predictors entered the model on a linear scale.
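The cumulative logit link can be illustrated with a short Python sketch that converts thresholds θ₁ < θ₂ < θ₃ and a linear predictor η = x'β into probabilities for the four ordered response categories, using the parameterization logit P(Y ≤ j) = θ_j − η adopted by the R package ordinal. The threshold and η values below are arbitrary illustrative numbers, not estimates from this study. Under this sign convention, a positive β shifts probability toward the less desirable categories, and exp(β) is the odds ratio for being beyond any given category.

```python
import math
from typing import List, Sequence

# Ordered from most to least desirable, as in the study.
CATEGORIES = ["excellent", "indeterminate", "biochemical incomplete", "structural incomplete"]


def logistic(x: float) -> float:
    """Inverse of the logit link."""
    return 1.0 / (1.0 + math.exp(-x))


def category_probabilities(thresholds: Sequence[float], eta: float) -> List[float]:
    """P(Y = j) under logit P(Y <= j) = theta_j - eta (cumulative logit model).

    thresholds: K-1 strictly increasing cut points theta_1 < ... < theta_{K-1}.
    eta: linear predictor x'beta (plus any random intercept) for one patient.
    """
    if any(b <= a for a, b in zip(thresholds, thresholds[1:])):
        raise ValueError("thresholds must be strictly increasing")
    # Cumulative probabilities P(Y <= j); the last category closes at 1.
    cumulative = [logistic(theta - eta) for theta in thresholds] + [1.0]
    # Successive differences give the individual category probabilities.
    return [c - prev for prev, c in zip([0.0] + cumulative[:-1], cumulative)]


# Illustrative (made-up) thresholds; eta = 0 vs. a patient with eta = 1.5.
theta = [1.5, 2.5, 3.5]
for eta in (0.0, 1.5):
    probs = category_probabilities(theta, eta)
    print(eta, {k: round(p, 3) for k, p in zip(CATEGORIES, probs)})
```

Because the same β applies at every threshold (the proportional odds assumption), one coefficient per risk class summarizes the shift across all four response categories.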
Given the hierarchical structure of the data, with patients nested within treatment centers, we used a mixed-effects specification with a center-specific random intercept summarizing unobserved center-specific characteristics. To allow for association between the observed covariates and the unobserved center features captured by the random intercepts, we also included in the model the center-level mean of each covariate, computed over the patients from that center. The likelihood integral was approximated with the Laplace approximation, using the R package ordinal (13). We also analyzed a binary response (excellent vs. structural incomplete responses only) with a mixed logit model, estimated with the R package lme4 (14).
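Written out, the mixed-effects models described above can be sketched as follows. The notation is ours, and the equations are a reconstruction from the verbal description rather than formulas given by the authors. Patient i is treated at center k; x_ik holds the patient's covariates (e.g., indicators for intermediate and high ATA risk); x̄_k is the mean of those covariates over the n_k patients of center k; and u_k and v_k are center random intercepts. In the binary model, Z_ik = 1 denotes a structural incomplete response and Z_ik = 0 an excellent response.

```latex
\operatorname{logit} P(Y_{ik} \le j)
  = \log \frac{P(Y_{ik} \le j)}{P(Y_{ik} > j)}
  = \theta_j - \left( \mathbf{x}_{ik}^{\top} \boldsymbol{\beta}
      + \bar{\mathbf{x}}_{k}^{\top} \boldsymbol{\gamma} + u_k \right),
  \qquad j = 1, 2, 3,

\bar{\mathbf{x}}_{k} = \frac{1}{n_k} \sum_{i=1}^{n_k} \mathbf{x}_{ik},
  \qquad u_k \sim \mathcal{N}(0, \sigma_u^2),

\operatorname{logit} P(Z_{ik} = 1)
  = \alpha + \mathbf{x}_{ik}^{\top} \boldsymbol{\delta}
      + \bar{\mathbf{x}}_{k}^{\top} \boldsymbol{\lambda} + v_k,
  \qquad v_k \sim \mathcal{N}(0, \sigma_v^2).
```

Adding the center means x̄_k is the approach commonly known as the Mundlak device: it relaxes the assumption that the covariates are uncorrelated with the center random intercepts. The Laplace approximation handles the integral over u_k (or v_k) that appears in each center's likelihood contribution.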
All statistical analyses were performed with the R statistical software (R Core Team, 2017) (15).
Results
Of the 6867 case records in the database at data lock (2019), 1452 (21.1%) were excluded because the histological diagnosis was incomplete or did not meet the inclusion criteria (i.e., medullary thyroid cancer, anaplastic thyroid cancer, NIFTP, or tumors of unknown malignant potential), and 148 more were excluded because one or more items essential for estimating the risk of recurrence were missing. A further 3158 cases were excluded because the patient had not yet undergone the 1-year follow-up assessment, and 38 because the 1-year follow-up data needed to classify the treatment response were lacking. The final cohort thus consisted of 2071 patients followed at 40 ITCO centers (Table 1).
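The exclusion cascade can be checked with a few lines of Python. All counts are taken directly from the text; the names `cohort_flow`, `TOTAL_RECORDS`, and `EXCLUSIONS` are ours.

```python
# Exclusion cascade from the ITCO database (counts as reported in the text).
TOTAL_RECORDS = 6867
EXCLUSIONS = [
    ("histology incomplete or ineligible (MTC, ATC, NIFTP, unknown malignant potential)", 1452),
    ("missing items needed to estimate the ATA recurrence risk", 148),
    ("1-year follow-up assessment not yet performed", 3158),
    ("1-year data insufficient to classify the treatment response", 38),
]


def cohort_flow(total: int, exclusions: list) -> list:
    """Return (reason, n_excluded, percent_of_total, n_remaining) for each step."""
    steps, remaining = [], total
    for reason, n in exclusions:
        remaining -= n
        steps.append((reason, n, round(100 * n / total, 1), remaining))
    return steps


for reason, n, pct, remaining in cohort_flow(TOTAL_RECORDS, EXCLUSIONS):
    print(f"-{n:>5} ({pct:>4}%)  {reason}; remaining: {remaining}")
```

The last step leaves the 2071 patients of the final cohort, and the first step reproduces the 21.1% reported for histology-based exclusions.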
Table 1. Clinical and Demographic Features of the Study Cohort
The categories “total thyroidectomy + RRA” and “total thyroidectomy” each include some cases of “completion thyroidectomy” performed after thyroid lobectomy.
Before publication of the 8th edition of the AJCC TNM staging system, many pathology reports did not include details of muscle invasion.
IQR, interquartile range; ITCO, Italian Thyroid Cancer Observatory; RRA, radioactive iodine remnant ablation.
The ATA risk of persistent/recurrent disease was classified as low in 1109 patients (53.6%), intermediate in 796 (38.4%), and high in 166 (8.0%). Treatment responses observed at the 1-year follow-up visit are summarized in Table 2. Overall, structural incomplete responses were documented in 86 (4.2%) patients. Their frequency increased progressively with the baseline risk level, from 1.5% in the low-risk group to 5.7% in the intermediate-risk group and 14.5% in the small subset of patients at high risk of persistent/recurrent disease (Fig. 2).

Prevalence of patients classified as low, intermediate, and high risk, and their rates of structural persistent disease.
Table 2. Responses to Treatment at 1-Year Evaluation
Stability of thyroglobulin values could not be documented at the first evaluation.
RRA, radioactive iodine remnant ablation; TL, thyroid lobectomy; TT, total thyroidectomy.
As shown in Table 3, the ATA risk class assigned at baseline was a significant predictor of the response to treatment observed at the 1-year follow-up visit, distinguishing the presence of structural disease from an excellent response. Furthermore, patients classified as intermediate or high risk had a significantly higher probability of a “less-than-excellent” response (i.e., indeterminate, biochemical incomplete, or structural incomplete, in decreasing order of desirability) (Table 4).
Table 3. Subgroup Analysis of 1662 Patients with Excellent or Structural Incomplete Responses at 1 Year
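The odds ratios given in the Abstract for structural persistent disease (intermediate risk: OR 4.67, 95% CI 2.59–8.43; high risk: OR 16.48, 95% CI 7.87–34.5) can be mapped back to the log-odds scale on which the mixed logit model is fitted. The sketch below assumes that the intervals are Wald intervals, exp(β ± 1.96·SE); the reported values are consistent with this, since the midpoint of each interval on the log scale reproduces the point estimate. The function name `log_odds_from_or_ci` is hypothetical.

```python
import math

Z_975 = 1.959964  # 97.5th percentile of the standard normal distribution


def log_odds_from_or_ci(or_point: float, ci_low: float, ci_high: float) -> dict:
    """Back-calculate beta = log(OR) and its SE from a 95% Wald CI for the OR."""
    log_low, log_high = math.log(ci_low), math.log(ci_high)
    return {
        "beta": math.log(or_point),
        "se": (log_high - log_low) / (2 * Z_975),
        # exp of the CI midpoint on the log scale; should match the reported OR.
        "or_from_ci": math.exp((log_low + log_high) / 2),
    }


intermediate = log_odds_from_or_ci(4.67, 2.59, 8.43)
high = log_odds_from_or_ci(16.48, 7.87, 34.5)
for label, res in (("intermediate", intermediate), ("high", high)):
    print(label, {k: round(v, 3) for k, v in res.items()})
```

Working on the log scale is what makes the intervals symmetric; on the OR scale they are visibly skewed to the right, especially for the smaller high-risk group.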
Likelihood of structural disease according to the estimated risk of persistent disease calculated at baseline.
Versus excellent response.
CI, confidence interval; OR, odds ratio; SE, standard error.
Table 4. Likelihood of Less-Than-Excellent Response (All Four Classes; Ordinal Analysis) According to the Estimated Risk of Persistent Disease Calculated at Baseline
ATA, American Thyroid Association.
We also assessed whether the performance of the baseline persistent disease risk estimate was significantly influenced by the practices of individual reporting centers, which included both academic and nonacademic health care facilities throughout Italy (Table 1). Some potential sources of bias (e.g., case mix, surgical volumes, and the diagnostic tools used) are difficult to document but may influence both the initial risk estimate and the subsequent assessment of the response to treatment. The mixed-effects model accounted for these with a center-specific intercept summarizing unobserved center-specific features. The practices of individual recruiting centers did not influence the prediction of 1-year status by the ATA risk stratification system (coefficient −0.88 ± 1.53, p = 0.57, in intermediate-risk patients, and −0.77 ± 2.19, p = 0.72, in high-risk patients, for the prediction of structural disease).
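Assuming that the values after ± are standard errors and that the p-values come from two-sided Wald z-tests (neither is stated explicitly), the reported p-values can be reproduced, to within the rounding of the published coefficients, with the Python standard library. The function name `wald_two_sided_p` is ours.

```python
import math


def wald_two_sided_p(coef: float, se: float) -> float:
    """Two-sided p-value for H0: coef = 0, using z = coef / se ~ N(0, 1).

    erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)), where Phi is the standard normal CDF.
    """
    z = coef / se
    return math.erfc(abs(z) / math.sqrt(2))


p_intermediate = wald_two_sided_p(-0.88, 1.53)  # reported p = 0.57
p_high = wald_two_sided_p(-0.77, 2.19)          # reported p = 0.72
print(round(p_intermediate, 3), round(p_high, 3))
```

Both |z| values are well below 1.96, which is why neither coefficient reaches significance at the 5% level.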
Discussion
A reliable estimate of the post-treatment risk of persistent or recurrent disease in a patient with DTC on the basis of clinical, histopathological, and perioperative data provides valuable prognostic information. Importantly, it supports clinicians' efforts to develop personalized treatment and follow-up strategies (4,16). Most cases can be safely managed with less extensive surgery, more selective use of RAI therapy, and relatively relaxed follow-up schedules. The expected benefits are substantial and include reduced health care costs, lower treatment-related morbidity rates, and improved quality of life for patients. These expected benefits must, however, be weighed against the risk of missing those thyroid cancers that warrant intensive therapeutic efforts and close post-treatment surveillance due to their intrinsic biologic aggressiveness.
The ATA risk stratification system has been validated in different cohorts around the world (17–20). The 2015 update, published in 2016, added the number of foci of vascular invasion, the number and size of involved lymph nodes, the presence of extranodal extension, and (if available) BRAF and TERT promoter mutational status. This updated version has already been validated (5–7,21–23). However, all of these validation studies were retrospective and were conducted in a few high-volume thyroid cancer referral centers. The present study used a large, contemporary cohort of patients with prospectively collected data from many thyroid centers across Italy, including academic and nonacademic institutions, to validate the ATA risk stratification system in predicting persistent disease at the 1-year follow-up visit.
Interinstitutional and interobserver variability has been reported in the diagnosis of histological subtypes (24), the detection and quantification of extrathyroidal extension (25), neck ultrasonographic examination (26), and various aspects of RAI administration, including indications, the administered activity, and the method used to ensure appropriately elevated TSH levels at the time of RAI therapy (withdrawal of thyroid hormone replacement therapy vs. rhTSH injections) (27). Treatment centers also vary widely in the number of thyroidectomies performed each year, and surgical volume is a well-established predictor of outcome (28,29). Furthermore, different institutions use different Tg and TgAb assays.
None of these potential confounders could be systematically documented with the information available in our database, but it is reasonable to suspect that some of them influenced both the initial risk estimates in our cohort and the subsequent assessment of the response to treatment. By using a mixed-effects model, we accounted for unobserved center-specific features and found that the performance of the ATA risk stratification system was not affected by the center in which it was applied.
Our findings show that the ATA risk stratification system for recurrent/persistent disease is a reliable predictor of disease status at the 1-year follow-up evaluation, independent of the treatment center. This holds even though the likelihood of a “less-than-excellent” response varies across treatment centers, probably because of between-center differences in surgical volumes, case mix, availability of diagnostic tools, and/or other factors.
It should be stressed that our findings apply only to the prediction of the response to initial treatment documented at the 1-year visit. Risk stratification is a dynamic, ongoing process in which the likelihood of recurrence is periodically reassessed and the management strategy modified as needed (30). Our findings provide no indication of how the system will perform in predicting the longer-term course of DTC. Notably, however, most DTC recurrences are identified within the first 5 years after initial treatment (31). Moreover, recent evidence suggests that persistent disease observed at the 1-year follow-up visit is associated with worse outcomes than “recurrences” identified later (12). Predicting this early outcome may therefore be of particular clinical relevance.
One limitation of our study is the inclusion of non-PTC cases in our cohort, particularly FTCs and Hürthle-cell thyroid carcinomas. There is growing evidence that these tumors behave differently from each other and from PTCs (24). Because these tumors represented only 5.2% of the DTCs in our cohort, our findings shed no light on the specific performance of the ATA risk stratification system in patients with these less common histologic subtypes.
In conclusion, the ATA risk stratification system is a reliable predictor of short-term outcomes in patients with DTC in real-world clinical settings characterized by appreciable treatment-center heterogeneity in terms of size, location, level of care, diagnostic resources, and local management strategies.
Footnotes
Acknowledgments
We thank all the collaborators of the ITCO Network: Rosa Falcone, Valeria Ramundo, Marco Biffoni, Laura Giacomelli (Department of Translational and Precision Medicine, and Department of Surgical Sciences, Sapienza University of Rome); Michela Massa (Department of Medical Sciences, Fondazione IRCCS Casa Sollievo della Sofferenza, S. Giovanni Rotondo); Efisio Puxeddu (Department of Medicine, University of Perugia); Paolo Limone (Division of Endocrinology, Diabetology, and Metabolism, Mauriziano Umberto I Hospital, Turin); Armando Patrizio (Department of Clinical and Experimental Medicine, University of Pisa); Poupak Fallahi (Department of Translational Research of New Technologies in Medicine and Surgery, University of Pisa); Michela Marina (Department of Medicine and Surgery, Azienda Ospedaliero-Universitaria and University of Parma); Ilaria Messuti (Department of Oncology, Division of Endocrinology and Metabolism, Humanitas-Gradenigo Hospital, University of Turin); Giovanni Savoia, Piernicola Garofalo (Division of Endocrinology, Cervello Hospital, Palermo); Maria Grazia Deiana, Federica Presciuttini (Department of Endocrinology, AOU Sant'Andrea, Sapienza University of Rome); Marco Centanni, Camilla Virili (Department of Medico-surgical Sciences and Biotechnologies, Sapienza University of Rome); Valeria Calzolaro (Geriatrics Unit, Department of Clinical and Experimental Medicine, University of Pisa); Maria Beatrice Panarotto (ASST-Spedali Civili, Brescia); Ezio Ghigo (Division of Endocrinology, Diabetology and Metabolism, Department of Medical Sciences, University of Turin); Andrea Palermo (University Hospital Campus Bio-Medico, Rome); Salvatore Tumino (Department of Clinical and Experimental Medicine, University of Catania); Gianluca Aimaretti (Department of Translational Medicine, Università del Piemonte Orientale, Novara); Maria Grazia Chiofalo, Vincenzo Marotta (Istituto Nazionale dei Tumori, IRCCS Fondazione Pascale, Naples); Annamaria D'Amore (Division of Endocrine Surgery, Fondazione Policlinico Gemelli, Catholic University, Rome); Alice Nervo, Marco Gallo, Alessandro Piovesan, Alberto Ragni, Francesco Felicetti (Department of Medical Sciences, Molinette Hospital, University of Turin); Luca Chiovato, Martina Molteni, Giulia Bendotti (Unit of Internal Medicine and Endocrinology, Laboratory for Endocrine Disruptors, Department of Internal Medicine and Therapeutics, Istituti Clinici Scientifici Maugeri IRCCS, University of Pavia); Lorenzo Bresciani, Laura Locati (Istituto Nazionale dei Tumori di Milano).
Author Disclosure Statement
No competing financial interests exist.
Funding Information
Writing support was provided by Marian Everett Kent, BSN, and funded by the Fondazione Umberto Di Mario. The study was supported by the Sapienza University of Rome Research Grant (RM11916B83A211FC) to C.D.
Supplementary Material
Supplementary Table S1
