Abstract
Background:
This study evaluated differences in malignancy rates and consequent follow-up strategies for cytologically benign thyroid nodules before and after the introduction of the Bethesda system, according to the risk stratification categories of four thyroid management guidelines.
Methods:
This retrospective study was approved by our institutional review board. In this study, 1716 thyroid nodules with initially benign cytologic diagnosis at ultrasound-guided fine needle aspiration (US-FNA) in 1695 patients were included: 1187 nodules from the pre-Bethesda period and 529 nodules from the post-Bethesda period. Based on US features, the thyroid nodules were categorized into the final assessment categories of the 2015 American Thyroid Association (ATA), the 2016 American Association of Clinical Endocrinologists, American College of Endocrinology, and Associazione Medici Endocrinologi, the American College of Radiology Thyroid Imaging Reporting and Data System (ACR-TIRADS), and the European Thyroid Association guidelines for Ultrasound Malignancy Risk Stratification systems. Estimated malignancy rates before and after propensity score matching according to follow-up intervals were obtained.
Results:
Of the 1716 thyroid nodules benign on initial US-FNA, the malignancy rate was 3.2% (38 of 1187) in the pre-Bethesda period and 2.6% (14 of 529) in the post-Bethesda period (p = 0.641). The 2015 ATA high suspicion pattern and the ACR-TIRADS category 5 had high estimated malignancy rates of >5% in the post-Bethesda period (6.52% and 8.57% after propensity score matching, respectively). Positive findings that indicated US-FNA in the ACR-TIRADS had estimated malignancy rates of 5.26% and 5.67% in the pre- and post-Bethesda periods after matching, respectively, while positive findings in the remaining guidelines had estimated malignancy rates of <5% in both periods.
Conclusions:
Immediate diagnostic intervention after benign cytologic diagnosis may not be necessary regardless of the cytologic criteria applied, but it can be considered for the highly suspicious categories in the 2015 ATA or the ACR-TIRADS for benign cytologic diagnosis of the Bethesda system.
Introduction
Thyroid nodules are a common medical issue nowadays, with reports showing that ∼19–68% of healthy adults have incidentally detected thyroid nodules on neck ultrasound (US) examinations (1). For thyroid nodules detected on US, US-guided fine needle aspiration (US-FNA) is considered the procedure of choice for further evaluation due to its accuracy and cost-effectiveness (1,2). Approximately 60–70% of thyroid nodules are diagnosed as benign with US-FNA, with the larger benefit of FNA being that it enables the recommendation of conservative management over unnecessary surgery based on reliable benign interpretations (3).
The malignancy rate for benign cytologies given by the Bethesda system is 0–3%, and thus, clinical follow-up is recommended (3,4). However, cytological examinations are not always accurate, as false-negative rates for benign cytology have been reported to be 0.3–18% (5–9). Therefore, management of cytologically benign thyroid nodules varies from clinical follow-up to repeat US-FNA. The malignancy rates for these benign nodules reported in the current literature are mostly from studies done before the introduction of the Bethesda system for thyroid cytopathology (7–10); based on these past studies, the 2015 American Thyroid Association (ATA) guidelines (1) recommended repeat US-FNA at different time intervals according to US patterns after an initial benign cytologic diagnosis. For comparison, the 2016 American Association of Clinical Endocrinologists, American College of Endocrinology, and Associazione Medici Endocrinologi (AACE/ACE/AME) guidelines (2) simply recommended repeat aspiration for thyroid nodules containing suspicious US features. One recent study showed that cytologically benign thyroid nodules have a low malignancy risk when the Bethesda system is applied regardless of US features (11), but little has been investigated on the differences in malignancy rates for cytologically benign thyroid nodules diagnosed before and after introduction of the Bethesda system. Also, most authoritative guidelines for the management of thyroid nodules have recently published revised versions containing risk stratification systems for thyroid nodules with detailed criteria for individual US features (1,2,12). When differences in the reported false-negative rates of benign thyroid nodules are considered according to the presence of suspicious US features (9,10), final outcomes and concurrent management approaches for cytologically benign thyroid nodules may differ according to the chosen risk stratification system.
Thus, we investigated differences in malignancy rates for cytologically benign thyroid nodules that were diagnosed before and after the introduction of the Bethesda system according to the risk stratification categories of four management guidelines, and the consequent appropriate follow-up strategies for these thyroid nodules.
Materials and Methods
This retrospective study was approved by our institutional review board, with a waiver for the requirement of informed consent for reviewing medical records and US images.
Patients
The Bethesda System for Reporting Thyroid Cytopathology (3) was implemented in December 2009 at our institution. Our institutional database was searched for all US-FNA procedures initially performed on thyroid nodules measuring ≥10 mm during two 1-year periods covering March 2008 to February 2009 for the pre-Bethesda period, and March 2013 to February 2014 for the post-Bethesda period. US-FNA was performed on 2879 thyroid nodules measuring ≥10 mm during the pre-Bethesda period, and 1767 thyroid nodules during the post-Bethesda period. Of them, 1912 (66.4%) of the 2879 thyroid nodules and 1083 (58.7%) of the 1767 thyroid nodules were cytologically diagnosed as benign on US-FNA during the pre- and post-Bethesda periods, respectively. Among these cytologically benign nodules, those meeting any of the following criteria were included: (i) nodules that underwent surgery, (ii) nodules with definitive benign or malignant cytology results on follow-up US-FNA or US-core needle biopsy, or (iii) nodules that showed no change in size or decreased size at follow-up US performed at least 1 year after initial benign cytologic results. Finally, 1716 thyroid nodules in 1695 patients were included in this study: 1187 nodules in 1172 patients for the pre-Bethesda period and 529 nodules in 523 patients for the post-Bethesda period (Fig. 1).

Fig. 1. Diagram showing the inclusion of thyroid nodules in the study population. AUS/FLUS, atypia of undetermined significance/follicular lesion of undetermined significance; FN, follicular neoplasm; ND, non-diagnostic; RFA, radiofrequency ablation; US-CNB, ultrasound-guided core needle biopsy; US-FNA, ultrasound-guided fine needle aspiration.
The mean size of the 1716 thyroid nodules was 21.5 mm (standard deviation: 12.0, range: 10–100 mm). The mean age of the 1695 patients was 50.3 years (standard deviation: 12.1, range: 18–85 years). Of the 1695 patients, 1415 (83.5%) were women and 280 (16.5%) were men.
US examinations and image analysis
US examinations were performed with a 7–15 MHz (HDI5000; Philips Medical Systems, Bothell, WA), an 8–15 MHz (Acuson Sequoia; Siemens Medical Solutions, Mountain View, CA), or a 5–12 MHz (iU22; Philips Medical Systems) linear transducer. Real-time US examinations and subsequent US-guided FNA were performed by one of 13 radiologists (3 faculty members who consistently performed US and US-FNA during both periods, 2 fellows during the pre-Bethesda period, and 2 faculty members and 6 fellows during the post-Bethesda period, with 1–20 years of experience in thyroid imaging). US-FNA was performed on the nodule considered most suspicious for malignancy, or on the largest mass when none of the multiple thyroid masses showed any suspicious US feature.
US features of the thyroid nodules were prospectively analyzed and recorded in our institutional database by the radiologist who had performed US examinations according to the following categories during both periods: internal composition, echogenicity, margin, presence of calcifications, and shape (13). Internal composition was classified as solid, mainly solid (solid contents ≥50%), or mainly cystic (solid contents <50%). Echogenicity was classified as hyper- to isoechoic (nodules showing hyper- to isoechogenicity compared with the surrounding thyroid parenchyma), hypoechoic (nodules showing hypoechogenicity compared with the surrounding thyroid parenchyma), or markedly hypoechoic (nodules showing hypoechogenicity compared with the strap muscles). Margin was classified as circumscribed or not circumscribed (i.e., microlobulated or irregular). Calcifications were classified as macrocalcifications including eggshell calcifications, microcalcifications or mixed calcifications, or no calcifications. Shape was classified as parallel or nonparallel (greater in the anteroposterior dimension than the transverse dimension, taller-than-wide shape).
Using the prospectively recorded data, one radiologist (J.H.Y.) with 10 years of experience in thyroid imaging sorted the thyroid nodules into the specific final assessment categories of the four recently published management guidelines for thyroid nodules: the 2015 ATA management guidelines for thyroid nodules (1), the 2016 AACE/ACE/AME guidelines (2), the American College of Radiology Thyroid Imaging Reporting And Data System (ACR-TIRADS) (12), and the European Thyroid Association guidelines for Ultrasound Malignancy Risk Stratification (EU-TIRADS) (14) (Table 1).
Table 1. Indications for Ultrasound-Guided Fine Needle Aspiration According to the Four Management Guidelines for Thyroid Nodules
Suspicious US features defined in the high suspicion pattern.
Reader selects one feature from each of the categories and sums the points to reach a final assessment category.
Published in a multi-institutional analysis study.
2015 ATA, 2015 American Thyroid Association Management Guideline for Thyroid Nodules; 2016 AACE, American Association of Clinical Endocrinologists, American College of Endocrinology, and Associazione Medici Endocrinologi Medical Guidelines; ACR-TIRADS, American College of Radiology Thyroid Imaging Reporting and Data System; EU-TIRADS, European Thyroid Association Guideline for Ultrasound Malignancy Risk Stratification of Thyroid Nodules; US-FNA, ultrasound-guided fine needle aspiration.
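The point-summing scheme described in the table footnote ("selects one feature from each of the categories and sums the points") can be illustrated with a minimal Python sketch. The point values and size thresholds below follow the published ACR TI-RADS scheme as it is commonly summarized, not this study's Table 1, whose body is not reproduced here. The descriptor keys are hypothetical names chosen for illustration, a 1-point total is assumed to fall into TR1, and how the authors mapped their recorded US features onto these descriptors is not stated in the text.

```python
# Illustrative sketch only: ACR TI-RADS points as commonly summarized.
# Descriptor keys are hypothetical; they are not the study's database fields.
ACR_POINTS = {
    "composition": {"cystic": 0, "spongiform": 0, "mixed": 1, "solid": 2},
    "echogenicity": {"anechoic": 0, "hyper_or_iso": 1,
                     "hypoechoic": 2, "very_hypoechoic": 3},
    "shape": {"wider_than_tall": 0, "taller_than_wide": 3},
    "margin": {"smooth": 0, "ill_defined": 0,
               "lobulated_or_irregular": 2, "extrathyroidal_extension": 3},
    "echogenic_foci": {"none_or_comet_tail": 0, "macrocalcifications": 1,
                       "peripheral_rim": 2, "punctate": 3},
}

# Minimum size (mm) at which FNA is suggested for each TR level; TR1/TR2: none.
FNA_THRESHOLD_MM = {3: 25, 4: 15, 5: 10}


def acr_category(features):
    """Sum one descriptor per group and map the total to a TR level (1-5)."""
    points = sum(ACR_POINTS[group][features[group]] for group in ACR_POINTS)
    if points <= 1:          # assumption: a total of 1 is treated as TR1
        level = 1
    elif points == 2:
        level = 2
    elif points == 3:
        level = 3
    elif points <= 6:
        level = 4
    else:
        level = 5
    return points, level


def fna_indicated(level, size_mm):
    """True when the nodule meets the size threshold for its TR level."""
    threshold = FNA_THRESHOLD_MM.get(level)
    return threshold is not None and size_mm >= threshold


# Hypothetical nodule: solid, hypoechoic, otherwise unremarkable, 12 mm.
example = {"composition": "solid", "echogenicity": "hypoechoic",
           "shape": "wider_than_tall", "margin": "smooth",
           "echogenic_foci": "none_or_comet_tail"}
example_points, example_level = acr_category(example)
example_positive = fna_indicated(example_level, size_mm=12)
```

In this sketch, a nodule is "positive" in the sense used later in the Methods when `fna_indicated` returns `True`; the hypothetical 12-mm example totals 4 points (TR4) and falls below the 15-mm threshold.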
US-FNA and cytologic interpretation
US-FNA was performed at least twice on each thyroid nodule by using a 23-gauge needle attached to a 2-mL disposable syringe, without using an aspirator. Local anesthesia was not routinely applied for aspiration procedures. The aspirated material was expelled on glass slides and immediately placed in 95% ethanol for Papanicolaou staining. The remaining material in the syringe was rinsed in saline for cell block processing. One of eight cytopathologists reviewed the prepared slides and provided cytologic diagnosis. Cytopathologists were not present during US-FNA procedures, and additional staining was performed at the cytopathologist's request on a case-by-case basis.
Until December 2009, cytologic reports at our institution were categorized into five categories: (i) malignancy: specimens showing abundant cells with unequivocal cytologic features of malignancy; (ii) suspicious for papillary thyroid carcinoma (PTC): specimens containing cytologic atypia, but insufficient for definite diagnosis of malignancy; (iii) indeterminate: including follicular neoplasm or Hürthle cell neoplasm, specimens showing cytologic features of a monotonous cellular population and scanty colloid, but lacking features of papillary carcinoma; (iv) benign: specimens including colloid nodules, nodular hyperplasia, lymphocytic thyroiditis, and Graves' disease; and (v) inadequate: specimens showing <6 groupings of well-preserved thyroid cells, each consisting of <10 cells per group (15,16). From December 2009, US-FNA cytologic reports were made based on the Bethesda System for Reporting Thyroid Cytopathology (3).
Data and statistical analysis
Histopathology results from surgery and follow-up US-FNA were considered as the reference standard. Nodules followed with US examinations that were stable or had decreased size were considered benign. Thyroid nodules that had the US features or size indications for US-FNA given by the four management guidelines were classified as positive, while nodules that did not were classified as negative.
For the subject-based comparison of demographics and imaging features, the independent two-sample t-test was used to compare continuous variables, and the Chi-square test was used to compare categorical variables. For the nodule-based comparison of demographic and imaging features, the generalized estimating equations (GEE) method was used. Estimated malignancy rates with standard errors were calculated for US categories and for each of the final assessment categories in the four management guidelines, and they were compared with the GEE method. Since the overall population was considerably larger and the follow-up duration was significantly longer in the pre-Bethesda period, propensity score matching was used to construct a cohort of patients with similar follow-up intervals; then, demographics, imaging features, and estimated malignancy rates were compared between the pre- and post-Bethesda periods.
Statistical analysis was performed by using statistical software (SAS version 9.4; SAS Institute, Cary, NC). Two-sided p-values <0.05 were considered to have statistical significance.
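To make the matching step concrete, the following is a minimal Python sketch of 1:1 greedy nearest-neighbor matching without replacement. It is not the authors' SAS procedure: the text states only that propensity score matching was used to build cohorts with similar follow-up intervals, so the matching algorithm, the matching ratio, and the caliper used here are all assumptions. Because follow-up interval is the only matching variable named, the sketch matches directly on that interval as a stand-in for an estimated propensity score, and all identifiers and toy values are hypothetical.

```python
def greedy_match(smaller_group, larger_group, caliper):
    """1:1 nearest-neighbor matching without replacement on a scalar value.

    Each argument is a list of (nodule_id, follow_up_months) tuples.
    Returns a list of (smaller_group_id, larger_group_id) pairs whose
    follow-up intervals differ by no more than `caliper` months.
    """
    available = dict(larger_group)  # id -> follow-up months, still unmatched
    pairs = []
    for nodule_id, months in sorted(smaller_group, key=lambda item: item[1]):
        if not available:
            break
        # Closest still-unmatched candidate from the larger group.
        best_id = min(available, key=lambda k: abs(available[k] - months))
        if abs(available[best_id] - months) <= caliper:
            pairs.append((nodule_id, best_id))
            del available[best_id]
    return pairs


# Hypothetical toy data (months of follow-up), not taken from the study.
post_bethesda = [("post-1", 24.0), ("post-2", 30.0), ("post-3", 12.0)]
pre_bethesda = [("pre-1", 25.0), ("pre-2", 80.0),
                ("pre-3", 29.0), ("pre-4", 60.0)]
matched_pairs = greedy_match(post_bethesda, pre_bethesda, caliper=3.0)
```

Nodules without a sufficiently close counterpart are dropped, which is why a matched cohort is smaller than either original group; greedy matching is order-dependent, and optimal matching is a common alternative.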
Results
Of the 1716 cytologically benign thyroid nodules on initial US-FNA, 1664 (97.0%) were benign and 52 (3.0%) were finally proven malignant. When dividing the thyroid nodules according to the time period, 38 (3.2%) of the 1187 nodules were malignant in the pre-Bethesda period, while 14 (2.6%) of the 529 thyroid nodules were malignant in the post-Bethesda period (p = 0.641). Among the 226 thyroid masses confirmed with surgical resection, 177 were benign (adenomatous hyperplasia in 150, follicular adenoma in 22, Hürthle cell adenoma in 2, and thyroiditis in 3), and 50 were malignant (PTC in 29, follicular variant of PTC in 9, follicular carcinoma in 6, Hürthle cell carcinoma in 2, medullary carcinoma in 2, diffuse large B cell lymphoma in 1, and metastasis from renal cell carcinoma in 1). Two malignant nodules were diagnosed as papillary carcinoma on cytology, but they were not confirmed with surgery since the two patients were referred to other institutions.
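As a rough check on the between-period comparison, the sketch below applies a continuity-corrected (Yates) chi-square test to the 2×2 table built from the counts above (38 of 1187 versus 14 of 529). This is an assumption for illustration: the Methods describe both chi-square tests and the GEE method, and the text does not state which produced the reported p-value. The function name is hypothetical.

```python
import math


def yates_chi2_2x2(a, b, c, d):
    """Continuity-corrected chi-square test for the 2x2 table [[a, b], [c, d]].

    Returns (chi2, p) with 1 degree of freedom.
    """
    n = a + b + c + d
    corrected = max(abs(a * d - b * c) - n / 2, 0.0)
    chi2 = n * corrected ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# Rows: pre- and post-Bethesda; columns: malignant, benign.
chi2, p_value = yates_chi2_2x2(38, 1187 - 38, 14, 529 - 14)
```

Under these assumptions the p-value comes out close to the reported p = 0.641; a model that accounts for multiple nodules within the same patient, such as GEE, could give a slightly different value.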
Demographic features according to the pre- and post-Bethesda periods are summarized in Table 2. Among the 1187 cytologically benign thyroid nodules in the pre-Bethesda period, the mean follow-up interval was 46.7 ± 33.1 months, significantly longer than the mean follow-up interval of 25.4 ± 11.9 months (p < 0.001) for the 529 nodules in the post-Bethesda period. The mean size of the thyroid nodules did not significantly differ between the pre-Bethesda and post-Bethesda periods, before and after propensity score matching (p = 0.202 and p = 0.187, respectively). Significantly higher rates of follow-up US-FNA were seen in the pre-Bethesda period, while higher rates of follow-up US examinations were seen in the post-Bethesda period (p < 0.001 both before and after matching).
Table 2. Demographics of the Cytologically Benign Thyroid Nodules Before and After Propensity Score Matching
Percentages are in parentheses.
Comparison between pre- and post-Bethesda periods.
Estimated malignancy rates for the pre- and post-Bethesda periods according to US features
Table 3 summarizes the estimated malignancy risks of US features in the pre- and post-Bethesda periods. Estimated malignancy rates were higher than the reported false-negative rate of 5% for thyroid nodules with benign cytology (17,18) for marked hypoechogenicity, not circumscribed margins, macro- or eggshell calcifications, and nonparallel shape in the pre-Bethesda period, and for micro- or mixed calcifications in the post-Bethesda period, before and after propensity score matching. When comparing the pre- and post-Bethesda periods, the estimated malignancy rate in the pre-Bethesda period was significantly higher than that in the post-Bethesda period for marked hypoechogenicity and nonparallel shape, before and after matching (all p < 0.05).
Table 3. Comparison of the Estimated Malignancy Risks of Ultrasound Features Between the Cytologically Benign Thyroid Nodules Before and After Propensity Score Matching
Propensity score matching according to follow-up interval.
Microlobulated or irregular margins.
SE, standard error.
Estimated malignancy rates according to the final assessment categories of the four management guidelines
Table 4 summarizes the estimated malignancy rates according to the final assessment categories of the four management guidelines in the pre- and post-Bethesda periods (Supplementary Table S2). Estimated malignancy rates were >5% in the highest suspicion categories of all four management guidelines in the pre-Bethesda period, before and after matching. However, in the post-Bethesda period, estimated malignancy rates were <5% in all final assessment categories of the four management guidelines except for the high suspicion pattern of the 2015 ATA guidelines [5.46%, standard error (SE): 3.06 before matching; 6.52%, SE: 3.64 after matching] and category 5 of the ACR-TIRADS (7.32%, SE: 4.07 before matching; 8.57%, SE: 4.73 after matching). The estimated malignancy rate did not significantly differ between the pre- and post-Bethesda periods for any of the final assessment categories after propensity score matching (all p > 0.05).
Table 4. Comparison of the Estimated Malignancy Risks According to the Final Assessment Categories of the Four Management Guidelines Before and After Propensity Score Matching
Propensity score matching according to follow-up interval.
TR, TIRADS category.
Estimated malignancy rates according to the US-FNA indications of the four management guidelines
Table 5 summarizes the estimated malignancy rates according to the US-FNA indications of the four management guidelines in the pre- and post-Bethesda periods (Supplementary Table S3). Before and after propensity score matching, positive findings indicating US-FNA in the 2015 ATA, 2016 AACE, and EU-TIRADS had estimated malignancy rates <5% in both the pre- and post-Bethesda periods. The estimated malignancy rate for the ACR-TIRADS was 5.05% (SE: 1.05) in the pre-Bethesda period and 5.06% (SE: 1.64) in the post-Bethesda period before matching, and 5.26% (SE: 1.81) in the pre-Bethesda period and 5.67% (SE: 1.97) in the post-Bethesda period after matching. None of the positive findings indicating US-FNA in the four guidelines showed significant differences in the estimated malignancy rates between the pre- and post-Bethesda periods (all p > 0.05).
Table 5. Comparison of the Estimated Malignancy Risks According to the Ultrasound-Guided Fine Needle Aspiration Indications of the Four Management Guidelines Before and After Propensity Score Matching
Propensity score matching according to follow-up interval.
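To put the reported standard errors in context, the sketch below converts an estimated rate and its SE into an approximate 95% interval, using the after-matching ACR-TIRADS values from the text (5.26, SE 1.81 and 5.67, SE 1.97). This is an illustration under stated assumptions: the article does not report confidence intervals, the SEs are assumed to be in percentage points, and the normal (Wald) approximation can be inaccurate for proportions this small. The function name is hypothetical.

```python
def wald_interval(rate_pct, se_pct, z=1.96):
    """Approximate 95% interval for a rate, both arguments in percent."""
    lower = max(rate_pct - z * se_pct, 0.0)    # a rate cannot be negative
    upper = min(rate_pct + z * se_pct, 100.0)  # or exceed 100%
    return lower, upper


# After-matching ACR-TIRADS estimates quoted in the text: (rate, SE).
acr_after_matching = {
    "pre-Bethesda": (5.26, 1.81),
    "post-Bethesda": (5.67, 1.97),
}
intervals = {period: wald_interval(rate, se)
             for period, (rate, se) in acr_after_matching.items()}
```

With these inputs both approximate intervals span the 5% reference value, which is consistent with the text treating the ACR-TIRADS rates as only slightly above that threshold.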
Discussion
Our results show that the estimated malignancy rates according to the US-FNA indications of the 2015 ATA, 2016 AACE, and EU-TIRADS for cytologically benign thyroid nodules are within the false-negative rate of 5% for benign cytology reported in the literature (17,18), regardless of the application of the Bethesda system for cytologic diagnosis. Indications for US-FNA in the currently revised management guidelines incorporate both size and US features (Table 1), and our results show that cytologically benign thyroid nodules with US findings that result in a recommendation for FNA in the four management guidelines have low malignancy rates; this observation supports the view that conservative surveillance can be considered after an initial benign cytologic diagnosis rather than performing additional invasive diagnostic studies. The estimated malignancy rates for the ACR-TIRADS were slightly higher: before matching, 5.05% (SE: 1.05) in the pre-Bethesda period and 5.06% (SE: 1.64) in the post-Bethesda period, and after matching, 5.26% (SE: 1.81) in the pre-Bethesda period and 5.67% (SE: 1.97) in the post-Bethesda period.
The 2015 ATA guidelines do not recommend further immediate diagnostic studies or treatment for thyroid nodules diagnosed as benign on cytology (strong recommendation, high-quality evidence) (1). However, repeat US and FNA are also recommended if suspicious US features are present, because of the discrete false-negative rates of nodules with benign cytology (strong recommendation, moderate-quality evidence) (1). This results in some confusion regarding the management of benign thyroid nodules, as clinicians end up questioning whether immediate intervention should be considered in benign nodules showing suspicious US features. The evidence for recommending repeat examinations for benign thyroid nodules with suspicious US features in the 2015 ATA guidelines is mostly based on studies with cytology results obtained in the pre-Bethesda era (10,19); our study provides malignancy rates after application of the Bethesda system, including those for the 2015 ATA patterns. Our results show that there were no significant differences in overall malignancy rates between the pre- and post-Bethesda periods for each final assessment category in the four management guidelines, supporting the view that benign cytologic diagnosis may not be affected by the cytologic criteria applied for diagnosis. Results of our study also support the recommendation of the 2015 ATA guidelines that repeat aspiration may be considered for thyroid nodules with high suspicion patterns on US diagnosed as benign on US-FNA. Similar results are seen with the ACR-TIRADS; regardless of the cytologic system applied for diagnosis, cytologically benign thyroid nodules assessed as category 5 may be considered for repeat aspiration. In contrast, the malignancy rates of high risk patterns of the 2016 AACE and EU-TIRADS decreased to levels of <5% after application of the Bethesda system (Table 4), but without statistically significant differences compared with the pre-Bethesda period.
The differences in malignancy rates between the various guidelines may have been due to (i) the inclusion of the “markedly hypoechoic” feature in the 2016 AACE and EU-TIRADS, (ii) the scoring system in the ACR-TIRADS giving higher scores for US features considered to have different weights in predicting malignancy, and (iii) the larger size indication for FNA in the ACR-TIRADS for categories 3 and 4 compared with the low to intermediate suspicion/risk patterns. Nodules with a “markedly hypoechoic” feature that do not have other suspicious features such as microcalcifications, not circumscribed margins, or taller-than-wide shape would be triaged into high risk patterns according to the AACE and EU-TIRADS, while they are triaged into the intermediate suspicion pattern when using the 2015 ATA guidelines. Among the “markedly hypoechoic” lesions, 85.7% (24 of 28) in the pre-Bethesda period and 100.0% (5 of 5) in the post-Bethesda period were benign (Supplementary Table S1), which explains the lower malignancy risk in the AACE and EU-TIRADS high risk patterns.
The broad range of malignancy rates for cytologically benign thyroid nodules reported in the literature leads to recommendations for follow-up examinations (5–9) with either US or US-FNA. Prior studies were conducted in diverse settings, in which the diagnostic performances and false-negative results of US-FNA were investigated, with the primary end point being the detection of malignancy. However, thyroid cancer (TC) itself generally progresses very slowly. In one study, the shortest time interval before changes were detected that prompted repeat US-FNA in false-negative cytologically benign thyroid nodules was 2 years, and only 1 cancer showed detectable changes in US features among 852 thyroid nodules classified as benign on US (7). Another prior study also showed that the follow-up interval could be safely extended to 3 years without an increase in mortality or patient harm (20). Based on these results, the focus when evaluating the consequences of false-negative cytologies should shift from the delay in diagnosis to resultant patient mortality, as the immediate detection of missed cancers through thorough surveillance may not contribute to prolonging patient survival. In a recent study, no thyroid-cancer-related deaths were detected among 18 false-negative cases in a cohort followed for a mean of 8.5 years (21). Missed cancers among initial biopsy-proven benign thyroid nodules convey negligible TC mortality, and based on the low malignancy rates of cytologically benign nodules that range from 3.23% to 5.67% according to the US-FNA indications of published management guidelines in our study, the benefit of repeat examinations performed on these patients should be thoroughly questioned.
There are several limitations to this study. First, this study has a retrospective design, in which an inherent selection bias is inevitable. To evaluate the prevalence of malignancy among thyroid nodules with benign cytology and compare the malignancy rates between the four management guidelines, we included patients from 1-year time periods of equal length for the pre- and post-Bethesda periods, which, in turn, resulted in different sample sizes in the two periods and may have affected our results. Second, thyroid nodules in this study were classified into the final assessment categories by using prospectively collected data analyzed individually by radiologists with a wide range in levels of experience (1–20 years) in thyroid imaging. In addition, different radiologists were involved in data acquisition during the two periods. Variability among performers in image analysis may have affected our results, especially when considering the higher percentages of “markedly hypoechogenic” and nonparallel features in the pre-Bethesda period (Supplementary Table S1), but differences among radiologists were not considered in the statistical analysis. Efforts have been made to improve reproducibility by using tools for image quantification as in a recent publication (22), and we anticipate future technical advances enabling objective quantification of imaging features to overcome observer variability. Third, interobserver variability among the 8 cytopathologists who interpreted the slides during the two study periods was not considered. Fourth, the Bethesda system was revised in 2017 (4), and the revised version has not been applied to this study because the prior Bethesda version was used during the study period (3), but our results would not differ much since the 2017 Bethesda system provides no modifications regarding the benign category.
Lastly, only 13.2% (226 of 1716) of the thyroid nodules were confirmed surgically, and results may have differed if only thyroid nodules with surgical diagnosis had been included. However, including only surgically confirmed thyroid nodules may have introduced another selection bias, and the low surgery rate reflects actual clinical practice for benign thyroid nodules.
In conclusion, further diagnostic intervention after benign cytologic diagnosis may not be necessary regardless of the cytologic criteria applied, but it can be considered for nodules with US characteristics in the highly suspicious categories in the 2015 ATA or the ACR-TIRADS.
Footnotes
Author Disclosure Statement
No competing financial interests exist.
Funding Information
No funding was received for this article.
Supplementary Material
Supplementary Table S1
Supplementary Table S2
Supplementary Table S3
