Abstract
Background:
For cytologically benign thyroid nodules with very low to intermediate suspicion ultrasound patterns, optimal ultrasound follow-up intervals and outcomes of discontinuing follow-up are unclear.
Methods:
Ovid MEDLINE, Embase, and Cochrane Central were searched through August 2022 for studies comparing different ultrasound follow-up intervals and discontinuation versus continuation of ultrasound follow-up. The population was patients with cytologically benign thyroid nodules and very low to intermediate suspicion ultrasound patterns, and the primary outcome was missed thyroid cancers. Utilizing a scoping approach, we also included studies that were not restricted to very low to intermediate suspicion ultrasound patterns or evaluated additional outcomes such as thyroid cancer-related mortality rate, nodule growth, and subsequent procedures. Quality assessment was performed, and evidence was synthesized qualitatively.
Results:
One retrospective cohort study (n = 1254; 1819 nodules) compared different first follow-up ultrasound intervals for cytologically benign thyroid nodules. There was no difference between >4- versus 1- to 2-year intervals to first follow-up ultrasound in the likelihood of malignancy (0.4% [1/223] vs. 0.3% [2/715]), and no cancer-related deaths occurred. Follow-up ultrasound at >4 years was associated with increased likelihood of ≥50% nodule growth (35.0% [78/223] vs. 15.1% [108/715]), repeat fine needle aspiration (19.3% [43/223] vs. 5.6% [40/715]), and thyroidectomy (4.0% [9/223] vs. 0.8% [6/715]). The study did not describe ultrasound patterns or control for confounders, and analyses were based on interval to first follow-up ultrasound only. Other methodological limitations were not controlling for variability in follow-up duration and unclear attrition. The certainty of evidence was very low. No study compared discontinuation of ultrasound follow-up versus continuation.
Conclusions:
This scoping review found that evidence comparing different ultrasound follow-up intervals in patients with benign thyroid nodules is limited to one observational study, but suggests that the subsequent development of thyroid malignancies is very uncommon regardless of follow-up interval. Longer follow-up may be associated with more repeat biopsies and thyroidectomies, which could be related to more interval nodule growth that meets thresholds for further evaluation. Research is needed to clarify optimal ultrasound follow-up intervals for low to intermediate suspicion cytologically benign thyroid nodules and outcomes of discontinuing ultrasound follow-up for very low suspicion nodules.
Introduction
Thyroid nodules are common. Prevalence estimates in the general population vary according to the method of identification: by palpation, 2% to 7%; by ultrasound, 20% to 70%; and based on autopsy, 8% to 65%. 1,2 On initial evaluation, the majority (86–95%) of thyroid nodules are benign. 3,4 Fine needle aspiration (FNA) cytology has a low false-negative rate for cancer; therefore, the likelihood of a missed cancer following benign FNA cytology results is low—but not zero. 3 Optimal approaches for identifying missed cancers aim to balance the potential yield of different follow-up strategies with costs and other burdens, such as those related to unnecessary repeat FNA and other procedures. Although a second benign FNA result effectively rules out cancer, repeat FNA for all initially benign nodules is not cost-effective or practical, given the very low prevalence of missed cancers. 4
More efficient are risk-stratified approaches that utilize ultrasound to identify nodules with increased likelihood of malignancy that warrant repeat FNA; in these approaches, the timing and frequency of follow-up ultrasound are based on the initial ultrasound findings.
The 2015 American Thyroid Association (ATA) guidelines on the management of thyroid nodules provide risk-stratified recommendations for ultrasound follow-up of initially benign thyroid nodules. 4 For nodules with a high suspicion ultrasound pattern, the 2015 ATA guideline recommends repeat ultrasound and ultrasound-guided FNA within 12 months. For nodules with low to intermediate suspicion ultrasound patterns, the guideline recommends repeat ultrasound at 12 to 24 months, with repeat FNA or repeat ultrasound for nodules with sonographic evidence of significant growth or development of new suspicious sonographic features. For nodules with a very low suspicion ultrasound pattern, the 2015 ATA guideline noted that the utility of repeat ultrasound is limited; if ultrasound is performed, it recommends it be done at ≥24 months.
The 2015 ATA recommendations were based on studies describing the natural history of benign thyroid nodules and ultrasound features associated with increased risk of cancer, 3,5,6 as direct evidence on the benefits and harms of different ultrasound surveillance strategies for benign thyroid nodules was not available. For lower risk benign thyroid nodules, ultrasound surveillance could potentially be spaced out further or even discontinued after some period of initial surveillance, given the low likelihood and generally favorable prognosis of differentiated thyroid cancer.
To inform an updated guideline on the evaluation and management of benign thyroid nodules, ATA commissioned a scoping review on ultrasound surveillance strategies in patients with benign thyroid nodules. This review focuses on low or intermediate suspicion benign thyroid nodules based on ultrasound pattern. The purpose of this scoping review is to evaluate the benefits and harms of (i) discontinuation of ultrasound surveillance and (ii) a longer interval follow-up than the 2015 ATA recommendation for such nodules.
Methods
In conjunction with the American Thyroid Association's Guidelines Task Force for the management of adult patients with thyroid nodules, we determined the key questions for this scoping review:
Key Question 1. In patients with cytologically benign thyroid nodules and very low suspicion ultrasound patterns, how does the discontinuation of ultrasound surveillance compared with low-frequency ultrasound follow-up (≥2 years) affect the risk of missing clinically significant thyroid cancers?
Key Question 2. In patients with cytologically benign thyroid nodules and low to intermediate suspicion ultrasound patterns, how does a delayed ultrasound follow-up examination (five years) compared with earlier follow-up (one to two years) affect the risk of missing clinically significant thyroid cancers?
Given that the ATA ultrasound pattern system of nodule risk stratification is similar to the Thyroid Imaging Reporting and Data Systems proposed by the American College of Radiology (ACR TIRADS), the Korean Society of Thyroid Radiology (K-TIRADS), and the European Thyroid Association (EU-TIRADS), this scoping review included articles that referenced any of these systems.
Search strategies
We searched the Cochrane Central Register of Controlled Trials, Elsevier Embase®, and Ovid MEDLINE® (through August 2022) for relevant studies. Search strategies are shown in Appendix A1. Searches were supplemented by a reference list review of relevant articles.
Study selection
Two investigators independently reviewed abstracts and full-text articles for inclusion using prespecified eligibility criteria. Discrepancies were resolved by consensus. The population was adults with ultrasound-imaged, cytologically benign thyroid nodules with very low (Key Question 1) or low to intermediate (Key Question 2) suspicion ultrasound patterns. Eligible studies compared different ultrasound follow-up strategies.
Because we expected evidence directly comparing the outcomes of prespecified ultrasound follow-up strategies to be very limited, we adopted a scoping review approach 7,8 and applied looser inclusion criteria. Specifically, we included studies that were not restricted to patients with very low or low to intermediate suspicion ultrasound patterns (although this was noted as a study limitation) and we included studies that compared alternative follow-up intervals (e.g., >4 years vs. 2 or 3 years). The primary outcome was the likelihood of identifying thyroid cancers; as part of the scoping review approach, we also included data on health outcomes (e.g., mortality rate, metastasis), subsequent procedures (thyroidectomy, repeat biopsy), and nodule growth. We did not restrict inclusion to randomized trials; we also included cohort studies and case–control studies that compared different ultrasound follow-up strategies. Inclusion was restricted to English language studies; studies published only as conference abstracts were excluded.
Data abstraction
We extracted the following data from studies: author, year, country, study period, study design, sample size, duration of follow-up, age, percent female, thyroid hormone status, thyroid nodule characteristics (ultrasound pattern, diameter, and volume), ultrasound surveillance strategies, and outcomes. Data were extracted by one investigator and verified by a second.
Assessing methodological quality of individual studies
The quality (risk of bias) of each study was rated as “good,” “fair,” or “poor” using predefined criteria for studies on diagnostic accuracy adapted from the U.S. Preventive Services Task Force (Appendix A2). 9 Good-quality studies met all the quality criteria and were considered reliable. Poor-quality studies had one or more serious methodological limitations (e.g., biased methods for selecting patients, inaccurate methods for measuring exposures or outcomes, high attrition or missing data, failure to adjust for potential confounders), and their results were considered unreliable. Fair-quality studies did not meet the definition for good or poor quality and were considered intermediate in reliability. Scoping reviews do not necessarily perform quality assessment; however, we elected to do so because understanding methodological limitations was important for interpreting the available evidence.
Synthesizing the evidence
The evidence was synthesized qualitatively; meta-analysis was not performed because only one study was included in this scoping review. Scoping reviews often do not assess the strength of evidence. However, because ATA guideline development methods include assessments of the strength of evidence supporting recommendations, we elected to do so. The strength of evidence was assessed using GRADE methods, based on overall quality, consistency, directness, precision, and reporting bias. 10 The strength of evidence was graded “high,” “moderate,” or “low,” indicating the confidence in the findings 11 ; evidence that was too limited to permit conclusions was graded “insufficient.”
Results
Literature searches
Database searches resulted in 188 potentially relevant citations (Fig. 1). After dual review of abstracts and titles, six articles were selected for full-text review. Of these, four studies 12–15 were excluded because they did not compare outcomes of different ultrasound follow-up intervals, and one study 16 was excluded because it evaluated ultrasound follow-up intervals for atypia of undetermined significance or follicular lesion of undetermined significance. One study evaluated outcomes associated with different ultrasound follow-up intervals in patients with initially benign FNA cytology and was included. 17

Fig. 1. Literature flow diagram.
In patients with cytologically benign thyroid nodules and very low suspicion ultrasound patterns, how does the discontinuation of ultrasound surveillance compared with low-frequency ultrasound follow-up (≥2 years) affect the risk of missing clinically significant thyroid cancers?
No study compared discontinuation of ultrasound surveillance versus low-frequency ultrasound follow-up in patients with cytologically benign thyroid nodules and very low suspicion ultrasound patterns.
In patients with cytologically benign thyroid nodules and low to intermediate suspicion ultrasound patterns, how does a delayed ultrasound follow-up visit (e.g., five years) compared with an earlier follow-up (one to two years) affect the risk of missing clinically significant thyroid cancers?
One retrospective cohort study (n = 1254; 1819 thyroid nodules) compared different intervals to first follow-up ultrasound in patients with a benign thyroid nodule (Table 1). 17 The study was conducted in the United States in patients evaluated between 1999 and 2010. Patients were included if they had a thyroid nodule >1 cm with an initial benign FNA biopsy (reported using Bethesda System or similar [pre-2009] criteria and terminology 18 ), were euthyroid, and underwent first follow-up ultrasound at least six months following aspiration. The study did not restrict inclusion to low or intermediate suspicion ultrasound patterns and did not report the proportion of low to intermediate suspicion nodules, but predated the introduction of ACR TIRADS, K-TIRADS, and EU-TIRADS. The mean age at diagnosis was 52.5 years and 89% of patients were female. The median duration of follow-up was 7.7 years (interquartile range 5.5–10.5 years).
Table 1. Study Evaluating Different Ultrasound Follow-Up Intervals
FNA, fine needle aspiration; IQR, interquartile range; SD, standard deviation.
In the study, recommendations for all patients were to undergo repeat follow-up ultrasound one year following benign nodule diagnosis. Patients were stratified according to duration to first follow-up ultrasound: 0.5 to 1 year, >1 to 2 years, >2 to 3 years, >3 to 4 years, and >4 years. In the >4-year group, the interval to first ultrasound ranged from 4 to 14 years. The study did not control for confounders and was rated poor quality (Table 2). Other methodological limitations included unclear comparability of groups with different ultrasound intervals at baseline, failure to report attrition or missing data, and unclear blinding of outcome assessors or data analysts to ultrasound interval. In addition, analyses were based on the interval to the first follow-up ultrasound (irrespective of subsequent ultrasound intervals), and analyses did not control for variability in follow-up duration.
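The stratification described above can be sketched as a simple binning function. This is an illustrative Python sketch only, not the study's code: the function name `interval_group` is hypothetical, and treating intervals shorter than six months as ineligible is an assumption based on the inclusion criteria described in the text.

```python
def interval_group(years):
    """Map the interval (in years) to first follow-up ultrasound onto a stratum."""
    if years < 0.5:
        # Assumption: intervals under six months did not meet inclusion criteria.
        return None
    if years <= 1:
        return "0.5 to 1 year"
    if years <= 2:
        return ">1 to 2 years"
    if years <= 3:
        return ">2 to 3 years"
    if years <= 4:
        return ">3 to 4 years"
    return ">4 years"


if __name__ == "__main__":
    for interval in (0.75, 1.5, 2.5, 3.5, 6.0):
        print(interval, "->", interval_group(interval))
```

Because each patient is assigned by the first interval only, a grouping of this kind ignores any subsequent ultrasound intervals, which is one of the limitations noted for the analysis.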
Table 2. Quality Assessment
Few patients in the study were diagnosed with malignancy, regardless of the ultrasound follow-up interval (range 0.2–0.8%), and there were no disease-related deaths. There was no difference between the >4-year interval to first follow-up ultrasound versus the 1- to 2-year interval in likelihood of malignancy (0.4% [1/223] vs. 0.3% [2/715]). Follow-up ultrasound at >4 years was associated with increased likelihood of ≥50% nodule growth (35.0% [78/223] vs. 15.1% [108/715]), which could explain the higher likelihood of repeat FNA (19.3% [43/223] vs. 5.6% [40/715]) and thyroidectomy (4.0% [9/223] vs. 0.8% [6/715]). Estimates for intervals at >2 to 3 years and >3 to 4 years were intermediate between >1 to 2 years and >4 years.
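The percentages reported here can be re-derived from the counts given in brackets. The following is a minimal Python sketch and is not part of the study's analysis. The `percent` and `crude_risk_ratio` helpers and the layout of `COUNTS` are illustrative assumptions, and the crude risk ratios it prints are unadjusted and were not reported by the study.

```python
# Counts as reported in the text: (events, patients) for first follow-up
# ultrasound at >4 years versus 1 to 2 years.
COUNTS = {
    "malignancy": ((1, 223), (2, 715)),
    "nodule growth >=50%": ((78, 223), (108, 715)),
    "repeat FNA": ((43, 223), (40, 715)),
    "thyroidectomy": ((9, 223), (6, 715)),
}


def percent(events, total):
    """Return events/total as a percentage rounded to one decimal place."""
    return round(100 * events / total, 1)


def crude_risk_ratio(late, early):
    """Unadjusted risk ratio (>4 years vs. 1 to 2 years); illustrative only."""
    (late_events, late_total), (early_events, early_total) = late, early
    return (late_events / late_total) / (early_events / early_total)


if __name__ == "__main__":
    for outcome, (late, early) in COUNTS.items():
        print(f"{outcome}: {percent(*late)}% vs. {percent(*early)}% "
              f"(crude RR {crude_risk_ratio(late, early):.1f})")
```

Because the study did not adjust for confounders or for variability in follow-up duration, any crude ratio computed this way carries the same limitations as the reported percentages.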
Discussion
Evidence on outcomes of different ultrasound surveillance strategies in patients with cytologically benign thyroid nodules with low or intermediate suspicion ultrasound patterns is extremely limited. Even though we utilized a scoping review approach with looser application of inclusion criteria (e.g., inclusion of studies that evaluated different ultrasound follow-up intervals than specified in the key questions or that were not restricted to low or intermediate suspicion ultrasound patterns), only one study 17 addressed the key questions. Although this study was not restricted to patients with low or intermediate suspicion ultrasound patterns, it predated the introduction of current thyroid imaging classification systems.
The study found no difference between first follow-up ultrasound at >4 years versus first follow-up ultrasound at 1 to 2 years in the likelihood of malignancy, although >4 years was associated with increased likelihood of nodule growth, potentially explaining the increased numbers of repeat FNAs and thyroidectomies that were also observed with longer follow-up. The proportion of patients diagnosed with malignancy was very small (range 0.2–0.8%) at a median follow-up of 7.7 years, regardless of interval to first ultrasound, and no cancer-related deaths were recorded.
The study had important methodological limitations, including not controlling for potential confounders or variability in follow-up duration. In addition, it focused only on the interval to the first follow-up ultrasound, without accounting for subsequent follow-up intervals. Most FNA results were assessed before the introduction of the Bethesda System; however, the criteria and terminology used in the study were similar to the Bethesda System. The certainty of evidence was assessed as very low, due to methodological limitations, imprecision, inconsistency (unable to determine, due to availability of only one study), and indirectness (population not patients with low to intermediate suspicion ultrasound pattern nodules) (Table 3).
Table 3. Overall Quality of Evidence, Ultrasound Follow-Up Strategies
Graphical and statistical assessment for small-sample effects and potential publication bias was not performed, because only one study was available.
The overall quality of evidence for all clinical outcomes (malignancy, cancer-related deaths, nodule growth, and subsequent procedures) was graded very low.
Downgraded for indirectness because the study was not restricted to low to intermediate suspicion ultrasound pattern benign nodules and did not report the proportion of patients with low to intermediate suspicion nodules.
Nonetheless, the findings of this study are consistent with prior evidence on the natural history of cytologically benign thyroid nodules, which suggested that the incidence of cancer is very low and cancer-related mortality is rare, 3,12–15 and may provide some support for extending the follow-up interval for low to intermediate suspicion benign thyroid nodules. To reduce unnecessary repeat FNA and thyroidectomy with longer follow-up intervals, growth thresholds for performing repeat FNA or thyroidectomy should account for the time interval between ultrasound evaluations.
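One way to account for the time between ultrasound evaluations is to express growth as an annualized rate. The sketch below is a hypothetical illustration, not a method used in the included study or recommended by the ATA guideline. It assumes constant compound growth, and `annualized_growth` is an invented helper name.

```python
def annualized_growth(fractional_growth, years):
    """Convert cumulative fractional growth over `years` into a yearly rate.

    Assumption (hypothetical): growth compounds at a constant rate.
    """
    if years <= 0:
        raise ValueError("years must be positive")
    if fractional_growth <= -1:
        raise ValueError("growth must be greater than -100%")
    return (1 + fractional_growth) ** (1 / years) - 1


if __name__ == "__main__":
    # The same 50% cumulative growth observed over different intervals.
    for years in (1, 2, 5):
        rate = annualized_growth(0.50, years)
        print(f"50% growth over {years} y = {100 * rate:.1f}% per year")
```

Under this assumption, a fixed cumulative threshold corresponds to a much slower yearly rate when the interval is long, which is one possible reason a single threshold could trigger more procedures after longer intervals.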
Additional research is needed to clarify optimal ultrasound follow-up intervals for cytologically benign thyroid nodules. Studies should clearly describe ultrasound patterns and ideally evaluate patients with low and intermediate suspicion ultrasound patterns separately from high suspicion ultrasound patterns. Although randomized trials comparing different ultrasound intervals would be ideal, observational studies would also be informative. Such studies should control for potential confounders such as age, sex, presence of symptoms, and nodule characteristics (including nodule size), and control for variability in follow-up duration. Studies should also describe other factors that could affect interpretation of findings, including whether nodules are solitary or multiple, details regarding FNA biopsy (e.g., ultrasound-guided only or also palpation-guided), and the presence of other factors that might warrant follow-up (substernal extension, progressive growth, or underlying risk factors for malignancy).
Studies should account for the interval to the first follow-up ultrasound as well as the subsequent ultrasound evaluations. Very large studies would be needed for sufficient power to evaluate outcomes such as malignancy or cancer-related death; however, smaller studies could also provide additional information on outcomes such as repeat FNA biopsy and thyroidectomy. Studies should also measure outcomes related to quality of life or anxiety, which could be impacted by the intensity of ultrasound follow-up, and measure harms or complications related to procedures performed during follow-up.
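To illustrate why very large studies would be needed for rare outcomes, the sketch below applies the standard normal-approximation sample size formula for comparing two proportions. The event rates, significance level, and power are assumed values chosen only for illustration and are not estimates from the included study; `n_per_group` is a hypothetical helper.

```python
import math
from statistics import NormalDist


def n_per_group(p1, p2, alpha=0.05, power=0.80):
    """Approximate patients per group to detect p1 vs. p2 (two-sided test)."""
    if p1 == p2:
        raise ValueError("p1 and p2 must differ")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)


if __name__ == "__main__":
    # Assumed scenario: malignancy in 0.3% vs. 0.6% of patients.
    print("rare outcome:", n_per_group(0.003, 0.006))
    # Assumed scenario: repeat FNA in 5% vs. 10% of patients.
    print("common outcome:", n_per_group(0.05, 0.10))
```

Under these assumed scenarios, the rare outcome requires thousands of patients per group, whereas the more common outcome requires hundreds.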
No study compared outcomes of discontinuation of follow-up ultrasound in patients with low or very low suspicion ultrasound patterns versus low-frequency (≥2 year interval) follow-up. Considerations for future research are similar to those described for studies comparing different follow-up intervals. In addition, studies evaluating discontinuation of ultrasound should report the number of negative follow-up ultrasounds obtained before discontinuation.
In conclusion, this scoping review found that evidence comparing different ultrasound follow-up intervals in patients with benign thyroid nodules is limited to one observational study, but suggests that malignancies are rare regardless of follow-up interval. Longer follow-up may be associated with more repeat FNA biopsy and thyroidectomy, which could be related to more interval nodule growth that meets thresholds for further evaluation. Research is needed to clarify optimal ultrasound follow-up intervals for low or intermediate suspicion benign thyroid nodules and outcomes of discontinuing ultrasound follow-up for very low suspicion nodules.
Footnotes
Authors' Contributions
All authors conceived the study. R.C. designed the study and R.C. and T.D. carried out the review. R.C. prepared the first draft of the article. R.C., T.D., S.E.M., E.S.C., C.D., C.C.S., S.J.M., and L.A.O. were involved in the revision of the draft article and have agreed to the final content.
Author Disclosure Statement
R.C., T.D., S.E.M., E.S.C., C.D., C.C.S., S.J.M., and L.A.O. reported no conflicts of interest. C.D. is an Associate Editor at Thyroid, but he had no role in the review of this article and was blinded to the peer review process.
Funding Information
The American Thyroid Association provided funding to Roger Chou through a consulting agreement to support the development of the guideline. The funder was not involved in the collection or analysis of data or reporting of results.
