Abstract

KEY POINTS
This study compared the diagnostic performance of five major thyroid ultrasound risk-stratification systems (ATA, EU-TIRADS, K-TIRADS, ACR-TIRADS, and C-TIRADS) in 3774 thyroid nodules.
The ATA and EU-TIRADS systems demonstrated the highest sensitivity for thyroid malignancy detection, at the cost of higher unnecessary biopsy rates.
ACR-TIRADS had the lowest unnecessary biopsy rate but had lower sensitivity for malignancy detection.
K-TIRADS recommended fewer biopsies for small nodules and more biopsies for large nodules, while C-TIRADS recommended more biopsies than ACR-TIRADS without improving sensitivity for malignancy detection.
Differences between current ultrasound risk-stratification systems were mainly driven by biopsy size thresholds for small nodules and by criteria for avoiding biopsy in large nodules.
SUMMARY
Background
Multiple ultrasound risk-stratification systems (RSSs) are used to guide biopsy decisions for thyroid nodules, but important differences exist in classification criteria, biopsy thresholds, and diagnostic performance. This study compared five major RSSs using a standardized international ultrasound lexicon.1
Methods
This retrospective study included 3774 thyroid nodules >1 cm from 3143 patients who underwent ultrasound-guided biopsy between 2017 and 2024. Nodules were classified according to the ATA system, EU-TIRADS, K-TIRADS, ACR-TIRADS, and C-TIRADS.2–6 Diagnostic performance and unnecessary biopsy rates were evaluated according to nodule size.
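As a rough illustration of the size-stratified evaluation described above, the sketch below computes sensitivity and an unnecessary biopsy rate for a single RSS. It is not the study's analysis code. The `Nodule` record, the function names, and the toy cohort are hypothetical, and the unnecessary biopsy rate is assumed here to be the share of benign nodules for which biopsy was recommended; the study may define it differently. The 2 cm split mirrors the large-nodule threshold mentioned in the Results.

```python
from dataclasses import dataclass


@dataclass
class Nodule:
    size_cm: float            # maximum diameter on ultrasound
    malignant: bool           # final reference-standard diagnosis
    biopsy_recommended: bool  # recommendation from one (hypothetical) RSS


def sensitivity(nodules):
    """Share of malignant nodules for which biopsy was recommended."""
    malignant = [n for n in nodules if n.malignant]
    if not malignant:
        return None
    return sum(n.biopsy_recommended for n in malignant) / len(malignant)


def unnecessary_biopsy_rate(nodules):
    """Share of benign nodules for which biopsy was recommended.

    Assumed definition for illustration only.
    """
    benign = [n for n in nodules if not n.malignant]
    if not benign:
        return None
    return sum(n.biopsy_recommended for n in benign) / len(benign)


def split_by_size(nodules, cutoff_cm=2.0):
    """Group nodules into 'small' (<= cutoff) and 'large' (> cutoff)."""
    return {
        "small": [n for n in nodules if n.size_cm <= cutoff_cm],
        "large": [n for n in nodules if n.size_cm > cutoff_cm],
    }


# Made-up toy cohort, NOT study data.
toy = [
    Nodule(1.2, True, True),
    Nodule(1.4, True, False),
    Nodule(1.5, False, True),
    Nodule(1.8, False, False),
    Nodule(2.5, True, True),
    Nodule(3.0, False, True),
    Nodule(3.2, False, False),
    Nodule(4.0, False, False),
]

for label, group in split_by_size(toy).items():
    print(label, sensitivity(group), unnecessary_biopsy_rate(group))
```

Running the same functions once per system, on the same nodules, would give the kind of side-by-side comparison reported in the Results.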
Results
Among 3774 nodules, 721 (19.1%) were malignant or low-risk neoplasms. The distribution of nodules across risk categories differed substantially between systems. The ATA system and EU-TIRADS demonstrated the highest sensitivity for malignancy detection, but also the highest unnecessary biopsy rates. ACR-TIRADS demonstrated the lowest unnecessary biopsy rate but with lower sensitivity. K-TIRADS recommended fewer biopsies for small nodules and more biopsies for large nodules, while C-TIRADS recommended more biopsies than ACR-TIRADS without improving sensitivity. For large nodules (>2 cm), ATA, EU-TIRADS, and K-TIRADS demonstrated very high sensitivity, whereas ACR-TIRADS and C-TIRADS classified more malignant nodules as not requiring biopsy.
Conclusions
The five ultrasound RSSs demonstrated substantial differences in thyroid nodule classification and diagnostic performance. Differences between systems were mainly driven by biopsy size thresholds for small nodules and by criteria for avoiding biopsy in large nodules.
COMMENTARY
The increasing number of thyroid ultrasound risk-stratification systems has made daily clinical practice more complicated. Many clinicians use more than one system, including ATA, EU-TIRADS, K-TIRADS, ACR-TIRADS, and C-TIRADS. Although these systems are based on similar ultrasound features, they differ in biopsy thresholds and in how they classify low-risk nodules. As shown in this study, these differences can substantially affect both biopsy recommendations and cancer detection.
The different systems also reflect different approaches to thyroid cancer diagnosis. ACR-TIRADS was designed to reduce unnecessary biopsies and overdiagnosis, particularly for small low-risk nodules, and therefore uses more restrictive biopsy criteria.4,7 In contrast, ATA and EU-TIRADS favor higher sensitivity and lower risk of missing malignancy, at the cost of substantially more biopsies. K-TIRADS showed an intermediate pattern, recommending fewer biopsies for small nodules but more biopsies for larger nodules.
These issues have become more important with the increasing use of active surveillance for small papillary thyroid carcinomas. Avoiding unnecessary biopsy may reduce patient anxiety, repeat imaging, surgery, and lifelong thyroid hormone treatment. However, reducing biopsy rates might increase the risk of delayed diagnosis, especially in larger nodules.
There is also a financial aspect to the differences between systems. More biopsies lead to more repeat fine-needle aspirations (FNAs), molecular testing, specialist consultations, surgeries, and follow-up visits. The use of molecular markers may also be influenced by biopsy thresholds, as more biopsies would lead to more indeterminate cytology results. Therefore, the choice of TIRADS system may affect health care utilization and costs.
An international working group recently proposed a standardized ultrasound lexicon and is currently working on a unified International TIRADS (I-TIRADS).8 At the same time, the upcoming ATA guidelines are expected to introduce a more flexible approach to biopsy decisions, with FNA size ranges rather than strict cutoffs for some risk categories. This may allow greater incorporation of patient preference and physician approach into decision-making.
Until international consensus is achieved, hospitals, health care networks, and national societies should ideally standardize which RSS is used within their systems. The use of different systems may lead to different recommendations regarding biopsy or follow-up for individual patients, potentially creating confusion for both patients and clinicians. Greater standardization and continued international collaboration may help improve consistency and quality in the management of thyroid nodules.
