Abstract
Purpose:
The aim of this study was to compare the diagnostic performance of ultrasound (US)-based risk-stratification systems for thyroid nodules in the 2015 American Thyroid Association (ATA) guidelines with those of the 2016 Korean Thyroid Association (KTA)/Korean Society of Thyroid Radiology (KSThR) and 2017 American College of Radiology (ACR) guidelines.
Methods:
From June 2013 to May 2015, a total of 902 consecutive thyroid nodules were enrolled at four institutions, and their US features were retrospectively reviewed and classified using the categories defined by the three guidelines. The malignancy risk of each category of all three risk-stratification systems was calculated, and the diagnostic performance of the fine-needle aspiration (FNA) indications of the ATA guidelines was compared with that of the KTA/KSThR and ACR guidelines.
Results:
Of all nodules, 636 (70.5%) were benign and 266 (29.5%) were malignant. The calculated malignancy risks of ATA category 5, 4, 3, 2, and 1 nodules were 71.7, 21.5, 2.6, 3.8, and 0%, respectively. Of all nodules, 7.6% (69/902) did not meet the criteria for any ATA pattern, and their malignancy risk was calculated to be 10.1% (7/69). The ATA guidelines afforded a significantly higher diagnostic sensitivity than the ACR guidelines (95.0 vs. 80.2%; p = 0.001) but a lower specificity (38.1 vs. 68.9%; p < 0.001). Conversely, the ATA guidelines exhibited a lower, although not significantly different, diagnostic sensitivity than the KTA/KSThR guidelines (95.0 vs. 100.0%; p = 0.07) but a higher specificity (38.1 vs. 28.2%; p < 0.001). The unnecessary FNA rate was lowest for the ACR guidelines (25.8%), followed by the ATA (51.2%) and KTA/KSThR (59.4%) guidelines.
Conclusion:
The 2015 ATA guidelines afford an intermediate sensitivity and unnecessary FNA rate for thyroid cancer detection compared with the 2016 KTA/KSThR and 2017 ACR guidelines. US practitioners require a deep understanding of the benefits and risks of the US-based FNA criteria of different guidelines and of their potential impact on the diagnosis of low-risk thyroid cancers.
Introduction
Of the various guidelines, the American Thyroid Association (ATA) guidelines are widely used in practice. The revised 2015 ATA guidelines propose a pattern-based approach to risk stratification and give the estimated malignancy risk for each category (pattern) (1,4). Although these guidelines are effective for the risk stratification of thyroid nodules, the relatively complex US features of each category and the existence of a “not specified” pattern with an 18.2–19.0% malignancy risk are recognized limitations (6,7). In 2016, the Korean Thyroid Association (KTA)/Korean Society of Thyroid Radiology (KSThR) proposed a simpler pattern-based approach (3,4). This system combines solidity, echogenicity, and suspicious US features, and can readily be used in place of the ATA guidelines. The American College of Radiology (ACR) guidelines, published in 2017, are a newer risk-stratification system that assigns points to each US feature of a nodule; the total score determines the nodule's ACR Thyroid Imaging, Reporting, and Data System (TI-RADS) level (2,4). Although the three international guidelines are widely used in different countries, few studies have compared their diagnostic performance (7). In addition, to the authors' knowledge, no study has compared the diagnostic performance of the guidelines for nodules <1 cm that are highly suspicious on US. This information would allow physicians to communicate more effectively with patients during shared decision-making about active surveillance.
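To make the point-based logic of the ACR system concrete, the following Python sketch sums feature points and maps the total to a TI-RADS level. The dictionary names, feature labels, and point values are our own reading of the 2017 ACR chart rather than anything defined in this study, and the handling of a 1-point total (not listed on the chart) is an assumption; the sketch is illustrative, not a validated implementation.

```python
# Hypothetical sketch of ACR TI-RADS point summation. Point values are our
# reading of the 2017 ACR chart (an assumption); verify before any real use.
COMPOSITION = {"cystic": 0, "spongiform": 0, "mixed_cystic_solid": 1, "solid": 2}
ECHOGENICITY = {"anechoic": 0, "hyperechoic": 1, "isoechoic": 1,
                "hypoechoic": 2, "very_hypoechoic": 3}
SHAPE = {"wider_than_tall": 0, "taller_than_wide": 3}
MARGIN = {"smooth": 0, "ill_defined": 0, "lobulated_or_irregular": 2,
          "extrathyroidal_extension": 3}
ECHOGENIC_FOCI = {"comet_tail": 0, "macrocalcification": 1,
                  "rim_calcification": 2, "punctate": 3}


def acr_points(composition, echogenicity, shape, margin, foci=()):
    """Sum one score per category; each distinct echogenic-focus type adds its points."""
    return (COMPOSITION[composition]
            + ECHOGENICITY[echogenicity]
            + SHAPE[shape]
            + MARGIN[margin]
            + sum(ECHOGENIC_FOCI[f] for f in set(foci)))


def acr_level(points):
    """Map a total score to a TR level (0 TR1, 2 TR2, 3 TR3, 4-6 TR4, >=7 TR5)."""
    if points <= 1:  # a 1-point total is not on the chart; TR1 is an assumption
        return "TR1"
    if points == 2:
        return "TR2"
    if points == 3:
        return "TR3"
    if points <= 6:
        return "TR4"
    return "TR5"


# A solid, hypoechoic, wider-than-tall, smooth nodule without echogenic foci
print(acr_level(acr_points("solid", "hypoechoic", "wider_than_tall", "smooth")))  # TR4
```

Because the features contribute additively, a single heavily weighted feature (e.g., punctate echogenic foci, 3 points) can by itself move a nodule to a higher level, which is the main structural difference from the pattern-based ATA and KTA/KSThR approaches.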
The purpose of this study was to compare the diagnostic performance of US-based risk-stratification systems for thyroid nodules in the 2015 ATA guidelines with those of the 2016 KTA/KSThR and 2017 ACR guidelines.
Methods
This multicenter study was approved by the Institutional Review Boards of the four participating centers. Informed consent was waived for this retrospective review.
Study population
Patient data were retrospectively collected from four hospitals (one primary medical center and three tertiary hospitals). From June 2013 to May 2015, a total of 1109 thyroid nodules (>5 mm in diameter) in 928 consecutive patients who had undergone thyroid US were initially enrolled. Of these, 198 nodules in 169 patients were excluded because final diagnoses were not available, and nine nodules were excluded because they were completely calcified and their US characteristics could not be analyzed. Therefore, a total of 902 nodules in 750 patients were included in the final analysis (594 females and 156 males; mean age 49.2 years; range 9–81 years). The data set was obtained from the KSThR database (SOMARTUS), which was used to validate the K-TI-RADS. The patients/nodules evaluated here include those discussed in a prior publication (8).
Final diagnoses were based on cytopathologic results, reported using the Bethesda system, and on surgical findings. Nodules with nondiagnostic cytology or cytology of indeterminate significance (atypia of undetermined significance, suspicious for follicular neoplasm, or suspicious for malignancy) were excluded.
US examination and image analysis
All US examinations were performed with 10–16 MHz linear probes on real-time US systems (EUB-7500, Hitachi Medical Systems, Tokyo, Japan; iU22 and HDI-5000, Philips Healthcare, Bothell, WA; Aplio SSA-770A, Toshiba Medical Systems Corp., Otawara-shi, Japan; Accuvix XG, Samsung Medison, Seoul, Korea). Five board-certified radiologists (D.G.N., W.J.M., Y.H.L., N.C., and E.J.H.) at the four hospitals, who specialized in thyroid imaging and had 8–20 years of clinical experience with thyroid US, performed all examinations. Each radiologist assessed the US features of all thyroid nodules in their own patients while blinded to the FNA data and final diagnoses.
Before data collection began, training sessions were held to establish a baseline consensus among the five radiologists on the evaluation of US features. They reviewed images of 50 nodules not included in this study and assessed their US features at a consensus meeting. Internal nodule content was categorized as solid (uniformly solid or nearly so), predominantly solid (<50% cystic), predominantly cystic (>50% cystic), or cystic (no obvious solid content). In partially cystic nodules, the configuration of the solid component was categorized as concentric or eccentric. Nodule echogenicity was categorized as markedly hypoechoic, hypoechoic, isoechoic, or hyperechoic, based on the predominant echogenicity relative to that of normal thyroid tissue and the anterior strap muscles. Calcification was classified as microcalcification, macrocalcification, or rim calcification. In rim-calcified nodules, the presence or absence of an extrusive soft-tissue component was recorded to allow analysis under the ATA guidelines; nodules with disrupted rim calcification and small protrusions of soft tissue around the calcification were considered positive for this feature. A spongiform appearance was defined as an isoechoic pattern with microcystic changes in >50% of the nodular volume. The comet-tail artifact was defined as intracystic echogenic foci accompanied by reverberation artifacts. Extrathyroidal extension was not evaluated in this study because of limitations imposed by the retrospective design and the absence of standardized US criteria for its diagnosis.
Statistical analyses
All nodules were retrospectively classified using the US criteria of three international guidelines (Supplementary Tables S1 and S2; Supplementary Data are available online at
All nodules were dichotomized into two groups according to whether US-guided FNA was or was not indicated by each set of FNA criteria (i.e., sonographic features and nodule size; Supplementary Tables S1 and S2). The diagnostic performance of each set of criteria for detecting thyroid cancer was evaluated. The sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and accuracy of each set of guidelines were calculated and compared using the McNemar test. The potentially unnecessary FNA rate for the diagnosis of thyroid cancer was defined as the number of benign nodules for which FNA was indicated divided by the total number of nodules, expressed as a percentage. These rates were also compared using the McNemar test.
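The indices described above follow directly from the 2 × 2 table of final diagnosis against FNA indication. The Python sketch below computes them; the class and function names are hypothetical, and the counts in the usage line are invented for illustration, not taken from the study. Following the definition above, the unnecessary FNA rate divides benign nodules with an FNA indication by all nodules, so it equals (1 − specificity) × (proportion of benign nodules) rather than 1 − PPV. This reading is consistent with the reported rates (25.8%, 51.2%, 59.4%), which bear a roughly constant ratio of about 0.83 to 1 − specificity (31.1%, 61.9%, 71.8%).

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int  # malignant, FNA indicated
    fp: int  # benign, FNA indicated (potentially unnecessary FNA)
    fn: int  # malignant, FNA not indicated
    tn: int  # benign, FNA not indicated

    @classmethod
    def from_nodules(cls, nodules):
        """Build counts from (is_malignant, fna_indicated) pairs, one per nodule."""
        tp = fp = fn = tn = 0
        for malignant, indicated in nodules:
            if malignant and indicated:
                tp += 1
            elif indicated:
                fp += 1
            elif malignant:
                fn += 1
            else:
                tn += 1
        return cls(tp, fp, fn, tn)


def performance(c):
    """Diagnostic indices; the unnecessary FNA rate uses ALL nodules as denominator."""
    n = c.tp + c.fp + c.fn + c.tn
    return {
        "sensitivity": c.tp / (c.tp + c.fn),
        "specificity": c.tn / (c.tn + c.fp),
        "ppv": c.tp / (c.tp + c.fp),
        "npv": c.tn / (c.tn + c.fn),
        "accuracy": (c.tp + c.tn) / n,
        "unnecessary_fna_rate": c.fp / n,
    }


# Hypothetical counts for illustration only (not the study data)
print(performance(Confusion(tp=95, fp=62, fn=5, tn=38)))
```

Keeping the counts in one small record makes it easy to recompute every index for each guideline from the same per-nodule list.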
All statistical analyses were performed with IBM SPSS Statistics for Windows, version 23 (IBM Corp., Armonk, NY) and R version 3.4.1 for Windows (R Development Core Team, Vienna, Austria). A p-value <0.05 was considered to indicate a statistically significant difference.
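Because every nodule is classified under each guideline, the comparisons are paired, which is why the McNemar test applies: only discordant nodules (indicated by one guideline but not the other) carry information. For sensitivity, the pairs would be restricted to malignant nodules; for specificity, to benign nodules. The manuscript does not state whether the continuity-corrected chi-square or the exact binomial form was used (R's mcnemar.test function applies a continuity correction by default), so the Python sketch below shows both; the function names and the example pairs are hypothetical.

```python
import math


def discordant_counts(pairs):
    """b: positive under guideline A only; c: positive under guideline B only."""
    b = c = 0
    for pos_a, pos_b in pairs:
        if pos_a and not pos_b:
            b += 1
        elif pos_b and not pos_a:
            c += 1
    return b, c


def mcnemar_chi2(b, c):
    """Continuity-corrected McNemar statistic (1 df) and two-sided p-value."""
    if b + c == 0:
        return 0.0, 1.0
    stat = (abs(b - c) - 1) ** 2 / (b + c)
    # For a chi-square variable with 1 df, P(X > x) = erfc(sqrt(x / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))


def mcnemar_exact(b, c):
    """Exact two-sided McNemar p-value from Binomial(b + c, 0.5)."""
    n = b + c
    if n == 0:
        return 1.0
    tail = sum(math.comb(n, i) for i in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical malignant nodules: (FNA indicated by ATA, FNA indicated by ACR)
pairs = [(True, False)] * 10 + [(False, True)] * 2 + [(True, True)] * 50
b, c = discordant_counts(pairs)
print(b, c, mcnemar_chi2(b, c), mcnemar_exact(b, c))
```

The concordant pairs (here the 50 nodules indicated by both) do not enter either statistic, so two guidelines can differ noticeably in overall sensitivity and still yield a small test statistic if few nodules are discordant.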
Results
The mean nodule diameter was 1.5 ± 1.1 cm. Of the 902 nodules, 636 (70.5%) were benign and 266 (29.5%) were malignant. Final diagnoses were made by surgical resection for 191 (71.8%) of the 266 malignant nodules, comprising 186 papillary thyroid carcinomas (PTCs), including 24 follicular variant PTCs, and five follicular carcinomas. The 36 benign nodules confirmed by surgery comprised 31 nodular hyperplasias, four follicular adenomas, and one case of thyroiditis. All 75 malignant nodules diagnosed by FNA or core-needle biopsy were PTCs.
Frequencies and malignancy risks by guideline categories
Table 1 and Supplementary Table S3 list the frequencies and malignancy risks of all categories of the three guidelines. For the 2015 ATA guidelines, the malignancy risks of all categories were generally within the suggested ranges, except for the “very low suspicion” category, whose malignancy risk (3.8%; 6/157) was slightly higher than the suggested risk (<3%). Of all nodules, 7.6% (69/902) did not meet the criteria for any ATA pattern, and their malignancy risk was 10.1% (7/69).
Frequencies and Malignancy Risks of Tumors of the Various Categories of the Three Guidelines
Data in parentheses are percentages; data in square brackets are confidence intervals.
ACR, American College of Radiology; ATA, American Thyroid Association; KTA/KSThR, Korean Thyroid Association/Korean Society of Thyroid Radiology.
For the 2017 ACR guidelines, nodules in the “not suspicious” category had a higher malignancy risk (3.7%; 4/109) than the suggested risk (<2%). Of the 109 nodules in this category, 5 and 104 were classified into the “low suspicion” and “very low suspicion” categories of the ATA guidelines, respectively, and 91 and 18 into the “low suspicion” and “benign” categories of the KTA/KSThR guidelines, respectively. For all three guidelines, the malignancy risk increased with increasing risk category (p < 0.001).
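The category-level risks above are simple proportions of malignant nodules. Table 1 reports confidence intervals, but the interval method is not stated; the Python sketch below assumes a Wilson score interval, so its bounds may not match Table 1. The function name is hypothetical; the counts are those reported in this section.

```python
import math


def malignancy_risk(malignant, total, z=1.96):
    """Observed risk with a Wilson score interval (95% by default)."""
    p = malignant / total
    denom = 1 + z ** 2 / total
    centre = (p + z ** 2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    # Clamp only to absorb floating-point rounding at the 0 and 1 boundaries
    return p, max(0.0, centre - half), min(1.0, centre + half)


# Counts reported in the text (malignant nodules / nodules in the category)
for label, k, n in [("ATA unclassified", 7, 69),
                    ("ATA very low suspicion", 6, 157),
                    ("ACR not suspicious", 4, 109)]:
    p, lo, hi = malignancy_risk(k, n)
    print(f"{label}: {100 * p:.1f}% [{100 * lo:.1f} to {100 * hi:.1f}]")
```

The Wilson interval is used here because it behaves reasonably for small counts such as 4/109; an exact Clopper–Pearson interval would be an equally defensible choice.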
Diagnostic performance in terms of detecting thyroid cancer
Table 2 lists the diagnostic performance of the FNA indications for predicting thyroid malignancy in nodules ≥1 cm in diameter. The ACR guidelines afforded the lowest sensitivity and NPV (80.2% and 94.4%, respectively), followed in increasing order by the ATA (95.0% and 97.4%, respectively) and KTA/KSThR (100.0% and 100.0%, respectively) guidelines. When nodules with a pattern not classifiable under the ATA guidelines were considered to be of intermediate suspicion, the sensitivity of the ATA guidelines increased to 99.0%. The sensitivity of the ATA guidelines was significantly higher than that of the ACR guidelines (p = 0.001) and lower than, but not significantly different from, that of the KTA/KSThR guidelines (p = 0.07).
Comparison of the Diagnostic Performances of Various Guidelines in Terms of Detecting Malignant Thyroid Nodules (≥1 cm)
PPV, positive predictive value; NPV, negative predictive value.
Table 3 lists the diagnostic performance of the FNA indications for predicting thyroid microcarcinoma in nodules with highly suspicious US features. The PPV and accuracy of the ATA guidelines (74.1% and 77.2%, respectively) were similar to those of the ACR (73.5% and 75.3%, respectively) and KTA/KSThR (74.9% and 77.8%, respectively) guidelines.
Comparison of the Diagnostic Performances of Various Guidelines in Terms of Detecting Malignant Thyroid Nodules (<1 cm)
Unnecessary FNA rates
Table 4 compares the unnecessary FNA rates. Of the three guidelines, the rate was lowest for the ACR guidelines (25.8%), followed by the ATA (51.2%) and KTA/KSThR (59.4%) guidelines. The ATA guidelines were associated with a higher unnecessary FNA rate than the ACR guidelines but a lower rate than the KTA/KSThR guidelines (p < 0.001).
Comparison of Unnecessary FNA Rates for the Diagnosis of Thyroid Cancer (≥1 cm)
Data in parentheses are percentages.
FNA, fine-needle aspiration.
Discussion
The diagnostic performance of the 2015 ATA guidelines was compared with that of the 2016 KTA/KSThR and 2017 ACR guidelines. Relative to the other guidelines, the ATA guidelines afforded an intermediate diagnostic sensitivity (95.0%), specificity (38.1%), and unnecessary FNA rate (51.2%). Of all nodules, 7.6% did not meet the criteria for any ATA pattern, and their malignancy risk was 10.1%. When nodules with an unclassified pattern were regarded as being of intermediate suspicion, the diagnostic sensitivity increased to 99.0%, with 28.0% specificity and a 59.6% unnecessary FNA rate.
US-based risk-stratification systems play essential roles in deciding whether thyroid nodules should undergo FNA (1–4). However, because the US features of each category and the size cutoffs for FNA are not consistent among the guidelines, US practitioners require a deep understanding of the benefits and risks of the US-based FNA criteria of different guidelines to optimize patient management (7). Since 2015, various international societies have revised their guidelines; thus, the most recent risk-stratification systems were compared in the present study. First, the malignancy risks of the categories of all three guidelines were validated and compared, constituting a multicenter, geographically external validation of the malignancy risk of each category. The observed risks generally fell within the suggested ranges. However, the malignancy risk of the “very low suspicion” category of the 2015 ATA guidelines (3.8%; 6/157) was slightly higher than suggested. As previously found by Yoon et al. and Ha et al. (6,7), 7.6% of nodules did not meet the criteria of any ATA pattern (e.g., isoechoic nodules with irregular margins, microcalcifications, or a taller-than-wide shape), and their malignancy risk was 10.1% in the present study. Therefore, such nodules should be interpreted as being of intermediate suspicion. In the 2017 ACR guidelines, the malignancy risk of the “not suspicious” category (i.e., partially cystic, iso- or hyperechoic nodules without calcifications, a taller-than-wide shape, or lobulated or irregular margins) was higher (3.7%) than suggested, consistent with the earlier report of Ha et al., who found an actual malignancy risk of 4.4% (7). In this study, the mean diameter of malignant tumors showing these US patterns was 27.3 mm (range 17–34 mm).
Because the US patterns of such nodules are not highly suggestive of benignity (i.e., the nodules are not simple cysts and do not have a spongiform appearance), FNA may be selectively indicated for such nodules when they are large.
Regarding diagnostic performance, the ACR guidelines differed significantly from the ATA and KTA/KSThR guidelines. The ACR guidelines were significantly less sensitive (80.2%) than the other two guidelines (95.0–100.0%). When the ATA “not specified” pattern was regarded as being of intermediate suspicion, the sensitivity of the ATA guidelines increased to 99.0%, close to that of the KTA/KSThR guidelines (100.0%). On the other hand, the ACR guidelines afforded a significantly higher specificity (68.9%) than the other guidelines (28.2–38.1%). This is probably attributable to differences in the management of thyroid nodules 1.0–2.5 cm in diameter without highly suspicious US patterns: the ACR guidelines propose higher FNA size cutoffs for mildly (2.5 cm) and moderately (1.5 cm) suspicious nodules than do the ATA and KTA/KSThR guidelines for low (1.5 cm) and intermediate (1.0 cm) suspicion nodules. The clinical significance of delayed diagnosis of thyroid cancers that meet these US criteria is not well known. Although Nam et al. reported that PTCs that did not meet the malignant US criteria had a better prognosis than those that did, further studies are required to confirm that patients with PTCs without suspicious US features have lower-stage tumors and good outcomes, with no distant metastasis or recurrence (9). Molecular testing may provide useful prognostic information for thyroid cancer, but it requires biopsy of the nodule.
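The effect of the size cutoffs on nodules 1.0–2.5 cm can be illustrated with a simplified Python rule. The cutoff values are our reading of the 2015 ATA, 2016 KTA/KSThR, and 2017 ACR recommendations for the suspicious categories only (other categories and follow-up thresholds are omitted), and the pairing of ATA/KTA categories with ACR levels is a rough analogy rather than an equivalence; all names are hypothetical.

```python
# Simplified FNA size cutoffs (cm) for suspicious categories only; the values
# are our reading of the ATA, KTA/KSThR, and ACR documents (an assumption).
ATA_KTA_CUTOFFS = {"high": 1.0, "intermediate": 1.0, "low": 1.5}
ACR_CUTOFFS = {"TR5": 1.0, "TR4": 1.5, "TR3": 2.5}


def fna_indicated(category, size_cm, cutoffs):
    """True if the category has an FNA cutoff and the nodule reaches it."""
    cutoff = cutoffs.get(category)
    return cutoff is not None and size_cm >= cutoff


# Two hypothetical nodules, paired by rough analogy
# (ATA intermediate ~ ACR TR4, ATA low ~ ACR TR3; not equivalent categories)
for ata_cat, acr_cat, size in [("intermediate", "TR4", 1.2), ("low", "TR3", 2.0)]:
    print(size, fna_indicated(ata_cat, size, ATA_KTA_CUTOFFS),
          fna_indicated(acr_cat, size, ACR_CUTOFFS))
# prints "1.2 True False" then "2.0 True False"
```

Nodules like these are FNA-positive under the ATA and KTA/KSThR rules but not under the ACR rule, which is consistent with the higher specificity and lower sensitivity observed for the ACR guidelines. The KTA/KSThR sensitivity analysis in which the low-suspicion cutoff was raised from 1.5 to 2.0 cm corresponds to calling the same function with {**ATA_KTA_CUTOFFS, "low": 2.0}.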
Turning to the unnecessary FNA rates, the KTA/KSThR guidelines were associated with the highest rate and the ACR guidelines with the lowest. In the present study, the sensitivity of the KTA/KSThR guidelines for nodules ≥1 cm was 100.0%, higher than previously reported (8). This is probably due to the higher prevalence of PTC, especially the classic type, in the study population. The low FNA size cutoff for low-suspicion nodules may also explain the high sensitivity of the KTA/KSThR guidelines. When the FNA size cutoff for low-suspicion nodules in the KTA/KSThR guidelines was raised from 1.5 to 2.0 cm, the diagnostic specificity increased from 28.2% to 44.1%, the unnecessary FNA rate decreased from 59.4% to 46.2%, and the sensitivity decreased from 100.0% to 97.0%. It is important to analyze the actual impact of these differences among the guidelines on the diagnosis of thyroid cancers that would benefit from surgery or further treatment.
Recently, the management paradigm for thyroid nodules has become more conservative, with the aim of minimizing FNA. The current guidelines suggest active surveillance, rather than US-guided FNA, for subcentimeter nodules that are highly suspicious on US (1–3). This is based on evidence that active surveillance, instead of immediate surgery, can be chosen for adult patients with low-risk papillary thyroid microcarcinomas (10,11). However, it remains controversial whether FNA should be performed on subcentimeter nodules with highly suspicious US features, because some patients with benign nodules mimicking the US features of malignancy may undergo unnecessary active surveillance. A recent study reported that the malignancy risk of suspicious thyroid nodules <1 cm in diameter on US ranged from 77.4% to 82.8% (12).
This study found that the FNA criteria of the ACR, ATA, and KTA/KSThR guidelines afforded similar PPVs and accuracies (approximately 74% and 77%, respectively) for subcentimeter nodules with highly suspicious US features. Knowledge of the diagnostic performance of US-based FNA criteria for such nodules would allow physicians to communicate more effectively with patients in the shared decision-making process.
This study had several limitations. First, only thyroid nodules that had undergone US-guided FNA were included; FNA was usually performed on nodules with suspicious US features or, when no suspicious feature was detected, on the largest nodule. Therefore, selection bias cannot be excluded. Second, the malignancy rate was high (29.5%), probably because three of the participating institutions were tertiary referral centers treating patients with more serious disease, and this high malignancy rate may have affected the diagnostic performance of the guidelines. The high percentage of PTCs (97.4%) and relatively low percentage of follicular carcinomas (2.6%) among surgically confirmed malignancies may also have increased the diagnostic performance of the three guidelines. Third, the final diagnoses of benign nodules were based on cytopathologic data as well as surgical histology, which may have introduced false-negative and false-positive results. In addition, nodules with cytological results of indeterminate significance were excluded without surgical confirmation, which may have affected the small proportion of follicular thyroid cancers and follicular variant PTCs. Fourth, extrathyroidal extension status was not evaluated because of the retrospective design, and completely calcified nodules were excluded; these factors may affect the diagnostic performance of the ATA and ACR guidelines. Fifth, the pathological classification criteria for “noninvasive follicular thyroid neoplasm with papillary-like nuclear features” had not been defined at the time of data collection and were not evaluated in this study. Sixth, interobserver variation in interpretation was not evaluated. Seventh, the FNA indications used in this study differed from those of the recent guidelines. However, because FNA was generally performed for all nodules ≥1 cm except typically benign nodules, the data may be more suitable for evaluating diagnostic performance and unnecessary FNA rates.
In conclusion, the 2015 ATA guidelines afford an intermediate sensitivity and unnecessary FNA rate for detecting thyroid cancer compared with the 2016 KTA/KSThR and 2017 ACR guidelines. US practitioners require a deep understanding of the benefits and risks of the US-based FNA criteria of different guidelines and of their potential impact on the diagnosis of low-risk thyroid cancers.
Footnotes
Author Disclosure Statement
There is no conflict of interest to be disclosed for any of the authors regarding this manuscript. No source of funding was received for this work.
