Abstract
PURPOSE:
The aim of this study was to evaluate the diagnostic value of conventional sonography and ultrasound elastography for thyroid nodules of each Bethesda category and to analyze their potential role in the corresponding management decision.
METHODS:
This retrospective study included 557 thyroid nodules diagnosed by conventional ultrasound (US) and real-time ultrasound elastography (RTE) before fine-needle aspiration (FNA) from 458 patients. The US, RTE, and cytological results were collected and analyzed according to different Bethesda categories. Differences in the distribution of sonographic features between groups were evaluated by the Chi-square test or Fisher exact test. The sensitivity, specificity, positive predictive value, negative predictive value, and accuracy of conventional US and RTE for the diagnosis of malignant nodules in each category were then calculated and analyzed.
RESULTS:
The diagnostic accuracy of the comprehensive US diagnosis remained relatively high in all categories (78.4% to 88.6%), with good specificities ranging from 77.3% to 100% across all Bethesda categories. For RTE, the diagnostic accuracies in categories I–IV were relatively low (44.6% to 65.6%), with better performance only in categories V and VI (85.2% and 89.1%). In addition, the accuracies of comprehensive US (85.2%, 88.6%) and RTE (85.2%, 89.1%) were lower than those of the corresponding cytological diagnoses in categories V and VI.
CONCLUSIONS:
Conventional US is complementary to FNA, providing additional information for further clinical management, especially in categories I–IV, whereas RTE failed to provide useful diagnostic information in general.
Introduction
In 1909, Theodor Kocher became the first surgeon to be awarded the Nobel Prize, for his contributions to the study of thyroid diseases [1]. Today, thyroid nodules are so common that, with the widespread use of high-resolution ultrasound, they can be detected in 19% to 68% of the general population [2]. Although fewer than 5% of thyroid nodules are malignant [3], the incidence of thyroid cancer has grown over the past several decades and is predicted to continue to increase by 50% to 60% by 2020 [4, 5]. Most thyroid tumors have a good prognosis if they are diagnosed early and treated promptly.
Cytological examination by fine-needle aspiration (FNA) has become a reliable and accurate tool for diagnosing thyroid cancer. Cytopathologic results are currently used to help clinicians decide between surgery and follow-up for thyroid nodules. However, cytopathologic diagnoses derived from US-guided FNA are not always helpful: indeterminate results are not rare in routine clinical practice, and false-positive or false-negative results are inevitable, all of which can complicate further decision-making. How to interpret the cytological results of each category more precisely remains controversial [6–13].
Conventional ultrasound (US) is the main imaging modality for assessing thyroid nodules. Although sonographic appearances are complex, hypoechogenicity, irregular margin, taller-than-wide shape, and microcalcification are associated with an increased risk of thyroid cancer [14–16]. In addition, real-time ultrasound elastography (RTE), which evaluates tissue strain in response to external compression [17], has been applied to the detection of malignant thyroid nodules with high reported sensitivity and specificity; in most relevant studies, malignant lesions usually appeared hard on elastography [18–21].
Several studies have examined the application of elastography to thyroid nodules [22–26], and some indicated good utility in refining the diagnostic process for cytologically indeterminate or non-diagnostic thyroid nodules [24–26]. Previous studies have also suggested that conventional US has an important role in the management of thyroid nodules after FNA [27]. However, it is still unclear what complementary role conventional US or elastography can play in the management of thyroid nodules in each category of the Bethesda System for Reporting Thyroid Cytopathology (BSRTC). It is also uncertain whether these two methods always provide helpful information across all cytological results. Therefore, the aim of the present study was to evaluate the value of conventional US and RTE in decision-making for thyroid nodules from each BSRTC category.
Materials and methods
Patients
Between January 2011 and January 2018, 579 consecutive patients with 724 suspicious thyroid nodules underwent conventional US and RTE before US-guided FNA in our institution. Nodules were considered suspicious if they had at least one of the following sonographic features: hypoechogenicity, irregular margin, taller-than-wide shape, or microcalcifications [28]. Of these nodules, 167 were excluded for the following reasons: (I) lack of follow-up US examinations after US-FNA; (II) unsatisfactory or incomplete US or RTE images; (III) nodule size of less than 5 mm; (IV) loss to follow-up. In the end, 458 patients with 557 nodules were included in the study. The mean age was 46.6 y (range, 21–82 y); 346 patients were female and 112 were male. The mean US-measured size of the 557 nodules that underwent US-FNA was 11.2 mm (range, 5.0–49 mm). FNA was performed under real-time ultrasound guidance in all patients, all of whom gave written informed consent before FNA after a full explanation.
Conventional US
All thyroid US and RTE examinations were performed with a Hitachi HV-900 or Avius (Hitachi Medical, Tokyo, Japan) US scanner equipped with a 7.5–13.0 MHz linear-array transducer. All patients were examined by the same experienced radiologist, who was blinded to the final diagnosis. Each patient could have one or more target nodules, which were observed in both transverse and longitudinal planes. The following sonographic features were recorded for each nodule: echogenicity (hyperechoic/isoechoic/hypoechoic, compared with the surrounding normal thyroid parenchyma), margin (regular/irregular), taller-than-wide shape (absent/present), and microcalcifications (absent/present; calcifications less than 1 mm in diameter, with or without acoustic shadows). The suspicious sonographic features were those listed above (hypoechogenicity, irregular margin, taller-than-wide shape, and microcalcification). A nodule with three or more of these suspicious features was assessed as malignant in the final comprehensive US diagnosis.
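The scoring rule for the comprehensive US diagnosis can be sketched in a few lines of Python. This is only an illustration of the rule as stated above, not the software used in the study; the names `SUSPICIOUS_FEATURES`, `MALIGNANT_THRESHOLD`, and `comprehensive_us_diagnosis` are hypothetical, and the two example nodules follow the feature descriptions given in the figure legends for the Category I and Category IV cases.

```python
# Hypothetical sketch of the comprehensive US rule described above:
# a nodule with 3 or more suspicious features is assessed as malignant.
SUSPICIOUS_FEATURES = frozenset({
    "hypoechogenicity",
    "irregular_margin",
    "taller_than_wide",
    "microcalcification",
})

MALIGNANT_THRESHOLD = 3  # "3 or more" suspicious features


def comprehensive_us_diagnosis(features):
    """Return 'malignant' or 'benign' from the recorded sonographic features."""
    # Only the four suspicious features count; other descriptors are ignored.
    n_suspicious = len(SUSPICIOUS_FEATURES & set(features))
    return "malignant" if n_suspicious >= MALIGNANT_THRESHOLD else "benign"


# Category I example nodule: three suspicious features.
nodule_cat1 = {"hypoechogenicity", "irregular_margin", "taller_than_wide"}
# Category IV example nodule: two suspicious features, regular margin.
nodule_cat4 = {"hypoechogenicity", "microcalcification"}

print(comprehensive_us_diagnosis(nodule_cat1))  # malignant
print(comprehensive_us_diagnosis(nodule_cat4))  # benign
```

Both example nodules proved malignant at surgery, so the second case shows how the fixed threshold can miss a carcinoma with only two suspicious features.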
RTE
RTE was then performed by the same radiologist with the same transducer after conventional US. The probe was used to compress the tissue lightly in a longitudinal view, with a pressure value of 3-4 on the numeric pressure scale displayed on the screen. The elastogram was displayed over the B-mode image on a color scale in which blue indicates hard tissue, red indicates soft tissue, and green indicates intermediate hardness. An elastography score (ES) was assigned to each nodule using a four-point scoring system similar to that previously described by Scacchi et al: ES1: even green displayed in the whole lesion; ES2: green occupied at least 50% of the nodule; ES3: blue occupied more than 50% of the lesion; ES4: nodule homogeneously blue [29]. Nodules with ES 3 or 4 were diagnosed as malignant, and nodules with ES 1 or 2 as benign. A representative RTE image of each nodule was selected and stored in the system.
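The RTE decision rule can be written out as a small lookup. The sketch below is an assumption-labeled illustration only: the names `ES_DESCRIPTIONS` and `rte_diagnosis` are hypothetical, and the score itself is assumed to have been assigned visually by the radiologist as described above.

```python
# Hypothetical sketch of the four-point elastography score (ES) and the
# decision rule: ES 1 or 2 -> benign, ES 3 or 4 -> malignant.
ES_DESCRIPTIONS = {
    1: "even green displayed in the whole lesion",
    2: "green occupied at least 50% of the nodule",
    3: "blue occupied more than 50% of the lesion",
    4: "nodule homogeneously blue",
}


def rte_diagnosis(elastography_score):
    """Map an elastography score (1-4) to the RTE diagnosis."""
    if elastography_score not in ES_DESCRIPTIONS:
        raise ValueError("elastography score must be 1, 2, 3, or 4")
    return "malignant" if elastography_score >= 3 else "benign"


for score, description in ES_DESCRIPTIONS.items():
    print(f"ES{score} ({description}): {rte_diagnosis(score)}")
```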
US-guided FNA
US-guided FNA was performed by another skilled radiologist who had specialized in FNA of thyroid nodules for over 5 years at our institution. Each lesion was aspirated more than twice with a 25-gauge needle attached to a 5 mL plastic syringe. An experienced cytopathologist was also present to perform a preliminary assessment and ensure that sufficient specimen was obtained for the subsequent cytological diagnosis. All cases were reported using the six-tiered Bethesda System for Reporting Thyroid Cytopathology: (I) nondiagnostic or unsatisfactory, (II) benign, (III) atypia of undetermined significance (AUS)/follicular lesion of undetermined significance (FLUS), (IV) follicular neoplasm or suspicious for a follicular neoplasm, (V) suspicious for malignancy, and (VI) malignant [6]. In the present study, categories I, III, and IV were considered indeterminate cytologic results, which failed to provide a definitively benign or malignant diagnosis.
Reference standard
According to the recommendations of the American Thyroid Association (ATA) guideline [2], the reference standard for each lesion in our study was set as follows. The final diagnosis of malignancy was determined by histopathology of a surgical specimen. Lesions were considered benign if they were pathologically confirmed as benign after thyroidectomy or if they showed no obvious change during US follow-up of at least 18 months (range: 18–74 months) after an initial benign cytologic result.
Statistical analysis
Statistical analysis was performed by using a statistical package (SPSS 19.0, Chicago, IL, USA). Differences in the distribution of different sonographic features and ES (tissue hardness) between groups were evaluated by the Chi-square test or Fisher exact test. The sensitivity (Se), specificity (Sp), positive predictive value (PPV), negative predictive value (NPV), and accuracy of conventional US and RTE for the diagnosis of malignant nodules in each BSRTC category were calculated according to cross-tables. In all cases, two-tailed P values lower than 0.05 were considered to indicate statistical significance.
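To illustrate the kind of comparison described above, the following sketch computes a two-sided Fisher exact test for a single 2×2 table using only the Python standard library. It is not the SPSS procedure used in the study. The function name `fisher_exact_two_sided` is hypothetical, and the example cell counts are an assumption: they are back-calculated from the Category I figures reported in the Results (9 malignant and 56 benign nodules, irregular margin with 100% sensitivity and 69.6% specificity) rather than quoted from the tables.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Probability that the top-left cell equals x given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    # A small relative tolerance guards against floating-point ties.
    return sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )


# Rows: malignant, benign. Columns: irregular margin present, absent.
# Back-calculated (assumed) counts for Category I: [[9, 0], [17, 39]].
p_value = fisher_exact_two_sided(9, 0, 17, 39)
print(f"two-sided P = {p_value:.2e}")
print("significant at 0.05" if p_value < 0.05 else "not significant at 0.05")
```

With these assumed counts the test falls well below the 0.05 threshold, in line with the significance reported for irregular margin in Category I.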
Results
Among all 557 thyroid nodules, 271 were benign and 286 were malignant. The 286 malignant nodules confirmed by pathology comprised 277 papillary carcinomas (267 classic and 10 follicular variants), 5 follicular carcinomas, 2 lymphomas, 1 medullary carcinoma, and 1 anaplastic carcinoma. Among the 271 benign nodules, 108 were confirmed by surgery, performed because of concomitant malignant lesions or the patients’ preference for operation; these comprised 80 nodular goiters and 28 follicular adenomas. The remaining 163 nodules were considered benign because they showed no obvious change in sonographic features during US follow-up of over 18 months after initial benign cytology.
The distribution of BSRTC categories was as follows: 65 (11.7%) in Category I, 148 (26.6%) in Category II, 32 (5.7%) in Category III, 74 (13.3%) in Category IV, 54 (9.7%) in Category V, and 184 (33.0%) in Category VI. The final results in each Bethesda category are shown in Table 1. The malignancy rates confirmed by pathologic analysis in each category were as follows: Category I: 13.8% (9/65), Category II: 11.5% (17/148), Category III: 31.3% (10/32), Category IV: 21.6% (16/74), Category V: 94.4% (51/54), and Category VI: 99.5% (183/184).
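The percentages above follow directly from the reported counts, and a short script makes the arithmetic easy to check. The sketch below uses only the counts given in the text; the variable names are hypothetical.

```python
# (malignant, total) nodules per Bethesda category, as reported in the text.
counts = {
    "I": (9, 65),
    "II": (17, 148),
    "III": (10, 32),
    "IV": (16, 74),
    "V": (51, 54),
    "VI": (183, 184),
}

total_nodules = sum(total for _, total in counts.values())
total_malignant = sum(malignant for malignant, _ in counts.values())

for category, (malignant, total) in counts.items():
    share = 100 * total / total_nodules  # share of all nodules
    rate = 100 * malignant / total       # malignancy rate within the category
    print(f"Category {category}: {share:.1f}% of nodules, "
          f"malignancy rate {rate:.1f}% ({malignant}/{total})")

print(total_nodules, total_malignant)  # 557 286
```

The category totals sum to the 557 nodules and 286 malignancies reported for the whole cohort.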
Final results of each Bethesda category
I: Nondiagnostic; II: Benign; III: Atypia of undetermined significance (AUS)/Follicular lesion of undetermined significance (FLUS); IV: Follicular neoplasm or suspicious for a follicular neoplasm; V: Suspicious for malignancy; VI: Malignant.
Among the sonographic features analyzed in this study, both irregular margin and taller-than-wide shape differed significantly between benign and malignant nodules in categories I, II, and IV (P < 0.05). Irregular margin also differed significantly in categories III and V, whereas microcalcification differed significantly only in Category IV. Hypoechogenicity was the only significant feature in Category VI (Table 2). Given that nodules with three or more suspicious features were classified as malignant in the final comprehensive US diagnosis, we calculated the corresponding Se, Sp, PPV, NPV, and accuracy of the significant sonographic features, the comprehensive US diagnosis, and RTE (Tables 3, 4).
Distribution of analyzed sonographic features in each Bethesda category
Diagnostic performance of significant sonographic features in each Bethesda category
Se sensitivity, Sp specificity, PPV positive predictive value, NPV negative predictive value. I: Nondiagnostic; II: Benign; III: Atypia of undetermined significance (AUS)/Follicular lesion of undetermined significance (FLUS); IV: Follicular neoplasm or suspicious for a follicular neoplasm; V: Suspicious for malignancy; VI: Malignant.
Diagnostic performance of different US methods in each Bethesda category
Se sensitivity, Sp specificity, PPV positive predictive value, NPV negative predictive value. US ultrasound, RTE real-time ultrasound elastography I: Nondiagnostic; II: Benign; III: Atypia of undetermined significance (AUS)/Follicular lesion of undetermined significance (FLUS); IV: Follicular neoplasm or suspicious for a follicular neoplasm; V: Suspicious for malignancy; VI: Malignant.
The detailed diagnostic performance in each category was as follows. In Category I, irregular margin showed a sensitivity of 100% but a relatively unsatisfactory specificity of 69.6% and accuracy of 73.8%, whereas taller-than-wide shape showed a specificity of 100% and an excellent accuracy of 92.3% but a rather low sensitivity of 44.4%. Taller-than-wide shape performed similarly in categories II and IV, with accuracies of 89.9% and 83.8%, respectively. The sensitivity of irregular margin remained high in categories II, III, and V (range, 90%–100%) but was relatively low in Category IV (68.8%). Conversely, its specificity reached 100% in Category V but was relatively low in categories II, III, and IV (74.0%, 54.5%, and 63.8%, respectively). Microcalcification had moderate performance in Category IV, with 50.0% sensitivity, 79.3% specificity, and 73.0% accuracy. In Category VI, hypoechogenicity, the only significant feature, had an excellent sensitivity of 98.4% and an accuracy of 97.8% but a specificity of 0 (there was only one benign nodule in Category VI).
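The measures quoted above all derive from a 2×2 cross-table of test result against final diagnosis. The sketch below shows the standard definitions; the function name `diagnostic_metrics` is hypothetical, and the Category I cell counts are an assumption, back-calculated from the reported percentages and the 9 malignant and 56 benign nodules in that category rather than taken from the tables.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Return Se, Sp, PPV, NPV, and accuracy (as fractions) from a 2x2 table."""
    def ratio(numerator, denominator):
        # Ratios with an empty denominator are undefined and returned as None.
        return numerator / denominator if denominator else None

    return {
        "Se": ratio(tp, tp + fn),
        "Sp": ratio(tn, tn + fp),
        "PPV": ratio(tp, tp + fp),
        "NPV": ratio(tn, tn + fn),
        "accuracy": ratio(tp + tn, tp + fp + fn + tn),
    }


# Category I, back-calculated (assumed) counts: 9 malignant, 56 benign nodules.
irregular_margin = diagnostic_metrics(tp=9, fp=17, fn=0, tn=39)
taller_than_wide = diagnostic_metrics(tp=4, fp=0, fn=5, tn=56)

for name, metrics in [("irregular margin", irregular_margin),
                      ("taller-than-wide", taller_than_wide)]:
    summary = ", ".join(f"{key} {100 * value:.1f}%" for key, value in metrics.items())
    print(f"{name}: {summary}")
```

Under these assumed counts the sensitivity, specificity, and accuracy reproduce the Category I values quoted in the text; the PPV and NPV printed by the script are derived quantities and should be checked against Table 3.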
Overall, the diagnostic accuracy of the comprehensive US diagnosis remained relatively high in all categories (78.4% to 88.6%), with good specificities ranging from 77.3% to 100% across all BSRTC categories. This pattern was most evident in Category VI, with 88.5% sensitivity, 100% specificity, and 88.6% accuracy (Table 4).
The sensitivities of RTE for differentiating benign from malignant thyroid nodules in BSRTC categories I to VI were 100%, 82.4%, 90.0%, 62.5%, 86.3%, and 89.6%, respectively, while the corresponding specificities were unsatisfactory at 35.7%, 50.4%, 54.5%, 53.4%, 66.7%, and 0, respectively. RTE performed best in Category V, with a sensitivity of 86.3%, specificity of 66.7%, and accuracy of 85.2% (Table 4).
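Accuracy is a prevalence-weighted average of sensitivity and specificity, which helps explain why RTE accuracy is high in categories V and VI despite poor specificity: almost all nodules there are malignant, so accuracy is dominated by sensitivity. The sketch below recomputes accuracy from the sensitivities and specificities quoted above and the category counts reported earlier; the names `rte` and `accuracy_from_se_sp` are hypothetical, and because the inputs are rounded percentages the results are approximate.

```python
# Category: (malignant, total, RTE sensitivity, RTE specificity), from the text.
rte = {
    "I": (9, 65, 1.000, 0.357),
    "II": (17, 148, 0.824, 0.504),
    "III": (10, 32, 0.900, 0.545),
    "IV": (16, 74, 0.625, 0.534),
    "V": (51, 54, 0.863, 0.667),
    "VI": (183, 184, 0.896, 0.000),
}


def accuracy_from_se_sp(malignant, total, se, sp):
    """Accuracy = (Se * malignant + Sp * benign) / total."""
    benign = total - malignant
    return (se * malignant + sp * benign) / total


rte_accuracy = {
    category: round(100 * accuracy_from_se_sp(*values), 1)
    for category, values in rte.items()
}

for category, accuracy in rte_accuracy.items():
    prevalence = 100 * rte[category][0] / rte[category][1]
    print(f"Category {category}: prevalence {prevalence:.1f}%, accuracy {accuracy}%")
```

The recomputed values agree with the accuracies stated in the text for categories I, III, IV, V, and VI; the Category II value is derived here and is not quoted in the text.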
Discussion
FNA was first introduced to the United States from Sweden in the early 1980s [30]. With real-time ultrasound guidance, it has developed into an accurate, cost-effective, and minimally invasive technique used worldwide to distinguish benign from malignant thyroid nodules. As mentioned above, conventional US and RTE have been shown to be valuable in the assessment of thyroid nodules; in the present study we therefore examined whether these two techniques can help guide further decisions for thyroid nodules in each BSRTC category.
An FNA specimen is usually considered non-diagnostic when it contains cyst fluid only or is virtually acellular, among other reasons [31]. According to the data in Table 1, non-diagnostic (Category I) cytology accounted for 11.7% of all nodules, with a malignancy rate of 13.8%, consistent with most of the literature (non-diagnostic rate of 8–20% and malignancy risk of 2–15%) [6–8, 33]. Although the ATA guideline recommends repeat US-guided FNA after a non-diagnostic result [2], not every patient is willing to undergo a second aspiration. In fact, most such patients in our study chose continued US follow-up instead, and only 20 underwent surgery, either at their own strong request or because of another concomitant malignant lesion. In addition, irregular margin showed an excellent sensitivity of 100%, while taller-than-wide shape showed a rather high accuracy of 92.3% and specificity of 100% in Category I. The comprehensive US diagnosis also performed well in non-diagnostic cytology, with a sensitivity of 77.8%, specificity of 89.3%, and accuracy of 87.7%, whereas RTE failed to provide valuable diagnostic information in this category, with an accuracy of 44.6% (Fig. 1a). Based on these results, it may be reasonable for non-diagnostic nodules without particular risk features (such as irregular margin and taller-than-wide shape) to continue clinical and imaging follow-up instead of repeat diagnostic FNA or surgery. Both Ana et al. and Moon et al. reached similar conclusions in their studies [8, 9].

Fig. 1a Sonogram in a 52-year-old woman with a 5 mm thyroid nodule in the right thyroid gland. The FNA result was Category I (non-diagnostic). The nodule had hypoechogenicity, irregular margin, and taller-than-wide shape. With three suspicious features, the nodule was assessed as malignant at comprehensive US diagnosis. As green occupied over 50% of the nodule, the RTE diagnosis was benign with ES = 2. The patient underwent surgery because of thyroid malignancy in the other side of the thyroid, and this nodule was ultimately diagnosed as papillary thyroid carcinoma.
Fig. 1b Sonogram in a 47-year-old woman with an 8 mm thyroid nodule in the right thyroid gland. The FNA result was Category II (Hashimoto’s thyroiditis). The nodule had hypoechogenicity, irregular margin, and taller-than-wide shape. With three suspicious features, the nodule was assessed as malignant at comprehensive US diagnosis. However, as green occupied over 50% of the nodule, the RTE diagnosis was benign with ES = 2. The patient underwent surgery at her own strong request because of anxiety, and the nodule was ultimately diagnosed as papillary thyroid carcinoma.
Fig. 1c Sonogram in a 44-year-old woman with a 13 mm thyroid nodule in the left thyroid gland. The FNA result was Category III (follicular lesion of undetermined significance). The nodule was isoechoic with a regular margin. Without apparent suspicious features, the nodule was assessed as benign at comprehensive US diagnosis. The RTE diagnosis was benign with ES = 2, as green occupied over 50% of the nodule. Total thyroidectomy ultimately revealed atypical adenoma with follicular epithelium hyperplasia.
Fig. 1d Sonogram in a 59-year-old woman with a 24 mm thyroid nodule in the left thyroid gland. The FNA result was Category IV (follicular neoplasm). The nodule had hypoechogenicity and microcalcification but a regular margin. With two suspicious features, the nodule was assessed as benign at comprehensive US diagnosis. The RTE diagnosis was also benign with ES = 2, as green occupied over 50% of the nodule. The patient underwent surgery and the nodule was ultimately diagnosed as follicular thyroid carcinoma.
Fig. 1e Sonogram in a 70-year-old woman with a 12 mm thyroid nodule in the left thyroid gland. The FNA result was Category V (suspicious for papillary thyroid carcinoma). The nodule was isoechoic with an irregular margin. With only one suspicious feature, the nodule was assessed as benign at comprehensive US diagnosis. As blue occupied most of the nodule, the RTE diagnosis was malignant with ES = 3. The patient underwent surgery and the pathological result was follicular adenoma.
Fig. 1f Sonogram in a 45-year-old woman with a 14 mm thyroid nodule in the left thyroid gland. The FNA result was Category VI (poorly differentiated thyroid carcinoma). The nodule had hypoechogenicity, irregular margin, and taller-than-wide shape. With three suspicious features, the nodule was assessed as malignant at comprehensive US diagnosis. As blue occupied over 50% of the nodule, the RTE diagnosis was malignant with ES = 3. The patient underwent surgery and the pathological result was follicular neoplasm of uncertain malignant potential.
Bethesda category II is the only definitively benign category in BSRTC, and its false-negative rate cannot be ignored: in this study it accounted for 26.6% of all nodules, with a malignancy rate of 11.5%, much higher than reported (less than 3%) [34]. Benign nodules are usually followed up with US examinations at 6- to 18-month intervals after the initial benign result, and in most institutions repeat FNA is not considered unless there is significant growth or suspicious US change. Interestingly, irregular margin and RTE showed a consistent pattern of higher sensitivity (94.1% and 82.4%) but lower specificity (74.0% and 50.4%) in Category II, whereas taller-than-wide shape and comprehensive US showed higher specificity (96.9% and 80.2%) but lower sensitivity (35.3% and 64.7%). We presume that if a thyroid nodule with an initial benign cytologic result has an irregular margin and hardness of “ES 3 or 4”, or more suspicious US features, repeat FNA or histology should be recommended to revise the initial cytologic diagnosis for further clinical management (Fig. 1b). This suggestion is supported by Chernyavsky’s study of initially benign FNA results, which found that the overall false-negative rate of all FNAs was reduced from 10.2% to 4.5% by a second FNA of nodules with suspicious sonographic features [10].
In general, categories III and IV mainly comprise follicular-patterned thyroid lesions and are regarded as indeterminate. Previous studies have found that 10% to 40% of all FNAs yield indeterminate results [11, 12]. A significantly increased proportion of indeterminate cytology has been observed over the past decade since the implementation of the Bethesda classification and the revised 2015 ATA guidelines [13]. Because of the uncertainty of categories III and IV, clinical management varies widely, from periodic US examination, repeat FNA, and core-needle biopsy to surgery. The associated risk of malignancy in categories III and IV has been reported as 5–10% and 20–30%, respectively [31], while the actual malignancy rates in the present study were 31.3% and 21.6%, respectively. We speculate that unavoidable selection bias may have caused this difference. Follicular lesions are known for their benign sonographic appearance and soft follicular-cell structure, and the key to distinguishing follicular carcinoma from follicular adenoma lies in the histological finding of vascular or capsular invasion. These properties make the differentiation between benign and malignant follicular lesions very difficult (Fig. 1c-d). In Category III, irregular margin was the only significant sonographic feature, with 100% sensitivity, 54.5% specificity, and 68.8% accuracy. In Category IV, all sonographic features except hypoechogenicity differed significantly between benign and malignant nodules, with accuracies ranging from 64.9% to 83.8%; Category IV was also the only category in which microcalcification was significant (50.0% sensitivity, 79.3% specificity, and 73.0% accuracy). Comprehensive US showed a high sensitivity of 90% with 77.3% specificity and 81.3% accuracy in Category III but a lower sensitivity of 50% with 86.2% specificity and 78.4% accuracy in Category IV.
Several researchers have reported the usefulness of elastography in the preoperative assessment of thyroid nodules with indeterminate cytology [24, 25]. In our results, however, RTE did not perform well in categories III and IV, with low accuracies of 65.6% and 55.4%, although these were somewhat higher than in categories I and II. Hence, we speculate that if highly suspicious ultrasound characteristics such as irregular margin and microcalcification are found in a follicular lesion, more active clinical intervention (repeat FNA, molecular testing, or diagnostic surgery) should be undertaken.
In categories V and VI, diagnoses were usually more definite, with rather high accuracy and reliability, and decision-making focused on surgical intervention. The accuracies of comprehensive US (85.2%, 88.6%) and RTE (85.2%, 89.1%) were lower than the corresponding diagnostic performance of FNA in categories V (94.4%) and VI (99.5%). Among the 238 nodules in categories V and VI, four were pathologically diagnosed as benign, including three follicular neoplasms and one nodular goiter. On reviewing their preoperative sonographic data, we found these four lesions difficult to distinguish with either conventional US or RTE (Fig. 1e), and other tools such as genetic testing might provide helpful information in such cases. In addition, we counted the “follicular neoplasm of uncertain malignant potential” as the only benign nodule in Category VI; the false-positive cytologic result may be explained by the aforementioned limitation of pathology in follicular lesions (Fig. 1f).
This study has several limitations. First, the prevalence of malignancy might be overestimated because many patients were referred to our tertiary hospital from primary hospitals for further diagnostic FNA or surgery. Second, the study did not evaluate inter-observer variability in conventional US, RTE, or FNA, as each procedure was performed by a single experienced operator. Third, not all enrolled patients had histological confirmation; most patients with benign or indeterminate FNA results preferred regular follow-up, for which we obtained detailed follow-up information of up to 74 months. In addition, as most thyroid carcinomas are indolent, longer follow-up is required for more accurate assessment in future studies.
The available data indicate that comprehensive US had good diagnostic performance in BSRTC categories I–IV, while RTE failed to provide useful diagnostic information in general. Conventional US is therefore complementary to FNA, providing additional information for further clinical management. These findings could help clinicians choose the most appropriate management plan for patients with thyroid lesions.
Conflict of interest
The authors declare that they have no conflict of interest.
Acknowledgments
This study was funded by the National Natural Science Foundation of China (grant numbers 81671700, 81701706, 81701700), the Natural Science Foundation of Shanghai (16ZR1426000), and the Shanghai Key Discipline of Medical Imaging (2017ZZ02005).
