Abstract
Background:
Several studies have evaluated the ability of ultrasound elastography (USE) to diagnose malignant nodules. However, these studies had important limitations, namely selection bias and small sample size. The aims of the present study were to prospectively assess, in a large group of patients, the diagnostic power of USE for detecting malignancy in thyroid nodules, and to compare this technique with B-mode grayscale ultrasonography (BUS) and power Doppler ultrasonography (PD).
Method:
A total of 194 patients with 237 thyroid nodules were examined using BUS, PD, and USE. USE scores were classified according to elasticity: score 1 as high, score 2 as intermediate, and score 3 as low (i.e., a high degree of stiffness). Fine-needle aspiration cytology (FNAC) was performed on all nodules on at least two different occasions. Nodules with two benign FNAC readings whose diameter did not change during a 6-month follow-up period were classified as benign. Patients having thyroid nodules with indeterminate, suspicious, or malignant cytology underwent total thyroidectomy or hemithyroidectomy to remove the nodule and treat the malignancy.
Results:
Fifty-eight (25%) nodules in 45 (23%) patients were found to be malignant. USE had limited sensitivity and positive predictive value in detecting malignant thyroid nodules and was not superior to BUS. USE had almost the same specificity and negative predictive value as BUS. A power Doppler type-3 pattern was not sufficiently sensitive to detect malignancy in thyroid nodules.
Conclusions:
In contrast to earlier reports, the present study found a lower sensitivity and specificity of USE for the diagnosis of malignancy in thyroid nodules.
Introduction
A thyroid nodule that is very firm on palpation is more suspicious for malignancy. In addition to palpation, tissue stiffness can be evaluated by ultrasound elastography (USE), even in nodules that are detected by imaging but are not palpable. USE assesses tissue stiffness by measuring the amount of distortion that occurs when the lesion is subjected to external pressure (2).
The power of USE to detect malignant thyroid nodules has been the subject of many reports (3–17), and these generally described the technique as highly accurate. A recent meta-analysis evaluating most of these reports was also consistent with relatively high sensitivity and specificity for USE (18). Nevertheless, all of these studies had the major limiting factors of small sample size and/or selection bias (19). Therefore, we planned a prospective study assessing the diagnostic power of USE to detect malignant thyroid nodules in a consecutive series of patients with thyroid nodules who were not preselected for thyroid surgery.
Materials and Methods
Patients
Patients with thyroid nodules were consecutively enrolled in this prospective study, and all participants were followed for at least 6 months. Overall, data regarding 237 thyroid nodules from 194 patients (157 women, mean age 43.7±13 years, range 20–76, and 37 men, mean age 47.5±14 years, range 20–74), who were admitted to our Endocrinology and Metabolism Outpatient Unit between July 2009 and August 2010, were included in the analyses. None of the patients had low serum TSH levels; they were either euthyroid or, rarely, hypothyroid. The thyroid nodules in the study had a longest diameter of 5–40 mm. The exclusion criteria were the presence of coalescent nodules, a spongiform appearance of nodules, or nodules having either more than 20% cystic content or eggshell calcifications, which are known confounding factors for USE. In the case of previous treatment with radioactive iodine or atrophic thyroiditis, the elasticity of the thyroid tissue might be altered, affecting the strain ratio (SR). Therefore, we excluded such cases from the measurements of mean SR values. Fine-needle aspiration cytology (FNAC) was performed on all nodules on at least two different occasions. This was done under US guidance by an experienced endocrinologist. According to our experience (20,21) and the clinical guidelines (22,23), nodules having two benign FNAC readings and having unchanged nodule diameters (less than a 20% increase in at least two nodule dimensions with a maximum increase of 2 mm in solid nodules or in the solid portion of mixed cystic–solid nodules) during a 6-month follow-up period were classified as benign. Patients having thyroid nodules with suspicious or malignant cytology and/or presenting with compressive symptoms underwent thyroid surgery. Patients having follicular lesions that were identified by FNAC were evaluated by a Tc-99m or I-131 thyroid scan. If an autonomously functioning nodule was not seen, these patients also underwent total thyroidectomy or hemithyroidectomy. All samples were interpreted by an experienced cytologist as well as a pathologist. The nodules with and without macrocalcification were analyzed as a group as well as separately. The study was approved by the local ethics committee, and informed consent was obtained from each patient.
Thyroid grayscale US, power Doppler US, and US elastography
All participants were examined using high-resolution B-mode grayscale ultrasonography (BUS), power Doppler ultrasonography (PD), and real-time USE (Hitachi® EUB 7000 HV machine with 6–13-MHz linear transducer and elastography software), and video images of all lesions were recorded prospectively under color map-1. USE was performed in all patients before FNAC. US features such as echogenicity (hypoechoic, hyperechoic, or isoechoic, defined according to individual thyroid parenchyma echogenicity), the presence or absence of halo sign, the presence of micro/macrocalcification, margin features (presence of irregular margin or not), and blood flow pattern by PD, were evaluated. PD patterns were classified as the absence of blood flow or minimal peripheral/central blood flow as type 1, the presence of marked peripheral blood flow as type 2, and marked central (chaotic) blood flow as type 3.
We used two USE output formats. One was color elastography (color map-1), and the other was SR measurement. Color elastography displays a red-green-to-blue image, with red-green representing the most elastic lesions (high elasticity) and blue the least elastic or hardest lesions (low elasticity). Color-USE features were classified according to the presence of elasticity or red-green color within the whole lesion or in more than 50% of the lesion (elasticity score 1, soft), hardness or blue color in more than 50% of the lesion (elasticity score 2, intermediate), and no elasticity or blue color in the whole lesion with or without posterior shadowing (elasticity score 3, hard). This scoring system was proposed by Itoh et al. (2) and later modified by Rago et al. (24). When using USE, the lesion is chosen by the operator, and repetitive compression is applied longitudinally on the selected area of interest with the US probe. The pressure score is then standardized and displayed on a numeric scale by the software. The pressure is kept optimal at a value of 3 or 4, as displayed on the screen, for 10–15 seconds. Longitudinal scans were used. An elastogram in a defined color spectrum is obtained.
The SR measurements on USE images were calculated by measuring the average strain in a thyroid nodule (strain of the thyroid nodule) and comparing it to the average strain of a similar-sized area of adjacent thyroid tissue (strain of the thyroid tissue). The SR measurement is made according to the following formula: SR=strain of the thyroid tissue/strain of the thyroid nodule. Thus, SR increases as a function of the relative stiffness of a thyroid nodule. The measurement of SR might, in part, help to correct bias in the scoring of color images. All ultrasonographic evaluations were performed by the same endocrinologist (U.Ü.) experienced in these techniques, who was blinded to the results of cytology. The recorded video images of all lesions were also reviewed subsequently by another endocrinologist (M.F.E.) experienced in all US techniques. More than 90% interobserver agreement was seen for USE under color map-1 between the two operators. As a measure of our experience in ultrasonographic examination, 6000–7000 ultrasonographic evaluations had been performed in our outpatient unit each year for more than 10 years, with additional USE for more than 3 years.
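The SR formula above can be illustrated with a minimal sketch. The function name `strain_ratio` and the strain values are hypothetical and chosen only for illustration; they are not study measurements.

```python
def strain_ratio(tissue_strain: float, nodule_strain: float) -> float:
    """SR = strain of the thyroid tissue / strain of the thyroid nodule."""
    if tissue_strain <= 0 or nodule_strain <= 0:
        raise ValueError("strain values must be positive")
    return tissue_strain / nodule_strain


# Hypothetical strains: a stiff nodule deforms less than the adjacent tissue,
# so its SR is higher than that of a soft nodule.
stiff_sr = strain_ratio(tissue_strain=0.5, nodule_strain=0.125)
soft_sr = strain_ratio(tissue_strain=0.5, nodule_strain=0.5)
```

With these illustrative inputs the stiffer nodule yields the larger ratio, which is the behavior the formula is meant to capture.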
Cytological and histopathological diagnoses
All cytological and histopathological examinations were made, as described elsewhere (25,26), by an experienced cytologist and a pathologist, who were both blinded to all information regarding ultrasonographic features. Sufficiency of FNAC samples was defined according to widely recognized guidelines (25). Histological diagnoses were based on World Health Organization criteria (26).
Statistical analyses
SR measurements are shown as the median and the minimum–maximum values. The Mann–Whitney U-test was used to compare the SRs of malignant and benign thyroid nodules. The diagnostic performance of the SR measurements was evaluated by the receiver-operating characteristic (ROC) curve. p-values<0.05 were considered as statistically significant. Analyses were carried out using SPSS software version 18.0 (IBM Corp.).
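As a rough sketch of how an area under the ROC curve and a candidate cut-off could be derived from SR values, the example below computes the AUC through its equivalence to the Mann–Whitney U statistic and picks a cut-off by maximizing Youden's J. The function names, the use of Youden's J, and the SR values are all assumptions made for illustration; the text does not state which cut-off criterion was applied, and the numbers are not study data.

```python
def auc_from_scores(malignant, benign):
    """AUC as the probability that a malignant SR exceeds a benign SR.

    Ties count as 0.5. This equals U / (n_malignant * n_benign),
    where U is the Mann-Whitney statistic.
    """
    wins = 0.0
    for m in malignant:
        for b in benign:
            if m > b:
                wins += 1.0
            elif m == b:
                wins += 0.5
    return wins / (len(malignant) * len(benign))


def best_cutoff_youden(malignant, benign):
    """Return (cutoff, sensitivity, specificity) maximizing Youden's J.

    A nodule is called positive when its SR is >= cutoff. Choosing the
    cut-off by Youden's J is an assumption, not the authors' stated method.
    """
    best = None
    for cutoff in sorted(set(malignant) | set(benign)):
        sens = sum(m >= cutoff for m in malignant) / len(malignant)
        spec = sum(b < cutoff for b in benign) / len(benign)
        j = sens + spec - 1.0
        if best is None or j > best[0]:
            best = (j, cutoff, sens, spec)
    return best[1], best[2], best[3]


# Made-up SR values for illustration only (not study data).
malignant_sr = [3.5, 2.8, 2.2, 5.0]
benign_sr = [1.0, 1.6, 2.9, 0.8]

auc = auc_from_scores(malignant_sr, benign_sr)
cutoff, sens, spec = best_cutoff_youden(malignant_sr, benign_sr)
```

A statistics package would normally be used for this; the pure-Python version is shown only to make the relationship between the U-test and the ROC curve explicit.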
Results
Surgery, cytology, and histopathology
Among the 194 patients with 237 nodules, there were 51 patients (26%) with 72 nodules (30%) who underwent hemithyroidectomy (n=2) or total thyroidectomy (n=49). The indications for surgery in those nodules were cytological diagnosis of papillary thyroid carcinoma (PTC) (n=37), suspicion of PTC (n=7), suspicion of a medullary thyroid carcinoma (MTC) (n=3), large size of the nodules with benign cytology (n=7), presence of a Hürthle cell neoplasm (n=2), presence of a follicular adenoma (n=5), and presence of atypical cytology with unknown significance (n=11). Among the patients with a final malignant cytology, there were two patients whose first FNAC was read as benign. One nodule with an FNAC diagnosis of MTC was a follicular thyroid carcinoma (FTC) by histopathology.
After surgery, 81% of the operated nodules (58/72), which were present in 88% (45/51) of the operated patients, were found to be malignant by histopathology. Among the 72 operated nodules, the diagnosis was PTC for 47 nodules, follicular variant PTC for 3 nodules, oncocytic type PTC for 3 nodules, FTC for 3 nodules, MTC for 2 nodules, and benign (colloid or adenomatous goiter and thyroiditis) for 14 nodules. In the entire cohort, 25% of the nodules (58/237) from 23% (45/194) of the patients were malignant.
Conventional US, power Doppler US, and US elastography
The sensitivities, specificities, positive, and negative predictive values (PPV and NPV, respectively) for BUS and PD to detect malignancy are listed in Table 1. Among the different features of BUS, the loss of halo sign was the most sensitive (91%) and had the highest NPV (94%). However, having an irregular margin was the most specific (85%) feature with the highest PPV (60%). Among the features of PD, the marked central blood flow (chaotic, type 3) pattern was the most specific (93%) and had the highest PPV (41%), but its sensitivity (16%) was low (Table 1).
US, ultrasonography; Sens., sensitivity; Spec., specificity; PPV, positive predictive value; NPV, negative predictive value; Acc., Accuracy.
The sensitivities, specificities, PPV, and NPV of color USE to detect malignancy are listed in Table 2. We studied the sensitivity and PPV of USE with respect to nodule size and found that both decreased as nodule size increased. In subcentimeter nodules, USE had an almost twofold higher PPV when compared to larger nodules (Table 2). Score 1 was seen in 103 nodules (44%), 12 of which were malignant (12%). Score 2 was observed in 72 nodules (30%), 19 of which were malignant (26%). Score 3 was detected in 62 nodules, 27 of which were malignant (44%). When the nodules having macrocalcifications were excluded, the sensitivity, specificity, and PPV of USE were slightly increased (Table 2). The PPV of USE in nodules with macrocalcifications was half that of nodules without macrocalcifications (25% and 50%, respectively). When the distribution of the malignant thyroid nodules according to different types of thyroid cancers and USE scores was analyzed, 20 of 47 nodules having the diagnosis of classical type PTC had USE score 3, 16 had score 2, and the remaining 11 had score 1. All nodules with the diagnosis of the oncocytic variant PTC (3/3) and MTC (2/2) had USE score 3. Two of 3 nodules having the diagnosis of the follicular variant PTC had USE score 3, and the remaining one had score 2. Among the nodules having the diagnosis of FTC, 2 had USE score 2 and the remaining one had score 1; none had USE score 3. The PPV of USE in detecting malignancy in the thyroid nodules with indeterminate cytology (75%) was higher than that of all other groups (Table 3). The combined diagnostic power of different features of US and USE score 3 in detecting malignant thyroid nodules is summarized in Table 4. As seen, the combination of different features of US with USE slightly increased the sensitivity and PPV of USE.
USE, Ultrasound elastography.
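As a worked check, the nodule counts reported above are enough to reconstruct a 2×2 table if USE score 3 is treated as the positive test. The sketch below makes that assumption explicit; the function name `diagnostic_metrics` is hypothetical, and the resulting values are derived arithmetically from the counts in the text, not quoted from Table 2.

```python
def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Standard diagnostic accuracy measures from a 2x2 table."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }


# Counts taken from the text: 237 nodules, 58 malignant,
# 62 nodules with USE score 3, of which 27 were malignant.
total_nodules, malignant_total = 237, 58
score3_total, score3_malignant = 62, 27

# Assumption: score 3 = test positive, scores 1 and 2 = test negative.
tp = score3_malignant
fp = score3_total - score3_malignant
fn = malignant_total - score3_malignant
tn = total_nodules - malignant_total - fp

metrics = diagnostic_metrics(tp, fp, fn, tn)
```

Under this assumption, sensitivity and PPV both come out below 50%, while specificity and NPV are each around 80%.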
The comparison of the median SR of malignant and benign thyroid nodules revealed that malignant nodules were significantly stiffer than benign ones (median SR [min–max]: 3.26 [0.30–54.5] vs. 1.6 [0.30–20.9], p<0.001). The ROC curve for distinguishing malignant nodules from benign ones is shown in Figure 1. The area under the curve for diagnosing malignancy was 0.72 (95% confidence interval=0.64–0.80, p<0.001). The best SR cut-off point for differentiating benign and malignant nodules was 2.1 (sensitivity 69%, specificity 67%).

Figure 1. Receiver-operating characteristic (ROC) curve for distinguishing malignant from benign nodules. The area under the curve for diagnosing malignancy was 0.72 (95% confidence interval=0.64–0.80, p<0.001).
Discussion
This is the largest study evaluating the power of features of USE, BUS, and PD to diagnose malignancy in thyroid nodules in unselected consecutive patients. We demonstrated that USE had limited sensitivity and PPV in detecting malignant thyroid nodules and was not superior to BUS. On the other hand, we found that USE had similar specificity and NPV compared to BUS, and the PD type-3 pattern was not sensitive enough to detect malignancy. When BUS was combined with USE, the diagnostic power in detecting malignancy was further increased. However, the diagnostic power of USE was slightly decreased when it was used in nodules containing macrocalcifications. Furthermore, the sensitivity and PPV of USE decreased as the nodule size increased.
Several studies evaluating the power of USE in detecting malignant thyroid nodules have been reported in the literature (reviewed in Table 5). In a recent meta-analysis, 92% sensitivity and 90% specificity for USE in thyroid nodules were reported (18). However, detailed evaluation indicates that half of the studies were of patients who only underwent thyroid surgery (3,4,10,12,15). The other half included patients in whom the final evaluation for malignancy in their thyroid nodules was FNAC as well as histopathology (3,5–9,11,13,14,17). Therefore, these studies can be divided into two subgroups: group 1, being the former, and group 2, the latter. It can be said that a major limiting factor for the group-1 studies was selection bias. The power of USE is expected to be higher for nodules in which surgery is necessary. Additionally, most of these studies had a small sample size. In fact, all group-2 studies had small sample sizes. Furthermore, when the high interobserver variation for USE reported by Park et al. (27) is taken into account, the power of studies with small sample sizes is limited. Recently, Vorlander et al. (12) reported a sensitivity of 70% and a PPV of 43% for USE in a series of 309 thyroid nodules. Similarly, Cakir et al. (28) studied 292 patients with 391 nodules and reported a sensitivity of 58% and a PPV of 49% for color USE. As seen in these two studies, the predictive power of the technique decreased as the number of study participants increased. Despite the large sample sizes of these studies, pertinent evaluation of the power of USE is difficult, since they only included patients who were selected for thyroidectomy. Similar to the group-2 studies, unselected patients with thyroid nodules having both cytological and histopathological diagnoses were prospectively evaluated in the present study, and both the sensitivity and PPV of USE were found to be <50%.
Therefore, we conclude that USE can be integrated into the guidelines as an ancillary technique to conventional methods, rather than a sole diagnostic tool. By using USE, nonsuspicious nodules that do not need further investigation might possibly be selected more easily than the suspicious ones, due to USE's high specificity and NPV.
ND, Not determined; No., number; Histopath., histopathology; SR, Strain Ratio; CE, color elastography.
One of the challenging problems with USE is the detection of FTC. Bojunga et al. (18) reported that four out of nine (44%) FTCs and two metastatic adenocarcinomas could not be detected as stiff lesions by USE. Similarly, we noted that one out of three thyroid nodules with FTC had a USE score of 1, and the remainder had score 2. Further studies are needed to evaluate the exact power of USE to predict FTC preoperatively. Another issue with USE is its application in nodules with indeterminate cytological results. Rago et al. (24) studied the role of USE in thyroid nodules with indeterminate or nondiagnostic cytology and demonstrated about 95% sensitivity and 90% specificity in selecting candidate nodules for surgery. More recently, Lippolis et al. (29) also studied USE in thyroid nodules with indeterminate cytology and did not confirm the results of Rago et al. (24). In the present study, USE scores 2 and 3 could detect seven out of nine malignant nodules among 18 nodules with indeterminate cytology, and the sensitivity (78%) and specificity (44%) of USE in nodules with indeterminate cytology were not as high as in the study of Rago et al. (24). Nevertheless, we found that the power of USE was higher in nodules with indeterminate cytology than in all nodules combined. The application of USE might gain more importance with forthcoming studies about nodules with indeterminate cytology.
In addition to the high interobserver variation for USE, another limiting factor in most of the studies was the scanning orientation (longitudinal or transverse). It was reported that color-scale USE and SR measurements can differ between longitudinal and transverse application of USE, and that the longitudinal scan had a higher power than the transverse scan (28). This may partly explain why the USE studies evaluating thyroid nodules reported variable cut-off values for SR in detecting malignant thyroid nodules. In the present study, we used only longitudinal scans, and we also observed differences between two different axis measurements.
Although we included consecutive patients with thyroid nodules, the malignancy rate in our cohort was 23% of patients (25% of nodules); this was higher than expected. One of the major reasons for this high malignancy rate was probably our exclusion criteria. We did not include patients with coalescent nodules or nodules having more than 20% cystic content. Cystic nodules and the spongiform appearance of a nodule, which includes multiple tiny cystic structures in more than 50% of nodular content, have a high probability of being benign (30). Furthermore, coalescent nodules are frequently seen in daily clinical practice, especially in residents of iodine-deficient regions; almost all of these nodules are benign. Novel elastography techniques that are user independent and quantitative, such as shear wave-based USE, which provides local elasticity by using transient pulses to generate shear waves in the body and can therefore evaluate elasticity at each point of the target lesion, might overcome these challenges. In a recent study (31), shear wave USE showed a high PPV (92.3%) in detecting malignant thyroid nodules, even in patients with multinodular goiters. Another recent study demonstrated that shear wave USE can accurately identify malignant thyroid nodules in patients with or without autoimmune thyroid disease (32). However, it should be kept in mind that these results come from only two studies with limited sample sizes. Therefore, they need confirmation in further studies with larger sample sizes.
In conclusion, the current study found a lower sensitivity and specificity of USE than previously reported. As with earlier studies of other novel ultrasound techniques, preliminary studies of USE in thyroid nodules tend to overestimate the true sensitivity and specificity for detection of malignancy. Although USE was not an alternative to BUS or FNAC, it was found in this study to be a reliable ancillary procedure to select thyroid nodules with a low risk for malignancy. The present study also underlines the need for improvements in qualitative assessment of color elastographic patterns. A technique that uses a more quantitative method, like shear wave elastography, might overcome the limitations of USE that relies on color output alone.
Acknowledgments
The authors are grateful to Dr. S. Altug Kesikli for his contributions in this study. This article is not supported by any grants/funds or fellowships.
Disclosure Statement
All authors declare no conflicts of interest.
