Abstract
Background:
The clinical management of thyroid nodules with indeterminate cytology (IC) remains challenging. The role of shear wave elastography (SWE) in this setting is controversial. The aim of the study was to assess the performance of SWE in terms of prediction of malignancy, reproducibility, and combined analysis with ultrasound (US) examination in thyroid nodules with IC.
Methods:
This prospective study was conducted in two referral centers. Eligible patients had a thyroid nodule ≥15 mm with IC (Bethesda class III–V) for which surgery had been recommended. Patients underwent a standardized US evaluation combined with a SWE exam followed by surgery. SWE parameters included mean (meanEI; kPa) and max (maxEI) elasticity values, and ratio (meanEI nodule/parenchyma).
Results:
One hundred and thirty-one nodules (median size 30 mm) in 131 patients were studied. IC was class III in 28%, class IV in 64%, and class V in 8% of cases. After surgery, 21 (16%) nodules were malignant, including nine papillary thyroid cancers (PTC), six follicular thyroid cancers, five poorly differentiated carcinomas, and one large B-cell lymphoma. SWE parameters were similar in benign and malignant nodules, including meanEI (20.2 vs. 19.6 kPa), maxEI (34.3 vs. 32.5 kPa), and ratio (1.57 vs. 1.38). In malignant nodules, meanEI, maxEI, and ratio were higher in the classic PTC variants (n = 4) than in the other PTC variants (n = 5; p < 0.02) and in non-PTC tumors (n = 12; p < 0.005). Intra- and inter-observer coefficients of variation for meanEI in nodules were 23% and 26%, respectively. The French Thyroid Imaging Reporting and Data System score, the American Thyroid Association US classification, and the EU-Thyroid Imaging Reporting and Data System were not associated with malignancy.
Conclusions:
Despite high elasticity values in classic PTC variants, conventional SWE indexes failed to discriminate between benign and malignant tumors in thyroid nodules with IC.
Introduction
Thyroid ultrasound (US) is the main noninvasive imaging tool for assessing the risk of malignancy in thyroid nodules. Several US features, including microcalcifications, irregular margins, and a taller-than-wide shape, are associated with thyroid cancer, which is most commonly papillary thyroid cancer (PTC). In recent years, several US-based risk stratification systems have been developed, in particular the French Thyroid Imaging Reporting and Data System (TIRADS) score (2,3), the 2015 American Thyroid Association (ATA) classification (4), and, very recently, the EU-TIRADS (5). Such systems have been designed to evaluate the risk of malignancy of nodules using a scale based on several well-defined sonographic patterns. The diagnostic value of such stratification systems has been tested in a limited number of large retrospective (6) or prospective studies (3).
Cytology obtained after fine-needle aspiration biopsy (FNAB) reported using the Bethesda classification (7) remains the most accurate and cost-effective method for evaluating thyroid nodules. In approximately 25% of cases, however, indeterminate cytology (IC) does not allow differentiation between benign and malignant nodules. In this IC category, the risk of cancer varies with the cytological subtype and is about 5–15% in “atypia of undetermined significance/follicular lesion of undetermined significance” (AUS/FLUS or class III), 15–30% in “follicular neoplasm/suspicious for follicular neoplasm” (FN/SFN) or Hürthle cell neoplasm (class IV), and 50–75% in “suspicious for malignancy” (SUSP or class V). In these IC nodules, particularly in class IV or class V, surgery is generally recommended, even if the overall risk of cancer is estimated at 20–30% (8).
In recent years, diagnostic tools in the field of molecular biology or imaging have emerged with the objective of refining the diagnosis of thyroid nodules, especially those with IC. Elastography, which assesses tissue elasticity, has been proposed for distinguishing between benign and malignant nodules, the latter supposedly being harder than the former (9–13). A meta-analysis based on eight studies (639 nodules, 60% of which underwent surgery) has shown high sensitivity (92% [confidence interval (CI) 88–96%]) and specificity (90% [CI 85–95%]) (14). Results of other studies are more controversial and suggest a more limited diagnostic value (15,16). Shear wave elastography (SWE) is a quantified technique also associated with promising results. In 2010, Sebag et al. suggested that SWE could predict malignancy in 93 patients, with a sensitivity of 85% and a specificity of 94% using a cutoff of 65 kPa (12). More recently, Samir et al. reported similar sensitivity and specificity in 35 nodules with IC but using a lower 22 kPa cutoff (17). However, these findings have not been confirmed in a large prospective study.
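How a single kPa cutoff translates into sensitivity and specificity can be sketched as follows. This is only an illustration: the stiffness values and labels are hypothetical (not data from any cited study), the function name is invented here, and treating a nodule as test-positive when its stiffness is greater than or equal to the cutoff is an assumption about the decision rule.

```python
def sensitivity_specificity(stiffness_kpa, is_malignant, cutoff_kpa):
    """Return (sensitivity, specificity) for one stiffness cutoff.

    Assumption: a nodule is test-positive when stiffness >= cutoff_kpa.
    """
    tp = fn = tn = fp = 0
    for value, malignant in zip(stiffness_kpa, is_malignant):
        positive = value >= cutoff_kpa
        if malignant and positive:
            tp += 1
        elif malignant:
            fn += 1
        elif positive:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical stiffness values (kPa) and histology labels, for illustration only.
values = [80.0, 70.0, 30.0, 20.0, 25.0, 66.0, 15.0, 40.0]
labels = [True, True, True, False, False, False, False, False]

for cutoff in (65.0, 22.0):
    sens, spec = sensitivity_specificity(values, labels, cutoff)
    print(f"cutoff {cutoff:.0f} kPa: sensitivity {sens:.2f}, specificity {spec:.2f}")
```

On these made-up values, lowering the cutoff raises sensitivity at the cost of specificity, which is why cutoffs as different as 65 kPa and 22 kPa cannot be compared without knowing the study population.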
This prospective bicentric study was designed to test the diagnostic value of SWE in patients with nodules showing IC in terms of prediction of malignancy, reproducibility, and combined analysis with US.
Patients and Methods
Patients
This diagnostic study was approved by the Local Ethics Committee and the French Health Authorities (
Eligible patients had a thyroid nodule of at least 15 mm with IC according to Bethesda classification in the six months prior to inclusion, for which surgery had been recommended. Indeterminate cytology included classes III, IV, and V sub-categories. Patients with coalescent nodules preventing correct individualization were not included.
All patients underwent standardized thyroid imaging harmonized between participating centers, including a conventional US evaluation, combined with a SWE exam followed by surgery within four months after imaging.
Review of initial cytology
For each eligible patient, a cytological review was performed in the two weeks following patient consent by an experienced cytologist working in the other participating center and blinded to the initial Bethesda sub-category. The objectives of this review were to verify the indeterminate status of the initial cytology, to allow exclusion of patients if necessary, and to estimate the inter-observer agreement regarding the indeterminate sub-category.
Conventional thyroid US
Neck US was performed using an Aixplorer® SuperLinear™ SL15-4 high-frequency linear transducer (SuperSonic Imagine S.A., Aix-en-Provence, France). Eight experienced sonographers (three in center 1 and five in center 2) were involved in this study.
In addition to a descriptive diagram to localize the nodule, to be transmitted to both the surgeon and pathologist, the nodule characteristics were prospectively documented: dimensions, echostructure (solid, mixed, spongiform), echogenicity (hyperechoic, isoechoic, mildly or markedly hypoechoic), margins (smooth, spiculated, irregular), shape (round/oval, taller than wide), presence of micro- or macrocalcifications, and vascularization.
Using the US parameters, each nodule was scored by an experienced thyroid sonographer according to the French TIRADS (3), the 2015 ATA (4), and the EU-TIRADS (5) classifications.
SWE
Patients underwent real-time SWE using the same equipment in each center (Aixplorer® SuperLinear™ system; SuperSonic Imagine S.A.), with a 4–15 MHz high-frequency linear transducer providing elasticity maps with a spatial resolution of 1 mm × 1 mm. Both US machines were calibrated using the same parameters.
SWE data acquisition and measurements were performed immediately after US examination. The protocol consisted of the same physician performing two (for nodules <25 mm) or three (for nodules ≥25 mm) direct SWE measurements on frozen SWE maps, repositioning the transducer between acquisitions. To optimize beam penetration, the “penetration” mode was selected. The gain was set at 70%. Pressure exertion on the transducer was limited in order to avoid compression artifacts.
Quantitative SWE parameters were measured using the device manufacturer's “Q-Box” quantification tool. For that purpose, for each position of the transducer, a first 5-mm diameter region of interest (ROI) was positioned in the nodule where the color map was the most homogeneous. SWE Q-Box parameters from this first ROI included mean (meanEI), min (minEI), max (maxEI), and standard deviation (SD) elasticity index in kPa. A second ROI of the same size was subsequently positioned inside the parenchyma next to the nodule and at a similar depth. The ratio between the mean elasticity computed in the nodule ROI over that computed in the parenchyma ROI was provided.
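The ratio described above can be sketched as a small calculation. The function names are hypothetical, and averaging over transducer positions is an assumption, since the text does not state how repeated acquisitions were combined.

```python
from statistics import mean


def elasticity_ratio(nodule_mean_kpa, parenchyma_mean_kpa):
    """Ratio of mean elasticity in the nodule ROI to that in the parenchyma ROI."""
    if parenchyma_mean_kpa <= 0:
        raise ValueError("parenchyma mean elasticity must be positive")
    return nodule_mean_kpa / parenchyma_mean_kpa


def summarize_acquisitions(acquisitions):
    """Summarize (nodule meanEI, parenchyma meanEI) pairs, one per transducer position.

    Assumption: per-position values are combined by a simple average.
    """
    return {
        "meanEI": mean(nodule for nodule, _ in acquisitions),
        "ratio": mean(elasticity_ratio(n, p) for n, p in acquisitions),
    }


# Hypothetical readings in kPa from two transducer positions.
summary = summarize_acquisitions([(20.0, 10.0), (30.0, 20.0)])
print(summary)
```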
A subset of randomly selected patients was also included in an ancillary inter-observer reproducibility study. In these patients, the same acquisition protocol was performed by a second physician immediately after the first exam.
Surgery and histological examination
Thyroid surgery was performed in each participating institution and consisted of either lobectomy including the isthmus or total thyroidectomy. The surgeon oriented the resected specimen and localized the nodule for pathological diagnosis with the support of the descriptive diagram. The 2004 World Health Organization (WHO) criteria were used for diagnosis (18). No systematic histological central review was performed. However, when necessary, the material was sent to the pathologist of the other institution, and a consensus diagnosis was established.
Statistical analysis
Patient characteristics and patient subgroups were compared using the Wilcoxon or Kruskal–Wallis test (continuous variables) and chi-square or Fisher's exact test (nominal variables), as appropriate. Inter-reader agreement of cytology was assessed using the kappa coefficient. The initial indeterminate sub-category was used for data analysis.
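For readers unfamiliar with the kappa coefficient, a minimal sketch follows. It assumes the unweighted Cohen's kappa for two readers (the text does not say whether a weighted variant was used), and the ratings are hypothetical, not study data.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's kappa for two readers rating the same cases."""
    n = len(ratings_a)
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    count_a = Counter(ratings_a)
    count_b = Counter(ratings_b)
    categories = set(count_a) | set(count_b)
    # Agreement expected by chance from each reader's marginal frequencies.
    expected = sum(count_a[c] * count_b[c] for c in categories) / n**2
    if expected == 1:
        return 1.0  # both readers used a single identical category
    return (observed - expected) / (1 - expected)


# Hypothetical Bethesda sub-categories from the initial reading and the review.
initial = ["III", "III", "IV", "IV", "IV", "V"]
review = ["III", "IV", "IV", "IV", "III", "V"]
print(round(cohen_kappa(initial, review), 2))
```

Kappa corrects raw agreement for chance, so a sizeable percentage of identical readings can still yield a low coefficient when one category dominates.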
Intra-observer and inter-observer SWE reproducibility were assessed using the intra-class correlation coefficient (ICC(2,1); two-way random, absolute agreement) (19), and the coefficient of variation (CV). The ICC reflects the ratio between the true and total variability. High measurement reproducibility is indicated by an ICC near 1.00, where the variability is due only to the between-subject variance. The CV is the ratio between the standard deviation and the mean value, expressed as a percentage. A CV of 0% means that all measurements performed in the same subject are identical. A low CV thus indicates high measurement repeatability.
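The two reproducibility indexes can be sketched as below. The ICC follows the standard ICC(2,1) formula from two-way ANOVA mean squares. The CV function is an assumption about aggregation (per-subject sample SD divided by per-subject mean, then averaged across subjects), because the text does not specify how CVs were pooled. The matrix is illustrative, not study data.

```python
from statistics import mean, stdev


def icc_2_1(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single measurement.

    scores[i][j] is the value for subject i from rater (or repeat) j.
    """
    n = len(scores)      # number of subjects
    k = len(scores[0])   # number of raters or repeats per subject
    grand = mean(v for row in scores for v in row)
    row_means = [mean(row) for row in scores]
    col_means = [mean(row[j] for row in scores) for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in scores for v in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)                 # between-subject mean square
    ms_cols = ss_cols / (k - 1)                 # between-rater mean square
    ms_error = ss_error / ((n - 1) * (k - 1))   # residual mean square

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


def within_subject_cv_percent(scores):
    """Mean over subjects of (sample SD / mean) of their repeated values, in %."""
    return 100 * mean(stdev(row) / mean(row) for row in scores)


# Illustrative matrix: 6 subjects (rows) measured by 4 raters (columns).
example = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
print(round(icc_2_1(example), 2), round(within_subject_cv_percent(example), 1))
```

Because ICC(2,1) measures absolute agreement, a systematic offset between raters lowers it even when the raters rank subjects identically.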
All tests were two-sided, and a p-value of <0.05 was considered statistically significant. Analyses were performed with SAS v9.4 (SAS Institute Inc., Cary, NC).
Results
Patient characteristics
From May 2013 to September 2015, 140 patients were enrolled in this study. Nine patients were subsequently excluded for the following reasons: consent withdrawal (n = 4), spontaneous shrinkage of thyroid nodule (n = 1), surgery canceled because of comorbidities (n = 1), or surgery performed outside the participating centers (n = 3). Overall, 131 patients were eligible to be included (98 females/33 males; mean age 52 ± 15 years; Table 1). Median serum thyrotropin (TSH) was 1.69 μIU/mL (range 0.01–14.4 μIU/mL). Although 11 patients had a TSH value <0.4 μIU/mL, only one nodule was found to be autonomous on 99mTc scintigraphy.
a n = 104; b n = 95; c n = 19.
Bethesda class III: atypia of undetermined significance/follicular lesion of undetermined significance; class IV: follicular neoplasm/suspicious for follicular neoplasm or Hürthle cell neoplasm; class V: suspicious for malignancy.
TSH, thyrotropin; SD, standard deviation.
Cytological and pathological data
According to initial cytology, 37 (28%) patients had a nodule scored class III, 84 (64%) class IV, and 10 (8%) class V. After review, the IC status was confirmed in all cases. However, only 59% of cytological subcategories were confirmed, leading to a kappa coefficient of 0.23 [CI 0.09–0.38]. This inter-observer agreement was 41% in class III, 65% in class IV, and 60% in class V.
Thyroid surgery was performed 11 days (range 1–98 days) after US and SWE. Of 131 nodules, 21 (16%) were pathologically confirmed as malignant, and 110 were benign. Malignant nodules included nine PTC (classic variant, n = 4; follicular variant [FVPTC], n = 3; oncocytic variant, n = 1; diffuse sclerosing variant, n = 1), six follicular thyroid carcinomas (FTC; FTC with minimal invasion, n = 3; Hürthle cell carcinoma [HCC], n = 3), five poorly differentiated carcinomas (PDTC), and one large B-cell lymphoma. In classes III, IV, and V, the proportions of cancer were 13%, 13%, and 50%, respectively (Fig. 1). Benign nodules included 61 follicular adenomas, 26 cases of nodular hyperplasia, 15 oncocytic adenomas, six various other pathological diagnoses (one multinodular goiter, one Graves' disease, one trabecular hyalinized adenoma, three lymphocytic thyroiditis), and two tumors of uncertain malignant potential.

Correlation between cytologic and pathologic data. Bethesda classification is used for cytology: class III, atypia of undetermined significance/follicular lesion of undetermined significance; class IV, follicular neoplasm/suspicious for follicular neoplasm or Hürthle cell neoplasm; class V, suspicious for malignancy; FVPTC, follicular variant of papillary thyroid cancer; FTC, follicular thyroid cancer; HCC, Hürthle cell carcinoma; PDTC, poorly differentiated carcinoma.
The proportions of cytological categories were similar (p = 0.08) in center 1 and center 2 (class III: 23% and 35%; class IV: 68% and 60%; class V: 10% and 5%, respectively). Likewise, the proportion of cancer was similar in both centers (17% vs. 15%). The 12 malignant nodules in center 1 corresponded to four classic PTC, two FVPTC, one FTC with minimal invasion, one HCC, and four PDTC. In center 2, there were no classic PTC, one FVPTC, one oncocytic PTC, one diffuse sclerosing PTC, two HCC, two FTC with minimal invasion, one PDTC, and one B-cell lymphoma.
US data
The median size of the 131 nodules was 30 mm (range 15–71 mm). The main US features for benign and malignant nodules are detailed in Table 2. No sonographic parameter related to nodule size, echostructure, echogenicity, shape, margins, presence of micro- or macrocalcifications, or vascularization was associated with malignancy, and the French TIRADS, the ATA, and the EU-TIRADS scores were not associated with malignancy either. However, differences were found between data obtained in centers 1 and 2. Although no sonographic parameters were associated with malignancy in center 2, echogenicity (p = 0.02), margins (p = 0.03), French TIRADS (p = 0.03), and ATA classification (p = 0.05) were associated with malignancy in center 1.
US, ultrasound; TIRADS, Thyroid Imaging Reporting and Data System; ATA, American Thyroid Association.
SWE data
Intra- and inter-observer reproducibility of SWE
Intra-observer reproducibility was assessed in all 131 patients and inter-observer reproducibility in 47 patients (20 patients in center 1 and 27 patients in center 2). The CV and ICC of the Q-Box measurements performed by the same physician (intra-observer reproducibility) and by two different ones (inter-observer reproducibility) were estimated for the thyroid nodule (meanEI, maxEI, and SD) and for the parenchyma (meanEI). The intra-observer CV for meanEI for all observers combined was 23%, with a slight variability between centers (21.7% and 25.6% for centers 1 and 2, respectively) and a marked variability between raters (CV ranging from 17.4% to 35.2%, depending on the observer). The intra-observer CV was slightly higher in the nodule than in the parenchyma (23% vs. 19%; p = 0.03), while the ICCs were similar (0.79 vs. 0.75). The intra-observer CVs for maxEI and SD were 23.6% (range 19.5–30.7%) and 30.9% (range 30.3–31.8%), and the ICCs were 0.59 and 0.79, respectively. The inter-observer CV for meanEI was 29.3% and 23.9% for centers 1 and 2, respectively, and the ICCs were 0.69 and 0.68. The inter-observer CV for meanEI was not significantly higher in the nodule than in the parenchyma (26.2% vs. 20%; p = 0.07), and the ICCs were identical (0.68). Finally, the inter-observer CVs for maxEI and SD were 25.9% and 32.8%, and the ICCs were 0.60 and 0.34, respectively.
SWE data in malignant and benign nodules
As shown in Figure 2, the SWE parameters were similar in the benign and malignant nodules, including meanEI (20.2 ± 12.4 vs. 19.6 ± 14.9; p = 0.46), maxEI (34.3 ± 21.8 vs. 32.5 ± 19.8; p = 0.57), and ratio (1.6 ± 2.7 vs. 1.4 ± 1.8; p = 0.20). In malignant nodules (Table 3), however, values of meanEI, maxEI, and ratio were higher in the classic PTC variants (n = 4) than in the other PTC variants (n = 5; p < 0.02) and in non-PTC tumors (n = 12; p < 0.005). No significant difference was found between data obtained in center 1 and center 2. The findings for benign and malignant nodules for center 1 were meanEI (21.4 ± 13.9 vs. 24.2 ± 18.1; p = 0.93), maxEI (35.0 ± 23.6 vs. 37.3 ± 22.8; p = 0.86), and ratio (1.3 ± 0.9 vs. 1.8 ± 2.3; p = 0.93), and for center 2 they were meanEI (18.8 ± 10.5 vs. 13.6 ± 5.8; p = 0.23), maxEI (33.6 ± 19.7 vs. 26.1 ± 13.5; p = 0.27), and ratio (1.9 ± 4.0 vs. 0.8 ± 0.3; p = 0.13). Cases of benign and malignant nodules are presented in Figure 3.

Shear wave elastography (SWE) parameters in malignant and benign nodules.

SWE, ultrasound (US; French Thyroid Imaging Reporting and Data System [TIRADS]), and pathological data from four patients. (
PTC, papillary thyroid carcinoma; FVPTC, follicular variant of PTC; HCC, Hürthle cell carcinoma; FTC, follicular thyroid carcinoma; PDTC, poorly differentiated thyroid carcinoma.
Discussion
This report presents the results of a prospective study assessing the diagnostic value of SWE in patients presenting thyroid nodules with IC, a group of patients for whom the decision between surgery and surveillance is crucial. This study included a relatively large number of patients harboring dominant nodules with a median size of 3 cm. Cytology was reviewed and pathology obtained for all nodules. A standardized US examination and SWE were performed very shortly before surgery. The main result is that SWE fails to discriminate between benign and malignant tumors in nodules with IC.
SWE is an ultrasound technique that has been evaluated so far in a few, mainly retrospective, clinical studies. In 2010, Sebag et al. (12) showed promising results in a heterogeneous group of 93 patients (61 with solitary and 32 with multiple nodules) of whom 79 underwent surgery. Cytology was available in only the 14 patients who did not undergo surgery and was scored as benign. The proportion of thyroid cancer was 25%, with a majority of PTC (69%) and no information about histological variants. In this study, SWE showed a sensitivity of 85% and a specificity of 94% using a cutoff of 65 kPa. Up to now, only one recent study has analyzed the relevance of SWE in patients with IC (17). In this single-center study, 35 patients presenting dominant nodules with class III (n = 16) or class IV (n = 19) cytology underwent US and SWE examinations before the nodules were surgically removed. Thirty-one percent of nodules were malignant with 54% of PTC, all of them FVPTC. Good sensitivity (82%) and specificity (88%) were also reported but using a lower cutoff of 22 kPa.
The present data contrast with the conclusions of these previous studies. The study population, mainly the proportion of pathological variants, is a key point to be considered in the analysis of elastography data. The rationale for elastography is based on the increased stiffness of malignant over benign nodules similar to the hardness on physical examination. The reason why stiffness is correlated with malignancy is not clearly understood. A recent study suggests that it could be correlated with fibrosis, which is more evident in the classic PTC variant than in the FVPTC (20). Overall, PTC tumors represent approximately 80% of malignant nodules, most of them harboring the classic variant. Those classic PTC variants are generally detected on FNAB through the finding of a malignant cytology (Bethesda class VI). As a result, classic PTC is underrepresented in nodules with IC, and in contrast, other malignant lesions such as FVPTC, FTC, or PDTC are overrepresented in this cytological category. In a recent large study on 1291 patients, the proportion of classic PTC was 51% in Bethesda class VI and only 16% in Bethesda class III, whereas FVPTC represented up to 71% of Bethesda class III nodules, and only 20% were found to be in the Bethesda class VI category (21). Similarly, in 513 patients with IC nodules, the proportion of PTC other than FVPTC (including classic PTC) was 27%, while FVPTC and FTC were found in 64% and 9% of cases, respectively (22). The pathological distribution of the study group is consistent with that of the previous series, with a low proportion of classic PTC (19%), a significant number of FVPTC (14%), FTC (14%), and HCC (14%), and probably a higher proportion of PDTC than expected (24%). Although high elasticity indexes were confirmed in the limited subgroup of four classic PTC variants, the histological distribution in IC nodules is likely the main explanation of the absence of elasticity difference between malignant and benign nodules.
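The percentages quoted for the present series can be re-derived from the counts given in the Results. The short check below uses only those counts; the variable names are illustrative.

```python
# Malignant nodule counts by histology, as reported in the Results section.
malignant_counts = {
    "classic PTC": 4,
    "FVPTC": 3,
    "oncocytic PTC": 1,
    "diffuse sclerosing PTC": 1,
    "FTC with minimal invasion": 3,
    "HCC": 3,
    "PDTC": 5,
    "large B-cell lymphoma": 1,
}
total_nodules = 131

total_malignant = sum(malignant_counts.values())
# Share of each histology among malignant nodules, rounded to whole percent.
shares = {
    name: round(100 * count / total_malignant)
    for name, count in malignant_counts.items()
}
malignancy_rate = round(100 * total_malignant / total_nodules)

print(total_malignant, malignancy_rate)
print(shares)
```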
This distribution also accounts for the fact that the French TIRADS scores, the EU-TIRADS scores, and the ATA classification were not associated with malignancy in the whole series of patients. The US criteria of risk of malignancy are pertinent in large series of unselected nodules where the classic PTC is the most frequent pathological subtype. In contrast, as suggested in a recently reported meta-analysis (23), none of the US features are able to determine the risk of malignancy with an acceptable sensitivity in nodules with IC. FTC exhibits other differences in sonographic features compared to PTC. These tumors are more likely to be iso- to hyperechoic, non-calcified, round (width greater than anteroposterior dimension) nodules with regular smooth margins (24). Similarly, the FVPTC is also more likely to have sonographic findings similar to those observed in FTC (25). Lastly, the association observed in center 1, and not in center 2, between the French TIRADS or the ATA scores and malignant nodules could be explained by the concentration of all four classic PTC variants in center 1.
High reproducibility is considered a prerequisite for a robust diagnostic method. SWE reproducibility data are limited. Using inter- and intra-rater agreement, along with a day-to-day agreement, Swan et al. recently presented suboptimal reproducibility data (26). In the present study, the averaged intra-observer agreement in terms of ICC was good (0.79 in the nodule and 0.75 in the parenchyma). In terms of CV, the reproducibility study shows CV values ranging from 17.4% to 35.2%, depending on the region (nodule or parenchyma) and the rater. Coefficients of variation tended to be higher in the nodule than in the parenchyma, possibly related to an increased heterogeneity of the nodule. The discrepancies between rater reproducibility scores demonstrate the difficulty of selecting a representative ROI in a heterogeneous nodule. Although the protocol recommended positioning the ROI where the color map seemed to be the most homogeneous, the choice of this representative region is still subjective. In order to explore the whole nodule better and to overcome the limitation of the 5 mm diameter ROI, three measurements were performed in large nodules (≥25 mm). Moreover, the color filling of the shear window, which should be as complete as possible, can in some situations be quite poor and unstable, especially in large and deeply located nodules. Quality parameters should be defined in order to identify acquisition failures clearly. Further imaging development is needed to analyze the nodule in its totality and to extract more relevant criteria to differentiate IC nodules.
This study also raises some issues regarding the limitations of cytology, its reproducibility, and the associated risk of malignancy. The central review confirmed the IC status in all cases, but complete inter-observer agreement on the cytological sub-category was found in only 59% of patients. A similarly limited inter-observer and also intra-observer reproducibility was recently reported in a large prospective study (27), confirming that the distinction between IC sub-categories is not unequivocal and is, in part, observer dependent. This can also affect the associated risk of malignancy, which is variable from one series to another. A recent meta-analysis in patients undergoing surgery for IC nodules reported a risk of malignancy of 27% in Bethesda class III and 31% in Bethesda class IV but with a wide range of proportions (28). Altogether, the proportion of malignant tumors in the present cohort (16%) was at the low end of the expected range. The small number of patients in each cytological IC class or histological variant limited the ability to demonstrate significant SWE or US differences between subgroups.
In conclusion, although high elasticity indexes in classic PTC variants are confirmed, SWE fails to discriminate between benign and malignant tumors in the whole group of nodules with IC. These data suggest not using the current SWE indexes to select patients for surgery in routine practice.
Acknowledgments
The study was supported by grants from the “Fondation de l'Avenir,” the French Society of Endocrinology, and the Diaxonhit Company. We are indebted to all clinical research associates working in each center, especially Chantal Rieux, Theary Cheav, and Paul Ihout. We are also grateful to Aixplorer for designing the electronic case-report form and to Helen Lapasset for her assistance in reviewing the manuscript.
Author Disclosure Statement
H.M. has served as an expert and guest speaker for SuperSonic Imagine. The other authors have no conflicts of interest to declare.
