Abstract
Background:
Molecular diagnostic testing is increasingly used in the management of indeterminate thyroid nodules. Limited data exist regarding the influence of clinical factors on gene expression classifier (GEC) test performance. This study examined the positive and negative predictive value of GEC as stratified by nodule size.
Methods:
A prospectively maintained pathology database from a single tertiary referral center was queried from 2012 to 2015 for indeterminate thyroid nodules that underwent GEC testing. Nodule size, patient demographics, Bethesda classification, and Hürthle cell–predominant nodules (HCNs) were evaluated as predictors of GEC performance.
Results:
Two hundred and thirty-one patients with 245 indeterminate nodules were examined. Assuming all nodules to be benign unless proven malignant on histopathology, the sensitivity and specificity of GEC testing were 95.2% and 60.1%, respectively. The malignancy rate among all indeterminate nodules was 25.3%. The positive predictive value was consistent across nodule sizes: 45.5% for nodules <1 cm, 42.9% for nodules 1–1.9 cm, 36.0% for nodules 2–2.9 cm, 54.2% for nodules 3–3.9 cm, and 50.0% for nodules ≥4 cm. The negative predictive value ranged from 93.3% to 100% and was not affected by nodule size. HCNs had a high rate of GEC suspicious results (77.4% vs. 50.5% for nodules without Hürthle cell predominance, p < 0.01), though this did not correspond to a difference in the rate of malignancy (25.8% vs. 25.3%).
Conclusions:
Nodule size did not affect GEC test performance in the present cohort. GEC benign results remain reliable in large nodules. GEC suspicious nodules >3 cm carry a similar risk of malignancy compared to smaller nodules, and do not warrant more aggressive treatment. GEC testing has limited clinical utility for HCNs due to the high rate of false-positive results.
Introduction
Gene expression classifier (GEC) testing has been validated to further characterize the risk of malignancy in indeterminate thyroid nodules (7–14). For nodules classified into Bethesda categories III or IV, the negative predictive value of a benign GEC result is 94%, while the positive predictive value of a suspicious result is approximately 40% (12). Hence, patients with GEC benign results may be observed, while those with GEC suspicious results generally proceed to diagnostic thyroid lobectomy.
To date, little is known regarding the effect of clinical factors on the pre- and post-test probabilities of malignancy for nodules undergoing GEC testing. Clinical factors such as nodule size, patient demographics, and cytologic features have the potential to interact with GEC performance, rendering the test more or less useful in certain subpopulations. There is a concern regarding larger nodules in particular, where GEC testing may have a lower negative predictive value due to sampling error (9). Alternatively, if the malignancy rate is very high in larger nodules that are GEC suspicious, total thyroidectomy may be considered as initial treatment rather than diagnostic lobectomy. The objective of this study was to characterize further the effect of nodule and patient characteristics on the performance of GEC testing in indeterminate thyroid nodules, with the hypothesis that the positive predictive value of GEC is increased among larger nodules.
Methods
Study design
Following Institutional Review Board approval, a prospectively maintained database of patients with cytologically indeterminate thyroid nodules who underwent GEC testing within the University of California, Los Angeles Medical Center between September 2012 and February 2015 was evaluated. All pathology reports followed the Bethesda System for Reporting Thyroid Cytopathology (4).
Patients
Patients were included if they had FNA results of Bethesda category III (atypia of undetermined significance or follicular lesion of undetermined significance [AUS/FLUS]) or IV (suspicious for follicular neoplasm or follicular neoplasm [SFN/FN]). GEC testing of thyroid nodule aspirates was ordered routinely by the pathology department as a reflex test for Bethesda category III and IV cytologic results. Patients who did not undergo surgery were recommended to have a surveillance ultrasound in six months. Patient outcomes were determined by review of patient electronic medical records, updated February 2016. Nodules were considered to be stable if there was <20% change in the longest dimension on subsequent imaging studies. Additionally, the pathology database was queried for the total number of thyroid FNAs and breakdown of Bethesda category results during the study period.
Patients who did not have documentation of follow-up in the electronic medical record following thyroid FNA were considered lost to follow-up and were excluded from the study. Patients with “insufficient result” on GEC testing were also excluded from analysis.
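The follow-up rule stated above (a nodule is considered stable when its longest dimension changes by <20% on subsequent imaging) can be written as a small classifier. This is a minimal sketch only: the function name, the use of the baseline measurement as the denominator, and the example sizes are assumptions made for illustration and are not details taken from the study.

```python
def classify_size_change(baseline_cm: float, followup_cm: float,
                         threshold: float = 0.20) -> str:
    """Classify interval change in the longest nodule dimension.

    ASSUMPTION: change is measured relative to the baseline measurement.
    """
    if baseline_cm <= 0:
        raise ValueError("baseline_cm must be positive")
    relative_change = (followup_cm - baseline_cm) / baseline_cm
    # A change smaller than the threshold in either direction counts as stable.
    if abs(relative_change) < threshold:
        return "stable"
    return "increase" if relative_change > 0 else "decrease"


# Hypothetical measurements (cm), not study data.
for baseline, followup in [(2.0, 2.3), (2.0, 2.5), (2.0, 1.5)]:
    print(baseline, followup, classify_size_change(baseline, followup))
```

The three labels correspond to the interval increase, interval decrease, and stable groups used when follow-up ultrasound findings are summarized.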
Cytopathology and GEC testing
GEC reporting included the location of the tested nodule in one of the following sectors of the thyroid: left superior, left mid, left inferior, right superior, right mid, or right inferior. When histopathology revealed papillary thyroid microcarcinoma (i.e., ≥1 focus of papillary thyroid carcinoma [PTC] measuring <1 cm in diameter), the index thyroid nodule was considered malignant only if the malignancy was found in the same thyroid sector where the FNA was performed.
Nodules were classified as Hürthle cell–predominant (HCNs) based upon Bethesda guidelines (4). For AUS/FLUS lesions, this was defined as “a moderately or markedly cellular sample composed of a virtually exclusive population of Hürthle cells, yet the clinical setting suggests a benign Hürthle cell nodule” or “there is a predominance of Hürthle cells in a sparsely cellular aspirate with scant colloid.” For SFN/FN lesions, this was defined as moderate to markedly cellular samples “consisting exclusively (or almost exclusively) of Hürthle cells.” Additional cytomorphologic features also needed to be seen, such as crowded, syncytial-like arrangements, scant to no colloid, the absence of a lymphoid background, and/or presence of transgressing vessels. A lesion was classified as HCN if the following pathology comments were noted: predominant Hürthloid cell changes, predominance of Hürthle cells, and suspicion for Hürthle cell neoplasm. Each Hürthle cell subgroup was analyzed separately, and the results were the same. Therefore, they were combined into a single group for data analysis.
The pathology specimens were reviewed in a blinded fashion by an independent second pathologist, without knowledge of the patient's clinical background, to determine the presence of Hürthle cell predominance. Two different criteria were used when labeling nodules as Hürthle cell predominant: (i) lesions diagnosed as Hürthle cell predominant initially, and (ii) only lesions diagnosed as Hürthle cell predominant by both the initial pathologist and the second independent pathologist. Of the 245 thyroid nodules included in the study, 228 were reviewed. The remaining specimens were not available for review; many of these had been sent to the authors' institution for consultation and subsequently returned to the originating outside facility. Since the results did not change significantly using the second criterion for Hürthle cell predominance, the results were reported using the first criterion.
Predictors and outcome variables
Variables analyzed included age, sex, most recent thyrotropin (TSH) within 12 months of thyroid FNA, largest nodule diameter reported by preoperative neck imaging, presence of Hürthle cell predominance on FNA cytology, Bethesda diagnostic category, and GEC test result. The initial recommendation for management based upon GEC test results, whether the patient underwent subsequent surgical resection, and presence of malignancy on final pathology in the index nodule were also recorded. Expected sensitivities, specificities, and negative predictive values were calculated as a range of possible values from an assumed false negative rate of 7% to an “ideal” false negative rate based upon the assumption that nodules are benign until proven to be malignant.
Statistical analysis
Patient demographic data and thyroid nodule characteristics are reported as means ± standard deviations or median ± interquartile range for continuous variables, and frequencies and percentages for categorical variables. The association between categorical variables was assessed using chi-square tests or Fisher's exact tests. Univariate and multivariable logistic regressions were performed to evaluate the association between nodule size and malignancy rate for GEC suspicious nodules and GEC benign nodules separately. Multi-variable regression analyses included adjustments for age, sex, TSH (log-transformed), nodule size, Bethesda diagnostic category, and presence of Hürthle cell predominance on FNA. The associations between study variables and malignancy rate were reported as odds ratios (OR).
In all analyses, nodule size was analyzed as a dichotomous variable using size <3 cm versus ≥3 cm, as a dichotomous variable using a 4 cm cutoff, and as a continuous variable. The malignancy rate of indeterminate thyroid nodules was determined by dividing the number of histopathologically malignant lesions by the total number of thyroid nodules. Nodules that were not surgically resected were classified as benign.
A power analysis was performed to confirm that the sample size was sufficient to detect a clinically significant difference in malignancy rate between subgroups of nodule size. A p-value of <0.05 was considered statistically significant. Analyses were performed using SAS v9.3 (SAS Institute, Inc., Cary, NC).
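The categorical comparisons in this section were run in SAS. As an illustration of the underlying calculation, the sketch below computes a Pearson chi-square test for a 2 × 2 table using only the Python standard library. The function name and the counts are hypothetical: the counts were chosen only so that the two proportions come out near 77% and 50%, and they are not taken from the study tables. For one degree of freedom, the p-value equals erfc(√(χ²/2)). When expected cell counts are small, Fisher's exact test would be the appropriate alternative, and it is not implemented here.

```python
import math


def chi_square_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Returns (statistic, two-sided p-value) with 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column total must be non-zero")
    statistic = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # Survival function of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# HYPOTHETICAL counts: rows = Huerthle cell predominant / not predominant,
# columns = GEC suspicious / GEC benign.
chi2, p = chi_square_2x2(24, 7, 108, 106)
print(f"chi-square = {chi2:.2f}, p = {p:.4f}")
```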
Results
There were 2382 thyroid FNAs performed at the authors' institution during the study period. The breakdown of Bethesda category results is depicted in Figure 1. Of these, there were 245 indeterminate thyroid nodules in 230 patients that made up the final study cohort. Patient demographics and nodule characteristics are reported in Table 1. There were significantly more thyroid nodules in female patients (n = 196) than there were in male patients (n = 49). The majority of patients had Bethesda category III nodules (n = 217) compared with category IV nodules (n = 28).

Figure 1. Patient flow chart, utilization of Bethesda diagnostic category. AUS/FLUS, atypia of undetermined significance/follicular lesion of undetermined significance; SFN/FN, suspicious for follicular neoplasm/follicular neoplasm; SFM, suspicious for malignancy.
Table 1 notes: N = 219 patients. SD, standard deviation; TSH, thyrotropin; IQR, interquartile range; GEC, gene expression classifier; FNA, fine-needle aspiration.
The result of GEC testing was suspicious in 132 cases (53.8%) and benign in 113 (46.2%). Clinical outcome (observation vs. surgery) and rate of malignancy based on GEC test result are reported in Figure 2A and B (Bethesda category III and Bethesda category IV, respectively). Among the 92 thyroid nodules that had a benign GEC result and did not undergo initial surgery, 61 (66.3%) patients had a follow-up ultrasound. Eight (13.2%) patients had an interval increase in size of the index nodule; two of the eight underwent subsequent surgery, and both had malignant disease. The remaining six patients with an increase in nodule size either continued observation or have not yet undergone surgery. Six (9.8%) patients had an interval decrease in size; only one of these underwent surgery and was found to have benign disease. The remaining 47 (77%) had stable nodules on ultrasound, and none underwent surgery.

Among the three thyroid nodules that were GEC benign and found to contain malignant foci on final pathology, one patient underwent surgery because of large tumor size (3.5 cm), one patient underwent surgery due to patient preference, and one patient sought a second opinion at a nearby hospital that recommended surgery. These patients were found to have a papillary microcarcinoma, follicular variant of PTC, and classic PTC, respectively.
The reasons that GEC suspicious thyroid nodules did not undergo surgery (n = 28) included patient preference (n = 18), surgery recommended but patient had not undergone surgery after more than six months (n = 4), prioritization of management of other malignancy (n = 2), no follow-up with endocrine surgery (n = 1), psychiatric issues precluding surgery (n = 1), excessive operative risk due to comorbidities (n = 1), and repeat GEC test was benign (n = 1). The reasons that GEC benign thyroid nodules underwent surgery (n = 21) included clinical indications such as large size, rapid growth, or compressive symptoms (n = 10), patient preference (n = 6), surgery performed at outside hospital (n = 3), and contralateral GEC suspicious nodule (n = 2).
The sensitivity and specificity of GEC testing for all indeterminate nodules were 95.2% and 60.1%, respectively. The malignancy rate among all indeterminate nodules was 25.6%. The malignancy rate among GEC benign or GEC suspicious lesions was not significantly affected by patient age, sex, nodule size, or pre-biopsy TSH. Expected GEC test performance and malignancy rate by subgroup are reported in Table 2. The positive predictive value was consistent across nodule size (from 45.5% for nodules <1 cm to 50.0% for nodules ≥4 cm; p = 0.29). The negative predictive value ranged from 93.3% to 100% and was similarly unaffected by nodule size (p = 0.69). Of note, HCNs had a higher rate of GEC suspicious results compared with nodules without Hürthle cell predominance (77.4% vs. 54.0%; p < 0.01), but ultimately had a similar rate of malignancy (22.6% vs. 25.6%). Blinded review of the pathology slides for Hürthle cell predominance by a second independent pathologist had an 87.7% inter-rater concordance rate. Out of 28 discordant diagnoses, the second pathologist interpreted the case as Hürthle cell predominant in 22 (78.6%) cases.
SEN, SPC, and NPV results range from a previously reported false negative rate of 7% to an “ideal” false negative rate (all patients assumed to have benign disease until proven to be malignant by histopathology).
SEN, sensitivity; SPC, specificity; PPV, positive predictive value; NPV, negative predictive value.
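The test performance measures defined above follow directly from a 2 × 2 table of GEC result against final status. The sketch below is illustrative and rests on stated assumptions. The counts of GEC suspicious (132) and GEC benign (113) nodules and the three GEC benign nodules with malignancy come from the text. The total of 62 malignant nodules is not stated in the text; it is inferred here as the integer count that reproduces the reported 95.2% sensitivity and 60.1% specificity. The second scenario interprets the "false negative rate of 7%" as the share of GEC benign results that are truly malignant (1 − NPV), which is one plausible reading and may not match how Table 2 was actually computed.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: float  # GEC suspicious, malignant
    fp: float  # GEC suspicious, benign
    fn: float  # GEC benign, malignant
    tn: float  # GEC benign, benign

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    @property
    def ppv(self) -> float:
        return self.tp / (self.tp + self.fp)

    @property
    def npv(self) -> float:
        return self.tn / (self.tn + self.fn)


GEC_SUSPICIOUS = 132          # from the text
GEC_BENIGN = 113              # from the text
FALSE_NEGATIVES_OBSERVED = 3  # GEC benign nodules malignant on histopathology
MALIGNANT_TOTAL = 62          # ASSUMPTION: inferred, not stated in the text

TRUE_POSITIVES = MALIGNANT_TOTAL - FALSE_NEGATIVES_OBSERVED

# "Ideal" scenario: unresected nodules are counted as benign.
ideal = Confusion(
    tp=TRUE_POSITIVES,
    fp=GEC_SUSPICIOUS - TRUE_POSITIVES,
    fn=FALSE_NEGATIVES_OBSERVED,
    tn=GEC_BENIGN - FALSE_NEGATIVES_OBSERVED,
)

# ASSUMED scenario: 7% of all GEC benign results are false negatives.
assumed_fn = 0.07 * GEC_BENIGN
assumed = Confusion(
    tp=TRUE_POSITIVES,
    fp=GEC_SUSPICIOUS - TRUE_POSITIVES,
    fn=assumed_fn,
    tn=GEC_BENIGN - assumed_fn,
)

for label, table in [("ideal", ideal), ("7% false negatives", assumed)]:
    print(f"{label}: SEN {table.sensitivity:.1%}, SPC {table.specificity:.1%}, "
          f"PPV {table.ppv:.1%}, NPV {table.npv:.1%}")
```

Under the ideal scenario the sketch reproduces the reported sensitivity and specificity; the 7% scenario gives the lower bound of each range under the interpretation stated above.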
In univariate analysis, nodule size did not predict malignancy in patients with GEC suspicious (OR = 1.13 [confidence interval (CI) 0.92–1.40]; p = 0.25) or GEC benign nodules (OR = 1.52 [CI 0.83–2.79]; p = 0.18). In multi-variable regression analysis, controlling for age, sex, TSH, and Bethesda diagnostic category, nodule size was not a significant predictor of malignancy in GEC suspicious nodules (Tables 3–5). The presence of Hürthle cell predominance was associated with a significantly decreased risk of malignancy in multi-variable regression analysis of GEC suspicious nodules (Tables 3–5). The multi-variable regression analysis among GEC benign nodules is not shown because low frequencies of male patients, Bethesda category IV lesions, and HCNs made the model unstable. In these analyses, nodule size was not associated with a significantly increased risk of malignancy.
Statistically significant values are shown in bold.
Log-transformed
OR, odds ratio; CI, confidence interval.
Power analysis showed that the number of patients in this study was adequate to detect an absolute difference in malignancy rate of ≥25% between study subgroups, with 80% power and alpha set at 0.05.
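To give a sense of scale for this statement, the sketch below applies the standard normal-approximation sample size formula for comparing two independent proportions. It is not the authors' SAS calculation. The function name, the baseline malignancy rate of 25%, the comparison rate of 50% (an absolute difference of 25%), and the assumption of equal group sizes are all illustrative choices.

```python
import math
from statistics import NormalDist


def n_per_group(p1: float, p2: float, alpha: float = 0.05,
                power: float = 0.80) -> int:
    """Approximate sample size per group for a two-sided two-proportion test.

    ASSUMPTIONS: equal group sizes, normal approximation, no continuity
    correction.
    """
    if p1 == p2:
        raise ValueError("p1 and p2 must differ")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    numerator = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return math.ceil(numerator / (p1 - p2) ** 2)


# Hypothetical rates: 25% in one size subgroup versus 50% in the other.
required = n_per_group(0.25, 0.50)
print(f"approximately {required} nodules per subgroup")
```

Because the required size grows with the inverse square of the difference, halving the detectable difference roughly quadruples the number of nodules needed per subgroup.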
Discussion
This analysis of 245 indeterminate thyroid nodules demonstrated that nodule size did not predict malignancy in GEC benign or GEC suspicious thyroid nodules. Large nodules (i.e., those ≥3 cm in diameter and those ≥4 cm in diameter) that were GEC suspicious did not carry an increased risk of malignancy compared to smaller nodules. Likewise, large nodules that were GEC benign did not carry an increased risk of malignancy compared to smaller nodules. These findings held true when size was assessed as a continuous variable as an alternative to using the aforementioned size thresholds. The only clinical factor that influenced ideal GEC test performance was Hürthle cell predominance, which was associated with an increased rate of GEC suspicious results despite a relatively low malignancy rate.
There were two initial reasons to examine the influence of nodule size on GEC test performance: (i) a concern for sampling error in large nodules leading to false negative results, and (ii) published reports demonstrating an increased prevalence of widely invasive and/or aggressive carcinomas among large follicular thyroid nodules (15–17). The above points have been examined in the literature prior to the advent of molecular profiling for thyroid nodules, but not after its introduction, leading to two clinical questions. First, can large nodules that are GEC benign be safely observed? Second, should patients with large nodules that are GEC suspicious proceed directly to total thyroidectomy?
In the “pre-molecular profiling” era, several reports demonstrated a high rate of false negative cytology results in large thyroid nodules (18–21). Similarly, some authors reported that thyroid nodules >4 cm with indeterminate cytology carry a risk of malignancy as high as 40% compared with 5–30% in indeterminate nodules overall (22,23). Both of these observations have been attributed to sampling error within heterogeneous nodules.
It is important to note that the above problems with sampling error in large nodules have not been consistently reported. Several recent large series have shown that benign cytology results remain reliable in large nodules (24–26), and that large cytologically indeterminate nodules do not carry a significantly increased risk of malignancy compared to smaller nodules (27–31). The present work found that GEC results remained reliable across all nodule sizes. This may be attributable to the fact that all thyroid FNAs within the health system are performed under ultrasound guidance, permitting sampling of the highest yield area(s) within a large, potentially heterogeneous nodule. It is also possible that the GEC test is intrinsically less vulnerable to sampling error, as nucleic acid amplification may permit detection of abnormal transcripts present in small quantities.
The other important finding from the current study is that GEC testing has decreased specificity in HCNs. While the majority test GEC suspicious, the final surgical pathology is more often benign. This confirms the recent study by Brauner et al., which evaluated GEC test performance in 72 patients with HCNs and found a specificity of only 7.5% (32). They reported that a suspicious GEC result does not increase the probability of malignancy in HCNs. In both studies, a GEC benign result was almost always a true negative and could allow patients to avoid surgery. However, this does not impact the management of most patients with HCNs, as the GEC results are usually suspicious. The cost-effectiveness of routine GEC testing has been debated (33). It is impacted significantly by the underlying malignancy rate at any given institution and the subsequent rate of GEC benign results (34). Since the rate of GEC benign results in HCNs is low, routine GEC testing of HCNs is very unlikely to be cost-effective and should likely not be performed from the standpoint of population health management.
There are several limitations to this study. As in many studies examining GEC testing, the majority of nodules that tested GEC benign were not surgically resected. It was assumed that these nodules were truly benign, which may underestimate the number of false-negative tests. This assumption is supported by a recent study reporting that nodules with indeterminate cytology and GEC benign results had a similarly low rate of nodule growth compared to cytologically benign nodules (35). In addition, all patients were drawn from a single institution, which limits the generalizability of the findings given the known inter-institution variability in GEC test performance (10,36). Lastly, because this study was powered to detect a 25% difference in malignancy rate among subgroups based on nodule size, larger studies will be required to detect smaller differences in GEC performance.
In conclusion, nodule size did not affect GEC test performance in this cohort. GEC benign results remain reliable in large nodules. GEC suspicious nodules >3 cm carry a similar risk of malignancy to smaller nodules, and do not warrant more aggressive treatment. GEC testing has limited clinical utility for Hürthle cell–predominant nodules due to the high rate of false-positive results.
Footnotes
Acknowledgments
Dr. James Wu received funding from the H. H. Lee Research Award.
Author Disclosure Statement
Dr. Michael Yeh is a consultant for Veracyte, Inc. The other authors have no conflicts of interest to disclose.
