Abstract
Background:
Molecular testing (MT) has become standard practice to more accurately rule out malignancy in indeterminate Bethesda III (BIII) thyroid lesions. We sought to assess the adoption of this technology and its impact on cytology reporting, malignancy yield, and rates of surgery across community and academic sites affiliated with a tertiary medical center.
Methods:
We performed a retrospective cross-sectional study including all fine-needle aspirations (FNAs) analyzed at our institution from 2017 to 2021. We analyzed trends in MT utilization by platform and by community or academic site. We compared BIII call rates, MT utilization rates, rates of subsequent surgery, and malignancy yield on final pathology before and after MT became readily available using chi-square analysis and linear regression.
Results:
A total of 8960 FNAs were analyzed at our institution from 2017 to 2021. There was broad adoption of MT across both community and academic sites. There was a significant increase in both the BIII rate and the utilization of MT between the pre- and post-MT periods (p < 0.001 for both). There was no significant change in the malignancy yield on final pathology (57.1% vs. 50.0%, p = 0.347), while the positive predictive value of MT decreased from 85% to 50% (p = 0.008 [confidence interval 9.5–52.5% decrease]).
Conclusions:
The use of MT increased across the institution over the study period, with the largest increase seen after a dedicated pass for MT was routinely collected. This increased availability of MT may have led to an unintended increase in the rates of BIII lesions, MT utilization, and surgery for benign nodules. Physicians who use MT should be aware of potential consequences of its adoption to appropriately counsel patients.
Introduction
The Bethesda System for Reporting Thyroid Cytopathology (TBSRTC) was introduced in 2009 to standardize terminology and the reporting of results of thyroid fine-needle aspiration (FNA) specimens. 1 Its utility in predicting malignancy has since been validated by multiple studies. One large meta-analysis analyzing results from more than 6000 FNAs in patients who later underwent surgical excision found the TBSRTC to have a sensitivity and specificity of 97% and 50.7%, respectively, as well as a positive predictive value (PPV) and negative predictive value (NPV) of 55.9% and 96.3%, respectively. A robust correlation was observed between the diagnostic category and histological follow-up, except in the case of Bethesda III (BIII) lesions, defined as atypia of undetermined significance/follicular lesion of undetermined significance (AUS/FLUS). The authors noted a significant decrease in the PPV and diagnostic accuracy of the TBSRTC when these lesions were included in the analysis. 2
The difficulty of predicting malignancy risk in the BIII category has led to many thyroidectomies being performed on benign lesions. According to the original TBSRTC meta-analysis, 84.1% of surgically excised BIII lesions ultimately had benign final pathology. 2 To decrease the rate of unnecessary surgery, molecular testing (MT) of FNA samples gained popularity to more effectively rule out malignancy among BIII nodules. Over the past decade, MT for BIII lesions has entered into standard practice and is recommended in the 2015 American Thyroid Association guidelines as a diagnostic alternative or adjunct to repeat biopsy after consideration of “worrisome clinical and sonographic features.” 3 Studies assessing the performance of various MT platforms have reported NPV of 86.9–97%. 4–6 Negative MT results in BIII nodules putatively prevent unnecessary thyroidectomies and decrease costs. 7
While the performance of these testing platforms has been well described, their adoption in various settings over time is less understood. The Columbia University Irving Medical Center (CUIMC) Cytopathology Laboratory offers the full spectrum of commercially available MT platforms and receives FNA specimens from both academic and community-affiliated settings. Given these circumstances, we sought to evaluate our experience in terms of the adoption of MT and the practice patterns of academic versus affiliated community sites.
In addition, recent studies report an increase in BIII call rates following changes in institutional protocols. 8,9 Although MT platforms have been available at CUIMC since 2017, one important change occurred in 2020 as a result of the SARS-CoV-2 coronavirus pandemic: all patients at CUIMC underwent a dedicated pass for MT at the time of initial FNA to obviate the need for a repeat procedure (and thus, potential SARS-CoV-2 exposure) if MT was deemed necessary. This made MT readily available for all FNA specimens, whereas previously MT was often collected at CUIMC on a selective basis (e.g., if there was a high suspicion for malignancy based on clinical history, sonographic characteristics, or the on-site analysis) or at a second biopsy. We therefore sought to assess whether the widespread availability of MT has impacted atypia rates, rates of surgery, and the predictive values of the various platforms at our institution. A portion of this work was presented at the 2023 American Thyroid Association annual meeting in Washington, DC. 10
Materials and Methods
We performed a retrospective cross-sectional study including all patients who underwent FNA at CUIMC or at an affiliated community institution that sent FNA specimens for cytological analysis at CUIMC, from 2017 to 2021. MT platforms included an in-house next-generation sequencing (NGS) platform as well as commercially available platforms: Afirma (GEC, GSC ±XA), RosettaGX, ThyraMIR/ThyGeNEXT, and ThyroSeq (V1, V2, V3) (Supplementary Data S1). We analyzed trends in MT usage over time (all platforms aggregated), as well as adoption of various platforms across CUIMC and community affiliates using descriptive statistics.
As recommended by the 2015 American Thyroid Association Guidelines, all patients included in the study underwent a comprehensive, high-resolution neck ultrasound. This ultrasound included evaluation of the thyroid parenchyma, gland size, the presence, size (in three dimensions) and characteristics of any nodules, and the presence or absence of any suspicious cervical lymph nodes in the central or lateral compartments. More specifically, the description of the nodules included echogenicity, composition (solid, cystic, spongiform), margins (smooth, lobulated, ill defined), presence and type of calcifications, shape, and vascularity. The risk of malignancy as well as the size of the nodule determined the need for FNA biopsy based on either the 2015 American Thyroid Association Guidelines or the American College of Radiology Thyroid Imaging and Reporting Data System (ACR TI RADS) criteria. 3,11 All ultrasounds were performed by board-certified radiologists and/or an Association for Medical Ultrasound (AIUM)-accredited provider.
The decision to collect a dedicated pass for MT was made by the treating physician performing the FNA, based on their assessment of the patient's risk factors and overall clinical picture. Given that the cytopathology department offered all of the commercial platforms, Afirma (GEC, GSC ±XA), RosettaGX, ThyraMIR/ThyGeNEXT, and ThyroSeq (V1, V2, V3), the choice of testing platform was left to the treating physician's preference.
In the pre-MT period, if a patient had a BIII result but did not have a dedicated pass for MT collected at the time of the biopsy, they would undergo a second biopsy with collection of a dedicated pass for MT and repeat cytological examination (Supplementary Data S1). If the patient required a second FNA and the cytology was downgraded from a BIII to a BII, the dedicated pass for MT was not submitted. In the post-MT period, a dedicated pass was collected at the first biopsy. If the cytology was determined to be a BIII lesion, then the cytopathology laboratory would inform the treating physician and submit the MT that was collected by the treating physician. If the cytology was a BI, BII, or BVI, the dedicated pass for MT was not submitted.
To determine whether the increase in availability of MT in 2020 due to the SARS-CoV-2 pandemic led to an increase in atypia rates, we divided the study period into two groups with 2017–2019 as the “pre-dedicated pass” group and 2020–2021 as the “post-dedicated pass” group. We compared BIII atypia rates and the rates of subsequent MT, surgery, and malignancy on final pathology before and after a dedicated pass for MT became readily available, using chi-square analysis. Finally, we performed linear regression to assess the change in rates of each diagnostic category over time.
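As a rough illustration of the chi-square comparison described above, the following Python sketch tests whether a proportion differs between two periods. It assumes a Pearson chi-square test on a 2x2 table without continuity correction, which the text does not specify. The counts are made-up placeholders rather than study data, and the function name `chi_square_2x2` is hypothetical.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Rows are the two periods; columns are event / no event.
    Returns the statistic and the two-sided p-value for 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom, the chi-square survival function is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts (NOT study data): 80 of 1000 specimens called BIII in one
# period versus 160 of 1000 in the other.
chi2, p_value = chi_square_2x2(80, 920, 160, 840)
print(f"chi-square = {chi2:.2f}, p = {p_value:.2e}")
```

A statistics package would normally be used in practice; the closed-form version is shown only to make the calculation explicit.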
At CUIMC, patients who had positive MT results, including the presence of mutations such as BRAF or RAS, were routinely recommended to proceed with surgery. Calculation of MT predictive values was restricted to patients with positive MT results who subsequently underwent surgery and had surgical pathology available for comparison. Patients who had positive MT but did not have surgical pathology available were excluded from the calculation of predictive values. We chose not to calculate NPVs in our data as we did not have histological data on patients who had negative MT results as those patients were not typically operated on.
To examine the impact of experience on BIII call rates, cytopathologists were separated into two groups. For the purposes of this study, cytopathologists who had read more than 500 FNAs were considered more experienced, while those who had read fewer than 500 FNAs were considered less experienced. We compared BIII call rates of the more and less experienced groups over the entire study period, as well as the change in call rates of just the more experienced cytopathologists in the pre- and post-dedicated pass periods, using chi-square analysis.
This study received institutional review board approval, protocol #AAAD4780.
Results
A total of 8960 FNAs were analyzed at our institution from 2017 to 2021 (Fig. 1). Of the 8960 FNAs, 1071 (12.0%) were categorized as BIII, of which 577 (53.9%) came from CUIMC and 494 (46.1%) from community-affiliated sites. The trends of MT utilization among CUIMC and community-affiliated institutions during the study period are shown in Figure 2. Pre-dedicated pass, the rates of MT on BIII lesions submitted by community-affiliated institutions were higher than those at CUIMC (77.1% vs. 41.9%, p ≤ 0.01). However, the use of MT for BIII lesions at CUIMC rose steadily from 2018 through 2020, and post-dedicated pass, the rates of MT at CUIMC and community-affiliated institutions were not significantly different (82.6% vs. 83.5%, p = 0.92) (Fig. 2). The sharpest annual increase in MT use at CUIMC occurred from 2020 to 2021 (from 69.5% to 91.2%), after the introduction of a dedicated pass for MT.

Fig. 1. Specimen profile.

Fig. 2. CUIMC versus community-affiliated institutions MT rates among BIII nodules by year. BIII, Bethesda III; CUIMC, Columbia University Irving Medical Center; MT, molecular testing.
We further analyzed MT adoption across CUIMC and community-affiliated institutions by MT platform (Fig. 3a, b). Community affiliates primarily used the ThyraMIR/ThyGeNEXT platform during the entire study period. Pre-dedicated pass, CUIMC primarily used the NGS platform. Post-dedicated pass, starting in 2020, ThyraMIR/ThyGeNEXT and Afirma platforms were increasingly used, and by 2021, the NGS was entirely replaced by the commercial platforms. The use of ThyroSeq increased among both CUIMC and community affiliates after 2020, although it was utilized less frequently than ThyraMIR/ThyGeNEXT and Afirma.

Fig. 3. MT platform use from community-affiliated institutions
Cytology results including both CUIMC and community-affiliated institutions are summarized in Table 1 and Figure 4. The BIII atypia rate increased over every year studied from 7.6% in 2017 to 18.2% in 2021. The sharpest increase occurred from 2019 to 2020, the first year of the “post-dedicated pass” period, increasing from 9.0% to 14.0%. Linear regression of call rates for each diagnostic category over the years studied demonstrated a decrease in the Bethesda II (BII) call rate (p = 0.04), an increase in the BIII call rate (p = 0.02), and no significant change in call rates of the remaining diagnostic categories.

Fig. 4. Frequency of cytology categories by year.
Table 1. Yearly Diagnostic Categories
Bold values emphasize that the percentage of BII results decreased each year from 2017 to 2021 while the percentage of BIII results increased; both changes were significant.
AUS/FLUS, atypia of undetermined significance/follicular lesion of undetermined significance.
Twelve cytopathologists interpreted FNA specimens at CUIMC during the study period. One cytopathologist, who read 644 specimens, was excluded from this portion of the analysis as they left CUIMC and did not read any specimens in the post-dedicated pass period. Of the 11 cytopathologists who were included, 7 were considered more experienced (read >500 FNA specimens) and 4 were considered less experienced (read <500 FNA specimens). Overall, less experienced cytopathologists had higher BIII call rates than their more experienced colleagues (17.2% vs. 12.0%, p < 0.01). However, the BIII call rate still significantly increased between the pre- and post-dedicated pass periods even among more experienced cytopathologists (8.5% vs. 16.5%, n = 3717 and 3530, respectively, p < 0.01).
Utilization of MT for BIII lesions increased dramatically over the period studied, from 46.5% in 2017 to 91.7% in 2021. Comparing the pre- and post-dedicated pass periods, there was an overall increase in the BIII call rate (8.2% vs. 16.4%, p < 0.001), as well as in the rate of MT per BIII nodule (52.4% vs. 83.1%, p < 0.001) (Table 2). The rate of surgery performed on BIII lesions decreased (22.9% vs. 12.2%, p < 0.001). However, there was no significant difference in the MT positivity rate among BIII nodules (23.6% vs. 20.4%, p = 0.053) or in the malignancy yield among patients with BIII lesions who underwent surgery (57.1% vs. 50.0%, p = 0.347). In patients with positive MT results who underwent surgery, there was a significant decrease in the PPV of MT, from 85.0% to 50.0% (p = 0.008 [confidence interval 9.5–52.5% decrease]) (Table 2). While the proportion of BIII lesions that underwent surgery decreased, the rate of surgery on BIII nodules per FNA performed remained unchanged (1.9% vs. 2.0%, p = 0.66, n = 4820 and 4100, respectively).
Table 2. Comparison of Pre- and Post-Dedicated Molecular Testing Periods
p < 0.05 indicates statistical significance.
BIII, Bethesda III; FNA, fine-needle aspiration; MT, molecular testing; PPV, positive predictive value.
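The per-FNA surgery rates reported above can be sanity-checked from the other reported rates, on the assumption that surgeries on BIII nodules per FNA are approximately the BIII call rate multiplied by the surgery rate among BIII lesions. The Python sketch below uses only the rounded percentages given in the text, so small rounding differences are to be expected; the function name is hypothetical.

```python
def biii_surgeries_per_fna(biii_call_rate, surgery_rate_per_biii):
    """Approximate fraction of all FNAs that end in surgery for a BIII nodule.

    Assumes the per-FNA rate is simply the product of the two component rates.
    """
    return biii_call_rate * surgery_rate_per_biii


# Rounded rates as reported in the text for each period.
pre = biii_surgeries_per_fna(0.082, 0.229)   # pre-dedicated pass
post = biii_surgeries_per_fna(0.164, 0.122)  # post-dedicated pass

print(f"pre:  {pre * 100:.1f}% of FNAs")
print(f"post: {post * 100:.1f}% of FNAs")
```

Under this assumption, the doubling of the BIII call rate roughly offsets the halving of the surgery rate per BIII lesion, which is consistent with the unchanged per-FNA figure.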
Discussion
Our study examines the adoption of MT in academic and community settings and highlights potential downstream effects of implementing routine MT in BIII nodules. We found that routinely taking a dedicated pass for MT increased MT utilization and influenced the choice of MT platform. Although increased use of MT was associated with decreased rates of thyroidectomy for BIII lesions, counterintuitively, the PPV for malignancy in resected BIII nodules decreased over time. Our results suggest that one potential explanation is that the routine collection of a dedicated pass for MT may lead to a general migration of some nodules from the BII to the BIII category, even among experienced cytopathologists.
In the past decade, MT has entered into standard practice as a way to more accurately rule out malignancy in BIII lesions. 3 Our data demonstrate a broad adoption of this technology across multiple practice environments, with community affiliate sites initially utilizing MT at a greater rate than CUIMC for BIII lesions. Utilization of MT began to accelerate after 2020, potentially due to more providers obtaining a dedicated pass for MT during initial FNA in response to the SARS-CoV-2 pandemic. This shift from selective to widespread availability of samples for MT may help explain this increased usage. We noted variability in terms of MT platforms utilized among both CUIMC and community-affiliated sites, potentially owing to variations in familiarity with MT on the part of clinicians performing FNAs.
Because ThyroSeq and Afirma require a dedicated pass, the clinician performing the FNA must consciously incorporate MT specimen collection into their workflow, whereas ThyraMIR/ThyGeNEXT can be performed on the cytopathology slides, and pathologists can order the test regardless of whether a dedicated pass was taken. This is likely why ThyraMIR/ThyGeNEXT was popular early in the study period. Other contributing factors affecting MT platform choice include provider preference, the expansion of each platform test at different times, and the level of detail available regarding the presence of mutations/fusions.
The performance of these MT platforms is well documented. A 2012 study by Alexander et al. assessed the performance of Afirma's Gene Expression Classifier (GEC) in identifying malignancy in 265 indeterminate thyroid nodules that had corresponding histopathological specimens. Of the 265 total indeterminate nodules, 85 were malignant, 78 of which were correctly identified by the GEC. Analysis of the 7 false-negative aspirates found 6 of them to have insufficient sampling. The NPV for BIII lesions was 95%. 4 A prospective multicenter validation study in 2018, after the introduction of Afirma's Genomic Sequence Classifier (GSC), demonstrated a similarly high NPV of 96%. 12 Similarly, the multicenter validation study in 2019 by Steward et al. for ThyroSeq v3, the most recent version of that platform, reported an NPV of 97% for indeterminate thyroid nodules. 5 Finally, the multicenter study by Lupo et al. for ThyraMIR/ThyGeNEXT demonstrated an NPV of 97%. 6
Given their effectiveness at ruling out malignancy in BIII lesions, some authors have advocated for a more conservative approach with greater utilization of this technology to potentially spare more patients surgery. 13 There are compelling cost-effectiveness data to support this perspective, including a study by Li et al. which found a reduction in mean discounted cost of $1435 per patient using the molecular test with equivalent quality-adjusted life years (QALY). Their model indicated that 57% of patients with benign nodules underwent surgery without MT. With routine application of the molecular test, this number fell to 14%. 7 The prospect of avoiding surgery for benign nodules makes this technology an attractive option from both a patient morbidity as well as cost-effectiveness perspective.
However, a number of real-world studies challenge the conclusions of this model, which assumes an MT sensitivity of 0.91, specificity of 0.75, and a malignancy prevalence of 0.3. Under such parameters, roughly 56% of MT results would be benign and theoretically lead to avoidance of surgery. However, the much lower rate of benign MT results demonstrated by several studies (9–41%) suggests that the cost of routine MT may be greater than expected due to greater numbers of tests needed to avoid one surgery. 4,14–16 In other words, the prevalence of malignancy in many real-world populations does not reflect that of the Li et al. model. Furthermore, Noureldine et al. found that MT did not affect surgical decision-making in 91.6% of patients. 16
The issues around MT are further compounded by the observed interinstitutional variability of predictive values, which depend on each institution's inherent prevalence of malignancy. 17 An understudied phenomenon which may artificially dilute the institutional prevalence of malignancy is the inadvertent inclusion of benign nodules, which may have otherwise been called as BII, among the population of BIII lesions undergoing MT. Despite the TBSRTC recommending that BIII should be used as a category of last resort, limited to ∼7% or less of all thyroid FNAs, 1 wide variations have been observed in BIII call frequency, from 3% to as high as 51.6%. 8,9,18 The inadvertent inclusion of more benign lesions in the testing pool may decrease the PPV, as we noted in our experience, and create considerable risk of false positives. There are several important reasons why this phenomenon may be occurring.
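To make the dilution effect concrete, the Python sketch below shows how PPV falls when benign nodules are added to the tested pool while the test's sensitivity and specificity stay fixed. The sensitivity (0.91) and specificity (0.75) are borrowed from the model parameters quoted earlier; the nodule counts are hypothetical and chosen only for illustration, as are the function names.

```python
def ppv(sensitivity, specificity, prevalence):
    """Positive predictive value from Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


def diluted_prevalence(n_malignant, n_benign, n_added_benign):
    """Malignancy prevalence after extra benign nodules enter the tested pool."""
    return n_malignant / (n_malignant + n_benign + n_added_benign)


SENS, SPEC = 0.91, 0.75  # parameters quoted earlier in the text

# Hypothetical pool: 30 malignant and 70 benign nodules, then 100 additional
# benign nodules that might otherwise have been called BII.
base_prev = diluted_prevalence(30, 70, 0)
diluted_prev = diluted_prevalence(30, 70, 100)

ppv_base = ppv(SENS, SPEC, base_prev)
ppv_diluted = ppv(SENS, SPEC, diluted_prev)
print(f"prevalence {base_prev:.2f} -> PPV {ppv_base:.3f}")
print(f"prevalence {diluted_prev:.2f} -> PPV {ppv_diluted:.3f}")
```

In this illustration, halving the prevalence lowers the PPV from about 0.61 to about 0.39 without any change in the test itself.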
First, significant interobserver variability has been found among cytopathologists, with the highest rates of discordance in the BIII category. 19 Interestingly, Cibas et al. found greater specificity among pathologists specialized in thyroid cytopathology versus community general pathologists. The specialized cytopathologists were more likely to call lesions benign while avoiding an increase in the false-positive rate. 19 This is consistent with our data, with higher BIII call rates among cytopathologists who have read fewer than 500 FNAs at our institution. However, there was still a significant increase in BIII call rates in the post-dedicated pass period even in experienced cytopathologists.
Second, institutional BIII call rates are not immune to influence from external factors such as changes in institutional protocol or the introduction of new technologies. For example, Sacks et al. observed a significant increase in BIII calls after Afirma became available at their institution (10.7% vs. 13.4%, p < 0.005). 8 Interestingly, they did not observe a decrease in the rate of surgery (37.7% vs. 45.1%, p = 0.11), nor did they observe a change in the rate of malignancy on final pathology for those BIII patients who underwent surgery (25.3% vs. 36.0%, p = 0.12). 8 In our experience, we similarly noted an increase in the BIII call rate from 8.2% in the pre-MT period to 16.4% in the post-MT period (p < 0.001). While we did observe a significant decrease in the rate of surgery for BIII lesions, we similarly did not observe a significant change in the rate of malignancy on final pathology.
Another example of external influence on BIII call rates comes from Ramonell et al., who observed a decline in the proportion of BII (benign) calls (43.1% vs. 21%, p < 0.001) and an increase in the proportion of BIII calls (28.3% vs. 47.7%, p = 0.002) after the institutional implementation of the Thyroid Imaging Reporting and Data System (TIRADS). 9 In addition, the authors performed linear regression to assess the change in yearly rates of each diagnostic category and found no changes in the yearly rates of nondiagnostic (BI), follicular neoplasm/suspicious for follicular neoplasm (BIV), and malignant (BVI) cytology results. They did, however, observe an average yearly increase in the BIII call rate of 4.8% per year (p < 0.001) and a complementary decrease in the average yearly BII call rate of 4.4% (p < 0.001), suggesting migration of BII lesions into the BIII category. Our own data tell a similar story when analyzed via linear regression, with an increase in BIII calls (p = 0.02), a complementary decrease in BII calls (p = 0.04), and no changes in the remaining diagnostic categories.
Ramonell et al. expected that overcalling of BIIIs would result in a concomitant decrease in the positivity rate of their MT due to the inclusion of more benign nodules, which they did not observe (17.7% vs. 18.4%, n = 378). 9 We similarly did not observe a statistically significant decrease in positive MT results among BIII nodules (23.6% vs. 20.4%, n = 634, p = 0.053). As mentioned earlier, the theory that the higher rate of BIII lesions is due to inclusion of more truly benign lesions is further supported by the significant decrease in PPV that we observed among the BIII lesions that underwent surgery.
Our study has several limitations. First, our data reflect the experience of a single academic, tertiary care institution, and their generalizability to other institutions is uncertain. Second, follow-up is not available for all patients in our study. In particular, some patients with positive MT results never received surgery, and there is no follow-up to indicate whether this represents a case of patient choice or if the patient received surgery at a different institution. Multiple testing platforms were used in our study, which, although perhaps more representative of “real world” clinical practice, also makes it difficult to draw conclusions about any one test in particular.
In addition, due to the retrospective nature of our study, we were unable to comment on the reporting system used for thyroid nodule assessment in individual community practices. Finally, we attributed the increase in the use of MT in 2020 and beyond to a large shift in clinical practice of collecting a dedicated pass for MT at the initial biopsy. However, some providers had been collecting a dedicated pass before 2020, and some level of MT was occurring over all the years studied. Despite these limitations, we do believe that our data provide important insight into the adoption of MT and potential downstream effects of increased adoption that have not been highlighted previously.
In summary, our data demonstrate broad adoption of MT in both the community and academic setting. Since MT became readily available, the BIII call rate has doubled, and nearly all BIII lesions undergo MT. Despite this, the PPV of MT has decreased in our experience. Few would dispute the impressive ability of these tests to rule out malignancy with NPVs ranging from 86.9% to 97%, 4–6 but with more MT being performed on benign lesions for which it has not been validated, false positives will occur. Benign lesions may contain many of the same point mutations, such as RAS and BRAF, that are routinely tested for by MT platforms. 20 As a consequence, physicians and patients will be confronted with difficult decisions when receiving positive MT results. Our data suggest that inadvertently including benign lesions in the BIII category may negatively affect the performance of MT and possibly negate its ability to prevent unnecessary surgery.
It is imperative that physicians who use MT in clinical practice have an in-depth understanding of its original intention to avoid unnecessary surgery, its limitations, and how to appropriately interpret the results to maximize shared decision-making with patients. It is worth noting that the most recent 2023 update of the Bethesda System now suggests surveillance as a reasonable management strategy for BIII lesions, whereas in 2017, repeat FNA, MT, or surgery were options. 1,21 Broader adoption of a surveillance-only approach, taking into account patient-specific factors and preferences, with a more selective reliance on MT, may help to maintain the utility of MT while preventing negative downstream effects of widespread use. This, in addition to quality improvement initiatives examining institutional BIII call rates, is the framework we will adopt going forward as a result of our findings.
The results of our study tell an interesting but also cautionary tale: medical advancements have tremendous capability of shifting management and practice algorithms to optimize care for patients. However, if those same shifts change the setting in which the advancements were designed to be used, the utility of those advancements could be diminished or negated, and we can end up back where we started.
Footnotes
Authors' Contributions
M.D.S. contributed to data collection, analysis, interpretation of results, and article preparation. E.J.K., R.L., R.V., J.A.L., J.H.K., and C.M.M. contributed to interpretation of results and article preparation. A.A. contributed to data collection.
Author Disclosure Statement
J.H.K. is a consultant for Medtronic. M.D.S., E.J.K., R.L., A.A., R.V., J.A.L., and C.M.M. have no conflicts of interest to disclose.
Funding Information
No funding was received for this article.
Supplementary Material
Supplementary Data S1
