Abstract
Background:
Molecular testing (MT) refines risk stratification for thyroid nodules that are indeterminate for cancer by fine needle aspiration (FNA) cytology. Criteria for selecting nodules for MT vary and remain largely untested, raising questions about the best strategy for maximizing the usefulness of MT while minimizing the harms of overtesting. We used a unique data set to examine the effects of repeat FNA cytology-based criteria for MT on management decisions and nodule outcomes.
Methods:
This was a study of adults (age 25–90 years; 281 women and 72 men) with cytologically indeterminate (Bethesda III/IV) thyroid nodules who underwent repeat FNA biopsy and Afirma Gene Expression Classifier (GEC) testing (N = 363 nodules from 353 patients) between June 2013 and October 2017 at a single institution, with follow-up data collected until December 2019. Subgroup analysis was performed based on classification of repeat FNA cytology. Outcomes of GEC testing, clinical/sonographic surveillance of unresected nodules, and histopathologic diagnoses of thyroidectomies were compared between three testing approaches: (i) Reflex (MT sent on the basis of the initial Bethesda III/IV FNA), (ii) SemiRestrictive (MT sent if repeat FNA is Bethesda I–IV), and (iii) Restrictive (MT sent only if repeat FNA is Bethesda III/IV).
Results:
Restricting MT to nodules that remain Bethesda III/IV on repeat FNA would have missed 4 low-risk cancers and 3 noninvasive follicular thyroid neoplasms with papillary-like nuclear features (NIFTP) (collectively 2% of the test population) but would have avoided diagnostic surgery for 42 benign nodules (12% of the test population). The Restrictive testing strategy was more specific (delta 0.126 [95% confidence interval (CI) 0.093 to 0.159] and 0.129 [CI 0.097 to 0.161], respectively) but less sensitive (delta −0.339 [CI −0.424 to −0.253] and −0.340 [CI −0.425 to −0.255], respectively) than the Reflex and SemiRestrictive approaches for detecting NIFTP or cancer.
Conclusions:
Repeat FNA cytology can guide the selection of cytologically indeterminate thyroid nodules that warrant MT. The Restrictive model of performing Afirma GEC only on nodules with two separate biopsies showing Bethesda III/IV cytology would reduce the rate of diagnostic surgery for histologically benign nodules while missing only rare low-risk tumors. Given the low but nontrivial risks of thyroidectomy, the benefit of the higher specificity of the Restrictive testing approach outweighs the potential harm of missing these rare tumors.
Introduction
Molecular tests (MT) help refine the preoperative cancer risk assessment of cytologically indeterminate thyroid nodules (1). The Afirma Gene Expression Classifier (GEC; Veracyte, Inc., South San Francisco, CA), an early version of such a test, used gene expression profiling and machine-learning algorithms to stratify cytologically indeterminate nodules as having either low- (GEC-Benign [GEC-B]) or intermediate-risk (GEC-Suspicious [GEC-S]) profiles, warranting either clinical observation or diagnostic lobectomy, respectively (2). Current guidelines provide a general framework for MT of cytologically indeterminate thyroid nodules and do not offer specific recommendations about the role of repeat biopsy in selecting nodules for testing (3,4). This flexibility allows clinicians to adapt MT for their practice setting but has also led to inconsistent reports of test utilization and performance (5).
Veracyte endorses a Reflex testing model (Fig. 1A), in which separate samples for cytology and molecular testing are collected from every nodule during the initial fine needle aspiration (FNA). The cytology is interpreted by a centralized laboratory, and aspirates classified as Bethesda III or IV are reflexed to Afirma testing using the concurrent MT sample (6–11). Reflex testing is simple but not suitable for all settings. This approach requires two additional FNA passes for potential MT from every nodule, even though most of those nodules ultimately do not require Afirma testing. Further, for nodules classified as Bethesda III because of atypia in a suboptimal sample, cytologic evaluation of a repeat FNA alone may in some cases suffice to resolve the indeterminate findings of the initial FNA (12–14).

Comparison of Reflex, Restrictive, and SemiRestrictive strategies for selecting nodules for molecular testing.
These considerations have prompted institutions to develop their own criteria for selecting nodules for Afirma testing. Some have adopted a two-staged Restrictive testing approach (Fig. 1B), where nodules classified as Bethesda III/IV on initial FNA undergo a repeat FNA, at which time material is collected for both Afirma and repeat cytology. Under this model, Afirma testing is confined to nodules that remain Bethesda III/IV on repeat FNA (15–19). Afirma testing would not be performed if the repeat FNA is Bethesda V (Suspicious for Malignancy) or VI (Malignant), given the low negative predictive value (NPV) of Afirma in these categories (2). Likewise, nodules classified as Bethesda I (Nondiagnostic) or II (Benign) on repeat FNA would forego Afirma testing in favor of additional biopsy or surveillance, respectively.
Despite the appeal of this Restrictive strategy, its effectiveness relative to other Afirma testing strategies has not been evaluated. Indeed, among surgical cohorts of thyroid nodules classified as Bethesda III, a benign repeat FNA did not necessarily confer a low cancer risk (20,21), posing a quandary for the clinician and patient: Should such nodules be treated as benign or indeterminate? If the latter, would the Restrictive model miss cancers that could have been detected using more permissive testing criteria?
In recognition of these concerns, we prospectively implemented a two-staged SemiRestrictive testing approach (Fig. 1C), in which Afirma testing was performed for cytologically indeterminate nodules classified as Bethesda I–IV on repeat FNA. Although our SemiRestrictive model had been devised to exclude nodules classified as Bethesda V/VI on repeat FNA from Afirma testing, we found that most (seven of nine) such cases had been sent for Afirma off-protocol during the study period. Therefore, with very few exceptions, Afirma GEC had been performed regardless of repeat FNA cytology. This population provided us with a unique opportunity to compare how different cytologic selection criteria (i.e., Reflex, SemiRestrictive, or Restrictive) for MT would have influenced test performance at our institution. The goal of this study was to correlate the Bethesda classification of the repeat FNA with Afirma GEC results, histologic outcomes of resected nodules, and surveillance outcomes of unresected nodules. We investigated the harms versus benefits of foregoing the Afirma test when the repeat FNA is interpreted as Bethesda I or II, as per the Restrictive strategy.
Materials and Methods
Study population and data collection
Institutional review board approval was obtained for this analysis of all thyroid FNA samples from Beth Israel Deaconess Medical Center that were submitted for Afirma GEC testing between June 2013 and October 2017. Nodules with an initial Bethesda III/IV cytology and repeat FNA (for Afirma testing and cytology) were identified. Afirma results, Bethesda classification (22) of initial and repeat FNAs based on cytology report review, sonographic nodule size, patient age, and sex were recorded for each nodule. Nodules under surveillance were classified as “concerning” if they developed 20% growth in at least two dimensions, >50% volume increase, or suspicious sonographic features, as per the American Thyroid Association guidelines (4); nodules lacking those findings were classified as “stable.” Follow-up data were collected until December 2019. Histopathologic diagnoses and surgery type were recorded for resected nodules. Histologic review was performed to confirm the subclassification of papillary carcinoma and to determine whether nodules diagnosed as follicular variants of papillary carcinoma warranted reclassification as noninvasive follicular thyroid neoplasm with papillary-like nuclear features (NIFTP), as per the 2018 NIFTP criteria (23). Nodule size, laterality, and location were used to correlate the Afirma-tested nodule to the nodule followed by surveillance and/or resection.
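The surveillance rule above ("concerning" versus "stable") can be written as a small classifier. This is a minimal sketch, not the study's software: the three-dimension input, the ellipsoid volume approximation (π/6 × a × b × c), and counting exactly 20% growth as meeting the threshold are our assumptions, since the Methods do not state how volume or the cut-off boundary were handled.

```python
import math
from typing import Sequence


def ellipsoid_volume(dims_cm: Sequence[float]) -> float:
    """Approximate nodule volume (mL) as an ellipsoid: pi/6 * a * b * c.

    Assumption: the study does not state which volume formula was used.
    """
    a, b, c = dims_cm
    return math.pi / 6 * a * b * c


def classify_surveillance(baseline_cm: Sequence[float],
                          follow_up_cm: Sequence[float],
                          suspicious_features: bool) -> str:
    """Label an unresected nodule "concerning" or "stable".

    Concerning if any of: >=20% growth in at least two dimensions,
    >50% volume increase, or suspicious sonographic features.
    """
    eps = 1e-9  # guards the 20% cut-off against floating-point rounding
    dims_grown = sum(new >= 1.20 * old - eps
                     for old, new in zip(baseline_cm, follow_up_cm))
    volume_ratio = ellipsoid_volume(follow_up_cm) / ellipsoid_volume(baseline_cm)
    if dims_grown >= 2 or volume_ratio > 1.50 or suspicious_features:
        return "concerning"
    return "stable"


# Hypothetical measurements, for illustration only.
print(classify_surveillance([2.0, 1.5, 1.0], [2.1, 1.5, 1.0], False))  # stable
print(classify_surveillance([2.0, 2.0, 2.0], [2.3, 2.3, 2.3], False))  # concerning
```

In the second example no single dimension grows by 20%, but the 15% growth in all three dimensions raises the estimated volume by about 52%, which alone meets the volume criterion.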
Thyroid ultrasound, FNA, and Afirma GEC testing
Thyroid ultrasound was performed by Endocrine Certification in Neck Ultrasound (ECNU)-certified endocrinologists and radiologists with thyroid expertise. The FNA was performed under continuous ultrasound guidance using a 25-gauge needle. Initial FNA of a nodule typically consisted of four passes collected into CytoLyt for preparation of a ThinPrep (Hologic, Inc.) slide. For nodules classified as Bethesda III/IV on initial FNA, repeat FNA typically consisted of two passes collected into a vial of nucleic acid preservative for Afirma GEC testing and four passes collected into CytoLyt for ThinPrep cytology. Samples for Afirma GEC were submitted to Veracyte, Inc. as per our SemiRestrictive testing policy (Fig. 1C), with the addition of seven samples classified as Bethesda V/VI on repeat FNA that were submitted for Afirma testing off-protocol.
Data analysis
Fisher's exact test and the Wilcoxon rank-sum (Mann–Whitney) test were used to compare categorical and continuous variables, respectively (GraphPad Prism v.8.4.1). To compare performance characteristics of the three testing strategies, we considered repeat FNA cytology and Afirma GEC results as a combined testing modality and defined "positive" or "negative" verdicts (regarding indication for surgical referral on the basis of the FNA) as follows:
Reflex:
(Any cytology result) + GEC-B = negative
(Any cytology result) + GEC-S = positive
SemiRestrictive:
(Bethesda I, II, III, or IV) + GEC-B = negative
(Bethesda I, II, III, or IV) + GEC-S = positive
Bethesda V or VI = positive
Restrictive:
Bethesda I or II = negative
(Bethesda III or IV) + GEC-B = negative
(Bethesda III or IV) + GEC-S = positive
Bethesda V or VI = positive
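The verdict rules listed above can be encoded directly, which makes it easy to score the same nodule under all three strategies. This is an illustrative sketch: the function names, the integer coding of Bethesda categories (1 to 6), and the "B"/"S" strings for GEC-B/GEC-S are our choices, not the study's code.

```python
from typing import Optional

STRATEGIES = ("Reflex", "SemiRestrictive", "Restrictive")


def _gec_call(gec: Optional[str]) -> Optional[str]:
    """Map a GEC result ("B" or "S") to a verdict; None if no GEC result."""
    if gec is None:
        return None
    if gec not in ("B", "S"):
        raise ValueError(f"unknown GEC result: {gec}")
    return "positive" if gec == "S" else "negative"


def verdict(strategy: str, repeat_bethesda: int,
            gec: Optional[str]) -> Optional[str]:
    """Surgical-referral verdict for one nodule under a testing strategy.

    repeat_bethesda: Bethesda category (1-6) of the repeat FNA.
    gec: "B" (GEC-Benign), "S" (GEC-Suspicious), or None (not tested).
    Returns None when the rule needs a GEC result that is missing.
    """
    if strategy not in STRATEGIES:
        raise ValueError(f"unknown strategy: {strategy}")
    if not 1 <= repeat_bethesda <= 6:
        raise ValueError("Bethesda category must be 1-6")
    if strategy == "Reflex":
        return _gec_call(gec)          # any cytology result + GEC call
    if repeat_bethesda in (5, 6):
        return "positive"              # Bethesda V/VI: no GEC needed
    if strategy == "Restrictive" and repeat_bethesda in (1, 2):
        return "negative"              # Bethesda I/II: GEC forgone
    return _gec_call(gec)              # I-IV (SemiRestrictive) or III/IV (Restrictive)


# A nodule that is Bethesda II on repeat FNA with a suspicious GEC result:
for s in STRATEGIES:
    print(s, verdict(s, 2, "S"))
# Reflex positive
# SemiRestrictive positive
# Restrictive negative
```

The example shows the case that drives the trade-off in this study: a Bethesda II repeat FNA with a GEC-S result is referred for surgery under the Reflex and SemiRestrictive rules but not under the Restrictive rule.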
We assumed a 37.5% positive predictive value (PPV) and a 94.5% NPV for Afirma testing, based on the clinical validation study (2). We estimated the sensitivity and specificity of the three testing strategies by using conditional probability formulae. To account for sampling variation, both in this study and in the studies used to estimate the PPV and NPV of Afirma testing, we computed standard errors and 95% confidence intervals (CIs) of the performance estimates by using the bootstrap method (24). These analyses were performed by using SAS 9.4 (SAS Institute Inc., Cary, NC).
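The Methods do not spell out the conditional probability formulae. One standard way to back out sensitivity and specificity from PPV, NPV, and the fraction q of nodules called positive is shown below, together with a generic percentile bootstrap. This is a sketch under stated assumptions, not the authors' SAS code: it treats a strategy's call as a single test with one PPV and one NPV, and the bootstrap resamples nodules only, without also propagating the uncertainty in the validation-study PPV/NPV that the authors accounted for.

```python
import random
from typing import Callable, List, Sequence, Tuple


def sens_spec_from_predictive_values(ppv: float, npv: float,
                                     positive_fraction: float) -> Tuple[float, float]:
    """Sensitivity and specificity implied by PPV, NPV, and the fraction q of
    nodules called positive, using joint probabilities:
    TP = ppv*q, FP = (1-ppv)*q, TN = npv*(1-q), FN = (1-npv)*(1-q)."""
    q = positive_fraction
    tp, fp = ppv * q, (1 - ppv) * q
    tn, fn = npv * (1 - q), (1 - npv) * (1 - q)
    return tp / (tp + fn), tn / (tn + fp)


def percentile_bootstrap_ci(data: Sequence, statistic: Callable[[List], float],
                            n_boot: int = 2000, alpha: float = 0.05,
                            seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap CI: resample units with replacement, recompute the
    statistic, and take the alpha/2 and 1 - alpha/2 empirical quantiles."""
    rng = random.Random(seed)
    n = len(data)
    stats = sorted(statistic([data[rng.randrange(n)] for _ in range(n)])
                   for _ in range(n_boot))
    return stats[int(alpha / 2 * n_boot)], stats[int((1 - alpha / 2) * n_boot) - 1]


# Illustrative inputs: validation-study PPV/NPV and the Reflex GEC-S fraction
# (148/363). The outputs are not the study's reported estimates.
sens, spec = sens_spec_from_predictive_values(0.375, 0.945, 148 / 363)
print(f"sensitivity {sens:.3f}, specificity {spec:.3f}")

# Bootstrap CI for a proportion: 35 NIFTP/cancer among 134 GEC-S nodules
# with follow-up (counts from the Results).
outcomes = [1] * 35 + [0] * 99
print(percentile_bootstrap_ci(outcomes, lambda xs: sum(xs) / len(xs)))
```

For a paired comparison such as the deltas reported in the Results, the same resample would be scored under both strategies inside `statistic`, so that the difference itself is bootstrapped.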
Results
Study population characteristics
Figure 2 shows exclusion criteria and case selection for this study. During the study, 383 FNAs from 382 thyroid nodules (373 patients) were submitted for Afirma GEC. Nine patients underwent Afirma GEC testing for two separate nodules. For one patient, the same nodule was tested twice over a three-year period. We excluded 12 nodules from analysis due to test cancellation (2 cases) or absence of prior indeterminate cytology (10 cases) (Fig. 2). Seven nodules had nondiagnostic Afirma results (GEC-ND); features of these nodules are summarized in Tables 1, 2, and 3 but have been excluded from subsequent analysis. The final cohort consisted of 363 nodules with (i) initial FNA classified as Bethesda III/IV, (ii) repeat FNA, and (iii) diagnostic Afirma GEC result; 215/363 (59%) nodules from 210 patients were classified as GEC-B, while 148/363 (41%) nodules from 143 patients were GEC-S. Table 1 summarizes demographic data and nodule size for the study population. Table 2 summarizes the Bethesda classifications of the initial and repeat FNAs. In accordance with the study design, the repeat FNAs were predominantly interpreted as Bethesda I–IV. Although our institutional policy was to forego Afirma testing when the repeat FNA cytology was Bethesda V/VI, seven of nine such aspirates were sent for Afirma GEC off-protocol. One was classified as GEC-B, while the remaining six were classified as GEC-S. The remaining two cases (both Bethesda V) were resected without Afirma testing.
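The cohort arithmetic in the paragraph above can be checked in a few lines. The counts are taken directly from the text; the variable names are illustrative.

```python
# Counts as stated in the Results; the assertions re-derive the final cohort.
fnas_submitted = 383
nodules_submitted = 382          # one nodule was tested twice
excluded = {"test cancelled": 2, "no prior indeterminate cytology": 10}
gec_nondiagnostic = 7
gec_benign, gec_suspicious = 215, 148

final_cohort = nodules_submitted - sum(excluded.values()) - gec_nondiagnostic
assert fnas_submitted - nodules_submitted == 1
assert final_cohort == gec_benign + gec_suspicious == 363
print(f"GEC-B {gec_benign / final_cohort:.0%}, GEC-S {gec_suspicious / final_cohort:.0%}")
# GEC-B 59%, GEC-S 41%
```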

Flow diagram demonstrating identification of the study population from the pool of thyroid nodules that were sampled for Afirma GEC testing. One nodule was tested twice with the Afirma GEC during the study period, as detailed in footnote a of Table 3. GEC, Gene Expression Classifier.
Patient and Nodule Characteristics for Samples Sent for Afirma Gene Expression Classifier Testing
Nodules with indeterminate cytology (Bethesda III or IV) on initial FNA underwent repeat FNA to collect samples for cytology and Afirma GEC testing. Patient sex, age, and nodule size data are presented based on the Afirma GEC-B, GEC-S, or GEC-ND.
Sex distribution between GEC-B and GEC-S groups showed no difference (p = 0.42, Fisher's exact test).
Patient age between GEC-B and GEC-S groups showed no difference (p = 0.11, Mann–Whitney test).
Nodule size between GEC-B and GEC-S groups showed no difference (p = 0.74, Mann–Whitney test).
FNA, fine needle aspiration; GEC, Gene Expression Classifier; GEC-B, Benign Gene Expression Classifier result; GEC-ND, nondiagnostic Gene Expression Classifier result; GEC-S, suspicious Gene Expression Classifier result; N, number; SD, standard deviation.
Fine Needle Aspiration Classification of Nodules That Were Sent for Afirma Gene Expression Classifier Testing
Nodules with indeterminate cytology (Bethesda III or IV) on initial FNA underwent repeat FNA to collect samples for cytology and Afirma GEC testing. If the repeat FNA cytology was Bethesda I, II, III, or IV, Afirma testing was performed on the concurrent sample collected for molecular testing as per our institutional SemiRestrictive policy for Afirma testing.
Distribution of Bethesda III and Bethesda IV interpretations on initial FNA did not differ between GEC-B and GEC-S groups (p = 0.70, Fisher's exact test).
The seven nodules that were classified as Bethesda V (N = 5) or Bethesda VI (N = 2) and submitted for Afirma testing represent deviations from institutional protocol; these samples are included in this table for the purposes of comparing Reflex testing protocols (i.e., Afirma test sent only on the basis of initial FNA cytology of Bethesda III or IV) with the SemiRestrictive and Restrictive testing approaches described herein.
Outcomes of nodules following Afirma GEC testing
Surveillance or surgical follow-up data were available from 189 of 215 (88%) GEC-B nodules and 134 of 148 (91%) GEC-S nodules (Table 3). As expected, far fewer GEC-B nodules than GEC-S nodules were resected (15% vs. 86%, respectively, of nodules with available follow-up, p < 0.0001). Surgical and surveillance outcomes of GEC-B nodules are summarized in Table 4 and Supplementary Table S1, respectively. Of the 28 resected GEC-B nodules, 25 (89%) were histologically benign (hyperplastic/adenomatoid nodules or adenomas), 1 was NIFTP, and 2 were carcinomas. Although diagnostic surgery was generally recommended for GEC-S nodules, several patients opted for surveillance, the records of which were available for 19 GEC-S nodules (Supplementary Table S2). Of the 115 GEC-S nodules that were resected (Table 5), 80 (70%) were benign (17 with Hurthle-cell features), 15 (13%) were NIFTP, and 20 (17%) were carcinomas.
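The resection-rate comparison above (15% vs. 86%, p < 0.0001) can be reproduced with a two-sided Fisher's exact test. To keep the sketch dependency-free it implements the test with the standard library rather than calling a statistics package; the 2 × 2 table uses the resected and unresected counts among nodules with follow-up from this paragraph.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    With the margins fixed, the top-left cell follows a hypergeometric
    distribution; the p-value sums the probabilities of all tables that are
    no more likely than the observed one.
    """
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x: int) -> float:
        return comb(row1, x) * comb(n - row1, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    support = range(max(0, col1 - (n - row1)), min(row1, col1) + 1)
    # Relative tolerance so tables tied with the observed one are not dropped.
    return min(1.0, sum(p for p in map(prob, support) if p <= p_obs * (1 + 1e-7)))


# Resected vs. not resected among nodules with follow-up (Results, Table 3).
gec_b = (28, 189 - 28)
gec_s = (115, 134 - 115)
print(f"resected: GEC-B {28 / 189:.0%}, GEC-S {115 / 134:.0%}")  # 15%, 86%
print(fisher_exact_two_sided(*gec_b, *gec_s) < 1e-4)              # True
```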
Follow-Up of Nodules Sent for Afirma Gene Expression Classifier Testing
One of these tumors underwent Afirma testing twice, separated by 35 months. Ultrasound surveillance after the first Afirma test (GEC-B) revealed nodule growth, prompting repeat biopsy and Afirma testing 35 months after the initial FNA. The second Afirma test was GEC-S, prompting hemithyroidectomy. The tumor was diagnosed as a follicular variant of PTC (well-circumscribed/partially encapsulated tumor with bluntly invasive growth), prompting completion thyroidectomy.
One nodule was initially monitored by ultrasound after a GEC-B result. Hemithyroidectomy was performed 3 years after the initial FNA because of sonographic evidence of nodule growth (3.4 cm to 3.9 cm). The tumor was diagnosed as follicular carcinoma (encapsulated with extensive angioinvasion).
N/A, not available; PTC, papillary thyroid carcinoma.
Histologic Classification of Afirma Gene Expression Classifier-Benign Nodules That Were Resected (N = 28)
Nodules with initial indeterminate cytology (Bethesda III/IV) and repeat cytology classified as Bethesda I, II, III, IV, or V.
Partial thyroidectomy for these two benign cases (hyperplastic/adenomatoid nodule and follicular adenoma) was followed by completion thyroidectomy because of multiple incidental papillary microcarcinomas (separate from the index nodule aspirated for Afirma testing).
After the GEC-B result, this 3.4 cm nodule was monitored by ultrasound for 3 years. Nodule growth to 3.9 cm prompted surgery.
For one of these cases, the repeat FNA cytology (concurrent to the Afirma sample) was interpreted as Bethesda V, prompting total thyroidectomy.
In spite of the benign GEC result, total thyroidectomy was prompted in this case by a combination of the cytology report (Bethesda III interpretation that included papillary carcinoma in the differential diagnosis) and sonographic features.
FV-PTC, follicular variant of papillary thyroid carcinoma; NIFTP, noninvasive follicular thyroid neoplasm with papillary-like nuclear features.
Histologic Classification of Afirma Gene Expression Classifier-Suspicious Nodules That Were Resected (N = 115)
Selected for nodules with initial indeterminate cytology (Bethesda III/IV) and repeat cytology classified as Bethesda I, II, III, IV, V, or VI.
One of these tumors underwent Afirma testing twice, separated by 35 months (as described in footnote a, Table 3). Ultrasound surveillance after the first Afirma test (GEC-B) revealed nodule growth, prompting repeat biopsy and Afirma testing. The second Afirma test was GEC-S, prompting partial thyroidectomy. Histologic examination revealed a well-demarcated/partially encapsulated tumor with nuclear atypia of papillary carcinoma and bluntly invasive growth, best classified as encapsulated follicular variant of papillary carcinoma.
Three patients with NIFTP underwent completion thyroidectomy because of papillary microcarcinoma (separate from the index nodule tested by Afirma) with lymphovascular invasion and/or multifocality.
Review of pathology report and medical records indicates that the index nodule was a 1.1 cm follicular variant of papillary carcinoma with BRAF mutation (specific genotype not stated), present in a background of multifocal papillary microcarcinomas ranging from <0.1 to 0.7 cm.
Comparison of Afirma GEC performance between different testing strategies
We stratified nodule outcomes by Bethesda classification of the repeat FNA and GEC result (Fig. 3). For this analysis, Bethesda III and IV nodules were pooled because the recommendations for MT are similar for both categories. Outcomes for each Bethesda category are itemized in Supplementary Tables S3 and S4. Nodules classified as Bethesda III/IV on repeat FNA comprise the Restrictive testing group, those classified as Bethesda I–IV on repeat FNA comprise the SemiRestrictive group, and adding the seven nodules classified as Bethesda V/VI on repeat FNA approximates the Reflex testing group. Table 6 summarizes the benign-call rate, cancer prevalence, and cancer/NIFTP prevalence for these three testing approaches. The Restrictive approach would have missed four cancers (all Bethesda II on repeat FNA) and three NIFTPs (two Bethesda I and one Bethesda II) that would otherwise have been flagged as GEC-S under the SemiRestrictive and Reflex approaches. The clinicopathologic features of these seven cancer/NIFTP cases are summarized in Table 7; all four cancers were American Joint Committee on Cancer (AJCC) stage I and considered low risk for structural disease recurrence. Conversely, Afirma testing of nodules classified as Bethesda I or II on repeat FNA, as per the Reflex and SemiRestrictive models, led to 42 cases in which GEC-S results prompted diagnostic surgery for histologically benign nodules. Altogether, the Restrictive strategy was more specific (delta 0.126 [95% CI 0.093 to 0.159] and 0.129 [95% CI 0.097 to 0.161], respectively) but less sensitive (delta −0.339 [95% CI −0.424 to −0.253] and −0.340 [95% CI −0.425 to −0.255], respectively) than the Reflex and SemiRestrictive approaches for detecting NIFTP or cancer.

Stratification of nodule outcomes based on repeat FNA cytology and Afirma GEC results. ¹Four nodules with initial cytology of Bethesda III/IV and repeat cytology of Bethesda I had nondiagnostic Afirma results; two were stable on surveillance and two were lost to follow-up. ²Three nodules with initial cytology of Bethesda III/IV and repeat cytology of Bethesda II had nondiagnostic Afirma results. One was resected by hemithyroidectomy (final histologic diagnosis benign); the other two were monitored and remained clinically/sonographically stable. One nodule was tested by Afirma twice, as described in Table 3 footnote a: first with a GEC-B result (classified as “Surveillance: Concerning” in the GEC-B column) and again 35 months later with a GEC-S result (classified as “Surgery: Malignant” in the GEC-S column). GEC-B, Gene Expression Classifier-Benign; GEC-S, Gene Expression Classifier-Suspicious.
Calculation of Benign-Call Rate, Cancer Prevalence, and Cancer+Noninvasive Follicular Thyroid Neoplasm with Papillary-Like Nuclear Features Prevalence for the Different Testing Approaches Analyzed in This Study
The differences between the SemiRestrictive and Reflex approaches were not statistically significant (Fisher's exact test: p = 0.31 for benign-call rate, p = 0.24 for cancer prevalence, and p = 0.10 for cancer+NIFTP prevalence).
The differences between the Restrictive and Reflex approaches were not statistically significant (Fisher's exact test: p = 0.41 for benign-call rate, p = 0.46 for cancer prevalence, and p = 0.25 for cancer+NIFTP prevalence).
Benign-call rate is calculated by dividing the number of benign Afirma results (GEC-B) by the total number of Afirma tests that would have been performed under each testing strategy. The cases that comprise each testing approach (Reflex, SemiRestrictive, or Restrictive) are designated in Figure 3. Nodules with nondiagnostic GEC results are excluded from benign-call rate calculations.
Prevalence calculations are made only among nodules with available clinical/sonographic or surgical follow-up data.
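The benign-call rate defined above is a simple ratio. The sketch below applies it to the two groups whose counts can be derived from the Results text: Reflex (all 363 nodules with a diagnostic GEC result) and SemiRestrictive (the same nodules minus the seven Bethesda V/VI cases, of which one was GEC-B and six were GEC-S). The Restrictive counts appear only in Figure 3, so they are not reproduced here; the function name is illustrative.

```python
def benign_call_rate(n_gec_benign: int, n_gec_suspicious: int) -> float:
    """GEC-B results divided by all diagnostic GEC tests a strategy would run.

    GEC-nondiagnostic results are left out of both counts.
    """
    return n_gec_benign / (n_gec_benign + n_gec_suspicious)


reflex = benign_call_rate(215, 148)            # every nodule in the final cohort
semi = benign_call_rate(215 - 1, 148 - 6)      # minus the 7 Bethesda V/VI nodules
print(f"Reflex {reflex:.1%}, SemiRestrictive {semi:.1%}")
# Reflex 59.2%, SemiRestrictive 60.1%
```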
Clinicopathologic Features of Seven Cancers/Noninvasive Follicular Thyroid Neoplasm with Papillary-Like Nuclear Features That Were Detected by the Reflex and SemiRestrictive Approaches But Would Not Have Undergone Afirma Testing by the Restrictive Testing Approach
In all four patients with carcinoma, preoperative ultrasound was performed and showed no evidence of abnormal lymph nodes. Central lymph node dissection was not performed in these four cases, but peri-thyroidal lymph nodes were identified (all negative for tumor) in the thyroidectomy specimens from cases 1, 2, and 4. Cases 1–3 received radioactive iodine ablation; post-treatment scans have all been negative for disease. Case 4 was treated by hemithyroidectomy without radioactive iodine; postoperative ultrasound revealed prominent submandibular lymph nodes, which were benign on biopsy.
AJCC, American Joint Committee on Cancer; AMES, age, metastasis, extent, and size prognostic scoring system; ETE, extrathyroidal extension; F, female; LVI, lymphovascular invasion; (m), multifocal primary tumors; MACIS, metastasis, age, completeness of resection, local invasion, and size prognostic scoring system; microPTC, papillary thyroid microcarcinoma; pT, pathologic T-stage.
Discussion
We had a unique opportunity to examine how different cytology-based strategies (Reflex, SemiRestrictive, or Restrictive) for selecting thyroid nodules for Afirma testing would have influenced management decisions and outcomes at our institution. By foregoing Afirma testing for nodules classified as Bethesda I or II on repeat FNA, the Restrictive approach would have missed 7 NIFTPs and cancers (2%, or 7/363 of the test population) that would have been detected by the Reflex and SemiRestrictive approaches. These seven tumors were all low-risk neoplasms. Given the generally slow progression of such tumors, current recommendations (4) for nodules classified as Bethesda I (repeat biopsy, close observation, or resection depending on sonographic pattern) or Bethesda II (surveillance) would likely have sufficed to detect these tumors before the development of adverse outcomes. At the same time, the Restrictive strategy would have averted diagnostic surgeries, otherwise prompted by GEC-S results in the Reflex and SemiRestrictive approaches, in up to 42 histologically benign nodules (12%, or 42/363 of the test population). Given the low but nontrivial risks of thyroidectomy, we conclude that this benefit of the Restrictive strategy outweighs both the potential harm of missing rare, low-risk tumors and the uncertainty that may be caused by disparate initial (Bethesda III/IV) and repeat (Bethesda I/II) FNA cytology interpretations.
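The trade-off described above can be restated as a simple ratio of the two counts. This ratio is our derived summary, not a figure reported in the study, and reading it as a net benefit assumes that one benign resection and one missed low-risk tumor carry comparable weight, which the paragraph above argues only qualitatively.

```python
test_population = 363
missed_tumors = 4 + 3          # low-risk cancers + NIFTPs not tested under Restrictive
benign_surgeries_averted = 42  # GEC-S-driven resections of benign Bethesda I/II nodules

print(f"missed: {missed_tumors / test_population:.1%}")               # 1.9%
print(f"averted: {benign_surgeries_averted / test_population:.1%}")   # 11.6%
print("benign resections averted per tumor missed: "
      f"{benign_surgeries_averted / missed_tumors:.0f}")             # 6
```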
The financial cost of a permissive Afirma testing strategy is also a consideration, particularly for nodules classified as Bethesda II on repeat FNA. The low prevalence of cancer and NIFTP in this group diminishes the Afirma GEC's contribution to risk stratification, making the GEC an expensive confirmation of the low risk already conveyed by the repeat FNA cytology. Taken together, this analysis supports adoption of the Restrictive testing strategy at our institution. Although this study was limited to the Afirma GEC, the role of the Restrictive approach in raising the pretest probability of NIFTP/cancer in the test population should be broadly applicable to other MT platforms, including those with a genotyping component that can help rule in disease.
The resection rate of cytologically indeterminate nodules can serve as one measure of an MT's effect on clinical decision making. In previous reports from our institution before the implementation of MT for thyroid FNAs, the resection rate of indeterminate nodules ranged from 37% (in 2010–2011) to 53% (2006–2008) (25,26). In the current study, GEC-tested nodules had an overall resection rate of 39%, raising questions about the impact of MT on surgical rates among cytologically indeterminate nodules. The prevalence of NIFTP/cancer (12%; Table 6) and the PPV of the GEC (26%; Fig. 3) in our overall study population provide insight into this matter. These values are low compared with those of the GEC clinical validation study (24–25% prevalence and 37–38% PPV) (2) as well as the pooled values for disease prevalence (20%) and PPV (37%) among post-validation studies of the GEC (27).
We considered several reasons for the low NIFTP/cancer prevalence in this study. First, given the high inter-observer variability among cytopathologists for the Bethesda III category (28), our institutional criteria for Bethesda III during the study period may have been more lenient and thus included more benign nodules than those selected for Afirma testing at other institutions. Second, the availability of MT itself may have contributed: the implementation of MT has been reported to coincide with an increase in indeterminate cytologic interpretations and a concomitant decrease in benign diagnoses (29,30). Third, our relatively permissive policy for Afirma testing likely contributed a greater share of histologically benign nodules to the tested population. A Restrictive testing strategy would have increased the prevalence of NIFTP/cancer in the GEC-tested population to 16% (Table 6) and the PPV to 32% (23 NIFTP/cancer out of 72 GEC-S nodules in the Restrictive group with available clinical or surgical follow-up), more closely approximating the corresponding values from the GEC validation study (2) and the meta-analysis of post-validation reports (27).
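The overall prevalence, PPV, and resection rate quoted in the preceding two paragraphs can be re-derived from counts given in the Results. The grouping below is our reading of the text (NIFTP/cancer among resected GEC-B and GEC-S nodules, divided by nodules with follow-up); it reproduces the rounded 12%, 26%, 32%, and 39% figures, but Table 6 itself is not shown here, so treat it as a consistency check rather than the authors' calculation.

```python
# Counts from the Results; nodules without follow-up are excluded, as in Table 6.
followed_up = 189 + 134                 # GEC-B + GEC-S nodules with follow-up
niftp_cancer_gec_b = 1 + 2              # resected GEC-B: 1 NIFTP, 2 carcinomas
niftp_cancer_gec_s = 15 + 20            # resected GEC-S: 15 NIFTP, 20 carcinomas
gec_s_followed_up = 134
resected = 28 + 115                     # all resected GEC-tested nodules

prevalence = (niftp_cancer_gec_b + niftp_cancer_gec_s) / followed_up
ppv = niftp_cancer_gec_s / gec_s_followed_up
ppv_restrictive = 23 / 72               # stated directly in the text
resection_rate = resected / 363

for name, value in [("prevalence", prevalence), ("PPV", ppv),
                    ("Restrictive PPV", ppv_restrictive),
                    ("resection rate", resection_rate)]:
    print(f"{name}: {value:.0%}")
# prevalence: 12%
# PPV: 26%
# Restrictive PPV: 32%
# resection rate: 39%
```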
Taken together, our findings highlight the importance of monitoring disease prevalence among cytologically indeterminate nodules and applying rigorous criteria for selecting those nodules that warrant MT. Although this study examined the role of repeat FNA cytology and Restrictive testing criteria for guiding the selection of nodules for MT, other strategies for mitigating the overuse of MT may also be effective. Second-opinion reviews of cytologically indeterminate aspirates have been shown to improve diagnostic accuracy (31–34) and would likely have reclassified a subset of Bethesda III FNAs in our study as benign. Limiting the Afirma test to those nodules with indeterminate cytology on second-opinion review would be expected to further increase NIFTP/cancer prevalence among tested nodules. Integration of a nodule's sonographic features with clinical and cytologic findings may offer additional opportunities for the judicious selection of nodules for MT.
With a few exceptions (35–37), most studies of the Afirma GEC's performance in real-world settings have limited their outcome appraisal to resected nodules (9,11,16,18,38–46). In contrast, our study includes longitudinal sonographic/clinical follow-up as the reference outcome for unresected GEC-tested nodules, as recommended by Duh et al. (47), thus providing a more comprehensive and representative study population for evaluating test performance.
This study has several limitations. First, this analysis is limited to our institutional experience with the Afirma GEC, a test that was superseded by Veracyte's Afirma Gene Sequencing Classifier (GSC) in 2017. We restricted our analysis to the years that the GEC was used at our institution to permit longitudinal follow-up of nodules. This design also allowed us to compare the cytologic criteria for ancillary molecular testing without introducing the additional variable of an updated version of the Afirma test. Compared with the GEC, the Afirma GSC showed superior specificity (68%, compared with 52% for GEC) and similar sensitivity (91%, compared with 90% for GEC) in its clinical validation study, with early reports from real-world practice settings supporting this finding (2,48–50). Therefore, it is possible that the relative harms of a permissive testing approach that we observed for the GEC would be lower for the more technologically advanced GSC. Second, the Reflex testing model in this study approximates but is not equal to Reflex testing in practice, in which the sample for MT is collected at the time of initial FNA. We further recognize that testing strategies that favor nodule surveillance over diagnostic lobectomy are not necessarily more cost-effective; the economic cost of each of the testing strategies was not calculated in the current study. Finally, there are regional and global differences in the management of cytologically indeterminate nodules, based, in part, on contrasting perspectives regarding clinical surveillance, diagnostic surgery, and risk tolerance (27,51). The strategies described here for selecting nodules for MT are likely to differ in Asian practices, where active surveillance, rather than diagnostic surgery, is the mainstay for indeterminate nodules (52).
In conclusion, this is the first study, to our knowledge, that provides data about the role of repeat FNA cytology in selecting nodules for molecular testing. The different cytologic criteria for selecting nodules for Afirma GEC testing present trade-offs between sensitivity and specificity for detecting NIFTP and cancer. Reflex testing does not involve a repeat biopsy. This strategy, as well as the relatively permissive SemiRestrictive testing strategy, detected rare cases of NIFTP and low-risk cancer (representing 2% of the test population) that would not have undergone Afirma testing under the Restrictive testing approach. However, the higher sensitivity of these permissive testing strategies is offset by their lower specificity: Afirma testing of nodules classified as Bethesda I or II on repeat FNA prompted the resection of 42 histologically benign nodules (representing 12% of the test population) by virtue of a suspicious GEC result. Considering the low mortality and relatively slow rate of tumor progression of low-stage differentiated thyroid cancers, the Restrictive testing approach offers a better balance between the harms of overtesting versus missing NIFTP or cancer at the time of repeat FNA.
Footnotes
Acknowledgments
The content is solely the responsibility of the authors and does not necessarily represent the official views of Harvard Catalyst, Harvard University and its affiliated academic health care centers, or the National Institutes of Health.
Authors' Contributions
All authors have significantly contributed.
Author Disclosure Statement
No competing financial interests exist.
Funding Information
This work was conducted with support from Harvard Catalyst | The Harvard Clinical and Translational Science Center (National Center for Advancing Translational Sciences, National Institutes of Health Award UL 1TR002541) and financial contributions from Harvard University and its affiliated academic health care centers.
Supplementary Material
Supplementary Table S1
Supplementary Table S2
Supplementary Table S3
Supplementary Table S4
