Abstract
Background:
Thyroid nodules with indeterminate cytology are increasingly subjected to molecular testing. We evaluated the diagnostic performances of Afirma Genomic Sequencing Classifier (GSC) and ThyroSeq v3 in thyroid nodules with high versus low/intermediate suspicion ultrasound classification.
Methods:
In this prospective cohort study, we analyzed all Bethesda III and IV thyroid nodules that underwent fine-needle aspiration biopsies in the University of California Los Angeles Health System from July 2017 to April 2020. All patients underwent molecular testing with Afirma GSC or ThyroSeq v3 as part of an institutional randomized trial (NCT02681328). Nodules were categorized according to the American Thyroid Association (ATA) ultrasound risk classification. The benign call rate and the positive predictive value of molecular testing were compared between ATA high suspicion versus all other categories.
Results:
A total of 343 patients with 375 indeterminate thyroid nodules were included. The malignancy rate in ATA high suspicion nodules was not significantly increased by a suspicious Afirma GSC result (77.8% for all ATA high suspicion nodules vs. 87.5% for nodules with ATA high suspicion and suspicious Afirma GSC results, positive likelihood ratio [LR] = 2.0, 95% confidence interval [CI 0.5–8.0], p = 1.0) or by a positive ThyroSeq v3 result (80.0% vs. 80.0%, positive LR = 1.0 [CI 1.0–1.0], p = 1.0). The rate of malignancy in ATA low/intermediate suspicion nodules increased from 21.0% to 56.3% with a suspicious Afirma GSC result (positive LR = 4.8 [CI 3.4–6.9], p < 0.0001) and decreased to 3.8% with a benign Afirma GSC result (negative LR = 0.1 [CI 0.07–0.3], p < 0.0001). Similarly, the rate of malignancy in ATA low/intermediate suspicion nodules increased from 24.3% to 66.7% with a positive ThyroSeq v3 result (positive LR = 6.2 [CI 4.0–9.7], p < 0.0001) and decreased to 2.1% with a negative ThyroSeq v3 result (negative LR = 0.07 [CI 0.02–0.3], p < 0.0001).
Conclusions:
Afirma GSC and ThyroSeq v3 performed well in ruling out malignancy in sonographically low/intermediate suspicion thyroid nodules but have limited diagnostic value in sonographically high suspicion nodules. Molecular testing may still provide information relevant to prognostication and treatment planning.
Introduction
Thyroid nodules are identified on ultrasound in up to 68% of adults (1). Ultrasound features and, when appropriate, fine-needle aspiration (FNA) biopsy are used to risk stratify thyroid nodules and guide subsequent management (2,3). Molecular testing is now commonly employed to further evaluate the 20% of thyroid nodules that are cytologically indeterminate, which have a 10–40% risk of malignancy (4,5). However, few studies have evaluated the performance and utility of molecular testing based on ultrasound risk classification.
Afirma Genomic Sequencing Classifier (GSC) and ThyroSeq v3 are two commercially available molecular tests with sensitivities of 91% and 94% and specificities of 94% and 82%, respectively (6–8). Our recent randomized controlled trial (NCT02681328) directly comparing these two tests showed no significant difference in diagnostic performance in evaluating cytologically indeterminate nodules, ultimately reducing diagnostic surgeries by 49% when returning a benign molecular test result (9). However, given the high cost of molecular testing, routine testing for all indeterminate nodules may not be cost-effective (10–15). Combining molecular testing with other risk factors such as ultrasound categorization may improve both diagnostic performance and cost-effectiveness (14,16).
Previous studies from our institution using Afirma Gene Expression Classifier (GEC) suggested that molecular testing may enhance the diagnostic value for sonographically low suspicion nodules, but not for sonographically high suspicion nodules (14,16). This was true for both the American Thyroid Association (ATA) and the American College of Radiology Thyroid Imaging Reporting and Data System (ACR TI-RADS) criteria, as there was high concordance between the two sonographic imaging systems in our institution (16).
ATA imaging criteria were used as the primary sonographic evaluation of thyroid nodules in this study, but a subset of nodules was retrospectively assigned ACR TI-RADS reads to assess for concordance between ATA and ACR TI-RADS. Both Afirma GSC and ThyroSeq v3 have markedly higher benign call rates (BCRs) compared with Afirma GEC, thus allowing more patients to safely avoid diagnostic surgery (8,15–19). The aim of this study was to evaluate the diagnostic performance of Afirma GSC and ThyroSeq v3 in conjunction with ultrasound risk classification of cytologically indeterminate thyroid nodules.
Materials and Methods
Study population
This prospective cohort study included all patients who underwent thyroid FNA in the University of California Los Angeles Health System from July 31, 2017 to April 21, 2020. The indication for FNA was made according to the ATA risk classification system. Thyroid nodules with indeterminate cytology (Bethesda III: atypia of undetermined significance or follicular lesion of undetermined significance, or Bethesda IV: follicular neoplasm or suspicious for a follicular neoplasm) were reflexively sent for molecular testing. Nodules were block randomized by month to Afirma GSC or ThyroSeq v3 as part of our randomized controlled trial, “Effectiveness of Molecular Testing Techniques for Diagnosis of Indeterminate Thyroid Nodules” (9).
Nodules with an Afirma GSC suspicious or ThyroSeq v3 positive result, regardless of the mutation found on molecular testing, were generally recommended for surgery. Nodules with an Afirma GSC benign or ThyroSeq v3 negative result were generally recommended for observation. Nodules with a ThyroSeq v3 currently negative result were considered as negative (7,19). Although the ATA risk classifications were made at the time of initial sonographic evaluation of the nodules, the treatment plan for cytologically indeterminate thyroid nodules was predominantly made based on the molecular test results. This study was approved by the institutional review board at our institution.
Ultrasound imaging technique
Thyroid ultrasound was performed by technologists licensed with Registered Diagnostic Medical Sonographer certification in an ultrasound department accredited by the American College of Radiology. Diagnostic ultrasound machines (iU22 or Epiq7; Philips Ultrasound) were used to image thyroid and cervical lymph nodes with linear 12- to 5-MHz and curvilinear 8- to 5-MHz transducers. Images were obtained in both longitudinal and transverse planes to demonstrate nodule margins, size, composition, echogenicity, shape, presence of color Doppler flow, and ancillary features (e.g., calcifications, colloid). If multiple nodules were present, the largest nodules and those with higher risk features in each thyroid lobe were imaged. Images were electronically stored and reviewed (Centricity; GE Healthcare).
Study design
All thyroid nodules were imaged using ultrasound and risk stratified based on the 2015 ATA management guidelines (2). The ATA classification system has five risk categories: benign (<1% risk of malignancy), very low suspicion (<3% risk of malignancy), low suspicion (5–10% risk of malignancy), intermediate suspicion (10–20% risk of malignancy), and high suspicion (70–90% risk of malignancy) (2). Sonographic features of thyroid nodules that contribute to their ATA risk classification include nodule composition (solid, cystic, or spongiform), echogenicity, margin regularity, presence of calcification, type of calcifications, shape (e.g., taller than wide), vascularity, and extrathyroidal extension (2). Nodules were categorized into ATA high suspicion versus all other ATA categories (referred to as ATA low/intermediate suspicion) due to the significantly increased risk of malignancy for the ATA high suspicion category (2).
Additionally, various studies have shown that 3.4–7.6% of nodules cannot be assigned an ATA category (20–23); such nodules were read as unclassified. A subset of thyroid nodules was also given ACR TI-RADS reads to assess for concordance with the ATA classification system. The ACR TI-RADS classification system is organized into five categories: TR1: benign (<2% risk of malignancy), TR2: not suspicious (<2% risk of malignancy), TR3: mildly suspicious (<5% risk of malignancy), TR4: moderately suspicious (5–20% risk of malignancy), and TR5: highly suspicious (>20% risk of malignancy). Using a points-based system, sonographic categories that are assessed include nodule composition, echogenicity, shape, margins, and the presence of echogenic foci (24).
Most ultrasound reads were performed by 12 radiologists and endocrinologists with fellowship training with an average of 9.9 years of experience (standard deviation [STD], 3.7 years). An additional 125 nodules with sufficient ultrasound imaging saved in the patient's chart that did not receive an ATA risk classification at the time of initial FNA or received an ATA unclassifiable read were retrospectively evaluated and assigned ATA reads by a core group of 4 radiologists (coauthors M.P., K.B., M.D., R.M.), who have an average of 11.5 years of experience. The radiologists were blinded to the pathology and molecular test results.
To assess the concordance between the ATA classification system and the ACR TI-RADS classification system, nodules that received retrospective ATA reads were also given ACR TI-RADS reads by the same core group of radiologists for subset analysis. Each nodule's diameter and location were recorded and used to identify resected nodules in surgical histopathology reports. Nodules that were missing electronically stored ultrasound images or that had insufficient cellular material for molecular test analysis were excluded (Fig. 1).

Fig. 1. Flow diagram depicting the inclusion and exclusion criteria to generate the final study cohort.
The ATA results and the molecular test results were compared with the reference standard of surgical histopathology of resected thyroid nodules. Any neoplasm, carcinoma, or noninvasive follicular thyroid neoplasm with papillary-like nuclear features (NIFTP) found on histopathology was considered malignant. The pathologists were blinded to the ATA results. Unresected nodules with Afirma GSC suspicious or ThyroSeq v3 positive results were excluded. Due to the well-validated high negative predictive value (NPV) of Afirma GSC (96–100%) (8,17,25) and ThyroSeq v3 (7,19,26), unresected nodules with Afirma GSC benign or ThyroSeq v3 negative results were considered benign unless repeat ultrasound, cytology, or molecular testing on 12+ month follow-up indicated suspicious results.
Statistical analysis
Descriptive statistics (median, interquartile range [IQR]) were calculated for patient demographics and clinical information. Positive predictive value (PPV) and malignancy rates were calculated for Afirma GSC and ThyroSeq v3 molecular tests according to the reference standard described above. Positive likelihood ratio (LR) was calculated as sensitivity/(1 − specificity), and negative LR was calculated as (1 − sensitivity)/specificity. Chi-square tests were used to compare differences between independent groups. Generalized estimating equations were used to compare differences between overlapping groups. BCRs, LRs, PPV, NPV, and rates of malignancy were calculated with 95% Wilson confidence intervals [CIs]. All comparisons had ≥80% power unless otherwise stated.
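The likelihood ratio definitions and the Wilson interval named above can be sketched in a few lines. This is a minimal Python illustration, not the study's analysis code (the analyses were run in R); the function names and the 2x2 counts in the usage lines are hypothetical and are not taken from the study data.

```python
import math


def likelihood_ratios(tp, fp, fn, tn):
    """Return (positive LR, negative LR) from 2x2 counts.

    tp/fn: malignant nodules with a positive/negative test result.
    fp/tn: benign nodules with a positive/negative test result.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    # Positive LR = sensitivity / (1 - specificity); infinite if no false positives.
    lr_pos = sensitivity / (1 - specificity) if specificity < 1 else math.inf
    # Negative LR = (1 - sensitivity) / specificity; infinite if no true negatives.
    lr_neg = (1 - sensitivity) / specificity if specificity > 0 else math.inf
    return lr_pos, lr_neg


def wilson_ci(successes, n, z=1.959963984540054):
    """Wilson score interval for a binomial proportion (z defaults to 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z ** 2 / n
    center = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return max(0.0, center - half), min(1.0, center + half)


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    lr_pos, lr_neg = likelihood_ratios(tp=90, fp=20, fn=10, tn=80)
    print(f"LR+ = {lr_pos:.2f}, LR- = {lr_neg:.3f}")
    low, high = wilson_ci(successes=7, n=8)
    print(f"95% Wilson CI for 7/8: {low:.3f} to {high:.3f}")
```

With small groups the Wilson interval and an exact (Clopper-Pearson) interval can differ noticeably, so the choice of interval method matters most for the ATA high suspicion subgroup.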
Logistic regression was used for univariate and multivariate analyses. Nodules were assumed to be independent, including multiple nodules arising from the same patient. Covariates were included in multivariate models based on significance in association (p < 0.05). Multivariate analyses were summarized with an area under the curve (AUC). Based on our initial clinical trial where the BCRs of Afirma GSC and ThyroSeq v3 were calculated to be 53% and 61%, respectively (9), to detect a similar effect size with 80% power at an alpha level of 0.05, 162 nodules were needed for each arm. Assuming a 10% loss to follow-up, the sample size was calculated to be 178 nodules per group. All analyses were carried out using the R software. p-Values <0.05 were considered statistically significant.
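For readers less familiar with the AUC used to summarize the multivariate models, the sketch below computes it directly from predicted probabilities as the fraction of malignant/benign pairs that the model ranks correctly. It is an illustrative Python helper under stated assumptions, not the study's R code; the function name and the example scores are hypothetical.

```python
def auc_from_scores(labels, scores):
    """AUC as the probability that a randomly chosen malignant nodule (label 1)
    receives a higher predicted score than a randomly chosen benign nodule
    (label 0). Tied scores count as half a correct ranking."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one malignant and one benign nodule")
    correct = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                correct += 1.0
            elif p == q:
                correct += 0.5
    return correct / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Hypothetical model outputs for four nodules (1 = malignant, 0 = benign).
    example_labels = [0, 0, 1, 1]
    example_scores = [0.10, 0.40, 0.35, 0.80]
    print(auc_from_scores(example_labels, example_scores))
```

The pairwise loop is quadratic in the number of nodules, which is acceptable for a cohort of a few hundred; a rank-based formula would be preferable for much larger data sets.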
Results
The study cohort consisted of 375 thyroid nodules from 343 patients (Fig. 1). The median age was 54 years (IQR 43–66 years), and 271 patients (79.0%) were female (Table 1). The median nodule size was 2.0 cm (IQR 1.3–3.0 cm) (Table 1). One hundred sixty-two nodules (39.8%) returned a suspicious/positive molecular test result (88 nodules were Afirma GSC suspicious, 74 nodules were ThyroSeq v3 positive). Of those, 133 nodules (82.1%) were resected with a 63.9% rate of malignancy or NIFTP based on histopathology (43 nodules were Afirma GSC suspicious, 42 nodules were ThyroSeq v3 positive). Of the 29 unresected nodules (16 Afirma GSC suspicious, 13 ThyroSeq v3 positive), 20 nodules (69.0%) were observed with surveillance ultrasounds based on patient preference, 2 nodules (6.9%) had surgery deferred due to other comorbidities, and 6 nodules (20.7%) were lost to follow-up. One nodule (3.4%) was resected but did not receive histopathological analysis. These nodules were excluded.
Table 1. Patient Demographics and Baseline Characteristics of Indeterminate Thyroid Nodules
One patient (female, White) received both Afirma GSC and ThyroSeq v3 molecular testing for two separate nodules.
AUS/FLUS, atypia of undetermined significance/follicular lesion of undetermined significance; FN/SFN, follicular neoplasm/suspicious for follicular neoplasm; GSC, Genomic Sequencing Classifier; IQR, interquartile range.
Of the 245 nodules with Afirma GSC benign or ThyroSeq v3 negative results, 32 nodules (13.1%) (20 Afirma GSC benign, 12 ThyroSeq v3 negative) were resected, of which 7 were malignant (2.9% of the 245 nodules). Seven nodules (21.9%) were resected based on clinician recommendation, 7 nodules (21.9%) were increasing in size, 6 nodules (18.8%) had concurrent known malignancy or an Afirma GSC suspicious or ThyroSeq v3 positive molecular test in another thyroid nodule, 5 nodules (15.6%) were causing compressive symptoms, 3 nodules (9.4%) had repeat Afirma GSC suspicious or ThyroSeq v3 positive molecular tests, 2 nodules (6.3%) were resected at patient request, 1 nodule (3.1%) had concurrent hyperparathyroidism, and 1 nodule (3.1%) had a positive ipsilateral lymph node from a history of thyroid malignancy. On histopathology, 16 nodules (50.0%) were adenomatoid nodules, 6 nodules (18.8%) were follicular adenomas, 6 nodules (18.8%) were papillary thyroid carcinomas, 2 nodules (6.3%) were Hurthle cell adenomas, 1 nodule (3.1%) was a Hurthle cell carcinoma, and 1 nodule (3.1%) showed chronic lymphocytic (Hashimoto) thyroiditis.
Two hundred thirteen nodules with Afirma GSC benign or ThyroSeq v3 negative molecular test results were conservatively managed with serial thyroid ultrasounds. One hundred thirty-six nodules (63.8%) had repeat ultrasounds showing stable nodule size with no new suspicious sonographic findings on 12+ month follow-up, 12 nodules (5.6%) had repeat biopsies with benign cytology, and 5 nodules (2.3%) had repeat biopsies with indeterminate cytology and Afirma GSC benign or ThyroSeq v3 negative molecular tests. Fifty-seven nodules (26.8%) did not receive follow-up surveillance at our institution for reasons including patients declining follow-up care, patients moving to a new medical system, patient mortality from pre-existing comorbidities, and being lost to follow-up. Two nodules (0.9%) showed interval growth, but patients refused surgery, and 1 nodule (0.5%) had a repeat Afirma GSC suspicious molecular test, but the patient refused surgery; these nodules were excluded. No adverse events occurred as a result of ultrasound, FNA biopsy, or surgical resection.
Thyroid nodules had the following ATA ultrasound classifications: 1 nodule (0.3%) was benign, 13 nodules (3.5%) were very low suspicion, 157 nodules (41.9%) were low suspicion, 172 nodules (45.9%) were intermediate suspicion, and 32 nodules (8.5%) were high suspicion. Ten nodules (2.5%) initially received ATA unclassifiable reads but were retrospectively assigned ATA classifications.
For the 14 nodules with benign or very low suspicion ATA risk classifications, 11 nodules were biopsied due to large or enlarging nodule size (ranging from 2.0 to 6.0 cm), 1 nodule was biopsied due to preference of the referring physician, 1 nodule was biopsied due to patient preference, and 1 nodule was biopsied due to increased fluorodeoxyglucose uptake in the thyroid nodule on positron emission tomography imaging in a patient with a history of colorectal cancer. The BCRs across all ATA categories for Afirma GSC and ThyroSeq v3 were 64.7% [CI 57.7–71.3%] and 61.4% [CI 53.3–69.0%], respectively. ATA low/intermediate suspicion nodules returned a BCR of 63.9% [CI 58.6–68.9%] compared with 38.9% [CI 23.1–56.5%] for ATA high suspicion nodules (p = 0.006) collectively between Afirma GSC and ThyroSeq v3 (Supplementary Fig. S1).
A subset of 125 thyroid nodules that received retrospective ATA reads by the core group of 4 radiologists also received retrospective ACR TI-RADS reads with the following classifications: 1 nodule (0.8%) was TR1, 12 nodules (9.6%) were TR2, 44 nodules (35.2%) were TR3, 52 nodules (41.6%) were TR4, and 16 nodules (12.8%) were TR5. TR1–4 were considered concordant with ATA low/intermediate suspicion nodules, and TR5 was considered concordant with ATA high suspicion nodules. The concordance rate between ATA and ACR TI-RADS in this subset of nodules was 89.6%.
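The concordance rule described above (TR1–4 paired with ATA low/intermediate suspicion, TR5 paired with ATA high suspicion) can be written as a small mapping. The following Python sketch is illustrative only; the category labels, function names, and example reads are assumptions and do not reproduce the study data. As a point of arithmetic, a rate of 89.6% among 125 nodules corresponds to 112 concordant reads.

```python
# Assumed labels for the five ATA categories (hypothetical spelling).
ATA_CATEGORIES = {"benign", "very low", "low", "intermediate", "high"}


def ata_group(ata_category):
    """Collapse the five ATA categories into the two groups used in the study."""
    if ata_category not in ATA_CATEGORIES:
        raise ValueError(f"unknown ATA category: {ata_category!r}")
    return "high" if ata_category == "high" else "low/intermediate"


def tirads_group(tr_level):
    """Collapse ACR TI-RADS levels: TR5 is high suspicion, TR1-4 are not."""
    if tr_level not in (1, 2, 3, 4, 5):
        raise ValueError(f"unknown TI-RADS level: {tr_level!r}")
    return "high" if tr_level == 5 else "low/intermediate"


def concordance_rate(reads):
    """Fraction of nodules whose ATA and ACR TI-RADS reads fall in the same group.

    reads: iterable of (ata_category, tr_level) pairs, one per nodule.
    """
    reads = list(reads)
    if not reads:
        raise ValueError("no reads supplied")
    agree = sum(ata_group(a) == tirads_group(t) for a, t in reads)
    return agree / len(reads)


if __name__ == "__main__":
    # Hypothetical reads for four nodules; the third pair is discordant.
    example = [("high", 5), ("low", 3), ("intermediate", 5), ("very low", 2)]
    print(f"{concordance_rate(example):.1%}")
```

Collapsing to two groups before comparing means that disagreements within TR1–4 (for example, ATA low versus TR4) do not count as discordant.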
The PPV and NPV of Afirma GSC across all ATA categories were 59.7% [CI 47.4–70.7%] and 96.2% [CI 91.4–98.8%], respectively. The PPV of Afirma GSC in ATA high suspicion versus low/intermediate suspicion nodules was 87.5% [CI 47.4–99.7%] versus 56.3% [CI 43.3–68.6%] (p = 0.132). There was 40.1% power to detect the effect. The PPV and NPV of ThyroSeq v3 across all ATA categories were 68.9% [CI 59.2–82.9%] and 97.9% [CI 92.8–99.8%], respectively. The PPV of ThyroSeq v3 in ATA high suspicion versus low/intermediate suspicion nodules was 80.0% [CI 44.4–97.5%] versus 66.7% [CI 52.1–79.2%] (p = 0.4846). There was 11.0% power to detect the effect. In ATA high suspicion nodules, the rate of malignancy before molecular testing was 77.8% [CI 40.0–97.2%] (Afirma GSC group) and 80.0% [CI 44.4–97.5%] (ThyroSeq v3 group) (Supplementary Fig. S2).
The rate of malignancy in ATA high suspicion nodules after receiving a suspicious Afirma GSC and positive ThyroSeq v3 result was 87.5% [CI 47.4–99.7%] (p = 1.0) (Fig. 2A) and 80.0% [CI 44.4–97.5%] (p = 1.0) (Fig. 2B), respectively. The positive LRs for Afirma GSC suspicious and ThyroSeq v3 positive results in ATA high suspicion nodules were 2.0 [CI 0.5–8.0] (Fig. 2A) and 1.0 [CI 1.0–1.0] (Fig. 2B), respectively. One ATA high suspicion nodule received a benign Afirma GSC result (BCR = 11.1% [CI 0.3–48.3%]) and was found to be benign on histopathology (Fig. 2A), while no ATA high suspicion nodules received a negative ThyroSeq v3 result (BCR = 0% [CI 0.0–30.9%]) (Fig. 2B).

Fig. 2. Malignancy rates stratified by sonographic risk categorizations alone and by sonographic risk categorizations in conjunction with molecular marker test results. In the first branch, malignancy rates were reported for sonographic risk categories alone (sonographically low/intermediate suspicion nodules and sonographically high suspicion nodules) before molecular marker test results. Nodules from each sonographic category were then assigned a molecular marker test outcome. BCRs and SCRs were calculated for each molecular marker test. New malignancy rates were reported for each sonographic risk category after receiving molecular marker test results. Positive and negative LRs were reported to compare the difference in malignancy rates as calculated by malignancy rates for molecular marker test results in conjunction with sonographic risk categorizations (post-test malignancy rates) divided by malignancy rates for sonographic risk categorizations alone (pretest malignancy rates).
In ATA low/intermediate suspicion nodules, the rate of malignancy before molecular testing was 21.0% [CI 13.6–25.3%] (Afirma GSC group) and 24.3% [CI 17.7–32.1%] (ThyroSeq v3 group) (Supplementary Fig. S2). The rate of malignancy in ATA low/intermediate suspicion nodules after receiving a benign Afirma GSC and negative ThyroSeq v3 result was 3.8% [CI 0.5–6.6%] (p < 0.0001) (Fig. 2A) and 2.1% [CI 0.3–7.3%] (p < 0.0001) (Fig. 2B), respectively. The negative LRs for Afirma GSC benign and ThyroSeq v3 negative results in ATA low/intermediate suspicion nodules were 0.1 [CI 0.07–0.3] (Fig. 2A) and 0.07 [CI 0.02–0.3] (Fig. 2B), respectively.
The rate of malignancy in ATA low/intermediate suspicion nodules after receiving a suspicious Afirma GSC and positive ThyroSeq v3 result was 56.3% [CI 41.6–67.9%] (p < 0.0001) (Fig. 2A) and 66.7% [CI 50.9–78.0%] (p < 0.0001) (Fig. 2B), respectively. The positive LRs for Afirma GSC suspicious and ThyroSeq v3 positive results in ATA low/intermediate suspicion nodules are 4.8 [CI 3.4–6.9] (Fig. 2A) and 6.2 [CI 4.0–9.7] (Fig. 2B), respectively. The BCRs for ATA low/intermediate suspicion nodules were 67.2% [CI 61.3–75.0%] (Fig. 2A) and 65.5% [CI 57.9–73.5%] for Afirma GSC and ThyroSeq v3, respectively (Fig. 2B).
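The relationship between the pretest malignancy rates, the LRs, and the post-test malignancy rates reported above follows the odds form of Bayes' rule: post-test odds equal pretest odds multiplied by the LR. The short Python sketch below works through this with the published, rounded figures. The function names are hypothetical, and small discrepancies are expected because the inputs are rounded.

```python
def odds(probability):
    """Convert a probability (0 < p < 1) to odds."""
    return probability / (1 - probability)


def post_test_probability(pretest_probability, likelihood_ratio):
    """Apply a likelihood ratio to a pretest probability via the odds form
    of Bayes' rule and return the post-test probability."""
    post_odds = odds(pretest_probability) * likelihood_ratio
    return post_odds / (1 + post_odds)


def implied_lr(pretest_probability, posttest_probability):
    """Likelihood ratio implied by a pair of pretest/post-test probabilities
    (ratio of post-test odds to pretest odds)."""
    return odds(posttest_probability) / odds(pretest_probability)


if __name__ == "__main__":
    # Reported values for ATA low/intermediate suspicion nodules.
    cases = [
        ("Afirma GSC suspicious", 0.210, 4.8, 0.563),
        ("ThyroSeq v3 positive", 0.243, 6.2, 0.667),
        ("ThyroSeq v3 negative", 0.243, 0.07, 0.021),
    ]
    for name, pretest, lr, reported in cases:
        estimate = post_test_probability(pretest, lr)
        print(f"{name}: estimated {estimate:.1%}, reported {reported:.1%}, "
              f"implied LR {implied_lr(pretest, reported):.2f}")
```

Working in odds rather than probabilities is what makes the reported figures line up: the ratio of the raw malignancy rates (for example, 56.3% to 21.0%) is smaller than the corresponding positive LR of 4.8.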
Univariate logistic regression was used to assess the level of association of different clinical parameters with malignant thyroid nodules (Table 2). Age, Bethesda category, ATA risk classification, Afirma GSC, and ThyroSeq v3 were significantly associated with malignancy (p < 0.05 for each covariate) (Table 2). The strongest multivariate regression models consisted of age, Bethesda category, ATA classification, and the respective molecular tests. Model 1, consisting of age, Bethesda category, ATA classification, and ThyroSeq v3, had an AUC of 0.914 when differentiating between malignancy and NIFTP versus benign nodules and an AUC of 0.830 when differentiating between malignancy and NIFTP. Model 2, consisting of age, Bethesda category, ATA classification, and Afirma GSC, had an AUC of 0.877 when differentiating between malignancy and NIFTP versus benign nodules and an AUC of 0.745 when differentiating between malignancy and NIFTP.
Table 2. Odds Ratios and p-Values were Calculated for Each Covariate Over the Indicated Reference Ranges
A:B signifies the odds ratio for A relative to B.
ATA, American Thyroid Association; CI, 95% confidence interval.
Of the 92 nodules considered to have malignant thyroid pathology, 58 nodules (63.0%) were papillary thyroid carcinomas, 6 nodules (6.5%) were follicular thyroid carcinomas, 2 nodules (2.2%) were minimally invasive Hurthle cell carcinomas, 2 nodules (2.2%) were poorly differentiated thyroid carcinomas, 1 nodule (1.1%) was an anaplastic thyroid carcinoma, and 23 nodules (25.0%) were NIFTP (Table 3). Of the 58 papillary thyroid carcinomas, 26 nodules (44.8%) were the follicular subtype, 18 nodules (31.0%) were the classic subtype, and 2 nodules (3.4%) were tall cell variants (Table 3).
Table 3. Each Tumor Type was Analyzed According to the Number of Nodules, Tumor Subtypes, Tumor Size, Tumor Disease Stage, and Notable Invasive or Metastatic Features Found on Surgical Histopathological Reports
NIFTP, noninvasive follicular thyroid neoplasm with papillary-like nuclear features.
The median nodule diameter was 2.0 cm (IQR, 1.1–2.8 cm). Thirty nodules (32.6%) were tumor stage pT1N0/pT1NX, 18 nodules (19.6%) were tumor stage pT2N0/pT2NX, 8 nodules (8.7%) were tumor stage pT3N0/pT3NX, 9 nodules (9.8%) were tumor stage pT1N1, 2 nodules (2.2%) were tumor stage pT2N1, 1 nodule (1.1%) was tumor stage pT3N1, and 1 nodule (1.1%) was tumor stage pT4N1M1. Two nodules (2.2%) exhibited extrathyroidal extension, 9 nodules (9.8%) exhibited capsular invasion, 5 nodules (5.4%) exhibited lymphatic invasion, 9 nodules (9.8%) exhibited vascular invasion, and 13 nodules (14.1%) had nodal metastases.
Discussion
Our study is the first that uses ultrasound risk classification and current versions of commercially available molecular marker tests to risk stratify cytologically indeterminate thyroid nodules. Our findings suggest that cytologically indeterminate thyroid nodules with highly suspicious ultrasound features have a malignancy rate approaching 80%, which is unaffected by suspicious/positive molecular marker tests (suspicious Afirma GSC: positive LR = 2.0, positive ThyroSeq v3 results: positive LR = 1.0).
Therefore, molecular testing has limited utility in detecting malignancy in indeterminate ATA high suspicion nodules, although molecular testing may still provide relevant clinical information for prognostication and treatment planning. In ATA low/intermediate ultrasound suspicion nodules, the rates of malignancy were significantly lower with a benign Afirma GSC or negative ThyroSeq v3 result. The relatively high BCRs for Afirma GSC and ThyroSeq v3 (67.2% and 65.5%, respectively) in ATA low/intermediate suspicion nodules highlight the value of molecular testing in all other ultrasound risk categories in decreasing the number of diagnostic surgeries.
Our current study utilized the updated molecular test panels, Afirma GSC and ThyroSeq v3. The specificity of Afirma GSC increased to 94.3%, compared with 61.4% for GEC (8), and the specificity of ThyroSeq v3 improved to 82% (7,8,26–28). With the improvement in specificity, the BCR for Afirma GSC improved to 64.7% [CI 57.7–71.3%] compared with 43.5% for Afirma GEC. Despite this, our findings are consistent with our prior study evaluating Afirma GEC performance across ultrasound risk categories (14,16). The study found that ATA high suspicion nodules returned a suspicious Afirma GEC 81% of the time, and a positive GEC did not significantly increase specificity or PPV (16).
In addition, our subgroup analysis of 125 nodules (30.7%) that received ACR TI-RADS reads yielded a concordance rate of 89.6% between ATA and ACR TI-RADS, which is similar to the 90% concordance rate between ATA and ACR TI-RADS from our previous study (16). This suggests that the ATA reads are largely generalizable to other ultrasound classification systems at our institution.
Our univariate and multivariate logistic regression analyses reached conclusions similar to those of Figge et al. (29) in that ThyroSeq v3 had the best diagnostic performance in predicting malignancy. However, while Figge et al. reported that the ATA risk classification was not significantly associated with malignancy, they postulated that their small sample size of ATA high suspicion nodules may have impacted the significance of this association (29). With a sample of ATA high suspicion nodules that is 45% larger, we found that ATA risk classification is indeed a significant predictor of malignancy. This is consistent with the positive trend between ATA risk category and risk of malignancy found in Figge et al. (29).
Afirma GSC and ThyroSeq v3 performed the best in predicting malignancy, which may be influenced by the sample size of each molecular test being an order of magnitude greater than that for ATA high suspicion nodules. Furthermore, the vast majority of molecular tests were used for ATA low/intermediate suspicion nodules, which our analyses have shown to provide significant a posteriori knowledge in risk stratifying malignancy. Conversely, ATA low/intermediate suspicion nodules have up to 20% risk of malignancy (2), so an ATA low/intermediate result performs poorly in ruling out malignancy. As the vast majority of our data consists of nodules with ATA low/intermediate risk suspicion, the effect size of the ATA risk classification system may be lower. Although variables were independently assessed for association with malignancy, in reality the variables cannot be evaluated in a vacuum as the variables may be interconnected.
Thus, although Afirma GSC and ThyroSeq v3 outperformed ATA risk classification in predicting malignancy, a direct comparison between molecular tests and ultrasound features cannot yet be drawn due to the imbalanced sampling. Although our multivariate regression model was able to uncover variables that were significant in determining the risk of malignancy in cytologically indeterminate thyroid nodules, we have not yet applied the multivariate regression model for clinical decision-making as our data set is still imbalanced between ATA high and ATA low/intermediate nodules. When a greater sample of high suspicion nodules is attained, the multivariate regression model may potentially be integrated into clinical practice. Molecular testing should still be used judiciously based on clinician assessment and integration of available clinical data, including ultrasound risk classification.
Ultrasound features remain a crucial factor in the risk stratification of thyroid nodules with indeterminate cytology (30,31). However, ultrasound risk classification has not been routinely adopted into clinical practice guidelines for determining the treatment plan for indeterminate thyroid nodules, including indication for molecular testing. Additional data are needed to guide clinical practice in this area. Our study suggests that routine use of ultrasound to further risk stratify indeterminate thyroid nodules may help optimize and narrow the use of costly molecular tests to nodules without high-risk features, while promoting direct surgical management of thyroid nodules with high-risk ultrasound features.
Other studies similarly support this approach, by conversely showing that high-risk ultrasound characteristics can predict increased risk of mutations (31–34), with one study showing 82% PPV for BRAF positivity (32). In our prior study, both ACR TI-RADS and ATA criteria showed similar performance in predicting suspicious Afirma GEC. Deep learning-based artificial intelligence via ultrasonography may further improve diagnostic performance in differentiating malignant from benign thyroid nodules (35–37), with one model producing 97% specificity and 90% PPV for recognizing nodules with high-risk mutations on an institution-specific 23 gene panel (36).
This study is based on our initial randomized trial that subjected all nodules with indeterminate cytology to molecular testing (9), compared with other trials that had a more selective approach in which only suspicious subsets of nodules were subject to molecular testing. Our study results are thus more generalizable. When considering the low BCR of molecular testing in high-risk ultrasound category nodules and significantly higher BCR in low/intermediate ultrasound nodules, selective molecular testing in the latter group may improve diagnostic value and cost-effectiveness. The improved specificity of Afirma GSC and ThyroSeq v3 also leads to increased accuracy as rule-out tests (8,26–28). Thus, molecular test-negative, cytologically indeterminate nodules may be more safely monitored and avoid unnecessary diagnostic surgeries.
For ultrasound to be routinely used in determining the management of cytologically indeterminate thyroid nodules, the variability inherent to different ultrasound classification guidelines must be taken into account. Although many studies have demonstrated high concordance rates between ATA and ACR TI-RADS (23,38,39), other studies have shown variability in diagnostic performance between ATA and ACR TI-RADS (13,22,40,41). We hypothesized that the diagnostic performance between ATA and ACR TI-RADS would be highly concordant in cytologically indeterminate nodules based on the results of our prior study (16).
Our similarly high concordance rates between ATA and ACR TI-RADS may be at least partially attributable to the exclusive evaluation of cytologically indeterminate nodules and the retrospective nature of their evaluation. A recent study by Yim et al. demonstrated high concordance rates (84.1–100%) among ATA, ACR TI-RADS, and the Korean Thyroid Association/Korean Society of Thyroid Radiology guidelines in high and intermediate suspicion thyroid nodules (40). Moreover, the PPV and specificity for malignancy for high suspicion nodules are robust across multiple ultrasound classification systems (2,38,42), indicating that sonographically high-risk nodules can be reliably detected among various ultrasound classification guidelines. Thus, we consider our findings to be similarly applicable to both ATA and ACR TI-RADS classification systems.
Cost-effectiveness studies of molecular testing for thyroid nodules with indeterminate cytology have shown mixed results. Models using Afirma GEC showed potential cost-effectiveness by avoiding surgery and reducing two-stage thyroidectomy (11,43); however, a different model utilizing Afirma GEC showed that diagnostic lobectomy may be more cost-effective over both 5- and 10-year spans when the expenses of long-term follow-up, such as serial ultrasounds and repeat molecular testing, are included (10). A retrospective cohort study that utilized ThyroSeq v2 showed no significant decrease in surgery rates or overtreatment, and the addition of molecular testing instead increased overall costs (44). The first hypothetical model that utilized the updated and more specific ThyroSeq v3 and Afirma GSC showed that either molecular test was more cost-effective than diagnostic lobectomy, even when 20 years of surveillance were included (12).
Cost-effectiveness is driven by a variety of factors, including specificity of the molecular tests, costs of molecular testing (ranging from $3000 to $7500) versus surgery (ranging from $9000 to >$20,000), and underlying prevalence of malignancy in the population studied (15,45–47). In particular, the cost of molecular studies can have a significant impact on cost-effectiveness (15) and could be controlled either by the direct costs of molecular testing or by frequency of testing within an institution. Restricting molecular testing to cytologically indeterminate nodules that are sonographically low/intermediate risk may improve cost-effectiveness (14).
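The interplay between test cost, surgery cost, and BCR can be illustrated with a deliberately simplified sketch. It is not the model used in any of the cited cost-effectiveness studies, and it assumes that every nodule without a benign/negative result proceeds to surgery, that benign/negative nodules incur no further cost, and that surveillance, complications, and repeat testing are ignored. The function names and the example BCR value are hypothetical; only the cost ranges come from the text above.

```python
def expected_cost_test_first(test_cost, surgery_cost, benign_call_rate):
    """Expected up-front cost per nodule when every nodule is tested first.

    Simplifying assumption (not from the study): all nodules without a
    benign/negative result go to surgery, and benign/negative nodules
    incur no further cost.
    """
    if not 0.0 <= benign_call_rate <= 1.0:
        raise ValueError("benign_call_rate must be between 0 and 1")
    return test_cost + (1.0 - benign_call_rate) * surgery_cost


def break_even_bcr(test_cost, surgery_cost):
    """BCR above which testing first costs less than operating on every nodule.

    Follows from test_cost + (1 - BCR) * surgery_cost < surgery_cost,
    which rearranges to BCR > test_cost / surgery_cost.
    """
    if surgery_cost <= 0:
        raise ValueError("surgery_cost must be positive")
    return test_cost / surgery_cost


# Cost ranges quoted in the text (US dollars); the pairings are illustrative.
for test_cost, surgery_cost in [(3000, 9000), (7500, 9000), (3000, 20000)]:
    threshold = break_even_bcr(test_cost, surgery_cost)
    print(f"test ${test_cost}, surgery ${surgery_cost}: "
          f"break-even BCR = {threshold:.2f}")

# Hypothetical BCR of 0.5, chosen only to show how the function is called.
print(expected_cost_test_first(3000, 9000, 0.5))
```

Under these assumptions, the break-even BCR is simply the ratio of test cost to surgery cost, so a subgroup with a low BCR (such as sonographically high-risk nodules) is the least likely to recoup the cost of testing.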
Another important consideration for using molecular testing is whether the results would impact patient management; in other words, if a suspicious molecular test would not influence the decision for surgery, the test should likely not be performed. Among nodules with Afirma GSC suspicious or ThyroSeq v3 positive molecular test results that remained unresected, most (75.9%) were not resected because of patient preference to avoid surgery and manage the nodule conservatively, or because the patient was a poor surgical candidate due to medical comorbidities. Patients who would not undergo surgery regardless of the molecular test result would likely derive little benefit from undergoing molecular testing.
Conversely, 65.6% of resected nodules with Afirma GSC benign or ThyroSeq v3 negative molecular test results were resected because of clinician recommendation, increasing nodule size, compressive symptoms, or patient request. If a nodule will be resected regardless of the molecular test result, the information provided by the molecular test is likewise of marginal benefit. Therefore, knowledge of a patient's ability or desire to undergo surgery before molecular testing is also an important clinical factor in the cost-effectiveness of molecular testing in real-world settings. Judicious use of molecular tests should take into consideration whether the outcome of the test will impact future surgical management.
This study is limited by the nonoperative management of many nodules with benign/negative molecular test results, as it is not feasible to surgically resect all cytologically indeterminate nodules with Afirma GSC benign or ThyroSeq v3 negative molecular tests in real-world settings due to the well-validated high NPVs of these molecular tests (7,8,17,19,25,26). However, our BCRs for Afirma GSC and ThyroSeq v3 are consistent with those of prior studies (9,17,48,49), and the high sensitivity of both molecular tests has been previously validated (6,9,16,17,48,50). Although only a subset of nodules received ACR TI-RADS reads, the high concordance rate between ATA and ACR TI-RADS in the subset analysis is consistent with the results from our prior study (16). Thus, we consider our findings to be representative of both ultrasound classification systems.
Our prior study involving the same core group of four radiologists reported a 10% rate of interobserver variability when using the ATA classification system (16). Since the same core group of radiologists provided the majority of thyroid nodule ultrasound classifications, the interobserver variability in this study can reasonably be extrapolated from our prior study. Our high rates of diagnostic accuracy in determining malignancy in sonographically suspicious nodules were due in large part to the expertise of our radiologists, which enabled them to accurately classify nodules based on the ATA and ACR TI-RADS systems. Since ultrasound is largely user-dependent, it is of paramount importance that the radiologists evaluating the cytologically indeterminate nodules have adequate experience and expertise with thyroid nodules. Although a strength of our study was the evaluation of both Afirma GSC and ThyroSeq v3, each test group had a relatively small sample size.
Conclusions
Our study highlighted the diagnostic value of Afirma GSC and ThyroSeq v3 in their ability to rule out thyroid cancer. Cytologically indeterminate thyroid nodules (Bethesda III and IV) with ultrasound features that are highly suspicious for malignancy may forgo molecular testing to avoid unnecessary costs. Molecular testing for nodules in low/intermediate ultrasound risk categories can decrease diagnostic surgery due to the relatively high BCR and high sensitivities of the current molecular test versions.
Footnotes
Authors' Contributions
T.X.H. and M.J.L. had full access to all the data in the study and take responsibility for the integrity of the data and the accuracy of the data analysis. Concept and design: T.X.H., M.W.Y., and M.J.L. Acquisition, analysis, or interpretation of data: T.X.H., D.T.N., M.P., K.B., M.D., R.M., J.R., J.K., C.-H.T., M.W.Y., and M.J.L. Drafting of the article: T.X.H., D.T.N., M.P., J.R., and M.J.L. Critical revision of the article for important intellectual content: T.X.H., J.R., C.-H.T., M.W.Y., and M.J.L. Statistical analysis: T.X.H., J.R., J.K., C.-H.T., M.W.Y., and M.J.L. Obtained funding: none. Administrative, technical, or material support: T.X.H., D.T.N., M.W.Y., and M.J.L. Supervision: T.X.H., M.W.Y., and M.J.L.
Acknowledgment
We thank Ms. Elena G. Hughes for her logistical support during the preparation of the article.
Author Disclosure Statement
No competing financial interests exist.
Funding Information
No funding was received for this article.
Supplementary Material
Supplementary Figure S1
Supplementary Figure S2
