Abstract
Background:
The aim of this study was to demonstrate the analytical validity of an RNA classifier for medullary thyroid carcinoma (MTC).
Methods:
Fresh-frozen tissue specimens were obtained from commercial sources, and MTC diagnoses were confirmed by histopathology review. De-identified patient fine-needle aspiration biopsies (FNABs) and whole blood from normal donors were obtained. Total RNA was extracted, amplified, and hybridized to custom microarrays for gene expression analysis. Gene expression data were normalized and classified via a machine learning algorithm. Positive control materials were produced from MTC tissues and tested across multiple experiments and laboratories. Twenty-seven MTC tissue specimens were used to evaluate the sensitivity of the MTC classifier. Gene expression data from tissues and FNABs were used to model classifier response to mixtures of MTC samples with normal thyroid tissue, a benign thyroid nodule, a Hürthle cell adenoma, and whole blood. Select mixture conditions were confirmed in vitro. Assay tolerance to RNA input variation (5–25 ng) and genomic DNA contamination (30% by mass) was evaluated. The intra- and inter-run reproducibility and inter-laboratory accuracy of MTC classifier results were characterized.
Results:
The MTC classifier sensitivity of 96.3% [confidence interval 81.0–99.9%] was determined retrospectively using 27 MTC confirmed tissue specimens. One false-negative result in a necrotic tissue implicated sample necrosis in reduced classifier sensitivity. Dilution modeling of MTC samples with normal or benign tissues showed consistent detection of MTC down to 20% sample proportions, with in vitro confirmation of 20% analytical sensitivity. Classifier tolerance to RNA input variation (5–25 ng), genomic DNA contamination (30% by mass), and an interfering substance (blood) was demonstrated with 100% accurate classifier results under all tested conditions. The maximum observed run-to-run score difference for a single FNAB sample was ∼1 unit compared with the average score difference between 38 MTC and non-MTC FNABs of ∼32 units. MTC classifier results for 20 tissues processed from total RNA in two different laboratories showed 100% concordance.
Conclusions:
The MTC classifier, offered as part of the routine molecular testing of cytology-indeterminate thyroid nodules, demonstrates robust analytical sensitivity, specificity, accuracy, and reproducibility.
Introduction
Fine-needle aspiration is an accurate and well-established technique for the preoperative evaluation of thyroid nodules (4,5). However, MTC is not always suspected or accurately diagnosed by cytology, as the cytologic features of MTC can show significant overlap with those of other thyroid neoplasms, leading to low sensitivity and specificity (3,6–13). In a recent international multicenter study, fewer than half of sporadic MTCs were specifically diagnosed preoperatively by cytology on fine-needle aspiration biopsies (FNABs) (14). While serum calcitonin screening (basal or stimulated) is sensitive, its use for the preoperative detection of MTC remains controversial because of a variety of factors, including differences in assay concordance and detection limits, a wide range of normal calcitonin levels in the population resulting in low test specificity (false-positives), and uncertain cost-effectiveness related to the low prevalence of MTC (3–5,15–19). Thus, a more accurate method for preoperative diagnosis of MTC is needed so that the optimal preoperative evaluation and surgery necessary to maximize patient outcome can be planned (4,20). As not all MTCs harbor RET or RAS point mutations, current gene mutation tests in thyroid FNAB samples are not highly sensitive for MTC (21,22).
The Veracyte Afirma Gene Expression Classifier (GEC) is a molecular test that reclassifies cytology-indeterminate (Bethesda III–IV) thyroid nodules as either benign or suspicious (23,24). FNAB specimens from a patient are collected from a single nodule. Cytology specimens are prepared, and the corresponding GEC specimen is stored in nucleic acid preservative. When cytology identifies a nodule as indeterminate, the corresponding sample collected for genomic analysis is tested via a gene expression assay, the GEC, performed in the Veracyte CLIA-certified clinical laboratory. The GEC uses a series of “cassettes” to evaluate the expression of 25 genes associated with specific neoplasms potentially encountered in thyroid FNABs, upstream of a support-vector machine (SVM) algorithm that classifies the sample as benign or suspicious based on the normalized expression of 142 additional genes (23,24). The cassettes were developed through an unbiased, whole-transcriptome interrogation of histopathologically confirmed thyroid lesions, which identified distinct gene expression signatures specific to each type of lesion. One of these cassettes comprises an MTC classifier; the five classifier genes, CALCA, CEACAM5, SCG3, SCN9A, and SYT4, are involved in membrane structure, vesicular transport, and calcium-dependent exocytosis (23–25). All cassettes were clinically validated against blinded expert histopathology review as part of the GEC on a large independent test set of consecutively collected and operated cytology samples (23). In particular, the MTC cassette correctly called 2/2 MTC positive samples (100% sensitivity), and generated no false calls of MTC on 318 non-MTC samples (Bethesda III–VI; 100% specificity) (23). Both of the MTC cases in this cohort had indeterminate FNAB cytology, and MTC was not suspected preoperatively.
Subsequently, the MTC classifier identified 42/43 instances of surgically confirmed MTC in a series of >53,000 thyroid nodules screened by cytology, with a positive predictive value (PPV) of 98% (26).
This study presents the analytical validation of the MTC classifier following guidelines provided by the Evaluation of Genomic Applications in Practice and Prevention Working Group (EGAPP), the Centers for Disease Control ACCE Project: Evaluation of Genetic Testing, and Emergency Care Research Institute (ECRI) Evidence-based Practice Center (27–29). The performance of the MTC classifier was evaluated across a diverse set of histopathologically confirmed MTC tissue specimens and thyroid FNABs. The analytical sensitivity of the MTC classifier to limiting assay RNA input is examined, and the dilution of an MTC positive sample with benign thyroid tissues is computationally simulated (24). The study demonstrates analytical specificity through assessment of assay performance in the context of potentially interfering genomic DNA, and simulates the effect of blood contamination on classifier performance. It also determines the reproducibility of MTC classifier results across technical replicates, across sample processing batches, and across laboratories.
Materials and Methods
Specimens
Fresh-frozen MTC tissue specimens were obtained from tissue bank sources (Cooperative Human Tissue Network [CHTN], Asterand, Inc., and ProteoGenex). Tissue specimens were shipped frozen and stored at −80°C upon receipt. The supplied clinical records were reviewed to confirm the surgical pathology diagnosis of MTC for each specimen. Prior to use in analytical studies, the MTC diagnosis for each tissue specimen was verified histopathologically using tissue sections prepared in-house. Briefly, frozen tissues were mounted for sectioning with Tissue-Tek O.C.T. medium (Sakura Finetek), and thick sections (1.5–3 mm) were generated on a CM1800 cryostat (Leica Biosystems). Thick sections were immersed in 10% buffered formalin, paraffin embedded, sectioned at 5 μm thickness, and separate sections were processed for hematoxylin and eosin (H&E) staining and calcitonin immunohistochemistry (IHC) using standard procedures (Histo-Tec Laboratory). Slides were read and specimens confirmed as MTC by a board-certified anatomic pathologist at Veracyte. Tumor content is expressed as percent tumor by area from review of representative H&E sections (Table 1).
Tissue sections with <100% tumor content included non-malignant components comprised predominantly of normal thyroid follicles and intervening stroma.
MTC, medullary thyroid carcinoma; H&E, hematoxylin and eosin; Ct IHC, calcitonin immunohistochemistry.
Following review and waiver by an Institutional Review Board, FNABs referred to the Veracyte Clinical Laboratory Improvement Amendments (CLIA) certified laboratory for GEC testing, which activated the MTC cassette (23), were selected for physician follow-up by the Veracyte Medical Director. Cytology preparations (ThinPrep, Hologic) and smears (Diff-Quick, VWR International) were performed following standard procedures. Patients with documented MTC histopathology or clinical sequelae consistent with MTC (e.g., elevated serum calcitonin) were assigned a clinical truth label of MTC (26). Forty-one MTC FNAB samples (24 Bethesda III–IV, 12 Bethesda V, 5 Bethesda VI by cytology) were de-identified following standard procedures prior to use in analytical studies. The normal thyroid tissue, benign follicular nodule, Hürthle cell adenoma, and blood specimens used in this study were described previously (23,30).
RNA extraction, amplification, and microarray hybridization
For tissue specimens, a 20 μm frozen section taken immediately adjacent to the thick sections processed for histology was prepared, and RNA was isolated using the AllPrep Micro Kit (QIAGEN). Total RNA yield and quality were determined as described previously (30). Total RNAs from clinical FNAB samples and tissue specimens were amplified and hybridized to a custom microarray, and gene expression was analyzed using the GEC as described (30). Each expression analysis processing batch contained GEC benign and malignant thyroid tissue controls (30) as well as a no template (water) negative control. RNA from histopathologically confirmed MTC samples served as positive controls for the MTC classifier in each processing batch. Preanalytic test requirements, including sample collection and processing procedures and MTC classifier training and development, were described and validated previously (23,24). Confidence intervals (CI) were calculated as exact, two-sided Clopper–Pearson binomial confidence intervals.
In silico mixture modeling
Mixtures of gene expression data were modeled in silico from single sample data as previously described (24,30). Briefly, 200 simulations were performed for each mixture proportion, resulting in 200 independent classifier results (calls) per condition. The frequency of correct calls, that is, calls of MTC in a mixture containing an MTC-positive sample, is expressed as a proportion. Each simulation incorporates known technical variability for in vitro mixing and gene expression intensity measurements. The statistical methods used in this study were reviewed and approved by a biomedical statistician.
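The general shape of such a mixing simulation can be sketched as follows. This is a minimal illustration only, not the published model (24,30): it assumes that two log2 expression profiles are combined on the linear intensity scale, that mixing error and measurement noise are Gaussian, and it uses a hypothetical mean-expression threshold (`toy_classifier`) in place of the actual MTC classifier. All function names, expression values, and noise parameters are invented for the example.

```python
import math
import random

def mix_log2(sample_a, sample_b, frac_a):
    """Mix two log2 expression profiles on the linear intensity scale (assumed model)."""
    return [math.log2(frac_a * 2 ** a + (1 - frac_a) * 2 ** b)
            for a, b in zip(sample_a, sample_b)]

def mtc_call_rate(mtc, diluent, frac_mtc, classify,
                  n_sim=200, mix_sd=0.02, noise_sd=0.1, seed=0):
    """Fraction of n_sim simulated mixtures that the classifier calls MTC."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sim):
        # Perturb the nominal proportion to mimic pipetting variability (assumed SD).
        frac = min(1.0, max(0.0, rng.gauss(frac_mtc, mix_sd)))
        # Add per-gene measurement noise on the log2 scale (assumed SD).
        profile = [v + rng.gauss(0.0, noise_sd)
                   for v in mix_log2(mtc, diluent, frac)]
        hits += bool(classify(profile))
    return hits / n_sim

def toy_classifier(profile, threshold=9.5):
    """Hypothetical stand-in: call MTC when mean log2 expression exceeds a threshold."""
    return sum(profile) / len(profile) > threshold

# Hypothetical five-gene profiles (log2 intensity); not data from this study.
mtc_profile = [12.0] * 5
benign_profile = [6.0] * 5

for pct in range(100, -1, -10):
    rate = mtc_call_rate(mtc_profile, benign_profile, pct / 100, toy_classifier)
    print(f"{pct:3d}% MTC sample: proportion called MTC = {rate:.2f}")
```

Running 200 simulations per proportion mirrors the design described above; the specific call rates printed depend entirely on the invented profiles and threshold and should not be read as study results.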
Results
Tissue sensitivity
The prospective clinical collection of a diversity of confirmed MTC specimens remains a challenge due to the low prevalence of this neoplasm. Therefore, the performance of the MTC classifier was evaluated using a set of commercially available, histology verified MTC tissue specimens representing 27 different patients (Table 1). Out of 22 tissues with 100% estimated tumor content, 21 were correctly identified as MTC by the MTC classifier. Two heterogeneous MTC specimens containing normal thyroid follicles and intervening stroma and an estimated 90% tumor content were correctly classified, as were three additional specimens with 80%, 75%, and 40% estimated tumor content, respectively (Table 1). Additionally, three different MTC confirmed specimens from patient 22 were independently evaluated; all three were called MTC by the classifier, including one specimen with estimated blood contamination comprising 75% of the tissue section (data not shown). The false-negative in this tissue set (Patient 2, Table 1) is MTC by standard pathology criteria (H&E morphology in combination with calcitonin immunoreactivity). Analysis of adjacent, sequential sections from this tissue identified a gradient of decreasing MTC classifier gene expression and classifier score. The H&E morphology of these adjacent sections showed evidence of coagulative necrosis, with proportions of non-viable tumor in some sections of up to 60% (data not shown). Overall, the MTC classifier correctly identified MTC in 26/27 cases, providing a sensitivity estimate on tissues of 96.3% [confidence interval (CI) 81.0–99.9%].
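As a check on the interval reported above, the exact two-sided Clopper–Pearson bounds for 26 correct calls out of 27 can be computed directly from the binomial distribution. The sketch below uses only the Python standard library and finds each bound by bisection; the function names are illustrative and are not taken from the study's analysis code.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k + 1))

def _solve_decreasing(f, target, iters=100):
    """Bisection on [0, 1] for f(p) = target, where f is decreasing in p."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) > target:
            lo = mid  # f is still above the target, so the root lies to the right
        else:
            hi = mid
    return (lo + hi) / 2

def clopper_pearson(x, n, alpha=0.05):
    """Exact two-sided binomial confidence interval for x successes in n trials."""
    # Lower bound: the p at which P(X >= x) = alpha/2, i.e. P(X <= x-1) = 1 - alpha/2.
    lower = 0.0 if x == 0 else _solve_decreasing(
        lambda p: binom_cdf(x - 1, n, p), 1 - alpha / 2)
    # Upper bound: the p at which P(X <= x) = alpha/2.
    upper = 1.0 if x == n else _solve_decreasing(
        lambda p: binom_cdf(x, n, p), alpha / 2)
    return lower, upper

lower, upper = clopper_pearson(26, 27)
print(f"sensitivity = {26 / 27:.1%}, 95% CI = {lower:.1%} to {upper:.1%}")
```

For 26/27 this yields bounds of approximately 81.0% and 99.9%, in agreement with the interval stated in the text.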
Analytical sensitivity
Variability in routine quantitation of total RNA concentration for FNAB samples can result in modest variation around the nominal 15 ng total RNA assay input requirement. To characterize the effect of varying total RNA input on MTC classifier sensitivity, an assay input titration was performed on total RNA extracted from FNABs. Total RNA from three different nodules with confirmed MTC was assayed in triplicate at total RNA input quantities of 5, 10, 15, and 25 ng. GEC results were obtained at all input amounts, and all replicates for each nodule were classified correctly as MTC at all tested input levels (Fig. 1A). As previously reported (30), four similarly tested MTC negative FNABs did not trigger the MTC classifier at any of these input amounts, demonstrating classifier robustness to variation in assay input quantities (Fig. 1A).

Fig. 1. Analytical sensitivity.
The FNAB procedure has the potential to introduce material from tissues adjacent to a nodule, which could in turn dilute the molecular signature of the nodule being sampled. To examine the effect of sample dilution on MTC classifier performance, dilution was modeled using an in silico simulation previously demonstrated to show agreement with in vitro microarray data (31) and for the GEC (24,30). Independently, an MTC positive FNAB sample and an MTC positive tissue sample were mixed in silico with ex vivo FNABs from a normal tissue sample and a benign thyroid nodule, and an FNAB derived from a Hürthle cell adenoma with gold-standard histopathologic confirmation, respectively (23). Mixture proportions from 100% MTC specimen to 0% MTC specimen were modeled in 10% decrements (Fig. 1B and C). In simulated mixtures, both the MTC tissue and FNAB sample were correctly classified a majority of the time in mixtures representing up to fivefold (80%) dilution of the MTC sample (Fig. 1B and C). To confirm the modeling result, RNA mixtures of an MTC FNAB with benign nodule RNA were tested in vitro in triplicate at mixture proportions of 40%, 30%, and 20% MTC FNAB RNA by mass. All nine technical replicates were classified as MTC by the classifier, confirming the in silico modeling result and demonstrating a robust limit of detection for MTC in RNA mixtures representing up to 80% dilution of the MTC sample.
Analytical specificity
The presence of an interfering substance with the potential to inhibit a molecular assay could cause a false-negative result. If the substance in question generates a positive signal through a nonspecific mechanism, a false-positive may result. Deviations in the RNA extraction process can result in sporadic genomic DNA contamination of the purified total RNA. During GEC testing, the routine in-process quality control of the RNA extraction process using the Bioanalyzer chip (Agilent) identifies and excludes samples with genomic DNA contamination >30% by total nucleic acid mass. To test the effect of 30% genomic DNA contamination by mass on the classification of MTC, 6.4 ng of genomic DNA extracted from an MTC positive tissue sample was added to 15 ng of total RNA isolated from the same sample, and the mixture was assayed in triplicate. All three replicates of the MTC positive tissue were correctly called MTC by the classifier in the presence of 30% genomic DNA contamination (Fig. 2A). Three MTC negative samples assayed similarly with 30% genomic DNA added (30) were not identified as MTC by the classifier, demonstrating that incidental genomic DNA contamination from the sample extraction process will not generate false MTC classifier calls. FNABs can also contain varying quantities of blood (32). It is therefore important to determine whether blood itself, or the presence of blood in an FNAB, interferes with the detection of gene expression specific to MTC. To examine the effect of blood on the MTC classification, dilution was modeled using in silico simulation. An MTC positive FNAB and an MTC positive tissue sample were mixed in silico with nine different blood samples collected from nine different individuals as described previously (30). Mixture proportions spanning the range from 100% MTC sample to 0% MTC sample were modeled in 10% decrements (Fig. 2B and C).
Both the MTC tissue and FNAB were correctly classified a majority of the time in simulated mixtures representing up to 80% blood (Fig. 2B and C). None of the nine pure blood samples were called MTC, suggesting that the gene expression signature of blood alone is unlikely to result in a nonspecific false call of MTC.
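The arithmetic behind the genomic DNA spike-in described above is simple to verify: the contamination level is the DNA mass divided by the total nucleic acid mass. The short sketch below, with illustrative function names, computes that fraction for the stated 6.4 ng of DNA in 15 ng of RNA and, conversely, the DNA mass needed to reach a target fraction.

```python
def gdna_fraction(dna_ng, rna_ng):
    """Genomic DNA as a fraction of total nucleic acid mass."""
    return dna_ng / (dna_ng + rna_ng)

def gdna_mass_for_fraction(rna_ng, target_fraction):
    """DNA mass (ng) to add to rna_ng of RNA to reach target_fraction by mass."""
    return rna_ng * target_fraction / (1 - target_fraction)

print(f"{gdna_fraction(6.4, 15.0):.1%}")                 # fraction for the stated spike-in
print(f"{gdna_mass_for_fraction(15.0, 0.30):.2f} ng")    # mass needed for exactly 30%
```

Adding 6.4 ng of DNA to 15 ng of RNA gives about 29.9% DNA by mass, and an exact 30% fraction would require about 6.43 ng, so the stated amounts are consistent with the 30% threshold.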

Fig. 2. Analytical specificity.
Assay control material
MTC-positive control materials were produced from tissue specimens with histology confirmed MTC. An RNA control material derived from patient 22 (Tables 1 and 2) was tested in triplicate in the Veracyte Research and Development (R&D) laboratory as part of the interfering substances study, and in singlicate in the R&D and CLIA clinical laboratories as part of the accuracy study described below. An RNA control material derived from patient 13 (Tables 1 and 2), in use as an analytic control for routine clinical testing, shows a classifier score SD of 0.507 across 235 technical replicates, versus an inter-run score SD of 0.321 estimated using three MTC FNABs (Fig. 3). All positive control replicates processed to date have classified correctly as MTC (data not shown).

MTC classifier score variation. Intra-run (R1, R2, R3) and inter-run (FNAB 1–3) score variation, illustrated as per-replicate score differences relative to the mean score for each grouping, is shown for 27 technical replicates of three MTC-positive FNABs. Score differences for an additional 38 MTC-positive and 38 MTC-negative FNABs are shown within and across diagnostic categories (MTC+ vs. MTC−).
Tissue identifiers for MTC samples correspond to the patient identifiers used in Table 1.
+, MTC classifier call of positive; −, MTC classifier call of negative.
Reproducibility
The Afirma GEC was previously demonstrated to show high reproducibility for technical replicates within a processing run, across processing runs, and between the Veracyte R&D and CLIA laboratories (30). This study characterizes the intra- and inter-run reproducibility of MTC classifier calls. FNABs from three different patients with histopathologically confirmed MTC were each processed in triplicate from total RNA in a CLIA-certified laboratory as part of one sample processing batch. Processing of these samples in triplicate was repeated twice more in the clinical laboratory, utilizing different assay operators and assay reagent lots for the two subsequent processing batches. In each of three processing batches, all nine replicates for each FNAB, starting from total RNA (27 independent technical replicates in total) were correctly classified as MTC by the MTC classifier (Fig. 3 and data not shown). Thus, the MTC classifier demonstrates high reproducibility in the context of the multiple different operators and reagent lots that are typical of routine clinical laboratory operations.
The study also characterized variation in the MTC score underlying the classifier calls. Score differences were calculated for the 27 FNAB technical replicates within and across processing runs by grouping replicates by sample either within a run (R1, R2, R3) or across runs (FNAB 1–3), and calculating score differences relative to the mean score for each grouping (Fig. 3). Calculated thus, the estimated intra-run and inter-run SD of MTC scores for FNAB samples is 0.246 and 0.321, respectively, with a maximum observed score difference across runs for a single sample of ∼1 unit on an ∼40-unit scale, or ∼2.5% of the score range. To place this degree of technical variation in context, Figure 3 also illustrates score differences for 38 MTC positive (21 Bethesda III–IV, 12 Bethesda V, 5 Bethesda VI cytology) and 38 MTC negative (19 Bethesda V, 19 Bethesda VI cytology) FNAB samples encountered in routine clinical practice. Scores for MTC positive and MTC negative samples routinely differ by up to ∼35 units (Fig. 3). The MTC classifier thus demonstrates high reproducibility of calls and scores within and across processing runs, with a bimodal separation of scores observed between MTC positive and negative sample groups.
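The grouping scheme described above (differences from the mean of each within-run group, or of each sample across runs) can be sketched as follows. The scores are hypothetical placeholders, and the pooled SD formula, which divides by the number of replicates minus the number of groups, is an assumption; the text does not state which estimator was used.

```python
from math import sqrt

def deviations_from_group_means(groups):
    """Return per-replicate differences from each group's mean and a pooled SD."""
    diffs = []
    for scores in groups:
        mean = sum(scores) / len(scores)
        diffs.extend(s - mean for s in scores)
    dof = len(diffs) - len(groups)  # one mean is estimated per group (assumed estimator)
    pooled_sd = sqrt(sum(d * d for d in diffs) / dof)
    return diffs, pooled_sd

# Hypothetical scores[sample][run] = triplicate scores; not data from this study.
scores = {
    "FNAB 1": {"R1": [30.1, 30.4, 29.9], "R2": [30.6, 30.2, 30.8], "R3": [29.8, 30.0, 30.3]},
    "FNAB 2": {"R1": [27.5, 27.9, 27.6], "R2": [27.2, 27.4, 27.8], "R3": [28.0, 27.7, 27.9]},
    "FNAB 3": {"R1": [33.0, 33.3, 32.8], "R2": [33.5, 33.1, 33.4], "R3": [32.9, 33.2, 33.0]},
}

# Intra-run: one group per (sample, run). Inter-run: one group per sample across runs.
intra_groups = [reps for runs in scores.values() for reps in runs.values()]
inter_groups = [[s for reps in runs.values() for s in reps] for runs in scores.values()]

_, intra_sd = deviations_from_group_means(intra_groups)
inter_diffs, inter_sd = deviations_from_group_means(inter_groups)
print(f"intra-run SD = {intra_sd:.3f}, inter-run SD = {inter_sd:.3f}")
print(f"largest inter-run deviation = {max(abs(d) for d in inter_diffs):.2f} units")

# The reported ~1 unit difference on a ~40-unit scale as a fraction of the range:
print(f"{1 / 40:.1%}")
```

With three samples, three runs, and triplicates, this produces 27 replicate-level differences per grouping, matching the number of technical replicates described in the text.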
Accuracy
Inter-laboratory concordance of test results is an important measure of test accuracy (29). To characterize the accuracy of the MTC classifier, 10 tissue specimens positive for MTC and 10 GEC control tissues negative for MTC were processed through classification in the laboratory where the test was developed (Veracyte R&D laboratory) and independently in the CLIA laboratory on a different day using different operators, reagent lots, and equipment. The MTC classifier results for all 20 specimens were accurate and 100% concordant (Table 2), thus demonstrating MTC classifier accuracy.
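Inter-laboratory concordance and per-laboratory accuracy are both simple agreement proportions. The sketch below illustrates the calculation for a design of 10 MTC-positive and 10 MTC-negative specimens; the call lists are stand-ins constructed to match the summary reported above, not the per-specimen entries of Table 2, and the function name is illustrative.

```python
def agreement(calls_a, calls_b):
    """Proportion of paired entries that agree."""
    if len(calls_a) != len(calls_b):
        raise ValueError("call lists must be the same length")
    return sum(a == b for a, b in zip(calls_a, calls_b)) / len(calls_a)

# Stand-in data: reference labels and the calls from each laboratory.
truth = ["+"] * 10 + ["-"] * 10
rd_lab_calls = list(truth)    # assumed to match the reference, per the reported result
clia_lab_calls = list(truth)  # assumed to match the reference, per the reported result

print(f"inter-laboratory concordance: {agreement(rd_lab_calls, clia_lab_calls):.0%}")
print(f"R&D laboratory accuracy:      {agreement(rd_lab_calls, truth):.0%}")
print(f"CLIA laboratory accuracy:     {agreement(clia_lab_calls, truth):.0%}")
```

Keeping concordance (laboratory versus laboratory) separate from accuracy (laboratory versus reference) matters because two laboratories can agree perfectly while both being wrong.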
Discussion
The development and validation of molecular tests for neuroendocrine lesions is challenging due to the low prevalence of these malignancies. To extend the body of evidence demonstrating the external validity of the MTC classifier (23,26), sensitivity was assessed on a set of 27 MTC tissue samples with associated reference standard diagnosis, and the correct call of MTC was observed in 26 samples, resulting in 96.3% sensitivity. The single false-negative tissue displayed clear MTC pathology confounded by the presence of coagulative necrosis. Elevated expression of CALCA, the calcitonin precursor gene, was noted in the array data for this sample, but expression of other MTC classifier genes was not meaningfully elevated above baseline levels (data not shown). While classifier error remains a formal possibility, the available evidence strongly suggests a negative association between necrosis and the ability to detect the expression of genes characteristic of MTC.
Thyroid FNABs can represent complex and potentially dilute mixtures of follicular and parafollicular cells with stroma, lymphocytes, colloid, cyst contents, and blood (24). Therefore, a molecular test intended for use on thyroid FNAB should demonstrate resistance to analyte dilution and potential interference (30). The MTC classifier demonstrates high analytical sensitivity, with consistent detection of MTC at one-third of nominal input requirements, and no evidence of false calls in benign or malignant FNAB samples across a range of input levels. Mixing simulations on array data consistently detected MTC in mixtures representing up to fivefold dilutions of MTC samples, demonstrating the robustness of the MTC classifier to sample dilution. Interference of test performance due to genomic DNA or blood contamination was not evident, with no false-positive calls observed in the presence of 30% genomic DNA or at any level of blood mixture. False-negative calls are anticipated only at mixtures representing ≥80% blood contamination by gene expression, as supported by mixture modeling and the correct classification of an MTC tissue from patient 22 containing ∼75% blood contamination.
During routine clinical use of the GEC, an FNAB from a nodule subsequently identified as a paraganglioma arising within an intrathyroidal parathyroid gland gave rise to a false-positive classifier call of MTC (26). This paraganglioma sample, and a second surgically confirmed paraganglioma that was not called MTC by the classifier, showed elevated expression of SCG3, SYT4, and SCN9A, but nominal levels of CALCA and CEACAM5, consistent with the absence of calcitonin and CEA expression reported for thyroid paraganglioma (33). By histology, paraganglioma shares a differential diagnosis with MTC, Hürthle cell carcinoma, and other neuroendocrine tumors; the Multiple Endocrine Neoplasia type 2 syndrome resulting from germline mutations in the RET proto-oncogene is characterized by related tumors in multiple endocrine organs, including medullary carcinomas of the thyroid and intra-adrenal paragangliomas, known as pheochromocytomas (4,33). No instances of MTC classifier cross-reactivity to a variety of benign or malignant thyroid nodules were observed, including Hürthle cell adenomas and carcinomas (100% specificity), during the prospective validation of the GEC (23). The very low incidence of paraganglioma (33) does not substantially change the overall negative predictive value and PPV of the MTC classifier. Recently, a 15-gene subset of a 92-gene molecular classifier for tumor origin showed promise in distinguishing pheochromocytoma and paraganglioma from MTC by retrospective principal component analysis on formalin-fixed, paraffin-embedded tissues (34). None of these 15 genes is represented in either the Afirma GEC or MTC classifiers.
In line with EGAPP, ACCE, and ECRI technical evaluation criteria for novel molecular tests, the analytical reproducibility of the MTC classifier was evaluated within processing runs, across processing runs within a laboratory, and across laboratories. High concordance of MTC classification and score is demonstrated within and across three processing runs in a CLIA-certified laboratory, using three surgically confirmed patient FNABs, as is absolute concordance of MTC calls between the CLIA laboratory and the laboratory in which the test was developed, using 10 histopathology-confirmed MTC tissues. EGAPP level I analytic validity is therefore demonstrated for the MTC classifier as an index test using reference standard samples processed under a spectrum of laboratory conditions.
The MTC classifier is an accurate and robust tool for the preoperative identification of MTC in Bethesda III–VI thyroid FNAB, and may be used in conjunction with cytopathology to increase the detection of this rare but potentially aggressive malignancy. Clinical implementation of this test may provide for earlier diagnosis of MTC, ensuring not only more timely treatment but also the proper preoperative workup and the correct surgical management of the disease (35).
Acknowledgments
The authors thank James Diggans for analytical support, Sharlene Velichko for technical assistance with tissue sectioning, Ambika Sopory and Maryam Hosseini for sample processing, Lyssa Friedman for sample procurement, and Moraima Pagan for manuscript assistance. This study was funded by Veracyte, Inc. Select tissue samples were provided by the Cooperative Human Tissue Network, which is funded by the National Cancer Institute.
Author Disclosure Statement
D.G.P., Z.H., S.Y.K., R.J.M., M.G.M., R.T.K., P.S.W., and G.C.K. are Veracyte employees and equity owners. S.T.T. is an equity owner in Veracyte. G.C.K. has U.S. Patent 8,541,170 issued. R.T.K. is a consultant to Novo Nordisk.
