Abstract
Background:
RAS mutations are common in the available mutational analysis of cytologically indeterminate (Cyto-I) thyroid nodules. However, their reported positive predictive value (PPV) for cancer is widely variable. The reason for this variability is unknown, and it causes clinical management uncertainty. A systematic review was performed, evaluating the PPV for cancer in RAS mutation positive Cyto-I nodules, and variables that might affect residual heterogeneity across the different studies were considered.
Methods:
PubMed was searched through February 22, 2017, including studies that evaluated at least one type of RAS mutation in Cyto-I nodules, including any (or all) of the Bethesda III/IV/V categories or their equivalents and where the histological diagnosis was available. The PPV residual heterogeneity was investigated after accounting for Bethesda classification, blindedness of the histopathologist to the RAS mutational status, Bethesda category-specific cancer prevalence for each study, and which RAS genes and codons were tested. This was studied using five meta-regression models fit to different sets of Bethesda classification categories: Bethesda III, IV, or V (III/IV/V); Bethesda III or IV (III/IV); Bethesda III only; Bethesda IV only; and Bethesda V only.
Results:
Of 1831 studies, 23 were eligible for data inclusion. PPV ranged widely: 0–100%, 28–100%, and 0–100% in Bethesda III, IV, and V, respectively. Residual heterogeneity remained moderately high for PPV after accounting for the above moderators for Bethesda III/IV/V (21 studies; I² = 59.5%) and Bethesda III/IV (19 studies; I² = 66.0%), with significant Cochran's Q-test for residual heterogeneity (p < 0.001). Among individual Bethesda categories, residual heterogeneity was: Bethesda III (eight studies; I² = 89.0%), IV (12 studies; I² = 53.5%), and V (10 studies; I² = 34.4%), with significant Cochran's Q-test for Bethesda III (p < 0.001) and IV (p = 0.04).
Conclusion:
The PPV of RAS mutations in Bethesda III and IV categories is quite heterogeneous across different studies, creating low confidence in the accuracy of a single estimate of PPV. Clinicians must appreciate this wide variability when managing a RAS-mutated Cyto-I nodule. Future studies should seek to resolve this unexplained variability.
Introduction
C
In this study, a systematic literature review was performed with the primary objective of estimating the percentage of unexplained heterogeneity in PPV across different studies after accounting for study-level variables that could contribute to this heterogeneity.
Methods
A systematic literature review was performed in PubMed through February 22, 2017. The results of four searches were combined using different search words: (i) RAS mutation or molecular diagnostics and thyroid nodules, (ii) RAS and thyroid cancer, (iii) indeterminate thyroid nodules, and (iv) mutational panel and thyroid nodules. For the study to be included, at least one type of RAS mutation should have been tested in Cyto-I thyroid nodules, including any (or all) of the indeterminate Bethesda III/IV/V categories or their equivalents (4,9), and there must have been surgical follow-up to determine the histological diagnosis. When multiple studies were published from one institution, the study dates of inclusion were evaluated; the largest non-overlapping cohorts were included, and redundant cohorts were excluded to avoid counting nodules more than once. The search generated 1831 references. After reviewing titles and/or abstracts, 1796 references were excluded. The remaining 35 papers were reviewed, and 12 were excluded: eight (10–17) due to inability to extract the necessary data fully, and four (18–21) due to potential overlap of patient cohorts with included studies (Fig. 1). Given the period when the search was done, histologies now considered noninvasive follicular thyroid neoplasm with papillary-like nuclear features (NIFTP) were likely considered the follicular variant of papillary thyroid cancer (fvPTC) in the earlier studies. To be consistent in the pathological readings between earlier and more recent studies, NIFTP were considered as malignant tumors in this analysis. Studies that specifically stated that the histopathology diagnosis was established when the pathologist was unaware of the RAS mutational status were considered blinded. Otherwise, the study was considered unblinded. The study did not control for whether the treating physician was blinded to the mutational status.

Fig. 1. Summary of literature search strategy and results.
To measure the range of PPVs reported in the literature in each Bethesda category on a per medical center basis, different studies were assigned to a single center when possible as follows: (i) University of Pittsburgh Medical Center (UPMC) (6,7,22–24), recognizing that one study (22) included several contributing centers; (ii) University of Leipzig (8,9); (iii) University of California at San Francisco (5); (iv) University of Sienna (25); (v) University of Ferrara (26); (vi) University of Iowa (27); (vii) Konkuk University School of Medicine (28); (viii) Xi'an Jiaotong University Health Science Center (29); (ix) McMaster University (30); (x) University of Sao Paulo (31); (xi) Institute of Pathology, Locarno (32); (xii) University of Pisa (33); (xiii) University of Minnesota (34); (xiv) Moffitt Cancer Center (35); (xv) Erasmus University Hospital (4); and (xvi) Brigham and Women's Hospital (36). Two studies had multiple centers participating to a degree that precluded assigning them to one center (37,38).
Subsequently, mixed-effect meta-regression models were used to estimate unexplained heterogeneity after accounting for study-level moderator variables. The “mixed-effect” designation indicates that (i) random-effect models were used rather than fixed-effect models and (ii) study-level moderator variables were included in the models (39). In the meta-analysis context, a random-effects model indicates that the parameterization allows the underlying “true” PPV to vary across studies rather than assuming a constant underlying PPV for all studies (40,41). The random-effects formulation is also used to allow generalization of results beyond the limited set of studies included in the meta-analysis. The “moderator variable” portion of the mixed-effect meta-regression indicates that study-level variables are included in the model, and parameter estimates are made for these variables, analogous to variables in a linear regression model. The moderator variables can account for some of the between-study variability. A moderator variable to account for medical center of each study was not included, as most centers reported only one study, and some studies could not be reasonably assigned to a single center.
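As a hedged illustration of the structure described above, a single-level mixed-effect meta-regression can be written in the following general form. The notation is assumed here for illustration and is not taken from the original analysis; it also omits the multilevel study effect used in the combined-category models.

```latex
y_i = \beta_0 + \beta_1 x_{i1} + \dots + \beta_p x_{ip} + u_i + \varepsilon_i,
\qquad u_i \sim N(0, \tau^2), \quad \varepsilon_i \sim N(0, v_i)
```

Here $y_i$ is the (transformed) PPV observed in study $i$, $x_{i1}, \dots, x_{ip}$ are the study-level moderator variables, $v_i$ is the sampling variance of study $i$ (treated as known), and $\tau^2$ is the between-study variance left unexplained by the moderators. Fixing $\tau^2 = 0$ would reduce this to a fixed-effect model, and dropping the moderator terms would reduce it to an ordinary random-effects model.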
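In a random-effect meta-analysis, there are two sources of variability: $s^2$, a measure of the “typical” within-study variance, and $\tau^2$, the between-study variance. The $I^2$ statistic is defined as

$$I^2 = \frac{\tau^2}{\tau^2 + s^2} \times 100\%$$

and represents the percentage of total variability due to heterogeneity (42). In random-effect models that include study-level moderator variables (i.e., mixed-effect meta-regression models), the between-study variance $\tau^2$ is the variance that remains after accounting for the moderators, so $I^2$ represents the percentage of residual heterogeneity.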
For the current analysis, mixed-effect meta-regression models were fit to estimate the percentage of residual heterogeneity for the primary outcome PPV, as well as secondarily for sensitivity, using the R package metafor (44). For each outcome, meta-regression models were fit to different sets of indeterminate nodules based on Bethesda classifications: Bethesda III/IV/V, which included all available data (Bethesda categories III, IV, V, III/IV combined when data for III and IV were inseparable, and IV/V combined when data for IV and V were inseparable); Bethesda III/IV, which included Bethesda categories III, IV, and III/IV inseparable (in other words excluding category V); Bethesda III only; Bethesda IV only; and Bethesda V only.
Each PPV and sensitivity model contained the following moderators: histopathology blindedness to RAS status, Bethesda category-specific cancer prevalence for each study, and RAS mutations tested (specifically NRAS 12 and/or 13, HRAS 12 and/or 13, and KRAS 61, as these mutations had the highest variability of inclusion across studies). Regardless, a mutation in any of the three codons for any of the three RAS genes was considered positive. For the two model sets that included multiple Bethesda classifications (III/IV/V and III/IV models), the Bethesda classification was included as a moderator variable, and a multilevel effect was included to account for the correlation among results from the same study. The I² measure for these multilevel models was calculated by a generalization of the I² formula described above (45,46). All I² values are reported with corresponding exact confidence intervals (47). As a supplemental analysis, for each model, the statistical test of heterogeneity is also presented based on Cochran's Q statistic, although this test is known to be underpowered unless many studies are included.
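To make the relationship between Cochran's Q and I² concrete, the following minimal Python sketch computes both for a simple random-effects setting with no moderators. It is an illustration under stated assumptions, not the analysis performed here: it uses the DerSimonian-Laird moment estimator for the between-study variance and the Higgins-Thompson "typical" within-study variance, whereas the models in this study were fit with metafor and include moderators and a multilevel effect. The function name and the input values are hypothetical.

```python
def heterogeneity(effects, variances):
    """Return (Q, tau2, I2 in percent) for study effects and sampling variances.

    Assumptions: no moderators, DerSimonian-Laird estimate of tau^2,
    Higgins-Thompson typical within-study variance s^2.
    """
    if len(effects) != len(variances) or len(effects) < 2:
        raise ValueError("need at least two studies with matching variances")
    if any(v <= 0 for v in variances):
        raise ValueError("sampling variances must be positive")

    weights = [1.0 / v for v in variances]          # inverse-variance weights
    sum_w = sum(weights)
    sum_w2 = sum(w * w for w in weights)
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum_w

    # Cochran's Q: weighted squared deviations from the pooled estimate
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1

    # DerSimonian-Laird between-study variance, truncated at zero
    c = sum_w - sum_w2 / sum_w
    tau2 = max(0.0, (q - df) / c)

    # Typical within-study variance, then I^2 = tau^2 / (tau^2 + s^2)
    s2 = df * sum_w / (sum_w ** 2 - sum_w2)
    i2 = 100.0 * tau2 / (tau2 + s2)
    return q, tau2, i2


# Hypothetical transformed effects from two equally precise studies
q, tau2, i2 = heterogeneity([0.0, 1.0], [0.1, 0.1])
print(f"Q = {q:.2f}, tau^2 = {tau2:.2f}, I^2 = {i2:.1f}%")
```

With these estimators, I² is algebraically equal to (Q − df)/Q when Q exceeds its degrees of freedom, and 0 otherwise. The p-value of the Q-test is not computed here, because it requires a chi-square distribution function that the Python standard library does not provide.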
For all models, the arcsin square root transformation was applied to each outcome to fit model assumptions better. Sensitivity analyses using logit transformations were also performed, and the results were consistent. Pooled estimates of PPV and sensitivity are not reported because this was not the purpose of the meta-analysis and, due to the large amount of heterogeneity observed, we would not want a pooled estimate to be interpreted as the “true” PPV or sensitivity established by this meta-analysis.
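The two transformations mentioned above can be sketched in a few lines of Python. This is a minimal illustration, assuming the usual large-sample variance approximations (1/(4n) for the arcsine square root scale and 1/x + 1/(n − x) for the logit scale); the 0.5 continuity correction applied to the logit when a proportion is exactly 0 or 1 is a common convention and an assumption here, not a detail reported in this study. Function names are hypothetical.

```python
import math


def arcsine_sqrt(events, total):
    """Arcsine square-root transform of events/total and its approximate variance."""
    if total <= 0 or not 0 <= events <= total:
        raise ValueError("require 0 <= events <= total and total > 0")
    p = events / total
    return math.asin(math.sqrt(p)), 1.0 / (4.0 * total)


def back_transform(y):
    """Map a value on the arcsine square-root scale back to a proportion."""
    return math.sin(y) ** 2


def logit(events, total, adjust=0.5):
    """Logit transform and approximate variance.

    The continuity correction (assumed convention) is applied only when the
    raw proportion is exactly 0 or 1, where the logit is otherwise undefined.
    """
    if total <= 0 or not 0 <= events <= total:
        raise ValueError("require 0 <= events <= total and total > 0")
    x, n = float(events), float(total)
    if events == 0 or events == total:
        x += adjust
        n += 2.0 * adjust
    return math.log(x / (n - x)), 1.0 / x + 1.0 / (n - x)


# Hypothetical study: 3 cancers among 12 RAS-positive nodules (PPV = 25%)
y, v = arcsine_sqrt(3, 12)
print(y, v, back_transform(y))
```

One practical difference between the two scales is that the arcsine square root transform is defined without adjustment at proportions of exactly 0% and 100%, both of which occur among the PPVs reported here, whereas the logit needs a correction in those cases.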
Results
Table 1 reports the studies included in the meta-analysis, including blinded status and whether NRAS 12/13, HRAS 12/13, and KRAS 61 mutations were included in mutations tested. For each study, the total number of indeterminate nodules, the number of RAS-positive nodules, and the number of cancer-positive nodules are reported for each Bethesda classification level. For five studies (8,25,29,30,35), Bethesda III and IV were not reported separately, and for one study (4), Bethesda IV and V were not reported separately. The totals for these “inseparable” results are reported across the relevant columns. It could not be determined whether NRAS 12/13, HRAS 12/13, and KRAS 61 were included in one study (34); these mutations are considered to have values of “No” as moderators in the meta-regression models.
In addition, all studies tested for NRAS 61, HRAS 61, and KRAS 12/13, except for one (33) that did not test for HRAS 61 and KRAS 12/13 and two (5,29) that did not test for HRAS 61.
Total indeterminate nodules (RAS-positive nodules; cancerous nodules).
Numbers reflect a combination of two different Bethesda categories, as it was not possible to discern numbers in each indeterminate category.
n/a, not available: total indeterminate nodules not available, as only RAS-positive nodules were reported.
Those RAS mutations (NRAS 61, HRAS 61, and KRAS 12/13) that were included in all or nearly all studies are not reported in Table 1 or included as moderators in the meta-regression models.
Residual heterogeneity in PPV models
Figures 2–4 show the unadjusted PPV values of different centers for Bethesda categories III, IV, and V, respectively. The PPV varied between 0% and 100% in Bethesda III, between 28% and 100% in Bethesda IV, and between 0% and 100% in Bethesda V. Residual heterogeneity remained moderately high for PPV, even after accounting for Bethesda classification, blinded status, Bethesda category-specific cancer prevalence for each study, and RAS mutations for Bethesda III/IV/V (I² = 59.5%) and somewhat higher for Bethesda III/IV, with Bethesda V excluded (I² = 66.0%). For both combined Bethesda classification models, Cochran's Q-test for residual heterogeneity was statistically significant (p < 0.001; Table 2).

Fig. 2. Positive predictive value (PPV) of Bethesda category III across different studies. PPVs for all studies are displayed as circles with confidence intervals. ‡Five institutions include: Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts; Santa Monica Thyroid Center, Santa Monica, California; Endocrinology and Diabetes Division, West Los Angeles Veterans Affairs Medical Center, Los Angeles, California; Texas Diabetes and Endocrinology, Austin, Texas; and The Austin Diagnostic Clinic, Austin, Texas. §This included University of Cincinnati and University of Colorado.

Fig. 3. PPV of Bethesda category IV across different studies. PPVs for all studies are displayed as circles with confidence intervals. †This included Arcispedale Santa Maria Nuova-IRCCS, Reggio Emilia, Italy. ‡Five institutions include: Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts; Santa Monica Thyroid Center, Santa Monica, California; Endocrinology and Diabetes Division, West Los Angeles Veterans Affairs Medical Center, Los Angeles, California; Texas Diabetes and Endocrinology, Austin, Texas; and The Austin Diagnostic Clinic, Austin, Texas. §This included University of Cincinnati and University of Colorado.

Fig. 4. PPV of Bethesda category V across different studies. PPVs for all studies are displayed as circles with confidence intervals. †This included Arcispedale Santa Maria Nuova-IRCCS, Reggio Emilia, Italy. ‡Five institutions include: Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts; Santa Monica Thyroid Center, Santa Monica, California; Endocrinology and Diabetes Division, West Los Angeles Veterans Affairs Medical Center, Los Angeles, California; Texas Diabetes and Endocrinology, Austin, Texas; and The Austin Diagnostic Clinic, Austin, Texas.
All models included variables for blinded versus non-blinded status, RAS mutations tested (NRAS 12/13, HRAS 12/13, and KRAS 61), and Bethesda category-specific cancer prevalence for each study. Models including multiple Bethesda classification levels also included variables for Bethesda classification.
Two studies reporting only RAS-positive nodules were excluded because study-specific cancer prevalence was not calculable and could not be included in the model (24,36). Results for models not including the prevalence variable and including these two studies produced similar results.
Includes inseparable cases of combined III/IV (as above).
For models fit specifically to each single Bethesda level, after accounting for blinded status, Bethesda category-specific cancer prevalence for each study, and RAS mutations included, the residual heterogeneity was highest for Bethesda III (I² = 89.0%), decreased for Bethesda IV (I² = 53.5%), and further decreased for Bethesda V (I² = 34.4%). Cochran's Q-test was significant for the residual heterogeneity for levels III (p < 0.001) and IV (p = 0.04) but not for level V (p = 0.30; Table 2). Two studies (24,36) were not included in the PPV models because only RAS-positive nodules were reported, and therefore study-specific cancer prevalence could not be determined and constituted a missing value for the modeling.
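The reason a study reporting only RAS-positive nodules still yields a PPV but no cancer prevalence or sensitivity follows from the arithmetic of the three quantities. The Python sketch below is a minimal illustration under that framing; the function name, field names, and all counts are hypothetical and are not data from any included study.

```python
from typing import NamedTuple, Optional


class RasMetrics(NamedTuple):
    ppv: float
    sensitivity: Optional[float]  # None when the total number of cancers is unknown
    prevalence: Optional[float]   # None when cancers or total nodules are unknown


def ras_metrics(ras_pos_cancer, ras_pos_total, cancer_total=None, nodules_total=None):
    """Compute PPV, sensitivity, and cancer prevalence from nodule counts.

    ras_pos_cancer: RAS-positive nodules that were malignant on histology
    ras_pos_total:  all RAS-positive nodules with histology
    cancer_total:   all malignant nodules (RAS-positive or not), if reported
    nodules_total:  all indeterminate nodules with histology, if reported
    """
    if ras_pos_total <= 0 or not 0 <= ras_pos_cancer <= ras_pos_total:
        raise ValueError("invalid RAS-positive counts")

    ppv = ras_pos_cancer / ras_pos_total

    sensitivity = None
    if cancer_total is not None and cancer_total > 0:
        sensitivity = ras_pos_cancer / cancer_total

    prevalence = None
    if cancer_total is not None and nodules_total is not None and nodules_total > 0:
        prevalence = cancer_total / nodules_total

    return RasMetrics(ppv, sensitivity, prevalence)


# Hypothetical study reporting the full cohort
print(ras_metrics(12, 20, cancer_total=30, nodules_total=100))
# Hypothetical study reporting only RAS-positive nodules
print(ras_metrics(12, 20))
```

In the second call only the PPV is available, so a model that requires prevalence as a moderator has a missing value for that study.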
Moderator effects in PPV models
None of the moderator variables were statistically significantly associated with PPV in any of the models at the 0.05 significance level. The variables closest to this threshold were HRAS 12/13 (p = 0.07) and cancer prevalence (p = 0.08) in Bethesda III/IV/V.
Residual heterogeneity in sensitivity models
Residual heterogeneity for sensitivity models followed a pattern similar to that for PPV models. After accounting for Bethesda classification, blinded status, Bethesda category-specific cancer prevalence for each study, and RAS mutation, residual heterogeneity remained moderately high in Bethesda III/IV/V (I² = 65.1%) and Bethesda III/IV (I² = 70.2%). For both combined Bethesda classification models, Cochran's Q-test for residual heterogeneity was statistically significant (p < 0.001; Table 3).
All models included variables for blinded versus non-blinded status, RAS mutations tested (NRAS 12/13, HRAS 12/13, and KRAS 61), and Bethesda category-specific cancer prevalence for each study. Models including multiple Bethesda classification levels also included variables for Bethesda classification.
Two studies reporting only RAS-positive nodules were excluded because sensitivity could not be calculated without the count of false negatives (24,36).
Includes inseparable cases of combined III/IV (as above).
For models fit specifically to each single Bethesda category, the residual heterogeneity in descending order was Bethesda III (I² = 78.7%), Bethesda IV (I² = 49.8%), and Bethesda V (I² = 15.5%) after accounting for blinded status, RAS mutations included, and Bethesda category-specific cancer prevalence for each study. Cochran's Q-test was significant for the residual heterogeneity for level III (p < 0.001) but not for levels IV (p = 0.052) or V (p = 0.49) at the 0.05 significance level (Table 3). Two studies (24,36) were not included in the sensitivity models because only RAS-positive nodules were reported and therefore sensitivity could not be calculated.
Moderator effects in sensitivity models
In the Bethesda III/IV/V model, sensitivity was significantly lower for Bethesda V nodules than for Bethesda IV nodules (p < 0.001), whereas the difference between Bethesda V and Bethesda III nodules did not reach significance (p = 0.07), after adjusting for blinded status, RAS mutations included, and Bethesda category-specific cancer prevalence for each study, and accounting for within-study correlation. No other moderator variables were significantly associated with sensitivity in any of the models at the 0.05 significance level.
Discussion
The clinical significance of a RAS mutation in a Cyto-I thyroid nodule is a challenge, as this mutation is found in both benign and malignant thyroid nodules. The significant between-study heterogeneity seen in this study is similar to that reported in a recent meta-analysis of the diagnostic value of RAS mutation in indeterminate thyroid nodules (48). The inclusion criteria of that study differed from those of the present study: it excluded studies that did not assess the mutational status of all three RAS genes and studies that represented only one or two cytological classes of the indeterminate Bethesda categories. Additionally, the literature search was done in January 2016 in that study (48) versus February 2017 in the present study. While the two studies agree that between-study heterogeneity was significant, the present study additionally measured the between-study variability after accounting for multiple moderators and found moderate or high persistent (residual) heterogeneity in Bethesda III, Bethesda IV, Bethesda III + IV, and Bethesda III–V. As a result, this significant residual heterogeneity is uniquely interpreted as precluding a clinically useful point estimate of the pooled PPV of RAS mutation, and it is suggested that its cause should be the focus of future investigation.
In this analysis, the PPV of RAS mutations had substantial variability across different studies, especially among Bethesda III and IV categories. The cause of such wide heterogeneity is not fully known. Moderate and high heterogeneity remained after accounting for the cytology types included in each study according to the Bethesda classification, blinded status of the histopathologist to the RAS mutational status, Bethesda category-specific cancer prevalence for each study, and types of RAS mutations tested. In addition, similar residual heterogeneity was detected in RAS sensitivity to detect cancer.
This persistent (residual) heterogeneity after accounting for these moderators suggests the existence of other modifying factors, such as differences in the RAS mutation detection methodology or the threshold of pathology to differentiate benignity from malignancy at an individual level or at an institutional level, given the possibility of conformation toward the patterns of institutional peers. The differentiation of benign from malignant lesions on surgical histopathology is imperfect. If RAS-mutated clonal benign nodules were over-represented in a study, or if study histopathologists tended to classify RAS-mutated clonal nodules benign instead of cancer, then the PPV of a RAS mutation would be reduced. Lloyd et al. examined the observer variation among 10 experienced pathologists in the diagnosis of fvPTC. A concordant diagnosis of fvPTC was made by all 10 reviewers with a cumulative frequency of only 39% (49). Hirokawa et al. (50) investigated inter-observer variation among eight pathologists in assessment of encapsulated follicular lesions. Only in 10% of the cases was there complete agreement. El Sheikh et al. (51) assessed inter- and intra-observer variability in distinguishing fvPTC, follicular adenoma, and follicular thyroid cancer among six experts. Complete agreement occurred only in 13% of cases. In that study (51), even intra-observer agreement was quite variable and ranged from 17% to 100%. Similarly, Cibas et al. (52) reported high levels of disagreement between benign and malignant diagnoses by expert histopathologists in the evaluation of follicular lesions.
It is well known that RAS mutations can be present in cytologically and/or histologically benign nodules (53). The significance of a RAS mutation in thyroid nodules in predicting the transformation from adenoma to carcinoma, or from carcinoma in situ to invasive carcinoma, is unknown. Transgenic mouse models have demonstrated progressive changes from hyperplasia to adenoma and carcinoma in some animals (54). Supporting this progressive transformation model is the clinical finding that small follicular thyroid carcinomas are uncommon (55), which suggests that larger follicular thyroid carcinomas evolve from small follicular adenomas that have grown and transformed. Conversely, it appears that RAS alone is unable to transform a cell from benign to malignant, and that additional factors are needed for this event (54), and the field is at an early stage of gaining insights into what other factors may contribute to transformation (56). Even less is known regarding the potential rate at which an invasive transformation may occur. Overall, the relatively high prevalence of follicular adenomas and the low prevalence of follicular carcinomas suggest that the rate of transformation is low, and the presence of a RAS mutation is of questionable clinical significance for most patients. This is supported by the observation that cytologically benign thyroid nodules that did not undergo surgery were retrospectively found to harbor RAS mutations and were stable for a mean follow-up of 8.3 years (36).
The introduction of NIFTP (57) as a non-cancerous (but not necessarily benign) neoplasm may add more uncertainty to the PPV of RAS mutation in predicting clinically significant thyroid cancer, as 38% of NIFTP harbored RAS mutations. Wong et al. (58) reported that, among operated Bethesda III/IV nodules with a suspicious gene expression classifier (GEC) result, a histological review following the introduction of NIFTP resulted in 64% of all cancer diagnoses being reclassified to NIFTP (including 88% of fvPTC diagnoses). The risk of cancer diminishes in Bethesda III/IV nodules when NIFTP is not considered cancer (59). Therefore, the predictive value of a RAS mutation toward predicting cancer could be lower when NIFTP is not counted as a true positive. Yet, this calculation may not be the most clinically relevant approach, since the diagnosis of NIFTP will not be known before surgery (as it requires a pathological diagnosis) to inform the decision to proceed with surgery. Additionally, some advocate that surgical resection of NIFTP is necessary to avoid potential progression to a carcinoma. From this perspective, the PPV of a RAS mutation is unchanged by the introduction of NIFTP when one considers true positives to include nodules warranting surgery (NIFTP and carcinomas). Still, the argument to perform a total thyroidectomy in the setting of a RAS mutation alone, and without other factors suggesting a high risk of malignancy, is even less compelling with the adoption of NIFTP nomenclature, as a hemithyroidectomy is considered sufficient for NIFTP.
The strengths of this study include its modest diversity of institutional experiences and number of individual studies and the statistical methods that account for different factors that may influence the results. Its weaknesses include the relatively small number of RAS-mutated nodules assessed with blinded histopathology. The requirement that the authors specifically state that the pathologist was blinded to the mutational status may have miscategorized some blinded studies/nodules as unblinded. The blindedness of the treating physician was not included in the models, although it is possible that this information influenced which nodules underwent surgery in some studies.
In conclusion, this meta-analysis demonstrates that there is substantial residual heterogeneity in the PPV of a RAS mutation across different studies. Additional investigation is needed to explain this heterogeneity further and the degree to which inter-observer variability among pathologists and other factors contribute to this variability. Given this heterogeneity, the modest PPV for cancer in nodules with RAS-only mutations in many studies, and the trend toward a more conservative treatment of low-risk malignancies, hemithyroidectomy may be favored over total thyroidectomy in the management of these nodules in the absence of compelling findings in the contralateral lobe. Future studies are needed to understand the potential pathway(s) from RAS-mutated benign neoplasm to invasive carcinoma, and to clarify whether all RAS-mutated nodules are best served by surgery or whether other safe and more conservative approaches may be appropriate for some patients.
Footnotes
Author Disclosure Statement
M.A.L. reports receiving research funds from Veracyte for an unrelated project as well as speaker fees. M.A.L. also reports receiving research funds from Pathway Genomics, and previously was on the advisory board of Rosetta Genomics, from which he has also received research funds. K.N.P. reports receiving speaker fees from Veracyte. R.T.K. is a Veracyte employee and equity owner. No competing financial interests exist for the remaining authors.
