Abstract
Background:
Management recommendations for thyroid nodules rely primarily on the cytological diagnosis. However, 25% of biopsies render an indeterminate cytology, for which management decisions are more challenging because of the heterogeneity of the specimens. This study aimed to stratify the cancer risk through subcategorization of indeterminate cytology.
Methods:
The indeterminate cytological specimens (Bethesda-III or IV) of 518 thyroid nodules consecutively evaluated at our academic cancer center between October 2008 and September 2015, blinded to the histological outcome, were retrospectively reviewed. Cytological specimens were subclassified into four groups: aspirates exhibiting nuclear atypia (n = 158; 31%); architectural atypia (n = 222; 43%); oncocytic features (n = 120; 23%); or other types of atypia (n = 18; 3%). The prevalence of malignancy and odds ratio for malignancy were calculated in 323 nodules with histological confirmation.
Results:
The prevalence of malignancy was 26% overall (20% in Bethesda-III and 29% in Bethesda-IV; p = 0.07), and 47%, 12%, 24%, and 25% for aspirates with nuclear atypia, architectural atypia, oncocytic features, or other types of atypia, respectively. The OR of nuclear atypia over architectural atypia was 6.4 (3.4–12.2; p < 0.001), and 2.7 over oncocytic features (1.4–5.1; p = 0.01), whereas the OR of architectural atypia over oncocytic features was 0.4 (0.2–0.9; p = 0.03). Results were similar for Bethesda-III and IV aspirates when analyzed independently. Furthermore, cytological subcategories improved cytology–histology correlation, as they were associated with distinct profiles of histological diagnoses (p < 0.001).
Conclusions:
Cytological subcategories can effectively stratify the risk of malignancy of thyroid nodules with indeterminate cytology and improve cytology–histology correlation.
Introduction
The authors have found that the institutional prevalence of cancer within each Bethesda category seems to fluctuate over time, and that only a rough estimate of the institutional predictive values achieved by each test can be given, because such an estimate does not account for individual factors that modify the pretest prevalence of malignancy, which is necessary for the correct interpretation of individual test results (4,5). In the authors' experience, there was a significant discordance between the expected and observed performance of molecular markers, particularly among specimens with AUS/FLUS cytology (4,5). This might be explained, at least in part, by the intrinsic heterogeneity of the indeterminate categories, most evident in AUS/FLUS specimens. The diagnostic criteria for AUS/FLUS intersect with criteria in almost every other Bethesda category, and therefore the AUS/FLUS diagnosis is particularly susceptible to individual interpretation and experience (6,7). Among the scenarios that may correctly be classified in the AUS/FLUS category are aspirates with (i) mild nuclear atypia, insufficient to categorize the sample as diagnostic or suspicious for malignancy; (ii) prominent microfollicular proliferation that does not fulfill the criteria for follicular neoplasm; (iii) presence of oncocytic (Hürthle) cells in an aspirate with otherwise scant cellularity and colloid; and (iv) other types of atypia, including clotting artifact or reactive changes, among others (8). Aspirates with high cellularity, scant colloid, and disturbed architecture meeting diagnostic criteria for follicular neoplasm can also present (i) mild or focal nuclear atypia insufficient for a “suspicious for” or “diagnostic of” papillary thyroid carcinoma diagnosis, and a predominance of (ii) follicular or (iii) oncocytic (Hürthle) cells. These cytological scenarios in FN/HCN overlap with those in AUS/FLUS because the distinction between indeterminate categories is not always straightforward.
It was hypothesized that each of these scenarios carries a distinct prevalence of malignancy. Therefore, differences in the prevalence of cancer among institutions could be explained in part by differences in the prevalence of these scenarios within the categories. Furthermore, if those differences in the prevalence of cancer are significant between different scenarios, subcategorization of indeterminate categories may be relevant for the selection and interpretation of the results of molecular marker tests, and for clinical decision making.
Materials and Methods
Study cohort
Between October 2008 and September 2015, a total of 2257 thyroid nodules underwent cytological evaluation in the Anatomic Pathology Department at Moffitt Cancer Center, following fine-needle aspiration biopsy at the authors' institution. In nodules with multiple biopsies, only the diagnosis of the last specimen was reviewed. Twenty-five percent (n = 574) of the nodules had an indeterminate cytology (AUS/FLUS or FN/HCN categories of the Bethesda system). Sixteen (3%) nodules for which cytology slides were not available and 31 (5%) in which the subcategorization was performed as part of an ongoing prospective observational study that began in July 2015 were excluded. The remaining 527 (92%) thyroid nodules were included in this Institutional Review Board–approved retrospective study (Fig. 1).

Patient cohort selection process. Diagnostic categories of The Bethesda System for Reporting Thyroid Cytopathology: B-I, nondiagnostic; B-II, benign; B-III, atypia/follicular lesion of undetermined significance; B-IV, follicular/Hürthle cell neoplasm; B-V, suspicious for malignancy; B-VI, malignant. Proposed diagnostic subcategories for the B-III and B-IV specimens: NA, nuclear atypia; AA, architectural atypia; OF, oncocytic features; OTA, other types of atypia (only for B-III specimens).
Cytological findings
Two Board-certified cytopathologists reviewed the cytology specimens, aware of the prior cytological diagnosis but blinded to the histological diagnosis in resected nodules. After review, cytology was benign in nine (2%) aspirates, which were excluded from the analysis. Further, the diagnostic category was modified in 13 other aspirates: five aspirates originally classified as AUS/FLUS were reclassified as FN/HCN (2% of all AUS/FLUS specimens), and eight aspirates originally classified as FN/HCN were reclassified as AUS/FLUS (3% of all FN/HCN specimens). The cytology was further subclassified into four groups, as previously suggested: (i) aspirates with mild/focal nuclear atypia, insufficient to categorize the aspirate as suspicious for malignancy; (ii) aspirates with architectural atypia (e.g., microfollicles, nuclear crowding, and/or overlapping); (iii) aspirates with oncocytic features; and (iv) aspirates with other types of atypia (only for AUS/FLUS specimens) (9). Nuclear atypia was defined as the presence of any of the following features: pale chromatin, nuclear irregularities in size and shape, including enlarged or elongated appearing nuclei, nuclear grooves, and nuclear inclusions. All aspirates with nuclear atypia, regardless of the presence of architectural atypia or oncocytic features, were analyzed in the nuclear atypia group, except in the presence of reactive changes, in which case they were classified in the other types of atypia group. Eighteen (3%) nodules were classified in the other types of atypia group due to clotting artifact (n = 2), atypical cyst lining cells (n = 3), reactive changes (n = 10), or atypical lymphoid infiltrates (n = 3). Given the low prevalence of aspirates with other types of atypia, this group was excluded from most comparisons.
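The cohort selection described above can be checked with simple arithmetic. The following is a minimal sketch using only counts reported in this article (the subcategory counts are those given in the Abstract and Results); the variable names are illustrative and are not part of the study.

```python
# Cohort flow, using counts reported in the text.
evaluated = 2257            # nodules with cytological evaluation
indeterminate = 574         # AUS/FLUS or FN/HCN cytology
slides_unavailable = 16     # excluded: slides not available
prospective_study = 31      # excluded: subcategorized in the prospective study
benign_on_review = 9        # excluded: benign cytology after review

reviewed = indeterminate - slides_unavailable - prospective_study
final_cohort = reviewed - benign_on_review

# Subcategory counts as reported in the Abstract and Results.
subcategories = {
    "nuclear atypia": 158,
    "architectural atypia": 222,
    "oncocytic features": 120,
    "other types of atypia": 18,
}

def percent(part, whole):
    """Percentage rounded to the nearest integer, as reported in the text."""
    return round(100 * part / whole)

print(f"indeterminate: {indeterminate} ({percent(indeterminate, evaluated)}%)")
print(f"reviewed: {reviewed} ({percent(reviewed, indeterminate)}%)")
print(f"final cohort: {final_cohort}")
for name, n in subcategories.items():
    print(f"  {name}: {n} ({percent(n, final_cohort)}%)")
```

The subcategory counts sum to the final cohort size, and the rounded percentages agree with those reported.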
Histological evaluation
Through September 2016, 325 of the reviewed nodules had been resected. Histological correlation was available in 323 cases, which were used to calculate the prevalence of malignancy for each of the categories and subcategories. The histology was reviewed blinded to the cytological diagnosis in all available specimens with a malignant diagnosis (90%; 75/83) by three Board-certified pathologists with focused experience in head and neck pathology. Follicular variants of papillary thyroid carcinoma (FVPTCs) were reclassified as noninvasive follicular thyroid neoplasms with papillary-like nuclear features (NIFTPs) if the criteria recommended in a recent publication were met, or were classified as papillary thyroid carcinoma (PTC) if exclusionary criteria for NIFTP were identified (e.g., presence of invasion, >1% “true” papillary formation, psammoma bodies, tumor necrosis, high mitotic activity, and/or cell/morphologic characteristics of other variants of PTC) (10). The histology of two nodules originally diagnosed as follicular tumor of uncertain malignant potential was also reviewed; one was reclassified as follicular adenoma and the other as NIFTP.
Statistical analysis
All analyses were performed using SAS v9.4 (SAS Institute, Cary, NC) and R v3.3.1 (The R Foundation for Statistical Computing, Vienna, Austria). Descriptive statistics and percentages are presented in the tables. Comparisons were performed using chi-square tests and Fisher's exact tests for categorical variables. Analysis of variance was used to compare means of continuous measures. Odds ratios (OR) and associated confidence intervals (CI) were calculated from the contingency tables. p-Values were adjusted for multiple comparisons with Holm's method.
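For readers unfamiliar with these calculations, the following is a minimal Python sketch of an odds ratio with a confidence interval from a 2 × 2 contingency table, and of Holm's step-down adjustment. It is not the SAS or R code used in this study. The interval shown is a Wald interval on the log scale, which is an assumption, as the CI method is not specified here. The function names, the example counts, and the example p-values are hypothetical and are not study data.

```python
import math

def odds_ratio_wald_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald CI for a 2x2 table.

    a, b: malignant and benign counts in group 1
    c, d: malignant and benign counts in group 2
    z:    normal quantile (1.96 gives an approximate 95% interval)
    Assumes all four cells are non-zero.
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper

def holm_adjust(pvalues):
    """Holm step-down adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, and so on.
        candidate = min(1.0, (m - rank) * pvalues[idx])
        # Enforce monotonicity so adjusted values never decrease with rank.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted

# Hypothetical counts and p-values, for illustration only (not study data).
or_, lo, hi = odds_ratio_wald_ci(20, 30, 10, 60)
print(f"OR = {or_:.1f} [CI {lo:.1f}-{hi:.1f}]")
print(holm_adjust([0.01, 0.04, 0.03]))
```

In R, Holm's adjustment is available through `p.adjust(p, method = "holm")`; the sketch above only makes the procedure explicit.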
The effect of two variables on the distribution of the subcategories and on their prevalence of malignancy was evaluated. First, the possible effect of the time period was assessed. Outcomes were compared before and after 2014 because the Anatomic Pathology Department initiated a consensus review protocol for all AUS/FLUS specimens at that time in order to reduce the frequency of this diagnosis and to improve cytology–histology correlation, as described in previous publications (11–14). Second, the possible effect of observer interpretation was explored by looking at differences between the two pathologists who reviewed the cytological specimens.
Results
Study cohort
The cytology specimens of 518 thyroid nodules in 479 patients (mean age = 56 years; 74% female) with confirmed indeterminate cytology were classified into one of four subcategories: 158 (31%) in the nuclear atypia group, 222 (43%) in the architectural atypia group, 120 (23%) in the oncocytic features group, and 18 (3%) in the other types of atypia group. The distribution of subcategories within the AUS/FLUS and FN/HCN categories was significantly different (p < 0.001) due to an over-representation of aspirates with oncocytic features in the FN/HCN category (Table 1). The mean nodule size of the cohort was 2.5 cm and did not differ significantly between subcategories. The size of the nodules was <1 cm in 6%, 1–1.9 cm in 35%, 2–3.9 cm in 48%, and ≥4 cm in 11%. The distribution of the nodule size was not significantly different between subcategories.
Differences in the distribution of the subcategories between Bethesda III and Bethesda IV specimens.
Size missing in one patient (from FC group).
Differences in the mean size of nodules in each subcategory.
Differences in the size distribution of nodules between subcategories.
Differences in resection rates of subcategories between Bethesda III and Bethesda IV; nodules in the OTA group were excluded from this comparison. The overall rate of resection in Bethesda III was also significantly different from that in Bethesda IV (p < 0.001).
AA, architectural atypia; Bethesda III, atypia/follicular lesion of undetermined significance; Bethesda IV, follicular/Hürthle cell neoplasm; NA, nuclear atypia; OF, oncocytic features; OTA, other types of atypia (only B-III specimens).
A total of 325 (63%) nodules were resected. Nodules with FN/HCN cytology had significantly higher rates of resection than nodules with AUS/FLUS cytology (71% vs. 53%; p < 0.001), and this difference affected all subcategories (p = 0.03). However, within each Bethesda category, the rates of resection between subcategories were not significantly different (Table 1).
Prevalence and risk of malignancy within subcategories
The overall prevalence of malignancy in the 323 resected nodules with cytology–histology correlation was 26%, falling to 15% if NIFTPs are considered “benign.” The prevalence of malignancy was not significantly different between AUS/FLUS and FN/HCN specimens, whether NIFTPs are considered malignant (20% vs. 29%; p = 0.07) or benign (12% vs. 18%; p = 0.20).
The prevalence of malignancy was, however, significantly different between the defined subcategories. Nodules in the nuclear atypia group had a prevalence of malignancy significantly higher than nodules in the architectural atypia (47% vs. 12%; p < 0.001) or oncocytic features (24%; p = 0.01) groups, and nodules in the architectural atypia group had a significantly lower prevalence of malignancy than the oncocytic features group (p = 0.03; Fig. 2). Most NIFTPs had cytology findings classified in the nuclear atypia or architectural atypia groups. Therefore, if NIFTPs were considered benign, the prevalence of malignancy in those two groups would drop by nearly half, whereas it would change little for nodules in the oncocytic features group. In that situation, the prevalence of malignancy in the nuclear atypia group (26%) would not be significantly different from that in the oncocytic features group (20%), but nodules in the architectural atypia group would still have a significantly lower prevalence of malignancy (6%) than nodules in the nuclear atypia group (p < 0.001) or in the oncocytic features group (p = 0.003).

Differences in the prevalence of malignancy between subcategories. Prevalence of malignancy in each subcategory for all nodules in the cohort (left), AUS/FLUS specimens (middle), and FN/HCN specimens (right), considering either NIFTP malignant (top panel) or benign (bottom panel). Significant differences shown in bold. OR, odds ratio, given with confidence intervals; NA, nuclear atypia; AA, architectural atypia; OF, oncocytic features.
The risk of malignancy in the nuclear atypia group was 5.4 times higher than in the architectural atypia group (OR = 6.4 [CI 3.4–12.2]), and 1.7 times higher than in the oncocytic features group (OR = 2.7 [CI 1.4–5.1]). On the other hand, the risk of malignancy of the architectural atypia group was 60% lower than in the oncocytic features group (OR = 0.4 [CI 0.2–0.9]). If NIFTPs are considered benign, the risk of malignancy in the nuclear atypia and oncocytic features groups would not be significantly different, but it would still be 4.9 times higher in the nuclear atypia group than in the architectural atypia group (OR = 5.9 [CI 2.5–13.6]) and 80% lower in the architectural atypia group than in the oncocytic features group (OR = 0.2 [CI 0.1–0.6]).
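As a consistency check, the odds ratios above can be approximated from the prevalences of malignancy reported for each subcategory (47%, 12%, and 24%). The sketch below is illustrative only: it starts from rounded percentages rather than the underlying counts, so small differences from the reported ORs are expected. The function and variable names are assumptions made for this example.

```python
def odds(prevalence):
    """Convert a prevalence (a proportion between 0 and 1) to odds."""
    return prevalence / (1 - prevalence)

# Prevalence of malignancy reported for each subcategory (NIFTP as malignant).
prevalence = {"NA": 0.47, "AA": 0.12, "OF": 0.24}

def odds_ratio(group, reference):
    """Odds ratio of malignancy for `group` relative to `reference`."""
    return odds(prevalence[group]) / odds(prevalence[reference])

for group, reference in [("NA", "AA"), ("NA", "OF"), ("AA", "OF")]:
    ratio = odds_ratio(group, reference)
    # Relative change in the odds, expressed as a percentage (OR - 1).
    change = 100 * (ratio - 1)
    print(f"{group} vs. {reference}: OR ~ {ratio:.1f} ({change:+.0f}% odds)")
```

The approximations (about 6.5, 2.8, and 0.4) are close to the reported ORs of 6.4, 2.7, and 0.4. The "times higher" and "% lower" figures quoted above correspond to OR − 1, that is, to the relative change in the odds of malignancy.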
Correlation between histological diagnosis and cytological subcategories
There were significant differences in the distribution of histological diagnoses between AUS/FLUS and FN/HCN specimens (p < 0.001) and within subcategories (p < 0.001; Fig. 3 and Supplementary Table S1; Supplementary Data are available online at

Correlation between histological and cytological diagnosis. (
Differences in the prevalence of malignancy of subcategories by time period and by observer
There were no statistically significant differences in the distribution of the subcategories before or after 2014, when a consensus review of all AUS/FLUS specimens was started (p = 0.06), or between observers (p = 0.18; Table 2). Differences in the prevalence of malignancy of the subcategories were likewise not significant by time period (p = 0.89) or by observer (p = 0.58).
Overall prevalence of malignancy: 26% before 2014, 25% from 2014, 27% for observer #1, and 24% for observer #2.
n/N, number malignant/total.
Discussion
This study found that cytological subcategories stratified the risk of malignancy of indeterminate thyroid nodules more effectively than the current categories of the Bethesda system. Furthermore, cytological subcategories were associated with distinct histological outcomes and therefore are likely to improve cytology–histology correlations. Differences in the proportion of these cytological scenarios within indeterminate categories between institutions could explain the known variability in their prevalence of malignancy and in the performance of molecular marker tests.
Strengths and limitations of the study
As this is a single-center, retrospective study, it is recognized that results could be different at other institutions, and the interpretation of the results may be limited by this design. Aspirates suspicious for malignancy, which exhibit a higher degree of atypia, were not included because they carry a significantly higher risk of malignancy (90% at the authors' institution), for which management is unlikely to be modified, regardless of additional testing results (4). All cytological specimens were reviewed blinded to the histological outcome for the cytological subcategorization, and 90% of the specimens with malignant histology were reviewed blinded to the cytological interpretation. However, the limitations of light microscopy in the diagnosis of thyroid pathology are acknowledged, particularly in follicular-pattern lesions, where inter-observer agreement is low, even among experts (7,15–17). Moreover, similar disagreement is expected for the subcategorization of indeterminate thyroid cytology between different observers (18). The prevalence of malignancy might be overestimated, as it was calculated for resected nodules only. However, within each Bethesda category, the rates of resection of the cytological subcategories were not significantly different, which supports the validity of the differences observed between the groups in this series.
Cytological subcategories stratify the risk of indeterminate thyroid nodules
This study found that cytological subcategories are useful to stratify the risk of malignancy of indeterminate thyroid nodules. Aspirates with nuclear atypia had the highest prevalence of malignancy (47%), which approached that of the suspicious for malignancy category reported by some institutions (19). On the other hand, nodules with architectural atypia had the lowest prevalence of malignancy (12%). The reclassification of NIFTPs as “benign” tumors would decrease the prevalence of malignancy in these two groups by half. However, NIFTPs are possibly tumors in a benign to malignant transformation and therefore are not clearly benign or malignant (10). Given that at the present time the diagnosis of NIFTPs is based on surgical pathology, since their diagnosis can only be established after complete resection and evaluation of the tumor capsule, it seems appropriate to include them as “malignant” to calculate the prevalence of malignancy, and to consider them as true positives or false negatives for the interpretation of molecular marker test results (20). Regardless of whether they were considered to be benign or malignant, the risk of malignancy was around fivefold higher in aspirates with nuclear atypia than in aspirates with architectural atypia.
Despite these limitations, the utility of cytological subcategories for risk stratification is supported by the literature (18,21–25). Most previous studies have focused on subcategorizing AUS/FLUS specimens, given the intrinsic heterogeneity within this category (8). A recent editorial summarized the outcomes of several studies in which AUS/FLUS specimens had been stratified with cytological subcategories, reporting an overall prevalence of cancer among resected nodules of 55% for aspirates with nuclear atypia, 22% for aspirates with architectural atypia, 6% for aspirates with oncocytic features, and 35% for aspirates not otherwise specified (26). Fewer studies have looked at the impact of nuclear atypia in follicular neoplasms, but their findings have also been consistent with the present results (27–29). Cytological subcategories were associated with distinct histological outcomes in this study, which strengthens previous observations, suggesting that their implementation into clinical practice is likely to improve not only malignancy risk prediction but also cytology–histology correlation (22,27,30–33).
Implications for molecular marker test results interpretation and development of future guidelines
The significant differences in the pretest risk of malignancy of cytological subcategories are expected to have an impact on the interpretation of molecular marker tests, particularly in nodules with nuclear atypia. The high pretest risk of malignancy in this group might make the “negative”/“benign” results of molecular tests unreliable, with a negative predictive value likely <95% (3,34). In contrast, a positive result, despite increasing the probability of cancer, might not change management because thyroid lobectomy seems sufficient for most suspicious nodules in this category (35,36). Molecular marker test performance might also be compromised in aspirates with oncocytic features, and it has not been specifically assessed in other cytological scenarios (4,37,38). Until that information is available, results need cautious interpretation.
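The dependence of the negative predictive value (NPV) on the pretest risk can be illustrated with Bayes' rule. The sketch below uses the prevalences of malignancy observed in this study as pretest probabilities. The sensitivity (0.90) and specificity (0.50) are hypothetical values chosen only for illustration and do not describe any specific molecular test.

```python
def negative_predictive_value(pretest, sensitivity, specificity):
    """NPV from the pretest probability and test characteristics (Bayes' rule)."""
    true_negatives = specificity * (1 - pretest)
    false_negatives = (1 - sensitivity) * pretest
    return true_negatives / (true_negatives + false_negatives)

# Hypothetical test characteristics (assumptions, not taken from this study).
SENSITIVITY = 0.90
SPECIFICITY = 0.50

# Pretest probabilities: prevalence of malignancy reported per subcategory.
pretest_risk = {
    "nuclear atypia": 0.47,
    "architectural atypia": 0.12,
    "oncocytic features": 0.24,
}

for group, risk in pretest_risk.items():
    npv = negative_predictive_value(risk, SENSITIVITY, SPECIFICITY)
    print(f"{group}: pretest {risk:.0%}, NPV {npv:.1%}")
```

Under these assumed test characteristics, the NPV falls below 95% at the pretest risk of the nuclear atypia group but remains above 95% at that of the architectural atypia group, which is the pattern described above.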
This study suggests that subcategorization of indeterminate cytology improves the prediction of histological outcomes, with the potential to improve communication between pathologists and clinicians and the sharing of information between centers, ultimately adding to the rigor of thyroid nodule and cancer diagnostics. The cytological subcategories should also improve the ability to counsel patients on the most appropriate next step because they seem to affect the outcomes of subsequent tests, including repeat biopsy and molecular markers (37–40). This aspect is critical in the era of personalized medicine. It is proposed that this subclassification scheme be standardized and incorporated into future iterations of The Bethesda System for Reporting Thyroid Cytopathology, as well as into future management guidelines for thyroid nodules.
Footnotes
Acknowledgments
This work was supported in part by the Biostatistics Core Facility at the H. Lee Moffitt Cancer Center and Research Institute, an NCI designated Comprehensive Cancer Center (P30-CA076292).
Author Disclosure Statement
No competing financial interests exist.
