Abstract
Background:
Fine-needle aspiration (FNA) plays a pivotal role in the initial evaluation of patients with thyroid nodules. Traditionally, aspirated material is expelled directly onto the microscope slide to make a conventional smear (CS). Recently, liquid-based preparations (LBP) have gained in popularity. This study compares the accuracy of these two preparation techniques in diagnosing thyroid nodules.
Methods:
A clinical database containing 5475 thyroid cytology consults from 2009 to 2013 was queried to identify 5169 CS and 306 LBP cases. The frequencies of cytological diagnoses rendered before and after second review were compared between LBP and CS. Correlation with the histological diagnosis was also calculated for each preparatory technique.
Results:
Age, sex, and nodule size were comparable between patients who had FNA processed by LBP and CS. More LBP cases than CS cases were inadequate (17% vs. 10%; p<0.001). LBP cases had fewer benign diagnoses (39% vs. 47%; p=0.003) and tended to have more malignant diagnoses (16% vs. 12%; p=0.09) when compared to CS. Indeterminate and suspicious categories were comparable between LBP and CS. Correlation with histology was also comparable between both techniques.
Conclusion:
LBP was associated with a significantly higher proportion of inadequate and a lower proportion of benign diagnoses. Thus, universal adoption of LBP may introduce more inadequate samples. Future investigations should explore the lack of on-site evaluation with LBP as a potential source for the high inadequate rate.
Introduction
While the methods of collecting (7) and reporting (8–11) thyroid cytology have undergone improvements in recent years, the preparation of the FNA specimen for microscopic evaluation has changed little over the past century. A conventional smear (CS) is made by coating aspirated material evenly onto a glass slide and staining it either with a modified Romanowsky stain after air drying or with the Papanicolaou stain after alcohol fixation. Each FNA pass processed by CS therefore yields at least one cytology slide, and most labs make two slides for each pass. Thus, a typical three- to six-pass thyroid FNA can yield 6–12 slides. In contrast, liquid-based preparations (LBP) employ advanced concentration techniques to yield a single slide containing all the cytological material collected. All of the aspirated material is deposited directly into a vial of alcohol-based preservative and processed later in a cytopathology laboratory. Widespread use of the LBP technique for thyroid FNA has not occurred (5,12–17), primarily because evidence for its accuracy is limited. Furthermore, most of the published literature arose from laboratories that have converted to LBP and reported their experience before and after conversion. In this study, an attempt is made to address these issues by comparing more than 5000 LBP and CS cytology specimens sent to the authors' institution for consultation.
Materials and Methods
Selection and description of cases
A Filemaker® database (Santa Clara, CA) containing all cytology consults received at the Johns Hopkins University School of Medicine was retrospectively searched after obtaining Institutional Review Board approval. The search included thyroid FNA biopsies that were reviewed between January 2009 and December 2013. January 2009 was chosen because this was the date when The Bethesda System for Reporting Thyroid Cytopathology (TBSRTC) was first implemented at the authors' institution.
The study included 5475 cases out of 6047 unique cases identified. Of the 6047 cases initially identified, 572 were excluded because they were sent with a diagnosis that could not be translated into TBSRTC or were sent for expert morphological analysis only and not for inter-institutional consultation preceding medical or surgical consultation. The resulting 5475 cases came with an outside TBSRTC diagnosis that was either unequivocally benign or malignant, or an indeterminate diagnosis that was accompanied by clear follow-up recommendations that correlated with TBSRTC guidelines. All cytological slides sent for review were screened by a cytotechnologist and then reviewed by a board-certified cytopathologist. The cases were given a final diagnosis in accordance with TBSRTC. All malignant cases, and all cases in which the diagnosis differed from the outside diagnosis by two or more TBSRTC categories, were reviewed by multiple cytopathologists at a consensus conference.
Of the 5475 cases, 1407 underwent surgical resection and thus had a corresponding histopathological diagnosis. Eighty-one cases were excluded because the histopathology results could not be correlated with the exact location of the FNA biopsy. Reasons for this included an equivocal aspiration site in a thyroid with multinodular disease, multifocal microcarcinoma, or substantial discrepancies between the aspiration location and radiographic nodule size or gross findings. Thus, 1326 cases had available histological correlation.
CS and LBP techniques and cytological review
Of the 5475 cases included in this study, 5169 were CS and 306 were LBP. All smears were made at the outside institution where the primary cytological diagnosis was made. The same slides were reviewed at the authors' institution and diagnosed by application of TBSRTC. Specimens were deemed adequate if they contained no atypical findings and contained approximately six clusters of follicular cells with at least 10 cells per cluster. If these criteria were not met, the specimen was defined as inadequate. Indeterminate diagnoses included atypia of undetermined significance (AUS), suspicious for follicular neoplasm (SFN), or suspicious for Hürthle cell neoplasm (SHCN). Suspicious for malignancy (SFM) was considered a separate entity due to its well-defined high risk of malignancy.
Statistics
Custom Perl scripts (
The type of smear was considered a dichotomous variable (CS or LBP). Age of the patient, size of the nodule, and number of smears (for CS) were considered continuous variables. Other demographic variables such as sex, distance from the outside institution, days elapsed between the date of specimen collection and the date of consult, and number of consults from the outside institution were treated as categorical variables (Table 1). Outside and re-reviewed diagnoses were coded into one of the six TBSRTC categories as follows: (a) inadequate, (b) benign, (c) AUS, (d) SFN/SHCN, (e) SFM, and (f) malignant. The presence or absence of Hashimoto's thyroiditis and the availability or lack of histological follow-up were treated as dichotomous variables. The final histological diagnoses (malignant or benign) were also treated as dichotomous variables.
p<0.05.
LBP, liquid-based preparation; CS, conventional smear; JHH, Johns Hopkins Hospital; IQR, interquartile range.
Simple and multiple logistic regression models were used to look for differences between CS and LBP (18). Unadjusted and adjusted odds ratios and p-values for the regressions were calculated. The TBSRTC diagnostic frequencies in the outside institution for LBP and CS were compared with a statistical power calculated based on a minimum significant difference of 5% for a power >80%. For these power calculations, 5% was chosen as the minimum significant difference because for all categories, differences of this magnitude are expected based on the range in institutional literature that has been published to date (1,3,6,10,11). To address issues of possible observer bias, LBP and CS diagnostic frequencies were also compared after inter-institutional second review. Agreement rates were calculated between the outside institution and the authors' institution overall and by type of smear. Pearson's chi-square test (19) was used when appropriate; Fisher's exact test was used if there were fewer than five observations in one or more cells (20). Because Hashimoto's thyroiditis is a known diagnostic confounder and because LBP is designed to eliminate inflammatory cells, the cases of Hashimoto's thyroiditis were evaluated separately to look for differential rates of diagnosis by LBP and CS (21). Correlation with histology as well as sensitivity, specificity, positive predictive, and negative predictive values of thyroid FNA diagnoses were calculated.
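The rule for choosing between Pearson's chi-square test and Fisher's exact test can be illustrated with a short sketch. The paper does not give the code behind these tests (the Statistics section mentions custom Perl scripts), so the Python below is only an illustration under stated assumptions: the function names are hypothetical, the chi-square test is the uncorrected version for a 2×2 table, and Fisher's test is two-sided, summing all tables no more probable than the observed one.

```python
from math import comb, erfc, sqrt


def pearson_chi2_p(a, b, c, d):
    """Two-sided p-value of the uncorrected Pearson chi-square test (1 df)."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    return erfc(sqrt(chi2 / 2))


def fisher_exact_p(a, b, c, d):
    """Two-sided Fisher's exact p-value for a 2x2 table with fixed margins."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x observations in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is no more probable than the observed one.
    total = sum(prob(x) for x in range(lo, hi + 1)
                if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def compare_2x2(a, b, c, d, min_cell=5):
    """Use Fisher's exact test if any cell has fewer than min_cell observations."""
    if min(a, b, c, d) < min_cell:
        return "fisher", fisher_exact_p(a, b, c, d)
    return "chi2", pearson_chi2_p(a, b, c, d)


# Hypothetical counts, for illustration only.
small_method, small_p = compare_2x2(3, 1, 1, 3)
large_method, large_p = compare_2x2(20, 10, 10, 20)
```

The threshold of five observations per cell is taken from the text. Whether the authors applied a continuity correction to the chi-square test is not stated, so none is applied here.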
Results
Demographic features of LBP and CS cases
Of the 5475 cases reviewed, 5169 were CS and 306 were LBP. In order to examine potential demographic biases between patients who underwent FNA processed by LBP versus those who underwent FNA processed by CS, the aggregate features of both groups were compared (Table 1). Patients who had LBP or CS were similar in terms of median age, nodule size, and sex distribution. One laboratory feature, however, did significantly correlate with LBP use: the distance of the laboratory from the authors' institution. Labs that were farther away were more likely to send LBP compared to CS; large reference laboratories were more likely to use and send LBP, and these were farther away from the authors' institution than the regional laboratories. Similarly, there was a significant difference in the time elapsed between biopsy collection and second review; 39% of LBP cases were sent to the authors' institution within 30 days after biopsy compared to 51% of CS cases (p<0.001). Finally, multivariate analysis confirmed the findings that LBP and CS cases originated from similar patient groups but different laboratories.
Effect of LBP use among outside laboratories
There was no difference in overall distribution of TBSRTC diagnoses among outside institutions (p=0.127; Table 2). When each TBSRTC category was analyzed separately, only the malignant category showed a difference between LBP and CS (15% vs. 11%; p=0.015). When considering the diagnoses prior to second review in outside laboratories, there was no difference in Hashimoto's diagnosis between LBP and CS (11% vs. 13%; p=0.391).
AUS, atypia of undetermined significance; SFN/SHCN, suspicious for follicular neoplasm/suspicious for Hürthle cell neoplasm; SFM, suspicious for malignancy.
Effect of LBP use on morphological re-review
Regardless of how the specimens were prepared, the most salient feature of re-review at the authors' institution was the increased proportion of cases deemed inadequate. However, this occurred more frequently in cases that were prepared by LBP than in cases that were prepared by CS. At the authors' institution, 52 (17%) of the 306 LBP smears were inadequate compared to 517 (10%) of the 5169 CS smears (p<0.001). Fewer LBP than CS cases were diagnosed as benign after second review; 118 (39%) of the 306 LBP cases were benign compared to 2441 (47%) of the 5169 CS cases (p=0.003). The proportions of cases diagnosed as AUS, SFN/SHCN, and SFM were comparable between LBP and CS (17% vs. 18%, p=0.779; 5% vs. 7%, p=0.152; and 7% vs. 5%, p=0.305, respectively). A higher proportion of cases was diagnosed as malignant with LBP than with CS, although the difference was not statistically significant (15.69% vs. 12.38%; p=0.09). As expected, the proportion of cases diagnosed as Hashimoto's thyroiditis was smaller in LBP than in CS (9% vs. 12%), but this association was also not statistically significant (p=0.12). The statistical power of these comparisons is adequate to see a difference of >5% for each comparison that showed no significant difference.
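The two significant comparisons can be checked from the reported counts alone. The sketch below is an illustration, not the authors' analysis: it assumes a pooled two-proportion z-test (equivalent, for a 2×2 table, to an uncorrected Pearson chi-square test) and adds a Wald 95% confidence interval for the difference, which the paper does not report. The function name is hypothetical.

```python
from math import erfc, sqrt


def two_proportion_test(x1, n1, x2, n2):
    """Pooled two-proportion z-test.

    Returns (difference, 95% Wald CI for the difference, two-sided p-value).
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se_pooled = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se_pooled
    p_value = erfc(abs(z) / sqrt(2))  # two-sided normal tail probability
    # The confidence interval uses the unpooled standard error.
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff, (diff - 1.96 * se, diff + 1.96 * se), p_value


# Counts reported after second review (LBP first, CS second).
inadequate = two_proportion_test(52, 306, 517, 5169)
benign = two_proportion_test(118, 306, 2441, 5169)

for name, (diff, ci, p) in (("inadequate", inadequate), ("benign", benign)):
    print(f"{name}: difference {diff:+.3f}, "
          f"95% CI ({ci[0]:+.3f}, {ci[1]:+.3f}), p = {p:.4f}")
```

Under these assumptions the sketch gives p<0.001 for the inadequate comparison and p of roughly 0.003 for the benign comparison, in line with the values reported above.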
Effect of LBP on positive and negative predictive values of thyroid FNA diagnosis
Specimen preparation type did not appear to alter the predictive values of the diagnoses in this study (Table 3a, b). Of the smears with corresponding histological diagnoses, inadequate smears of LBP and CS had a low likelihood of malignancy: 0/2 (0%) in LBP and 8/109 (7.5%) in CS. The NPVs for cases diagnosed as benign by LBP and CS were comparable (92.9% vs. 95.6%). For the AUS category, 26.7% of LBP and 37.5% of CS were malignant on histology. Among the cases diagnosed as SFN/SHCN, 80% of LBP and 28.4% of CS were malignant on histology; in SFM, 83.3% of LBP and 80.6% of CS were malignant on histology. The PPVs for diagnosing a malignancy by LBP and CS were similar: 100% by LBP compared to 99.2% by CS.
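For readers less familiar with these measures, the sketch below shows how sensitivity, specificity, PPV, and NPV follow from a 2×2 table of cytology against histology. It is a minimal illustration with hypothetical counts, not the data of Table 3. It also assumes that only benign and malignant cytology calls enter the table; how the authors handled indeterminate categories in these calculations is not stated here.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Diagnostic accuracy measures from a 2x2 cytology-vs-histology table.

    tp: malignant cytology, malignant histology
    fp: malignant cytology, benign histology
    fn: benign cytology, malignant histology
    tn: benign cytology, benign histology
    """
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),  # probability that a malignant call is malignant
        "npv": tn / (tn + fn),  # probability that a benign call is benign
    }


# Hypothetical counts, for illustration only (not taken from Table 3).
metrics = diagnostic_metrics(tp=90, fp=1, fn=5, tn=100)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

Because PPV and NPV depend on the prevalence of malignancy among resected cases, values from a referral population such as this one may not transfer directly to other settings.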
Discussion
This investigation found: (a) no difference in the distribution of diagnosis between LBP and CS at outside institutions; (b) an increase in the inadequacy rate of LBP when compared to CS at the authors' institution; (c) a decrease in the proportion of cases diagnosed as benign by LBP when compared to CS at the authors' institution, with no statistical difference in proportion of cases diagnosed as malignant and indeterminate; (d) no difference between LBP and CS in diagnosing Hashimoto's thyroiditis at either outside institutions or the authors'; and (e) no significant difference in the predictive values of LBP and CS.
Since its introduction in 1996 for streamlining of cervical cancer screening (22), LBP has been tried for virtually every type of cytological specimen, including the thyroid (5,12,23–27). LBP is an attractive alternative to CS for several reasons. First, cytology specimens are screened by cytotechnologists (28), and screening one slide per case reduces the total number of slides reviewed, thereby decreasing labor costs. Second, LBP is highly automated and heavily regulated, so there is a high level of quality control inherent in using this technique. Third, LBP is amenable to immunocytochemistry (21,25,26,29–31). Fourth, while CS requires the presence of a skilled individual (13,32) to prepare the smears on-site, LBP does not require this, as all aspirated material is deposited into the preservative. Despite these advantages, LBP also introduces undesirable nuances to the diagnostic process. First, LBP is associated with known morphological artifacts, including decreasing the presence of colloid and inflammatory cells, as well as the disaggregation of macrofollicles. Second, material deposited directly into LBP preservative cannot undergo on-site evaluation of adequacy (OSEA), so there is no way to know the adequacy at the time of aspiration and take additional passes to remediate a suboptimal specimen. Third, without OSEA or dedicated passes for other methods, LBP leads to a loss of the potential for collecting material for the performance of ancillary testing such as flow cytometry or targeted molecular testing (5,21,25,26,29,31,33,34).
While a detailed technical description of cytopreparatory techniques is beyond the scope of this article, the findings of this study suggest that what may appear to be a trivial cytopreparatory difference can impact the frequencies of diagnoses and thus patient care. When presented with roughly the same group of patients that differed only by how their thyroid FNA slides were prepared, a larger proportion of cases with LBP slides were deemed inadequate or malignant at the authors' institution, and a smaller proportion were identified as benign. While these differences do not appear to impact the correlation of the results with the final histology, too few cases of LBP went to surgery for the study to be adequately powered on this comparison.
The finding that a second review led to an increase in the inadequate rate is not surprising; this effect has been shown in an overall analysis of second review at this institution (35). However, the observation that a significantly higher proportion of LBP than CS cases are deemed inadequate calls into question the usefulness of universal adoption of LBP. Anecdotally, sparsely cellular CS specimens are frequently identified as adequate on the basis of abundant colloid, so its removal in LBP processing has the potential to obscure the diagnosis of benign nodules. Furthermore, disaggregation of cellular clusters by LBP directly reduces the number of follicles available for analysis. Additionally, the LBP process does not allow OSEA, so there may be systematic undersampling that cannot be detected.
There are several limitations of this retrospective review. First, the database contains information from patients who came to the authors' institution for clinical consultation. These patients were already seen at another institution, so there is a risk of selection and referral bias. This bias is attenuated by exclusion of cases referred for morphological consultation only and because institutional policy requires a second review before any medical and surgical consult. Despite this, it is impossible to eliminate selection bias completely in a retrospective study, as there are a number of factors that encourage a patient to get a second review for reasons that are not captured in the records. It is known that the authors' institution receives a higher percentage of indeterminate samples than that reported in the literature, and that may also contribute to selection bias. Second, not all patients have histological follow-up. Only a small percentage of patients with malignant or suspicious nodules on FNA have subsequent histological follow-up. Hence, a final diagnosis for all the patients is not available. Thus, there is a chance of misjudging the risk of malignancy, especially in patients with benign and AUS FNAs. This bias is difficult to control but appears to play only a small role, since the overall malignancy rates for the different categories appear to correlate with those seen in other studies (1,6,35–38). Finally, there is a mismatch between the number of LBP and CS received for second review, as LBP is currently not a widespread technique. An attempt was made to address this by comparing the frequency of diagnosis and running regressions on a comparable sample set of CS. The results were similar.
Conclusion
In conclusion, this study shows that LBP is associated with a higher proportion of cases deemed inadequate when compared to CS. Also, LBP diagnosis includes a smaller proportion of cases classified as benign when compared to CS. These differences between LBP and CS in diagnosing thyroid pathologies suggest that a universal adoption of LBP will come with significant limitations that need to be addressed by future well-designed prospective studies.
Footnotes
Acknowledgments
We thank Marie Diener-West, PhD, and Rosa Crum, MD.
Author Disclosure Statement
No competing financial interests exist.
