Abstract
Background:
Processing methods for fine-needle aspiration biopsy (FNA) samples mainly include conventional smears (CS) and liquid-based preparations (LBP). Which method is better remains debated, and the diagnostic value and necessity of combining the two methods remain unclear. The objective of this study was to compare the diagnostic performance of the two methods and of their combined use in thyroid nodules.
Methods:
We analyzed thyroid cytopathology data from 16 medical centers between June 2010 and November 2025, comparing the rates of nondiagnostic and indeterminate nodules across preparation methods. For histologically confirmed samples, diagnostic performance metrics were calculated. Cases with separate CS and LBP descriptions (multi-diagnoses group) were analyzed for diagnostic consistency and performance.
Results:
In total, 89,392 thyroid FNA cases were included (49,309 CS, 13,161 LBP, and 26,922 combined). The rate of indeterminate nodules was 10.3% (CS), 10.9% (LBP), and 14.8% (combined), while the nondiagnostic rate was lowest in the combined group (7.3% vs. 10.5% for CS and 17.9% for LBP). LBP demonstrated higher sensitivity (98.1% vs. 95.0%) and accuracy (97.0% vs. 93.7%) than CS, while combined use provided no significant advantage over LBP alone. In the multi-diagnoses group, CS–LBP concordance among diagnostic samples was 92.9%, with comparable diagnostic performance across all methods.
Conclusions:
LBP demonstrated superior diagnostic performance compared with CS, but combined use of both methods provided no significant advantage over LBP alone.
Introduction
The increasing use of imaging techniques has led to a rise in the clinical detection of thyroid nodules, with reported prevalence rates ranging from 20% to 68%. 1 Accurate differentiation between benign and malignant nodules is essential for clinical management. Fine-needle aspiration biopsy (FNA) is the preferred preoperative diagnostic method, with the Bethesda System for Reporting Thyroid Cytopathology (TBSRTC) serving as the standard diagnostic framework. 2
Common preparation methods for thyroid FNA samples include conventional smears (CS) and liquid-based preparations (LBP). 3 However, current clinical guidelines do not recommend one method over the other for FNA sample preparation.2,4 CS is prepared by placing the aspirate directly onto a slide and smearing it manually, which allows on-site evaluation of the sample and better preservation of cellular and colloidal structures. However, it is susceptible to obscuring blood and crush artifacts. 5 LBP, originally developed for gynecological cytology in 1996, was later adopted for thyroid FNA due to its clear background and convenient handling. 6 Yet it may alter the native architecture of cell clusters, lead to the loss of colloid material, and hinder the assessment of nuclear inclusions. 7
Previous studies comparing the diagnostic performance of these two preparation methods have yielded conflicting results,8–13 and most of these were small, single-center studies.8,9,11,13 In addition, the two methods are sometimes combined in clinical practice. This approach aims to capture complementary cytomorphological features and optimize sample utilization, though it may reduce cellularity in each preparation and increase both cost and workload. 8 Hence, the diagnostic value and necessity of combining the two methods remain unclear.
In this multicenter study, we aimed to compare the diagnostic performance of CS, LBP, and their combination, offering clinical evidence for thyroid FNA sample preparation.
Materials and Methods
This multicenter study included patients undergoing thyroid FNA cytopathological examination between June 2010 and November 2025 from 16 medical centers: the First Affiliated Hospital of Sun Yat-sen University (FAHSYSU), Guangdong Provincial People’s Hospital (GDPH), the First Affiliated Hospital of Guangxi Medical University (FAHGXMU), Sun Yat-sen University Cancer Center (SYSUCC), the First People’s Hospital of Changde City (FPHCD), the Affiliated Hospital of Zunyi Medical University (AHZYMU), the First People’s Hospital of Yulin (FPHYL), the Seventh Affiliated Hospital of Sun Yat-sen University (SAHSYSU), Liuzhou People’s Hospital (LZPH), the Second Affiliated Hospital of Guangxi Medical University (SAHGXMU), the People’s Hospital of Guigang (PHGG), Dongguan People’s Hospital (DGPH), the First People’s Hospital of Qinzhou (FPHQZ), Sanming First Hospital (SMFH), Foshan Fosun Chancheng Hospital (FSCC), and the Second Affiliated Hospital of Nanchang University (SAHNCU). The study was approved by the Ethics Committee of the FAHSYSU (Approval No. [2024] 848-1), and informed consent was waived. All ultrasound-guided FNA samples for thyroid nodules were included in this study, with exclusion of cases where cytological findings indicated inadvertent sampling of nonthyroidal structures.
We collected the cytopathological diagnostic descriptions for all enrolled cases and reviewed them to derive the TBSRTC diagnostic categories. 2 The rates of nondiagnostic (Bethesda I) and indeterminate (Bethesda III and IV) nodules were compared across preparation methods. Surgical records and corresponding histological diagnoses were retrieved. Using histology as the gold standard, sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and accuracy were calculated. The confidence intervals (CIs) were estimated using the Wilson score method. Among combined-method samples, cases with separate CS and LBP diagnoses in addition to an integrated diagnosis formed the “multi-diagnoses group.” Concordance was calculated as the proportion of cases with identical diagnostic categories between CS and LBP among diagnostic samples (Bethesda II–VI). Within this group, diagnostic performance was assessed for each method.
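The metrics and Wilson score intervals described above can be sketched as follows. This is a minimal illustration, not the study's analysis code: the function names are hypothetical, the counts are invented for demonstration, and the rule that maps Bethesda categories to a positive or negative cytology result is not specified here, so the sketch starts from an already tabulated 2x2 table against histology.

```python
from math import sqrt


def wilson_ci(successes: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 gives ~95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half_width = z * sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Point estimate and Wilson CI for each metric, from 2x2 counts vs. histology."""
    pairs = {
        "sensitivity": (tp, tp + fn),
        "specificity": (tn, tn + fp),
        "ppv": (tp, tp + fp),
        "npv": (tn, tn + fn),
        "accuracy": (tp + tn, tp + fp + fn + tn),
    }
    return {name: (k / n, wilson_ci(k, n)) for name, (k, n) in pairs.items()}


# Hypothetical counts for illustration only; not taken from the study.
example = diagnostic_metrics(tp=90, fp=5, fn=10, tn=20)
for name, (estimate, (low, high)) in example.items():
    print(f"{name}: {estimate:.3f} (95% CI {low:.3f}-{high:.3f})")
```

The Wilson interval is used here because it stays within [0, 1] and behaves reasonably when a proportion is close to 0 or 1, which is common for PPV and sensitivity in surgical series.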
Additional details on the preparation procedure and statistical analysis are provided in the Supplementary Data.
Results
In total, we identified 89,392 thyroid FNA cases, of which 49,309 were prepared using CS only, 13,161 using LBP only, and 26,922 using a combination of both methods (Fig. 1). The rate of indeterminate nodules (Bethesda III and IV) for CS, LBP and combined CS-LBP was 10.3%, 10.9%, and 14.8%, respectively. The combined method showed the lowest nondiagnostic rate (7.3%), compared with 10.5% for CS and 17.9% for LBP (Supplementary Table S2).
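As a minimal sketch of how such rates are derived, the snippet below first checks that the three reported group sizes add up to the reported total, then computes nondiagnostic and indeterminate rates from a list of Bethesda categories. The function name and the toy category list are assumptions made for illustration; only the group sizes come from the text.

```python
from collections import Counter

NONDIAGNOSTIC = {"I"}          # Bethesda I
INDETERMINATE = {"III", "IV"}  # Bethesda III and IV

# Group sizes as reported in the text.
group_sizes = {"CS": 49_309, "LBP": 13_161, "combined": 26_922}
total_cases = sum(group_sizes.values())


def category_rates(categories):
    """Return nondiagnostic and indeterminate rates for a list of Bethesda categories."""
    counts = Counter(categories)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no cases supplied")
    return {
        "nondiagnostic": sum(counts[c] for c in NONDIAGNOSTIC) / total,
        "indeterminate": sum(counts[c] for c in INDETERMINATE) / total,
    }


# Toy data for illustration only; not the study's case-level data.
toy = ["I", "II", "II", "III", "IV", "VI", "II", "V", "II", "II"]
rates = category_rates(toy)
print(total_cases, rates)
```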

Figure 1. Flowchart of case selection and analysis.
CS and LBP demonstrated comparable PPV (97.8% vs. 98.5%) and specificity (85.2% vs. 85.9%). However, LBP achieved higher sensitivity (98.1% vs. 95.0%), accuracy (97.0% vs. 93.7%) and NPV (82.6% vs. 71.1%) compared with CS. Combined use showed similar sensitivity (98.6%), accuracy (96.2%), PPV (97.3%) and NPV (80.4%) to LBP, but specificity (68.2%) was lower (Table 1).
Table 1. Diagnostic Performance of Preparation Methods in the Entire Dataset and Multi-diagnoses Group
CI, confidence interval; CS, conventional smears; LBP, liquid-based preparations; NPV, negative predictive value; PPV, positive predictive value.
Among combined-method samples, 22,872 (85.0%) had only a final integrated diagnosis, while 4,050 (15.0%) had separate CS and LBP diagnoses in addition to a final integrated diagnosis (multi-diagnoses group). In this multi-diagnoses group, CS had a higher nondiagnostic rate than LBP (15.3% vs. 9.0%), and combined use reduced the rate further (7.4%). Indeterminate rates were similar between CS and LBP (20.4% vs. 20.1%) (Supplementary Table S3). The concordance of CS–LBP among diagnostic samples (Bethesda II–VI) was 92.9% (Supplementary Fig. S2).
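The concordance measure can be illustrated with a short sketch. It assumes that a case counts as a diagnostic sample only when both the CS and the LBP diagnosis fall in Bethesda II–VI; that reading, the function name, and the toy pairs are assumptions for illustration and are not confirmed by the text.

```python
DIAGNOSTIC = {"II", "III", "IV", "V", "VI"}


def concordance(pairs):
    """Share of cases with identical CS and LBP categories among diagnostic samples.

    `pairs` is an iterable of (cs_category, lbp_category) tuples. Assumption:
    a case is kept only if both categories are diagnostic (Bethesda II-VI).
    """
    diagnostic = [(cs, lbp) for cs, lbp in pairs if cs in DIAGNOSTIC and lbp in DIAGNOSTIC]
    if not diagnostic:
        raise ValueError("no diagnostic pairs supplied")
    return sum(cs == lbp for cs, lbp in diagnostic) / len(diagnostic)


# Toy pairs for illustration only; not the study's data.
toy_pairs = [("II", "II"), ("VI", "VI"), ("III", "V"), ("I", "II"), ("V", "V")]
agreement = concordance(toy_pairs)  # the ("I", "II") pair is excluded as nondiagnostic
print(f"{agreement:.2f}")
```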
CS, LBP, and their combined use demonstrated comparable sensitivity (97.1% vs. 97.6% vs. 98.6%), PPV (99.1% vs. 99.1% vs. 99.1%), and overall accuracy (96.3% vs. 96.8% vs. 97.8%). Specificity varied across the three methods (62.5%, 72.7%, and 65.4%, respectively), with corresponding NPVs of 34.9%, 50.0%, and 54.8% (Table 1).
Discussion
This multicenter study, the largest to date comparing CS, LBP, and their combination in thyroid FNA, demonstrated that LBP outperformed CS in diagnostic performance, showing higher sensitivity, accuracy, and NPV while maintaining similar PPV and specificity. The combined use of CS and LBP reduced the rate of nondiagnostic preparations but did not demonstrate a clear advantage over LBP alone in diagnostic accuracy, as specificity declined. Within the multi-diagnoses group, the consistency between CS and LBP was high, yet discrepancies remained, with final interpretations more often aligned with LBP. In this group, LBP showed higher specificity and NPV than CS, while the two methods exhibited comparable sensitivity, PPV, and accuracy, and combined use again provided no diagnostic benefit beyond LBP alone.
In our study, based on the TBSRTC standard, the distribution of diagnostic categories varied across the different preparation methods, which may reflect variations in patient populations across institutions, as well as differences in physicians’ thresholds for recommending FNA. The nondiagnostic rate for LBP was higher than that for CS in the overall dataset but lower in the multi-diagnoses subset, reflecting intercenter variability. Nevertheless, the combined use of both methods markedly reduced the nondiagnostic rate, possibly because the two methods complement each other, 14 resulting in a higher adequacy rate for combined-use samples. Previous studies suggest LBP facilitates detection of malignant features due to higher cell density and better nuclear detail. 15 In our study, however, CS and LBP showed similar indeterminate rates, indicating comparable reliability. The higher indeterminate rate with combined use may result from richer, sometimes discordant, cytomorphological information, increasing interpretive ambiguity. Moreover, the enhanced sensitivity of the combined approach may expose subtle or borderline cellular features that are difficult to classify, prompting a more cautious diagnostic approach. In addition, technical variability, such as the lack of on-site adequacy assessment for CS in certain centers, may have further contributed to the rise in diagnostic uncertainty. 16
Previous studies have reported inconsistent findings regarding the diagnostic performance of CS and LBP,9,13 partly due to small sample sizes or non-contemporaneous cohorts. Our large, histologically confirmed dataset showed that LBP had higher overall diagnostic accuracy than CS. However, in the multi-diagnoses subset, where both methods were applied simultaneously, their performance was comparable, suggesting similar capability under standardized conditions. Despite comparable accuracy, CS and LBP offer distinct practical advantages. CS facilitates on-site evaluation of sample adequacy and is notably cost-effective, eliminating the need for complex consumables and machinery. 17 LBP offers automated, consistent slide preparation, facilitates additional slides for ancillary tests, and simplifies pathologist screening. 18
Although the combined application reduces the nondiagnostic specimen rate, it does not demonstrate a significant advantage in diagnostic accuracy compared with LBP alone. From a resource utilization standpoint, the routine combined use of CS and LBP implies that each FNA sample consumes two sets of consumables and increases the handling and interpretation time for pathologists. However, this additional resource investment does not translate into a clear improvement in diagnostic accuracy. In settings with limited healthcare resources, allocating resources to such a routine combined application with unclear benefits may result in overall inefficiency and crowd out other more cost-effective medical services.
Our study has several limitations. First, this study is retrospective, and the samples involved in this study are limited to southern China, which may constrain the generalizability of the findings. Additionally, the variability in physician experience across different centers might result in the misinterpretation of outcomes, thereby influencing the assessment of diagnostic precision. However, this was a multicenter study involving both academic and non-academic hospitals, and therefore, our data represent real-world experience. The relatively low proportion of Bethesda II patients undergoing surgery reduced negative cases, potentially skewing specificity and NPV. However, this tends to be the norm in clinical practice and other studies in the field.19,20 Future large-scale, prospective, international multicenter trials can be pursued to further validate these findings.
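The dependence of NPV on the surgical case mix can be illustrated with Bayes' rule. The sketch below takes the sensitivity and specificity reported for CS in the entire dataset (95.0% and 85.2%) as fixed inputs and varies the proportion of malignant cases among operated nodules; those proportions are hypothetical values chosen for illustration, and the function name is likewise an assumption.

```python
def npv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Negative predictive value from sensitivity, specificity, and prevalence."""
    if not 0.0 < prevalence < 1.0:
        raise ValueError("prevalence must lie strictly between 0 and 1")
    true_negative = specificity * (1 - prevalence)
    false_negative = (1 - sensitivity) * prevalence
    if true_negative + false_negative == 0:
        raise ValueError("NPV is undefined when no negative results are possible")
    return true_negative / (true_negative + false_negative)


SENS, SPEC = 0.950, 0.852  # CS values reported for the entire dataset

# Hypothetical malignancy proportions among operated cases.
for prevalence in (0.3, 0.6, 0.9):
    print(f"prevalence {prevalence:.0%}: NPV {npv(SENS, SPEC, prevalence):.3f}")
```

With sensitivity and specificity held constant, NPV falls as the share of malignant cases in the histologically confirmed series rises, which is why a series with few operated Bethesda II nodules tends to show a low NPV.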
Overall, LBP demonstrated slightly superior diagnostic performance to CS, particularly in sensitivity, accuracy, and NPV. Although combined use reduced nondiagnostic rates, it offered no significant diagnostic advantage over LBP alone. We recommend adopting a single cytological preparation method according to local resources and clinical practice, rather than applying both methods simultaneously.
Authors’ Contributions
Wenke C.: Conceptualization (supporting), data curation (lead), formal analysis (lead), writing—original draft (lead), writing—review and editing (equal). Y.Z.: Data curation (lead), formal analysis (supporting), writing—original draft (lead), writing—review and editing (equal). P.M.: Investigation (lead), data curation (supporting), writing—review and editing (equal). G.C.: Investigation (lead), data curation (supporting), writing—review and editing (equal). P.S.: Investigation (lead), data curation (supporting), writing—review and editing (equal). W.D.: Investigation (lead), data curation (supporting), writing—review and editing (equal). Y.B.: Investigation (lead); Data curation (supporting); Writing—review and editing (equal). Yinghui W.: Investigation (lead), data curation (supporting), writing—review and editing (equal). Z.Y.: Investigation (lead), data curation (supporting), writing—review and editing (equal). Yue L.: Investigation (lead), data curation (supporting), writing—review and editing (equal). J.X.: Investigation (lead), data curation (supporting), writing—review and editing (equal). D.Y.: Investigation (lead), data curation (supporting), writing—review and editing (equal). Yongqin W.: Investigation (lead), data curation (supporting), writing—review and editing (equal). Ying L.: Investigation (lead), data curation (supporting), writing—review and editing (equal). C.C.: Investigation (lead), data curation (supporting), writing—review and editing (equal). K.G.: Investigation (lead), data curation (supporting), writing—review and editing (equal). L.W.: Investigation (lead), data curation (supporting), writing—review and editing (equal). D.X.: Investigation (supporting), data curation (supporting), writing—review and editing (equal). Y.P.: Investigation (supporting), data curation (supporting), writing—review and editing (equal). X.D.: Formal analysis (supporting), data integration (lead), writing—review and editing (equal). 
Z.K.: Formal analysis (supporting), data integration (lead), writing—review and editing (equal). Wenxin C.: Formal analysis (supporting), data integration (supporting), writing—review and editing (equal). Y.H.: Formal analysis (supporting), data integration (supporting), writing—review and editing (equal). F.L.: Formal analysis (supporting), data integration (supporting), writing—review and editing (equal). C.Q.: Conceptualization (lead), supervision (lead), writing—review and editing (equal). A.B.: Conceptualization (lead), supervision (lead), writing—review and editing (equal). L.C.: Conceptualization (lead), data curation (supporting), supervision (lead), writing—review and editing (equal). Yihao L.: Conceptualization (lead), supervision (lead), writing—review and editing (equal).
Acknowledgments
The authors thank the participants who took part in this study.
Data Availability
The datasets generated and analyzed during the current study are not publicly available.
Author Disclosure Statement
The authors declare no conflicts of interest.
Funding Information
Chubo Qi: No funding was received for this work. Athanasios Bikas: No funding was received for this work. Lili Chen: No funding was received for this work. Yihao Liu: This study was supported by the Noncommunicable Chronic Diseases—National Science and Technology Major Project (
