Abstract
Background:
False-negative rates for thyroid fine-needle aspiration (FNA) vary from 0.4% to 13%, but the effect of nodule size on the accuracy of thyroid FNA remains controversial. We hypothesized that large thyroid nodule size does not contribute to the risk of malignancy or the risk of a false-negative FNA.
Methods:
All thyroid FNAs performed at the Walter Reed Army Medical Center during September 2001–August 2011 were reviewed. A strict correlation between the biopsy site, location, and size of nodule on ultrasound (US) and pathology report was ensured. FNA results were classified as benign, atypical, follicular neoplasm/suspicious for follicular neoplasm (FN/SFN), suspicious for malignancy (SM), or malignant, and the pathology result was categorized as either benign or malignant. Nodules were analyzed by size: 0.5–0.9 cm (group A), 1.0–3.9 cm (group B), and ≥4 cm (group C). Incidental thyroid cancer was not included.
Results:
Of 3013 patients undergoing FNA, 667 (22.1%) had surgery. Patients were excluded for nodules <0.5 cm, nondiagnostic FNA, or no preoperative US, leaving 540 patients with 695 nodules. Among patients referred for surgery, FNA results were benign in 417 nodules (60%), atypical in 22 (3.2%), FN/SFN in 122 (17.6%), SM in 77 (11.1%), and malignant in 57 (8.2%). Postoperative malignancy rates by FNA result were 7% if benign, 4.5% if atypical, 23% if FN/SFN, 33.8% if SM, and 78.9% if malignant. FNA accuracy was 60% in group A, 68.5% in group B, and 80.3% in group C (p=0.01). False-negative rates for FNA were 7.0% overall, 15.8% in group A, 6.3% in group B, and 7.1% in group C (p=0.25). Sensitivity and negative predictive value were highest in group B at 81.6% and 93.7%, respectively. The prevalence of malignancy was not different between groups.
Conclusion:
Our results show that a thyroid nodule size ≥4 cm increases neither the risk of a false-negative FNA result nor the overall risk of malignancy. We also show a tendency toward a higher false-negative rate in subcentimeter nodules. We conclude that a large nodule size should not prompt automatic referral for thyroidectomy. An increased awareness of potential sampling error in subcentimeter nodules is warranted.
Introduction
Whether thyroid nodule size affects either the risk of malignancy or the accuracy of FNA is controversial, yet this characteristic is frequently used in decision making regarding surgical referral. Various reports have suggested the use of specific size cutoffs, such as >3 or >4 cm, as an indication for surgery due to the risk of sampling error (12–17). Sampling error refers to the risk of obtaining a false-negative FNA result due to inadequate sampling of a large nodule, thus missing the potential site of malignant transformation within an otherwise benign lesion. False-negative rates of FNA in nodules of all sizes have varied from 0.4% to 13% (7,14,16–23). Accurate sampling of thyroid nodules depends on multiple factors besides nodule size, including the size of the needle, the skill of the aspirator, the aspiration technique, the use of US guidance, the slide preparation technique, and the expertise of the pathologist. There is currently no consensus recommendation for surgical referral based on nodule size alone (24). In this study, we examine the impact of thyroid nodule size on the accuracy of thyroid nodule FNA as well as the overall risk of malignancy at a single institution over a 10-year period.
Materials and Methods
Patients
This study was approved by the Walter Reed Army Medical Center Institutional Review Board. Medical records of patients who had a US-guided FNA biopsy of at least one thyroid nodule from September 2001 through August 2011 were reviewed retrospectively. We included patients with a documented thyroid US, available FNA results, and records permitting a clear correlation between nodule location, cytology, and surgical histology. If a patient had more than one biopsy of the same nodule, the FNA performed immediately before surgery was used in the analysis. Patients were excluded if they did not have a formal US or had a nodule <0.5 cm, insufficient/nondiagnostic thyroid cytology, or an incidental thyroid cancer, defined as a malignancy found outside the nodule of interest.
FNA technique
Thyroid FNA at our center utilizes US guidance nearly exclusively, with fewer than 5% of FNAs performed using palpation alone. All clinical endocrinology staff and fellows (∼10 individuals at any given time) perform US-guided FNAs with biopsy suite support from a cytology technician for slide preparation and adequacy review at the time of the procedure. FNA interpretation is performed on a rotating basis by approximately five different cytopathologists at any given time, working together with pathology residents. Our FNA reports generally include in-depth descriptions, with an assessment of adequacy, cellularity, and nuclear descriptions, an assessment of colloid content, and selection of a diagnostic result category. The reports often have additional commentary, and many (approximately one-third) are included in the intradepartmental review and/or referred to the Armed Forces Institute of Pathology for further review.
FNA categorization
Two endocrinologists (M.S., H.B.B.) independently reviewed the written report of each thyroid FNA. Cytology results were recorded as benign, atypical (atypia of undetermined significance [AUS], follicular lesion of undetermined significance [FLUS]), follicular neoplasm/suspicious for follicular neoplasm (FN/SFN), suspicious for malignancy (SM), or malignant according to the Bethesda System for Reporting Thyroid Cytopathology (25). When independent classification yielded discordant results, reports were reviewed for a third opinion from a cytopathologist (B.A.C.). Cytology slides were not independently reviewed for the purpose of this study. For results classified as AUS/FLUS, this assignment was made on the basis of comparison of the cytological description to published criteria for each of the subtypes of atypical lesions using the Bethesda System (25). All results classified as AUS/FLUS were reviewed by each of the three authors for consensus.
Clinical data acquisition
The electronic medical record was reviewed for the collection of clinical, laboratory, and radiological data. Clinical data collected for each patient included age, sex, race/ethnicity category, and the type of surgery performed. Laboratory data included the thyrotropin (TSH) level obtained before FNA, the cytology report for the thyroid FNA, and the surgical histology, which was classified as either benign or malignant. Thyroid US reports were reviewed to determine the number and maximal diameter of nodules. When necessary, US images were reviewed to confirm report data.
Verification of nodule location
Clinical history and thyroid US reports were carefully matched to thyroid FNA cytology and surgical pathology reports in all patients. This required particular attention in patients undergoing multiple thyroid nodule aspirations for multinodular disease. The procedure note in the electronic medical record was reviewed in cases where uncertainty existed.
Statistical analysis
Nodules were grouped according to the size of the maximal diameter as follows: 0.5–0.9 cm (group A), 1.0–3.9 cm (group B), and ≥4 cm (group C). For the calculation of FNA sensitivity and specificity, thyroid nodules with indeterminate (includes AUS/FLUS, FN, or SM) or malignant FNA results that were found to be malignant at the time of surgery were considered true-positive results. Likewise, nodules with benign FNA results and ultimately found to have cancer within the biopsied nodule were considered false-negative results. Nodules with benign FNA and surgical pathology results were considered to have true-negative FNA results, and those nodules with indeterminate or malignant FNA results, but benign pathology, were classified as having false-positive results.
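The grouping and classification rules above can be written out as a minimal sketch. The function and constant names are hypothetical and are not part of the study's analysis, which was run in SPSS. Sizes are reported to one decimal place, so the treatment of values between the stated bounds (for example, 0.95 cm) is an assumption.

```python
# Hypothetical helpers illustrating the rules described in the text.
# Names are illustrative only; the study itself used SPSS.

# FNA categories counted as "positive" (indeterminate or malignant).
POSITIVE_FNA = {"AUS/FLUS", "FN/SFN", "SM", "malignant"}


def size_group(max_diameter_cm):
    """Assign a nodule to group A, B, or C by maximal diameter in cm."""
    if max_diameter_cm < 0.5:
        raise ValueError("nodules <0.5 cm were excluded from the study")
    if max_diameter_cm < 1.0:   # 0.5-0.9 cm (cutoff between 0.9 and 1.0 assumed)
        return "A"
    if max_diameter_cm < 4.0:   # 1.0-3.9 cm
        return "B"
    return "C"                  # >=4 cm


def classify(fna_result, histology_malignant):
    """Return TP, FP, TN, or FN for one nodule with a sufficient FNA."""
    if fna_result == "benign":
        return "FN" if histology_malignant else "TN"
    if fna_result in POSITIVE_FNA:
        return "TP" if histology_malignant else "FP"
    raise ValueError(f"unexpected or nondiagnostic FNA result: {fna_result!r}")


if __name__ == "__main__":
    print(size_group(0.7), classify("benign", True))
    print(size_group(4.2), classify("FN/SFN", False))
```

Counting every indeterminate result as positive follows the definition given in this section; a different convention would shift nodules between the false-positive and true-negative cells.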
Continuous data were presented as means with standard deviations (SDs), and groups were compared using the two-sample t-test. For data that were not normally distributed (as determined by the Shapiro-Wilk test and by visual inspection of a histogram), groups were compared using the Wilcoxon rank-sum test. Categorical data were analyzed using the Fisher-Freeman-Halton exact test for comparison of three groups and Fisher's exact test for two group comparisons. A two-sided p-value<0.05 was considered significant. Data were analyzed using SPSS for Windows (v. 19; SPSS/IBM, Inc., Chicago, IL).
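For readers unfamiliar with the two-group test named above, the following is a minimal standard-library sketch of a two-sided Fisher's exact test on a 2×2 table. It is not the SPSS procedure used in the study, and the function name is hypothetical. The first table is a textbook example. The second uses counts reconstructed here by rounding the reported accuracies (60% of 35 nodules in group A, 80.3% of 127 in group C); those counts are an assumption and are not taken from Table 2.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def prob(x):
        # Probability that the top-left cell equals x given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    total = sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)  # tolerance for float ties
    )
    return min(1.0, total)


if __name__ == "__main__":
    # Textbook 2x2 example (not study data).
    print(fisher_exact_two_sided(3, 1, 1, 3))

    # Reconstructed counts (assumption): correct vs. incorrect FNA calls.
    group_a = (21, 14)    # 21/35 = 60%
    group_c = (102, 25)   # 102/127 = 80.3%
    print(fisher_exact_two_sided(*group_a, *group_c))
```

The three-group comparisons in the study used the Fisher-Freeman-Halton extension, which generalizes the same idea to larger tables and is not implemented in this sketch.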
Results
Patients
During the study period, 3013 patients had an FNA of one or more thyroid nodules, and 667 had subsequent thyroid surgery. After excluding patients with nodules <0.5 cm, nondiagnostic FNA results, or without a preoperative US, 540 patients with 695 nodules remained for analysis. Baseline characteristics of the included patients are shown in Table 1. The average age of the study population was 49.4 years, and the majority were women (71.3%). Fifty-nine percent of patients were Caucasian. The average TSH level of the 521 patients with available data was 1.64 mIU/L. TSH levels were below the reference range in 53/521 (10.2%), within the reference range in 447/521 (85.8%), and above the reference range in 21/521 (4.0%). The majority of patients (389/540; 72%) had one nodule; 113 patients (20.9%) had two nodules, and 38 (7.1%) had three or four nodules.
TSH, thyrotropin.
FNA results
The original FNA results of 3997 nodules were benign in 79%, insufficient in 10%, indeterminate in 9%, and malignant in 2% (Fig. 1). Indeterminate results were categorized as AUS/FLUS in 1.5%, FN/SFN in 4.7%, and SM in 2.9%. The AUS/FLUS classification was assigned retrospectively to 51 nodules. The pre-Bethesda System classification for these nodules would have been benign in 20 (39%), insufficient/inadequate in 15 (29%), and SM in 16 (31%). For patients referred for thyroid surgery, FNA results were benign in 417 nodules (60%), atypical in 22 (3.2%), FN/SFN in 122 (17.6%), SM in 77 (11.1%), and malignant in 57 (8.2%). Overall sensitivity, specificity, positive predictive value, negative predictive value, and accuracy for thyroid FNA were 77.5%, 68.6%, 36%, 93%, and 70.2%, respectively.
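The overall figures above can be checked with a short sketch. The per-category malignant counts are not stated in the text; they are reconstructed here by rounding the postoperative malignancy rates reported in the abstract (7%, 4.5%, 23%, 33.8%, and 78.9%), which is an assumption. The reconstructed counts sum to 129, matching the reported 129/695 malignant nodules. Variable names are hypothetical.

```python
# FNA category -> (nodules sent to surgery, reported malignancy rate).
# Rates are taken from the abstract; counts of cancers are reconstructed
# by rounding and are therefore an assumption, not reported data.
surgical = {
    "benign": (417, 0.070),
    "atypical": (22, 0.045),
    "FN/SFN": (122, 0.230),
    "SM": (77, 0.338),
    "malignant": (57, 0.789),
}

cancers = {cat: round(n * rate) for cat, (n, rate) in surgical.items()}

# Benign FNA is the negative call; every other sufficient result is positive.
fn = cancers["benign"]
tn = surgical["benign"][0] - fn
tp = sum(count for cat, count in cancers.items() if cat != "benign")
fp = sum(n for cat, (n, _) in surgical.items() if cat != "benign") - tp

sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
ppv = tp / (tp + fp)
npv = tn / (tn + fn)
accuracy = (tp + tn) / (tp + fp + tn + fn)
# False-negative rate as used in this article: cancers among benign FNAs.
false_negative_rate = fn / (fn + tn)

if __name__ == "__main__":
    for name, value in [
        ("sensitivity", sensitivity),
        ("specificity", specificity),
        ("PPV", ppv),
        ("NPV", npv),
        ("accuracy", accuracy),
        ("false-negative rate", false_negative_rate),
    ]:
        print(f"{name}: {100 * value:.1f}%")
```

Note that the false-negative rate computed this way is the complement of the negative predictive value, not the complement of sensitivity.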

Fig. 1. Flow diagram of the study population. Of 3997 nodules biopsied over 10 years, 695 nodules fit inclusion criteria. Nodules were subdivided into three size categories for further analysis. Final surgical histologic results are noted. FN/SFN, follicular neoplasm/suspicious for follicular neoplasm; SM, suspicious for malignancy.
Rates of malignancy were not significantly different in subjects ≥50 years old (20.6%) versus <50 years old (25.5%), p=0.19, or in women (22.9%) versus men (23.2%), p=0.91. The average age of subjects with a benign diagnosis was 50 years (SD±15.3), and this was not significantly different from the average age of those with cancer, 47.6 years (SD±13.4), p=0.11.
The effect of nodule size
When grouped according to size, there were 35 nodules in group A (0.5–0.9 cm), 533 nodules in group B (1.0–3.9 cm), and 127 nodules in group C (≥4 cm). The malignancy rate after surgery was 18.6% (129/695 nodules), and this did not differ significantly based on size (p=0.33) (Fig. 2). Also, the mean nodule size was not significantly different between benign and malignant nodules (2.6 vs. 2.7 cm, respectively, p=0.15). The malignancy yield was 7% for a benign FNA, 4.5% for atypical, 23% for FN/SFN, 33.8% for SM, and 78.9% for malignant. When considering all sufficient FNA results other than benign as positive, the FNA false-negative rate for the whole group was 7%. We found that false-negative rates did not differ significantly based on size (p=0.25), though subcentimeter nodules had a tendency toward a higher rate of false negatives at 15.8% when compared to group B at 6.3% and group C at 7.1% (Fig. 2). Accuracy and specificity of FNA increased as the size of the nodule increased (p=0.01 and p<0.01, respectively), and this difference was significant when comparing groups A and B individually with group C (Table 2).

Fig. 2. Thyroid nodules 0.5–0.9 cm had the highest rates of malignancy and false-negative fine-needle aspiration (FNA) results, when compared to larger nodules. *p=0.33; † p=0.25.
p<0.05 considered significant. No significant differences were seen in comparison of group A versus group B.
PPV, positive predictive value; NPV, negative predictive value.
Discussion
This study explored the 10-year experience of a single institution's thyroid nodule FNAs and the associated surgical outcomes to assess the effect of the thyroid nodule size on the likelihood of thyroid cancer and the accuracy of FNA. Nodules in our series varied from 0.5 to 10 cm in maximum diameter. In this large series of patients with FNA results meticulously linked to radiographic and surgical pathology results, we show conclusively that a nodule size ≥4 cm neither diminishes the accuracy of FNA nor increases the risk of malignancy within the biopsied nodule. Further, we describe an overall higher false-negative rate in nodules <1.0 cm, possibly due to difficulty aspirating smaller lesions with a resultant sampling of adjacent normal tissue.
Previous studies have attempted to link thyroid nodule size to FNA accuracy, specifically false-negative rates, with conflicting results (Tables 3 and 4). Carrillo et al. evaluated 135 nodules of all sizes, 74 of which were benign by FNA, and found a 12.2% false-negative rate overall. When these authors specifically reviewed nodules ≥4 cm (n=35), the false-negative rate increased to 20% (21). The authors concluded that a thyroid nodule diameter of ≥4 cm was a high-risk feature and should prompt surgical referral. In 71 patients with nodules ≥4.0 cm who had a thyroidectomy after a benign FNA, McCoy et al. found an FNA false-negative rate of 12.7% (16). Although the rate of malignancy was not significantly higher in larger nodules, these authors recommended surgery in patients with nodules ≥4 cm, regardless of FNA cytology results. Meko and Norton compared 90 patients' preoperative FNA to postsurgical pathology data and found a false-negative rate of 11% in nodules of all sizes (14). Kuru et al. evaluated a total of 601 nodules of all sizes and found an overall false-negative rate of 1.9% (17). False-negative rates were similar for nodules <4 cm (1.3%) and ≥4 cm (4.3%). When Porterfield et al. looked at 145 benign nodules that were ≥3 cm by US and subsequently excised, the false-negative rate was reported as 0.7% (1/145), which was lower than in most studies to date (23). Finally, Rosario et al. found a low false-negative rate of 3.6% in 151 patients with thyroid nodules ≥4 cm who were referred for thyroidectomy irrespective of FNA results. Their study population had a 22.5% malignancy rate after excision of the nodules (20). Our false-negative rate of 7% is within the range of previously published studies. Likely contributors to false-negative FNA results include both incorrect sampling technique (sampling adjacent benign tissue) and interpretation failure, which may be due to specimen insufficiency, preparation artifact, or simply erroneous interpretation.
The number of benign FNAs and the distribution based on size were identical between the Kuru study and ours, although the number of false negatives in each category was different.
FNA, fine-needle aspiration.
Several studies have assessed the suitability of nodule size as a predictor of malignancy irrespective of FNA results. McCoy et al. reviewed clinical data on 223 patients with thyroid nodules ≥4 cm and found an overall malignancy rate of 19.3% (16). Meko and Norton found a slightly higher prevalence of malignancy in nodules ≥3 cm compared with nodules <3 cm, but this did not achieve statistical significance (14). They concluded that a diagnostic lobectomy should be performed in all patients with thyroid nodules ≥3 cm, even when FNA is benign (14). Using sonographic criteria, Frates et al. evaluated 865 nodules sized >1.0 cm and found that size was not a significant predictor of malignancy; the highest size category used was ≥3 cm (26). McHenry et al. evaluated 207 thyroid nodules in patients undergoing thyroid surgery (18) and found that the mean nodule size was significantly larger in the benign group (n=164) than in the malignant group (n=43) (4.4±2.4 cm vs. 3.3±2.2 cm; p<0.001), with an overall malignancy rate of 20.8%.
In our study, we only included thyroid nodules from patients with documented measurements, with a digital micrometer, on a dedicated preoperative US. Approximately 95% of our FNAs since 1999 have been performed using US guidance. Previous studies have often failed to use a consistent method to evaluate the nodule size, including such techniques as manual palpation and/or ruler, preoperative computed tomography scan, and postsurgical histologic measurement. Currently, the standard method of evaluating thyroid nodule characteristics before and during FNA is US, both as a way of standardizing measurement and for accurate placement of the needle (8).
Our finding of a lower accuracy for FNA in nodules 0.5–1.0 cm is of potential interest. The 2009 Revised American Thyroid Association management guidelines recommend FNA of nodules of this size only in the presence of a high-risk history, suspicious US features, or abnormal cervical lymph nodes (24). The American Association of Clinical Endocrinologists recommends biopsy of nodules ≤1 cm only if suspicious US features are present (27). In 2008, Mazzaferri and Sipos recommended against FNA for nodules <0.5 cm due to the high rate of inadequate specimens (28). Subsequently, a retrospective study of 1440 nodules 0.2–1.0 cm found that 17.8% were insufficient and that nodule size was inversely related to the insufficiency rate (29). Looking specifically at thyroid nodules <1 cm in maximal diameter, Moon et al. (29) found a false-negative rate of 6.8%, similar to our results.
Our study has several potential limitations. First, a relatively large percentage of our surgical patients (60%) had a benign preoperative FNA. Approximately 10% of these nodules were removed concurrently with other suspicious nodules, which artificially increases the calculated surgical referral rate for benign nodules. In addition, our exclusion of nodules with insufficient FNAs from the denominator serves to overestimate the percentage of benign FNA nodules going to surgery. Other clinical factors contributing to the decision to refer for surgery despite a benign FNA include suspicious features on US, nodule growth, local compressive symptoms, nodule size, and patient concern or preference. Our dataset does not include non-FNA considerations leading to surgical referral in individual cases. The sampling of a patient population with a higher-than-average frequency of benign results would be expected to influence the accuracy of FNA, since both the positive and negative predictive values of a diagnostic test are affected by the prevalence of the disease being assessed in the population under study. However, this does not appear to be the case in our study, as the prevalence of malignancy in our surgical series (18.6%) is within the range reported in other large series from academic centers (7,30,31). Another limitation of our study was the retrospective reclassification of pre-2010 FNA results using the 2010 Bethesda System for Reporting Thyroid Cytopathology (25), based on the cytological description provided in the FNA result. Although we attempted to ensure reproducibility by performing independent assignment by two endocrinologists and evaluation of discordant results by an experienced cytopathologist, it is possible that this practice resulted in an underuse of the AUS/FLUS category. Next, we chose to classify indeterminate FNA results as false positive if the nodules were benign at the time of surgery. This practice has a tendency to increase the sensitivity and decrease the specificity of thyroid FNA (7). While an alternate approach would be to call these true-negative results, we felt it more accurate to label a sufficient FNA resulting in a decision to refer to surgery as positive. Finally, at a training institution such as ours, experience with FNA specimen procurement and interpretation can vary considerably and might account for the differences in our thyroid diagnostic categories and positive and negative predictive values when compared to other studies.
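The dependence of predictive values on prevalence noted above can be illustrated with a minimal sketch using Bayes' rule. Sensitivity (77.5%), specificity (68.6%), and the 18.6% prevalence are the values reported in this article; the 5% and 40% prevalences are hypothetical comparison points chosen for illustration only, and the function name is an assumption.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied at a given disease prevalence."""
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    false_pos = (1 - specificity) * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


if __name__ == "__main__":
    # 0.186 is the prevalence in this surgical series; 0.05 and 0.40 are
    # hypothetical values included only to show the direction of the effect.
    for prevalence in (0.05, 0.186, 0.40):
        ppv, npv = predictive_values(0.775, 0.686, prevalence)
        print(f"prevalence {prevalence:.3f}: PPV {ppv:.3f}, NPV {npv:.3f}")
```

With sensitivity and specificity held fixed, a higher prevalence raises the positive predictive value and lowers the negative predictive value, which is why the case mix of a surgical series matters when comparing predictive values across studies.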
In conclusion, and perhaps contrary to popular belief, the accuracy of US-guided FNA of thyroid nodules is superior in nodules ≥4 cm when directly compared to nodules 0.5–0.9 cm and 1.0–3.9 cm. Nodule size ≥4 cm is associated with neither an increased risk of false-negative results nor an increased overall risk of malignancy. We also show a trend toward a higher false-negative rate in nodules 0.5–0.9 cm and a lower false-negative rate in nodules ≥4 cm. Our results suggest that large nodule size should not prompt automatic referral for thyroidectomy, and that an increased awareness of potential sampling error in subcentimeter nodules is warranted.
Footnotes
Disclosure Statement
The authors declare that no financial conflicts of interest exist. We are military service members or employees of the U.S. Government. This work was prepared as part of our official duties. The views expressed in this article are those of the authors and do not reflect the official policy of the Department of the Army, the Department of the Navy, the Department of Defense, or the United States Government. Title 17 U.S.C. 105 provides, “Copyright protection under this title is not available for any work of the United States Government.” Title 17 U.S.C. 101 defines a U.S. Government work as a work prepared by a military service member or employee of the U.S. Government as part of that person's official duties. We certify that all individuals who qualify as authors have been listed; each has participated in the conception and design of this work, the analysis of data (when applicable), the writing of the document, and/or the approval of the submission of this version; that the document represents valid work; that if we used information derived from another source, we obtained all necessary approvals to use it and made appropriate acknowledgements in the document; and that each author takes public responsibility for this article.
