Abstract
Background:
Management of large thyroid nodules is controversial, as data are conflicting regarding overall rates of malignancy (ROM) in all nodules and frequency of false-negative fine-needle aspiration results (FNR) in cytologically benign nodules. This meta-analysis aimed to evaluate and compare ROM and FNR in small versus large nodules published in the literature.
Methods:
Articles indexed in PubMed, written in English, published electronically or in print on or prior to December 8, 2017 were searched for “false negative thyroid size or cm” and “malignancy rates benign thyroid nodules.” Three hundred fifty-two unique citations were identified. Multiple reviewers selected a final set of 35 articles that contained nodules stratified by size (3, 4, or 5 cm), with benign or all cytologic diagnoses, and with postsurgical histologic diagnoses. Multiple observers extracted data, including numbers of total, cytologically benign, and histologically malignant nodules. Size cutoffs of 3, 4, and/or 5 cm were analyzed in 14, 24, and 1 article, respectively.
Results:
In formal (fixed-effects) meta-analysis, ROM was lower in nodules ≥3 cm and ≥4 cm than in nodules <3 cm and <4 cm (odds ratio [OR] = 0.72 [confidence interval (CI) 0.64–0.81] and OR = 0.85 [CI 0.77–0.95], respectively); bulk ROM was 13.1% (≥3 cm) versus 19.6% (<3 cm) and 20.9% (≥4 cm) versus 19.9% (<4 cm). FNR in nodules ≥3 cm (7.2%) was not different from that in smaller nodules (5.7%; OR = 1.47 [CI 0.80–2.69]). FNR in nodules ≥4 cm (6.7%) was slightly higher than in those <4 cm (4.5%; OR = 1.38 [CI 1.06–1.80]). The most frequently reported false-negative diagnosis was papillary thyroid carcinoma.
Conclusions:
Rates of malignancy and false-negative FNA results vary but, in most studies, are not higher in larger nodules. Patients with large, cytologically benign thyroid nodules need not undergo immediate surgical resection, as false-negative FNA rates are low and are expected to decrease in light of the nomenclature revision of a subset of follicular variants of papillary thyroid carcinoma.
Introduction
The American Thyroid Association (ATA) recommends surgical resection of thyroid nodules with an FNA diagnosis of malignancy, with the exception of low-risk microcarcinomas, patients with high surgical risk or short life expectancy, or patients whose other comorbid conditions require more urgent treatment (3). In contrast, nodules with a benign FNA diagnosis in asymptomatic patients do not require additional immediate workup or treatment. Follow-up varies based on US features. Repeat US with FNA within one year is recommended for highly suspicious nodules. For minimally suspicious nodules, the utility of surveillance is limited, and repeat US should be deferred for at least two years. Resection of cytologically benign nodules is not routinely recommended but may be considered in the context of growth, patient symptoms, or other clinical concern (3). Some studies have attempted to correlate thyroid nodule size with risk of malignancy, as well as with false-negative FNA rates (nodules with a benign diagnosis on cytology but a malignant diagnosis on final surgical pathology following resection). Some authors report high rates of malignancy or false-negative FNAs in larger nodules (≥3 or 4 cm), whereas others report no difference based on size. Understandably, the ATA guidelines find the current data unsatisfactory and make no recommendation on treatment of cytologically benign nodules based on size alone: “Based on the evidence, it is still unclear if patients with thyroid nodules ≥4 cm and benign cytology carry a higher risk of malignancy and should be managed differently than those with smaller nodules” (3). Due to this ambiguity in the guidelines and conflicting reports in the literature, some surgeons elect to resect large thyroid nodules regardless of cytology results out of concern for the possibility of a false-negative FNA. Clarity on this topic is important, as it has significant implications for the decision to operate in patients with large nodules.
The aims of this meta-analysis were to review studies that have evaluated the rate of malignancy and/or the false-negative rate of thyroid nodules based on size and to analyze the published data in a homogeneous manner in order to define the true false-negative FNA rate and make recommendations for treatment of large thyroid nodules with benign cytology.
Review
Methods of meta-analysis
Three PubMed searches were performed on December 8, 2017: (i) using the words “false negative thyroid size,” resulting in 130 articles, of which 119 were written in English; (ii) using the words “false negative thyroid cm,” resulting in 113 articles, of which 105 were written in English; and (iii) using the words “malignancy rates benign thyroid nodules,” resulting in 215 articles, of which 203 were written in English. Duplicate articles from these searches were eliminated, leaving 352 unique articles. Forty-nine potentially relevant articles were identified, and an additional 13 articles were identified within the references, giving a total of 62 potentially relevant articles.
Potentially relevant articles were reviewed and included if they contained the following. (i) Size stratification of thyroid nodules using at least 3, 4, or 5 cm cutoffs. If size cutoffs of <3 cm were used, articles were excluded. (ii) Final diagnosis (benign or malignant) confirmed by histologic examination following surgical resection. If final diagnosis was not confirmed, articles were excluded. (iii) Nodules with benign (Bethesda II) cytologic diagnoses, or nodules with all Bethesda diagnoses. If nodules analyzed did not include Bethesda II, articles were excluded. Studies published prior to the Bethesda classification system were required to include either benign cytologic diagnosis or all cytologic diagnoses (including nondiagnostic, indeterminate, and malignant). Studies with only benign cytology were excluded from the overall rate of malignancy analyses. All studies were included in the false-negative rate analyses.
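The inclusion rules above can be expressed as a simple screening predicate. The sketch below is illustrative only: the `Study` record, its field names, and the reading of criterion (i) (every cutoff at least 3 cm, with at least one cutoff of exactly 3, 4, or 5 cm) are assumptions, not the authors' actual screening procedure, which is described only in prose.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Study:
    """Hypothetical record of the fields screened for each candidate article."""
    size_cutoffs_cm: tuple[float, ...]  # size cutoffs used to stratify nodules
    histology_confirmed: bool           # final diagnosis confirmed after resection
    cytology_scope: str                 # "benign_only", "all", or "other"


def is_included(study: Study) -> bool:
    """Apply criteria (i)-(iii) from the text (one possible reading)."""
    cutoffs = study.size_cutoffs_cm
    # (i) size stratification at 3, 4, or 5 cm; any cutoff below 3 cm excludes the article
    if not cutoffs or min(cutoffs) < 3 or not any(c in (3, 4, 5) for c in cutoffs):
        return False
    # (ii) final diagnosis confirmed by histology after surgical resection
    if not study.histology_confirmed:
        return False
    # (iii) Bethesda II nodules must be represented (benign-only or all categories)
    return study.cytology_scope in ("benign_only", "all")


def eligible_analyses(study: Study) -> set[str]:
    """Benign-only studies enter the false-negative analysis but not the ROM analysis."""
    if not is_included(study):
        return set()
    analyses = {"false_negative_rate"}
    if study.cytology_scope == "all":
        analyses.add("rate_of_malignancy")
    return analyses
```

For example, a benign-only study stratified at 4 cm would contribute to the false-negative analysis only, while a study covering all Bethesda categories would contribute to both analyses.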
Thirty-seven articles were included. Based on authors, institutions, and time frames, the data in two pairs of articles overlapped significantly (Kuru et al. [2009] overlapped with Kuru et al. [2010], and Parikh et al. [2013] overlapped with Albuja-Cruz et al. [2013]) (6–9). Therefore, the overlapping data in Kuru et al. (2009) and Parikh et al. (2013) were eliminated, leaving 35 main articles published between 1995 and 2017 (33 published in the last 10 years, between 2007 and 2017; Fig. 1).

Fig. 1. Flow chart of literature search.
The following data were collected from each article.
Overall rates of malignancy in nodules based on size cutoffs. Rates were calculated as (number malignant on final histology)/(total number resected). Only articles including all cytologic diagnoses were included in this analysis. In three articles, data were available such that size cutoffs of both 3 and 4 cm could be analyzed (10–12). Size was stated or assumed to be based on US unless otherwise indicated. Size was stated to be from final pathology in three articles (13–15).
False-negative rates in nodules based on size cutoffs. “False negative” was defined as a nodule with benign (Bethesda II) cytology but with malignancy within the nodule on final histologic exam. Rates were calculated as (number of Bethesda II nodules with malignant final histology)/(total number of Bethesda II nodules resected). In two articles, data were available such that size cutoffs of both 3 and 4 cm could be analyzed (12,16).
Whether incidental microcarcinomas found outside the index nodule were excluded in false-negative studies. In seven articles, this information could not be determined (17–23). In one article, nodules <1 cm were excluded from analysis, but the authors made no comment on correlation of size/location of final carcinomas to size/location of index nodules (10). In three articles, incidental microcarcinomas were included (11,17,24). In two articles, incidental carcinomas <5 mm or <1 cm were excluded unless they had extrathyroidal extension (25,26). In the remaining 22 articles, incidental microcarcinomas found outside the index nodule were excluded.
The final pathologic diagnoses of false-negative nodules. This information was not reported in 13 articles. It was available in the remaining 22 articles (7,8,12,14–16,18,20–25,27–35).
Whether the nodules were biopsied under US guidance (10,11,13,15–18,20,22,23,26,27,29–38), US or palpation guidance (7,9,14,25,39), or not reported (12,19,21,24,28,40,41).
Whether a pathologist was included in the article authors (10,12,13,15,17,19,20,22,23,26–29,31–33,35,37–39).
The conclusions of the article authors, and whether resection of large nodules was recommended.
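The two rate definitions listed above reduce to simple proportions. A minimal sketch, assuming per-study counts are available as integers; the function names and the example counts are hypothetical and are not taken from any included article.

```python
def rate_of_malignancy(n_malignant_on_histology: int, n_resected: int) -> float:
    """ROM: nodules malignant on final histology / all resected nodules (all cytologic categories)."""
    if n_resected <= 0:
        raise ValueError("n_resected must be positive")
    return n_malignant_on_histology / n_resected


def false_negative_rate(n_bethesda_ii_malignant: int, n_bethesda_ii_resected: int) -> float:
    """FNR: Bethesda II nodules malignant on final histology / all resected Bethesda II nodules."""
    if n_bethesda_ii_resected <= 0:
        raise ValueError("n_bethesda_ii_resected must be positive")
    return n_bethesda_ii_malignant / n_bethesda_ii_resected


# Hypothetical single-study counts, for illustration only
print(f"ROM = {rate_of_malignancy(45, 300):.1%}")   # 45 of 300 resected nodules malignant
print(f"FNR = {false_negative_rate(12, 200):.1%}")  # 12 of 200 Bethesda II nodules malignant
```

Note that the FNR denominator counts only resected Bethesda II nodules, so its value depends on which cytologically benign nodules a given institution chose to operate on.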
Extracted data were analyzed by the authors (M.G.W. and N.A.C.). When comparisons could be made between small nodules (less than, or less than or equal to, the respective size cutoff) and large nodules (greater than or equal to, or greater than, the respective size cutoff), the odds ratio (OR) and confidence interval (CI) were calculated using Stata v13 (StataCorp, College Station, TX). Bulk analysis calculating the OR for all published nodules was performed and is reported in the text and Table 1. Formal weighted meta-analyses and forest plots for applicable studies (those including both large and small, benign and malignant nodules) were also performed and are reported in the text and Figures 2 and 3. Both fixed- and random-effects models were performed in formal meta-analyses. Given the standardization of FNA cytology through the Bethesda criteria, a fixed-effects model was used for primary outcomes. Random-effects outcomes are reported in the text and in Supplementary Figures S1–S3. Given the large number of nodules included in this meta-analysis, any p-value <0.01 was considered statistically significant. The results were compared to the results and conclusions of the article authors.
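For a single study, the OR comparing large with small nodules comes from a 2×2 table of counts. The paper states only that ORs and CIs were computed in Stata v13; the sketch below instead assumes the common Woolf (log-based) CI and a Haldane–Anscombe 0.5 correction for zero cells, so its intervals may differ slightly from the published ones. The function name, cell layout, and example counts are illustrative.

```python
import math


def odds_ratio_woolf(a: float, b: float, c: float, d: float, z: float = 1.959964):
    """Odds ratio (large vs. small nodules) with a Woolf log-scale confidence interval.

    a: large nodules, malignant    b: large nodules, benign
    c: small nodules, malignant    d: small nodules, benign
    Returns (odds_ratio, lower, upper, se_of_log_or).
    """
    if min(a, b, c, d) == 0:
        # Haldane-Anscombe correction (an assumption here): add 0.5 to every cell
        a, b, c, d = a + 0.5, b + 0.5, c + 0.5, d + 0.5
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper, se_log_or


# Hypothetical counts: 10/100 large and 20/100 small nodules malignant
or_, lower, upper, _ = odds_ratio_woolf(10, 90, 20, 80)
print(f"OR {or_:.2f} [CI {lower:.2f}-{upper:.2f}]")
```

An OR below 1 indicates lower odds of malignancy in large nodules. Because significance here was set at p < 0.01, the matching interval would use z ≈ 2.576 (a 99% CI); the published CIs are consistent with conventional 95% intervals.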

Fig. 2. Forest plots showing overall risk of malignancy in thyroid nodules at size cutoffs of (A) 3 cm and (B) 4 cm.

Fig. 3. Forest plots showing false-negative rates in thyroid nodules at size cutoffs of (A) 3 cm and (B) 4 cm.
Table 1. Summary of Data from Articles Included in Meta-Analysis
Totals exclude benign-only and large-only studies.
Totals exclude large-only studies.
Article in which two size cutoffs were analyzed.
Nodules <1 cm excluded but no comment on correlation to index nodule.
CI, confidence interval; ETE, extrathyroidal extension; FNA, fine-needle aspiration; FTC, follicular thyroid carcinoma; HTC, Hürthle cell variant of FTC; MTC, medullary thyroid carcinoma; NIFTP, noninvasive follicular thyroid neoplasm with papillary-like nuclear features; NR, not reported; OR, odds ratio; Palp, palpation guided; PDC, poorly differentiated carcinoma; PTC, papillary thyroid carcinoma; PTC-FV, papillary thyroid carcinoma follicular variant; PTMC, papillary thyroid microcarcinoma; UNK, unknown; US, ultrasound-guided; US-Bx, ultrasound-guided FNA biopsy; UTC, undifferentiated thyroid carcinoma; WDT-UMP, well-differentiated thyroid neoplasm of uncertain malignant potential.
Results of meta-analysis
Of the 35 main articles, a size cutoff of 3 cm could be analyzed in 14 articles, and a size cutoff of 4 cm could be analyzed in 24 articles. (In three articles, size cutoffs of both 3 and 4 cm were analyzed; therefore, the number of analyses is greater than the number of articles.) At the 3 cm cutoff, all but two studies (28,38) defined large nodules as ≥3 cm; at the 4 cm cutoff, all but two studies (20,30) defined large nodules as ≥4 cm. The studies defining large nodules as >3 and >4 cm were included with the ≥3 and ≥4 cm groups, respectively. Therefore, in the text, the symbol “≥” is used to denote both “greater than” and “greater than or equal to.” In only one article was a cutoff of >5 cm used (Table 1).
Effect size: rates of malignancy by nodule size
This analysis included articles with all cytologic diagnoses; articles with only benign cytologic diagnoses were excluded. Comparisons between small and large nodules at a 3 cm cutoff size could be made in eight studies. The overall rates of malignancy in nodules <3 cm versus ≥3 cm ranged from 10.2% to 48.3% (mean 26.8%, median 18.5%) versus 11.1% to 33.3% (mean 20.3%, median 20.1%), respectively (10–13,17,18,25,27–29). The total number of nodules studied ranged from 84 to 7348 (mean 1370, median 439). In two studies, a statistically significant difference was not found between malignancy rates in smaller versus larger nodules (25,27). In five studies, larger nodules were found to have statistically significantly lower rates of malignancy than smaller nodules (10,12,13,17,28). In one study, larger nodules were found to have statistically significantly higher rates of malignancy than smaller nodules (11). Additionally, two studies analyzed only nodules ≥3 cm and showed malignancy rates of 10.2% and 31.6% (18,38). The malignancy rate of all nodules <3 cm (n = 4393) was compared to that of all nodules ≥3 cm (n = 7619). Bulk malignancy rates were 19.6% versus 13.1%, respectively (OR 0.62 [CI 0.56–0.69]). In a meta-analysis, nodules ≥3 cm showed a statistically significantly lower rate of malignancy with fixed-effects modeling, but no significant difference in risk with random-effects modeling (fixed effects: OR 0.72 [CI 0.64–0.81]; random effects: OR 0.73 [CI 0.48–1.11]; Fig. 2A and Supplementary Fig. S1A; Supplementary Data are available online at
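The gap between the fixed-effects interval (0.64–0.81) and the much wider random-effects interval (0.48–1.11) reflects between-study heterogeneity. The sketch below assumes inverse-variance pooling of log ORs and the DerSimonian–Laird estimator of the between-study variance τ²; the paper does not state which pooling methods its Stata analysis used, and the example inputs are synthetic, not data from the included studies.

```python
import math


def pool_log_odds_ratios(log_ors, variances, z=1.959964):
    """Fixed-effect (inverse-variance) and DerSimonian-Laird random-effects pooling.

    log_ors: per-study log odds ratios; variances: their within-study variances.
    Returns pooled ORs with CIs, tau^2 (between-study variance), and Cochran's Q.
    """
    k = len(log_ors)
    w = [1 / v for v in variances]
    sum_w = sum(w)
    fixed = sum(wi * y for wi, y in zip(w, log_ors)) / sum_w
    se_fixed = math.sqrt(1 / sum_w)

    # Cochran's Q measures dispersion of study estimates around the fixed-effect mean
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, log_ors))
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c) if k > 1 and c > 0 else 0.0

    # Random-effects weights add tau^2 to every study's variance
    w_re = [1 / (v + tau2) for v in variances]
    sum_w_re = sum(w_re)
    random_ = sum(wi * y for wi, y in zip(w_re, log_ors)) / sum_w_re
    se_random = math.sqrt(1 / sum_w_re)

    def as_or(est, se):
        return math.exp(est), math.exp(est - z * se), math.exp(est + z * se)

    return {"fixed": as_or(fixed, se_fixed), "random": as_or(random_, se_random),
            "tau2": tau2, "Q": q}


# Synthetic, deliberately heterogeneous inputs (not data from the included studies)
result = pool_log_odds_ratios([-0.8, 0.3, -0.2, 0.5], [0.02, 0.03, 0.05, 0.04])
print(result["fixed"], result["random"], round(result["tau2"], 3))
```

When τ² is zero (homogeneous studies), the two models coincide; when it is large, every study's weight shrinks and the random-effects interval widens, which is how a fixed-effects result can be significant while the random-effects result is not.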
Comparisons between small and large nodules at a 4 cm cutoff size could be made in 12 studies. The overall rates of malignancy in nodules <4 cm versus ≥4 cm ranged from 10.9% to 70.7% (mean 31.1%, median 27.2%) versus 8.5% to 57.5% (mean 26.6%, median 24.6%), respectively (6,9–12,19,20,30,35–37,39). The total number of nodules studied ranged from 95 to 7348 (mean 1644.5, median 692.5). In five studies, a statistically significant difference was not found between malignancy rates in smaller versus larger nodules (16,26,30,39,40). In five studies, larger nodules were found to have statistically significantly lower rates of malignancy than smaller nodules (9,12,20,35,36). In four studies, larger nodules were found to have statistically significantly higher rates of malignancy than smaller nodules (7,10,11,19). One additional study provided data only as percentages, so independent statistical analysis was not performed; its authors reported no significant difference between small and large nodules (p = 0.89) (37). Eight studies analyzed only nodules ≥4 cm and showed malignancy rates of 7.2–58.6% (mean 22.1%, median 20.5%); in these studies, comparisons to smaller nodules could not be made (14,15,21,24,31,32,34,41). The malignancy rate of all nodules <4 cm (n = 16,778) was compared to that of all nodules ≥4 cm (n = 4660). Bulk malignancy rates were 19.9% versus 20.9%, respectively (OR 1.06 [CI 0.98–1.16]). In a meta-analysis, nodules ≥4 cm showed a statistically significantly lower rate of malignancy with fixed-effects modeling, but no significant difference in risk with random-effects modeling (fixed effects: OR 0.85 [CI 0.77–0.95]; random effects: OR 0.87 [CI 0.62–1.22]; Fig. 2B and Supplementary Fig. S1B).
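The bulk ORs can be checked directly against the reported bulk rates, because an OR depends only on the two proportions being compared. A small sketch; since the published percentages are rounded to 0.1%, the recovered ORs are approximate, and a CI cannot be recovered without the underlying counts.

```python
def odds_ratio_from_rates(p_large: float, p_small: float) -> float:
    """Odds ratio of large vs. small nodules from two proportions (0-1 scale)."""
    odds_large = p_large / (1 - p_large)
    odds_small = p_small / (1 - p_small)
    return odds_large / odds_small


# Bulk rates of malignancy reported in the text
or_3cm = odds_ratio_from_rates(0.131, 0.196)  # >=3 cm vs <3 cm
or_4cm = odds_ratio_from_rates(0.209, 0.199)  # >=4 cm vs <4 cm
print(round(or_3cm, 2), round(or_4cm, 2))
```

Rounded to two decimals, these give 0.62 and 1.06, matching the reported bulk ORs at the 3 and 4 cm cutoffs.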
The rates of malignancy in nodules <5 cm versus >5 cm (a single study, 271 nodules) were 32% versus 18.2%, respectively (22). In this case, larger nodules had a statistically significantly lower rate of malignancy than smaller nodules (OR 0.47 [CI 0.25–0.91]).
Effect size: false-negative rates by nodule size
This analysis included articles reporting benign cytologic diagnoses; nodules without benign cytologic diagnoses were excluded. Comparisons between small and large nodules at a 3 cm cutoff size could be made in five studies. The overall false-negative rates in nodules <3 cm versus ≥3 cm ranged from 0% to 21.9% (mean 6.8%, median 4.8%) versus 6.7% to 16.7% (mean 10.3%, median 11.7%), respectively (11,12,16–18,25,27,29). In four studies, a statistically significant difference was not found between false-negative rates in smaller versus larger nodules (12,16,25,27). In one study, larger nodules were found to have statistically significantly lower false-negative rates than smaller nodules (17). Additionally, five studies analyzed only nodules ≥3 cm and showed false-negative rates ranging from 0.7% to 13.4%; comparisons to smaller nodules could not be made (15,18,23,29,33). One study showed extremely high false-negative rates of 43.7% and 77.3% in small versus large nodules, respectively; however, it did not exclude incidental papillary thyroid microcarcinomas (PTMC) outside the index nodule (11). Combining all studies using a cutoff size of 3 cm, the false-negative rate of all nodules <3 cm (n = 246) was calculated and compared to that of all nodules ≥3 cm (n = 1435). Bulk false-negative rates were 5.7% versus 7.2%, respectively (OR 1.29 [CI 0.72–2.49]). In a meta-analysis, false-negative rates in nodules ≥3 cm were not statistically significantly different from those in smaller nodules (fixed effects: OR 1.47 [CI 0.80–2.69]; random effects: OR 1.57 [CI 0.51–4.83]; Fig. 3A and Supplementary Fig. S2A).
Comparisons between small and large nodules at a 4 cm cutoff size could be made in 12 studies. The overall false-negative rates in nodules <4 cm versus ≥4 cm ranged from 1.3% to 28.2% (mean 9.0%, median 6.9%) versus 0% to 20% (mean 10%, median 8.2%), respectively (7,9,12,14,16,19–21,24,26,30–32,35,36,40–42). In 11 studies, a statistically significant difference was not found between false-negative rates in smaller versus larger nodules (7,9,12,16,19,20,26,30,36,40,42). In one study, false-negative rates were higher in larger nodules (35). Additionally, eight studies analyzed only nodules ≥4 cm and showed false-negative rates of 0–12.7% (mean 6.6%, median 7.1%); comparisons to smaller nodules could not be made (14,15,21,24,31,32,34,41). Combining all studies using a cutoff size of 4 cm, the false-negative rate of all nodules <4 cm (n = 3655) was calculated and compared to that of all nodules ≥4 cm (n = 2232). Bulk false-negative rates were 4.5% versus 6.7%, respectively (OR 1.07 [CI 0.85–1.36]). In a meta-analysis of only those studies with both small and large nodules at a 4 cm cutoff, nodules ≥4 cm had a slightly higher false-negative rate, although the difference did not reach our significance threshold of p < 0.01 (fixed effects: OR 1.38 [CI 1.06–1.80]; random effects: OR 1.38 [CI 1.03–1.85]; Fig. 3B and Supplementary Fig. S2B).
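Whether a reported OR clears the p < 0.01 threshold can be checked from its 95% CI alone, assuming the interval is symmetric on the log-odds scale. A minimal sketch under that assumption; rounding of the published limits makes the recovered p-values approximate, and the function name is illustrative.

```python
import math


def p_value_from_or_ci(odds_ratio: float, lower: float, upper: float,
                       z_95: float = 1.959964) -> float:
    """Approximate two-sided p-value recovered from an OR and its 95% CI.

    Assumes the CI is symmetric on the log-odds scale.
    """
    se = (math.log(upper) - math.log(lower)) / (2 * z_95)
    z = math.log(odds_ratio) / se
    # Two-sided normal p-value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2))
    return math.erfc(abs(z) / math.sqrt(2))


p_fixed = p_value_from_or_ci(1.38, 1.06, 1.80)   # fixed effects, 4 cm cutoff
p_random = p_value_from_or_ci(1.38, 1.03, 1.85)  # random effects, 4 cm cutoff
print(f"fixed p = {p_fixed:.3f}, random p = {p_random:.3f}")
```

Both values fall between 0.01 and 0.05, consistent with the text: the difference is nominally significant at the conventional 0.05 level but not at the stricter 0.01 threshold chosen for this meta-analysis.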
Six studies did not stratify false-negatives by size but showed overall rates of 0–15.4% (mean 8.0%, median 6.5%) (10,13,22,28,37,39).
Publication bias
Publication bias was assessed with the Begg and Mazumdar adjusted rank correlation test and Egger's regression test. Egger's bias was not significant at the 3 cm cutoff (0.70; p = 0.65) or at the 4 cm cutoff (−0.49; p = 0.92; Supplementary Fig. S3A and B). Notably, however, there were only nine studies within the 3 cm cutoff group, fewer than the minimum of 10 studies typically recommended for testing funnel plot asymmetry (43). While interpretation of the 3 cm group was limited by the number of studies, neither group demonstrated clear publication bias, although significant inter-study heterogeneity existed.
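Egger's test regresses each study's standardized effect (log OR divided by its standard error) on its precision (1/SE); an intercept far from zero suggests small-study effects such as publication bias. The sketch below computes only the intercept and its standard error by ordinary least squares; the t-test p-value (n − 2 degrees of freedom) is omitted because Python's standard library has no t distribution, and the inputs are synthetic. The published estimates came from the authors' Stata analysis, not from this code.

```python
import math


def egger_intercept(log_ors, ses):
    """Egger's regression: standardized effect (log OR / SE) on precision (1 / SE).

    Returns the intercept (the 'bias' coefficient) and its OLS standard error.
    """
    n = len(log_ors)
    if n < 3 or n != len(ses):
        raise ValueError("need at least 3 studies with matching standard errors")
    x = [1 / s for s in ses]                         # precision
    y = [e / s for e, s in zip(log_ors, ses)]        # standardized effect
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("standard errors must vary across studies")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    s2 = sum(r ** 2 for r in residuals) / (n - 2)    # residual variance, n - 2 df
    se_intercept = math.sqrt(s2 * (1 / n + mean_x ** 2 / sxx))
    return intercept, se_intercept


# Synthetic example: smaller studies (larger SE) report larger effects
intercept, se = egger_intercept([0.1, 0.3, 0.5, 0.7, 0.9], [0.1, 0.2, 0.3, 0.4, 0.5])
print(round(intercept, 3), round(se, 6))
```

With nine or fewer studies the intercept is estimated imprecisely, which is why results for small groups such as the 3 cm analysis warrant caution.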
Histology of false-negative diagnoses
Of the six articles that documented false-negative diagnoses in nodules ≥3 cm, the most frequently reported diagnosis within the index nodule was follicular variant of papillary thyroid carcinoma (FV-PTC; 32 cases), followed by follicular thyroid carcinoma (FTC; 24 cases, including one Hürthle cell carcinoma [HTC]), PTC (10 cases, including one oxyphilic, one tall cell, and one with “mixed” histologic patterns), two undifferentiated or anaplastic thyroid carcinomas (UTC), and one poorly differentiated thyroid carcinoma (PDC) (16,18,23,25,29,33). One of these also reported four FV-PTC in nodules <3 cm (16). Of the nine articles that documented false-negative diagnoses in nodules ≥4 cm, the most frequently reported diagnoses within the index nodule were PTC (38 cases) and FV-PTC (28 cases, including two deemed well-differentiated thyroid neoplasm of uncertain malignant potential), followed by FTC (24 cases, including three HTC), three PTMC, one medullary thyroid carcinoma (MTC), and three other (including one lymphoma and two unspecified) (7,8,14,15,21,24,31,32,35). One of these also reported 47 PTC, 5 FTC, and 3 FV-PTC (deemed well-differentiated thyroid neoplasm of uncertain malignant potential) in nodules <4 cm (35). One article that reported false-negative diagnoses at cutoffs of 3 and 4 cm showed one PTC <3 cm, one FV-PTC and one PTC between 3 and 4 cm, and two FV-PTC ≥4 cm (12). Four articles reported false-negative diagnoses without specifying size (20,22,27,30); in these, the reported diagnoses were PTC (28 cases), FTC (8), FV-PTC (5), PTMC (2), and one each of MTC and UTC. Overall, 180 false-negative diagnoses within the index nodule were reported: the most frequent was PTC (125; 39.4%), followed by FV-PTC (75; 23.7%), FTC (61; 19.2%, including four HTC), PTMC (5; 1.6%), UTC (3; 0.9%), MTC (2; 0.6%), and four other (one PDC, one lymphoma, two unknown).
Conclusions of original authors
Authors of 31 articles made recommendations on whether to resect all large, cytologically benign thyroid nodules. Overall, 7 (22.6%) recommended surgical resection, and 24 (77.4%) did not. Of authors who made recommendations based on a 3 cm size cutoff, 4/9 (44.4%) recommended surgical resection (11,16,25,38), and 5/9 (55.6%) did not recommend resection (13,17,18,29), including one who recommended close follow-up with repeat US-guided FNA or surgery in large nodules (27). Of authors who made recommendations based on a 4 cm size cutoff, 3/21 (14.3%) recommended surgical resection (14,31,32), and 18/21 (85.7%) did not recommend resection (7,10,15,24,26,30,34,36,37,39–42), including (i) the only article whose data showed slightly higher false-negative rates in larger nodules (35), (ii) one that additionally recommended personalized practice driven by institutional false-negative rates (21), (iii) two that recommended close follow-up with repeat US-guided FNA or surgery (19,20), and (iv) one that recommended consideration of resection of large nodules in women but not men (8). One set of authors evaluated both 3 and 4 cm cutoffs and did not recommend surgical resection (12).
Summary
This meta-analysis of 35 articles evaluated the overall rates of malignancy and false-negative rates in thyroid nodules by size, using US size cutoffs of 3 and 4 cm. In bulk analyses, larger nodules did not have an increased risk of malignancy compared to smaller nodules: 19.6% (<3 cm) versus 13.1% (≥3 cm) and 19.9% (<4 cm) versus 20.9% (≥4 cm). Similarly, bulk false-negative rates did not differ significantly: 5.7% (<3 cm) versus 7.2% (≥3 cm) and 4.5% (<4 cm) versus 6.7% (≥4 cm; Table 1). Results were similar in formal meta-analyses, with the exception of slightly higher false-negative rates in nodules ≥4 cm (Fig. 3B). This slight difference in nodules ≥4 cm is not statistically significant at the assigned p-value cutoff of 0.01, is not considered clinically meaningful, and is still less than the false-negative rate in nodules ≥3 cm. Based on these data, surgical resection of large cytologically benign nodules is not recommended in the absence of other clinical indications for resection.
More than 20,000 pooled nodules were analyzed in this study, >7000 with benign cytology. Pooling this many nodules yields far more precise estimates than any single study can provide. In 21 data sets, malignancy rates could be compared by size; the majority (81%) showed similar or lower rates of malignancy in larger nodules. False-negative rates could be compared in 17 data sets, only one of which showed higher false-negative rates in larger nodules. In light of these findings, it is surprising that multiple authors recommend surgical resection of large cytologically benign nodules due to perceived (but not actual) high false-negative rates. Of the seven articles that recommended resection, one included incidental microcarcinomas in the false negatives, yielding very high false-negative rates (11), and four did not evaluate small nodules (14,31,32,38). In the remaining two, our analysis did not find a statistically significant difference between false-negative rates in small and large nodules (16,25). Giles et al. (16) reported statistical significance using a size cutoff of 3 cm (p = 0.03), whereas Meko et al. did not perform statistical analysis on this subgroup (25). The recommendations to resect may have been driven more by institutional preferences and practices than by data.
Missed “follicular lesions”
Two groups that strongly recommended resection of large nodules also claimed that benign cytology may miss or incorrectly classify “follicular lesions,” namely follicular adenomas (including oncocytic or Hürthle cell adenomas). McCoy et al. and Pinchot et al. found that 26.7% and 42.3%, respectively, of cytologically benign nodules ≥4 cm were diagnosed as follicular adenomas (14,32). Since follicular neoplasms require resection to distinguish follicular adenoma from carcinoma, these authors argue that follicular adenomas with benign cytology may be inappropriately treated nonsurgically. However, patients with cytologically benign nodules that are not immediately resected should still be followed clinically. Repeat FNA or subsequent resection may be prompted by growth, worrisome US features, or patient symptoms, which are likely to occur in cases of true follicular carcinoma. The natural history of follicular neoplasms is not well understood, but malignant transformation is thought to be rare (44). Similarly, repeat biopsy showing malignancy in a cytologically benign nodule has only rarely been reported (45–47). Additionally, not all follicular adenomas are cellular or microfollicular enough to prompt a diagnosis of follicular lesion of undetermined significance (FLUS) or follicular neoplasm on FNA. The Bethesda System notes that FNA cannot distinguish a dominant nodule in a multinodular goiter from a solitary colloid-rich, macrofollicular adenoma, such that both may receive a benign Bethesda II diagnosis (48). The World Health Organization Classification of Tumours of Endocrine Organs also confirms that follicular adenomas may contain follicles of varying sizes with varying amounts of intrafollicular colloid (49). In light of the vague criteria and nonspecific terminology, pathologists do not render uniform diagnoses: what one may call a follicular adenoma, another may call a hyperplastic, adenomatoid, or colloid nodule.
The only way to determine true clonality in a nodule (i.e., adenoma) is via genetic analysis, which is not routinely performed on benign thyroid lesions. In a clonality analysis using X-inactivation, Apel et al. found that 18/27 (67%) histologically hyperplastic thyroid nodules were monoclonal and morphologically identical to the polyclonal cases (50). Others have supported the clonal origin of a subset of benign nodules in multinodular goiters (51–55). Benign adenomatoid or histologically hyperplastic nodules may also harbor clonal RAS point mutations (53,56–58). Due to the interobserver variability among both surgical pathologists and cytopathologists, caution must be exercised when evaluating so-called “missed follicular lesions,” as their significance is unclear (4,59).
Since follicular carcinomas are usually larger than PTC (60), higher false-negative rates for FTC might be expected in larger compared to smaller nodules. Missed FTC were reported in 13 studies. However, as most studies reported findings only in larger nodules or did not differentiate false-negative diagnoses based on size, comparisons based on size could not be made.
Missed “follicular variants of papillary”
The Endocrine Pathology Society recently proposed a revision of the nomenclature of noninvasive encapsulated follicular variant of papillary thyroid carcinoma (FV-PTC) to “noninvasive follicular thyroid neoplasm with papillary-like nuclear features” (NIFTP) if strict diagnostic criteria are met (61). This change will affect rates of malignancy and false-negative rates. In this meta-analysis, considering all 20 articles that reported false-negative diagnoses within the index nodule, PTC and FV-PTC were the most frequent (39.4% and 23.7%, respectively). Some of these PTC likely represent NIFTP, which should be treated by lobectomy and followed clinically. However, like follicular adenomas, not all NIFTP are microfollicular or highly cellular. Some are macrofollicular or colloid-rich, and a clinically aggressive but cytologically benign NIFTP could also be brought to attention by worrisome clinical or radiographic features. Various institutions have reported decreases in the malignancy rates of cytologically benign (Bethesda II) nodules after removal of NIFTP from the malignant category (relative decreases ranging from 3% to 60%) (62,63). False-negative rates are therefore expected to decrease as well. In this meta-analysis, three studies published after the NIFTP proposal addressed this concept, and each recognized only three NIFTP. In the studies by Cavallo et al., Bakkar et al., and Nam et al., the false-negative rates of nodules >3 cm were 1.4%, 6.4%, and 13.4%, respectively, with NIFTP considered benign, and 5.8%, 7.9%, and 15.2% with NIFTP considered malignant (12,23,33).
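The effect of reclassifying NIFTP can be expressed as a relative decrease in false-negative rate, the same metric used for the institutional reports of malignancy rates. A short sketch using the rates quoted for the three post-NIFTP studies; the dictionary layout and function name are illustrative.

```python
def relative_decrease(before: float, after: float) -> float:
    """Relative percent decrease from `before` to `after`."""
    return 100 * (before - after) / before


# False-negative rates (%) in nodules >3 cm quoted in the text:
# (NIFTP counted as malignant, NIFTP counted as benign)
quoted_rates = {
    "Cavallo et al.": (5.8, 1.4),
    "Bakkar et al.": (7.9, 6.4),
    "Nam et al.": (15.2, 13.4),
}

for study, (niftp_malignant, niftp_benign) in quoted_rates.items():
    change = relative_decrease(niftp_malignant, niftp_benign)
    print(f"{study}: {change:.1f}% relative decrease")
```

This gives relative decreases of roughly 76%, 19%, and 12%, respectively; since each study recognized only three NIFTP, these relative figures rest on very few cases.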
Interobserver variability
Rates of malignancy in cytologically benign nodules are highly variable and depend upon individual and institutional practices of surgeons, endocrinologists, radiologists, and pathologists. First, false-negative rates may be affected by variability in sampling practices. At the authors' institution, FNA under US guidance by an interventional radiologist with adequacy assessment by a cytopathologist is favored, in order to characterize the imaging features of the nodule and target the nodule or region of interest more accurately. Palpation-guided aspiration is not recommended and is hypothesized to result in an increased number of inadequately or inappropriately sampled specimens (64,65). In this meta-analysis, 22 articles utilized US-guided FNA, six US- or palpation-guided FNA, and seven did not specify the method used. Although comparisons are difficult given the high rates of US use, studies using a mixture of biopsy techniques had overall false-negative rates ranging from 1.9% to 13% (6,14,25,39,42). These rates are similar to those of studies reporting only US guidance (mean 15.2%).
Second, interobserver variability among cytopathologists and surgical pathologists may affect false-negative rates. Cibas et al. evaluated differences between an academic cytopathology panel and local community cytopathologists and found that overall concordance in the standard six-category Bethesda system was 64%. Furthermore, they found that local pathologists made fewer benign cytologic diagnoses, but the risk of malignancy of their benign diagnoses was slightly higher. In other words, academic pathologists made more benign diagnoses with a concurrent decrease in the final false-negative rate. Additionally, overall concordance in a two-tier histopathology system (benign vs. malignant) was 90.7%, with the most common disagreements being in PTC and FV-PTC (4). Unanimous agreement in diagnosis of FV-PTC/NIFTP, even among expert pathologists, is rare (in some studies only 13%) and depends upon the perceived severity of nuclear features (66). In this regard, a diagnosis of follicular adenoma, atypical follicular neoplasm, or papillary carcinoma may be given to the same nodule by different pathologists. FTC present a similar challenge, namely variability in interpretation of the presence and extent of capsular or vascular invasion (67). Again, the same nodule may be given a diagnosis of adenoma or carcinoma by different pathologists. In summary, false-negative rates may be higher in community cytopathology practices, or in surgical pathology practices that interpret diagnostically equivocal nodules aggressively.
Clinical indications for resection of cytologically benign nodules
While this meta-analysis found that cytologically benign nodules need not be resected based on large size alone, other features may serve as indications for resection. The ATA advocates repeat biopsy after one to two years for sonographically suspicious or growing nodules (3). Large nodules might still require resection if they are symptomatic, growing, of cosmetic concern, or substernal. Symptoms may include difficulty swallowing, difficulty breathing, or tracheal deviation. Young patients in whom nodule growth or symptomatology is predicted may elect to undergo surgical resection. Some physicians advocate removal of substernal goiters because they are likely to show progressive enlargement and cannot be easily biopsied or monitored for cancer (68).
Limitations
In most studies, moderate numbers of large nodules were evaluated, likely due to low resection rates of cytologically benign nodules in standard practice. All but two were retrospective, leading to a selection bias in most studies. Kuru et al. claimed to recruit patients prospectively. However, not all cytologically benign nodules were resected, and the patient population in this study was similar to that in the retrospective studies (7). Therefore, only the study by Rosario et al. was truly prospective, such that all patients with thyroid nodules ≥4 cm on US and no contraindication to or refusal of surgery proceeded to surgical resection regardless of the FNA result (21). In this study, the false-negative rate in large nodules was low (3.6%), and two of three were FV-PTC. A number of retrospective studies reported that surgical resection was routinely offered to patients with large nodules (16,17,27,31,32,41). However, it is unclear whether all patients with large nodules underwent resection. Therefore, some level of selection bias is likely still present in these studies.
Additionally, this study did not evaluate the sonographic features of thyroid nodules in the context of decision to biopsy and/or resect. While sonographic features play a role in risk stratification and clinical decision making in thyroid nodules, these features were not uniformly reported. The evaluation of imaging characteristics was not a goal of this study. However, differences in imaging characteristics of nodules may have affected the included data.
Another limitation is that only 20/35 articles included a pathologist among the authors (10,12,13,15,17,19,20,22,26–29,31,32,35,37–39). Cytologic-histologic comparison is not always straightforward, and difficulties may arise when confirming that the nodule sampled on FNA was appropriately correlated to the surgical specimen. Inclusion of a pathologist to assist with pathologic correlation is encouraged.
Conclusions
Resection of cytologically benign (Bethesda II) thyroid nodules with large size on US (≥3, 4, or 5 cm) is not recommended in the absence of other clinical indications in light of three main factors: (i) false-negative rates in larger nodules are low, widely variable, and, in most studies, not significantly different from false-negative rates in smaller nodules; (ii) false-negative rates are highly institution- and practice-dependent; and (iii) false-negative rates are expected to decrease in light of new recommendations on the nomenclature of NIFTP versus FV-PTC. Although a number needed to treat or cost–benefit analysis was not performed, surgical resection may lead to increased morbidity (physical, psychological, and possibly financial) compared to close clinical follow-up of cytologically benign nodules. Close follow-up (including repeat US and/or FNA) can identify patients who ultimately require resection, and those with indolent disease may avoid surgery for cytologically benign but large thyroid nodules. Based on these findings, the authors have stopped recommending thyroidectomy for large cytologically benign nodules based on size alone.
Footnotes
Author Disclosure Statement
No competing financial interests exist.
