Large Cytologically Benign Thyroid Nodules Do Not Have High Rates of Malignancy or False-Negative Rates and Clinical Observation Should be Considered: A Meta-Analysis

Abstract

Background:

Management of large thyroid nodules is controversial, as data are conflicting regarding overall rates of malignancy (ROM) in all nodules and frequency of false-negative fine-needle aspiration results (FNR) in cytologically benign nodules. This meta-analysis aimed to evaluate and compare ROM and FNR in small versus large nodules published in the literature.

Methods:

Articles indexed in PubMed, written in English, published electronically or in print on or prior to December 8, 2017 were searched for “false negative thyroid size or cm” and “malignancy rates benign thyroid nodules.” Three hundred fifty-two unique citations were identified. Multiple reviewers selected a final set of 35 articles that contained nodules stratified by size (3, 4, or 5 cm), with benign or all cytologic diagnoses, and with postsurgical histologic diagnoses. Multiple observers extracted data, including numbers of total, cytologically benign, and histologically malignant nodules. Size cutoffs of 3, 4, and/or 5 cm were analyzed in 14, 24, and 1 article, respectively.

Results:

ROM in all nodules ≥3 cm (13.1%) and ≥4 cm (20.9%) was lower than those <3 cm (19.6%) and <4 cm (19.9%; odds ratio [OR] = 0.72 [confidence interval (CI) 0.64–0.81] and OR = 0.85 [CI 0.77–0.95]). FNR in nodules ≥3 cm (7.2%) was not different from smaller nodules (5.7%; OR = 1.47 [CI 0.80–2.69]). FNR in nodules ≥4 cm (6.7%) was slightly higher than those <4 cm (4.5%; OR = 1.38 [CI 1.06–1.80]). The most frequently reported false-negative diagnosis was papillary thyroid carcinoma.

Conclusions:

Rates of malignancy and false-negative FNA results vary but, in most studies, are not higher in larger nodules. Patients with large, cytologically benign thyroid nodules need not undergo immediate surgical resection, as false-negative FNA rates are low and are expected to decrease in light of nomenclature revision of a subset of follicular variants of papillary thyroid carcinoma.

Introduction

Workup of thyroid nodules involves a multidisciplinary approach, including serologic thyroid function testing, manual palpation, ultrasonographic (US) evaluation, and fine-needle aspiration (FNA) biopsy. The Bethesda System for Reporting Thyroid Cytopathology (TBSRTC) was devised to stratify patients into tiers with increasing risk for cancer based on features identified on FNA (1, 2). Nodules are categorized into one of six diagnostic categories: nondiagnostic (Bethesda I), benign (Bethesda II), atypia of undetermined significance (AUS) or follicular lesion of undetermined significance (FLUS; Bethesda III), follicular neoplasm or suspicious for follicular neoplasm (Bethesda IV), suspicious for malignancy (Bethesda V), or malignant (Bethesda VI). Estimated risks of malignancy in categories II through VI are approximately 3%, 15%, 30%, 75%, and 99%, respectively. However, actual percentages vary by institution (1–5).

The American Thyroid Association (ATA) recommends surgical resection of thyroid nodules with an FNA diagnosis of malignancy, with the exception of low-risk microcarcinomas, patients with high surgical risk or short life expectancy, or patients whose other comorbid conditions require more urgent treatment (3). In contrast, nodules with a benign FNA diagnosis in asymptomatic patients do not require additional immediate workup or treatment. Follow-up varies based on US features. Repeat US with FNA within one year is recommended for highly suspicious nodules. For minimally suspicious nodules, the utility of surveillance is limited, and repeat US should be performed no sooner than two years. Resection of cytologically benign nodules is not routinely recommended but may be considered in the context of growth, patient symptoms, or other clinical concern (3). Some studies have attempted to correlate thyroid nodule size with risk of malignancy, as well as with false-negative FNA rates (nodules with a benign diagnosis on cytology but a malignant diagnosis on final surgical pathology following resection). Some authors report high rates of malignancy or false-negative FNAs in larger nodules (≥3 or 4 cm), whereas others report no difference based on size. Understandably, the ATA guidelines find the current data unsatisfactory and make no recommendation on treatment of cytologically benign nodules based on size alone: “Based on the evidence, it is still unclear if patients with thyroid nodules ≥4 cm and benign cytology carry a higher risk of malignancy and should be managed differently than those with smaller nodules” (3). Due to this ambiguity in the guidelines and conflicting reports in the literature, some surgeons elect to resect large thyroid nodules regardless of cytology results out of concern for the possibility of a false-negative FNA. Clarity on this topic is important, as it has significant implications for the decision to operate in patients with large nodules.

The aims of this meta-analysis were to review studies that have evaluated the rate of malignancy and/or the false-negative rate of thyroid nodules based on size and to analyze the published data in a homogeneous manner in order to define the true false-negative FNA rate and make recommendations for treatment of large thyroid nodules with benign cytology.

Review

Methods of meta-analysis

Three PubMed searches were performed on December 8, 2017: (i) using the words “false negative thyroid size,” resulting in 130 articles, of which 119 were written in English; (ii) using the words “false negative thyroid cm,” resulting in 113 articles, of which 105 were written in English; and (iii) using the words “malignancy rates benign thyroid nodules,” resulting in 215 articles, of which 203 were written in English. Duplicate articles from these searches were eliminated, leaving 352 unique articles. Forty-nine potentially relevant articles were identified, and an additional 13 articles were identified within the references, giving a total of 62 potentially relevant articles.

Potentially relevant articles were reviewed and included if they contained the following. (i) Size stratification of thyroid nodules using at least 3, 4, or 5 cm cutoffs. If size cutoffs of <3 cm were used, articles were excluded. (ii) Final diagnosis (benign or malignant) confirmed by histologic examination following surgical resection. If final diagnosis was not confirmed, articles were excluded. (iii) Nodules with benign (Bethesda II) cytologic diagnoses, or nodules with all Bethesda diagnoses. If nodules analyzed did not include Bethesda II, articles were excluded. Studies published prior to the Bethesda classification system were required to include either benign cytologic diagnosis or all cytologic diagnoses (including nondiagnostic, indeterminate, and malignant). Studies with only benign cytology were excluded from the overall rate of malignancy analyses. All studies were included in the false-negative rate analyses.

Thirty-seven articles were included. Based on authors, institutions, and time frames, the data in two articles overlapped significantly (Kuru et al. [2009] overlapped with Kuru et al. [2010] and Parikh et al. [2013] overlapped with Albuja-Cruz et al. [2013]) (6–9). Therefore, overlapping data in Kuru et al. (2009) and Parikh et al. (2013) were eliminated, leaving 35 main articles published between 1995 and 2017 (33 published in the last 10 years between 2007 and 2017; Fig. 1).

FIG. 1.

Flow chart of literature search.

The following data were collected from each article.

Overall rates of malignancy in nodules based on size cutoffs. Rates were calculated as: # malignant on final histology/total # resected. Only articles including all cytologic diagnoses were included in this analysis. In three articles, data were available such that size cutoffs of both 3 and 4 cm could be analyzed (10–12). Size was stated or assumed to be based on US unless otherwise indicated. Size was stated to be from final pathology in three articles (13–15).

False-negative rates in nodules based on size cutoffs. “False negative” was defined as a nodule with benign (or Bethesda II) cytology but with malignancy within the nodule on final histologic exam. Rates were calculated as: # Bethesda II with malignant final histology/total # Bethesda II resected. In two articles, data were available such that size cutoffs of both 3 and 4 cm could be analyzed (12,16).

Whether incidental microcarcinomas found outside the index nodule were excluded in false-negative studies. In seven articles, this information could not be determined (17–23). In one article, nodules <1 cm were excluded from analysis, but the authors made no comment on correlation of size/location of final carcinomas to size/location of index nodules (10). In three articles, incidental microcarcinomas were included (11,17,24). In two articles, incidental carcinomas <5 mm or <1 cm were excluded unless they had extrathyroidal extension (25,26). In the remaining 22 articles, incidental microcarcinomas found outside the index nodule were excluded.

The final pathologic diagnoses of false-negative nodules. This information was not reported in 13 articles. It was available in the remaining 22 articles (7,8,12,14–16,18,20–25,27–35).

Whether the nodules were biopsied under US guidance (10,11,13,15–18,20,22,23,26,27,29–38), US or palpation guidance (7,9,14,25,39), or not reported (12,19,21,24,28,40,41).

Whether a pathologist was included in the article authors (10,12,13,15,17,19,20,22,23,26–29,31–33,35,37–39).

Whether the study was retrospective or prospective (7,21).

The conclusions of the article authors, and whether resection of large nodules was recommended.
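
The two rate definitions above can be illustrated with a minimal sketch. This is not the authors' code; the function names are hypothetical, and the counts are taken from the Ucler 2015 row of Table 1.

```python
def malignancy_rate(malignant: int, resected: int) -> float:
    """Overall rate of malignancy: # malignant on final histology / total # resected."""
    if resected <= 0:
        raise ValueError("resected must be a positive count")
    return malignant / resected


def false_negative_rate(benign_cytology_malignant: int, benign_cytology_resected: int) -> float:
    """False-negative rate: # Bethesda II with malignant histology / total # Bethesda II resected."""
    if benign_cytology_resected <= 0:
        raise ValueError("benign_cytology_resected must be a positive count")
    return benign_cytology_malignant / benign_cytology_resected


# Ucler 2015 (Table 1): 75 malignant of 267 resected; 13 malignant of 122 cytologically benign.
rom = malignancy_rate(75, 267)
fnr = false_negative_rate(13, 122)
print(f"ROM {100 * rom:.1f}%, FNR {100 * fnr:.1f}%")
```

The denominators differ: the rate of malignancy uses all resected nodules, whereas the false-negative rate uses only resected nodules with benign cytology.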

Extracted data were analyzed by the authors (M.G.W. and N.A.C.). When comparisons between small nodules (less than, or less than or equal to, the respective size cutoff) and large nodules (greater than or equal to, or greater than, the respective size cutoff) could be made, the odds ratio (OR) and confidence interval (CI) were calculated using Stata v13 (StataCorp, College Station, TX). Bulk analysis calculating the OR for all published nodules was performed and is reported in the text and Table 1. Formal weighted meta-analyses and Forest plots for applicable studies (those including both large and small, benign and malignant nodules) were also performed and are reported in the text and Figures 2 and 3. Both fixed- and random-effects models were performed in formal meta-analyses. Given the standardization of FNA cytology through the Bethesda criteria, a fixed-effects model was used for primary outcomes. Random-effects outcomes are reported in the text and in Supplementary Figures S1–S3. Given the large number of nodules included in this meta-analysis, any p-value <0.01 was considered statistically significant. The results were compared to the results and conclusions of the article authors.
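
As a hedged illustration of the per-study comparison, the sketch below computes an OR with a log-scale (Woolf) interval from a 2×2 table. The analysis itself was run in Stata; the 95% level, the Woolf method, and the function name are assumptions made here for illustration. The counts come from the Deveci 2007 row of Table 1.

```python
import math


def odds_ratio_ci(large_malignant, large_total, small_malignant, small_total, z=1.96):
    """Odds ratio (large vs. small nodules) with a Woolf log-scale confidence interval."""
    a = large_malignant                 # large, malignant
    b = large_total - large_malignant   # large, benign
    c = small_malignant                 # small, malignant
    d = small_total - small_malignant   # small, benign
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for this simple estimator")
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return math.exp(log_or), math.exp(log_or - z * se), math.exp(log_or + z * se)


# Deveci 2007: 22/199 malignant in large nodules, 84/465 in small nodules.
or_, lower, upper = odds_ratio_ci(22, 199, 84, 465)
print(f"OR {or_:.2f} [CI {lower:.2f}-{upper:.2f}]")  # OR 0.56 [CI 0.34-0.93]
```

An OR below 1 with an interval that excludes 1 corresponds to a lower malignancy rate in the larger nodules.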

FIG. 2.

Forest plots showing overall risk of malignancy in thyroid nodules at size cutoffs of (A) 3 cm and (B) 4 cm using fixed-effects modeling.

FIG. 3.

Forest plots showing false-negative rates in thyroid nodules at size cutoffs of (A) 3 cm and (B) 4 cm using fixed-effects modeling.

Table 1.

Summary of Data from Articles Included in Meta-Analysis

			All nodules ^a			Small nodules			Large nodules				All nodules with benign cytology ^b
Article	Cytologic diagnoses included	Cutoff size (cm)	Total	Malignant	% Malignant	Total	Malignant	% Malignant	Total	Malignant	% Malignant	OR [CI]	Total	Malignant	% False negative
Deveci 2007 (28)	ALL	>3	664	106	16.0%	465	84	18.1%	199	22	11.1%	0.56 [0.34–0.93]	59	0	0.0%
Ucler 2015 (17)	ALL	≥3	267	75	28.1%	123	52	42.3%	144	23	16.0%	0.26 [0.15–0.46]	122	13	10.7%
Cavallo 2017^c (12)	ALL	≥3	548	208	38.0%	382	163	42.7%	166	45	27.1%	0.50 [0.34–0.74]	155	5	3.2%
Magister 2015 (13)	ALL	≥3	326	148	45.4%	263	127	48.3%	63	21	33.3%	0.54 [0.30–0.95]	67	4	6.0%
Kamran 2013^c (10)	ALL	≥3	7348	927	12.6%	1771	279	15.8%	5577	648	11.6%	0.70 [0.60–0.82]
Meko 1995 (25)	ALL	≥3	84	17	20.2%	32	6	18.8%	52	11	21.2%	1.16 [0.38–3.53]	42	5	11.9%
Mehanna 2013 (27)	ALL	≥3	260	54	20.8%	122	22	18.0%	138	32	23.2%	1.37 [0.75–2.52]	88	8	9.1%
Ucar 2014^c (11)	ALL	≥3	1466	170	11.6%	1235	126	10.2%	231	44	19.0%	2.07 [1.42–3.02]			52.4%
Raguin 2017 (38)	ALL	>3 only							843	86	10.2%
Yoon 2011 (18)	ALL	≥3 only							206	65	31.6%
Giles 2015^c (16)	Benign only	≥3											323	32	9.9%
Porterfield 2008 (29)	Benign only	≥3 only
Bakkar 2017 (33)	Benign only	≥3 only
Nam 2017 (23)	Benign only	≥3 only
Koo 2016 (20)	ALL	>4	690	475	68.8%	662	468	70.7%	28	7	25.0%	0.14 [0.06–0.33]	108	17	15.7%
Bohacek 2012 (30)	ALL	>4	451	129	28.6%	360	103	28.6%	91	26	28.6%	1.00 [0.60–1.66]	173	12	6.9%
Albuja-Cruz 2013 (9) (Parikh 2013 (8))	ALL	≥4	1068	534	50.0%	856	459	53.6%	212	75	35.4%	0.47 [0.35–0.65]	322	42	13.0%
Cavallo 2017^c (12)	ALL	≥4	548	208	38.0%	456	185	40.6%	92	23	25.0%	0.49 [0.29–0.81]	155	5	3.2%
Shrestha 2012 (39)	ALL	≥4	695	129	18.6%	568	111	19.5%	127	18	14.2%	0.68 [0.40–1.17]			7%
Varshney 2014 (36)	ALL	≥4	998	509	51.0%	773	410	53.0%	225	99	44.0%	0.70 [0.52–0.94]	136	32	23.5%
Godazandeh 2016 (40)	ALL	≥4	95	24	25.3%	62	16	25.8%	33	8	24.2%	0.92 [0.35–2.45]	57	2	3.5%
Kamran 2013^c (10)	ALL	≥4	7348	927	12.6%	6575	811	12.3%	773	116	15.0%	1.25 [1.02–1.55]	1502	79	5.3%
Ucar 2014^c (11)	ALL	≥4	1466	170	11.6%	1327	144	10.9%	139	26	18.7%	1.89 [1.19–2.99]
Carrillo 2000 (19)	ALL	≥4	152	70	46.1%	72	24	33.3%	80	46	57.5%	2.71 [1.40–5.24]	74	9	12.2%
Kuru 2010 (7)	ALL	≥4	662	98	14.8%	514	63	12.3%	148	35	23.6%	2.22 [1.40–3.52]	417	8	1.9%
Bestepe 2016 (35)	ALL	≥4	5561	634	11.4%	4553	548	12.0%	1008	86	8.5%	0.68 [0.54–0.87]	2982	87	2.9%
Deniwar 2016 (37)	ALL	≥4	375	126	33.6%			36.0%			19.0%		192	28	14.6%
Megwalu 2017 (34)	ALL	≥4 only							101	10	9.9%
Raj 2012 (41)	ALL	≥4 only							223	16	7.2%
Rosario 2009 (21)	ALL	≥4 only							151	34	22.5%
Pinchot 2009 (14)	ALL	≥4 only							155	21	13.5%
Kulstad 2016 (24)	ALL	≥4 only							206	49	23.8%
Wharry 2014 (31)	ALL	≥4 only							382	83	21.7%
McCoy 2007 (32)	ALL	≥4 only							223	43	19.3%
Kim 2014 (15)	ALL	≥4 only							263	154	58.6%
Khalife 2016 (26)	Benign only	≥4											101	16	15.8%
Giles 2015^c (16)	Benign only	≥4											323	32	9.9%
Shi 2017 (42)	Benign only	≥4											337	7	2.1%
Taghipour 2013 (22)	ALL	>5	271	76	28.0%	194	62	32.0%	77	14	18.2%	0.47 [0.25–0.91]	52	8	15.4%
Bulk Sum 3 cm			10963	1705	15.6%	4393	859	19.6%	7619	997	13.1%	0.62 [0.56–0.69]	856	67	7.8%
Bulk Sum 4 cm			20109	4033	20.1%	16778	3342	19.9%	4660	975	20.9%	1.06 [0.98–1.16]	6879	376	5.5%

^a Totals exclude benign-only and large-only studies.

^b Totals exclude large-only studies.

^c Article in which two size cutoffs were analyzed.

Nodules <1 cm excluded but no comment on correlation to index nodule.

CI, confidence interval; ETE, extrathyroidal extension; FNA, fine-needle aspiration; FTC, follicular thyroid carcinoma; HTC, Hürthle cell variant of FTC; MTC, medullary thyroid carcinoma; NIFTP, noninvasive follicular thyroid neoplasm with papillary-like nuclear features; NR, not reported; OR, odds ratio; Palp, palpation guided; PDC, poorly differentiated carcinoma; PTC, papillary thyroid carcinoma; PTC-FV, papillary thyroid carcinoma follicular variant; PTMC, papillary thyroid microcarcinoma; UNK, unknown; US, ultrasound-guided; US-Bx, ultrasound-guided FNA biopsy; UTC, undifferentiated thyroid carcinoma; WDT-UMP, well-differentiated thyroid neoplasm of uncertain malignant potential.

Results of meta-analysis

Of the 35 main articles, a size cutoff of 3 cm could be analyzed in 14 articles, and a size cutoff of 4 cm could be analyzed in 24 articles. (In three articles, size cutoffs of both 3 and 4 cm were analyzed. Therefore, the number of analyses is greater than the number of articles.) All but two studies (28,38) separated nodules at ≥3 cm; all but two studies separated nodules at ≥4 cm (20,30). The studies separating nodules at >3 and >4 cm were included with the ≥3 and ≥4 cm groups, respectively. Therefore, in the text, the symbol “≥” is used to denote both “greater than” and “greater than or equal to.” In only one article was a cutoff of >5 cm used (Table 1).

Effect size: rates of malignancy by nodule size

This analysis included articles with all cytologic diagnoses; articles with only benign cytologic diagnoses were excluded. Comparisons between small and large nodules at a 3 cm cutoff size could be made in eight studies. The overall rates of malignancy in nodules <3 cm versus ≥3 cm ranged from 10.2% to 48.3% (mean 26.8%, median 18.5%) versus 11.1% to 33.3% (mean 20.3%, median 20.1%), respectively (10–13,17,18,25,27–29). The total number of nodules studied ranged from 84 to 7348 (mean 1370, median 439). In two studies, a statistically significant difference was not found between malignancy rates in smaller versus larger nodules (25,27). In five studies, larger nodules were found to have statistically significantly lower rates of malignancy than smaller nodules (10,12,13,17,28). In one study, larger nodules were found to have statistically significantly higher rates of malignancy than smaller nodules (11). Additionally, two studies only analyzed nodules ≥3 cm and showed malignancy rates of 10.2% and 31.6% (18,38). The malignancy rate of all nodules <3 cm (n = 4393) was compared to that of all nodules ≥3 cm (n = 7619). Bulk malignancy rates were 19.6% versus 13.1%, respectively (OR 0.62 [CI 0.56–0.69]). In a meta-analysis, nodules ≥3 cm showed a statistically significantly lower rate of malignancy with fixed-effects modeling, but an equal risk of malignancy with random-effects modeling (fixed effects: OR 0.72 [CI 0.64–0.81]; random effects: OR 0.73 [CI 0.48–1.11]; Fig. 2A and Supplementary Fig. S1A; Supplementary Data are available online at www.liebertpub.com/thy).
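
The fixed-effects pooling at the 3 cm cutoff can be sketched as follows. The article does not state which fixed-effects estimator was used; the Mantel-Haenszel estimator is assumed here, and the function and variable names are hypothetical. The counts are the eight small-versus-large comparisons from Table 1.

```python
# (large_malignant, large_total, small_malignant, small_total), 3 cm cutoff, Table 1
STUDIES_3CM = [
    (22, 199, 84, 465),      # Deveci 2007
    (23, 144, 52, 123),      # Ucler 2015
    (45, 166, 163, 382),     # Cavallo 2017
    (21, 63, 127, 263),      # Magister 2015
    (648, 5577, 279, 1771),  # Kamran 2013
    (11, 52, 6, 32),         # Meko 1995
    (32, 138, 22, 122),      # Mehanna 2013
    (44, 231, 126, 1235),    # Ucar 2014
]


def mantel_haenszel_or(studies):
    """Mantel-Haenszel pooled odds ratio: sum(a*d/n) / sum(b*c/n) over studies."""
    numerator = 0.0
    denominator = 0.0
    for large_mal, large_total, small_mal, small_total in studies:
        a, b = large_mal, large_total - large_mal   # large: malignant, benign
        c, d = small_mal, small_total - small_mal   # small: malignant, benign
        n = large_total + small_total
        numerator += a * d / n
        denominator += b * c / n
    return numerator / denominator


pooled_or = mantel_haenszel_or(STUDIES_3CM)
print(f"Pooled OR (Mantel-Haenszel, assumed): {pooled_or:.2f}")  # 0.72
```

Under this assumption the point estimate agrees with the fixed-effects OR reported above; the interval is omitted because it requires an additional variance estimator.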

Comparisons between small and large nodules at a 4 cm cutoff size could be made in 12 studies. The overall rates of malignancy in nodules <4 cm versus ≥4 cm ranged from 10.9% to 70.7% (mean 31.1%, median 27.2%) versus 8.5% to 57.5% (mean 26.6%, median 24.6%), respectively (6,9–12,19,20,30,35–37,39). The total number of nodules studied ranged from 95 to 7348 (mean 1644.5, median 692.5). In five studies, a statistically significant difference was not found between malignancy rates in smaller versus larger nodules (16,26,30,39,40). In five studies, larger nodules were found to have statistically significantly lower rates of malignancy than smaller nodules (9,12,20,35,36). In four studies, larger nodules were found to have statistically significantly higher rates of malignancy than smaller nodules (7,10,11,19). Additionally, one study only provided data in percentages. Therefore, independent statistical analysis was not performed. However, they reported no significant difference between small and large nodules (p = 0.89) (37). Eight studies only analyzed nodules ≥4 cm and showed malignancy rates of 7.2–58.6% (mean 22.1%, median 20.5%). In these studies, comparisons to smaller nodules could not be made (14,15,21,24,31,32,34,41). The malignancy rate of all nodules <4 cm (n = 16,778) was compared to that of all nodules ≥4 cm (n = 4660). Bulk malignancy rates were 19.9% versus 20.9%, respectively (OR 1.06 [CI 0.98–1.16]). In a meta-analysis, nodules ≥4 cm showed a statistically significantly lower rate of malignancy with fixed-effects modeling, but an equal risk of malignancy with random-effects modeling (fixed effects: OR 0.85 [CI 0.77–0.95]; random effects: OR 0.87 [CI 0.62–1.22]; Fig. 2B and Supplementary Fig. S1B).
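
The widening of the interval under random-effects modeling can be illustrated with a minimal sketch. The article does not name the random-effects estimator; the DerSimonian-Laird method with inverse-variance weights and a 95% interval is assumed here, and all names are hypothetical. The counts are the 12 small-versus-large comparisons at the 4 cm cutoff from Table 1.

```python
import math

# (large_malignant, large_total, small_malignant, small_total), 4 cm cutoff, Table 1
STUDIES_4CM = [
    (7, 28, 468, 662),       # Koo 2016
    (26, 91, 103, 360),      # Bohacek 2012
    (75, 212, 459, 856),     # Albuja-Cruz 2013
    (23, 92, 185, 456),      # Cavallo 2017
    (18, 127, 111, 568),     # Shrestha 2012
    (99, 225, 410, 773),     # Varshney 2014
    (8, 33, 16, 62),         # Godazandeh 2016
    (116, 773, 811, 6575),   # Kamran 2013
    (26, 139, 144, 1327),    # Ucar 2014
    (46, 80, 24, 72),        # Carrillo 2000
    (35, 148, 63, 514),      # Kuru 2010
    (86, 1008, 548, 4553),   # Bestepe 2016
]


def log_or_and_variance(large_mal, large_total, small_mal, small_total):
    """Log odds ratio (large vs. small) and its Woolf variance for one study."""
    a, b = large_mal, large_total - large_mal
    c, d = small_mal, small_total - small_mal
    return math.log((a * d) / (b * c)), 1 / a + 1 / b + 1 / c + 1 / d


def dersimonian_laird(studies, z=1.96):
    """Random-effects pooled OR, its interval, and the between-study variance tau^2."""
    ys, vs = zip(*(log_or_and_variance(*s) for s in studies))
    w = [1.0 / v for v in vs]                        # fixed-effect (inverse-variance) weights
    sw = sum(w)
    y_fixed = sum(wi * yi for wi, yi in zip(w, ys)) / sw
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, ys))   # Cochran's Q
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (len(ys) - 1)) / c)         # truncated at zero
    w_re = [1.0 / (v + tau2) for v in vs]            # random-effects weights
    y_re = sum(wi * yi for wi, yi in zip(w_re, ys)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return math.exp(y_re), math.exp(y_re - z * se), math.exp(y_re + z * se), tau2


pooled_or, lower, upper, tau2 = dersimonian_laird(STUDIES_4CM)
print(f"Random-effects OR {pooled_or:.2f} [CI {lower:.2f}-{upper:.2f}], tau^2 {tau2:.2f}")
```

On these inputs the estimate lands close to the random-effects result reported above. Because the between-study variance is added to every study's variance, large studies lose their dominance and the pooled interval widens enough to include 1.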

The rates of malignancy in nodules <5 cm versus >5 cm (a single study, 271 nodules) were 32% versus 18.2%, respectively (22). In this case, larger nodules had a statistically significantly lower rate of malignancy than smaller nodules (OR 0.47 [CI 0.25–0.91]).

Effect size: false-negative rates by nodule size

This analysis included articles reporting benign cytologic diagnosis; nodules without benign cytologic diagnoses were excluded. Comparisons between small and large nodules at a 3 cm cutoff size could be made in five studies. The overall false-negative rates in nodules <3 cm versus ≥3 cm ranged from 0% to 21.9% (mean 6.8%, median 4.8%) versus 6.7% to 16.7% (mean 10.3%, median 11.7%), respectively (11,12,16–18,25,27,29). In four studies, a statistically significant difference was not found between false-negative rates in smaller versus larger nodules (12,16,25,27). In one study, larger nodules were found to have statistically significantly lower false-negative rates than smaller nodules (17). Additionally, five studies only analyzed nodules ≥3 cm and showed false-negative rates ranging from 0.7% to 13.4%. However, comparisons to smaller nodules could not be made (15,18,23,29,33). One study showed extremely high false-negative rates of 43.7% and 77.3% in small versus large nodules. However, they did not exclude incidental papillary thyroid microcarcinomas (PTMC) outside the index nodule (11). Combining all studies using a cutoff size of 3 cm, the false-negative rate of all nodules <3 cm (n = 246) was calculated and compared to that of all nodules ≥3 cm (n = 1435). Bulk false-negative rates were 5.7% versus 7.2% (OR 1.29 [CI 0.72–2.49]). In a meta-analysis, nodules ≥3 cm were not statistically significantly different (fixed effects: OR 1.47 [CI 0.80–2.69]; random effects: OR 1.57 [CI 0.51–4.83]; Fig. 3A and Supplementary Fig. S2A).

Comparisons between small and large nodules at a 4 cm cutoff size could be made in 12 studies. The overall false-negative rates in nodules <4 cm versus ≥4 cm ranged from 1.3% to 28.2% (mean 9.0%, median 6.9%) versus 0% to 20% (mean 10%, median 8.2%), respectively (7,9,12,14,16,19–21,24,26,30–32,35,36,40–42). In 11 studies, a statistically significant difference was not found between false-negative rates in smaller versus larger nodules (7,9,12,16,19,20,26,30,36,40,42). In one study, false-negative rates were higher in larger nodules (35). Additionally, eight studies only analyzed nodules ≥4 cm and showed false-negative rates of 0–12.7% (mean 6.6%, median 7.1%). However, comparisons to smaller nodules could not be made (14,15,21,24,31,32,34,41). Combining all studies using a cutoff size of 4 cm, the false-negative rate of all nodules <4 cm (n = 3655) was calculated and compared to that of all nodules ≥4 cm (n = 2232). Bulk false-negative rates were 4.5% versus 6.7%, respectively (OR 1.07 [CI 0.85–1.36]). In a meta-analysis analyzing only studies with both small and large nodules with a 4 cm cutoff, those ≥4 cm had a slightly higher false-negative rate, although the difference did not reach our assigned p-value of <0.01 (fixed effects: OR 1.38 [CI 1.06–1.80]; random effects: OR 1.38 [CI 1.03–1.85]; Fig. 3B and Supplementary Fig. S2B).

Six studies did not stratify false-negatives by size but showed overall rates of 0–15.4% (mean 8.0%, median 6.5%) (10,13,22,28,37,39).

Publication bias

When performing the Begg and Mazumdar adjusted rank correlation test for publication bias, Egger's bias (0.70; p = 0.65) at the 3 cm cutoff was not significant, nor was it at the 4 cm cutoff (−0.49; p = 0.92; Supplementary Fig. S3A and B). Notably, however, there were only nine studies within the 3 cm cutoff group, which does not reach the minimum of 10 studies typically expected to analyze a meta-analysis for symmetry fully (43). While interpretation of the 3 cm group was limited by the number of studies, neither group demonstrated any clear publication bias, although significant inter-study heterogeneity existed.
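
Egger's bias is conventionally the intercept obtained when each study's standardized effect (log OR divided by its standard error) is regressed on its precision (the reciprocal of the standard error). The sketch below shows that regression only; it is not the authors' Stata procedure, it omits the p-value, and the function name and inputs are hypothetical.

```python
def egger_intercept(log_effects, standard_errors):
    """Ordinary least squares of (effect / SE) on (1 / SE); returns (intercept, slope).

    The intercept is the Egger bias term; the slope estimates the underlying effect.
    """
    if len(log_effects) != len(standard_errors) or len(log_effects) < 3:
        raise ValueError("need at least three studies with matching inputs")
    precision = [1.0 / se for se in standard_errors]
    standardized = [y / se for y, se in zip(log_effects, standard_errors)]
    n = len(precision)
    mean_x = sum(precision) / n
    mean_z = sum(standardized) / n
    sxx = sum((x - mean_x) ** 2 for x in precision)
    if sxx == 0:
        raise ValueError("standard errors must not all be identical")
    slope = sum((x - mean_x) * (z - mean_z) for x, z in zip(precision, standardized)) / sxx
    intercept = mean_z - slope * mean_x
    return intercept, slope


# Hypothetical, perfectly symmetric data: every study estimates the same log OR,
# so the intercept (bias) is zero and the slope recovers the common effect.
bias, effect = egger_intercept([-0.3] * 5, [0.1, 0.2, 0.3, 0.4, 0.5])
print(f"Egger intercept {bias:.3f}, slope {effect:.3f}")
```

An intercept far from zero indicates that small, imprecise studies report systematically different effects than large ones, which is the asymmetry that a funnel plot displays.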

Histology of false-negative diagnoses

Of the six articles that documented false-negative diagnoses in nodules ≥3 cm, the most frequently reported diagnosis within the index nodule was follicular variant of papillary thyroid carcinoma (FV-PTC; 32 cases), followed by follicular thyroid carcinoma (FTC; 24 cases, including one Hürthle cell carcinoma [HTC]), PTC (10 cases, including one oxyphilic, one tall cell, and one with “mixed” histologic patterns), two undifferentiated or anaplastic thyroid carcinomas (UTC), and one poorly differentiated thyroid carcinoma (PDC) (16,18,23,25,29,33). One of these also reported four FV-PTC in nodules <3 cm (16). Of the nine articles that documented false-negative diagnoses in nodules ≥4 cm, the most frequently reported diagnosis within the index nodule was PTC (38 cases), FV-PTC (28 cases, including two deemed well-differentiated thyroid neoplasm of uncertain malignant potential), followed by FTC (24 cases, including three HTC), three PTMC, one medullary thyroid carcinoma (MTC), and three other (including one lymphoma and two unspecified) (7,8,14,15,21,24,31,32,35). One of these also reported 47 PTC, 5 FTC, and 3 FV-PTC (deemed well-differentiated thyroid neoplasm of uncertain malignant potential) in nodules <4 cm (35). One article that reported false-negative diagnoses at cutoffs of 3 and 4 cm showed one PTC <3 cm, one FV-PTC, and one PTC between 3 and 4 cm, and two FV-PTC ≥4 cm (12). Four articles reported false-negative diagnoses without specifying size (20,22,27,30). In these articles, the most frequently reported diagnosis was PTC (28 cases), FTC (8), FV-PTC (5), PTMC (2), and one each of MTC and UTC (20,22,27,30). Overall, 180 false-negative diagnoses within the index nodule were reported: the most frequent was PTC (125; 39.4%), followed by FV-PTC (75; 23.7%), FTC (61; 19.2%, including four HTC), PTMC (5; 1.6%), UTC (3; 0.9%), MTC (2; 0.6%), and four other (one PDC, one lymphoma, two unknown).

Conclusions of original authors

Authors of 31 articles made recommendations on whether to resect all large, cytologically benign thyroid nodules surgically. Overall, 7 (22.6%) recommended surgical resection, and 24 (77.4%) did not. Of authors who made recommendations based on a 3 cm size cutoff, 4/9 (44.4%) recommended surgical resection (11,16,25,38), and 5/9 (55.6%) did not recommend resection (13,17,18,29), including one who recommended close follow-up with repeat US-guided FNA or surgery in large nodules (27). Of authors who made recommendations based on a 4 cm size cutoff, 3/21 (14.3%) recommended surgical resection (14,31,32), and 18/21 (85.7%) did not recommend resection (7,10,15,24,26,30,34,36,37,39–42), including (i) the only article whose data showed slightly higher false-negative rates in larger nodules (35), (ii) one that additionally recommended personalized practice driven by institutional false-negative rates (21), (iii) two that recommended close follow-up with repeat US-guided FNA or surgery (19,20), and (iv) one that recommended consideration of resection of large nodules in women but not men (8). One set of authors evaluated both 3 and 4 cm cutoffs and did not recommend surgical resection (12).

Summary

This meta-analysis of 35 articles evaluated the overall rates of malignancy and false-negative rates in thyroid nodules by size, using US size cutoffs of 3 and 4 cm. In bulk analyses, larger nodules did not have an increased risk of malignancy compared to smaller nodules: 19.6% (<3 cm) versus 13.1% (≥3 cm) and 19.9% (<4 cm) versus 20.9% (≥4 cm). Similarly, bulk false-negative rates were statistically equivalent: 5.7% (<3 cm) versus 7.2% (≥3 cm) and 4.5% (<4 cm) versus 6.7% (≥4 cm; Table 1). Results were similar in formal meta-analyses, with the exception of slightly higher false-negative rates in nodules ≥4 cm (Fig. 3B). This slight difference in nodules ≥4 cm is not statistically significant at the assigned p-value cutoff of 0.01, is not considered clinically meaningful, and is still less than the false-negative rate in nodules ≥3 cm. Based on these data, surgical resection of large cytologically benign nodules is not recommended in the absence of other clinical indications for resection.

More than 20,000 pooled nodules were analyzed in this study, >7000 with benign cytology. Only with such large numbers can accurate data be generated. In 21 data sets, malignancy rates could be compared by size. The majority (81%) showed similar or lesser rates of malignancy in larger nodules. False-negative rates could be compared in 17 data sets, one of which showed higher false-negative rates in larger nodules. In light of these findings, it is surprising that multiple authors recommend surgical resection of large cytologically benign nodules due to perceived (but not actual) high false-negative rates. Of the seven articles that recommended resection, one included incidental microcarcinomas in the false negatives, yielding very high false-negative rates (11), and four did not evaluate small nodules (14,31,32,38). Of the remaining two, a statistically significant difference between false-negative rates in small and large nodules was not found on the analysis (16,25). Giles et al. (16) found statistical significance using a size cutoff of 3 cm (p = 0.03). Meko et al. did not perform statistical analysis on this subgroup (25). The recommendations to resect may have been based more on institutional preferences/practices rather than being driven by data.

Missed “follicular lesions”

Two groups that strongly recommended resection of large nodules also claimed that benign cytology may miss or incorrectly classify “follicular lesions,” namely follicular adenomas (including oncocytic or Hürthle cell adenomas). McCoy et al. and Pinchot et al. found that 26.7% and 42.3% of cytologically benign nodules ≥4 cm were diagnosed as follicular adenomas (14,32). Since follicular neoplasms require resection for determination of follicular adenoma versus carcinoma, they claim that follicular adenomas with benign cytology may be inappropriately treated nonsurgically. However, patients with cytologically benign nodules that are not immediately resected should still be followed clinically. Repeat FNA or subsequent resection may be prompted by growth, worrisome US features, or patient symptoms—likely to occur in cases of true follicular carcinoma. The natural history of follicular neoplasms is not well understood, but malignant transformation is thought to be rare (44). Similarly, repeat biopsy showing malignancy in a cytologically benign nodule has only been reported rarely (45–47). Additionally, not all follicular adenomas are highly cellular or composed of microfollicles to prompt a diagnosis of FLUS or follicular neoplasm on FNA. The Bethesda System comments that FNA cannot distinguish a dominant nodule in a multinodular goiter from a solitary colloid-rich, macrofollicular adenoma, such that both may receive a benign Bethesda II diagnosis (48). The World Health Organization Classification of Tumours of Endocrine Organs also confirms that follicular adenoma may contain follicles of varying sizes with varying amounts of intrafollicular colloid (49). In light of the vague criteria and nonspecific terminology, uniform diagnoses are not rendered by all pathologists, and what one may call a follicular adenoma, another may call a hyperplastic or adenomatoid or colloid nodule.
The only way to determine true clonality in a nodule (i.e., adenoma) is via genetic analysis, which is not routinely performed on benign thyroid lesions. In a clonality analysis using X-inactivation, Apel et al. found that 18/27 (67%) histologically hyperplastic thyroid nodules were monoclonal and morphologically identical to the polyclonal cases (50). Others have supported the clonal origin of a subset of benign nodules in multinodular goiters (51–55). Benign adenomatoid or histologically hyperplastic nodules may also harbor clonal RAS point mutations (53,56–58). Due to the interobserver variability among both surgical pathologists and cytopathologists, caution must be exercised when evaluating so-called “missed follicular lesions,” as the significance is unclear (4,59).

Since follicular carcinomas are usually of larger size than PTC (60), higher false-negative rates for FTC might be expected in larger compared to smaller nodules. Missed FTC were reported in 13 studies. However, as most studies reported findings only in larger nodules or did not differentiate false-negative diagnoses based on size, comparisons based on size could not be made.

Missed “follicular variants of papillary”

The Endocrine Pathology Society recently proposed a revision of the nomenclature of noninvasive encapsulated follicular variant of papillary thyroid carcinoma (FV-PTC) to “noninvasive follicular thyroid neoplasm with papillary-like nuclear features” (NIFTP) if strict diagnostic criteria are met (61). This change will impact rates of malignancy and false-negative rates. In this meta-analysis, considering all 20 articles that reported false-negative diagnoses within the index nodule, PTC and FV-PTC were the most frequent (39.4% and 23.7%, respectively). Some of these PTC likely represent NIFTP, which should be treated by lobectomy and followed clinically. However, like follicular adenomas, not all are microfollicular or highly cellular. Some are macrofollicular or colloid-rich, and a clinically aggressive but cytologically benign NIFTP could also be brought to attention by worrisome clinical or radiographic features. Various institutions have reported decreases in the malignancy rates of cytologically benign (Bethesda II) nodules once NIFTPs are removed, with relative decreases ranging from 3% to 60% (62,63). Therefore, false-negative rates in this context are expected to decrease. In this meta-analysis, three studies published following the NIFTP proposal addressed this concept, and each study recognized only three NIFTP. In studies by Cavallo et al., Bakkar et al., and Nam et al., respectively, the false-negative rates of nodules >3 cm were 1.4%, 6.4%, and 13.4% with NIFTP considered benign and 5.8%, 7.9%, and 15.2% with NIFTP considered malignant (12,23,33).
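As an illustrative aside, the effect of NIFTP reclassification in these three studies can be expressed as an absolute and a relative decrease in the false-negative rate. The short Python sketch below does only that arithmetic on the percentages quoted above. The derived decreases are not figures reported by the cited authors, and the names `reported_fnr`, `relative_percent_decrease`, and `summarize` are hypothetical helpers introduced here for illustration.

```python
# Reported false-negative rates (%) in large nodules, copied from the text.
# Tuple order: (NIFTP counted as malignant, NIFTP counted as benign).
reported_fnr = {
    "Cavallo et al.": (5.8, 1.4),
    "Bakkar et al.": (7.9, 6.4),
    "Nam et al.": (15.2, 13.4),
}


def relative_percent_decrease(before: float, after: float) -> float:
    """Return the relative decrease from `before` to `after`, in percent."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before * 100.0


def summarize(rates: dict) -> dict:
    """Map each study to (absolute decrease in points, relative decrease in %)."""
    return {
        study: (
            round(malignant - benign, 1),
            round(relative_percent_decrease(malignant, benign), 1),
        )
        for study, (malignant, benign) in rates.items()
    }


if __name__ == "__main__":
    for study, (absolute, relative) in summarize(reported_fnr).items():
        print(f"{study}: {absolute} points lower, {relative}% relative decrease")
```

Because each study recognized only three NIFTP, these derived decreases rest on very small counts and should be read as rough illustrations rather than estimates.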

Interobserver variability

Rates of malignancy in cytologically benign nodules are highly variable and depend upon individual and institutional practices of surgeons, endocrinologists, radiologists, and pathologists. First, false-negative rates may be affected by variability in sampling practices. FNA under US guidance by an interventional radiologist with adequacy assessment by a cytopathologist is favored at the authors' institution in order to qualify the imaging characteristics of the nodule and target the nodule or region of interest more accurately. Palpation-guided aspiration is not recommended and is hypothesized to result in an increased number of inadequately or inappropriately sampled specimens (64,65). In this meta-analysis, 22 articles utilized US-guided FNA, six US- or palpation-guided FNA, and seven did not specify the method used. Although comparisons are difficult to make in light of the high rates of US use, those using a mixture of biopsy techniques had overall false-negative rates ranging from 1.9% to 13% (6,14,25,39,42). These rates are similar to those reporting only US guidance (mean 15.2%).

Second, interobserver variability among cytopathologists and surgical pathologists may affect false-negative rates. Cibas et al. evaluated differences between an academic cytopathology panel and local community cytopathologists and found that overall concordance in the standard six-category Bethesda system was 64%. Furthermore, they found that local pathologists made fewer benign cytologic diagnoses, but their risk for malignancy was slightly higher. In other words, academic pathologists made more benign diagnoses with a concurrent decrease in final false-negative rate. Additionally, overall concordance in a two-tier histopathology system (benign vs. malignant) was 90.7%, with the most common disagreements being in PTC and FV-PTC (4). Unanimous agreement in diagnosis of FV-PTC/NIFTP, even among expert pathologists, is rare (in some studies only 13%) and depends upon the perceived severity of nuclear features (66). In this regard, a diagnosis of follicular adenoma, atypical follicular neoplasm, or papillary carcinoma may be given to the same nodule by different pathologists. Furthermore, FTC are associated with a similar challenge, namely variability in interpretation of the presence and extent of capsular or vascular invasion (67). Again, the same nodule may be given a diagnosis of adenoma or carcinoma by different pathologists. In summary, false-negative rates may be higher in community cytopathology practices or in aggressive surgical pathology practices in diagnostically equivocal nodules.

Clinical indications for resection of cytologically benign nodules

While this meta-analysis found that cytologically benign nodules need not be resected based on large size alone, other features may serve as indications for resection. The ATA advocates repeat biopsy after one to two years for US suspicious or growing nodules (3). Large nodules might still require resection if they are symptomatic, growing, of cosmetic concern, or substernal. Symptoms may include difficulty swallowing, difficulty breathing, or tracheal deviation. Young patients in whom nodule growth or symptomatology is predicted may elect to undergo surgical resection. Some physicians advocate removal of substernal goiters because they are likely to show progressive enlargement, and they cannot be easily biopsied or monitored for cancer (68).

Limitations

In most studies, moderate numbers of large nodules were evaluated, likely due to low resection rates of cytologically benign nodules in standard practice. All but two were retrospective, leading to a selection bias in most studies. Kuru et al. claimed to recruit patients prospectively. However, not all cytologically benign nodules were resected, and the patient population in this study was similar to that in the retrospective studies (7). Therefore, only the study by Rosario et al. was truly prospective, such that all patients with thyroid nodules ≥4 cm on US and no contraindication to or refusal of surgery proceeded to surgical resection regardless of the FNA result (21). In this study, the false-negative rate in large nodules was low (3.6%), and two of three were FV-PTC. A number of retrospective studies reported that surgical resection was routinely offered to patients with large nodules (16,17,27,31,32,41). However, it is unclear whether all patients with large nodules underwent resection. Therefore, some level of selection bias is likely still present in these studies.

Additionally, this study did not evaluate the sonographic features of thyroid nodules in the context of decision to biopsy and/or resect. While sonographic features play a role in risk stratification and clinical decision making in thyroid nodules, these features were not uniformly reported. The evaluation of imaging characteristics was not a goal of this study. However, differences in imaging characteristics of nodules may have affected the included data.

Another limitation is that only 18/35 articles included a pathologist among the authors (10,12,13,15,17,19,20,22,26–29,31,32,35,37–39). Cytologic-histologic comparison is not always straightforward, and difficulties may arise when confirming that the nodule sampled on FNA was appropriately correlated to the surgical specimen. Inclusion of a pathologist to assist with pathologic correlation is encouraged.

Conclusions

Resection of cytologically benign (Bethesda II) thyroid nodules with large size on US (≥3, 4, or 5 cm) is not recommended in the absence of other clinical indications in light of three main factors: (i) false-negative rates in larger nodules are low, widely variable, and, in most studies, not significantly different from false-negative rates in smaller nodules; (ii) false-negative rates are highly institution- and practice-dependent; and (iii) false-negative rates are expected to decrease in light of new recommendations on the nomenclature of NIFTP versus FV-PTC. Although a number needed to treat or cost–benefit analysis was not performed, surgical resection may lead to increased morbidity (physical, psychological, and possibly financial) compared to close clinical follow-up of cytologically benign nodules. Close follow-up (including repeat US and/or FNA) can identify patients who ultimately require resection, and those with indolent disease may avoid surgery for cytologically benign but large thyroid nodules. Based on the findings, the authors have stopped recommending thyroidectomy for large cytologically benign nodules based on size alone.

Footnotes

Author Disclosure Statement

No competing financial interests exist.

References

1. Cibas, Ali. 2009. The Bethesda System For Reporting Thyroid Cytopathology. Am J Clin Pathol, 132:658–665.
2. Cibas, Ali. 2017. The 2017 Bethesda System for Reporting Thyroid Cytopathology. Thyroid, 27:1341–1346.
3. Haugen, Alexander, Bible, Doherty, Mandel, Nikiforov, Pacini, Randolph, Sawka, Schlumberger, Schuff, Sherman, Sosa, Steward, Tuttle, Wartofsky. 2016. 2015 American Thyroid Association management guidelines for adult patients with thyroid nodules and differentiated thyroid cancer: the American Thyroid Association Guidelines Task Force on Thyroid Nodules and Differentiated Thyroid Cancer. Thyroid, 26:1–133.
4. Cibas, Baloch, Fellegara, LiVolsi, Raab, Rosai, Diggans, Friedman, Kennedy, Kloos, Lanman, Mandel, Sindy, Steward, Zeiger, Haugen, Alexander. 2013. A prospective assessment defining the limitations of thyroid nodule pathologic evaluation. Ann Intern Med, 159:325–332.
5. Krauss, Mahon, Fede, Zhang. 2016. Application of the Bethesda classification for thyroid fine-needle aspiration: institutional experience and meta-analysis. Arch Pathol Lab Med, 140:1121–1131.
6. Kuru, Gulcelik, Dincer. 2009. Predictive index for carcinoma of thyroid nodules and its integration with fine-needle aspiration cytology. Head Neck, 31:856–866.
7. Kuru, Gulcelik, Dincer. 2010. The false-negative rate of fine-needle aspiration cytology for diagnosing thyroid carcinoma in thyroid nodules. Langenbecks Arch Surg, 395:127–132.
8. Parikh, Allan, Lew. 2013. Sex variability of fine-needle aspiration reliability in the diagnosis of malignancy in thyroid nodules ≥4 cm. Am J Surg, 206:778–782.
9. Albuja-Cruz, Goldfarb, Gondek, Allan, Lew. 2013. Reliability of fine-needle aspiration for thyroid nodules greater than or equal to 4 cm. J Surg Res, 181:6–10.
10. Kamran, Marqusee, Kim, Frates, Ritner, Peters, Benson, Doubilet, Cibas, Barletta, Cho, Gawande, Ruan, Moore, Pou, Larsen, Alexander. 2013. Thyroid nodule size and prediction of cancer. J Clin Endocrinol Metab, 98:564–570.
11. Ucar, Sarikaya, Parlak Ö, Yalcin. 2014. Effect of nodule size on the reliability of fine-needle aspiration biopsy in thyroid nodules. Turk J Med Sci, 44:1002–1009.
12. Cavallo, Johnson, White, Siddiqui, Antic, Mathew, Grogan, Angelos, Kaplan, Cipriani. 2017. Thyroid nodule size at ultrasound as a predictor of malignancy and final pathologic size. Thyroid, 27:641–650.
13. Magister, Chaikhoutdinov, Schaefer, Williams, Saunders, Goldenberg. 2015. Association of thyroid nodule size and Bethesda class with rate of malignant disease. JAMA Otolaryngol Head Neck Surg, 141:1089–1095.
14. Pinchot, Al-Wagih, Schaefer, Sippel, Chen. 2009. Accuracy of fine-needle aspiration biopsy for predicting neoplasm or carcinoma in thyroid nodules 4 cm or larger. Arch Surg, 144:649–655.
15. Kim, Kim, Oh, Kim, Chung, Kim. 2014. The validity of ultrasonography-guided fine needle aspiration biopsy in thyroid nodules 4 cm or larger depends on ultrasonography characteristics. Endocrinol Metab (Seoul), 29:545–552.
16. Giles, Maclellan, Gawande, Ruan, Alexander, Moore, Cho. 2015. False negative cytology in large thyroid nodules. Ann Surg Oncol, 22:152–157.
17. Ucler, Usluogulları, Tam, Ozdemir, Balkan, Yalcın, Kıyak, Ersoy, Guler, Ersoy, Cakır. 2015. The diagnostic accuracy of ultrasound-guided fine-needle aspiration biopsy for thyroid nodules three centimeters or larger in size. Diagn Cytopathol, 43:622–628.
18. Yoon, Kwak, Moon, Kim E-K. 2011. The diagnostic accuracy of ultrasound-guided fine-needle aspiration biopsy and the sonographic differences between benign and malignant thyroid nodules 3 cm or larger. Thyroid, 21:993–1000.
19. Carrillo, Frias-Mendivil, Ochoa-Carrillo, Ibarra. 2000. Accuracy of fine-needle aspiration biopsy of the thyroid combined with an evaluation of clinical and radiologic factors. Otolaryngol Head Neck Surg, 122:917–921.
20. Koo, Song, Kwon, Bae, Kim J-H, Min, Lee, Youn Y-K. 2016. Does tumor size influence the diagnostic accuracy of ultrasound-guided fine-needle aspiration cytology for thyroid nodules? Int J Endocrinol, 2016:3803647.
21. Rosario, Salles, Bessa, Purisch. 2009. Low false-negative rate of cytology in thyroid nodules ≥4 cm. Arq Bras Endocrinol Metabol, 53:1143–1145.
22. Taghipour Zahir, Binesh, Mirouliaei, Khajeh, Noshad. 2013. Malignancy risk assessment in patients with thyroid nodules using classification and regression trees. J Thyroid Res, 2013:983953.
23. Nam, Kwak, Moon, Yoon, Kim E-K, Koo. 2017. Large (≥3 cm) thyroid nodules with benign cytology: can Thyroid Imaging Reporting and Data System (TIRADS) help predict false-negative cytology? PLoS One, 12:e0186242.
24. Kulstad. 2016. Do all thyroid nodules >4 cm need to be removed? An evaluation of thyroid fine-needle aspiration biopsy in large thyroid nodules. Endocr Pract, 22:791–798.
25. Meko, Norton. 1995. Large cystic/solid thyroid nodules: a potential false-negative fine-needle aspiration. Surgery, 118:996–1003, discussion 1003–1004.
26. Khalife, Bouhabel, Forest V-I, Hier, Rochon, Tamilia, Payne. 2016. The McGill Thyroid Nodule Score's (MTNS+) role in the investigation of thyroid nodules with benign ultrasound guided fine needle aspiration biopsies: a retrospective review. J Otolaryngol Head Neck Surg, 45:29.
27. Mehanna, Murphy, McCarthy, O'Leary, Tuthill, Murphy, Sheahan. 2013. False negatives in thyroid cytology: impact of large nodule size and follicular variant of papillary carcinoma. Laryngoscope, 123:1305–1309.
28. Deveci, Deveci, LiVolsi, Gupta, Baloch. 2007. Concordance between thyroid nodule sizes measured by ultrasound and gross pathology examination: effect on patient management. Diagn Cytopathol, 35:579–583.
29. Porterfield, Grant, Dean, Thompson, Farley, Richards, Reading, Charboneau, Vollrath, Sebo. 2008. Reliability of benign fine needle aspiration cytology of large thyroid nodules. Surgery, 144:963–968, discussion 968–969.
30. Bohacek, Milas, Mitchell, Siperstein, Berber. 2012. Diagnostic accuracy of surgeon-performed ultrasound-guided fine-needle aspiration of thyroid nodules. Ann Surg Oncol, 19:45–51.
31. Wharry, McCoy, Stang, Armstrong, LeBeau, Tublin, Sholosh, Silbermann, Ohori, Nikiforov, Hodak, Carty, Yip. 2014. Thyroid nodules (≥4 cm): can ultrasound and cytology reliably exclude cancer? World J Surg, 38:614–621.
32. McCoy, Jabbour, Ogilvie, Ohori, Carty, Yim. 2007. The incidence of cancer and rate of false-negative cytology in thyroid nodules greater than or equal to 4 cm in size. Surgery, 142:837–844, discussion 844.e1–3.
33. Bakkar, Poma, Corsini, Miccoli, Ambrosini, Miccoli. 2017. Underestimated risk of cancer in solitary thyroid nodules ≥3 cm reported as benign. Langenbecks Arch Surg, 402:1089–1094.
34. Megwalu. 2017. Risk of malignancy in thyroid nodules 4 cm or larger. Endocrinol Metab (Seoul), 32:77–82.
35. Bestepe, Ozdemir, Tam, Dellal, Kilicarslan A, Parlak Ö, Ersoy, Cakir. 2016. Malignancy risk and false-negative rate of fine needle aspiration cytology in thyroid nodules ≥4.0 cm. Surgery, 160:405–412.
36. Varshney, Forest V-I, Zawawi, Rochon, Hier, Mlynarek, Tamilia, Payne. 2014. Ultrasound-guided fine-needle aspiration of thyroid nodules: does size matter? Am J Otolaryngol, 35:373–376.
37. Deniwar, Hambleton, Thethi, Moroz, Kandil. 2015. Examining the Bethesda criteria risk stratification of thyroid nodules. Pathol Res Pract, 211:345–348.
38. Raguin, Schneegans, Rodier J-F, Volkmar P-P, Sauleau, Debry, Debonnecaze, Ghnassia J-P, Dupret-Bories. 2017. Value of fine-needle aspiration in evaluating large thyroid nodules. Head Neck, 39:32–36.
39. Shrestha, Crothers, Burch. 2012. The impact of thyroid nodule size on the risk of malignancy and accuracy of fine-needle aspiration: a 10-year study from a single institution. Thyroid, 22:1251–1256.
40. Godazandeh, Kashi, Zargarnataj, Fazli, Ebadi, Kerdabadi. 2016. Evaluation the relationship between thyroid nodule size with malignancy and accuracy of fine needle aspiration biopsy (FNAB). Acta Inform Med, 24:347–350.
41. Raj, Grodski, Woodruff, Yeung, Paul, Serpell. 2012. Diagnostic lobectomy is not routinely required to exclude malignancy in thyroid nodules greater than four centimetres. ANZ J Surg, 82:73–77.
42. Shi, Bobanga, McHenry. 2017. Are large thyroid nodules classified as benign on fine needle aspiration more likely to harbor cancer? Am J Surg, 213:464–466.
43. Sterne JAC, Sutton, Ioannidis JPA, Terrin, Jones, Lau, Carpenter, Rücker, Harbord, Schmid, Tetzlaff, Deeks, Peters, Macaskill, Schwarzer, Duval, Altman, Moher, Higgins JPT. 2011. Recommendations for examining and interpreting funnel plot asymmetry in meta-analyses of randomised controlled trials. BMJ, 343:d4002.
44. Arora, Scognamiglio, Zhu, Fahey. 2008. Do benign thyroid nodules have malignant potential? An evidence-based review. World J Surg, 32:1237–1246.
45. Alexander, Hurwitz, Heering, Benson, Frates, Doubilet, Cibas, Larsen, Marqusee. 2003. Natural history of benign solid and cystic thyroid nodules. Ann Intern Med, 138:315–318.
46. Kim S-Y, Han, Moon, Kwak, Chung, Kim E-K. 2014. Thyroid nodules with benign findings at cytologic examination: results of long-term follow-up with US. Radiology, 271:272–281.
47. Kuma, Matsuzuka, Yokozawa, Miyauchi, Sugawara. 1994. Fate of untreated benign thyroid nodules: results of long-term follow-up. World J Surg, 18:495–498, discussion 499.
48. Ali, Cibas. 2010. The Bethesda System for Reporting Thyroid Cytopathology: Definitions, Criteria and Explanatory Notes. Springer, New York, NY.
49. World Health Organization. 2004. World Health Organization Classification of Tumours. IARC Press, Lyon, France.
50. Apel, Ezzat, Bapat, Pan, LiVolsi, Asa. 1995. Clonality of thyroid nodules in sporadic goiter. Diagn Mol Pathol, 4:113–121.
51. Kim, Piao, Park, Chung, Park. 1998. Clinical significance of clonality in thyroid nodules. Br J Surg, 85:1125–1128.
52. Chung, Kang, Kim, Ro. 1999. Clonal analysis of a solitary follicular nodule of the thyroid with the polymerase chain reaction method. Mod Pathol, 12:265–271.
53. Krohn, Reske, Ackermann, Müller, Paschke. 2001. Ras mutations are rare in solitary cold and toxic thyroid nodules. Clin Endocrinol (Oxf), 55:241–248.
54. Krohn, Paschke. 2002. Somatic mutations in thyroid nodular disease. Mol Genet Metab, 75:202–208.
55. Namba, Matsuo, Fagin. 1990. Clonal composition of benign and malignant human thyroid tumors. J Clin Invest, 86:120–125.
56. Daniels. 2011. What if many follicular variant papillary thyroid carcinomas are not malignant? A review of follicular variant papillary thyroid carcinoma and a proposal for a new classification. Endocr Pract, 17:768–787.
57. Salabè. 2001. Pathogenesis of thyroid nodules: histological classification? Biomed Pharmacother, 55:39–53.
58. Johnson, Cipriani. 2018. Noninvasive follicular thyroid neoplasms with papillary-like nuclear features (NIFTPs) are genetically and biologically similar to adenomatous nodules and distinct from papillary thyroid carcinomas with extensive follicular growth. Arch Pathol Lab Med, 142:838–850.
59. Cipriani, Nagar, Kaplan, White, Antic, Sadow, Aschebrook-Kilfoy, Angelos, Kaplan, Grogan. 2015. Follicular thyroid carcinoma: how have histologic diagnoses changed in the last half-century and what are the prognostic implications? Thyroid, 25:1209–1216.
60. Chow S-M, Law SCK, Au S-K, Leung T-W, Chan PTM, Mendenhall, Lau W-H. 2002. Differentiated thyroid carcinoma: comparison between papillary and follicular carcinoma in a single institute. Head Neck, 24:670–677.
61. Nikiforov, Seethala, Tallini, Baloch, Basolo, Thompson LDR, Barletta, Wenig, Ghuzlan Al, Kakudo, Giordano, Alves, Khanafshar, Asa, El-Naggar, Gooding, Hodak, Lloyd, Maytal, Mete, Nikiforova, Nosé, Papotti, Poller, Sadow, Tischler, Tuttle, Wall, LiVolsi, Randolph, Ghossein. 2016. Nomenclature revision for encapsulated follicular variant of papillary thyroid carcinoma: a paradigm shift to reduce overtreatment of indolent tumors. JAMA Oncol, 2:1023–1029.
62. Strickland, Howitt, Marqusee, Alexander, Cibas, Krane, Barletta. 2015. The impact of noninvasive follicular variant of papillary thyroid carcinoma on rates of malignancy for fine-needle aspiration diagnostic categories. Thyroid, 25:987–992.
63. Faquin, Wong, Afrogheh, Ali, Bishop, Bongiovanni, Pusztaszeri, VandenBussche, Gourmaud, Vaickus, Baloch. 2016. Impact of reclassifying noninvasive follicular variant of papillary thyroid carcinoma on the risk of malignancy in The Bethesda System for Reporting Thyroid Cytopathology. Cancer Cytopathol, 124:181–187.
64. Can, Peker. 2008. Comparison of palpation-versus ultrasound-guided fine-needle aspiration biopsies in the evaluation of thyroid nodules. BMC Res Notes, 1:12.
65. 2011. A comparative study of 200 head and neck FNAs performed by a cytopathologist with versus without ultrasound guidance: evidence for improved diagnostic value with ultrasound guidance. Diagn Cytopathol, 39:743–751.
66. Elsheikh, Asa, Chan JKC, Delellis, Heffess, LiVolsi, Wenig. 2008. Interobserver and intraobserver variation among experts in the diagnosis of thyroid follicular lesions with borderline nuclear features of papillary carcinoma. Am J Clin Pathol, 130:736–744.
67. Franc. 2003. Interobserver and intraobserver reproducibility in the histopathology of follicular thyroid carcinoma. Hum Pathol, 34:1092–1100.
68. Hashmi, Premachandra, Bennett AMD, Parry. 2006. Management of retrosternal goitres: results of early surgical intervention to prevent airway morbidity, and a review of the English literature. J Laryngol Otol, 120:644–649.