Abstract
Background:
The management of thyroid nodules >4 cm with benign cytology after fine-needle aspiration biopsy (FNAB) is controversial. FNAB is associated with a high false-negative rate in this setting, and may result in a delayed diagnosis and management of thyroid cancer. However, the majority of these nodules are benign. Therefore, the objective of this study was to determine the cost-utility of observation versus surgical management for thyroid nodules >4 cm with benign cytology after FNAB.
Methods:
A microsimulation model comparing routine thyroid lobectomy with observation for low-risk patients with >4 cm thyroid nodules with benign FNAB cytology was constructed. Costs, quality-adjusted life-years (QALYs), and life-years gained were calculated over a lifetime time horizon from a U.S. Medicare perspective.
Results:
The proportion of patients undergoing thyroid lobectomy for benign final pathology was 40% in the observation strategy versus 66% in the surgical strategy (p < 0.001). Overall, the surgical strategy was associated with higher lifetime costs than the observation strategy (incremental difference: +US$12,992 [confidence interval (CI) 13,042–13,524]), but also with more QALYs (+0.12 QALYs [CI 0.02–0.24]) and longer life expectancy (+1.67 years [CI 1.00–2.41]). Incremental lifetime costs were lower for patients <55 years than for those ≥55 years (+US$11,181 vs. +US$14,811, p < 0.001). The probability that the surgical strategy was cost-effective was 49% at a $100k/QALY threshold and 65% at a $100k/life-year gained threshold.
Conclusions:
Routine thyroid lobectomy is associated with improved outcomes at an acceptable cost compared with observation for thyroid nodules >4 cm with benign cytology after FNAB. Surgical resection may be a cost-effective strategy to rule out malignancy in these nodules.
Introduction
However, the majority of these large nodules are benign, and routine thyroid lobectomy may accrue unnecessary costs and place an unnecessary burden on the healthcare system (12). Furthermore, routine surgery puts more patients at risk of permanent thyroidectomy-related complications, most importantly recurrent laryngeal nerve (RLN) injury, which significantly affects postoperative functional status and quality of life (13). In addition, thyroid malignancies whose diagnosis is delayed by a false-negative FNAB result are still likely to have a favorable natural history, and their prognosis may not be significantly affected (14). Therefore, the objective of this study was to determine the cost-utility of observation versus surgical management for thyroid nodules >4 cm with benign cytology after FNAB.
Materials and Methods
Simulation design
An individual patient microsimulation model (one million simulations; one-year cycle length) was constructed to investigate the cost-effectiveness of surgical and observation strategies for the management of patients with a single thyroid nodule >4 cm with benign cytology after FNAB (Fig. 1) over a lifetime horizon (3% discount rate). Patients with other risk factors for thyroid cancer (such as familial cancer syndromes or a history of neck irradiation) or with a nodule exhibiting suspicious characteristics on ultrasound (as those patients would likely undergo upfront surgery), and those with atypia of undetermined significance, suspicious for malignancy, or malignant cytology after FNAB, were excluded from the simulated patient population. Specific literature searches were performed to identify probabilities for the events and outcomes included in this model. The probability of malignancy in nodules >4 cm with benign FNAB cytology was obtained from studies that reported the incidence of malignancy specifically in patients of this group who underwent FNAB and subsequent surgery (i.e., studies that included patients with benign FNAB cytology who did not undergo surgery were excluded). Weighted pooled proportions (random-effects model) were calculated if more than one relevant probability was identified. Base-case probabilities and their surrounding uncertainty estimates are reported in Table 1.
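The pooling step can be sketched as follows. The text does not say which random-effects estimator was used, so this sketch assumes the DerSimonian-Laird method applied to raw proportions, with a 0.5 continuity correction (also an assumption) for studies reporting 0% or 100% malignancy. The function name `pooled_proportion_dl` is hypothetical, and any study counts passed to it are illustrative, not the data from references 10, 11, and 27–31.

```python
import math


def pooled_proportion_dl(studies):
    """Random-effects (DerSimonian-Laird) pooled proportion.

    studies: list of (events, total) tuples, e.g. malignancies among resected
    >4 cm nodules with benign cytology. Returns (pooled, ci_low, ci_high, tau2).
    """
    props, variances = [], []
    for events, total in studies:
        # Assumed continuity correction so 0% or 100% studies get nonzero variance.
        if events == 0 or events == total:
            events, total = events + 0.5, total + 1.0
        p = events / total
        props.append(p)
        variances.append(p * (1 - p) / total)

    # Fixed-effect (inverse-variance) weights and Cochran's Q.
    w = [1 / v for v in variances]
    sum_w = sum(w)
    p_fixed = sum(wi * pi for wi, pi in zip(w, props)) / sum_w
    q = sum(wi * (pi - p_fixed) ** 2 for wi, pi in zip(w, props))
    k = len(studies)

    # Between-study variance via the DerSimonian-Laird moment estimator.
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c) if k > 1 else 0.0

    # Random-effects weights add tau2 to each within-study variance.
    w_re = [1 / (v + tau2) for v in variances]
    pooled = sum(wi * pi for wi, pi in zip(w_re, props)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    return pooled, pooled - 1.96 * se, pooled + 1.96 * se, tau2
```

When all studies report the same proportion, the between-study variance is zero and the result collapses to the fixed-effect estimate; heterogeneous studies widen the interval through `tau2`.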

Figure 1. Overview of the microsimulation model. The square represents a decision node, ovals represent states, and rectangles represent events.
FNAB, fine-needle aspiration biopsy; TT, total thyroidectomy; RAI, radioactive iodine ablation; TL, thyroid lobectomy; RLN, recurrent laryngeal nerve.
A microsimulation generates patients as individual subjects and tracks each of them through the entire simulation. This allows specific characteristics to be generated for each subject and individual outcomes and disease history to be tracked, in contrast to a cohort (“Markov”) simulation, which models groups as a whole and is limited by the “memory-less” property. Microsimulation also incorporates both stochastic (“first-order”) and parameter (“second-order”) uncertainty in each simulation, thus capturing both individual patient and population variability.
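A toy illustration of the two sources of uncertainty and of per-patient memory follows. It is a minimal Python sketch under stated assumptions, not the TreeAge model: the states, the beta distribution, and every parameter value (`p_met_mean`, `background_mort`, `early_excess_mort`, the 40-cycle horizon) are hypothetical placeholders. Each simulated patient draws its own parameter value (second-order uncertainty), experiences random events in each one-year cycle (first-order uncertainty), and keeps a history whose contents change later risks.

```python
import random


def simulate_patient(rng, p_met_mean=0.01, background_mort=0.02,
                     early_excess_mort=0.10, horizon=40, discount=0.03):
    """Follow one hypothetical patient in one-year cycles.

    Returns (discounted life-years, event history). All parameter values are
    illustrative placeholders, not inputs from Table 1.
    """
    # Second-order (parameter) uncertainty: this simulation draws its own
    # annual metastasis probability from a beta distribution with mean p_met_mean.
    a = 10.0
    p_met = rng.betavariate(a, a * (1 - p_met_mean) / p_met_mean)

    history = {}  # individual memory: the cycle in which each event occurred
    life_years = 0.0
    for cycle in range(horizon):
        # Count the cycle as lived (start-of-cycle convention), discounted.
        life_years += 1 / (1 + discount) ** cycle
        # First-order (stochastic) uncertainty: this patient's events this cycle.
        if "metastasis" not in history and rng.random() < p_met:
            history["metastasis"] = cycle
        p_death = background_mort
        # History-dependent risk: excess mortality only during the first five
        # cycles after metastasis.
        if "metastasis" in history and cycle - history["metastasis"] < 5:
            p_death += early_excess_mort
        if rng.random() < p_death:
            history["death"] = cycle
            break
    return life_years, history


def mean_life_years(n_patients, seed=1):
    """Average discounted life-years over n hypothetical patients."""
    rng = random.Random(seed)
    total = sum(simulate_patient(rng)[0] for _ in range(n_patients))
    return total / n_patients
```

Because the excess risk depends on how many cycles have passed since metastasis, a cohort Markov model would need extra tunnel states to reproduce it; the microsimulation only needs the stored cycle number.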
In the present model, individual characteristics such as age and sex were randomly assigned according to population distributions from a study by Kamran et al. (15), one of the largest studies examining the probability of malignancy in thyroid nodules, in order to account for age- and sex-related probabilities of overall survival (U.S. life tables) (16). Transient or permanent complications that may arise from surgical intervention, such as RLN injury or hypoparathyroidism, were tracked, as they have important adverse effects on lifetime costs and quality of life. Each RLN was modeled independently; that is, each nerve could be injured separately. Recurrences could present as locoregional or distant metastases. Locoregional recurrences were treated by neck dissection and could later present as distant metastases as the model progressed. Distant metastatic recurrences were treated with radioactive iodine ablation. The model did not take into account the use of adjuvant kinase inhibitor therapy. Locoregional recurrences did not affect overall survival (17). The probability of death from distant metastases was obtained from several studies (17–19). The range of values for sensitivity analyses factored in the different survival of single versus multiple organ metastases (19). Ablation was modeled with thyroid hormone withdrawal and did not include the use of recombinant human thyrotropin (TSH) stimulation. The adverse effects of ablation therapy on quality of life lasted for two months in the model. The model was created and analyzed using TreeAge 2012 (TreeAge Software, Inc., Williamstown, MA) and Stata v12 (StataCorp, College Station, TX).
In the surgical strategy, all patients underwent a diagnostic thyroid lobectomy, which could yield benign or malignant pathology. If the pathology was benign, patients entered the “no evidence of disease” state and could eventually experience unrelated mortality (age- and sex-adjusted according to U.S. life tables). Patients with a pathologic diagnosis of malignancy after thyroid lobectomy underwent a total thyroidectomy. In the observation strategy, patients were managed by close clinical observation, as defined by the 2015 American Thyroid Association (ATA) Management Guidelines for Adult Patients with Thyroid Nodules and Differentiated Thyroid Cancer (1). Repeat FNAB was performed if the nodule grew by >50% in volume, which is the threshold for repeat FNAB according to the ATA guidelines (1). The probability of growth was based on a study examining the size-related rate and frequency of growth in thyroid nodules using ultrasonography (20). Patients with a malignant diagnosis on repeat FNAB after nodule growth underwent total thyroidectomy. All other patients with nodule growth (>50% change in volume) underwent diagnostic thyroid lobectomy, followed by completion thyroidectomy if malignancy was diagnosed on final surgical pathology. Thyroid lobectomy, rather than continued clinical observation, was modeled for enlarging nodules with a benign repeat FNAB result because of the high clinical suspicion of malignancy and the high probability of a symptomatic mass (7). In this strategy, patients may have a delayed diagnosis of malignancy due to a false-negative result on the initial FNAB, which subjects them to a higher risk of recurrence and cancer-specific mortality (21,22). Lifetime surveillance after thyroid malignancy was defined per the ATA guidelines in both strategies (1).
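The base-case decision at each observation visit can be written as a small rule. This is a hedged sketch: the ellipsoid approximation (π/6 × length × width × depth) is a common ultrasound convention assumed here rather than stated in the text, the function names are hypothetical, and growth is measured against the previous visit.

```python
import math


def ellipsoid_volume(length_cm, width_cm, depth_cm):
    """Nodule volume (mL) from three orthogonal diameters (assumed ellipsoid)."""
    return math.pi / 6 * length_cm * width_cm * depth_cm


def observation_visit(previous_volume, current_volume, repeat_fnab_malignant):
    """Action at one follow-up visit under the base-case observation strategy.

    repeat_fnab_malignant is consulted only when the growth threshold is met.
    """
    growth = (current_volume - previous_volume) / previous_volume
    if growth <= 0.50:
        return "continue observation"
    # Growth >50% in volume: repeat FNAB.
    if repeat_fnab_malignant:
        return "total thyroidectomy"
    # Benign repeat FNAB: the base case still proceeds to diagnostic lobectomy,
    # with completion thyroidectomy if final pathology is malignant.
    return "diagnostic thyroid lobectomy"
```

Alternate observation strategy A, described under the sensitivity analyses, would differ only in the final branch, returning to observation when the repeat FNAB is benign.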
Outcomes
Costs
Total costs were calculated from the U.S. Medicare perspective and expressed in 2014 U.S. dollars (Table 2). Relevant costs were obtained from the Healthcare Cost and Utilization Project of the Agency for Healthcare Research and Quality, using national average 2012 Medicare costs for the appropriate ICD-9 procedure codes, and/or from the 2014 Medicare physician and clinical laboratory fee schedules (conversion factor [i.e., amount of reimbursement for one relative value unit] = $35.75) (23,24). Drug costs were based on U.S. wholesale costs (25).
All costs are displayed in 2013 US$.
Ranges were calculated using 50% of the base estimate for the lower limit and 200% for the upper limit.
Gamma distributions were fitted around each cost estimate using α = 1 and β = base case cost as parameters.
TSH, thyrotropin.
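The cost arithmetic described above and in the Table 2 notes can be sketched as below, assuming that β denotes the gamma scale parameter (so the sampled mean equals the base-case cost) and ignoring geographic adjustments to fee-schedule payments. The function names and the 20-RVU example are hypothetical; only the $35.75 conversion factor, the 50%/200% range, and α = 1 come from the text.

```python
import random

# Conversion factor stated in the text (2014 Medicare physician fee schedule).
CONVERSION_FACTOR_2014 = 35.75  # US$ per relative value unit (RVU)


def physician_fee(total_rvus, conversion_factor=CONVERSION_FACTOR_2014):
    """National fee = RVUs x conversion factor (geographic adjustments ignored)."""
    return round(total_rvus * conversion_factor, 2)


def sensitivity_range(base_cost):
    """Deterministic range: 50% (lower) to 200% (upper) of the base estimate."""
    return 0.5 * base_cost, 2.0 * base_cost


def sample_cost(base_cost, rng):
    """One probabilistic draw from Gamma(alpha=1, scale=base_cost).

    Interpreting beta as a scale parameter gives a mean equal to the base
    cost; with alpha = 1 this is an exponential distribution (SD = mean).
    """
    return rng.gammavariate(1.0, base_cost)
```

For example, a hypothetical service worth 20 RVUs would be valued at 20 × $35.75 = $715.00, with a deterministic range of $357.50 to $1,430.00. With α = 1 the gamma distribution reduces to an exponential distribution, whose standard deviation equals its mean, so the probabilistic cost draws are quite dispersed.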
Effectiveness
Quality-adjusted life-years (QALYs) and life-years gained (LYGs) were the main effectiveness measures in this model. QALYs were calculated by multiplying the quality-of-life weight (“utility”; measured on a scale from 0 [death] to 1 [perfect health]) by the time spent in that health state. For example, five years spent with bilateral RLN palsy (weight = 0.21) (26) would yield 1.05 QALYs (5 × 0.21). Quality-of-life weights for the relevant health states were obtained from a literature review and are reported in Table 1. LYGs refer to absolute survival time, regardless of quality-of-life weights.
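The QALY arithmetic, including the 3% discount rate stated under Simulation design, can be sketched as follows. The start-of-cycle discounting convention and the function names are assumptions for illustration.

```python
def qalys(years, utility, discount_rate=0.0):
    """Sum of utility-weighted one-year cycles.

    Assumes whole cycles with the full-year weight counted at the start of
    each cycle, so cycle t is discounted by (1 + r) ** t.
    """
    return sum(utility / (1 + discount_rate) ** t for t in range(years))


def life_years(years, discount_rate=0.0):
    """LYGs ignore quality weights, i.e., the utility is fixed at 1."""
    return qalys(years, 1.0, discount_rate)
```

Undiscounted, five years at a weight of 0.21 give 1.05 QALYs, matching the example above; discounting at 3% lowers this to about 0.99 QALYs.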
Sensitivity and uncertainty analysis
Deterministic (one- and two-way) sensitivity analyses were performed to determine the specific impact of individual costs or probabilities on outcomes. This was performed by varying specific parameters across a range of plausible values identified from the literature or pooled analyses (with lower and upper limits set at the 2.5th and 97.5th percentile values, or at the lowest and highest reported probabilities if the data were very heterogeneous). Threshold analyses were also performed as part of the deterministic sensitivity analysis to determine the value at which the base-case result would no longer hold. All variables were also tested across wide ranges of values: probabilities from 0% to 100%, quality-of-life weights from 0 to 1, and relative risks from 0 to 10. Probabilistic sensitivity analysis is inherently included in a microsimulation model, as values for each parameter are drawn from its fitted distribution (normal distributions for relative risks, beta distributions for probabilities and utilities, and gamma distributions for costs) for every individual patient simulation. Uncertainty around cost-effectiveness results was expressed graphically using cost-effectiveness acceptability curves. These graphs report the probability of cost-effectiveness at varying willingness-to-pay (WTP) thresholds (i.e., the amount a decision maker is willing to pay to gain an additional unit of effectiveness). WTP thresholds vary between healthcare systems and perspectives, but a $100,000/QALY threshold is commonly accepted (conceptually representing the yearly cost of renal replacement therapy).
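A cost-effectiveness acceptability curve can be computed directly from the per-simulation incremental results, as in the following sketch. It assumes the common net-monetary-benefit rule (surgery is cost-effective in a simulation when WTP × ΔE - ΔC > 0); the function name and inputs are hypothetical.

```python
def ceac(delta_costs, delta_effects, wtp_values):
    """Fraction of simulations in which surgery is cost-effective at each WTP.

    delta_costs / delta_effects: per-simulation incremental cost and effect
    (surgery minus observation). Surgery counts as cost-effective when its
    incremental net monetary benefit, WTP * dE - dC, is positive.
    """
    n = len(delta_costs)
    curve = {}
    for wtp in wtp_values:
        favorable = sum(
            1 for dc, de in zip(delta_costs, delta_effects) if wtp * de - dc > 0
        )
        curve[wtp] = favorable / n
    return curve
```

Evaluated over a grid of WTP values, with QALYs or LYGs as the effect measure, the resulting fractions trace the acceptability curves.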
Several alternate observation strategies were also tested in sensitivity analyses. In the base-case model, nodules in the observation arm underwent surgical resection if they grew in size, regardless of repeat FNAB cytology (as large enlarging nodules were assumed to be symptomatic) (7). In alternate observation strategy A, enlarging nodules underwent repeat FNAB, and clinical follow-up was continued depending on the FNAB cytology result. In alternate observation strategy B, nodules were managed as in the base case (i.e., resection if the nodule grew, regardless of repeat FNAB), but with follow-up every six months instead of every year. In alternate observation strategy C, follow-up occurred every two years.
Results
The mean age of the simulated cohort was 54.3 years (standard deviation [SD] = 22.1), and 14% were male. The pooled probability of malignancy in thyroid nodules >4 cm with benign cytology was 9.6% [confidence interval (CI) 4.2–15.0] (10,11,27–31). The incremental cost and effectiveness results of each individual patient simulation are shown in Figure 2A and B. The mean incremental lifetime costs and effectiveness of surgery versus observation are reported in Table 3. Overall, the surgical strategy was more costly but also more effective in terms of longer survival, whereas the difference in QALYs was minimal. Importantly, incremental lifetime costs were lower in patients <55 years old than in those ≥55 years old (p < 0.001). Incremental LYGs were also higher in the younger age group, but the difference in QALYs remained negligible. The incremental costs and effects of the surgical strategy over the alternate observation strategies are also reported in Table 3. In all alternate observation scenarios, the surgical strategy remained more costly and more effective in terms of LYGs, but with minimal incremental QALYs. Similarly, the incremental costs and effectiveness were more favorable for surgical management in younger than in older patients.

Table 3. Incremental costs and effects of surgical management (i.e., upfront thyroid lobectomy) over observation using (
Confidence intervals were derived from the 2.5th and 97.5th percentile of bootstrap estimates (10,000 iterations).
CI, confidence interval; QALYs, quality-adjusted life-years; LYGs, life-years gained.
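The percentile bootstrap described in the table note can be sketched as follows. The resampling details (resampling per-patient incremental values with replacement, and a simple rank-based percentile) are assumptions, and the function name is hypothetical.

```python
import random
import statistics


def percentile_bootstrap_ci(values, iterations=10_000, seed=0):
    """95% CI for a mean from the 2.5th and 97.5th percentiles of bootstrap means."""
    rng = random.Random(seed)
    n = len(values)
    # Resample with replacement and record each resample's mean.
    means = sorted(
        statistics.fmean(rng.choices(values, k=n)) for _ in range(iterations)
    )
    lower = means[round(0.025 * iterations)]
    upper = means[round(0.975 * iterations) - 1]
    return lower, upper
```

With 10,000 iterations, this discards the 250 smallest and 250 largest bootstrap means and reports the extremes of the remaining 9,500.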
Deterministic analysis demonstrated that the surgical strategy was always more costly and more effective than the observation strategy across the range of plausible values for all tested variables (data not shown). Ten model parameters exhibited threshold values for incremental costs and/or effectiveness expressed as QALYs (Table 4). None of the threshold values for these 10 parameters fell within the range of plausible values reported in Table 1. The relationship between FNAB sensitivity and incremental costs, QALYs, and LYGs is shown in Figure 3. As the sensitivity of FNAB increases, the incremental costs decrease (because more patients in the observation strategy remain under lifelong follow-up), but the incremental effectiveness (in both QALYs and LYGs) also decreases, resulting in a very high incremental cost-effectiveness ratio (i.e., not cost-effective). Cost-effectiveness acceptability curves using QALYs and LYGs as the effectiveness measure are shown in Figure 4A and B. The probability of cost-effectiveness was 49% for the whole cohort at a threshold of $100,000 per additional QALY, and was higher for patients <55 years old (60%) than for older patients (38%). At a threshold of $100,000 per additional LYG, the probability of cost-effectiveness was 65% for the whole cohort, 81% for patients <55 years old, and 61% for those ≥55 years old.
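The incremental cost-effectiveness ratio (ICER) mentioned above is sketched below, alongside an equivalent net-benefit check. The study reports probabilities of cost-effectiveness rather than ICERs, so the usage values are only arithmetic on the mean incremental results in the Abstract (+US$12,992, +0.12 QALYs, +1.67 LYGs) and ignore their uncertainty; they are illustrative, not reported results. The function names are hypothetical.

```python
def icer(delta_cost, delta_effect):
    """Incremental cost-effectiveness ratio: extra dollars per extra unit of effect."""
    if delta_effect == 0:
        raise ValueError("ICER is undefined when the incremental effect is zero")
    return delta_cost / delta_effect


def is_cost_effective(delta_cost, delta_effect, wtp):
    """Net-benefit rule; avoids the sign ambiguity of ratios with negative effects."""
    return wtp * delta_effect - delta_cost > 0
```

On these mean values, the ratio is roughly $108,000 per QALY and roughly $7,800 per LYG, so a $100,000/QALY threshold lies close to the QALY-based ratio.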

Figure 3. Changes in incremental costs, QALYs, and LYGs based on variations in fine-needle aspiration biopsy sensitivity.

Figure 4. Cost-effectiveness acceptability curves of surgical management versus observation using threshold values for (
Variables not included in this table did not contain a threshold value that changed the base-case interpretation (surgical management is more costly and more effective than observation).
At value x of the variable, surgical management would become less costly and/or less effective than observation.
Discussion
The management of thyroid nodules >4 cm with benign FNAB cytology is controversial. Nodule size has not been unequivocally demonstrated to be an independent risk factor for malignancy (32,33). However, the larger the nodule, the greater the risk that FNAB will miss foci of malignant transformation within the nodule and yield a false-negative result. Nevertheless, the majority of these nodules are still benign, and the cost-effectiveness of routine thyroid lobectomy for large nodules regardless of FNAB cytology had not been evaluated. In the present modeling study, routine surgery for patients with nodules >4 cm was associated with an acceptable increase in cost, as well as longer overall survival, compared with observation of these nodules.
The cost-effectiveness results were more favorable for younger patients. The incremental cost of routine surgery was lower in younger patients, likely because, for patients with truly benign nodules, the cost of upfront surgery was offset by the proportionally higher cost of lifelong follow-up in the observation strategy. As expected, younger patients had more QALYs and LYGs than older patients because of their longer baseline life expectancy (i.e., a missed diagnosis that decreases overall survival has much less impact in an elderly patient than in a younger patient). However, the difference in QALYs between the two strategies was minimal compared with the difference in LYGs, which reflects the important adverse impact of postoperative complications, especially RLN injuries. These results may help guide shared decision making when deciding whether to perform thyroid lobectomy in patients with nodules >4 cm: longer overall survival comes at the risk of significantly reduced quality of life if permanent postoperative complications occur. The effectiveness gains were also observed in older patients, but to a lesser degree. Therefore, observation may be a more viable option in older patients.
Sensitivity analyses demonstrated that the base-case results were largely insensitive to a range of assumptions and parameter uncertainty. None of the plausible values obtained from the literature review changed the overall interpretation of the base-case result on deterministic sensitivity analysis (i.e., surgical management is more costly and more effective than observation). Similarly, threshold analysis did not identify plausible threshold values for most of the model parameters, except for certain quality-of-life weights. The quality-of-life weight used in this model for recurrent disease may appear counterintuitively low (0.54), but this value was obtained from a study that used an appropriate preference-elicitation technique (time trade-off). Other studies have also reported significantly impaired quality of life in patients with recurrent disease (34,35). Probabilistic analyses showed that the majority of simulations found surgical management to be cost-effective at WTP thresholds of $50,000/QALY and $100,000/QALY, but with some uncertainty (Fig. 4A). However, there was considerably less uncertainty in the probability of cost-effectiveness using LYGs as the effectiveness measure (Fig. 4B), again demonstrating the important impact of postoperative complications.
The main inputs for this simulation were the diagnostic performance of FNAB and the probability of malignancy in nodules >4 cm. Certainly, an FNAB cytology result of malignant or suspicious for malignancy is an indication for surgery. In cases of benign cytopathology, the risk of sampling error in large nodules, coupled with the potentially higher risk of malignancy, may warrant diagnostic thyroid lobectomy. In particular, there was large variability in FNAB sensitivity for nodules >4 cm. The input for the base-case analysis was obtained from a meta-analysis including 6589 patients with palpable nodules who underwent surgical resection after FNAB, in which the pooled sensitivity ranged from 66% to 95% (36). The pooled probability of malignancy in thyroid nodules >4 cm was calculated using studies that specifically reported the FNAB cytology and surgical pathology in this subset of patients, and these values varied widely (0.8–29.6%) (10,11,27–31). The variation in FNAB sensitivity was accounted for by a sensitivity analysis (Fig. 3) in which the main model outputs were calculated across a wide range of plausible sensitivity values. As FNAB sensitivity improves, the incremental costs of routine surgery paradoxically decrease, because the accrued lifetime follow-up costs in the observation strategy start to catch up to the cost of upfront surgery (lifelong follow-up would be needed to rule out malignant degeneration of the nodule). However, the difference in effectiveness between the two strategies also decreases as FNAB sensitivity increases, since there are fewer false negatives with subsequent poor outcomes.
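Why higher FNAB sensitivity means fewer missed cancers can be illustrated with Bayes' theorem. This sketch is not how the model obtained its malignancy probability (which came directly from surgical series); the function name, the 20% pre-test probability, and the 90% specificity used below are hypothetical, and only the 66% and 95% sensitivity bounds come from the text.

```python
def malignancy_risk_after_benign_fnab(prior, sensitivity, specificity):
    """P(malignant | benign cytology) via Bayes' theorem.

    False negatives (malignant nodules read as benign) occur with probability
    prior * (1 - sensitivity); true negatives with (1 - prior) * specificity.
    """
    false_neg = prior * (1 - sensitivity)
    true_neg = (1 - prior) * specificity
    return false_neg / (false_neg + true_neg)


# Hypothetical inputs: 20% pre-test probability, 90% specificity.
low_se = malignancy_risk_after_benign_fnab(0.2, 0.66, 0.9)
high_se = malignancy_risk_after_benign_fnab(0.2, 0.95, 0.9)
```

Under these hypothetical inputs, the residual risk of malignancy after benign cytology falls from about 8.6% at 66% sensitivity to about 1.4% at 95% sensitivity.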
It was also difficult to decide when to proceed with surgery if large nodules are observed. In the present model, patients in the observation strategy underwent thyroid lobectomy if the nodule demonstrated growth from one follow-up to the next, even if the repeat FNAB yielded benign cytology. While the rate of growth may not differentiate benign from malignant nodules (37), further enlargement of a thyroid nodule that is already 4 cm in size is likely to cause compressive symptoms (7), especially given the threshold of a 50% increase in volume recommended by the ATA (1). Therefore, significant growth may be an indication for surgical resection. Nevertheless, alternate observation strategies were tested in additional sensitivity analyses, one of which included continued observation after nodule growth depending on the repeat FNAB cytology (rather than immediate surgical resection). None of these alternate observation strategies significantly changed the interpretation of the base-case results.
This study should be interpreted with several limitations in mind. Most importantly, these data are the result of a microsimulation and are only as reliable as the model inputs themselves. The main model inputs were subject to large variability as reported in the literature. This uncertainty was incorporated within the fitted distributions around each variable, as well as through deterministic sensitivity analyses. These analyses did not change any of the overall conclusions of the model (i.e., routine surgery was more costly, but also more effective), suggesting that the model structure and parameter inputs were robust. Furthermore, certain assumptions regarding treatment options, such as routine radioactive iodine ablation and total thyroidectomy as the primary treatment for thyroid malignancy, are controversial (38). However, these management options were modeled according to the ATA guidelines (1). Finally, the cost inputs used in the simulation model may not reflect regional variations in healthcare costs (39), but rather are representative of average U.S. Medicare costs.
In summary, the results of this modeling study suggest that routine surgery for all thyroid nodules >4 cm with benign FNAB cytology was associated with higher costs and improved effectiveness compared with clinical observation. The incremental costs of routine surgery were found to be acceptable, and the effectiveness gains were mainly due to longer overall survival. These results were even more pronounced in younger patients and were insensitive to variations in important model parameters. Routine surgery should be considered a viable option for patients with thyroid nodules >4 cm, especially younger patients and those for whom overall survival is the primary goal of treatment.
Footnotes
Author Disclosure Statement
No competing financial interests exist.
