Abstract
Background:
Patients with low-risk papillary thyroid cancer (PTC) who demonstrate an excellent response to initial therapy have a 2% recurrence rate and 100% disease-specific survival within 10 years. Thus, annual surveillance may be excessive. We hypothesized that less frequent postoperative surveillance in these patients is cost effective.
Methods:
A Markov discrete time state transition model was created to compare postoperative surveillance tapered to 3-year intervals after 5 years of annual surveillance versus conventional annual surveillance in low-risk PTC patients with negative neck ultrasound and stimulated thyroglobulin less than 2 ng/mL 1 year postoperatively. Outcome probabilities, utilities, and costs were determined via literature review, the Medicare Physician Fee Schedule, and Healthcare Cost and Utilization Project data. Sensitivity analyses were performed to assess areas of uncertainty.
Results:
The cost of annual surveillance was $5,239 per patient and yielded 22.49 quality-adjusted life-years (QALYs). The 3-year strategy cost $2,601 less, but also yielded 0.01 fewer QALYs. Thus, the incremental cost per QALY of annual surveillance was $260,100. Probabilistic sensitivity analysis demonstrated that less frequent surveillance was more cost effective in 99.98% of 10,000 simulated patients. One-way sensitivity analysis revealed that annual surveillance would be cost effective if the total cost of neck ultrasound could be reduced to $23 or less.
Conclusion:
Extending postoperative surveillance to 3-year intervals after 5 years of annual surveillance in patients with low-risk PTC with excellent response to therapy is more cost effective than annual surveillance.
Introduction
The aim of this study was to evaluate the cost effectiveness of tapering postoperative surveillance to 3-year intervals after 5 years of annual surveillance rather than perpetual annual follow-up for patients with low-risk PTC who demonstrate excellent therapeutic response. We hypothesize that a reduced frequency surveillance strategy is cost effective.
Methods
Reference case
The reference case was defined as a healthy 45-year-old female patient with a 2-cm noninvasive PTC who underwent total thyroidectomy without prophylactic central neck dissection and received radioactive iodine ablation. This patient had excellent response to therapy, which was defined as a negative neck ultrasound, negative thyroglobulin antibodies, and a stimulated thyroglobulin level less than 2 ng/mL at 1-year follow-up.
Decision model
A Markov transition state model was created using decision analysis software (TreeAge Pro, Williamstown, MA). We created two distinct management strategies: 1) 5 years of annual surveillance, tapered to follow-up every 3 years (3-year strategy) and 2) annual surveillance. Patients in the 3-year strategy who developed abnormal findings on imaging or laboratory testing were transitioned to annual surveillance, even if further testing was negative for disease recurrence. Additionally, patients in the 3-year strategy whose serum thyroglobulin levels rose above 1 ng/mL on levothyroxine suppression, or who were diagnosed and treated for disease recurrence, were transitioned to annual surveillance thereafter. The control arm was modeled after the American Thyroid Association 2009 guidelines for patients with thyroid cancer (7): patients receive annual clinic visits with serum thyroglobulin testing, and patients with no suspicious findings receive neck ultrasound every 3 years. The probabilities of each clinical outcome were determined from literature review (Table 1).
US, ultrasound; FNA, fine-needle aspiration; Tg, thyroglobulin; WBS, whole-body scan; rhTSH, recombinant human thyrotropin; PET, positron emission tomography; RLN, recurrent laryngeal nerve.
The added risk of death after treated recurrence in the 3-year strategy represents a large area of uncertainty. Conceptually, this figure quantifies the disadvantage incurred by the patient when a recurrence is discovered and treated in a delayed fashion. The upper and lower bounds of added risk of death were estimated by multiplying the 30-year mortality of PTC after surgery and radioactive iodine (RAI) (2) by the hazard ratio for disease-specific survival associated with more than 5 nodal metastases, or nodal metastases larger than 3 cm, found at initial presentation of PTC among older patients (8). This assumes the worst-case scenario, in which delayed diagnosis of recurrence under 3-year surveillance would culminate in clinically relevant nodal metastases in all cases. This was done to bias the model toward annual surveillance.
No current guidelines delineate the appropriate duration of follow-up for our reference case. Thus, to determine the appropriate time horizon for our model we sought the expert opinion of 12 endocrinologists from multiple tertiary referral centers through an informal survey. The majority of endocrinologists replied they would follow the patient described in the reference case indefinitely. As a result, the model was made to progress in 1-year cycles for a total of 38 cycles; this reflects the median life expectancy predicted for our index patient by the National Center for Health Statistics (9).
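The following is a minimal sketch of how a discrete-time Markov cohort model of this kind accumulates discounted costs and QALYs over 38 one-year cycles. The three health states, transition probabilities, per-cycle costs, and utilities are hypothetical placeholders chosen for illustration; they are not the inputs listed in Tables 1–3, and the actual model was built in TreeAge Pro with a more detailed state structure. The function name run_cohort is likewise an assumption of this sketch.

```python
# Illustrative Markov cohort sketch. All states and numbers below are
# hypothetical placeholders, not the published model inputs.

def run_cohort(transitions, costs, utilities, start, cycles=38, discount_rate=0.03):
    """Advance a cohort through one-year cycles.

    Returns (discounted cost, discounted QALYs, final state distribution).
    """
    dist = dict(start)
    total_cost = 0.0
    total_qalys = 0.0
    for cycle in range(cycles):
        # Discount factor applied to everything accrued in this cycle
        factor = 1.0 / (1.0 + discount_rate) ** cycle
        total_cost += factor * sum(dist[s] * costs[s] for s in dist)
        total_qalys += factor * sum(dist[s] * utilities[s] for s in dist)
        # Redistribute the cohort according to the transition probabilities
        nxt = {s: 0.0 for s in dist}
        for state, share in dist.items():
            for target, prob in transitions[state].items():
                nxt[target] += share * prob
        dist = nxt
    return total_cost, total_qalys, dist


# Hypothetical annual transition probabilities (each row sums to 1)
TRANSITIONS = {
    "surveillance": {"surveillance": 0.989, "recurrence": 0.001, "dead": 0.010},
    "recurrence": {"surveillance": 0.900, "recurrence": 0.085, "dead": 0.015},
    "dead": {"dead": 1.0},
}
COSTS = {"surveillance": 250.0, "recurrence": 9000.0, "dead": 0.0}  # hypothetical
UTILITIES = {"surveillance": 1.0, "recurrence": 0.9, "dead": 0.0}   # hypothetical
START = {"surveillance": 1.0, "recurrence": 0.0, "dead": 0.0}

cost, qalys, final = run_cohort(TRANSITIONS, COSTS, UTILITIES, START)
print(f"discounted cost ${cost:,.0f}, discounted QALYs {qalys:.2f}")
```

Running the same function once per strategy, with strategy-specific costs and transition probabilities, yields the pair of cost and QALY totals that the decision rule compares.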
The more cost-effective strategy was defined as the strategy that produced the greatest utility, measured in quality-adjusted life-years (QALYs), without exceeding a cost of $100,000/QALY gained over the inferior strategy.
Disease recurrence distribution calculation
Probabilities of disease recurrence described in the existing literature are typically reported over multiple-year timespans. However, our model required input of annual probabilities. Given that risk of disease recurrence is not distributed evenly over time, we used polynomial interpolation and appropriate scaling to determine annual rates of disease recurrence that reflect the natural history of disease. Probabilities of disease recurrence at various time intervals were obtained from a literature review. Using technical computing software (MATLAB, The MathWorks, Inc., Natick, MA), a smooth fitted curve was obtained through interpolation by piecewise Hermite polynomials. The area under the fitted curve (AUC) was computed by the trapezoidal method of numerical integration. Annual recurrence probabilities were obtained by scaling the curve so that the AUC equaled the total lifetime probability of disease recurrence.
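The interpolate, integrate, and rescale procedure can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the MATLAB code used in the study: the knot values and the 3% lifetime probability are hypothetical, and the slope rule is a simplified monotone cubic Hermite scheme (harmonic mean of adjacent secants, end slopes equal to the end secants) rather than MATLAB's exact pchip end-point formula.

```python
import bisect

def monotone_slopes(xs, ys):
    """Knot slopes for a shape-preserving piecewise cubic Hermite interpolant.

    Simplified rule (an assumption of this sketch): interior slopes are the
    harmonic mean of the adjacent secant slopes, or zero where the secants
    change sign; end slopes equal the end secants.
    """
    secants = [(ys[i + 1] - ys[i]) / (xs[i + 1] - xs[i]) for i in range(len(xs) - 1)]
    slopes = [secants[0]] + [0.0] * (len(xs) - 2) + [secants[-1]]
    for i in range(1, len(xs) - 1):
        a, b = secants[i - 1], secants[i]
        if a * b > 0:
            slopes[i] = 2.0 * a * b / (a + b)
    return slopes

def interpolate(xs, ys, slopes, x):
    """Evaluate the cubic Hermite interpolant at x."""
    i = max(0, min(bisect.bisect_right(xs, x) - 1, len(xs) - 2))
    h = xs[i + 1] - xs[i]
    t = (x - xs[i]) / h
    h00 = 2 * t**3 - 3 * t**2 + 1
    h10 = t**3 - 2 * t**2 + t
    h01 = -2 * t**3 + 3 * t**2
    h11 = t**3 - t**2
    return h00 * ys[i] + h10 * h * slopes[i] + h01 * ys[i + 1] + h11 * h * slopes[i + 1]

def annual_probabilities(xs, ys, lifetime_prob, years=38, steps=20):
    """Trapezoidal area under the curve for each year, rescaled so that the
    yearly values sum to the lifetime probability."""
    slopes = monotone_slopes(xs, ys)
    width = 1.0 / steps
    areas = []
    for year in range(years):
        vals = [interpolate(xs, ys, slopes, year + k * width) for k in range(steps + 1)]
        areas.append(sum((vals[k] + vals[k + 1]) * width / 2.0 for k in range(steps)))
    total = sum(areas)
    return [lifetime_prob * area / total for area in areas]

# Hypothetical relative recurrence intensity at selected years after surgery
KNOT_YEARS = [0.0, 2.5, 5.0, 10.0, 20.0, 38.0]
KNOT_RATES = [0.2, 1.0, 0.5, 0.2, 0.05, 0.0]
LIFETIME_PROB = 0.03  # hypothetical lifetime probability of recurrence

annual = annual_probabilities(KNOT_YEARS, KNOT_RATES, LIFETIME_PROB)
print([round(p, 5) for p in annual[:5]])
```

The rescaling step guarantees that the annual probabilities sum to the chosen lifetime probability, while the shape of the fitted curve determines how that total is spread across years.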
Cost estimation
This analysis is based upon a third-party payer perspective, and focused on direct costs of care. Model costs were determined using the Medicare Physician Fee Schedule and Healthcare Cost and Utilization Project National Inpatient Sample data (10,11). Costs for reoperation, adjuvant therapy, permanent recurrent laryngeal nerve injury, and permanent hypoparathyroidism after thyroidectomy were determined previously (12,13). All costs are listed in Table 2.
TSH, thyrotropin; Tg, thyroglobulin; FNA, fine-needle aspiration; WBS, whole body scan; 18F-FDG PET/CT, fluorine-18 2-fluoro-2-deoxy-
Costs were reported as 2014 U.S. dollars. When necessary, costs were adjusted to 2014 U.S. dollar value using an inflation rate equal to the mean of the annual changes of the Consumer Price Index for Medical Care from the year the cost was reported (14). All future costs were discounted 3% annually.
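The two adjustments described here, inflating a historical cost to 2014 dollars and discounting future costs at 3% per year, can be sketched as below. The index values are made up for illustration and are not actual Consumer Price Index figures, and applying the mean annual change as a compounding rate is an assumption of this sketch, since the text does not state how the rate was applied.

```python
# Illustrative cost adjustments. HYPOTHETICAL_INDEX holds made-up values,
# not real Consumer Price Index for Medical Care data.

def mean_annual_change(index_by_year, start_year, end_year):
    """Mean of the year-over-year fractional changes in a price index."""
    changes = [
        index_by_year[year + 1] / index_by_year[year] - 1.0
        for year in range(start_year, end_year)
    ]
    return sum(changes) / len(changes)

def inflate(cost, rate, years):
    """Carry a cost forward by compounding a constant annual rate (assumed)."""
    return cost * (1.0 + rate) ** years

def discount(value, years_ahead, rate=0.03):
    """Present value of a cost (or QALY) accrued years_ahead in the future."""
    return value / (1.0 + rate) ** years_ahead

HYPOTHETICAL_INDEX = {2010: 100.0, 2011: 103.0, 2012: 106.5, 2013: 109.0, 2014: 112.0}

rate = mean_annual_change(HYPOTHETICAL_INDEX, 2010, 2014)
cost_2014 = inflate(1000.0, rate, 2014 - 2010)  # a cost reported in 2010 dollars
present_value = discount(cost_2014, years_ahead=10)
print(f"mean annual change {rate:.3%}, 2014 cost ${cost_2014:,.2f}, PV ${present_value:,.2f}")
```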
Effectiveness
Effectiveness was reported in QALYs. QALY adjustments were determined by literature review. QALY adjustments for surgical complications were obtained from weighted event probabilities of permanent recurrent laryngeal nerve injury and permanent hypoparathyroidism. These values were previously extrapolated from a time trade off questionnaire given to 109 random, healthy patients (15). Patients who required reoperation received a nonrecurring 11-day QALY deduction to reflect utility lost during treatment; this value was previously determined by expert opinion (13). All future QALYs were discounted 3% annually. All QALY adjustments are listed in Table 3.
QALY, quality-adjusted life-years.
Sensitivity analysis
One-way sensitivity analysis was performed on each model variable for a $100,000/QALY threshold. For probabilistic sensitivity analysis, all outcome probabilities were set as static probabilities with triangular frequency distributions; each outcome probability was assigned a different value within its triangular distribution for each simulation. The Monte Carlo probabilistic sensitivity analysis was performed with 10,000 simulations. The distribution of each model variable is listed in Tables 1–3, and reflects the maximum range reported in the existing literature, a maximum variance of ±50% of the index case value, or the greatest difference between the reference value and 0 or 1.
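A Monte Carlo probabilistic sensitivity analysis with triangular distributions can be sketched as follows. The function toy_increment is a hypothetical stand-in for the full Markov model, the parameter names and index values are invented for illustration, and every parameter is given a ±50% range for simplicity, whereas the study used the ranges listed in Tables 1–3. The sketch applies the $100,000/QALY threshold through net monetary benefit, which is a choice made here for convenience and not necessarily how the original software reported results.

```python
import random

def triangular_pm50(rng, value):
    """Draw from a triangular distribution spanning +/-50% of the index value,
    with the mode at the index value."""
    return rng.triangular(0.5 * value, 1.5 * value, value)

def toy_increment(params):
    """Hypothetical stand-in for the full model: extra cost and extra QALYs
    of annual surveillance relative to the 3-year strategy."""
    extra_cost = params["extra_visits"] * params["visit_cost"]
    extra_qalys = params["recurrence_prob"] * params["qaly_gain_if_early"]
    return extra_cost, extra_qalys

def run_psa(index_values, n=10_000, wtp=100_000, seed=0):
    """Fraction of simulations in which the 3-year strategy is preferred."""
    rng = random.Random(seed)
    favored = 0
    for _ in range(n):
        sample = {name: triangular_pm50(rng, v) for name, v in index_values.items()}
        d_cost, d_qalys = toy_increment(sample)
        # Net monetary benefit of annual surveillance over the 3-year strategy;
        # a negative value means the 3-year strategy is preferred at this threshold.
        if wtp * d_qalys - d_cost < 0:
            favored += 1
    return favored / n

# Hypothetical index values for the toy model
INDEX_VALUES = {
    "extra_visits": 20.0,
    "visit_cost": 140.0,
    "recurrence_prob": 0.03,
    "qaly_gain_if_early": 0.3,
}

fraction = run_psa(INDEX_VALUES)
print(f"3-year strategy preferred in {fraction:.2%} of simulations")
```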
Results
The perpetual annual surveillance strategy cost $5,239, and yielded 22.49 QALYs. The 3-year strategy cost $2,601 less, but also yielded 0.01 QALY less than annual surveillance. The added utility of annual surveillance represents an incremental cost of $260,100 per QALY gained. Thus, postoperative surveillance in 3-year intervals after 5 years of annual surveillance was determined to be the more cost-effective strategy.
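The incremental cost-effectiveness ratio follows directly from the figures reported above. The short calculation below reproduces it; the variable and function names are illustrative only.

```python
def icer(delta_cost, delta_qalys):
    """Incremental cost-effectiveness ratio: extra cost per extra QALY."""
    return delta_cost / delta_qalys

annual_cost = 5239.0   # cost of annual surveillance, as reported
delta_cost = 2601.0    # additional cost of annual surveillance, as reported
delta_qalys = 0.01     # additional QALYs from annual surveillance, as reported
WTP_THRESHOLD = 100_000.0  # dollars per QALY

three_year_cost = annual_cost - delta_cost
ratio = icer(delta_cost, delta_qalys)
annual_is_cost_effective = ratio <= WTP_THRESHOLD

print(f"3-year strategy cost: ${three_year_cost:,.0f}")
print(f"ICER of annual surveillance: ${ratio:,.0f} per QALY")
print(f"Annual surveillance within threshold: {annual_is_cost_effective}")
```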
One-way sensitivity analysis revealed that only three model probabilities had sufficient influence to independently make annual surveillance more cost effective than the 3-year strategy. Annual surveillance would become more cost-effective than the 3-year strategy: 1) if the lifetime probability of suspicious ultrasound findings in patients without recurrence exceeds 19%, compared to the index value of 3.7%, 2) if the annual probability of suspicious ultrasound findings after treated recurrence exceeds 22.4% in the 3-year strategy, compared to less than 1% annually in the annual surveillance arm, and 3) if the annual added risk of death after recurrence exceeds 0.5% in the 3-year strategy (Fig. 1).

Fig. 1. One-way sensitivity analysis of added mortality after recurrence in 3-year surveillance.
One-way sensitivity identified only one model cost that could independently alter the outcome of our model. Annual surveillance would become cost effective if the cost of ultrasound could be reduced to less than $23, from an index value of $140. No other model probabilities, costs, or QALY adjustments were found to alter model outcomes independently, even at extreme values.
Probabilistic sensitivity analysis using 10,000 simulated patients revealed that the 3-year strategy for low-risk PTCs with excellent therapeutic response was more cost effective than annual surveillance in 99.98% of simulated patients (Fig. 2). In this analysis, the mean cost of the 3-year surveillance strategy was $2,428 and generated 22.48 QALYs. The annual surveillance arm cost $5,084 and generated 22.49 QALYs. These values are consistent with the expected values using index model variables.

Fig. 2. Monte Carlo simulation of annual versus 3-year surveillance. Every point on the graph represents the difference between annual and 3-year surveillance strategies in a single model simulation, with 10,000 total simulations.
Discussion
In our model, less frequent postoperative surveillance tapered to 3-year intervals after 5 years of annual surveillance was more cost effective than annual surveillance for low-risk PTC with excellent response to initial therapy. Although the 3-year strategy yielded 0.01 QALYs less than the control arm, it was associated with a cost savings of $2,601 per patient. The incremental cost per QALY for annual surveillance was $260,100, which exceeds the $100,000/QALY threshold for cost-effectiveness. This is the first formal cost-effectiveness analysis of postoperative surveillance strategies of low-risk PTC.
Sensitivity analysis revealed that the four most influential variables in our cost effectiveness model were: 1) added risk of death after treated disease recurrence in the 3-year strategy, 2) probability of suspicious ultrasound findings in patients without recurrence, 3) probability of suspicious ultrasound findings in patients with treated recurrence in the 3-year strategy, and 4) cost of neck ultrasound. However, these model variables only changed the outcome of the decision model at extreme values.
Undoubtedly there is a concern that less frequent postoperative surveillance may permit undetected tumor growth leading to greater disease-specific morbidity and mortality. To address this area of uncertainty, we created an “additional risk of death” pathway after disease recurrence in the 3-year strategy. This does not represent the overall mortality rate, only a theoretical increase from baseline due to less frequent follow-up. The added risk of death after treated recurrence needed to exceed 0.5% annually to make annual surveillance the more cost-effective strategy. Given that the 10-year disease-specific mortality of low-risk PTC approaches 0% despite an 8% recurrence rate, an added mortality of 0.5% per year for a 3% recurrence rate would be extremely unlikely (6). In our model, recurrences peak at 2–3 years after diagnosis. An annual 0.5% added risk of death for patients treated for recurrent disease at 10 years would have a cumulative additional mortality rate of 14% over the duration of the model. This is far in excess of the 3–6% 30-year all-stage mortality rate associated with PTC (2).
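The 14% figure can be checked with a short calculation, assuming it refers to the 28 cycles that remain after a recurrence treated at year 10 of the 38-cycle model (an interpretation of the text, not something it states explicitly). Summing the 0.5% annual risk over those cycles gives 14%; compounding the same risk gives a slightly lower value of roughly 13%.

```python
annual_added_risk = 0.005      # 0.5% added risk of death per year
remaining_cycles = 38 - 10     # assumed: cycles left after recurrence at year 10

# Simple sum of the annual risks over the remaining cycles
simple_sum = annual_added_risk * remaining_cycles

# Compounded probability of dying from the added risk at some point
compounded = 1.0 - (1.0 - annual_added_risk) ** remaining_cycles

print(f"simple sum: {simple_sum:.1%}, compounded: {compounded:.1%}")
```

Either way, the cumulative added mortality implied by the threshold is well above the 3–6% 30-year all-stage mortality cited for PTC.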
Sensitivity analysis also revealed that annual surveillance would become cost effective if the lifetime probability of positive ultrasound in patients with previously negative testing exceeds 19%. This would represent a greater than fivefold increase from the index value of 3.7%. After a median of 7 years of follow-up, patients treated for PTC with an excellent response to therapy only had 4% recurrence (5). Annual surveillance also becomes cost-effective if the annual probability of suspicious ultrasound findings after treated recurrence in the 3-year strategy exceeds 22.4%. In comparison, only 3.1% of patients had clinical evidence of disease per year following reoperation for recurrent PTC (16); thus, 22.4% would be more than a sixfold increase.
Finally, the only modifiable variable that could change the outcome of our model is the cost of neck ultrasound. In order for annual surveillance to be cost effective, the cost of ultrasound would have to be reduced from $140 to $23. Thus, appropriate risk stratification and a decreased frequency of postoperative surveillance may be more feasible than reducing the cost of neck ultrasound for maximizing cost effectiveness.
Although no other studies to our knowledge have formally evaluated the cost effectiveness of postoperative surveillance strategies, several previous studies have affirmed low recurrence rates for PTC when neck ultrasound and biochemical testing are negative after therapy (5,17,18). A recent study validated a dynamic risk restratification scheme that predicts only a 2% risk of recurrence for low-risk PTC if stimulated thyroglobulin and neck ultrasound are negative at initial follow-up; moreover, only 0.6% had structural evidence of recurrence (5). Current recommendations from the American Thyroid Association 2009 guidelines state that low-risk patients “who have had remnant ablation, negative cervical US, and undetectable TSH-stimulated Tg can be followed primarily with yearly clinical examination and Tg measurements” and “periodic cervical US” (7). Our findings suggest that these recommendations may merit further refinement after cost considerations are taken into account.
This study has several limitations. There is a paucity of data regarding outcomes following reoperation for recurrent PTC. Thus, model variables such as probability of positive neck ultrasound or serum thyroglobulin are based on a small number of studies. Furthermore, there are limited data on the natural history of undetected locoregional recurrences and the effect of node number, node size, and extranodal extension on outcomes of reoperation. Thus, the magnitude of added morbidity and mortality associated with recurrences and their management after delayed recognition in the 3-year surveillance arm may be greater than we have estimated in our base case. Nonetheless, the uncertainty of these variables was assessed using sensitivity analysis.
The decision tree omitted several potential outcomes for the sake of simplicity. The model assumes that rates of temporary recurrent laryngeal nerve injury, temporary hypocalcemia, wound infection, hematoma formation, and operative mortality would not differ significantly between surveillance strategies. The decision tree also automates clinical decisions, and cannot adequately simulate the complex decision-making associated with long-term management of PTC. Finally, our analysis focused on direct medical costs, and omitted indirect costs, such as lost productivity or increased anxiety due to more frequent surveillance testing. Nevertheless, excessive, “defensive” testing is associated with worse outcomes and decreased utility for patients, and inclusion of indirect costs would likely favor the less aggressive surveillance strategy (19).
The $100,000 per QALY threshold was used to interpret our results because it is commonly accepted and cited in cost-effectiveness literature (20). Nonetheless, this value is somewhat arbitrarily defined, and the rationale for using this threshold varies between studies. The $100,000/QALY threshold has been used to represent either the approximate inflation-adjusted value of the originally established $50,000 per QALY standard set in 1950, which reflects the annual Medicare coverage for treatment of end-stage renal disease requiring dialysis, or twice the per capita annual income in the United States (20,21). While useful as a method of standardization, these are imperfect benchmarks to measure the merit of medical expenditures. Moreover, an analysis of spending behavior within the United States showed that in 2008, actual willingness to pay for a QALY was between $100,000 and $300,000 (21). Thus, our results are meant to inform, but not dictate, clinical decision-making.
In conclusion, tapering postoperative surveillance to 3-year intervals after 5 years of annual surveillance is more cost effective than annual surveillance for patients with low-risk PTC with excellent response to initial therapy. Given the favorable prognosis in this subset of patients, and the added costs of perpetual annual surveillance with essentially equivalent outcomes, clinicians could consider using a less frequent long-term follow-up interval. The proposed 3-year surveillance strategy is not being specifically advocated, but simply serves to indicate that less frequent surveillance is cost effective compared to current practice. Indeed, it is plausible that even less frequent surveillance strategies may prove to be more cost effective. Future studies are needed to characterize the natural history of undetected locoregional PTC recurrences better and validate less aggressive surveillance strategies using actual cost data.
Footnotes
Acknowledgment
The H. H. Lee Research Award provided partial research support for James Wu, MD.
Author Disclosure Statement
No competing financial interests exist.
