Abstract
Background:
Considerable uncertainty remains about the pattern of use of treatment options for Graves' disease (GD) and their comparative effectiveness and safety.
Methods:
Using a large U.S. administrative data set (OptumLabs® Data Warehouse), we identified patients with GD who received antithyroid drugs (ATDs), radioactive iodine (RAI), or surgery between 2005 and 2013.
Results:
We identified 4661 patients with GD: mean age 48 (SD ±14) years, 63% white, and 80% female. Initial treatment was ATD in 2817 patients (60%), RAI in 1549 (33%), and surgery in 295 (6%). Success rates were 50% for ATD, 93% for RAI, and 99% for surgery. Median time to treatment failure was 6.8 months for ATD and approximately 3 months for RAI and surgery. When patients were required to be on ATD for at least one year before failure was assessed, the ATD failure rate decreased to 25%. Adverse effects occurred in 12% of patients receiving ATD, 6% receiving RAI, and 24% undergoing surgery. Factors associated with treatment success were age >55 years (for ATD) and female sex (for RAI). About 12% of patients receiving ATD continued this treatment for >24 months as initial therapy. When patients failed ATD therapy, the second-line therapies were reinitiation of ATD (65%), RAI (26%), and surgery (9%). Overall, 26% of patients remained on ATD therapy (first and second line combined).
Conclusions:
ATD therapy was the most common GD therapy; it had the lowest efficacy, and significant adverse effects were infrequent. With one fourth of patients remaining on ATD treatment (as initial or second-line therapy), it is imperative to determine the long-term efficacy, safety, costs, and burdens of this treatment modality.
Introduction
Graves' disease (GD), the most common cause of endogenous hyperthyroidism (1,2), can cause substantial morbidity (3–5) and loss of quality of life (6); untreated, it can be lethal (7,8). There are three treatment options for patients with GD: antithyroid drugs (ATD) to block the synthesis of thyroid hormones; radioactive iodine (RAI) to destroy the thyroid gland; and surgery to remove the thyroid gland (4,5,9). Determining which treatment modality best addresses each patient's situation requires an evidence-based conversation and shared decision making (10). These conversations should draw on the best available research evidence. The available estimates of efficacy and safety, however, reflect the experience of specialty centers and are limited by referral bias, small samples, and selective publication. It is unclear whether the published evidence accurately reflects current GD treatment trends and outcomes in the United States.
Surveys of U.S. clinicians administered over the years indicate that RAI is the dominant treatment modality (11,12). Although its use is declining, RAI still accounts for >50% of treatments. However, more recent data from commercially insured patients in the United States indicate that more than two-thirds of patients are treated with ATDs, not just as a bridge to RAI or surgery, but as definitive long-term treatment (13). Understanding the actual distribution of treatments is therefore essential for judging the quality of care for patients with GD and for identifying research needs.
To this end, we aimed to characterize the pattern of use and the comparative effectiveness and safety of each treatment modality for patients with GD, using a large nationwide data set of privately insured individuals in the United States.
Methods
The study is a retrospective analysis of claims data from the OptumLabs® Data Warehouse (OLDW), which includes de-identified claims data for privately insured and Medicare Advantage enrollees in a large private U.S. health plan. The database contains longitudinal health information on enrollees of all ages and races/ethnicities from all 50 states. The health plan provides comprehensive coverage for physician, hospital, and prescription drug services (14). In 2014, ∼19% of the U.S. population in commercial health plans, 19% of those in Medicare Advantage plans, and 24% of those in Medicare Part D-only plans were represented in the OLDW data set. That year, OLDW coverage by census region was 38% South, 24% Midwest, 20% West, and 18% Northeast. Although a study sample derived from the OLDW is not nationally representative per se, the distribution of patient characteristics is similar to that of the general population (15). Because this study involved analysis of pre-existing de-identified data, it was exempt from institutional review board approval.
Study population
We identified patients aged 18 years or older whose first treatment for GD between 2005 and 2013 was ATD, RAI, or surgery. Patients were required to have at least one diagnosis code for GD (International Classification of Diseases, Ninth Revision, Clinical Modification [ICD-9-CM] 242.00 or 242.01) in the 6 months preceding treatment. All individuals had continuous medical and pharmacy plan coverage for at least 12 months before and 24 months after treatment. We excluded patients who received surgery, RAI, or ATD before their GD diagnosis, and those with a diagnosis of non-Graves' thyrotoxicosis (e.g., subacute thyroiditis) or thyroid cancer at any time in the year before their GD diagnosis (Fig. 1). To estimate the effectiveness of therapy, patients were followed for two years after the initiation of one of the three therapies.

Figure 1. Study population, success rate, and treatment pathways. ᵃIn adherence to OptumLabs reporting policies, counts of fewer than 11 events cannot be reported. ATD, antithyroid drug; RAI, radioactive iodine.
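The eligibility rules above can be expressed as a filter over patient records. The following is a minimal Python sketch under stated assumptions: the `Patient` record layout and the `is_eligible` function are hypothetical (the paper does not describe the OLDW schema), and the 6-, 12-, and 24-month windows are approximated as 183, 365, and 730 days. It covers only the rules listed in the Study population paragraph, not the full extraction pipeline.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta
from typing import List

# Month windows approximated in days (an assumption; the study specifies months).
GD_CODE_LOOKBACK = timedelta(days=183)  # GD code within ~6 months before treatment
PRE_COVERAGE = timedelta(days=365)      # >= ~12 months of coverage before treatment
POST_COVERAGE = timedelta(days=730)     # >= ~24 months of coverage after treatment


@dataclass
class Patient:
    """Hypothetical patient record; not the actual OLDW schema."""
    age: int
    treatment_date: date   # first ATD fill, RAI dose, or thyroidectomy
    coverage_start: date   # start of continuous medical + pharmacy coverage
    coverage_end: date     # end of continuous medical + pharmacy coverage
    gd_code_dates: List[date] = field(default_factory=list)  # ICD-9-CM 242.00/242.01
    prior_gd_treatment: bool = False   # surgery, RAI, or ATD before GD diagnosis
    excluding_diagnosis: bool = False  # non-Graves' thyrotoxicosis or thyroid cancer


def is_eligible(p: Patient) -> bool:
    """Apply the inclusion and exclusion rules to one patient."""
    t = p.treatment_date
    has_recent_gd_code = any(t - GD_CODE_LOOKBACK <= d <= t for d in p.gd_code_dates)
    continuously_covered = (p.coverage_start <= t - PRE_COVERAGE
                            and p.coverage_end >= t + POST_COVERAGE)
    return (p.age >= 18
            and has_recent_gd_code
            and continuously_covered
            and not p.prior_gd_treatment
            and not p.excluding_diagnosis)


# Usage with a made-up record
example = Patient(age=48, treatment_date=date(2010, 6, 1),
                  coverage_start=date(2008, 1, 1), coverage_end=date(2013, 1, 1),
                  gd_code_dates=[date(2010, 3, 15)])
print(is_eligible(example))  # True
```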
Treatment efficacy and burden
The primary outcome was the effectiveness of therapy, defined as the percentage of patients who did not meet criteria for treatment failure. For patients treated with ATD for >60 days (ATD cohort), treatment failure was defined as either (1) receiving RAI or surgery or (2) having a break of >90 days in ATD use followed by reinitiation of ATD at any time during follow-up. Continued ATD use for >24 months after initiation was not considered treatment failure. ATD is usually given before RAI to limit the rise in thyroid hormone levels that can follow RAI, and before surgery so that thyroidectomy can be performed in as near a euthyroid state as possible. Therefore, patients who received 60 days or less of ATD and were then switched to another therapy were analyzed within the corresponding definitive-therapy cohort under the label “pre” (e.g., pre-RAI). Patients in the RAI and surgical cohorts were deemed to have failed therapy if they required a subsequent treatment (surgery, another dose of RAI, or ATD) in the two years after their initial RAI or surgical treatment. However, patients in the RAI group who received ATD for <90 days after RAI were not considered to have failed. To assess the side effects of each treatment modality, we identified the time and type of adverse effects of therapy by ICD-9-CM codes (Supplementary Table S1). In adherence to OptumLabs reporting policies, sparse adverse events (<11 events) were combined and presented as a composite outcome.
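The ATD failure rule can be sketched as a classifier over a patient's prescription fills. This is a minimal illustration, not the authors' SAS code: it assumes each fill is a hypothetical `(day_of_fill, days_supply)` pair with day 0 at the first ATD fill, approximates "60 days or less of ATD" as supply ending by day 60, and covers only the ATD rules (a switch to RAI or surgery, or a >90-day break followed by reinitiation within two years).

```python
from typing import List, Optional, Tuple

FOLLOW_UP_DAYS = 730  # two years of follow-up after ATD initiation
PRE_MAX_DAYS = 60     # <=60 days of ATD before a switch -> "pre" (definitive cohort)


def classify_atd(fills: List[Tuple[int, int]],
                 definitive_day: Optional[int] = None,
                 gap_days: int = 90) -> str:
    """Return "pre", "failure", or "success" for one patient's initial ATD course.

    fills: (day_of_fill, days_supply) pairs; day 0 is the first ATD fill.
    definitive_day: day of the first RAI dose or thyroidectomy, if any.
    gap_days: a break longer than this, followed by a new fill, counts as failure.
    """
    covered_until: Optional[int] = None  # last day covered by ATD supply so far
    for day, supply in sorted(fills):
        if day >= FOLLOW_UP_DAYS:
            break  # outside the two-year window
        if definitive_day is not None and day >= definitive_day:
            break  # fills after RAI/surgery are not part of the initial ATD course
        if covered_until is not None and day - covered_until > gap_days:
            return "failure"  # ATD reinitiated after a break longer than gap_days
        covered_until = day + supply if covered_until is None else max(covered_until, day + supply)

    if definitive_day is not None and definitive_day < FOLLOW_UP_DAYS:
        # A short ATD course before RAI/surgery is pretreatment, not ATD failure.
        if covered_until is None or covered_until <= PRE_MAX_DAYS:
            return "pre"
        return "failure"
    # No switch and no qualifying break; ATD continued beyond 24 months is not failure.
    return "success"


monthly = [(d, 30) for d in range(0, 720, 30)]
print(classify_atd(monthly))                                   # success
print(classify_atd([(0, 90), (200, 90)]))                       # failure
print(classify_atd([(0, 90), (200, 90)], gap_days=120))         # success
print(classify_atd([(0, 30), (30, 30)], definitive_day=60))     # pre
print(classify_atd([(0, 90), (90, 90)], definitive_day=200))    # failure
```

The `gap_days` parameter is exposed so the same rule can be rerun with the 120-day break used in the sensitivity analysis.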
Analysis
For each treatment, we calculated descriptive statistics (mean, median, and percentages) for all relevant patient characteristics and reported the proportion of patients who failed treatment. Differences in categorical variables were assessed using the χ2 test and differences in continuous variables using the t-test. We used multivariable Cox proportional hazards models to assess predictors of treatment failure, adjusting for age, sex, race/ethnicity, geographic region, comorbidities (Charlson comorbidity index), baseline Graves' orbitopathy, and year of treatment. Results are presented as hazard ratios (HRs) with 95% confidence intervals (CIs). All tests were two-sided, and p < 0.05 was considered statistically significant. All analyses were carried out using SAS (version 9.4; SAS Institute, Inc.).
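For a 2×2 comparison (for example, treatment group by adverse event yes/no), the χ2 test reduces to a closed-form statistic with 1 degree of freedom, whose p-value equals erfc(√(χ2/2)). The sketch below uses only the Python standard library and made-up counts; the study itself used SAS, and the Cox models are not reproduced here. It omits Yates' continuity correction, which the paper does not mention either way.

```python
import math


def chi2_2x2(a: int, b: int, c: int, d: int) -> tuple:
    """Pearson chi-square test (no continuity correction) for the table
        [[a, b],
         [c, d]]
    Returns (statistic, p-value) with 1 degree of freedom."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column total is zero")
    stat = n * (a * d - b * c) ** 2 / denom
    # For 1 df, P(X > stat) = P(|Z| > sqrt(stat)) = erfc(sqrt(stat / 2)).
    p = math.erfc(math.sqrt(stat / 2))
    return stat, p


# Hypothetical counts: group 1 has 10 events / 20 non-events, group 2 has 20 / 10.
stat, p = chi2_2x2(10, 20, 20, 10)
print(round(stat, 3), round(p, 4))  # 6.667 0.0098
```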
Sensitivity analysis
The comparative effectiveness of treatment options for patients with GD could be influenced by the length of the initial ATD course and by how treatment failure is defined, in particular the length of the break in ATD use that precedes reinitiation of ATD therapy. We therefore examined whether outcomes differed when the minimum length of the initial ATD course was set at 90 or 365 days, and when the break between ATD courses that defines failure was extended to >120 days.
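To show how these two parameters act, the sketch below recomputes a break-based ATD failure rate under different minimum course lengths and break thresholds. It is purely illustrative: the four-patient cohort and its `(day_of_fill, days_supply)` fills are invented, the functions are hypothetical helpers, and only the break-and-reinitiation criterion is modeled (switches to RAI or surgery are ignored).

```python
from typing import List, Tuple

Fills = List[Tuple[int, int]]  # (day_of_fill, days_supply); day 0 = first ATD fill


def initial_course_days(fills: Fills, gap_days: int) -> int:
    """Days of ATD coverage from day 0 until the first break longer than gap_days."""
    covered_until = 0
    for day, supply in sorted(fills):
        if day - covered_until > gap_days:
            break
        covered_until = max(covered_until, day + supply)
    return covered_until


def has_break_and_restart(fills: Fills, gap_days: int) -> bool:
    """True if any fill follows a break in supply longer than gap_days."""
    covered_until = None
    for day, supply in sorted(fills):
        if covered_until is not None and day - covered_until > gap_days:
            return True
        covered_until = day + supply if covered_until is None else max(covered_until, day + supply)
    return False


def failure_rate(cohort: List[Fills], gap_days: int = 90, min_course_days: int = 60) -> float:
    """Share failing ATD among patients whose initial course exceeds min_course_days."""
    eligible = [f for f in cohort if initial_course_days(f, gap_days) > min_course_days]
    if not eligible:
        return float("nan")
    return sum(has_break_and_restart(f, gap_days) for f in eligible) / len(eligible)


cohort = [
    [(0, 365), (365, 365)],  # continuous for two years
    [(0, 120), (300, 90)],   # 180-day break, then restart
    [(0, 400), (500, 90)],   # 100-day break, then restart
    [(0, 70), (200, 30)],    # 130-day break, then restart
]
print(failure_rate(cohort))                       # 0.75
print(failure_rate(cohort, gap_days=120))         # 0.5
print(failure_rate(cohort, min_course_days=365))  # 0.5
```

In this toy cohort, a longer minimum course lowers the failure rate partly because patients with short, interrupted courses leave the denominator, which is one way selection alone can produce a lower rate.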
Results
We identified 4661 patients treated for GD, with a mean age of 48 (±14) years (Table 1). Most were white (63%) and female (80%). Initial treatment was ATD in 2817 patients (60%), RAI in 1549 (33%), and surgery in 295 (6%). About 12% of patients receiving ATD continued this therapy for >24 months as initial therapy (Fig. 1).
Table 1. Baseline Characteristics
ATD, antithyroid drug; NR, not reported due to OptumLabs policy; RAI, radioactive iodine.
Effectiveness of therapy
Figure 1 shows that surgery was the most effective treatment (99%), followed by RAI (93%) and ATD (50%). The effectiveness of RAI did not differ between patients who were and were not pretreated with ATD (93% vs. 93%, p = 0.71). Median time to treatment failure was 6.8 months for ATD, 3.4 months for RAI, and 3.2 months for surgery (Fig. 2).

Figure 2. Treatment failure over time.
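Because every patient had at least 24 months of coverage after treatment, no one is censored within the follow-up window, so the cumulative proportion failing by month t is a simple empirical proportion. The paper does not state how the medians were computed; with RAI and surgery success rates of 93% and 99%, the reported medians for those groups cannot be whole-cohort Kaplan-Meier medians (the failure curve never reaches 50%), so they are presumably medians among patients who failed. The sketch below, with invented data and hypothetical function names, computes both quantities under that assumption.

```python
from statistics import median
from typing import List, Optional

# Months from treatment to failure; None = no failure within 24 months.
FailureMonths = List[Optional[float]]


def cumulative_failure(months: FailureMonths, t: float) -> float:
    """Proportion of the whole cohort that has failed by month t (no censoring assumed)."""
    return sum(1 for m in months if m is not None and m <= t) / len(months)


def median_among_failures(months: FailureMonths) -> Optional[float]:
    """Median time to failure among patients who failed; None if nobody failed."""
    times = [m for m in months if m is not None]
    return median(times) if times else None


cohort = [1.0, 2.0, 3.0, 5.0, 8.0, None, None, None, None, None]  # invented data
print(cumulative_failure(cohort, 3))   # 0.3
print(cumulative_failure(cohort, 24))  # 0.5
print(median_among_failures(cohort))   # 3.0
```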
ATD treatment effectiveness did not differ between methimazole and propylthiouracil (50% vs. 48%, p = 0.55). The median time from ATD initiation to the start of a treatment break (>90 days) was 213.94 days (standard deviation 148 days). Compared with patients <35 years, patients aged 55–64 years (HR 0.77 [CI 0.64–0.92]) and >65 years (HR 0.78 [CI 0.63–0.98]) had a lower risk of failing ATD therapy. Black patients had a higher risk of failing ATD than white patients (HR 1.2 [CI 1.07–1.42]). When patients failed ATD therapy, the most common second-line therapy was reinitiation of ATD (65%), followed by RAI (26%) and surgery (9%). Of those reinitiating ATD, 8% were eventually treated with RAI or surgery; the rest continued on ATD therapy. Women were at lower risk of failing RAI therapy than men (HR 0.55 [CI 0.36–0.83]; Table 2). When patients failed RAI, they most often underwent repeat RAI (56%).
Table 2. Multivariable Cox Regression Models of Factors Associated with Treatment Failure Among Graves' Disease Patients
Factors associated with failure of thyroidectomy could not be modeled because of the small number of failures.
CI, 95% confidence interval; HR, hazard ratio.
Sensitivity analysis
When the required minimum length of the initial continuous ATD course was extended to 90 days and 365 days, the ATD treatment failure rate decreased from 50% (with the 60-day minimum) to 44% and 25%, respectively, without changing the frequency of use of second-line therapies. When the ATD therapy break period was extended to 120 days, the ATD failure rate was 40%, with a distribution of second-line therapies similar to that observed with the 90-day break period.
Adverse events
Adverse events were more frequent with surgery (24%) than with ATD (12%) or RAI (6%) (p < 0.0001). Hypoparathyroidism (transient or chronic) was the most common complication of surgery; new-onset Graves' ophthalmopathy complicated the disease course in 7% of patients receiving ATD and 6% of those receiving RAI (Table 3).
Table 3. Frequency of Adverse Effects
Because of OptumLabs reporting policies, adverse events with fewer than 11 occurrences could only be reported as a composite. This composite has been added to the total frequency of complications for each treatment modality.
Includes acute and subacute necrosis of liver, drug-induced cholestasis, and anti-neutrophil cytoplasmic antibody (ANCA)-associated vasculitis.
Includes agranulocytosis, acute and subacute necrosis of liver, hepatitis, unspecified (includes drug induced).
Includes hemorrhage, seroma, infection, vocal cord paralysis, and dysphonia.
Includes hemorrhage, seroma, infection, vocal cord paralysis, dysphonia, and hypoparathyroidism.
Discussion
In this large observational assessment of the effectiveness and safety of the treatment modalities for GD among privately insured Americans, we found that ATD has replaced RAI as the most common initial therapy for GD in the United States; however, ATD failed in one out of every two patients. This shift from RAI to ATD may have occurred over the past decade.
The ATD failure rate of 50% is consistent with previous reports. Yet this failure rate varied with the required length of the initial ATD course and with the ATD therapy break period. When the required ATD therapy length was extended to one year, the ATD failure rate was markedly lower (25%), which suggests that longer, continuous ATD regimens may be more effective than short ones in avoiding treatment failure. This finding may support guideline recommendations to use ATD for at least 12–18 months before considering discontinuation. An alternative explanation for this lower failure rate, however, is that patients who take ATD for more than one year may be at lower risk of failure than patients who discontinue ATD within the first year. For instance, patients with a low burden of disease requiring small doses of ATD may be more likely to stay on ATD longer than patients with a high burden of disease, who require higher doses and are thus more likely to experience ATD side effects. Furthermore, the ATD failure rate was 40% when we used a longer break interval (120 days instead of 90 days) before reinitiation of ATD. The longer interval accounts for patients who did not fill an ATD prescription for 120 days but were likely still taking the drug because a dose change did not require new refills. For instance, a patient initially taking methimazole 10 mg daily who is then switched to 5 mg daily will have many more days of coverage from the same supply without filling a new prescription.
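The methimazole example can be made concrete with simple days-of-supply arithmetic. All quantities in this sketch are hypothetical: a fill of 90 × 10-mg tablets recorded in the claim as a 90-day supply, a dose reduced to 5 mg daily from the start (assuming tablets are split), and a next fill on day 200. The function names are illustrative only.

```python
def days_covered(tablets: int, tablet_mg: float, daily_dose_mg: float) -> float:
    """Days a fill lasts at a constant daily dose (assumes tablets can be split)."""
    return tablets * tablet_mg / daily_dose_mg


def gaps(recorded_supply_days: int, actual_days: float, next_fill_day: int) -> tuple:
    """(gap seen in claims, true days without drug) for a fill dispensed on day 0."""
    return next_fill_day - recorded_supply_days, max(0.0, next_fill_day - actual_days)


actual = days_covered(tablets=90, tablet_mg=10, daily_dose_mg=5)  # 180.0 days
claims_gap, true_gap = gaps(recorded_supply_days=90, actual_days=actual, next_fill_day=200)
print(claims_gap, true_gap)  # 110 20.0
```

Under a 90-day break rule this hypothetical patient would be counted as an ATD failure, whereas under a 120-day rule they would not, even though they were without drug for only about 20 days.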
We also found that about 1 in 10 GD patients received ATD not as a course intended to achieve remission of GD (recommended length <24 months), but as ongoing therapy, and that most patients who failed ATD restarted ATD rather than undergoing RAI or surgery. In total, combining initial and second-line therapy, 26% of the whole GD cohort received long-term ATD therapy. Although ongoing ATD therapy is relatively new in the United States, it is common elsewhere (16–20) and is consistent with the 2016 American Thyroid Association (ATA) guidelines for the treatment of hyperthyroidism (5), whose authors suggested ongoing ATD for patients at low risk of treatment failure who prefer to avoid RAI and surgery. Studies of long-term continuous ATD treatment for hyperthyroidism in non-U.S. populations have found methimazole to be apparently safe (16,21,22). In Europe, for instance, long-term ATD has been favored as first-line treatment, particularly in the setting of Graves' orbitopathy (GO) and as part of a block-and-replace regimen (18). Cost-effectiveness likely varies geographically: in Iranian patients, the cost of long-term methimazole did not exceed that of RAI (21), whereas a study from the United Kingdom found RAI to be the most cost-effective primary treatment for thyrotoxicosis (23). In a U.S. study of patients with GD who failed to achieve euthyroidism after 18 months of ATD, total thyroidectomy was more cost-effective than RAI or lifelong ATD as long as its cost did not exceed 19,300 U.S. dollars (24). Although the 2016 ATA guidelines (5) consider long-term ATD in selected patients, they do not specify the ideal follow-up strategy; this remains an area of uncertainty that deserves further study.
Predictors of treatment failure in this large cohort suggest that younger patients had a higher failure rate than older individuals. This difference might be explained by more severe hyperthyroidism at baseline in younger patients with GD (25). This finding, and the novel finding of a higher rate of ATD failure among African American patients with GD, need careful assessment to confirm them and then to uncover and untangle biological and socioeconomic explanations.
We also found a higher-than-expected rate of hypoparathyroidism after thyroid surgery for GD. This may reflect a limitation of our analysis (i.e., our inability to distinguish transient from permanent hypoparathyroidism), or selective publication of surgical complication rates or of surgical performance from referral centers where experienced thyroid surgeons operate. For now, our estimate may be a good representation of the outcomes of thyroid surgery in the United States, where many procedures are performed by surgeons with less experience in thyroid surgery (26,27).
We observed a similar risk of incident Graves' ophthalmopathy in patients treated with ATD (7%) and RAI (6%). Randomized trials found the risk of new Graves' ophthalmopathy to be higher with RAI than with ATD (28,29). However, a recent analysis from a large nationwide managed care network in the United States showed that neither ATD alone nor ATD combined with RAI appeared to alter the risk of developing Graves' ophthalmopathy compared with RAI alone (30). Case selection may explain this difference between practice-based observational findings and randomized trial results. For example, clinicians may be more likely to use RAI in patients with less severe GD and in those at lower risk for ophthalmopathy based on smoking status or thyroid-stimulating hormone receptor antibody titers. In addition to these patient characteristics, concomitant administration of steroids could lower the rate. We could not assess these variables in this analysis, and our results call for further evaluation of the effect of treatment modalities on the incidence of ophthalmopathy.
Implication for clinical practice and research
The three available treatment modalities for GD differ greatly in their effectiveness, safety profiles, and convenience (i.e., modes of administration) (3,4). The American Thyroid Association and the American Association of Clinical Endocrinologists state that “Once the diagnosis has been made, the treating physician and patient should discuss each of the treatment options, including the logistics, benefits, expected speed of recovery, drawbacks, potential side effects, and cost” (5,31). Our findings help to facilitate this discussion as we provide estimates of the effectiveness and safety for each option based on a very large cohort of patients receiving treatment across the United States. Despite the additional information provided here, patients and clinicians will still face uncertainty about the best treatment option, which will vary depending on each patient's context and informed preferences. To identify the best treatment for each patient, the one that makes most sense, clinicians need to work through the relative differences between treatments across the issues that matter to each patient the most. This so-called shared decision-making approach can be facilitated by the use of GD Choice, a freely available and effective conversation aid (
Limitations
Our results are limited by the insufficient granularity of the data. For instance, our data set did not include laboratory parameters, so we could not assess thyroxine levels or thyroid-stimulating hormone receptor antibodies. Likewise, we could not extract smoking status, 24-hour RAI uptake, goiter size, thyroid nodularity, or administered RAI activities, all factors that can contribute to differences in success rates between treatment modalities. The lack of laboratory data also affects the accuracy of the treatment failure definition after RAI, because we could not determine how many of these patients actually achieved hypothyroidism or euthyroidism. Furthermore, our analysis relied on administrative claims, which are susceptible to differential coding and billing practices. For instance, complications of ATD, which are mostly minor, may have been overlooked or not billed, and we may have missed patients who were diagnosed with hyperthyroidism but never received a code for GD. The use of claims data also limited our ability to understand why patients switched therapy. We assumed treatment failure for all patients on ATD who switched to RAI or surgery, but for some of these patients the reason for switching may have been a change in preferences rather than treatment failure. Finally, our sample included only privately insured patients; the applicability of our findings to underinsured populations is unclear.
Conclusions
Our study of the use, comparative effectiveness, and safety of the three available treatment modalities for patients with GD confirms their known efficacy and safety, but documents a shift away from therapies that induce permanent hypothyroidism toward ongoing ATD therapy. With one fourth of patients receiving ATD treatment (as initial or second-line therapy), it is imperative to investigate its long-term outcomes, determine appropriate follow-up strategies, measure the burden of this treatment, and determine the cost of care for patients with GD. In the interim, our estimates may facilitate shared decision making and patient-centered care for patients with GD.
Footnotes
Author Disclosure Statement
No competing financial interests exist.
Funding Information
This study was funded by the Mayo Clinic Robert D. and Patricia E. Kern Center for the Science of Health Care Delivery. J.P.B. is supported by the Karl-Erivan Haub Family Career Development Award in Cancer Research at Mayo Clinic in Rochester, honoring Richard F. Emslander, MD.
Supplementary Material
Supplementary Table S1
