Abstract
Background:
The prognosis of Graves' disease (GD) is reportedly related to sex, age, and genetic factors, although there is no consensus. The objective of this study was to investigate the relationship between severity and prognosis of GD and sex or age.
Methods:
Subjects were patients newly diagnosed with GD between January 2005 and June 2019, and medical records were retrospectively reviewed. Patients diagnosed between January 2009 and December 2010 and followed up for at least 12 months were enrolled. Patients were divided into nine age-stratified groups. Remission was defined as maintenance of a euthyroid state for more than one year after withdrawal of antithyroid drugs (ATDs).
Results:
Participants comprised 21,633 patients (3954 males, 17,679 females). Initial free triiodothyronine (fT3) and free thyroxine (fT4) levels significantly decreased with increasing age, including after sex stratification. fT4 was significantly higher in males than females aged 20–39 years. In 2191 patients treated with ATDs alone, median durations until remission were 37.7 and 30.6 months in males and females, respectively. Remission and recurrence were observed in 1391 patients (204 males, 1187 females) and 262 patients (37 males, 225 females), respectively. By Kaplan–Meier analyses, males required a significantly longer time to achieve remission than females (p < 0.0001), although there were no significant age-related differences (p = 0.08). Cox proportional hazard modeling showed a 41% higher hazard of remission in females than in males (adjusted hazard ratio [aHR] [confidence interval, CI] = 1.41 [1.21–1.64]), and each additional 10 years of age was associated with a 14% lower rate of recurrence (age [per 10-year increase], aHR [CI] = 0.86 [0.78–0.94]); no significant relationship between recurrence rate and sex was identified.
Conclusions:
Severity of hyperthyroidism in GD was significantly higher in males in their 20s and 30s, declining with advancing age in both sexes. Females were more likely to achieve remission than males, and younger patients had a higher risk of recurrence, although recurrence was unrelated to sex.
Introduction
Graves' disease (GD) is known to more commonly affect younger women. In addition, some reports have shown age- and sex-related differences in the prognosis and severity of hyperthyroidism due to GD.
Aizawa et al. (1) showed that older patients exhibited milder hyperthyroidism, although they did not analyze disease prognosis. Several reports from different countries have investigated the prognosis of GD, with one report from England showing that male sex and younger age correlated with poorer prognosis of resolution of Graves' hyperthyroidism (2). However, Struja et al. found no such relationships (3). Other reports have shown that only younger age correlated with prognosis (4,5), and that older age and genetic factors were relevant to the prognosis (6). A review by Fatourechi suggested that these inconsistent results might be attributable to differences in iodine sufficiency between countries (7).
Since there is currently no consensus on correlations between severity and prognosis of GD and factors such as sex and age, this study aimed to investigate how sex and age correlate with the severity and prognosis of GD. We therefore investigated patients newly diagnosed with GD between 2005 and 2019 at Ito Hospital, Japan, a dedicated thyroid disease hospital that sees ∼350,000 outpatients and inpatients per year.
Materials and Methods
Subjects
All subjects were patients seen at Ito Hospital. A total of 21,633 patients newly diagnosed with GD (3954 males, 18.3%; 17,679 females, 81.7%) between January 2005 and June 2019 were enrolled in this retrospective study, and laboratory data were collected from their medical records. Since patients can choose which hospital to visit in Japan regardless of the type of health insurance, 60% of new cases were referred from other medical facilities and the remaining 40% were self-referred. Subjects were divided into nine groups by age: 4–9 years old; 10–19 years old (10s); 20–29 years old (20s); 30–39 years old (30s); 40–49 years old (40s); 50–59 years old (50s); 60–69 years old (60s); 70–79 years old (70s); and 80–92 years old (80s and older). For the prognosis analysis, we enrolled 2749 patients (505 males, 2244 females) who had been newly diagnosed with GD between January 2009 and December 2010, and who were followed up for ≥12 months. Among these 2749 patients, a total of 558 patients were treated with definitive therapy during the observation period (surgery in 90 patients, radioactive iodine therapy in 468 patients). These 558 subjects were excluded from the prognostic analysis. GD was diagnosed based on identification of both hyperthyroidism and thyrotropin (TSH) receptor antibody (TRAb) or thyroid-stimulating antibody (TSAb), or diffuse high uptake of radioactive iodine (123I) by the thyroid gland. Antithyroid drug (ATD) therapy was usually commenced with methimazole (MMI) unless the subject was pregnant or planning to get pregnant. ATDs were discontinued once the TSH value remained within the normal range for >6 months on a dose of ≤2.5 mg/day of MMI or 25 mg/day of propylthiouracil, or when the subject displayed overt hypothyroidism at these dosages. Remission of GD was defined as maintenance of a euthyroid state for more than one year after ATD withdrawal, and recurrence of GD was defined using the same diagnostic protocol as the initial diagnosis. 
Recurrence thus applied only to patients who had previously met the criteria for remission. This study was approved by the Ethics Committee of Ito Hospital (approval no. 307), and written informed consent was obtained from all participants.
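The age stratification described above can be sketched as a small lookup function. This is only an illustration of the nine strata listed in the text; the function name `age_group` and the string labels are hypothetical and are not taken from the study's own analysis.

```python
def age_group(age):
    """Map an age in years to one of the nine strata listed in the text.

    The labels are illustrative; the cohort covered ages 4 to 92 years.
    """
    if age < 4 or age > 92:
        raise ValueError("age outside the 4-92 year range covered by the cohort")
    if age <= 9:
        return "4-9"
    if age >= 80:
        return "80s and older"
    decade = int(age // 10) * 10  # e.g. 35 -> 30
    return f"{decade}s"


# Hypothetical ages, for illustration only
labels = [age_group(a) for a in (7, 19, 35, 59, 84)]
```

With these inputs, `labels` holds one stratum label per age, which could then be used as the grouping key when summarizing laboratory values.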
Laboratory studies
TSH, serum free triiodothyronine (fT3), and serum free thyroxine (fT4) were measured using electrochemiluminescence immunoassay kits (ECLusys TSH, ECLusys fT3, and ECLusys fT4, respectively; Roche Diagnostics, Basel, Switzerland). The reference ranges provided by the manufacturer were as follows: TSH 0.2–4.5 mIU/L, fT3 2.2–4.3 pg/mL, and fT4 0.8–1.6 ng/dL. TRAb was measured using a TRAb CT radioimmunoassay kit (normal range, <10%; Cosmic, Tokyo, Japan) until September 2008, and using an ECLusys TRAb electrochemiluminescence immunoassay kit (normal range, <2.0 IU/L; Roche Diagnostics) from October 2008 onward. A formula derived from the regression curve representing the relationship between the TRAb CT value and ECLusys TRAb value was used to compare the two values. The same devices were used for measuring TSH, fT3, and fT4 throughout the observation period, so the reference ranges for these values remained the same. TSAb was measured using a TSAb radioimmunoassay bioassay kit (Yamasa, Choshi, Japan). Until June 2014, the normal range of TSAb was considered to be <180%, but this was changed to <120% from July 2014. Thyroid volume was estimated by ultrasonography by measuring the length, width, and depth of each lobe of the thyroid gland in millimeters, and calculating the volume according to the following formula: thyroid volume = ([0.7365 × right lobe length × width × depth] + [0.7412 × left lobe length × width × depth]) − 0.55 (8).
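The thyroid volume formula above can be written as a short function. This is a minimal sketch that applies the coefficients exactly as quoted; the text gives lobe dimensions in millimeters but does not state the unit of the result, so no unit conversion is applied here (an assumption). The function name and the example dimensions are hypothetical.

```python
def thyroid_volume(right_lobe, left_lobe):
    """Estimate thyroid volume from (length, width, depth) of each lobe.

    Implements the quoted formula:
        0.7365 * R_len * R_wid * R_dep + 0.7412 * L_len * L_wid * L_dep - 0.55
    No unit conversion is performed; that choice is an assumption.
    """
    r_len, r_wid, r_dep = right_lobe
    l_len, l_wid, l_dep = left_lobe
    right = 0.7365 * r_len * r_wid * r_dep
    left = 0.7412 * l_len * l_wid * l_dep
    return right + left - 0.55


# Placeholder dimensions, for illustration only (not patient data)
example_volume = thyroid_volume((4.0, 2.0, 1.5), (4.0, 2.0, 1.5))
```

Keeping the two lobes as separate tuples mirrors the formula, which uses a slightly different coefficient for each side.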
Statistical analyses
Baseline data were analyzed using JMP version 14.0 software (SAS Institute, Cary, NC). The Steel-Dwass test was used to compare initial variables between all age groups, and the Wilcoxon rank-sum test was applied to compare variables between sexes. Pearson's chi-square test was used to compare the prevalence of GD between age groups. Disease prognosis was analyzed using the Kaplan–Meier method for univariate analyses and the Cox proportional hazards model for multivariate analyses. Receiver operating characteristic (ROC) curve analysis was performed to determine cutoff values. Adjusted hazard ratios (aHRs) and confidence intervals (CIs) were estimated from Cox proportional hazards models. Values of p < 0.05 were defined as statistically significant.
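For readers unfamiliar with the Kaplan–Meier method named above, the product-limit estimate can be sketched in a few lines. This is an illustrative sketch only: the analyses in this study were run in JMP, and the function name and toy follow-up data below are hypothetical. The "event" is remission, so the estimated curve is the proportion of patients who have not yet achieved remission.

```python
def kaplan_meier(times, events):
    """Return [(time, proportion_without_event)] at each distinct event time.

    times  : follow-up duration for each subject (e.g. months)
    events : 1 if the event (e.g. remission) was observed, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        observed = 0   # events occurring exactly at time t
        leaving = 0    # subjects leaving the risk set at time t
        while i < len(data) and data[i][0] == t:
            observed += data[i][1]
            leaving += 1
            i += 1
        if observed > 0:
            # Product-limit step: multiply by the conditional probability
            # of not having the event at time t.
            survival *= 1 - observed / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Toy data, for illustration only (not taken from the study)
toy_curve = kaplan_meier([6, 12, 12, 24, 36], [1, 1, 0, 1, 0])
```

Censored subjects contribute to the risk set until they leave follow-up but never trigger a step in the curve, which is why patients with shorter observation periods can still be included.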
Results
Characteristics and relationship of initial parameters between age groups
The details of the study subjects in each group are shown in Table 1. Among the 21,633 subjects, 20,816 patients (96.2%) were positive for TRAb, and the remaining 817 patients (3.8%) were TRAb negative. Of the 817 TRAb-negative patients, 57 patients (7.0%) were positive for TSAb; thus, the remaining 760 patients were diagnosed with GD using radioactive iodine testing. Since the number of subjects tested for TSAb was smaller than the number tested for TRAb, TRAb was used for the analysis. The number of GD patients was largest in the 30- to 39-year-old age group in the total cohort and for each sex. Smoking rates were 46.1% in males and 21.2% in females. In females, the prevalence of GD did not differ significantly between patients <50 years old (premenopause group) and ≥50 years old (postmenopause group) (p = 0.82). Since the number of subjects in the 4–9-year-old group and the 80s and older group was particularly small, these groups were not included in the statistical analyses.
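The diagnostic breakdown reported above can be checked with simple arithmetic. The sketch below uses only the counts stated in the text; the helper name `percent` and the variable names are illustrative.

```python
def percent(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


total = 21633
trab_positive = 20816
trab_negative = total - trab_positive                       # TRAb-negative patients
tsab_positive = 57                                          # TSAb-positive among TRAb-negative
diagnosed_by_uptake = trab_negative - tsab_positive         # diagnosed by radioactive iodine testing

trab_positive_pct = round(percent(trab_positive, total), 1)
trab_negative_pct = round(percent(trab_negative, total), 1)
tsab_positive_pct = round(percent(tsab_positive, trab_negative), 1)
```

The rounded percentages agree with the figures quoted in the paragraph, and the remainder after subtracting the TSAb-positive patients matches the number diagnosed by radioactive iodine testing.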
Table 1. Number of Subjects in Each Age Group
Median values and ranges of fT3, fT4, TRAb, TSAb, and thyroid volume in each age group are shown in Table 2. Initial fT3 and fT4 values declined significantly with age (Fig. 1). A similar trend was seen in median TRAb value, showing a significant decrease with advancing age except for between the 30s and 40s groups, and between the 60s and 70s groups (Fig. 1). Median thyroid volume in the total cohort also tended to show a gradual decrease with advancing age, but the differences between groups were not statistically significant (Fig. 1).

Fig. 1. Median values of initial fT3, fT4, TRAb, and thyroid volume in all the subjects. Initial fT3 and fT4 values declined significantly with advancing age, and TRAb showed gradual declines, with significant differences between most age groups (p < 0.05). Thyroid volume tended to show gradual decrease with advancing age, although differences between age groups were not significant. fT3, free triiodothyronine; fT4, free thyroxine; TRAb, thyrotropin receptor antibody.
Table 2. Median Value and Ranges of fT3, fT4, TRAb, TSAb, and Thyroid Volume in Each Age Group
TRAb, thyrotropin receptor antibody; TSAb, thyroid-stimulating antibody.
When the same analyses were performed on median values of fT3 and fT4 by sex, the same trends seen in the total subject cohort were seen in females, showing a significant decline with advancing age for all age groups (Fig. 2). In male subjects, although fT3 and fT4 values declined with advancing age, significant differences were not seen between the 10s and 20s, 20s and 30s, 50s and 60s, and 60s and 70s groups for either hormone, and between the 10s and 30s for fT4 (Fig. 2). The median value of TRAb decreased with advancing age in female subjects, although significant differences were not seen between the 30s and 40s, 50s and 70s, and 60s and 70s groups. In male subjects, a gradual decrease in TRAb was seen with advancing age, although no significant differences were apparent between the 10s and 20s, 10s and 30s, 20s and 30s, 30s and 40s, 50s and 60s, 50s and 70s, and 60s and 70s groups (Fig. 2). Similar to the trend in the entire cohort, thyroid volume tended to gradually decrease with advancing age in females, although the differences between groups were not significant. Median thyroid volume showed a significant decrease with advancing age in males ≥50 years old, although the same trend was not seen in the patients <50 years old (Fig. 2).

Fig. 2. Sex-related comparison of median values of the initial parameters. In females, fT3 and fT4 showed significant declines with advancing age, although there were no age-related differences in males between any of the age groups. Median values of fT3 were significantly higher in males than in females in the 20s, 30s, and 40s groups, and fT4 value was significantly higher in males than in females in the 20s and 30s groups. TRAb showed a gradual decline with advancing age, although the difference was not significant in either sex. Thyroid volume showed a significant decrease with advancing age in male subjects ≥50 years old, and was significantly larger in males than in females at all ages.
Comparing initial parameters by age and sex
When initial parameters were compared by sex in each age group (Table 3), median fT3 was significantly higher in males than in females in the 20s to 40s groups (20s, p = 0.001; 30s, p < 0.0001; 40s, p = 0.014). Similarly, fT4 was significantly higher in males than in females in the 20s and 30s groups (20s, p = 0.007; 30s, p = 0.0001). Thyroid volume was significantly larger in males than in females at all ages, although no significant difference in TRAb value was seen between sexes (Fig. 2).
Table 3. Median Value and Ranges of fT3, fT4, TRAb, and Thyroid Volume in Each Age Group by Sex
Significant differences between sexes.
Prognostic analysis by sex and age
Among the 2749 patients (505 males, 2244 females) newly diagnosed with GD between 2009 and 2010, a final total of 2191 patients (389 males, 17.8%; 1802 females, 82.2%) who had been treated with ATDs alone were analyzed. Significant differences were identified in age (p < 0.0001), fT3 (p < 0.0001), fT4 (p < 0.0001), and TRAb values (p < 0.0001) between the 2191 patients in the ATD group and the 558 patients in the definitive treatment group, but there was no significant difference in sex (p = 0.09). Patients who received definitive therapies were significantly younger and displayed higher fT3, fT4, and TRAb values. These differences explain why the 558 patients underwent definitive therapy with surgery or radioactive iodine.
The median observation period was 104 months (range, 12–132 months). Remission was achieved in 1391 patients (63.5%), including 204 males and 1187 females, and 262 of these 1391 patients (18.8%; 37 males, 225 females) developed recurrence. Median interval until remission was 31.1 months (range, 3.0–121.4 months) in the total cohort, 37.7 months (range, 5.2–121.2 months) in males, and 30.6 months (range, 2.6–120.1 months) in females. Significant differences in the time to remission were seen between subjects with and without recurrence (p = 0.003). Median time to remission was 28.9 months (range, 3.8–115.1 months) in patients with recurrence and 32.0 months (range, 2.6–121.4 months) in patients without recurrence. A cutoff value for time to remission of 45.1 months was calculated from ROC curve analysis, but the area under the ROC curve (AUC) was only 0.55. Kaplan–Meier curve analysis showed that the duration required for remission was significantly longer in males than in females (p < 0.0001) (Fig. 3). However, no significant age-related differences in the duration required for remission were seen (p = 0.08) (Fig. 3), and no significant differences in recurrence rates were identified between sexes (p = 0.78). Moreover, on multivariate Cox proportional hazard analysis, females had a 41% higher hazard of remission than males (aHR [CI] = 1.41 [1.21–1.64]; Table 4). Likewise, nonsmokers and patients with lower initial TRAb levels were more likely to achieve remission than smokers and those with higher initial TRAb levels, respectively (nonsmokers vs. smokers, aHR [CI] = 1.58 [1.38–1.81]; per 1-U increase in TRAb: aHR [CI] = 0.98 [0.97–0.98]). We also found that each 10 additional years of age was associated with a 14% lower rate of recurrence (age [per 10-year increase], aHR [CI] = 0.86 [0.78–0.94]; Table 4), and that patients with lower initial TRAb levels showed a higher risk of recurrence (TRAb [per 1-U increase], aHR [CI] = 0.97 [0.95–0.98]; Table 4). No association was found between smoking and recurrence.
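The percentage statements above follow directly from the reported hazard ratios, and the reported intervals can be sanity-checked on the log scale. The sketch below is illustrative only: the function names are hypothetical, and it assumes Wald-type intervals at the 95% level (z = 1.96), which the text does not state explicitly.

```python
import math


def percent_change_in_hazard(hr):
    """Convert a hazard ratio to a percent change versus the reference group."""
    return (hr - 1.0) * 100.0


def log_scale_standard_error(lower, upper, z=1.96):
    """Recover the standard error of log(HR) from an interval.

    Assumes the interval is symmetric on the log scale; z = 1.96
    corresponds to a 95% interval, which is an assumption here.
    """
    return (math.log(upper) - math.log(lower)) / (2.0 * z)


def geometric_midpoint(lower, upper):
    """Midpoint of an interval on the log scale, back-transformed."""
    return math.sqrt(lower * upper)


# Values reported in the text
female_remission = percent_change_in_hazard(1.41)    # about +41%
age_recurrence = percent_change_in_hazard(0.86)      # about -14% per 10 years
midpoint_remission = geometric_midpoint(1.21, 1.64)  # close to 1.41
midpoint_recurrence = geometric_midpoint(0.78, 0.94) # close to 0.86
```

The geometric midpoints of both intervals fall close to the reported point estimates, which is what would be expected if the intervals were computed on the log-hazard scale.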

Fig. 3. Comparison of observation periods required until GD remission using Kaplan–Meier curve analysis. Remission of GD was defined as remaining euthyroid for more than one year after ATD withdrawal, and the onset of remission was counted from the day the patient stopped taking ATD. Some patients were able to withdraw ATD within a year after starting medication.
Table 4. Cox Proportional Hazard Model Showing the Relationship Between Initial Parameters and Graves' Disease Remission and Recurrence
HR describes the relative risk of achieving remission or recurrence when continuous variables increase by 1 unit, or compared to the opposite condition (e.g., male vs. female, smoker vs. nonsmoker). In remission analysis, the aHR is described with remission as the event of interest.
Compared to male sex.
Compared to smoker.
CI, confidence interval; aHR, adjusted hazard ratio.
Discussion
Patients with GD who have higher TRAb values, higher thyroid hormone levels, and larger goiters are regarded as having more severe GD and are thought to be likely to experience difficulty achieving remission. The present study showed that thyroid hormone levels, as well as TRAb values, declined significantly with advancing age in patients with GD. Aizawa et al. (1) showed that T4 was significantly higher in the 10s group than in the 50s group, and T3 was significantly higher in the 10s group than in the 60s group. In their study, significance was only shown between these two pairs of groups, and they concluded that hyperthyroidism was milder in older patients with GD. The difference between their results and ours might reflect the difference in the number of subjects. Moreover, Aizawa et al. suggested that lower thyroid hormone levels in the elderly were due to reduced responsiveness of thyrocytes to stimulation by TRAb. Our study showed that TRAb values decreased with advancing age, and we suggest that the lower severity of GD in older patients might be attributable to the declining potential for antibody production with aging. Although the severity of hyperthyroidism was lower with advancing age, the prognosis of GD was unaffected by aging. We assume this could be because disease severity after starting treatment might be affected by environmental factors such as smoking, comorbidities, and other stressors.
Our study revealed that male GD patients required longer to achieve remission than female GD patients, and females exhibited a 41% higher HR for achieving remission, although no significant differences in remission were evident between ages. According to Da Silva (9), among patients with autoimmune diseases, estrogen increases the activity of B cell lymphocytes and enhances autoantibody-producing potential, resulting in deterioration of the severity of such diseases. Moreover, Chailurkit et al. (10) reported a correlation between estrogen levels and TRAb positivity in males, and stated that higher estrogen levels might be a cause of GD. The differences in severity of hyperthyroidism between sexes shown in the present study could be due to sex hormones, including estrogen. However, since no significant difference in disease frequency was seen between females before and after menopause, this association being due to sex hormones could not be confirmed in the present study.
Several studies have attempted to identify or predict prognostic factors for GD, but have yielded conflicting results (2–6,11). A review article by Fatourechi (7) concluded that these differences were due to variations in iodine sufficiency. Causes of hyperthyroidism are known to differ depending on iodine sufficiency (12,13), and Japan is an iodine-sufficient country in which hyperthyroidism is mostly due to GD. Our finding that male patients showed lower remission rates than female patients is consistent with a report from England (2), another iodine-sufficient country. We agree with Fatourechi that iodine sufficiency might represent one of the causes of inconsistencies in GD prognosis reported from different countries. Moreover, the different results shown in countries with similar iodine intakes could be due to differences in the number of patients, or differences in ethnicity. Furthermore, male patients in our study had a higher prevalence of smoking (males, 46.1%; females, 21.2%). In Japan, the latest report from the Japanese Ministry of Health, Labour and Welfare showed smoking rates of 29.0% in males and 8.1% in females. Smoking rates were higher in patients with GD in both sexes, suggesting that smoking habits may contribute to both the etiology and lower remission rates of GD.
The limitations of this study include its retrospective design and its setting in a single hospital in an Asian country, both of which may introduce bias. We also did not check for genetic polymorphisms.
In conclusion, severity of hyperthyroidism in GD was significantly higher in males in their 20s and 30s, declining with advancing age in both sexes. Females were more likely to achieve remission than males, and younger patients had a higher risk of recurrence, although recurrence was unrelated to sex.
Footnotes
Acknowledgment
We appreciate the contribution of Dr. Kosuke Inoue to the statistical analyses.
Authors' Contributions
Data curation: N.S., R.Y., K.M., A.K., A.S., T.M., A.H., M.F., and M.M. Critical revision of the article: J.Y.N., A.Y., and N.W. Supervision: K.S. and K.I. All authors read and approved this article.
Author Disclosure Statement
No competing financial interests exist.
Funding Information
No funding was received for this article.
