Abstract
Background:
The treatment of hyperthyroidism is aimed at improving health-related quality of life (HRQoL) and reducing morbidity and mortality. However, few studies have used validated questionnaires to assess HRQoL prospectively in such patients. The purpose of this study was to assess the impact of hyperthyroidism and its treatment on HRQoL using validated disease-specific and generic questionnaires.
Methods:
This prospective cohort study enrolled 88 patients with Graves' hyperthyroidism and 68 with toxic nodular goiter from endocrine outpatient clinics at two Danish university hospitals. The patients were treated with antithyroid drugs, radioactive iodine, or surgery. Disease-specific and generic HRQoL were assessed using the thyroid-related patient-reported outcome (ThyPRO) and the Medical Outcomes Study 36-item Short Form (SF-36), respectively, evaluated at baseline and six-month follow-up. The scores were compared with those from two general population samples who completed ThyPRO (n = 739) and SF-36 (n = 6638).
Results:
Baseline scores for patients with Graves' hyperthyroidism and toxic nodular goiter were significantly worse than general population scores on all comparable ThyPRO scales and all SF-36 scales and component summaries. ThyPRO scores improved significantly with treatment on all scales in Graves' hyperthyroidism and four scales in toxic nodular goiter, while SF-36 scores improved on five scales and both component summaries in Graves' hyperthyroidism and only one scale in toxic nodular goiter. In Graves' hyperthyroidism, large treatment effects were observed on three ThyPRO scales (Hyperthyroid Symptoms, Tiredness, Overall HRQoL) and moderate effects on three scales (Anxiety, Emotional Susceptibility, Impaired Daily Life), while moderate effects were seen in two ThyPRO scales in toxic nodular goiter (Anxiety, Overall HRQoL). However, significant disease-specific and generic HRQoL deficits persisted on multiple domains across both patient groups.
Conclusions:
Graves' hyperthyroidism and toxic nodular goiter cause severe disease-specific and generic HRQoL impairments, and HRQoL deficits persist in both patient groups six months after treatment. These data have the potential to improve communication between physicians and patients by offering realistic estimates of expected HRQoL impairments and treatment effects. Future studies should identify risk factors for persistent HRQoL deficits, compare HRQoL effects of the various therapies, and thereby aid in determining the optimal treatment strategies.
Introduction
Much is known about the clinical features of hyperthyroidism and diagnostic strategies, and many studies have evaluated the efficacy of the therapies in terms of restoring euthyroidism (4,5). Further, it has been demonstrated that hyperthyroidism is associated with increased somatic (6–9) and psychiatric (10) morbidity, work disability and higher sickness absence (11), and, ultimately, excess mortality (12,13). In contrast, much less is known about the impact of hyperthyroidism and its treatment on health-related quality of life (HRQoL) (14).
HRQoL assessments are increasingly used to evaluate treatment effects in clinical studies and practice (15,16), and both disease-specific (17) and generic (18) questionnaires are available for quantifying HRQoL in patients with hyperthyroidism. The thyroid-related patient-reported outcome (ThyPRO) is an HRQoL instrument that can be used across all benign thyroid diseases, and it is the only thyroid-related HRQoL instrument that has been validated in patients with hyperthyroidism (17,19–22). The Medical Outcomes Study 36-item Short Form (SF-36) is the most extensively validated and applied survey that measures generic HRQoL (18). Despite the availability of these instruments, and although treatment for hyperthyroidism aims to improve HRQoL, very few studies have systematically assessed disease-specific and generic HRQoL in patients with hyperthyroidism (14). Thus, further investigation of HRQoL in patients with this condition is needed to evaluate the impact of disease and treatment.
The purpose of this prospective cohort study was to evaluate thyroid-related and generic HRQoL, using ThyPRO and SF-36, respectively, before and six months after treatment of hyperthyroidism, and to compare the results with normative data from the general population.
Materials and Methods
Patient samples and biochemical measurements
From October 2008 to May 2012, patients treated for Graves' disease (GD) and toxic nodular goiter (TNG) were prospectively recruited from the endocrine outpatient clinics at two Danish university hospitals: Copenhagen University Hospital Rigshospitalet and Odense University Hospital. At Rigshospitalet, serum thyrotropin (TSH; reference range 0.35–4.0 mIU/L), serum total thyroxine (T4; reference range 70–140 nmol/L), serum total triiodothyronine (T3; reference range 1.4–2.8 nmol/L), and serum TSH receptor antibodies (TRAb; reference range <1.0 IU/L) were analyzed using the electrochemiluminescence immunoassay method (Roche Cobas). At Odense University Hospital, serum TSH (reference range 0.3–4.0 mIU/L), serum total T4 (reference range 60–130 nmol/L), and serum total T3 (reference range 1.3–2.4 nmol/L) were analyzed with the chemiluminescent microparticle immunoassay method (Abbott Architect), and TRAb (reference range <0.7 IU/L) with TRAK human RIA (Brahms).
Patients were eligible for inclusion if they were aged ≥18 years, were able to complete paper-and-pencil questionnaires in Danish, had serum TSH levels <0.1 mIU/L, and were expected to undergo treatment with antithyroid drugs, thyroidectomy, or radioiodine. Patients with GD had to have TRAb >1.0 IU/L. Exclusion criteria were pregnancy, cancer, and other major comorbidities suspected to impact HRQoL substantially (e.g., congestive heart failure). Further, patients with GD diagnosed with Graves' orbitopathy by either endocrinologists or ophthalmologists were not included in this study. Finally, patients were excluded if no treatment was prescribed. Patients who were pregnant, had cancer, or had other major comorbidities were not invited to participate; because they were not registered in the study database, no data are reported for them. A subset of the data has previously been used to evaluate responsiveness of ThyPRO (21). Paper surveys, collected by mail, were completed prior to and six months after treatment.
Eligible patients received a booklet containing the ThyPRO and SF-36 v2 questionnaires, as well as questions about sociodemographic factors, comorbidity, and non-thyroid medication. A reminder was sent to non-responders after two weeks. Clinical data, including specific diagnosis, previous and current treatment of thyroid disease, and biochemical measurements, were obtained by chart review. Thyroid volume was not systematically measured by ultrasound and hence was not included in this study.
General population samples
ThyPRO data from the Danish general population were gathered from a random sample of 1200 adults, stratified by age and sex, as previously described (23). The sex distribution (80% women) was chosen to resemble that of patients with benign thyroid disease. They received an invitation describing the purpose of the study, a questionnaire, and a stamped addressed response envelope. The questionnaire included items from the nine scales of the ThyPRO survey, which do not make specific attribution to thyroid disease. The survey also included questions about sociodemographic factors, comorbidity, and medication. Respondents with cancer (self-reported) were excluded.
SF-36 data from the Danish general population were derived from the Danish Health Interview Survey in 2005 (24), which included the SF-36 v1 Health Survey. Denmark is divided into five administrative regions. This study only includes general population data for respondents >18 years of age living in the three regions from which patients with GD and TNG were recruited. Furthermore, respondents with cancer (self-reported) were excluded.
Patient-reported outcomes
The ThyPRO survey assesses HRQoL in patients with benign thyroid disease, and has been validated for use in patients with goiter, hyperthyroidism, hypothyroidism, and Graves' orbitopathy (17,19–22). ThyPRO consists of 85 items assessing physical, mental, and social domains of functioning and well-being, summarized in 13 multi-item scales and one overall HRQoL-impact item. Each scale ranges from 0 to 100, with higher scores indicating worse health status.
The SF-36 survey was constructed to assess health outcomes from a wide range of interventions on a common HRQoL instrument (18). It assesses generic HRQoL with 36 items summarized in eight scales. The scale scores can further be aggregated into two component summary scores: physical and mental. In this study, SF-36 scores were standardized using norm-based scoring to facilitate the comparison between SF-36 v1 and v2. With norm-based scoring, the mean and standard deviation (SD) are standardized to 50 and 10, respectively, in the general U.S. population (25). Higher scores indicate better health status.
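The idea behind norm-based scoring can be illustrated with a minimal sketch. It assumes a simple linear transformation of a scale score against a reference mean and SD; the function name and the reference values below are hypothetical and are not taken from the SF-36 scoring manual (25), which defines the actual scale-specific norms and the weights used for the component summaries.

```python
def norm_based_score(scale_score, norm_mean, norm_sd):
    """Rescale a score so the reference population has mean 50 and SD 10.

    Assumption: a plain linear (z-score) transformation. The official
    SF-36 algorithm is not reproduced here.
    """
    if norm_sd <= 0:
        raise ValueError("norm_sd must be positive")
    z = (scale_score - norm_mean) / norm_sd
    return 50.0 + 10.0 * z


# Hypothetical reference values for a single 0-100 scale (illustration only).
HYPOTHETICAL_NORM_MEAN = 60.0
HYPOTHETICAL_NORM_SD = 20.0

# A respondent scoring exactly at the reference mean.
at_norm = norm_based_score(60.0, HYPOTHETICAL_NORM_MEAN, HYPOTHETICAL_NORM_SD)
# A respondent scoring one reference SD below the mean.
one_sd_below = norm_based_score(40.0, HYPOTHETICAL_NORM_MEAN, HYPOTHETICAL_NORM_SD)
```

Under this transformation, a difference of 10 points corresponds to one SD in the reference population, which is what makes scores from different questionnaire versions easier to compare.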
Statistical analysis
Changes between baseline and six-month follow-up were analyzed with the paired t-test, and magnitude of changes by effect sizes (mean difference/SD baseline). In accordance with Cohen, an effect size of 0.2–0.5 is defined as small, while 0.5–0.8 and >0.8 are defined as moderate and large effects, respectively (26). Differences between patients and the general population samples were analyzed with multiple linear regression analysis adjusted for age, sex, comorbidity, and educational status. Magnitude of differences was evaluated by effect sizes (mean difference/SD pooled) (27). Differences in biochemical variables between GD and TNG were analyzed with the Wilcoxon–Mann–Whitney test, and differences in HRQoL with multiple linear regression analysis adjusted for age, sex, comorbidity, and educational status. Multiple comparisons of HRQoL data were adjusted by Hochberg correction (28). A p-value of <0.05, adjusted for multiple testing, was considered significant. All analyses were performed using SAS v9.3 (29).
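The effect-size definitions and the multiple-testing adjustment described above can be sketched as follows. This is an illustration using only the Python standard library, not the SAS code used in the study. All function names and data are hypothetical, the handling of values falling exactly on a threshold is an assumption, and the between-group example uses unadjusted means, whereas the study estimated differences with regression adjusted for age, sex, comorbidity, and educational status.

```python
import math
import statistics


def effect_size_change(baseline, follow_up):
    """Mean paired change divided by the SD of the baseline scores."""
    diffs = [f - b for b, f in zip(baseline, follow_up)]
    return statistics.mean(diffs) / statistics.stdev(baseline)


def effect_size_groups(group_a, group_b):
    """Difference in means divided by the pooled SD of the two groups."""
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)
    var_b = statistics.variance(group_b)
    pooled_sd = math.sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    return (statistics.mean(group_a) - statistics.mean(group_b)) / pooled_sd


def classify_effect(effect_size):
    """Label magnitude using the thresholds 0.2, 0.5, and 0.8.

    Assumption: boundary values go to the lower category for 0.8 and to
    the higher category for 0.2 and 0.5.
    """
    magnitude = abs(effect_size)
    if magnitude > 0.8:
        return "large"
    if magnitude >= 0.5:
        return "moderate"
    if magnitude >= 0.2:
        return "small"
    return "negligible"


def hochberg_adjust(p_values):
    """Hochberg step-up adjusted p-values, returned in the input order."""
    m = len(p_values)
    # Visit p-values from largest to smallest; the multiplier grows from 1 to m.
    order = sorted(range(m), key=lambda i: p_values[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for rank, idx in enumerate(order):
        running_min = min(running_min, (rank + 1) * p_values[idx])
        adjusted[idx] = running_min
    return adjusted


# Hypothetical scores on a 0-100 scale where higher means worse health.
baseline_scores = [60, 50, 70, 40]
follow_up_scores = [50, 45, 55, 30]
change_es = effect_size_change(baseline_scores, follow_up_scores)
change_label = classify_effect(change_es)

# Hypothetical raw p-values from three scale comparisons.
adjusted_p = hochberg_adjust([0.01, 0.04, 0.03])
significant = [p < 0.05 for p in adjusted_p]
```

With scales on which higher scores indicate worse health, a negative change corresponds to improvement, so the classification is applied to the absolute value.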
Ethical considerations
According to Danish law, HRQoL studies do not require and thus cannot obtain approval by ethical committees. A completed, returned survey is regarded as consent. The study was approved by the Danish Data Protection Agency and conducted in accordance with the Declaration of Helsinki.
Results
Patient samples and biochemical measurements
A total of 102 eligible patients with GD were invited, 88 of whom completed the baseline survey (response rate 86%); 66 patients completed the six-month follow-up (follow-up response rate 65%). The median age was 47 years, and 84% of patients with GD were women. Thirty-two patients with GD were enrolled at Rigshospitalet, and 56 were enrolled at Odense University Hospital. Eighty-two eligible patients with TNG were invited, of whom 68 completed the baseline survey (response rate 83%), and 53 patients completed the six-month follow-up (follow-up response rate 65%). The median age for patients with TNG was 62 years, and 78% were women. Nineteen patients with TNG were enrolled at Rigshospitalet, and 49 were enrolled at Odense University Hospital. Characteristics for the patient samples are shown in Table 1, and biochemical measurements are shown in Table 2. Patients with GD and TNG were excluded from the study for the following reasons: TSH ≥0.1 mIU/L (n = 64); Graves' orbitopathy (n = 32); no treatment prescribed (n = 10); pregnancy after study enrollment (n = 1); or treatment prior to completion of baseline survey (n = 2). The group of patients with TSH ≥0.1 mIU/L consisted predominantly of patients with TNG treated with antithyroid drugs referred to treatment with radioiodine therapy or surgery.
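The reported response rates can be reproduced from the counts above with a short arithmetic check. The sketch assumes that percentages were rounded to the nearest integer and that the follow-up rates use the number of invited patients as the denominator; the latter is an inference from the reported figures, not a statement from the study. The helper name is hypothetical.

```python
def percent(numerator, denominator):
    """Percentage rounded to the nearest integer."""
    return round(100 * numerator / denominator)


# Counts reported in the text.
gd_invited, gd_baseline, gd_follow_up = 102, 88, 66
tng_invited, tng_baseline, tng_follow_up = 82, 68, 53

gd_baseline_rate = percent(gd_baseline, gd_invited)
gd_follow_up_rate = percent(gd_follow_up, gd_invited)
tng_baseline_rate = percent(tng_baseline, tng_invited)
tng_follow_up_rate = percent(tng_follow_up, tng_invited)

# For comparison: follow-up completers as a share of baseline responders.
gd_retention = percent(gd_follow_up, gd_baseline)
tng_retention = percent(tng_follow_up, tng_baseline)
```

Using invited patients as the denominator gives the 65% follow-up rates reported for both groups, whereas using baseline responders would give higher retention figures.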
Notes to Table 1:
Information regarding treatment modality is provided for the patient samples.
Combined school and professional education classified in accordance with the International Standard Classification of Education (
Includes asthma, diabetes, ischaemic heart disease, stroke, chronic obstructive pulmonary disease, osteoarthritis, gastric/duodenal ulcer, anxiety and depression, other psychiatric diseases, and chronic back pain and other conditions of the back.
ThyPRO, thyroid-related patient-reported outcome; SF-36, Medical Outcomes Study 36-Item Short Form.
Notes to Table 2:
All values are given as medians (interquartile range).
Median significantly (p < 0.05) different from the baseline median in toxic nodular goiter as evaluated with the Wilcoxon–Mann–Whitney test.
Local reference intervals at Rigshospitalet and Odense University Hospital were used to determine thyroid function.
TSH, serum thyrotropin (mIU/L); total T4, serum total thyroxine (nmol/L); total T3, serum total triiodothyronine (nmol/L); TRAb, serum TSH receptor antibodies (IU/L).
General population samples
A total of 754 Danish citizens returned a completed ThyPRO survey (response rate 63%). Fifteen respondents with self-reported cancer were excluded. Data for the remaining 739 respondents (median age 50 years; 81% female) were used in the analyses (Table 1). A detailed description of the ThyPRO general population sample has been presented previously (23).
Of the 11,238 respondents who completed the SF-36 Health Survey, 6713 were >18 years of age and living in the three relevant regions. The 75 respondents with a self-reported cancer diagnosis were excluded, and SF-36 data for the remaining 6638 respondents were included in the analyses. The median age was 50 years, and 54% were female (Table 1).
Disease-specific health status
The mean ThyPRO scale scores for patients with GD and TNG at baseline and six-month follow-up are compared with scores from the general population in Table 3. Further, changes between baseline and six-month follow-up for patients completing the ThyPRO questionnaire at both assessments are shown in Table 3, and the ThyPRO baseline data are graphically presented in a radar plot in Figure 1.

Figure 1. Radar plot showing baseline thyroid-related patient-reported outcome (ThyPRO) scale scores for patients with Graves' disease and toxic nodular goiter, as well as scores from the general reference population. Each scale ranges from 0 to 100, with higher scores indicating worse health status. *Items in these ThyPRO scales are asked with attribution to thyroid disease and cannot be answered by respondents from the general population. GD, Graves' disease; TNG, toxic nodular goiter.
Notes to Table 3:
ThyPRO scale scores range from 0 to 100, with higher scores indicating worse HRQoL. Differences between patients and the general reference population were estimated with multiple linear regression analysis, adjusting for age, sex, comorbidity, and educational status, and magnitudes of differences were evaluated by effect sizes (mean difference/SD pooled). Changes between baseline and six-month follow-up for patients completing the questionnaire at both assessments were analyzed with the paired t-test, and magnitudes of changes by effect sizes (mean difference/SD baseline). Statistically significant differences (p < 0.05 with Hochberg correction) are in bold. Items in five ThyPRO scales (Impaired Social Life, Impaired Daily Life, Impaired Sex Life, Cosmetic Complaints, and Overall Quality of Life) are asked with attribution to thyroid disease and cannot be answered by respondents from the general population.
Small effect (0.2–0.5).
Moderate effect (0.5–0.8).
Large effect (>0.8).
Statistically significant difference between Graves' disease and toxic nodular goiter (evaluated with multiple linear regression analysis, adjusting for age, sex, comorbidity, and educational status).
SD, standard deviation; QoL, quality of life.
The baseline ThyPRO scale scores for patients with GD were significantly worse than general population scores on all comparable scales, and all differences were large (effect size ≥0.8). Significant improvements were found on 10 scales, and changes were large (effect size ≥0.8) for three scales (Hyperthyroid Symptoms, Tiredness, and Overall Quality of Life) and moderate (effect size 0.5–0.8) for three scales (Anxiety, Emotional Susceptibility, and Impaired Daily Life). Six months after treatment, ThyPRO scale scores remained significantly worse in patients with GD than in the general population on all comparable scales except Anxiety. The largest differences at follow-up were observed in the Eye Symptoms scale (large effect size) and in the Hyperthyroid Symptoms scale (moderate effect size). At baseline, ThyPRO scores in GD were significantly worse than those observed in TNG in eight scales. At follow-up, patients with GD had poorer scores than patients with TNG only on the Cosmetic Complaints scale (Table 3).
At baseline, patients with TNG had significantly worse HRQoL than the general population on all comparable scales. The differences were large on three scales (Goiter Symptoms, Hyperthyroid Symptoms, and Anxiety) and moderate on four scales (Eye Symptoms, Tiredness, Depressivity, and Emotional Susceptibility). Six months after treatment, ThyPRO scores had significantly improved on four scales: moderately on Anxiety and Overall Quality of Life, and with small changes on Goiter Symptoms and Hyperthyroid Symptoms. Six months after treatment initiation, patients with TNG still had significantly worse scores on three scales compared with the general population sample. The persistent impairments were moderate on the Goiter Symptoms and Hyperthyroid Symptoms scales and small on Tiredness.
Generic health status
Mean SF-36 scale scores for patients with GD and TNG at baseline and six-month follow-up are summarized in Table 4, together with scores from the general population sample. Changes in scores for patients completing the SF-36 questionnaire at both assessments are also shown in Table 4, and the SF-36 baseline data are graphically presented in a radar plot in Figure 2.

Figure 2. Radar plot showing baseline Medical Outcomes Study 36-item Short Form scale scores for patients with GD and TNG, as well as scores from the general reference population. Higher scores indicate better health status. PCS, physical component summary; MCS, mental component summary.
Notes to Table 4:
Higher SF-36 scale scores indicate better HRQoL. Differences between patients and the general reference population were estimated with multiple linear regression analysis, adjusting for age, sex, comorbidity, and educational status, and magnitudes of differences were evaluated by effect sizes (mean difference/SD pooled). Changes between baseline and six-month follow-up for patients completing the questionnaire at both assessments were analyzed with the paired t-test, and magnitudes of changes by effect sizes (mean difference/SD baseline). Statistically significant differences (p < 0.05 with Hochberg correction) are in bold.
Small effect (0.2–0.5).
Moderate effect (0.5–0.8).
Large effect (>0.8).
Statistically significant difference between Graves' disease and toxic nodular goiter (evaluated with multiple linear regression analysis, adjusting for age, sex, comorbidity, and educational status).
HRQoL, health-related quality of life.
Baseline SF-36 scale scores for patients with GD were significantly worse than those of the general population on all eight SF-36 scales as well as on both summary components. Differences were large for six scales and the Mental Component Summary, moderate for the General Health scale and the Physical Component Summary, and small for Bodily Pain. Significant improvements were observed on five scales and both component summaries; a large change was observed on the Vitality scale, while moderate changes were observed on three scales (Physical Functioning, Role-Physical, and Mental Health) and the Mental Component Summary. Six months after treatment, SF-36 scale scores remained significantly worse in patients with GD than in the general population sample on six of eight scales and the Mental Component Summary. Moderate differences at follow-up were observed on the Vitality, Role-Emotional, and Mental Health scales, and on the Mental Component Summary. Patients with GD had significantly poorer scores than patients with TNG on the Vitality and Mental Health scales at baseline, while no significant differences were observed at the six-month follow-up (Table 4).
Patients with TNG scored significantly worse than the general population sample did on all scales and both component summaries. Differences were large on three scales (Vitality, Role-Emotional, and Mental Health) and the Mental Component Summary, and moderate on the Physical Functioning, Role-Physical, General Health, and Social Functioning scales. At six-month follow-up, a significant improvement in SF-36 was observed in only one scale (Bodily Pain), and patients with TNG still had significantly poorer SF-36 scale scores than the general reference sample did on six of eight scales and the Mental Component Summary. These differences were moderate for Physical Functioning, Vitality, Role-Emotional, Mental Health, and the Mental Component Summary.
Discussion
Baseline HRQoL was significantly poorer in patients with hyperthyroidism on all disease-specific and generic health domains compared with general population reference samples, and HRQoL improved markedly after six months of treatment for GD while improving more modestly in patients with TNG. Six months after treatment initiation, HRQoL impairments persisted on a wide range of health domains in both patient groups compared with the general population samples.
Many similarities were observed in the HRQoL patterns in GD and TNG. Pronounced differences between patients and the general population were found on the Goiter Symptoms, Hyperthyroid Symptoms, and Tiredness scales. Both patient groups had poor scores on Anxiety at baseline, which may be explained partly by the effect of hyperthyroidism on mental functioning and partly by concerns regarding prognosis, potential side effects, and complications of treatment. An unexpected deficit in TNG was observed on the Eye Symptoms scale, as these patients are not expected to have orbitopathy. Whether this deficit is directly related to thyroid disease or whether it is a nonspecific effect of being diagnosed with a medical condition remains unclear.
Patients with GD had more severe HRQoL impairments than patients with TNG did, which may be related to a more severe biochemical hyperthyroidism at baseline. It is well-known that Graves' orbitopathy substantially impacts HRQoL (30,31). Therefore, patients with GD diagnosed with Graves' orbitopathy were not included in this study, since the aim was to assess the impact of hyperthyroidism, not orbitopathy, on HRQoL. However, some of the patients with GD may have had mild undiagnosed orbitopathy, which could contribute to the greater HRQoL deficits observed in GD. A third explanation could be that TNG develops gradually over years, and the patients often deny having symptoms at all due to their mental and physical adaptation. In contrast, GD usually develops over a shorter time. Symptoms, therefore, may be more apparent for the patients with GD.
Normative HRQoL scores facilitate the clinical interpretation of patient scores (32), and this study is the first to include normative ThyPRO data to evaluate HRQoL in hyperthyroidism. Although significant treatment effects were observed, the normative data show that HRQoL impairments persist on a wide range of scales. The persistent health problems after treatment are probably caused by multiple factors, but may partly be explained by the fact that some patients still had thyroid dysfunction at the six-month follow-up. Further, being diagnosed with a disease may change patients' perceptions of their health and affect mental well-being.
A Serbian study applied ThyPRO in patients with hyperthyroidism before and after surgery, and found, in accordance with the present study, that GD had a greater HRQoL impact than TNG did (33). The observed treatment effects were more pronounced than those in the present study. An Italian cross-sectional study using the SF-36 found patients to be less affected than in the present study. However, significant differences between patients and the population norm were found on four SF-36 scales (34). A Danish study used the SF-36 before treatment, after reaching euthyroidism, and one year after initiation of treatment (35,36). The authors concluded that untreated GD resulted in impairments on a wide range of health domains, and despite significant treatment effects, a considerable proportion of patients had persistent HRQoL impairments one year after treatment initiation. Again, these results are in line with the current findings. Two studies examined HRQoL in patients with GD many years after treatment, and both demonstrated long-term HRQoL deficits (37,38).
The present findings have clinical relevance for a number of reasons. First, the data may better inform physicians and patients about the impact of hyperthyroidism and its treatment on HRQoL. Second, the data may improve communication between physicians and patients by offering realistic estimates of expected HRQoL impairments and treatment effects. High levels of anxiety were observed in both patient groups, which implies that clinicians should carefully address the anxiety and distress associated with hyperthyroidism and its treatment. Third, the persistent HRQoL deficits six months after treatment underpin the need for further research to determine the optimal treatment strategies.
The strengths of this study include its longitudinal design, the use of validated surveys, the use of normative data from general population samples, and the availability of clinical data, in contrast to only self-reported data. However, more assessment points, longer follow-up, and larger sample sizes would allow improved insights. The study lacked power to assess potential differences in treatment efficacy by treatment modality, nor was there power to examine the relation between normalization, or not, of thyroid function and the effect on HRQoL. Finally, thyroid function in the general population samples was not known.
Treatment of hyperthyroidism is founded not only on improving HRQoL but just as importantly on reducing the excess morbidity and mortality associated with hyperthyroidism (6–8,12,13,39). While there is ample evidence for the effect of treatment on biochemical variables, much less is known about the HRQoL effects of the various treatment modalities. Indeed, more studies of the relationship between clinical variables, treatment modalities, and HRQoL are needed to improve the quality of patient-centered medicine and to guide future clinical practice.
Conclusions
Patients with hyperthyroidism experience severe HRQoL impairments, particularly due to tiredness, anxiety, and physical symptoms related to goiter and hyperthyroidism, compared with general population samples. GD has a more severe HRQoL impact on a wider range of health domains than TNG does. Despite significant treatment effects, patients with GD and TNG have persistent disease-specific and generic HRQoL impairments six months after treatment.
Footnotes
Acknowledgments
This study was supported by grants from the Danish Agency for Science, Technology, and Innovation: Council for Strategic Research and Council for Independent Research, and Agnes & Knut Mørks Foundation. Laszlo Hegedüs and Ulla Feldt-Rasmussen are supported by unrestricted grants from the Novo Nordisk Foundation. We thank the staff and colleagues at the Department of Endocrinology at Rigshospitalet and Odense University Hospital. Special thanks to our research assistant, Sofie Larsen Rasmussen, for her assistance with collection of paper surveys and clinical data entry. Finally, we would like to thank the patients who participated in this study, and the people who served as a normal reference population. Grants: Rigshospitalet (E-22379-02).
Author Disclosure Statement
The authors have nothing to disclose.
