Abstract
Background:
While health-related quality of life (HRQoL) issues often prompt treatment of benign nontoxic goiter (NTG), few clinical studies have systematically assessed HRQoL in patients with this condition. The purpose of the present study was to evaluate thyroid-related and generic HRQoL in patients with benign NTG, as compared to the general population, before and six months after treatment.
Methods:
Thyroid-related and generic HRQoL were assessed with Thyroid Patient-Reported Outcome (ThyPRO) and Medical Outcomes Study 36-item Short Form (SF-36), respectively. Baseline and six-month post-treatment HRQoL assessments were obtained from 111 patients with NTG who underwent radioiodine therapy (32%), hemithyroidectomy (53%), total thyroidectomy (12%), or cyst aspiration with ethanol sclerotherapy (4%). Euthyroid patients were enrolled at baseline, 80% of whom remained euthyroid six months post-treatment, with 20% experiencing subclinical thyroid dysfunction. Normative ThyPRO (n=739) and SF-36 (n=6638) data were collected from representative general population samples. Score differences between patients and the general population were analyzed with multivariate linear regression analysis, adjusting for age, sex, comorbidity, and educational status. Changes in scores between baseline and follow-up were analyzed with the paired t-test, and magnitudes of score changes were evaluated as effect sizes (mean difference/SD baseline; 0.2–0.5 indicating small, 0.5–0.8 moderate, and >0.8 large effects).
Results:
Patients' baseline scores were significantly worse than those in the general population on 9 of the 13 ThyPRO scales. Six months after treatment, the patients' ThyPRO scores had improved on six scales, with large/moderate effects on the Goiter Symptoms and Anxiety scales. However, on eight scales, the post-treatment patient scores were still significantly worse than the general population scores. At baseline, patients had worse scores than the general population on four of the eight SF-36 scales and the SF-36 Mental Component Summary, none of which improved after treatment.
Conclusions:
Compared with the general population, patients with NTG had greatest HRQoL impairment at baseline on the Goiter Symptoms and Anxiety scales, which also demonstrated the largest post-treatment improvements. However, both disease-specific and generic HRQoL deficits persisted six months after treatment. In order to improve individualized care, future studies should focus on identifying risk factors for persistent HRQoL deficits and compare HRQoL effects of the various goiter treatment modalities in relation to thyroid phenotype.
Introduction
HRQoL assessments are increasingly used to evaluate treatment effects in clinical studies and practice (9,10), and HRQoL outcomes are increasingly used in drug development, as these data can be part of the evidence submitted for drug approval and included in drug labeling claims (11). Here, a comprehensive definition of HRQoL was adopted, including physical symptoms, in accordance with a current theoretical HRQoL framework. HRQoL can thus be defined as the subjective assessment of the impact of disease and treatment across the physical, psychological, social, and somatic domains of functioning and well-being (12). Instruments that measure HRQoL can be divided into two categories: disease-specific and generic questionnaires. Disease-specific questionnaires focus on specific symptoms and functioning associated with particular conditions, whereas generic questionnaires are intended for quantifying the impact of disease on broader functioning and well-being and thus allow for comparisons across disease groups. Generally, it is recommended to combine disease-specific and generic questionnaires when measuring HRQoL (13,14). Both validated disease-specific and generic questionnaires are available for quantifying HRQoL in patients with NTG. The Thyroid Patient-Reported Outcome (ThyPRO) is an HRQoL instrument that can be used across all benign thyroid diseases, and it is the only thyroid-related HRQoL instrument that has been validated in patients with goiter (15–19). The Medical Outcomes Study 36-item Short Form (SF-36) is the most extensively validated and used instrument to measure generic HRQoL (20,21). Despite the availability of these instruments, HRQoL remains poorly characterized in patients with NTG (8). It is rather paradoxical that, while the overall HRQoL impact of NTG may be poorly characterized in the scientific literature, treatment is often chosen for HRQoL indications (8).
For example, therapy for NTG can be chosen because of pressure-related symptoms (choking sensation, globus sensation, dysphagia) or cosmetic concerns (1). Thus, better quantification of HRQoL in patients with NTG is warranted to examine the impact of disease and treatment.
The purpose of this prospective cohort study was to evaluate thyroid-related and generic HRQoL, using the ThyPRO and the SF-36, respectively, before and six months after treatment for NTG, and to compare the results against data from the general population.
Materials and Methods
Patients with NTG
From October 2008 to May 2012, patients treated for NTG were prospectively recruited from the endocrine outpatient clinics at two Danish university hospitals: Copenhagen University Hospital Rigshospitalet and Odense University Hospital. Patients were eligible for inclusion if they were euthyroid at baseline, were older than 18 years of age, were able to complete paper-and-pencil questionnaires in Danish, and had planned treatment (hemi- or total thyroidectomy, radioiodine therapy, or cyst aspiration with ethanol sclerotherapy) for the purpose of volume reduction. Exclusion criteria included thyroid dysfunction at baseline, pregnancy, thyroid cancer, and major comorbidities suspected to have a substantial HRQoL impact (e.g., congestive heart failure). Further, patients were excluded if no relevant treatment was initiated. A subset of the patient data has previously been used for a methodological evaluation of the responsiveness of the ThyPRO survey (19). Patients completed paper-and-pencil versions of the ThyPRO and SF-36 v2 surveys prior to and six months after undergoing treatment for NTG. The HRQoL questionnaires were collected by mail. Eligible patients were sent a booklet containing the two questionnaires (ThyPRO and SF-36 v2) and questions on sociodemographic parameters, comorbidity, and nonthyroid medication. One mailed reminder was sent after two weeks to nonresponders. Clinical data on thyroid disease, including specific diagnosis, previous and current treatment, and biochemical measurements, were obtained by chart review. Thyroid volume was determined by ultrasound in a subset of the patients, and the biochemical thyroid analyses comprised thyrotropin (TSH; normal range: 0.3–4.0 mIU/L) and serum total thyroxine (T4; normal range: 65–135 nmol/L).
General population samples
ThyPRO data from the Danish general population were gathered from a random sample of 1200 adult citizens, using the Danish Civil Registration System (each Dane has a unique personal registration number). The sample was stratified by age and sex. Samples of 160 women and 40 men were drawn at random from each of the following age intervals: 18–30 years, 30–40 years, 40–50 years, 50–60 years, 60–70 years, and 70–80 years. The aim of the stratification was to have a sufficiently high number of respondents in each age interval, making analyses of the association between HRQoL and age possible. Furthermore, the sex distribution of the sample (80% women) was chosen to resemble the sex distribution of patients with benign thyroid disease. The 1200 randomly selected individuals received an invitation describing the purpose of the study, a questionnaire, and a stamped addressed envelope. The questionnaire included the items from the nine scales of the ThyPRO survey, which do not make specific attribution to thyroid disease. The questions from the remaining four ThyPRO scales make specific attributions to thyroid disease and therefore were not included for respondents from the general population. The general population questionnaire also included questions about sociodemographic variables, comorbidity, and medication. Respondents with cancer (self-reported) were excluded.
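The stratified design described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic and of drawing a simple random sample within each stratum; the `registry` mapping, the identifiers in it, and the random seed are hypothetical and do not represent the Danish Civil Registration System or the procedure actually used.

```python
import random

# Design figures taken from the text above.
AGE_INTERVALS = ["18-30", "30-40", "40-50", "50-60", "60-70", "70-80"]
PER_STRATUM = {"female": 160, "male": 40}


def planned_sample_size():
    """Total number of invitations implied by the stratified design."""
    return len(AGE_INTERVALS) * sum(PER_STRATUM.values())


def planned_share_female():
    """Proportion of women in the planned sample."""
    return PER_STRATUM["female"] / sum(PER_STRATUM.values())


def draw_stratified_sample(registry, rng):
    """Draw a simple random sample without replacement within each stratum.

    `registry` is a hypothetical mapping (age_interval, sex) -> list of IDs.
    """
    return {
        (interval, sex): rng.sample(registry[(interval, sex)], n)
        for interval in AGE_INTERVALS
        for sex, n in PER_STRATUM.items()
    }


# Hypothetical registry: 1000 made-up identifiers per stratum.
registry = {
    (interval, sex): [f"{interval}-{sex}-{i}" for i in range(1000)]
    for interval in AGE_INTERVALS
    for sex in PER_STRATUM
}
sample = draw_stratified_sample(registry, random.Random(2010))

print(planned_sample_size())
print(planned_share_female())
print(sum(len(ids) for ids in sample.values()))
```

Six age intervals with 160 women and 40 men each give the 1200 invitations and the 80% share of women stated in the text.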
SF-36 data from the Danish general population were derived from the Danish Health Interview Survey in 2005 (22). The survey was based on a region-stratified random sample of adult Danish citizens (aged 16 years or older). The survey included face-to-face interviews in the respondents' homes. After the interview, the respondents were asked to complete a self-administered questionnaire. The interview included questions about sociodemographic variables, comorbidity, and medication; the self-administered questionnaire included the SF-36 v1 Health Survey. A total of 11,238 Danish citizens completed both the interview and the questionnaire. Denmark is divided into five administrative regions. This study only includes general population data for respondents older than 18 years of age living in the three regions from which NTG patients were recruited. Furthermore, respondents with cancer (self-reported) were excluded.
Outcomes
The ThyPRO questionnaire assesses HRQoL in patients with benign thyroid disease, and it has been validated for use in NTG, hyperthyroidism, hypothyroidism, and Graves' orbitopathy (15–19). ThyPRO consists of 85 items that assess physical, mental, and social domains of functioning and well-being. The items are summarized in 13 scales: Goiter Symptoms, Hyperthyroid Symptoms, Hypothyroid Symptoms, Eye Symptoms, Tiredness, Cognitive Complaints, Anxiety, Depressivity, Emotional Susceptibility, Impaired Social Life, Impaired Daily Life, Impaired Sexlife, and Cosmetic Complaints, as well as one single item (overall impact of thyroid disease on HRQoL). Each scale ranges from 0 to 100, with higher scores indicating poorer health status.
The SF-36 questionnaire was developed to assess health outcomes from a wide range of interventions on a common HRQoL instrument (20). The survey assesses generic HRQoL with 36 items summarized in eight scales: Physical Functioning, Role-Physical (role limitations due to physical health), Bodily Pain, General Health, Vitality, Social Functioning, Role-Emotional (role limitations due to mental problems), and Mental Health. The scale scores can further be aggregated into the physical and mental component summary scores. SF-36 scores were standardized using norm-based scoring to facilitate the comparison between SF-36 v1 and v2. With norm-based scoring, the mean and SD are standardized to 50 and 10, respectively, in the general U.S. population. Norm-based data from 1998 were used, as general population data were collected for both v1 and v2 of the SF-36 in that year (23). Higher scores indicate better health status.
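As an illustration of norm-based scoring, a scale score can be converted to a z-score against the reference population and then rescaled so that the reference mean is 50 and the SD is 10. The sketch below assumes this standard linear transformation; the reference mean and SD are hypothetical placeholders, not the published 1998 U.S. norms, and the function name is chosen for this example only.

```python
def norm_based_score(raw_score, norm_mean, norm_sd):
    """Rescale a 0-100 scale score so the reference population has mean 50, SD 10."""
    z = (raw_score - norm_mean) / norm_sd
    return 50.0 + 10.0 * z


# Hypothetical reference values, NOT the published 1998 U.S. norms.
HYPOTHETICAL_NORM_MEAN = 60.0
HYPOTHETICAL_NORM_SD = 20.0

# A score equal to the reference mean maps to 50.
print(norm_based_score(60.0, HYPOTHETICAL_NORM_MEAN, HYPOTHETICAL_NORM_SD))
# A score one reference SD below the mean maps to 40.
print(norm_based_score(40.0, HYPOTHETICAL_NORM_MEAN, HYPOTHETICAL_NORM_SD))
```

Because both questionnaire versions are expressed relative to the same reference population, scores from v1 and v2 end up on a common metric.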
Statistical analysis
Differences in mean scale scores between baseline and six months after treatment for NTG were analyzed with the paired t-test. Magnitude of changes in patient scores was evaluated by effect sizes (mean difference/SD baseline). In accordance with Cohen, an effect size of 0.2–0.5 was defined as small, while 0.5–0.8 and >0.8 were defined as moderate and large effects, respectively (24). Treatment effects (change in scale scores) of the different treatment modalities were compared in multivariate linear regression analysis, adjusting for age, sex, comorbidity, and educational status. Differences between patients with NTG and the general reference population were analyzed with multivariate linear regression analysis, adjusting for age, sex, comorbidity, and educational status. Analyses of the relationships between thyroid function at six-month follow-up (euthyroid vs. subclinical thyroid dysfunction) and HRQoL scale scores were performed with multivariate linear regression analysis, adjusted for age, sex, comorbidity, and educational status. All pairwise comparisons were adjusted for multiple testing with the Bonferroni correction. A p-value of <0.05, adjusted for multiple testing, was considered significant. All analyses were performed using SAS v9.3 (25).
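The effect size, paired t statistic, and Bonferroni adjustment described above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the SAS code used in the study. The scores are invented, the number of tests (13) is an assumption for the example, and the handling of values below 0.2 and of the exact boundaries 0.5 and 0.8 is a choice made here because the text does not specify it.

```python
import math
import statistics


def effect_size(baseline, follow_up):
    """Mean paired difference divided by the SD of the baseline scores."""
    diffs = [b - f for b, f in zip(baseline, follow_up)]
    return statistics.mean(diffs) / statistics.stdev(baseline)


def classify_effect(es):
    """Label an effect size using the Cohen thresholds quoted in the text.

    Assumption: values below 0.2 are called "negligible", and the boundary
    values 0.5 and 0.8 are assigned to the "moderate" category.
    """
    size = abs(es)
    if size < 0.2:
        return "negligible"
    if size < 0.5:
        return "small"
    if size <= 0.8:
        return "moderate"
    return "large"


def paired_t_statistic(baseline, follow_up):
    """t statistic of the paired t-test (n - 1 degrees of freedom)."""
    diffs = [b - f for b, f in zip(baseline, follow_up)]
    standard_error = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs) / standard_error


def bonferroni(p_value, n_tests):
    """Bonferroni-adjusted p-value, capped at 1."""
    return min(1.0, p_value * n_tests)


# Hypothetical scores on a 0-100 scale where higher means poorer health.
baseline = [40, 55, 30, 60, 45, 50]
follow_up = [20, 35, 25, 30, 30, 40]

es = effect_size(baseline, follow_up)
print(round(es, 2), classify_effect(es))
print(round(paired_t_statistic(baseline, follow_up), 2))
# Assumed number of tests for illustration only.
print(bonferroni(0.003, 13))
```

Dividing by the baseline SD, rather than the SD of the change scores, expresses the improvement relative to the spread of scores before treatment.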
Ethical considerations
According to Danish law, questionnaire studies do not require and thus cannot obtain approval by ethical committees. A completed, returned questionnaire is regarded as consent. The study was approved by the Danish Data Protection Agency and conducted in accordance with the Declaration of Helsinki.
Results
Patients and general population samples
A total of 176 patients with benign NTG (meeting all inclusion criteria and no exclusion criteria) were invited to participate in the study. Of these, 135 patients completed the baseline (pretreatment) survey, yielding an initial response rate of 77%, and 111 patients completed the follow-up survey, yielding a follow-up response rate of 82% (Fig. 1). The median age was 53 years, and 79% were female. The patients were treated with radioactive iodine (32%), hemithyroidectomy (53%), total thyroidectomy (12%), and ethanol sclerotherapy of a thyroid cyst (4%). At baseline and six months post-treatment, 100% and 80% of the patients, respectively, were euthyroid. The remaining patients had subclinical thyroid dysfunction at six months of follow-up. The baseline characteristics (sex, age, education, comorbidity, and smoking status) of the 24 patients who only completed the baseline survey and the 111 follow-up responders were not significantly different, and the baseline HRQoL scores were comparable in the two groups. The SF-36 was not included in the surveys provided to patients between October 2008 and December 2009. Thus, only 92 patients completed the SF-36 survey at baseline and six months of follow-up, while 111 patients completed the ThyPRO survey at both time points.

Fig. 1. Flow diagram for recruitment and follow-up of patients who were older than 18 years of age, diagnosed with benign nontoxic goiter, and receiving treatment for the purpose of volume reduction. Patients who were pregnant or had major comorbidities suspected to have a substantial health-related quality of life impact (e.g., congestive heart failure or active cancer) were not invited to participate. *111 patients completed the Thyroid Patient-Reported Outcome survey at baseline and six months of follow-up, while 92 patients completed the Medical Outcomes Study 36-item Short Form survey at both time points.
Of the 1200 Danish citizens from the general population who were invited between February 2010 and July 2010, a total of 754 returned a completed ThyPRO questionnaire, yielding a response rate of 63%. The median age was 50 years, and 82% were female. Young men (aged 18–30 years) had the lowest response rate (28%), while 59% of young women completed the survey. The response rate was highest in the age groups from 30 to 70 years for both men and women, with response rates of 67% and 70%, respectively. Elderly women (70–80 years) had the lowest response rate among women (43%), while men in the same age group had a response rate of 53%. Nonresponders had a median age of 53 years, and 77% of nonresponders were female. Fifteen respondents with self-reported cancer were excluded. Data for the remaining 739 respondents (median age 50 years; 81% female) were used in the analyses.
Of the 11,238 respondents from the Danish general population who completed the SF-36 Health Survey, 6713 were older than 18 years of age and living in the three relevant regions. The 75 respondents with a self-reported cancer diagnosis were excluded, and SF-36 data for the remaining 6638 respondents were included in the analyses. The median age was 50 years, and 54% were female. Characteristics of the patient sample and the two general population samples are shown in Table 1.
Combined school and professional education classified in accordance with the International Standard Classification of Education (
Includes asthma, diabetes, ischemic heart disease, stroke, chronic obstructive pulmonary disease, osteoarthritis, gastric/duodenal ulcer, anxiety and depression, other psychiatric diseases, and chronic back pain and other conditions of the back.
Information only available for 35 patients.
NTG, nontoxic goiter; ThyPRO, Thyroid Patient-Reported Outcome; SF-36, Medical Outcomes Study 36-item Short Form; Q1–Q3, interquartile range; TSH, thyrotropin; T4, thyroxine.
Thyroid-related HRQoL (ThyPRO)
The mean ThyPRO scale scores for patients at baseline and six months after treatment are shown in Table 2, together with scores from the general population sample. Baseline ThyPRO scores among patients were significantly higher (worse) than the general population scores on all nine relevant scales. Six months after treatment, ThyPRO scores had decreased (improved) significantly on 6 of 13 scales (Goiter Symptoms, Hyperthyroid Symptoms, Tiredness, Anxiety, Emotional Susceptibility, and Impaired Daily Life) in NTG patients, compared with baseline. The change in Hyperthyroid Symptoms, however, was barely significant after Bonferroni correction (p=0.0494). The magnitudes of change in the patients' scale scores are shown in Table 2. The change was large (i.e., effect size >0.8) for the Goiter Symptoms scale and moderate (i.e., 0.5–0.8) for the Anxiety scale. Small changes (i.e., 0.2–0.5) were found on four scales (Hyperthyroid Symptoms, Tiredness, Emotional Susceptibility, and Impaired Daily Life). Treatment effects (change in scale scores) were similar across the different treatment modalities (hemi- and total thyroidectomy, radioactive iodine, and ethanol sclerotherapy; data not shown). Patients who experienced subclinical thyroid dysfunction at six months of follow-up had HRQoL scores similar to those of patients who were euthyroid at follow-up (data not shown). Six months after treatment, ThyPRO scale scores remained significantly higher (poorer) in patients compared to the general population on all comparable scales, except Cognitive Complaints.
Data shown are mean scale scores with standard deviations in parentheses.
p-Values and effect sizes are for comparison between baseline and six months after treatment.
Items in these four scales are asked with attribution to thyroid disease and cannot be answered by respondents from the general population sample.
Patient scores significantly different from the general population sample. Effect sizes (baseline vs. 6 months) are defined as mean difference/SD baseline.
*Small effect (0.2–0.5); **moderate effect (0.5–0.8); ***large effect (>0.8).
Generic HRQoL (SF-36)
The mean SF-36 scale scores at baseline and six months after treatment of the goiter patients are shown in Table 3, together with those for the general population sample. The patients' baseline SF-36 scores were significantly lower (worse) than those of the general population on four SF-36 scales: General Health, Vitality, Role-Emotional, and Mental Health. The patients' baseline mental component summary score was also significantly lower than the score of the general population. There was a trend toward higher (better) scale scores six months after treatment, although no significant changes were observed in any of the eight SF-36 scales, nor did the physical and mental component summary scores improve significantly after treatment. Patients with subclinical thyroid dysfunction at six months of follow-up had similar HRQoL scores compared to euthyroid patients (data not shown). Six months after treatment, the patients still had significantly lower (worse) scores than the general population on three scales (General Health, Vitality, and Mental Health) and the mental component summary score.
Data shown are mean scale scores with standard deviations in parentheses.
p-Values and effect sizes are for comparison between baseline and 6 months after treatment.
Patient scores significantly different from the general population sample. Effect sizes (baseline vs. 6 months) are defined as mean difference/SD baseline.
Small effect (0.2–0.5).
Discussion
This study demonstrates that patients with untreated NTG have disease-specific and generic HRQoL impairments compared to the general population. Although disease-specific HRQoL improved after treatment, both disease-specific and generic HRQoL deficits persisted six months after treatment.
How does NTG affect HRQoL? The most pronounced differences between patients and the general population were found in the Goiter Symptoms (e.g., sensation of fullness in the neck, visible swelling in the front of the neck, sensation of a lump in the throat, etc.) and Anxiety (e.g., concerned about being seriously ill, feeling uneasy, feeling nervous, etc.) scales. The Goiter Symptoms scale was developed to detect the core physical symptoms associated with enlargement of the thyroid gland. Therefore, it is not surprising, and it is clinically meaningful, that Goiter Symptoms was the most affected domain. The second most affected domain was Anxiety. Until a definite diagnosis has been established, patients with NTG may experience anxiety and distress caused by uncertainty about the character of the disease (8). Further, patients may have concerns regarding the potential side effects and complications of treatment. The high level of anxiety is an important finding, which implies that clinicians should be careful to address the anxiety and distress associated with a diagnosis of NTG and its treatment. Patients also had large HRQoL deficits on the Tiredness, Emotional Susceptibility, and Depressivity scales, which in part could be explained by the psychological distress of disease awareness (8).
The greatest improvements were found on the two most affected scales: Goiter Symptoms and Anxiety. Therapy, independent of modality (1,5), which in this study was radioiodine (5), ethanol sclerotherapy (26), or surgery (6), reduces the volume of the thyroid gland and, therefore, it was in accordance with expectations that the core physical symptoms associated with goiter were reduced. Similarly, it is clinically meaningful that the level of anxiety was lower after treatment, since a benign diagnosis has been established and treatment (e.g., surgery) has been performed successfully. Unexpected improvements were, however, observed (e.g., the improvement on the Hyperthyroid Symptoms scale). Whether this change is directly related to therapy or whether it is a nonspecific effect of receiving medical attention remains unclear. No improvement in Cosmetic Complaints was observed. This was unexpected, as cosmetic issues, according to the literature, are prevalent among patients with goiter (27), and may warrant therapy (1,5). Such discrepancies between expected and observed treatment effects will continue to appear as the use of patient-reported outcomes becomes more widespread.
This study is the first to include normative ThyPRO data in the evaluation of patients with benign NTG. Such data facilitate the clinical interpretation of patient HRQoL scores (28). Although significant disease-specific HRQoL improvements were observed, the normative data show that both disease-specific and generic HRQoL deficits persist at least six months after treatment, the cause of which is most probably multifactorial. Thyroidectomy and radioiodine therapy can cause hypothyroidism that necessitates subsequent levothyroxine substitution therapy, and this may account for some of the observed HRQoL deficits. Being diagnosed with a disease may change patients' perceptions of their health or affect psychological well-being, which could also impact HRQoL scores.
This study is the first to combine validated disease-specific and generic surveys to evaluate HRQoL in patients with NTG. Comparing these two approaches, ThyPRO detected significant treatment effects on six scales, whereas no significant changes were observed on the SF-36 scales/summary measures. Slightly fewer patients completed the SF-36 (n=92) than ThyPRO (n=111), which reduces the power to detect significant changes in SF-36 scores. However, taking the small effect sizes on the SF-36 into account, this pronounced difference in findings between the disease-specific and generic surveys cannot be attributed to the small difference in sample size. These results are in line with previous longitudinal studies in patients with benign NTG, where disease-specific surveys have been responsive to changes (29–31) and generic surveys have been unresponsive (32). While the SF-36 remains useful for comparing health status between patients with different diagnoses (and showed differences between goiter patients and the general population), the present study indicates that ThyPRO should be used in clinical studies evaluating HRQoL treatment effects in patients with NTG, as it may be more sensitive to relevant changes in such patients.
While the strengths of this study include its longitudinal design, the use of validated questionnaires, and the use of general population reference groups, more assessment points, longer follow-up, and a larger sample size would allow for further insights. A major strength of this work is the availability of clinical data, in contrast to only self-reported data. However, complete clinical data were not available for all patients (e.g., thyroid volume measurements were only performed in 35 patients), which limited the ability to examine the association between thyroid volume and HRQoL at baseline. The present study lacks the power to assess potential differences in treatment efficacy by goiter type and by treatment modality, since it was not designed for that purpose. The study design, omitting an immediate postoperative evaluation (e.g., confirmation of a benign diagnosis), and long-term follow-up, leaves room for future studies.
The survey response rates observed in this study correspond well with response rates seen in other postal questionnaire surveys (33). Although response rates are not as important as representativeness, low response rates may compromise representativeness. The initial response rate among patients with goiter was 77%, and since the patients who were lost to follow-up had similar baseline characteristics (sex, age, education, comorbidity, smoking status, and HRQoL) compared to patients who completed the six-month survey, the authors are confident that the patient sample is representative. The response rate was 63% in the general population sample completing the ThyPRO questionnaire. The age and sex distributions of responders and nonresponders were similar, but further nonresponse analyses were not possible due to lack of baseline data for the nonresponders. Response rates for the normative SF-36 data have been discussed elsewhere (22).
Treatment of NTG is primarily elected for HRQoL indications, and therefore well-controlled and well-powered studies should compare the HRQoL effects of the various treatment modalities (8). Indeed, an increasing number of studies are using HRQoL as a high-priority outcome, including studies in thyroid diseases. Two ongoing multicenter trials in patients with Hashimoto's thyroiditis and Graves' hyperthyroidism have HRQoL as the primary and secondary outcomes, respectively (34,35). These two clinical trials utilize a Web-based trial management system to deliver automated electronic surveys to the trial participants (36). The use of Web and mobile technologies provides a more efficient method to collect HRQoL data compared to the classic paper-and-pencil surveys, and can thus facilitate a more widespread use of HRQoL surveys in clinical studies. Indeed, more studies of the relationship between HRQoL, clinical variables, and treatment modalities are needed to improve the quality of patient-centered care and to guide future clinical practice.
Conclusions
Patients with NTG have impaired HRQoL, particularly due to goiter symptoms and anxiety, compared to the general population. Large/moderate improvements after treatment were observed on the Goiter Symptoms and Anxiety scales, and small improvements were observed on four other ThyPRO scales. No significant effect of goiter treatment could be demonstrated by the SF-36 survey. Thyroid-related and generic HRQoL deficits persisted six months after treatment. Future larger and longitudinal studies with more evaluations—taking thyroid phenotype, size, and function into consideration—may identify risk factors for persistent HRQoL deficits, compare HRQoL effects of the various treatment modalities, and thereby aid in improving individualized care.
Acknowledgments
This study was supported by research grants from the Danish Agency for Science, Technology and Innovation: Council for Strategic Research and Council for Independent Research, and Agnes & Knut Mørks Foundation. L.H. was supported by an unrestricted grant from the Novo Nordisk Foundation. We thank staff and colleagues at the Department of Endocrinology at Rigshospitalet and Odense University Hospital. Special thanks to our research assistant, Sofie Larsen Rasmussen, for her assistance with collection of HRQoL surveys and clinical data entry. Finally, we thank the patients who participated in the study, and the respondents who served as a normal reference population. The data in this paper were previously presented as a poster at the ETA Annual Meeting 2014, Santiago de Compostela, Spain.
Author Disclosure Statement
The authors have nothing to disclose.
