Abstract
Background:
Postoperative quality of life (QoL) after surgery for Graves' disease is not well documented, and the effect of different surgical operations has not been compared. This study examines the impact on QoL of a shift in policy from operations intended to preserve thyroid function (PF) to those ablating thyroid function (AF).
Methods:
A cross-sectional assessment was performed on patients who underwent surgery for Graves' disease between 1986 and 2008 in a tertiary endocrine surgical unit. Patients completed the Short Form 36 (SF-36) questionnaire by post. SF-36 scores, including the physical and mental component summaries, were compared with the general population and by operative intent (AF vs. PF). Statistical analyses were performed using SPSS 16.0.
Results:
Of 150 patients, 3 had died of unrelated causes and 14 were not contactable. In the remaining 133 patients, the median age at time of assessment was 46 years (interquartile range 42–50) and 43 years (interquartile range 33–47) in the PF and AF groups, respectively. From these patients, 87 questionnaires (65.4% response rate) were completed with an item completion rate of 99.3%. The median follow-up was 18.4 years for PF and 7.9 years for AF surgery. Of 87 respondents, 38 (43.7%) underwent PF and 49 (56.3%) AF surgery. Study participants reported lower scores across all SF-36 subscales than British norms (p<0.05). Comparisons on operative intent showed no significant difference in long-term QoL (p>0.05).
Conclusion:
The shift to ablative surgery simplifies postoperative management with no adverse effect on QoL, justifying this practice from a patient perspective.
Introduction
Practice in this area has been modified by changing emphasis on different aspects of clinical outcome (2,3). In Aberdeen, there has been a progressive change in policy from operations intended to preserve thyroid function (PF), namely subtotal thyroidectomy (bilateral subtotal lobectomy) or a Dunhill procedure (unilateral total lobectomy and contralateral subtotal lobectomy) that preserves a significant remnant, to those intended to permanently ablate thyroid function (AF), namely total thyroidectomy or a Dunhill procedure with a minimal remnant. Dunhill procedures can be used either to preserve or to ablate function, depending on the size of the remnant.
Before 1992, policy was based on experience suggesting higher rates of thyroid failure with smaller remnants (4). Subsequent evidence indicated recurrence was greatest in small glands (2). Therefore, after 1992, smaller remnants of 2–4 g were preserved in small glands to avoid recurrent toxicity, and larger remnants of 9–10 g in large glands to avoid thyroid failure. With increasing emphasis on avoiding recurrent thyrotoxicosis, the policy moved gradually toward routine total thyroidectomy.
It is known that PF procedures do not reliably restore normal function. Many patients undergo a period of hypothyroidism before normal function returns; some never become euthyroid and there is a cumulative incidence of thyroid failure over time (2). In addition, PF procedures have a significant risk of recurrent toxicity and lifelong specialist follow-up is therefore recommended.
Surgical interventions have been assessed in terms of health service utilization measures, such as cost, length of hospital stay, and hospital readmission, and clinical measures, such as symptomatic improvement, complications, and death (5). With increasing patient expectations and the need for healthcare provision to be demonstrably patient centered, postoperative quality of life (QoL) is becoming a standard and preferred outcome measure in clinical trials of surgical interventions (6,7).
Although recent systematic reviews (8,9) and a cost-effectiveness analysis (10) have recommended total thyroidectomy, these recommendations are based on clinical measures, including the risk of operative complications, likelihood of recurrence, and the need for subsequent surgical reintervention. No study to date has compared the long-term postoperative QoL in Graves' disease by type of thyroidectomy and operative intent (AF vs. PF). The aim of this study was to examine the postoperative QoL, and to determine whether the change in local policy to total thyroidectomy altered the postoperative QoL of patients with Graves' disease.
Methods
The study population comprised patients from the northeast of Scotland who underwent surgery for Graves' disease in Aberdeen between January 1986 and September 2008. Patients were under the care of two consultant surgeons with an interest in endocrine surgery. Patients with a diagnosis of Graves' disease, confirmed by clinical presentation, biochemical and immunologic assays, and histology, were identified from a prospectively collected database, supplemented with information regarding current address and vital status from NHS Grampian patient administration system and the Grampian Automated Follow-up Register (GAFUR), one of the oldest thyroid registers in the world (established in 1967). At the time of the study, all patients were biochemically euthyroid without replacement or stabilized on thyroxine. The operative intent, whether to preserve or ablate function, was recorded at the time of surgery.
Study design and ethics approval
This was a cross-sectional questionnaire assessment of the long-term QoL of patients who underwent surgery for Graves' disease. QoL was assessed using version 2 of the Short Form 36 (SF-36) health survey questionnaire (QualityMetric, Inc.).
This study was approved by the North of Scotland Research Ethics Service as a service evaluation. To comply with local ethics requirements, all questionnaires were anonymized with removal of all identifier information (including age and gender) before distribution. No specific funding was received for this study.
After confirmation of their vital status, patients were sent a personalized invitation letter, a patient information leaflet, and an anonymized SF-36 questionnaire. All patients were also offered the choice of completing the questionnaire by telephone interview. A blanket reminder was sent to all patients at 1 month to encourage participation.
Statistical analysis
Algorithms provided by QualityMetric, Inc., were used to compute scores for SF-36 subscales and the standardized physical component summaries (PCS) and mental component summaries (MCS). Since the majority of SF-36 subscale scores were not normally distributed, nonparametric tests (Mann–Whitney U-test) were used to compare study groups. However, comparisons with mean reference scores from the British population (British norms) necessitated the use of parametric tests (one sample t-tests).
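As an illustration of the nonparametric group comparison described above, the following is a minimal sketch of a two-sided Mann–Whitney U-test using the normal approximation. It is not the SPSS procedure used in the study: the function names are hypothetical, no tie or continuity correction is applied, and the scores in the usage lines are invented purely for demonstration.

```python
import math

def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def mann_whitney_u(x, y):
    """Return (U, approximate two-sided p) for two independent samples.

    Assumption: normal approximation without tie or continuity correction,
    so the p-value is approximate, especially for small samples.
    """
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    u = min(u1, n1 * n2 - u1)
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    return u, math.erfc(abs(z) / math.sqrt(2))

# Hypothetical subscale scores for two groups (not study data).
group_pf = [55, 60, 72, 80, 45]
group_af = [50, 65, 70, 85, 90]
u_stat, p_value = mann_whitney_u(group_pf, group_af)
```

A dedicated statistics package would normally be preferred in practice, since it applies tie corrections and exact small-sample distributions.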
Because information on age and gender was excluded from the anonymized questionnaires, comparisons with the general population were made using nonadjusted British reference scores (11). Standardized PCS and MCS scores were calculated to represent the deviation from the British population. Statistical comparisons of the normally distributed PCS and MCS scores were performed using parametric tests (one sample t-test). Statistical significance was set at p<0.05. Statistical analysis was performed using SPSS for Windows version 16.0 (SPSS).
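The comparison of component scores with a fixed reference mean can be sketched as a one sample t-test. The example below is a hedged illustration only: the function names are hypothetical, the p-value is obtained by numerical integration of the t density rather than by the SPSS routine used in the study, and the scores in the usage lines are invented. The reference mean of 50 is the value stated for the British population.

```python
import math
from statistics import mean, stdev

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def two_sided_p(t, df, steps=2000):
    """Two-sided p-value via Simpson's rule on [0, |t|] (steps must be even)."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # probability mass between 0 and |t|
    return max(0.0, 1.0 - 2.0 * area)

def one_sample_t(scores, reference_mean):
    """Return (t, two-sided p) for H0: population mean == reference_mean.

    Assumes at least two scores with nonzero standard deviation.
    """
    n = len(scores)
    t = (mean(scores) - reference_mean) / (stdev(scores) / math.sqrt(n))
    return t, two_sided_p(t, n - 1)

# Hypothetical standardized component scores (not study data).
pcs_scores = [44.0, 51.5, 39.8, 47.2, 49.9, 42.3]
t_stat, p_value = one_sample_t(pcs_scores, 50.0)
```

A negative t statistic here would indicate a sample mean below the reference mean of 50.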
The half-scale rule was used to handle missing data: if 50% or more of the items in a subscale were completed, the mean value of these items was substituted for the missing items. However, if fewer than 50% of the items were completed, the subscale score was excluded from the analysis. PCS and MCS scores could be calculated only when sufficient data were present to calculate each of the subscale scores.
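The half-scale rule described above can be sketched as follows. This is an illustrative implementation under stated assumptions, not the QualityMetric scoring algorithm: the function name is hypothetical, missing items are represented by None, and the item values in the usage lines are invented.

```python
from statistics import mean

def apply_half_scale_rule(items):
    """Impute missing items in one subscale using the half-scale rule.

    items: list of item responses, with None marking a missing response.
    Returns the completed list if at least 50% of items were answered,
    otherwise None (the subscale is excluded from the analysis).
    """
    answered = [v for v in items if v is not None]
    if not items or 2 * len(answered) < len(items):
        return None
    fill = mean(answered)  # mean of the completed items in this subscale
    return [fill if v is None else v for v in items]

# Hypothetical four-item subscale with two missing responses (exactly 50%).
completed = apply_half_scale_rule([1, None, 3, None])  # missing items become 2
excluded = apply_half_scale_rule([1, None, None, None])  # fewer than 50% answered
```

Returning None for an excluded subscale makes it straightforward to skip PCS and MCS calculation whenever any subscale score is unavailable.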
Results
Surgical practice in Graves' disease has changed (Fig. 1). Before 1992, the majority of operations in Aberdeen were performed with the aim of preserving function (PF). These consisted mainly of subtotal thyroidectomy and rarely a Dunhill procedure. However, after 1992, the concept of ablating function (AF) with total thyroidectomy was gradually introduced. By 1999, a complete shift to thyroid ablation was observed and total thyroidectomy became the operation of choice for Graves' disease.

Fig. 1. Trends in surgery over time by operative intent, 1986–2008.
Baseline characteristics at time of surgery
Between 1986 and 2008, a total of 150 patients underwent surgery for Graves' disease. Of these, 130 (86.7%) were female and 20 (13.3%) were male. The mean age at surgery was 30.2 years (interquartile range [IQR] 25–34, range 12–65). Seventy-three (48.7%) of 150 patients underwent a subtotal thyroidectomy; 23 (15.3%), a Dunhill procedure; and 54 (36.0%), a total thyroidectomy. In 78 (52.0%) patients, the intention was to preserve function (PF); in 72 (48.0%), the intention was to ablate function (AF) (Table 1).
Values in parentheses are percentages unless otherwise indicated.
Some patients fulfilled more than one indication for surgery.
Pregnancy, poor compliance, size, severe thyroid eye disease, adverse drug reactions, and failure of radioiodine.
NA, not applicable.
The cross-sectional questionnaire assessment of QoL occurred at a median follow-up time of 18.4 years (IQR 16.8–21.7) for operations that preserve function and 7.9 years (IQR 4.4–13.7) for ablative surgery.
Study sample characteristics and response rates
Three patients died before the study and 14 were noncontactable due to incorrect address (Fig. 2). In the remaining 133 cases, the median age at time of assessment was 46 years (IQR 42–50) and 43 years (IQR 33–47) in the PF and AF groups, respectively. The difference in age was statistically significant (p=0.001; Mann–Whitney U-test). The specific age distributions for respondents, however, were not available due to ethics restrictions.

Fig. 2. Study population.
Of 133 patients, 87 responded after one reminder, giving a response rate of 65.4%. The average item completion rate was 99.3% (range 97.7–100). No patient opted to complete the questionnaire via a telephone interview. Of the 87 respondents, 38 (43.7%) underwent PF and 49 (56.3%) underwent AF surgery. There was no significant difference in response rates between the two groups (PF=58.5%, AF=72.1%, p=0.099; chi-square test). When respondents and nonrespondents were compared within each group, no significant differences were found (PF, p=0.942; AF, p=0.810; chi-square test).
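The response figures above can be checked with a short calculation. The sketch below is illustrative only: the group denominators of 65 (PF) and 68 (AF) contactable patients are not stated in the text but are inferred here from the reported response rates (38/58.5% and 49/72.1%), and the test is assumed to be a Pearson chi-square without continuity correction, which the text does not specify. The function name is hypothetical.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for [[a, b], [c, d]].

    With 1 degree of freedom, the upper-tail probability of the chi-square
    distribution equals erfc(sqrt(chi2 / 2)).
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return chi2, math.erfc(math.sqrt(chi2 / 2))

# Patient flow as reported: 150 operated, 3 died, 14 not contactable.
contactable = 150 - 3 - 14
response_rate = 87 / contactable

# Inferred group sizes (assumption): 65 PF and 68 AF contactable patients.
pf_total, af_total = 65, 68
pf_responded, af_responded = 38, 49

chi2, p = chi_square_2x2(
    pf_responded, pf_total - pf_responded,
    af_responded, af_total - af_responded,
)
print(f"contactable={contactable}, response rate={response_rate:.1%}")
print(f"chi-square={chi2:.3f}, p={p:.3f}")
```

Under these assumptions the calculation gives a p-value close to the reported 0.099, which suggests that the inferred denominators are consistent with the published figures.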
Comparison with British norms
Although the subscale scores were not normally distributed, parametric tests had to be used to allow comparison of subscale scores from the study population with reference scores for the British population (11). Patients participating in this study reported lower scores across all subscales of the SF-36 than the British population (Table 2 and Fig. 3), indicating a reduced QoL (p<0.05). Differences were most marked in the subscales for bodily pain, vitality, and general health perceptions (p≤0.001). Additionally, the standardized SF-36 component scores, PCS (46.6) and MCS (43.7), were lower than the reference mean of 50 for the British population, demonstrating a poorer overall QoL.

Fig. 3. Comparison of study population with British norms (mean Short Form 36 [SF-36] scores).
Table 2. Study population versus British norms (one sample t-test).
SF-36, Short Form 36; CI, confidence interval.
Comparison by operative intent
AF surgery was associated with higher median values for bodily pain, vitality, social functioning, and emotional role limitation than PF surgery. In contrast, PF surgery was associated with higher values for physical functioning, physical role limitation, and general health perceptions (Table 3 and Fig. 4). However, none of these differences reached statistical significance (p>0.05; Mann–Whitney U-test). Standardized PCS and MCS scores of both groups were lower than the mean of 50 for the British population (PF vs. AF: 47.8 vs. 45.7 for PCS and 40.7 vs. 46.0 for MCS). There was no significant difference in PCS (p=0.451) or MCS (p=0.066) scores between the two groups.

Fig. 4. Median SF-36 scores by operative intent.
Table 3. Intention to preserve versus intention to ablate (Mann–Whitney U-test).
IQR, interquartile range.
Discussion
In a recently published guideline, the American Thyroid Association and American Association of Clinical Endocrinologists recommend near-total or total thyroidectomy as the procedure of choice in Graves' disease (12). Several studies, including a meta-analysis (13) and a recent systematic review (8), have made similar recommendations. These are based on clinical measures, including risk of operative complications, likelihood of recurrence, and the need for subsequent surgical intervention. Little is known about the effect of surgery on QoL in Graves' disease and no study to date has compared the long-term postoperative QoL in these patients by operative intent (to ablate or preserve function) or type of thyroidectomy performed.
Assessments of QoL aspire to capture the patient's perspective of disease and treatment, and their perceived needs and preferences (14). Postoperative QoL is a crucial outcome measure especially when survival is similar between the interventions being studied (6). Hence, the evaluation of QoL following different surgical approaches is vital (15).
Previous studies have reported reduced QoL scores among patients with Graves' disease (16,17). Factors shown to affect QoL include untreated thyrotoxicosis (18) and the presence of thyroid eye disease (TED) (19). Restoring euthyroidism results in a significant improvement of psychiatric symptoms and overall QoL (18). The impact of TED was studied by Gerding et al. (20), who noted marked limitations in physical and mental functioning with even mild to moderately severe disease. Such effects were shown to last for many years after treatment (21,22).
Few studies have compared the effect of different treatment modalities on QoL. Abraham-Nordling et al. (23) obtained long-term SF-36 data on 147 patients with Graves' disease randomized to antithyroid drugs, radioiodine, or surgery. The authors reported no differences between treatment options, concluding that the mode of treatment, whether surgical, medical, or radioiodine, had little impact on long-term QoL. Similarly, in a study of patient satisfaction, Ljunggren et al. (24) reported comparable outcomes in patients treated with the block-and-replace antithyroid regimen, subtotal thyroidectomy, or radioiodine treatment.
Our results indicate that even long after hyperthyroidism is cured by surgery, patients continue to have poorer QoL scores across all SF-36 subscales and component scores than the general population. These findings accord with those reported by Abraham-Nordling et al. (23), who found diminished MCS scores among Graves' disease patients many years after treatment. Studies of clinical outcomes that did not compare QoL demonstrated no significant differences between total thyroidectomy and types of subtotal resection (subtotal thyroidectomy and Dunhill procedure) (8,25,26). In this study, comparisons on the basis of operative intent, whether to preserve or ablate function, showed no significant difference in long-term QoL. Patients undergoing ablative surgery showed higher MCS and lower PCS scores than those undergoing subtotal resections, but these differences were not statistically significant.
Respondents and nonrespondents in this study were comparable for the type of thyroidectomy and operative intent. Nonparticipation is therefore unlikely to be due to the type of procedure undertaken. However, the potential for bias cannot be overlooked if ill patients were less likely to respond than those in better health (27).
Postal questionnaires are a convenient way of obtaining patient data, and unlike interview or telephone-based surveys, self-completed questionnaires are not affected by interviewer bias and variability (6). However, nonresponse (loss to follow-up) is a major issue, as it reduces the effective sample size and undermines validity by introducing bias (28–30). The relatively high response rate (65.4%) reported in this study was obtained despite long average follow-up times. Returned questionnaires also had high item completion rates (99.3%), allowing for comprehensive scoring of SF-36 subscales. This study utilized concise questionnaires, personalized letters, and reminders, all of which increase response rates (28).
Given the cross-sectional design of this study, a validated generic questionnaire was deemed most useful. The SF-36 was chosen for its validity, sensitivity, and reliability. Its short and clear structure reduces respondent burden, improves understanding, and maximizes response rates. It also allowed comparisons to be made with scores from the general population. On the other hand, disease-specific measures may be more sensitive to the small changes in health important to clinicians and patients (31). However, no validated disease-specific questionnaires are available, and several studies have resorted to unvalidated QoL questionnaires and psychiatric questionnaires such as the Hamilton Anxiety Scales and the Hamilton Depression Rating Scale to assess QoL issues in patients with Graves' disease (16,17).
There are limitations regarding the interpretation of our data. The lack of preoperative QoL data coupled with long average follow-up times means that we cannot absolutely relate current health status to the surgery alone. Factors that are likely to impact on overall QoL include manifestations of Graves' disease such as TED and other co-morbid medical conditions arising since surgery, not to mention other life circumstances for which we have no means of accounting. The lack of preoperative scores affects the ability to detect and quantify change attributed to surgery, as pre-existing conditions are likely to impact on postoperative QoL scores.
Normative population scores (reference scores) are known to vary with age and gender. Absence of such information in our cohort due to ethics restrictions limited comparisons to nonadjusted normative scores only. Of the total sample of 133 contactable cases, those in the PF group were older at time of assessment due to longer follow-up times. Specific age distributions for respondents were not available, but if these followed trends similar to those of the overall sample, this may have introduced a source of bias that masked differences in QoL between groups.
The lack of patient identifier data meant that correlations with individual clinical data, including the onset of thyroid failure requiring thyroxine replacement, and recurrent toxicity requiring reoperation or radioiodine ablation were not possible. To reduce possible sources of bias, medical notes and the GAFUR and laboratory databases were consulted before study commencement to confirm that all contactable patients were biochemically euthyroid without replacement or stabilized on thyroxine.
Despite the high completion rates in this study, one cannot exclude misinterpretation of questionnaire items. Other problems associated with the use of self-administered questionnaires relate to the ceiling and floor effects where changes in patients at either end of the QoL spectrum are not detected. Although version 2 of the SF-36 reduces this effect, it cannot be fully controlled (32).
Despite including patients from over a 22-year period and having a relatively high response rate, the final sample size was still low. The reduced number of operations for Graves' disease in the northeast of Scotland reflects the growing use of radioiodine ablation (2–4). The sample size may have affected the ability to detect a difference between the study groups if one existed (type II error).
The reasons behind the persistent impairment in QoL after surgery warrant further study. Future research could be directed at comparing pre- and postoperative QoL scores for each type of thyroidectomy. In this setting, the addition of a disease-specific questionnaire would allow responsiveness to change to be measured while also capturing any disease-specific characteristics relevant to QoL and perhaps not directly attributable to postoperative status (6). There is a requirement for validated questionnaires targeting patients with thyroid problems. Comparison studies would be needed to assess the validity, reliability, and responsiveness of questionnaires across various age groups and populations (31).
The shift to ablative surgery for Graves' disease simplified postoperative management with no adverse effect on postoperative QoL. This study provides further support for its routine use from a validated patient perspective.
Footnotes
Acknowledgment
No funding was received for this study.
Disclosure Statement
The authors declare that no competing financial interests exist.
