How Should Thyroid-Related Quality of Life Be Assessed? Recalled Patient-Reported Outcomes Compared to Here-and-Now Measures

Abstract

Background:

The impact of thyroid disease on quality of life is an important disease aspect that is best investigated by patient-reported outcomes. Recent patient-reported outcomes research has raised concern about the validity of traditional retrospective questionnaires. Therefore, ecological momentary assessments of patients' subjective well-being have been introduced to avoid recall bias and improve contextual validity. Despite theoretical advantages, the measurement properties remain unsubstantiated. This study examines the relationship between the retrospective thyroid-related quality of life patient-reported outcome measure (ThyPRO) and a momentary (here-and-now) version of ThyPRO.

Methods:

Eighty-three newly diagnosed hyperthyroid patients expected to undergo treatment completed questions on their thyroid-related quality of life. A head-to-head comparison was performed between 12 momentary items from four multi-item ThyPRO scales, administered three times daily via a smartphone application over 28 days, and the original retrospective ThyPRO completed on day 28. The measurement difference between recalled and momentary ratings was quantified for all four scales. Furthermore, correlations between the measures were investigated, and their agreement was explored using Bland–Altman plots. Finally, the study examined whether retrospective ratings were influenced by two forms of recall bias (the peak effect and the end effect).

Results:

Retrospective and mean momentary ThyPRO ratings were highly correlated (Pearson's correlations: 0.74–0.88). However, retrospective ratings provided significantly higher scores (i.e., worse quality of life) on all scales. Bland–Altman plots showed a skewed distribution, indicating low levels of agreement. Results supported a peak effect for retrospective ratings on tiredness but not for the remaining scales. Further, results supported end effects for retrospective ratings of emotional susceptibility and anxiety.

Conclusions:

Retrospective and mean momentary ThyPRO ratings correlated strongly, but retrospective ratings were higher, indicating more disease impact. The differences were of magnitudes normally deemed clinically relevant. Limited evidence supported peak and end effect bias for retrospective assessments. The two measurement modalities did not appear congruent and thus cannot be used interchangeably. When designing clinical studies, whether to use a momentary or retrospective measurement method may depend on the aim of measurement. Further prospective analyses are needed to compare any beneficial effects, for example in terms of higher precision or sensitivity to clinical change, of momentary assessments.

Introduction

Patient-reported outcomes are increasingly employed in a medical world where assessment of the quality of clinical care has come into focus. To capture the impact of disease and treatment from the patient's perspective, a large number of patient-reported outcome measures (PROMs) have been developed. In thyroidology, the thyroid-related quality of life patient-reported outcome measure (ThyPRO) has been developed and extensively evaluated with respect to content validity, factorial validity, known-groups validity, cross-cultural validity, internal consistency, test–retest reliability, and responsiveness (1–6). A systematic review by Wong et al. found the ThyPRO to have strong measurement properties and recommended it as the preferred PROM for patients with benign thyroid disease (7). The only limitation of ThyPRO mentioned was the length of the questionnaire, calling for an abbreviated version, which has now been developed (8). Two additional PROMs were found to have adequate measurement properties: Graves' ophthalmopathy-specific Quality of Life (GO-QOL) (9–11) and Thyroid Treatment Satisfaction Questionnaire (ThyTSQ) (12,13). However, these PROMs target patients with Graves' ophthalmopathy and hypothyroidism, respectively.

Ecological momentary assessment is a recently introduced measurement method with the potential to improve the quality of patient-reported outcomes further. Ecological momentary assessments investigate experiences in real time as they occur in the daily lives of patients by repeatedly asking them how they currently feel (14). The method was introduced in response to concerns about the validity of traditional retrospective PROMs. The argument was that memory is flawed and that people are unable to retrieve past experiences accurately. Instead, mental shortcuts are used to reconstruct past experiences, which may introduce systematic bias and inaccuracy (15).

Ecological momentary assessment has mainly been used to study pain and fatigue, and in behavioral research (15). Compared to traditional retrospective assessments, the mean of momentary ratings usually produced lower ratings of symptom severity for both pain and fatigue (16–18). Studies using momentary assessments to evaluate pain have shown that, when retrospectively rating an experience, respondents may pay excessive attention to the most intense and the most recent pain, while giving less attention to the duration of the pain (19–21). These types of recall bias are known as the peak effect and end effect, respectively.

In addition to avoiding recall bias, momentary assessments have other potential advantages. First, they provide ecological validity, that is, people are answering questions while living their everyday lives as opposed to answering questionnaires at a time dictated by convenience or even in the doctor's office (15,16). Second, repeated sampling enables investigation of symptoms over time, including daily fluctuations (14,15). Thus, ecological momentary assessments have the potential to provide more valid data on quality of life, and therefore a momentary version of the original ThyPRO was developed (22).

The objective of the present study was to examine the relationship between the original retrospective ThyPRO and momentary ThyPRO ratings and to evaluate the presence of recall bias in a standard retrospective PROM across four different scales.

Methods

Participants

Patients newly diagnosed with hyperthyroidism, including Graves' disease, toxic nodular goiter, and drug-induced thyrotoxicosis, were included. Patients were recruited from September 2014 to July 2017 from endocrine outpatient clinics at four Copenhagen University Hospitals: Rigshospitalet, Gentofte, Herlev, and Bispebjerg. Patients were eligible if they were ≥18 years of age, understood Danish, had serum thyrotropin (TSH) levels <0.1 mIU/L within the last month, and were scheduled to undergo treatment for hyperthyroidism (antithyroid drugs, radioiodine, or surgery). If treatment had already been initiated, patients had to have elevated free thyroxine (fT4; reference range 10–22 pmol/L) in addition to TSH <0.1 mIU/L as a marker of current disease activity. Exclusion criteria were pregnancy and major comorbidities suspected to impact quality of life substantially (e.g., cancer or congestive heart failure). Eligible patients were initially contacted by phone. Patients who were interested in participating then received written information via e-mail.

Collection of momentary and retrospective ratings

The ThyPRO assesses quality of life in patients with benign thyroid disease. It consists of 85 items summarized in 13 multi-item scales and one single-item scale. Each scale ranges from 0 to 100, with higher scores indicating worse quality of life. The original ThyPRO uses a four-week reference period, that is, it asks patients to summarize their experiences over the last four weeks. A momentary version of selected sections of the ThyPRO was developed. Items from the five scales previously found to be most responsive were evaluated in cognitive interviews with patients from the target group. The interviews assessed whether items functioned in a momentary setting and whether the new item versions caused any problems (22). One scale was found to be incompatible with a momentary setting. To minimize response burden while allowing representation of sub-domains within each scale, three items from each of the remaining four multi-item scales were selected for this study. Items were chosen based on how well they functioned in a momentary setting, that is, whether they were actually answered with a momentary reference period and how many problems were detected during the cognitive interviews. The items and scales were: (i) Hyperthyroid Symptoms scale: “At this moment, do you have trembling hands?” “…are you experiencing palpitations (rapid heartbeats)?” and “…are you experiencing shortness of breath?”; (ii) Tiredness scale: “At this moment, are you tired?” “…are you exhausted?” and “…do you feel energetic?”; (iii) Anxiety scale: “At this moment, do you feel nervous?” “…do you feel tense?” and “…do you feel restless?”; and (iv) Emotional Susceptibility scale: “At this moment, do you have difficulty coping?” “…do you feel irritable?” and “…do you feel in balance?” (22).

The 12 momentary items were administered three times a day for a period of four weeks. On the last day, patients received an electronic ThyPRO survey via e-mail containing the same 12 items but with the original retrospective four-week reference period, so that both measures covered the same four-week period.

Momentary assessments were collected using an Android smartphone application (app) specifically designed for the capture of momentary questionnaire data. If patients owned an Android smartphone, the app was installed on the patients' own devices; if not, they borrowed one. The app administered momentary questions three times a day at semi-randomized time points. A participant's waking hours were divided into three equal periods, and within each period, a prompt was issued at a random time. The app featured auditory prompts and presented the items via the touch screen. If a prompt arrived at an inconvenient time, the assessment could be postponed by up to one hour. The system recorded the time and date of each data entry. Patients were able to set and adjust their diurnal rhythm in the app from day to day to ensure that every waking hour was represented. The app was integrated with a trial management system, PROgmatic (23), which enabled automatic distribution of retrospective questionnaires and daily monitoring of response rates.
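The semi-randomized prompting scheme described above can be sketched in a few lines. This is an illustration only, not the study app's code: the function name `schedule_prompts`, the example wake and sleep times, and the uniform draw within each period are assumptions.

```python
import random
from datetime import datetime

def schedule_prompts(wake, sleep, n_periods=3, rng=None):
    """Return one random prompt time inside each of n_periods equal
    slices of the waking day (hypothetical helper, not the study app)."""
    rng = rng or random.Random()
    period = (sleep - wake) / n_periods  # length of one slice (timedelta)
    prompts = []
    for i in range(n_periods):
        start = wake + i * period
        # Assumption: the prompt time is drawn uniformly within the slice.
        prompts.append(start + period * rng.random())
    return prompts

# Example: a participant awake from 07:00 to 22:00 (made-up times).
wake = datetime(2015, 1, 5, 7, 0)
sleep = datetime(2015, 1, 5, 22, 0)
for t in schedule_prompts(wake, sleep, rng=random.Random(42)):
    print(t.strftime("%H:%M"))
```

Dividing the waking day into equal slices before randomizing guarantees one prompt in each part of the day while keeping the exact time unpredictable to the participant.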

Data analysis plan

Both retrospective and momentary ThyPRO ratings were aggregated and presented at the scale level. Differences between retrospective and mean momentary ratings (i.e., the mean of all momentary assessments collected during the study period) were analyzed with a paired t-test. Patients with very low response rates (<35%) were excluded from analysis. For the remaining patients, the mean was taken over all available assessments (equivalent to mean score substitution for missing data). The magnitude of the difference between measurement methods was quantified by Cohen's d (mean difference divided by standard deviation). Previous studies using ThyPRO showed an average scale standard deviation of 20 (24–28), so this standard deviation was used in the present study as well. Effect sizes were defined as small (0.2–0.5), moderate (0.5–0.8), and large (>0.8) (29). The level of agreement between the two measures was investigated using Bland–Altman plots, which plot the difference between retrospective and mean momentary score against their average for each participant (30). Preliminary analysis of the Bland–Altman plots showed that differences increased with higher mean score, resulting in a trumpet-shaped plot. To adjust for this skewness, a logarithmic data transformation was subsequently performed. Correlations between the two measures were calculated using Pearson's correlation coefficients.
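As a worked illustration of the effect size calculation, the sketch below divides a mean difference by the fixed reference standard deviation of 20 and labels the result with the thresholds given above. The function names are hypothetical, and the handling of values falling exactly on a threshold (0.5 is treated as moderate here) is an assumption, since the text does not specify it.

```python
from statistics import mean

REFERENCE_SD = 20.0  # average ThyPRO scale SD taken from previous studies

def effect_size(retrospective, momentary_means, sd=REFERENCE_SD):
    """Cohen's d as described in the text: mean paired difference / SD."""
    diffs = [r - m for r, m in zip(retrospective, momentary_means)]
    return mean(diffs) / sd

def label(d):
    """Classify |d| as small (0.2-0.5), moderate (0.5-0.8), or large (>0.8).
    Assumption: a value exactly on a boundary goes to the higher category."""
    d = abs(d)
    if d > 0.8:
        return "large"
    if d >= 0.5:
        return "moderate"
    if d >= 0.2:
        return "small"
    return "below small"

# Mean differences of 4 and 11 points, as reported for two of the scales.
print(4 / REFERENCE_SD, label(4 / REFERENCE_SD))    # 0.2 small
print(11 / REFERENCE_SD, label(11 / REFERENCE_SD))  # 0.55 moderate
```

Using a fixed external standard deviation rather than the sample's own makes the effect sizes comparable with earlier ThyPRO studies, at the cost of ignoring the spread observed in this particular sample.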

To evaluate the presence of a peak effect, scale scores were exponentiated by e^x to give added weight to higher scores, consistent with the hypothesis that when judging a past experience, people pay strong attention to the most intense experiences (peaks) rather than simply averaging every moment of the experience.
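One way to give added weight to higher scores is sketched below. The text states only that scores were exponentiated by e^x; the rescaling of scores to the 0–1 range (to avoid numerical overflow) and the back-transformation by the natural logarithm are assumptions made for this illustration, as is the function name `peak_weighted_mean`. The values it produces are therefore not comparable with the transformed scores reported in the Results.

```python
import math

def peak_weighted_mean(scores, scale=100.0):
    """Exponentially weighted average of 0-100 scale scores.

    Assumptions (not from the source): scores are divided by `scale`
    before exponentiation, and the mean of the exponentiated values is
    mapped back to the 0-100 range with the natural logarithm.
    """
    exps = [math.exp(s / scale) for s in scores]
    return scale * math.log(sum(exps) / len(exps))

stable = [30, 30, 30, 30]   # no peaks
peaky = [10, 10, 10, 90]    # same arithmetic mean (30), one peak

print(round(peak_weighted_mean(stable), 1))
print(round(peak_weighted_mean(peaky), 1))
```

For a series without peaks, the weighted mean equals the arithmetic mean; a series with the same arithmetic mean but one high peak yields a higher weighted mean, which is the behavior the peak effect hypothesis attributes to recalled ratings.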

To test for the end effect, each participant's momentary scores were aggregated weekly (weeks 1, 2, 3, and 4), and correlations between the retrospective score and mean momentary scores of each week were compared. It was hypothesized that correlations would be higher for later weeks due to the end effect. All analyses were performed using SAS Enterprise Guide v7.1.
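The weekly comparison can be sketched as follows. The data layout (one list of 28 daily mean scores per participant), the split into four blocks of seven days, and the function names are assumptions for illustration; the study itself used SAS, and the example data are made up.

```python
import math
from statistics import mean

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def weekly_means(daily_scores, days_per_week=7):
    """Average one participant's daily scores within consecutive weeks."""
    return [mean(daily_scores[i:i + days_per_week])
            for i in range(0, len(daily_scores), days_per_week)]

def weekly_correlations(retrospective, daily_by_participant):
    """Correlate the retrospective score with each week's momentary mean."""
    per_participant = [weekly_means(d) for d in daily_by_participant]
    n_weeks = len(per_participant[0])
    return [pearson(retrospective, [p[w] for p in per_participant])
            for w in range(n_weeks)]

# Made-up data: three participants whose recall mirrors week 4 only.
daily = [[10] * 21 + [20] * 7, [30] * 21 + [10] * 7, [5] * 21 + [40] * 7]
retro = [20, 10, 40]
print([round(r, 2) for r in weekly_correlations(retro, daily)])
```

Under the end effect hypothesis, the correlations returned for later weeks would be higher than those for earlier weeks, as in this constructed example.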

Ethical considerations

The study was performed in accordance with the Declaration of Helsinki. The study protocol was reviewed by the local Ethical Committee (reg. no. H-A-2009-FSP23). According to Danish law, questionnaire studies do not require and thus cannot obtain formal approval by ethical committees. Informed consent was obtained from all individual participants. The study was approved by the Danish Data Protection Agency (local identifier at Rigshospitalet: 13-30-1092).

Results

Participants

A total of 273 eligible hyperthyroid patients with phone numbers available from the patient record were contacted, and 94 agreed to participate. The main reasons for non-participation were unwillingness to carry an extra phone among those without an Android phone, or being too busy. Others did not answer their phone or return calls after receiving the written information. Five participants withdrew from the study after initial consent by not responding to any of the momentary assessments. When contacted, these patients indicated that they had changed their mind. Three participants were excluded due to very low response rates (<35%), and three others did not answer the retrospective ThyPRO questionnaire. Analyses were performed on data from the remaining 83 participants. In accordance with thyroid epidemiology, the majority of participants were women (87%), with a median age of 49 years (Table 1). Twenty-five participants owned an Android smartphone, and the app was installed on their own device. The remaining 58 participants borrowed an Android smartphone with the app. Each participant received 87 notifications to answer momentary assessments during the study period. This added up to a total of 7221 notifications, of which 5908 were answered within the one-hour entry period, yielding a response rate of 82% in total. Participants using their own smartphone had a response rate of 86% (1876/2175), while participants borrowing a smartphone had a response rate of 80% (4032/5046). For baseline characteristics of nonparticipants and dropouts, see Supplementary Table S1 (Supplementary Data are available online at www.liebertpub.com/thy).

Table 1.

Clinical Characteristics

N	83
Sex, n (%)
Women	72 (87%)
Men	11 (13%)
Age, median years (range)	49 (22–74)
Median TSH	<0.01
Median fT4	29^a
Patients on ATD treatment at time of inclusion, n (%)	47 (57%)
Diagnosis, n (%)
Graves' disease	59 (71%)
Toxic nodular goiter	18 (22%)
Amiodarone-induced hyperthyroidism	5 (6%)
Iodine-induced hyperthyroidism^b	1 (1%)

^a Information on fT4 was missing for five patients.

^b Weight-reducing product containing iodine.

TSH, thyrotropin (mIU/L); fT4, free thyroxine (pmol/L); ATD, antithyroid drug.

Descriptive analysis

Daily mean momentary scores from day 0 to 28 are shown in Figure 1 for all four scales. The Tiredness scale received the highest score, followed by the Emotional Susceptibility scale, while the scale with lowest score was the Anxiety scale. Figure 1 illustrates the change in quality of life over time: decreasing scores indicate improving quality of life during the four weeks. Figure 2 shows mean momentary scale scores over 28 days from six patients. The particular graphs were chosen as examples to illustrate some different trends observed in the study and to show how the measurements can be displayed to clinicians and patients in a clinical setting.

FIG. 1.

Daily mean momentary scores with confidence intervals over a four-week period for 83 patients with hyperthyroidism. Scale scores range from 0 to 100.

FIG. 2.

(A–F) Daily mean momentary scores for six participants representing different fluctuation patterns over the 28 days. (A) Participant with random variation on all scales throughout the period. (B) Participant with larger random variation. (C) Participant showing improvement throughout the period. (D) Participant showing deterioration throughout the period. (E) Participant with very low scores and a few peaks. (F) Participant with very stable symptoms.

Means and standard deviations of momentary and retrospective ratings are shown in Table 2, as well as median, scoring range, correlations, and mean differences between the two measures. Table 2 shows that large parts of the scoring spectrum of both measures were used at some point during the study. Furthermore, Table 2 shows that the momentary ratings were to some extent skewed toward lower ratings. A log transformation did not alter this. A high proportion of the participants had scores on the lower end of the Anxiety scale and Hyperthyroid Symptoms scale (i.e., floor effects).

Table 2.

Mean Scale Score Over 28 Days, Median, Correlation, and Mean Difference Between Retrospective and Momentary Ratings for 83 Patients with Hyperthyroidism

ThyPRO scale	Mean scale score (SD)	Median	Scoring range	Pearson correlation	Mean difference (SD)
Tiredness
Mean momentary rating	38 (23)	33	0–100
Retrospective	42 (22)	42	0–83	0.86^*	4 (11)^{* a}
Emotional susceptibility
Mean momentary rating	24 (19)	17	0–100
Retrospective	31 (21)	25	0–92	0.88^*	7 (11)^{* a}
Hyperthyroid symptoms
Mean momentary rating	13 (15)	8	0–92
Retrospective	24 (17)	25	0–83	0.74^*	11 (12)^{* b}
Anxiety
Mean momentary rating	8 (12)	0	0–100
Retrospective	14 (19)	8	0–92	0.82^*	7 (13)^{* a}

^a Small effect size (0.2–0.5).

^b Moderate effect size (0.5–0.8).

^* Correlation and mean difference significantly (p < 0.001) different from 0.

SD, standard deviation.

The retrospective ratings were strongly correlated with mean momentary ratings on all scales, ranging from 0.74 for the Hyperthyroid Symptoms scale to 0.88 for the Emotional Susceptibility scale (Table 2). However, momentary and retrospective ratings did not provide identical scores. Retrospective ratings were significantly higher (more symptoms) than mean momentary ratings on all scales. The smallest difference was found on the Tiredness scale, where retrospective ratings measured four points higher than momentary ratings [confidence interval (CI) 1–6], equivalent to a small difference (effect size 0.2–0.5), and the largest difference was found on the Hyperthyroid Symptoms scale, which measured 11 points higher [CI 8–13], equivalent to a moderate difference (effect size 0.5–0.8). On the Anxiety scale and the Emotional Susceptibility scale, retrospective ratings were seven points higher than momentary ratings [CI 4–9], equivalent to small differences.

In Bland–Altman plots (Fig. 3), differences tended to increase with higher mean scores (more symptoms) on all scales, even after logarithmic transformation. This tendency was especially apparent for the Hyperthyroid Symptoms scale and the Anxiety scale. Graphically, the Tiredness scale showed the highest level of agreement, with relatively evenly distributed differences. However, inspection of the limits of agreement showed that the agreement was far from satisfactory. For this scale, the limits of agreement were −0.11 and 0.18 on the logarithmic scale. The antilogs of these limits are 0.78 and 1.5, meaning that for 95% of cases, retrospective ratings will be between 0.78 and 1.5 times the mean momentary rating, that is, retrospective ratings may be up to 22% below or 50% above mean momentary ratings. Upper limits for the remaining scales were even higher, with retrospective ratings up to 100% above mean momentary ratings on the Hyperthyroid Symptoms scale.
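The back-transformation of the limits of agreement can be checked with a short sketch. A base-10 logarithm is assumed here because it reproduces the reported ratios (the text does not state the base). The helper names are hypothetical, the example scores are made up, and the input scores must be strictly positive; how zero scores were handled before the transformation is not described in the text.

```python
import math
from statistics import mean, stdev

def log_limits_of_agreement(retrospective, momentary, k=2.0):
    """Limits of agreement (mean difference +/- k SD) on the log10 scale.
    Requires strictly positive scores; zero handling is not covered."""
    diffs = [math.log10(r) - math.log10(m)
             for r, m in zip(retrospective, momentary)]
    centre, spread = mean(diffs), stdev(diffs)
    return centre - k * spread, centre + k * spread

def as_ratio(log_limit):
    """Antilog: express a log10 limit as retrospective / momentary ratio."""
    return 10 ** log_limit

# Reported limits for the Tiredness scale.
print(round(as_ratio(-0.11), 2))  # about 0.78, i.e. 22% below
print(round(as_ratio(0.18), 2))   # about 1.51, i.e. roughly 50% above

# Made-up paired scores to show the full calculation.
low, high = log_limits_of_agreement([42, 30, 55, 20], [38, 22, 50, 19])
print(round(as_ratio(low), 2), round(as_ratio(high), 2))
```

Because the differences are taken on the logarithmic scale, the limits become ratios after back-transformation rather than point differences, which is why they are interpreted as percentages.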

FIG. 3.

Bland–Altman plots of the four ThyPRO scales. The horizontal axis is the mean of the retrospective and mean momentary ratings. The vertical axis is the difference in scores (retrospective minus momentary). The solid horizontal line is the mean difference, whereas the dashed lines represent the limits of agreement (mean difference ± 2 SD of the differences). Scale scores were transformed logarithmically.

Peak effect and end effect

The transformation of scale scores to e^x had no effect on three scales; retrospective ratings were still significantly higher than mean momentary ratings (Hyperthyroid symptoms: 31 vs. 19, p < 0.01; Emotional Susceptibility: 41 vs. 34, p < 0.01; Anxiety: 25 vs. 12, p = 0.02). However, after transforming the Tiredness scale, a nonsignificant mean difference was found between the two measures (51 vs. 50, p = 0.54).

Figure 4 shows Pearson's correlations between retrospective ratings and weekly momentary ratings for each of the four scales. Correlations for the Emotional Susceptibility scale and the Anxiety scale were both increasing, indicating a higher correlation for later assessments than for earlier assessments, which is consistent with the end effect hypothesis. The two remaining scales did not show a similar tendency.

FIG. 4.

Correlations with confidence intervals between retrospective ratings and weekly momentary ratings for 83 patients with hyperthyroidism.

Discussion

This is the first study to investigate ecological momentary assessments in the field of thyroidology. The methodology was designed to assess patient-reported outcomes more accurately than retrospective ratings, based on the hypothesis that momentary assessments are free from recall bias. However, knowledge about the relationship between classical retrospective measures and momentary measures is still limited. In this study, retrospective items and momentary items from the widely used ThyPRO survey were tested head-to-head in 83 hyperthyroid patients expected to undergo treatment. Retrospective and momentary measures showed strong correlations, suggesting that patients with high retrospective ThyPRO scores also had high momentary scores. However, the two measures provided significantly different results and showed a low level of agreement across all four scales.

The study confirms and extends the findings from studies of pain and fatigue, that is, retrospective ThyPRO ratings were higher than mean momentary ratings (16–18). The magnitude of this difference varied depending on the scale. Cohen's d was used to examine the magnitudes, since the minimal important difference has not been determined for ThyPRO. Small effect sizes were found on three scales, and a moderate effect size was found on one scale. Thus, the observed differences were considered clinically relevant (29). A possible explanation for the higher retrospective ratings is that the momentary assessment sampling density of three per day was too low to capture relevant symptom peaks. However, other studies have shown the same discrepancy, despite using much higher sampling densities (16–18,20). Perhaps participants are more inclined to ignore or decline an assessment when symptoms are at their worst, which would result in missing peaks. With a response burden perceived as too high, some participants might rush through the questions, selecting the option indicating highest quality of life. Finally, a previous cognitive interview study converting retrospective items to momentary versions found that changing the reference period could change the way participants perceived the meaning of an item. For example, when a retrospective item assessing increased appetite was converted, the momentary item was understood as asking about being hungry (22). Items behaving this way were not chosen for the current study. However, some items may act differently in a real-world setting than in a cognitive interview situation, so a change in meaning may have been overlooked.

Bland–Altman plots showed low levels of agreement, perhaps indicating that retrospective and momentary ratings are measuring different concepts. Recent studies suggest that we are psychologically composed of more than one self, for example the experiencing self and the remembering self (31). Different ways of measuring a concept may evoke responses from different selves. Experience measured momentarily evokes the experiencing self and is more strongly correlated with physiological processes (e.g., stress response) than when measured retrospectively (32). On the other hand, experience measured retrospectively evokes the remembering self and was found to be more important to health decision making (20). The choice of measurement method should therefore be defined by the study aims and based upon which aspects of symptoms are most relevant. For example, if we are studying compliance with treatment, retrospective measures should most likely be the method of choice.

By transforming data to give added weight to higher scores, the study showed that retrospective ratings on the Tiredness scale may have been influenced by the peak effect, while ratings on the remaining scales were not. A possible explanation is that the Tiredness scale has the highest variability (largest standard deviation), so the higher scores (peaks) are relatively larger and thus more memorable compared to the other scales. Another explanation is that tiredness is a more consciously perceived outcome compared to anxiety and emotional susceptibility, which are more subconsciously perceived, perhaps making them less prone to peak effects. However, items from the Hyperthyroid Symptoms scale are likewise consciously perceived and should also be influenced by a peak effect if this were the sole explanation.

It was found that retrospective ratings on two of four scales may have been influenced by the end effect. The presence of such an effect entails that when a physician or researcher asks about symptoms over the previous month, the patient's response is actually mainly based on the most recent days. Momentary assessments were aggregated for each week to base the correlations on a sufficiently high number of assessments.

A strength of this study is that the method of ecological momentary assessments was tested using a validated disease-specific PROM with items from multiple scales, thereby increasing the generalizability of the results. In addition, it is important to note that data were collected while patients were undergoing treatment and changes in symptoms were expected, making the results transferable to a clinical trial setting. Finally, momentary assessments were collected using an innovative measurement technology allowing collection on patients' own mobile devices.

However, the study also has some limitations. It was not feasible to conduct the study with all ThyPRO items, which is why only a subset of items was investigated. However, the selection process was conducted thoroughly to select the most suitable items (22). The smartphone app only functioned on Android smartphones, making it necessary for more than two-thirds of participants to borrow a project smartphone and leading others to decline the invitation. Although response rates were lower in the group borrowing a smartphone, they were still satisfactory. A few patients declined to participate because the study involved a smartphone at all. It is possible that this group would have responded differently and that the method may be difficult to implement in this subgroup. This study used a sampling density of three per day, which was considered sufficient to capture daily symptom variation and was found to interfere very little with daily activities in a previous study (33). Increasing the sampling density would make data more representative but would simultaneously increase participant burden. It is possible that answering the same questions during the preceding 28 days influenced the response to the retrospective survey simply because the participant noticed which response option was most frequently used. Perhaps this effect made retrospective ratings more similar to momentary ratings and the measurement differences less pronounced. Figure 1 shows a trend toward improved quality of life during the 28 days. This was expected, since treatment had been initiated before or during the 28 days in the majority of patients. Relative to a situation with stable thyroid function and thus presumably stable quality of life, perhaps this change over time puts additional strain on memory, thereby making retrospective ratings less reliable. However, this is a real-life challenge that a measurement method should be able to withstand.

The tested momentary items mainly involve specific disease manifestations, as well as general symptoms assessable at all times. In contrast, rare symptoms and context-specific items (e.g., work function) were not included. For these aspects to be properly represented, the sampling density would need to be very high, illustrating that not all aspects may be appropriate for momentary assessment.

Future research on retrospective and momentary ThyPRO ratings should determine whether the measurement differences depend on the length of the recall period. Broderick et al. showed that as the reporting period increased, the difference between retrospective and momentary pain ratings increased (18). However, investigations of sleep reports and interference with activities due to pain and fatigue did not show the same tendency (34,35). For surveillance of treatment effect, PROMs should be able to detect and respond to clinically relevant changes. Thus, the responsiveness of both measures should be investigated to find the most suitable measure for this task.

In conclusion, retrospective and mean momentary ThyPRO ratings had high correlations but provided significantly different results. Retrospective ratings were higher on the four tested scales compared to mean momentary ratings. The presence of the peak effect was supported when measuring tiredness but not in the remaining three scales. There was partial support for the presence of an end effect, which was found in two of four scales. The results do not put the validity of retrospective measurements in question. However, the observed differences in scores were of clinically relevant magnitudes, so the two measures should not be used interchangeably. Research elaborating which method is most responsive to clinical change may determine which method is most suitable for comparative effectiveness and efficiency studies.

Acknowledgments

This study was funded by The Danish Agency for Science, Technology, and Innovation (grant 271-09-0143). The publication was funded by the Agnes and Knut Mørk Foundation. U.F.R.'s research salary is sponsored by a grant from Novo Nordisk Foundation.

Author Disclosure Statement

No competing financial interests exist.

References

1. Watt, Hegedus, Groenvold, Bjorner, Rasmussen, Bonnema, Feldt-Rasmussen. 2010. Validity and reliability of the novel thyroid-specific quality of life questionnaire, ThyPRO. Eur J Endocrinol, 162:161–167.

2. Watt, Cramon, Hegedus, Bjorner, Bonnema, Rasmussen, Feldt-Rasmussen, Groenvold. 2014. The thyroid-related quality of life measure ThyPRO has good responsiveness and ability to detect relevant treatment effects. J Clin Endocrinol Metab, 99:3708–3717.

3. Watt, Groenvold, Deng, Gandek, Feldt-Rasmussen, Rasmussen, Hegedus, Bonnema, Bjorner. 2014. Confirmatory factor analysis of the thyroid-related quality of life questionnaire ThyPRO. Health Qual Life Outcomes, 12:126.

4. Watt, Groenvold, Hegedus, Bonnema, Rasmussen, Feldt-Rasmussen, Bjorner. 2014. Few items in the thyroid-related quality of life instrument ThyPRO exhibited differential item functioning. Qual Life Res, 23:327–338.

5. Watt, Barbesino, Bjorner, Bonnema, Bukvic, Drummond, Groenvold, Hegedus, Kantzer, Lasch, Marcocci, Mishra, Netea-Maier, Ekker, Paunovic, Quinn, Rasmussen, Russell, Sabaretnam, Smit, Torring, Zivaljevic, Feldt-Rasmussen. 2015. Cross-cultural validity of the thyroid-specific quality-of-life patient-reported outcome measure, ThyPRO. Qual Life Res, 24:769–780.

6. Watt, Bjorner, Groenvold, Rasmussen, Bonnema, Hegedus, Feldt-Rasmussen. 2009. Establishing construct validity for the thyroid-specific patient reported outcome measure (ThyPRO): an initial examination. Qual Life Res, 18:483–496.

7. Wong, Lang, Lam. 2016. A systematic review of quality of thyroid-specific health-related quality-of-life instruments recommends ThyPRO for patients with benign thyroid diseases. J Clin Epidemiol, 78:63–72.

8. Watt, Bjorner, Groenvold, Cramon, Winther, Hegedus, Bonnema, Rasmussen, Ware Jr, Feldt-Rasmussen. 2015. Development of a short version of the thyroid-related patient-reported outcome ThyPRO. Thyroid, 25:1069–1079.

9. Terwee, Gerding, Dekker, Prummel, Wiersinga. 1998. Development of a disease specific quality of life questionnaire for patients with Graves' ophthalmopathy: the GO-QOL. Br J Ophthalmol, 82:773–779.

10. Terwee, Gerding, Dekker, Prummel, van der Pol, Wiersinga. 1999. Test–retest reliability of the GO-QOL: a disease-specific quality of life questionnaire for patients with Graves' ophthalmopathy. J Clin Epidemiol, 52:875–884.

11. Terwee, Dekker, Mourits, Gerding, Baldeschi, Kalmann, Prummel, Wiersinga. 2001. Interpretation and validity of changes in scores on the Graves' ophthalmopathy quality of life questionnaire (GO-QOL) after different treatments. Clin Endocrinol (Oxf), 54:391–398.

12. McMillan, Bradley, Woodcock, Razvi, Weaver. 2004. Design of new questionnaires to measure quality of life and treatment satisfaction in hypothyroidism. Thyroid, 14:916–925.

13.

McMillan

, Bradley

, Razvi

, Weaver

. 2006. Psychometric evaluation of a new questionnaire measuring treatment satisfaction in hypothyroidism: the ThyTSQ. Value Health, 9:132–139.

14.

Shiffman

, Stone

, Hufford

. 2008. Ecological momentary assessment. Annu Rev Clin Psychol, 4:1–32.

15.

Stone

. 2007. The Science of Real-Time Data Capture: Self-Reports in Health Research. Oxford University Press, Oxford, United Kingdom.

16.

Williams

, Gendreau

, Hufford

, Groner

, Gracely

, Clauw

. 2004. Pain assessment in patients with fibromyalgia syndrome: a consideration of methods for clinical trials. Clin J Pain, 20:348–356.

17.

Stone

, Broderick

, Shiffman

, Schwartz

. 2004. Understanding recall of weekly pain from a momentary assessment perspective: absolute agreement, between- and within-person consistency, and judged change in weekly pain. Pain, 107:61–69.

18.

Broderick

, Schwartz

, Vikingstad

, Pribbernow

, Grossman

, Stone

. 2008. The accuracy of pain and fatigue items across different reporting periods. Pain, 139:146–157.

19.

Redelmeier

, Kahneman

. 1996. Patients' memories of painful medical treatments: real-time and retrospective evaluations of two minimally invasive procedures. Pain, 66:3–8.

20.

Redelmeier

, Katz

, Kahneman

. 2003. Memories of colonoscopy: a randomized trial. Pain, 104:187–194.

21.

Stone

, Schwartz

, Broderick

, Shiffman

. 2005. Variability of momentary pain predicts recall of weekly pain: a consequence of the peak (or salience) memory heuristic. Pers Soc Psychol Bull, 31:1340–1346.

22.

Boesen

, Nissen

, Groenvold

, Bjorner

, Hegedus

, Bonnema

, Rasmussen

, Feldt-Rasmussen

, Watt

. 2018. Conversion of standard retrospective patient-reported outcomes to momentary versions: cognitive interviewing reveals varying degrees of momentary compatibility. Qual Life Res, 27:1065–1076.

23.

Cramon

, Rasmussen

, Bonnema

, Bjorner

, Feldt-Rasmussen

, Groenvold

, Hegedus

, Watt

. 2014. Development and implementation of PROgmatic: a clinical trial management system for pragmatic multi-centre trials, optimised for electronic data capture and patient-reported outcomes. Clin Trials, 11:344–354.

24.

Sorensen

, Watt

, Cramon

, Dossing

, Hegedus

, Bonnema

, Godballe

. 2017. Quality of life after thyroidectomy in patients with nontoxic nodular goiter: a prospective cohort study. Head Neck, 39:2232–2240.

25.

Cramon

, Bonnema

, Bjorner

, Ekholm

, Feldt-Rasmussen

, Frendl

, Groenvold

, Hegedus

, Rasmussen

, Watt

. 2015. Quality of life in patients with benign nontoxic goiter: impact of disease and treatment response, and comparison with the general population. Thyroid, 25:284–291.

26.

Cramon

, Winther

, Watt

, Bonnema

, Bjorner

, Ekholm

, Groenvold

, Hegedus

, Feldt-Rasmussen

, Rasmussen

. 2016. Quality-of-life impairments persist six months after treatment of Graves' hyperthyroidism and toxic nodular goiter: a prospective cohort study. Thyroid, 26:1010–1018.

27.

Winther

, Cramon

, Watt

, Bjorner

, Ekholm

, Feldt-Rasmussen

, Groenvold

, Rasmussen

, Hegedus

, Bonnema

. 2016. Disease-specific as well as generic quality of life is widely impacted in autoimmune hypothyroidism and improves during the first six months of levothyroxine therapy. PLoS One, 11:e0156925.

28.

Stott

, Rodondi

, Kearney

, Ford

, Westendorp

, Mooijaart

, Sattar

, Aubert

, Aujesky

, Bauer

, Baumgartner

, Blum

, Browne

, Byrne

, Collet

, Dekkers

, den Elzen

, Du Puy

, Ellis

, Feller

, Floriani

, Hendry

, Hurley

, Jukema

, Kean

, Kelly

, Krebs

, Langhorne

, McCarthy

, McConnachie

, McDade

, Messow

, O'Flynn

, O'Riordan

, Poortvliet

, Quinn

, Russell

, Sinnott

, Smit

, Van Dorland

, Walsh

, Watt

, Wilson

, Gussekloo

, Group

. 2017. Thyroid hormone therapy for older adults with subclinical hypothyroidism. N Engl J Med, 377:e20.

29.

Cohen

. 1988. Statistical Power analysis for the Behavioral Sciences. Second edition. Lawrence Erlbaum Associates, Hillsdale, NJ.

30.

Bland

, Altman

. 1986. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet, 1:307–310.

31.

Conner

, Barrett

. 2012. Trends in ambulatory self-report: the role of momentary experience in psychosomatic medicine. Psychosom Med, 74:327–337.

32.

Steptoe

, Gibson

, Hamer

, Wardle

. 2007. Neuroendocrine and cardiovascular correlates of positive affect measured by ecological momentary assessment and by questionnaire. Psychoneuroendocrinology, 32:56–64.

33.

Stone

, Broderick

, Schwartz

, Shiffman

, Litcher-Kelly

, Calvanese

. 2003. Intensive momentary reporting of pain with an electronic diary: reactivity, compliance, and patient satisfaction. Pain, 104:343–351.

34.

Broderick

, Schneider

, Schwartz

, Stone

. 2010. Interference with activities due to pain and fatigue: accuracy of ratings across different reporting periods. Qual Life Res, 19:1163–1170.

35.

Broderick

, Junghaenel

, Schneider

, Pilosi

, Stone

. 2013. Pittsburgh and Epworth sleep scale items: accuracy of ratings across different reporting periods. Behav Sleep Med, 11:173–188.
