Abstract
Purpose:
The aim of this study was to validate the new European Organisation for Research and Treatment of Cancer Quality of Life Thyroid Cancer Module (EORTC QLQ-THY34).
Methods:
We enrolled 437 thyroid cancer patients from 17 countries. One group (n = 303), undergoing treatment or best supportive care, completed the questionnaires at three time points (before therapy [t1], 6 weeks later [t2], and 6 months after t2 [t3]). A second group (survivors ≥2 years after diagnosis, n = 134) completed it at a random baseline time point and a second time 1 week later. We determined internal consistency (using Cronbach's alpha), the scale structure (with confirmatory factor analysis), and discriminant validity (using known-group comparisons). Group 1 data were used to assess responsiveness and group 2 data to determine test-retest reliability using intra-class correlations (ICC).
Results:
All 34 items fulfilled the criteria to be kept in the questionnaire. Cronbach's alpha was >0.70 in 8 of the 9 multi-item scales. All standardized factor loadings exceeded 0.40, confirming the proposed scale structure. The ICC was >0.70 in all scales expressing good test-retest reliability. Differences in scale scores between patients with different histology were >5 points in all scales. In all but one of the pre-specified scales (Dry Mouth), changes over time were ≥|4| points between at least two time points.
Conclusion:
The EORTC QLQ-THY34 with its 9 multi-item and 8 single-item scales is a reliable and valid tool to measure quality of life in thyroid cancer patients and can be used in future trials and studies.
Introduction
When evaluating the effect and potential benefit of new treatments in oncology, the patients' perspective should be included. This can be done by administering well-tailored questionnaires asking about their health-related quality of life (QoL) in a structured way. For this purpose, a number of reliable and valid instruments exist, for example, the European Organisation for Research and Treatment of Cancer (EORTC) Quality of Life Core Instrument (EORTC QLQ-C30) 1 or the Functional Assessment of Cancer Therapy, 2 both of which are used worldwide. 3,4
However, according to a recent systematic review, 5 there are only a handful of disease-specific instruments for thyroid cancer patients, 6–8 and, to date, none has been tested for responsiveness (i.e., the instrument's ability to detect changes in QoL over time). This can lead to an under-reporting of problems that are specific to this group of patients, for example, side-effects of radioactive iodine (RAI) treatment or hormonal imbalances. 9–12
Therefore, the EORTC Quality of Life Group (QLG) decided to develop a new instrument for this purpose, starting with an extensive phase I ensuring content validity, 13 followed by phase II, in which items were generated with the same response format (4-point Likert scale) and time frame (1 week) as the EORTC QLQ-C30. In phase III, the new thyroid cancer module (EORTC QLQ-THY34) was pilot-tested internationally. 14
Finally, in the present work, its psychometric properties were tested prospectively in a large group of thyroid cancer patients that was clinically, culturally, and linguistically diverse. This phase IV validation study included two different groups of patients for different purposes: The first one, aiming at measuring the instrument's sensitivity to change (“responsiveness”), included patients who were about to undergo initial or further treatment or best supportive care. The second group, aiming at assessing test-retest reliability, included survivors in whom no rapid changes in QoL were expected based on their clinical situation. A patient representative was involved in the entire module development process. The results of the final phase IV are presented herein.
Methods
Target population and patient selection criteria
The target population for the Thyroid Cancer Module is patients with all types of thyroid cancer histology (papillary, follicular, Hurthle-cell, poorly differentiated, medullary, anaplastic, and mixed).
Two groups of patients were enrolled into the validation study: those just about to receive initial or further treatment or best supportive care (
Patients had to fulfill the following criteria to be eligible: cytologically or histologically verified thyroid cancer (ICD-10, C73), ability to understand and complete the questionnaires (language proficiency and cognitive functioning as judged by the local study coordinator on inclusion), age 16 years or older, and written informed consent. Patients with second malignancies (i.e., any ICD-10 C diagnosis except non-melanoma skin cancer) and patients who had been included in phase I–III of the module development could not participate.
Sampling
Participants were enrolled following a sampling matrix capturing histology and treatment type to ensure that the most frequent tumor types and treatment protocols were represented in the study, and that each cell of the matrix had enough observations for reliable statistical tests. 15 Participants could be approached in the inpatient or outpatient setting.
Data collection and study design
Eligible patients were informed about the study and given a Patient Information Sheet in the local language, which included written information and an invitation to participate in accordance with ethical and governance requirements of each participating center. Institutional Review Board Approval: The Ethics Committee of Rhineland-Palatinate Medical Association was responsible for the study protocol review for the principal investigator and approved it with reference number 837.406.17 (11240).
The patients in
Patients could be enrolled either after initial diagnosis or after the diagnosis of residual or recurrent disease. They were approached again 6 weeks after the first day of treatment (t2), and 6 months after the second time point (t3).
Data collection took place before the patient was informed about any new results of tests or clinical examinations. A window of 1 week before and 1 week after the exact date of t2 and a window of 2 weeks before and 2 weeks after the exact date of t3 were allowed for data collection. The exact dates of t1, t2, and t3 were documented on the case report forms.
The survivors in
Data collection at t1 was usually performed in person. At the follow-up time points, whenever required, questionnaires could be mailed out to participants or could be completed via telephone interviews or online. A system for electronic data capture was established using the Computer Based Health Evaluation System. 16 The method of questionnaire administration was documented on the case report form.
Instruments
Patients received two questionnaires at each measurement point, the EORTC QLQ-C30 1 and the EORTC QLQ-THY34. 14 The latter is protected by copyright by the EORTC (it is included in the Supplementary Data for reference).
At t1, they also completed a debriefing questionnaire assessing patient acceptability.
Statistical analysis
Data analysis was conducted in accordance with the EORTC Module Development Guidelines 15 and with the recommendations of the International Society of Quality of Life. 17
Cronbach's alpha was computed for the hypothesized scales, elaborated in phase III of this module's development. 14 If indicated by this step, the scale structure was changed. The threshold for Cronbach's alpha was set at 0.70. 17
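As an illustration of the internal-consistency check described above, the following is a minimal Python sketch of Cronbach's alpha for one multi-item scale. It is not the authors' analysis code (the study used Stata); the function name, the complete-case input format, and the example responses are assumptions made for illustration only.

```python
from statistics import variance
from typing import List, Sequence

ALPHA_THRESHOLD = 0.70  # threshold stated in the text


def cronbach_alpha(rows: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for one scale.

    rows: one entry per respondent, each holding that respondent's answers
    to the k items of the scale (complete cases only; an assumption here).
    """
    k = len(rows[0])
    if k < 2:
        raise ValueError("Cronbach's alpha needs at least two items")
    # Sample variance of each item across respondents
    item_variances = [variance([row[j] for row in rows]) for j in range(k)]
    # Sample variance of the summed scale score
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical responses of four patients to a three-item scale (1-4 coding)
example_rows: List[List[int]] = [[1, 2, 1], [2, 2, 3], [3, 4, 3], [4, 3, 4]]
alpha = cronbach_alpha(example_rows)
print(f"alpha = {alpha:.3f}, meets threshold: {alpha > ALPHA_THRESHOLD}")
```

A scale falling below the threshold would, following the text, prompt a review of the scale structure rather than automatic removal of items.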
We had defined a priori a list of criteria for items to be included or excluded from the questionnaire. An item should be kept if five out of the following seven criteria are fulfilled: mean score >1.5 over all groups of patients, neither floor nor ceiling effects (no floor effect defined as responses three and four >10%, no ceiling effect defined as responses one and two >10%), full range of possible responses used, percentage of patients saying the item is difficult to understand <5%, percentage of patients saying the item is upsetting <5%, percentage of patients completing the item >95%, Cronbach's alpha of hypothesized scale >0.65.
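The "five out of seven" retention rule can be sketched as a simple count of fulfilled criteria. The sketch below is illustrative only: the field names and example values are hypothetical, and the handling of single-item scales (where Cronbach's alpha is not applicable) is an assumption, since the text does not specify it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ItemStats:
    """Descriptive statistics for one item (all field names are hypothetical)."""
    mean_score: float             # mean response on the 1-4 scale
    pct_responses_1_2: float      # % of patients answering 1 or 2
    pct_responses_3_4: float      # % of patients answering 3 or 4
    full_range_used: bool         # were all response options used?
    pct_difficult: float          # % rating the item difficult to understand
    pct_upsetting: float          # % rating the item upsetting
    pct_completed: float          # % of patients who completed the item
    scale_alpha: Optional[float]  # Cronbach's alpha of the hypothesized scale


def criteria_met(item: ItemStats) -> int:
    """Count how many of the seven pre-defined criteria an item fulfills."""
    checks = [
        item.mean_score > 1.5,
        # Neither floor nor ceiling effect, as defined in the text
        item.pct_responses_3_4 > 10 and item.pct_responses_1_2 > 10,
        item.full_range_used,
        item.pct_difficult < 5,
        item.pct_upsetting < 5,
        item.pct_completed > 95,
        # Assumption: if alpha is not applicable (single-item scale),
        # this criterion is counted as not fulfilled.
        item.scale_alpha is not None and item.scale_alpha > 0.65,
    ]
    return sum(checks)


def keep_item(item: ItemStats) -> bool:
    """An item is kept if at least five of the seven criteria are fulfilled."""
    return criteria_met(item) >= 5


# Hypothetical item statistics, not taken from the study
example = ItemStats(1.9, 78.0, 22.0, True, 1.2, 0.5, 98.6, 0.81)
print(criteria_met(example), keep_item(example))
```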
We performed confirmatory factor analysis to examine the scale structure of the EORTC QLQ-THY34 and calculated standardized factor loadings for each item with regard to the corresponding scale, whereby loadings >0.40 were considered adequate. 18 We evaluated the model-data-fit with the Tucker-Lewis Index (TLI) and the Comparative Fit Index (CFI), with both indices considered to indicate good fit if they exceed 0.95, 19 as well as with the Root Mean Square Error of Approximation (RMSEA), which should be below 0.06. 19
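For readers less familiar with these fit indices, the sketch below shows the commonly used textbook definitions of CFI, TLI, and RMSEA computed from the chi-square statistics of the fitted model and of the baseline (independence) model. It does not fit a factor model and is not the authors' Stata code; the function names and the chi-square values are hypothetical, and statistical packages may differ in details (for example, using N instead of N - 1 in the RMSEA).

```python
import math


def cfi(chi2_model: float, df_model: int, chi2_base: float, df_base: int) -> float:
    """Comparative Fit Index from model and baseline chi-square statistics."""
    d_model = max(chi2_model - df_model, 0.0)
    d_base = max(chi2_base - df_base, 0.0)
    denom = max(d_model, d_base)
    return 1.0 if denom == 0 else 1.0 - d_model / denom


def tli(chi2_model: float, df_model: int, chi2_base: float, df_base: int) -> float:
    """Tucker-Lewis Index from model and baseline chi-square statistics."""
    ratio_base = chi2_base / df_base
    ratio_model = chi2_model / df_model
    return (ratio_base - ratio_model) / (ratio_base - 1.0)


def rmsea(chi2_model: float, df_model: int, n: int) -> float:
    """Root Mean Square Error of Approximation (N - 1 convention assumed)."""
    return math.sqrt(max(chi2_model - df_model, 0.0) / (df_model * (n - 1)))


# Hypothetical chi-square values, chosen only to show the arithmetic
print(round(cfi(150.0, 100, 1100.0, 120), 3))
print(round(tli(150.0, 100, 1100.0, 120), 3))
print(round(rmsea(150.0, 100, 401), 3))
```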
According to the guidelines of the EORTC QLG, 15 scale scores were constructed by summing up all item responses of that scale, and then this raw score was linearly transformed to a scale from 0 to 100. If at least half of the items in a particular scale were not available, no score was calculated for that scale. We then calculated the range, mean and standard deviation, median and interquartile range for all scales, and the percentage of missing values per scale.
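The scoring step can be sketched as follows. This is a minimal illustration under stated assumptions, not the official scoring algorithm: the function name is hypothetical, items are assumed to be coded 1 to 4, and a scale with some (but fewer than half) missing items is assumed to be scored from the mean of the answered items, which the text does not spell out. For complete data, transforming the item mean is equivalent to linearly transforming the item sum.

```python
from typing import Optional, Sequence


def scale_score(
    responses: Sequence[Optional[int]], low: int = 1, high: int = 4
) -> Optional[float]:
    """Transform the item responses of one scale to a 0-100 score.

    responses: one entry per item of the scale; None marks a missing answer.
    Returns None when no score can be calculated.
    """
    answered = [r for r in responses if r is not None]
    n_missing = len(responses) - len(answered)
    # As stated in the text: no score if at least half of the items are missing
    if 2 * n_missing >= len(responses):
        return None
    # Assumption: average over the answered items before rescaling
    raw = sum(answered) / len(answered)
    return (raw - low) / (high - low) * 100.0


print(scale_score([1, 1, 1]))     # lowest possible responses
print(scale_score([4, 4, 4]))     # highest possible responses
print(scale_score([2, 3, None]))  # one of three items missing
print(scale_score([2, None]))     # half of the items missing
```

The same function covers single-item scales, where a missing answer simply yields no score.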
Convergent and divergent validity were investigated by correlating the scales of the EORTC QLQ-C30 and the EORTC QLQ-THY34.
To measure discriminant validity, the method of known-group comparison was used, that is, the mean scores of the EORTC QLQ-THY34 scales of relevant clinical subgroups (different histology; survivors vs. patients undergoing treatment) were compared by calculating the delta between the groups.
Responsiveness (sensitivity to change) was determined by calculating the mean changes between t1, t2, and t3 in Group 1, stratified by treatment group and focusing on the following issues: Voice, Shoulder Functioning, and/or Discomfort in Head and Neck due to surgery; Dry Mouth due to RAI; Cramps, Exhaustion, and/or Voice due to TKI. Our assumption was that QoL would decrease between t1 and t2 and increase between t2 and t3 in the pre-specified domains. However, changes were only to be described with no defined thresholds.
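A minimal sketch of the descriptive change calculation is given below. It assumes that changes are computed within patients (paired differences, using only patients with scores at both time points); the text does not state whether paired differences or differences of group means were used. The function name and the scores are hypothetical.

```python
from statistics import mean
from typing import Optional, Sequence


def mean_change(
    before: Sequence[Optional[float]], after: Sequence[Optional[float]]
) -> Optional[float]:
    """Mean within-patient change between two time points.

    before/after: scale scores (0-100) in the same patient order;
    None marks a missing score. Patients missing either score are skipped.
    """
    diffs = [a - b for b, a in zip(before, after) if b is not None and a is not None]
    return mean(diffs) if diffs else None


# Hypothetical scale scores of four patients in one treatment group
t1 = [10.0, 20.0, None, 30.0]
t2 = [22.0, 30.0, 40.0, 32.0]
t3 = [15.0, 25.0, 35.0, None]

changes = {
    "t1 to t2": mean_change(t1, t2),
    "t2 to t3": mean_change(t2, t3),
}
for interval, delta in changes.items():
    print(interval, delta)
```

In line with the text, such changes are described rather than tested against a predefined threshold.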
Test-retest reliability was ascertained by computing intra-class correlation (ICC) coefficients between t1 and t2 in Group 2.
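The text does not specify which form of the ICC was used. As an illustration only, the sketch below computes one common variant, the two-way random-effects, absolute-agreement, single-measurement coefficient (often written ICC(2,1)), from an ANOVA decomposition. The function name and example data are hypothetical, and complete data for every patient are assumed.

```python
from typing import Sequence


def icc_2_1(data: Sequence[Sequence[float]]) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single measure.

    data: one row per patient, one column per measurement occasion
    (for test-retest: two columns, t1 and t2). Complete data assumed.
    """
    n = len(data)        # number of patients
    k = len(data[0])     # number of occasions
    grand = sum(sum(row) for row in data) / (n * k)
    row_means = [sum(row) / k for row in data]
    col_means = [sum(row[j] for row in data) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)                # between-patient mean square
    ms_cols = ss_cols / (k - 1)                # between-occasion mean square
    ms_error = ss_error / ((n - 1) * (k - 1))  # residual mean square

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical scale scores at t1 and t2 for five survivors
retest = [[33.3, 33.3], [0.0, 8.3], [66.7, 58.3], [16.7, 16.7], [50.0, 41.7]]
print(round(icc_2_1(retest), 2))
```

Other ICC forms (for example, consistency rather than absolute agreement) would give somewhat different values when scores shift systematically between the two occasions.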
Data entry, cleaning, and analysis were conducted using STATA (StataCorp. 2017. Stata Statistical Software: Release 16; StataCorp LP, College Station, TX).
Ethical approval
All procedures performed were in accordance with the ethical standards of the institutional research committees and with the 1964 Helsinki declaration and its later amendments of comparable ethical standards. All sites obtained ethical approval in accordance with regional and national requirements. Approval number from the principal investigator's institution: 837.406.17 (11240).
Consent to participate
Written informed consent was obtained from all individual participants included in the study.
Results
Sample characteristics
Overall, 459 patients were contacted to participate (324 in Group 1, 135 in Group 2). Of them, 21 had follicular cytology that turned out to be adenomas on histological examination after surgery; their questionnaires were thus excluded, according to the study protocol. This left 438 patients (303 in Group 1, 135 in Group 2), of whom 437 participated at least once; one patient declined at t1 (see flowchart in Fig. 1).

Fig. 1. Patient flow through the study.
In Group 1, 220 participated at all time points, 55 twice, and 20 only once (at t2, 4 patients had died already, 10 declined, and 19 could not be contacted; at t3, another 10 patients had died, another 10 declined, and another 24 could not be contacted). In Group 2, 124 participated at all time points and 10 only once (6 declined and 4 could not be contacted any more).
The 437 participants came from 21 institutions in 17 countries (Australia n = 21, Austria n = 23, Belgium n = 3, Brazil n = 18, Cyprus n = 13, Germany n = 21, Greece n = 28, Italy n = 70, Japan n = 15, Jordan n = 46, Norway n = 26, Portugal n = 17, Spain n = 20, Sweden n = 28, Switzerland n = 67, The Netherlands n = 15, and the UK n = 6). The EORTC QLQ-THY34 was thereby validated in 12 languages, sometimes with country-specific variations: Arabic for Jordan, Dutch, English, French for Europe, German, Greek, Italian, Japanese, Norwegian, Portuguese for Portugal, Portuguese for Brazil, Spanish for Spain, and Swedish.
Clinical and demographic characteristics of the sample are displayed in Table 1.
Table 1. Demographic and Clinical Characteristics of the Participants (n = 437)
Group 1, patients about to start treatment; Group 2, survivors.
“Less than total thyroidectomy” refers to lobectomy with or without resection of the thyroid isthmus, as well as to more limited surgery, such as resection of only the isthmus or of a thyroglossal duct remnant, and others.
“Hypoparathyroidism,” “Type of surgery,” and “Other treatment” all time points combined.
Usually taken from t2. If not available, then t1 or t3.
At t1.
ATA, American Thyroid Association; TKI, tyrosine kinase inhibitors; UICC, Union Internationale Contre le Cancer.
Item characteristics
Table 2 describes the various item characteristics. For each item, it is reported how many of the pre-defined criteria were fulfilled. Based on the tests for internal consistency, the scale structure was slightly changed in comparison to the hypothesized scale structure developed in phase III. Item 46 (“Have you felt restless or agitated?”) worked better with the other items of the scale Treatment- and Disease-related Worry than with the item 47 “Have you had a rapid heartbeat?” Hence, it was decided to use item 47 as a single-item scale and combine item 46 with the worry scale.
Table 2. Item Characteristics
n.a., not applicable (single item scale). If not further specified, the data presented are for t2.
All items but one fulfilled at least five of the seven criteria; the exception, item 55, fulfilled four. Although items 54 and 55 taken together as a scale had a suboptimal Cronbach's alpha overall, it was adequate in certain subgroups of patients for whom this domain is most relevant (see next paragraph). It was, therefore, decided to keep item 55 and to keep it together with item 54 in one scale. In consequence, the EORTC thyroid cancer module consists of 34 items with 9 multi-item and 8 single-item scales.
Scale characteristics, reliability, and construct validity
The scales resulting from the items exhibit very good characteristics (Table 3). For all of them, the entire range was used and missing values were very few.
Table 3. Scale Characteristics
IQR, interquartile range; SD, standard deviation.
The internal consistency coefficients of the various scales were all above 0.70 except for one scale (Table 4). This scale with suboptimal Cronbach's alpha (Tingling and Numbness) had higher values in certain groups of patients, for example, an average of 0.68 in patients with hypoparathyroidism, 0.70 in patients with medullary thyroid cancer, and 0.87 in patients with anaplastic thyroid cancer. The test-retest reliability was very good in all scales.
Table 4. Internal Consistency, Test-Retest Reliability, and Construct Validity
Cronbach's alpha presents the average over all time points.
ICC, intra-class correlation coefficient between t1 and t2 in Group 2.
Standardized factor loadings were all above 0.40 (Table 4). The CFI was 0.95, the TLI was 0.94, and the RMSEA was 0.05, supporting the assumption that the constructs underlying the scales are valid.
Convergent and divergent validity
The correlations between EORTC QLQ-C30 and EORTC QLQ-THY34 scales indicate that related concepts are, indeed, associated whereas others are not (Supplementary Tables S1–S5). For example, the Fatigue scale of the EORTC QLQ-C30 correlates highly (e.g., 0.79 at t1, 0.74 at t2, 0.84 at t3) with the Exhaustion scale of the EORTC QLQ-THY34 (convergent validity) but not entirely, which shows that the measured constructs are similar but not the same. In other words: the Exhaustion scale of the EORTC QLQ-THY34 captures partially different aspects than the fatigue scale of the EORTC QLQ-C30.
Evidence for divergent validity is, for example, that the correlations of different constructs, such as Shoulder Functioning (measured with the EORTC QLQ-THY34) and Nausea and Vomiting (measured with the EORTC QLQ-C30), are weak.
Discriminant validity
There are differences of ≥5 points in nearly all scales when we compare the scale scores of patients under treatment with survivors (Table 5). Similarly, the comparison of the scale scores regarding different histology types indicated differences of ≥5 points in all of the scales at least at one time point and one comparison between patient groups (Table 6), showing that the module is able to differentiate between clinically distinct groups.
Table 5. Score Differences Between Patients Scheduled for Treatment (Group 1) and Survivors (Group 2)
Displayed are the mean values per group at t2 and the delta between them.
Table 6. Score Differences (Delta) Regarding Histology
In Group 2 (survivors ≥2 years after diagnosis), no patients with anaplastic thyroid cancer were enrolled because these patients rarely survive for longer than 6 months.
ATC, anaplastic thyroid cancer; DTC, differentiated thyroid cancer; MTC, medullary thyroid cancer.
Patients with total thyroidectomy indicated more treatment- and disease-related worry than those with less than total thyroidectomy. For example, at t1, the respective average scores were 35 versus 27 in Group 1; at t2, they were 33 versus 20.
Responsiveness
In all but one of the pre-specified scales (Dry Mouth), changes were ≥|4| points between at least two time points (Table 7), indicating the instrument's ability to detect changes over time.
Table 7. Changes in Scores for Different Treatment Groups Between Time Points
RAI, radioactive iodine.
Time and help needed to complete the questionnaire
The time required to complete the questionnaire (core questionnaire and module together) was <10 minutes in 42% of all participants; 34% needed 11–15 minutes, 14% 16–20 minutes, 6% up to 30 minutes, and 1% more than 30 minutes. For the remaining 3% (n = 12), the time needed was not documented. The majority of the patients (87%) completed the questionnaire with paper and pencil, 13% electronically. Those who used paper and pencil were somewhat quicker in completing the questionnaire, with 45% versus 26% needing <10 minutes (p = 0.07).
Three-quarters (74%) of patients required no help to complete the questionnaire, 17% needed help with understanding the questions, and 7% needed other help (for example, reading the question out loud because they had forgotten their glasses). For the remaining 2%, no information was available about any help needed. The proportion of patients who needed help with understanding differed by histology group (32% in those with anaplastic cancer, 20% medullary, 16% differentiated). It was unrelated to the cognitive functioning of the participants, as measured with the EORTC QLQ-C30.
Some patients spontaneously commented on the questionnaire, for example, “easy questionnaire to answer,” or “the questions are clear.”
Discussion
The incidence of thyroid cancer has increased over the past years, mainly driven by papillary thyroid cancer, 20 while treatment varies considerably according to histology and stage. To capture patient-reported outcomes in clinical trials, reliable and valid measures are necessary. Such instruments should cover all relevant QoL issues but be as short as possible, and, ideally, be available in many languages.
The EORTC QLG, together with the EORTC Head and Neck Cancer Group and the EORTC Endocrine Tumours Group, decided to develop such a tool. This instrument has now been field tested in the last phase of its development.
Results suggested retaining all the items that were developed in phases I through III. 13,14 The instrument is therefore more elaborate than the City of Hope QoL thyroid questionnaire, 7 with 30 items. The latter was carefully developed with qualitative interviews but only in the United States, and all study participants had a college degree. Subsequently, Dagan et al. shortened that questionnaire to 15 items and named it TQOLI, 21 but its response categories vary, making it more difficult for patients to complete it.
The Thyroid Module of the MD Anderson Inventory comprises 25 items. 22 It was developed in the United States, without the participation of other countries. Another questionnaire, the ThyPRO, 23 was designed for benign thyroid disease only; thyroid cancer patients were explicitly excluded during the development and validation.
The THYCA-QoL was developed to be used in conjunction with the EORTC QLQ-C30. 6 It contains 24 items and is, therefore, somewhat shorter than our instrument. However, it was developed only in The Netherlands, making it more difficult to apply to other countries. Moreover, no patients with anaplastic thyroid cancer were included in its development.
The EORTC QLQ-THY34 scales exhibit very good characteristics. For all of them, the entire range was used and missing values were infrequent. The tests provide evidence for the reliability and validity of the instrument. However, it should be kept in mind that the internal consistency for the scale Tingling and Numbness is good only in certain groups of patients, for example, those with anaplastic thyroid cancer or those diagnosed with hypoparathyroidism. Researchers using the EORTC QLQ-THY34 are, therefore, advised to check whether internal consistency is sufficient in their sample, and, if not, then treat the items as independent.
Regarding responsiveness, it should be noted that some changes were not in the direction that we had expected. However, this is not evidence for poor psychometric properties of the EORTC QLQ-THY34 but rather indicates that the temporal evolution of QoL in patients with thyroid cancer is not yet fully understood. In hindsight, the results can be explained rationally from a clinical point of view.
For example, cramps related to treatment with TKI can occur several weeks or months after treatment initiation. Hence, the observed worsening at t3 as compared with t2 is plausible. In other words, these findings illustrate the need for an instrument such as the EORTC QLQ-THY34 to study changes in QoL over time in thyroid cancer patients in a systematic manner (which goes beyond the scope of the present study). The results of the test-retest reliability underline that the observed changes are not simply a matter of random fluctuation but that, most likely, real changes were captured. In conclusion, the EORTC QLQ-THY34 is responsive, that is, changes in thyroid cancer-specific QoL can be measured in future clinical studies using this instrument.
Regarding the target group of this module, we deliberately included patients with non-differentiated thyroid cancers. A disadvantage of this approach is that some items might not be relevant for all patients in a study; for example, questions related to RAI are not relevant for patients with medullary or anaplastic thyroid cancer. The advantage, however, is that the same instrument can be used for different studies and results can then be compared more easily.
This approach had also been used for the EORTC head and neck cancer module, which also includes various types of head and neck cancer, such as laryngeal, oropharyngeal, or salivary gland cancer. 24–27 Conversely, for gynecological cancers, separate modules were developed for cervical, 28 ovarian, 29 endometrial, 30 and vulva 31 cancer. Our decision to develop a single module was also influenced by the much lower prevalence of medullary and anaplastic thyroid cancer compared with differentiated thyroid cancer.
These differences in the disease occurrence also led to one of the limitations of our study, namely that the number of patients with certain histologies, for example, anaplastic thyroid cancer or Hurthle cell cancer, is relatively small, making the results in these groups of patients more uncertain.
Another limitation of our study is that patients of some countries with high incidence rates of thyroid cancer, such as Korea, Israel, Turkey, France, or China (
However, there were several colleagues from China who approached us and asked to use the EORTC QLQ-THY34 for their own studies after phase III was completed. As the questionnaire is already translated into Mandarin and Cantonese, this was possible. Hence, it is likely that validation studies for these languages will be published in due course.
A challenge for our study was the COVID-19 pandemic, which made it more difficult to approach patients and include them in the study. The accrual period was lengthened for this reason. However, it did not lead to a biased selection of patients.
In conclusion, the EORTC QoL module for thyroid cancer consisting of 34 items is reliable and able to differentiate between clinically relevant groups. It also captures changes over time, and its item and scale characteristics are very good. There are no items that patients or survivors found irrelevant.
The instrument is ready for use in clinical research and available in the following languages: Arabic (Egypt), Arabic (Israel), Arabic (Jordan), Bengali, Bulgarian, Chinese Cantonese (Hong Kong), Chinese Mandarin (= simplified Chinese), Croatian, Danish, Dutch, English, Finnish, French (Canada), French (Europe), German, Georgian, Greek, Gujarati, Hebrew, Hindi, Hungarian, Italian, Japanese, Kannada, Korean, Malayalam, Marathi, Norwegian, Polish, Portuguese (Portugal), Portuguese (Brazil), Punjabi, Romanian, Serbian, Russian, Spanish (Argentina), Spanish (Chile), Spanish (Mexico), Spanish (Spain), Spanish (United States), Swedish, Tamil, Telugu, Thai, Turkish, Ukrainian, and Urdu (India).
The module together with the scoring instructions can be obtained from the EORTC (
Data Availability Statement
The data of this study are stored in the EORTC data repository and can be accessed by other researchers.
Footnotes
Acknowledgments
The authors would like to acknowledge the work of colleagues who helped with data collection: Gisele da Rocha Santa, Micha Pilz, Hisayo Doi, and Emily Dickson; for setting up this study in local institutions: Cláudia Áraújo.
Authors' Contributions
S.S., G.S., A.A.-I., M.P., I.I., A.A.Ø., E.H., L.D.L., E.G., J.Ig., S.J.J., N.K., G.I., O.H., L.M.,
Author Disclosure Statement
S.S. has received honoraria from Lilly for reviewing papers for their Quality of Life Award and from Eisai for advice in writing a paper, outside of the submitted work. N.K. reports honoraria from ONO PHARMACEUTICAL, Bristol Myers Squibb, Merck Biopharma, AstraZeneca, Merck Sharp & Dohme, Eisai, Bayer, and Chugai Pharmaceutical, all outside the submitted work. M.P. reports personal fees from Meeting&Words and from Hinovia S.r.l., outside the submitted work. G.S. has received research support in the form of donations from Merck Sharp & Dohme, IBSA, and Alpha-Sigma, which partially defrayed costs associated with recruiting patients in this study. M.B. has received speaker fees from Lilly and Takeda outside of the submitted work. D.F. reports speaker fees and honoraria for advisory boards of Roche, Lilly, Ipsen, Eisai, Sanofi, and Bayer, all outside the submitted work.
All authors have a special interest in the quality of life in thyroid cancer patients and survivors. All other authors declare that they have no conflict of interest that could be perceived as prejudicing the impartiality of the research reported.
Funding Information
This study was funded by the European Organisation for Research and Treatment of Cancer, Quality of Life Group (No. 002/2017). The EORTC Quality of Life Group business model involves charges for commercial companies using EORTC instruments. Academic use of EORTC instruments is free of charge.
Supplementary Material
Supplementary Data
Supplementary Tables S1
Supplementary Tables S2
Supplementary Tables S3
Supplementary Tables S4
Supplementary Tables S5
