Abstract
Background:
Limited information is available on the long-term impact of active surveillance (AS) and immediate surgery (IS) on the quality of life (QoL) and psychological status of patients with highly suspicious subcentimeter thyroid nodules.
Methods:
A prospective study was conducted on 752 patients showing highly suspicious subcentimeter thyroid nodules, among whom 584 chose AS and 168 chose IS. All patients underwent at least two assessments regarding their QoL and psychological status, using three questionnaires: Thyroid Cancer-Specific Quality of Life (THYCA-QoL), Hospital Anxiety and Depression Scale (HADS), and European Organization for Research and Treatment of Cancer (EORTC) Quality of Life Core Questionnaire (QLQ-C30). Propensity-score matching (PSM) at a ratio of 3:1 was utilized on patients in the AS and IS groups to mitigate selection bias (504 patients in the AS group and 168 in the IS group). Subsequently, the mixed linear model was used to analyze the QoL data.
Results:
The median time from the initial evaluation to the last follow-up in the AS and IS groups was 24.0 and 14.2 months, respectively. The AS group showed superior QoL outcomes compared to the IS group, mainly manifested in voice (p < 0.001), sympathetic (p = 0.008), throat/mouth (p < 0.001), and problems with scar (p < 0.001) domains, as per the THYCA-QoL questionnaire. Further, the EORTC QLQ-C30 questionnaire highlighted better outcomes in physical function (p = 0.029), role function (p < 0.001), social function (p < 0.001), global health status (p < 0.001), fatigue (p = 0.012), pain (p = 0.028), appetite loss (p = 0.017), and financial difficulties (p < 0.001). Compared to the initial assessment (1 week after surgery), the IS group showed progressive improvements in QoL, especially in voice (p = 0.024), throat/mouth (p < 0.001), physical function (p = 0.004), social function (p = 0.014), nausea and vomiting (p < 0.001), pain (p = 0.006), and appetite loss (p = 0.048) domains as per both questionnaires.
Conclusion:
Patients with highly suspicious subcentimeter thyroid nodules who choose IS tend to experience a poorer long-term QoL compared to those who choose AS. Although the situation may improve over time, certain issues might persist, making AS a favorable option for these patients.
Introduction
The incidence of thyroid cancer is increasing worldwide. 1,2 Papillary thyroid carcinoma (PTC) accounts for over 90% of all thyroid cancers, with papillary thyroid microcarcinoma (PTMC) showing the most significant increase and representing about half of all newly diagnosed PTC cases. 3,4 Given the favorable prognosis of patients with PTMC, several studies have demonstrated the safety and feasibility of active surveillance (AS). 5–12 However, for many diseases, including cancer, evaluating the quality of life (QoL) and psychological status of the patients is of paramount importance, as they significantly influence treatment decisions and outcomes. Low-risk PTMC behaves more like a “chronic disease,” and understanding the impact of AS versus immediate surgery (IS) on the QoL of patients with low-risk PTMC is an important aspect of the entire treatment process. However, only a few studies have evaluated the QoL of patients under AS, 13–15 and most of these studies were cross-sectional, reflecting the status at a single time point rather than its course over time.
For diseases requiring long-term follow-up, it is particularly important to monitor changes in the long-term QoL and psychological status of the patients, which may fluctuate and change over time, potentially affecting treatment outcomes. The American Thyroid Association (ATA) guidelines do not recommend fine-needle aspiration biopsy (FNAB) for highly suspicious thyroid nodules ≤1 cm. Given that the ultrasound malignancy risk stratification of thyroid nodules by the ATA shows good diagnostic performance and plays a key role in the identification and management of thyroid nodules, 16 and considering that FNAB is not generally well accepted in China, this study mainly uses ultrasound for screening patients with highly suspicious subcentimeter thyroid nodules who meet the criteria for AS. The objective is to assess the QoL and psychological status of patients with highly suspicious subcentimeter thyroid nodules who opt for AS or IS, compare the impact of these treatment approaches on their QoL and psychological health, and examine how these factors change over time.
Materials and Methods
Study design and patients
This prospective cohort study, conducted between November 2018 and October 2023, included 752 patients who visited Peking Union Medical College Hospital (PUMCH). These patients, presenting with highly suspicious thyroid nodules, chose either AS (N = 584) or IS (N = 168). The inclusion criteria were as follows: (1) maximum nodule diameter ≤1.0 cm; (2) thyroid nodules classified as highly suspicious for malignancy according to ATA guidelines; 17 (3) no extrathyroidal invasion, clinical lymph node metastasis (LNM), or distant metastasis (DM); (4) nodules not adjacent to critical structures (e.g., recurrent laryngeal nerve, trachea, or esophagus); and (5) completion of at least two QoL questionnaire assessments. The exclusion criteria were: (1) previous thyroid malignancy surgery; (2) refusal to provide informed consent; (3) inability to adhere to long-term treatment with surveillance; (4) family history of thyroid cancer (involving ≥2 family members); and (5) history of mental illness.
Ultimately, 584 patients with AS and 168 patients with IS met the above criteria and were included in the analysis. All operations were performed according to the Chinese guidelines. 18 A detailed flow diagram is presented in Figure 1. This study was approved by the PUMCH Ethics Review Committee, and all patients provided written informed consent before enrollment (ethics approval No. JS-2454). This study has been registered in the National Medical Research Registration Information System (No. MR-11-23-034312).

Figure 1. Flow diagram showing the inclusion and exclusion of patients in the study. LNM, lymph node metastasis.
The primary outcome was the comparison of QoL and psychological status between patients who underwent AS and those who underwent IS after propensity-score matching (PSM). The secondary outcome was the same comparison in the overall (unmatched) cohort. Additionally, we performed a post hoc secondary analysis to examine the influence of FNAB results and the extent of surgery on QoL.
Questionnaires
The QoL was assessed using the Thyroid Cancer-Specific Quality of Life (THYCA-QoL) questionnaire, 19 the Hospital Anxiety and Depression Scale (HADS) questionnaire, 20 and the European Organization for Research and Treatment of Cancer (EORTC) Quality of Life Core Questionnaire (EORTC QLQ-C30). 21 All treatments, including AS and IS (specifically lobectomy/total thyroidectomy+prophylactic lymph node dissection), were administered under the guidance of Dr. Li Xiaoyi. Assistant Zhao Ya was responsible for assisting patients in completing the QoL questionnaire assessments.
The THYCA-QoL questionnaire includes 24 questions measuring 7 symptom domains: neuromuscular, voice, concentration, sympathetic, throat/mouth, psychological, and sensory symptoms. In addition, it has six individual scales evaluating scar, felt chilly, tingling hands/feet, gained weight, headache, and interested in sex. All responses are scored on four levels: 1 = “not at all,” 2 = “a little,” 3 = “quite a bit,” and 4 = “very much,” translating to a point system ranging from 1 to 4 points. A higher score represents more discomfort and worse QoL caused by that symptom. 19
The HADS questionnaire consists of 14 items, out of which 7 indicate anxiety, and the rest indicate depression. Scores of both the anxiety and depression subscales are categorized as follows: 0–7, no clinically relevant issues (noncases); 8–10, cases that warrant further psychiatric investigation (possible cases); and ≥11, clinical level of anxiety/depression (probable cases). 20
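The cutoffs above can be sketched in a few lines of Python. This is a minimal illustration, not the study's scoring code: the function names are hypothetical, and it assumes the standard HADS layout in which each of the 7 items per subscale is scored 0 to 3, giving a subscale total of 0 to 21.

```python
def hads_subscale_total(item_scores):
    """Sum the seven item scores (each 0-3) of one HADS subscale."""
    if len(item_scores) != 7:
        raise ValueError("each HADS subscale has 7 items")
    if any(not 0 <= s <= 3 for s in item_scores):
        raise ValueError("HADS items are scored 0-3")
    return sum(item_scores)


def hads_category(subscale_total):
    """Map a subscale total (0-21) to the categories described in the text."""
    if not 0 <= subscale_total <= 21:
        raise ValueError("HADS subscale totals range from 0 to 21")
    if subscale_total <= 7:
        return "noncase"
    if subscale_total <= 10:
        return "possible case"
    return "probable case"


# Hypothetical responses for one patient's anxiety subscale.
anxiety_total = hads_subscale_total([1, 2, 1, 0, 2, 1, 2])
anxiety_label = hads_category(anxiety_total)
```

The same two functions apply unchanged to the depression subscale, since both subscales share the same item range and cutoffs.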
The EORTC QLQ-C30 consists of five functional scales (physical, role, emotional, social, and cognitive), three symptom scales (fatigue, pain, and nausea/vomiting), a global health status scale, and six single-item scales (dyspnea, loss of appetite, insomnia, constipation, diarrhea, and financial difficulties). 21 The initial QoL assessment of patients opting for AS was conducted at the time of enrollment, with subsequent assessments every 6 months. In contrast, patients opting for IS underwent their first assessment 1 week after the operation, with subsequent assessments every 6 months. The number of patients who completed the questionnaires at each time point is shown in Supplementary Table S1.
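The text does not spell out how QLQ-C30 responses become the 0 to 100 scores reported in the Results. The sketch below assumes the standard EORTC linear transformation: the raw score is the mean of a scale's items, functional scales are reversed so that higher means better function, and symptom scales and global health status are scaled directly. The function names and example responses are hypothetical.

```python
from statistics import mean


def raw_score(item_responses):
    """Raw scale score: the mean of the item responses in that scale."""
    return mean(item_responses)


def functional_score(item_responses, item_range=3):
    """Functional scales: 0-100, higher = better functioning.

    item_range is the difference between the maximum and minimum possible
    response (3 for items answered on a 1-4 scale).
    """
    return (1 - (raw_score(item_responses) - 1) / item_range) * 100


def symptom_or_global_score(item_responses, item_range=3):
    """Symptom scales/items and global health status: 0-100.

    Higher = more symptoms (symptom scales) or better overall health
    (global health status, whose two items are answered 1-7, so
    item_range=6).
    """
    return (raw_score(item_responses) - 1) / item_range * 100


# Hypothetical responses for one patient.
physical = functional_score([1, 2, 1, 1, 1])             # five 1-4 items
pain = symptom_or_global_score([2, 1])                   # two 1-4 items
global_health = symptom_or_global_score([6, 5], item_range=6)
```

This reading is consistent with the direction of the reported results, where higher functional and global health scores and lower symptom scores indicate better QoL.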
Statistical analyses
Statistical analyses were conducted using SPSS version 26.0 (IBM, Armonk, NY), GraphPad Prism version 9.1.1, and R studio version 4.1.0. Categorical variables were presented as numbers and percentages, while continuous variables were expressed as mean ± standard deviation for normal distribution, or medians with interquartile ranges for nonnormal distribution. Pearson's chi-square test was used to compare categorical variables. The Student's t-test was used to compare normally distributed continuous variables, and the Mann–Whitney U test was used for nonnormal distributions. The mixed linear model was used to compare changes in QoL over time between the AS and IS groups and to conduct intergroup comparisons. All p-values were two-sided, and a p-value <0.05 was considered statistically significant.
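The exact specification of the mixed linear model is not given in the text. As a hedged illustration only, one common form for repeated QoL measurements uses a random intercept per patient; the symbols below are assumptions introduced for this sketch, not the authors' notation.

```latex
% y_{ij}: score of one QoL domain for patient i at assessment j
% t_{ij}: time of assessment j for patient i
% g_i:    treatment group of patient i (e.g., 0 = IS, 1 = AS)
% b_i:    patient-specific random intercept
y_{ij} = \beta_0 + \beta_1 t_{ij} + \beta_2 g_i + b_i + \varepsilon_{ij},
\qquad b_i \sim \mathcal{N}(0, \sigma_b^2),
\qquad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
```

Under this assumed form, a within-group analysis of change over time corresponds to the time coefficient \beta_1 fitted within a single group, and an intergroup comparison corresponds to the group coefficient \beta_2.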
PSM at a ratio of 3:1 was utilized to minimize potential confounding factors and selection bias. After PSM, mixed linear model analyses were conducted on patients from classes A and B (Class A: PSM was conducted on all patients in the AS and IS groups; Class B: PSM was conducted on patients in the AS and IS groups, after excluding those with disease progression or preference change in the AS group and those who underwent a second surgery in the IS group owing to recurrence or other reasons). Propensity scores were derived from eight relevant covariates that may have a potential impact on QoL and psychological status: age, sex, diameter, multifocality, body mass index, education, income, and marital status. The Chi-square test was conducted to evaluate the equilibrium between the groups.
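The matching algorithm itself is not described, so the following is only a sketch of how 3:1 matching could work. It assumes propensity scores have already been estimated (for example, by logistic regression on the eight covariates) and uses greedy nearest-neighbor matching without replacement, which is one common choice and not necessarily the one used here. The function name, patient IDs, and scores are all hypothetical.

```python
def match_nearest(is_scores, as_scores, ratio=3):
    """Greedy nearest-neighbor matching without replacement.

    is_scores, as_scores: dicts mapping patient ID -> propensity score.
    Returns a dict mapping each IS patient ID to a list of up to `ratio`
    matched AS patient IDs.
    """
    available = dict(as_scores)  # AS patients not yet matched
    matches = {}
    # Process IS patients in order of propensity score for determinism.
    for is_id, score in sorted(is_scores.items(), key=lambda kv: kv[1]):
        # Rank the remaining AS patients by distance to this IS patient.
        nearest = sorted(available, key=lambda a: abs(available[a] - score))
        chosen = nearest[:ratio]
        matches[is_id] = chosen
        for as_id in chosen:
            del available[as_id]  # without replacement
    return matches


# Made-up propensity scores for two IS and seven AS patients.
is_scores = {"IS1": 0.30, "IS2": 0.60}
as_scores = {
    "AS1": 0.28, "AS2": 0.31, "AS3": 0.35,
    "AS4": 0.58, "AS5": 0.61, "AS6": 0.66, "AS7": 0.90,
}
matches = match_nearest(is_scores, as_scores)
matched_as = {a for group in matches.values() for a in group}
```

In this toy example AS7 remains unmatched, mirroring how 80 of the 584 AS patients were left out of the 504-patient matched AS group in Class A.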
Results
Clinicopathological features
Of the 752 patients in this study, 584 chose AS and 168 chose IS. The median follow-up duration was 50.1 (range, 5.2–189.4) months for the AS group and 33.7 (range, 5.3–169.3) months for the IS group, and the median time from initial QoL evaluation to the last follow-up was 24.0 (range, 5.2–87.6) months for the AS group and 14.2 (range, 5.1–37.3) months for the IS group. The detailed clinical features of the patients are shown in Table 1. A comparison of clinicopathological features between patients who underwent delayed surgery and those who underwent IS is shown in Supplementary Table S2. After performing PSM at a 3:1 ratio, a total of 672 (Class A: 504 patients in the AS group and 168 in the IS group) and 656 (Class B: 492 patients in the AS group and 164 in the IS group) patients were included in the analysis. After matching, there were no significant differences between the AS and IS groups in baseline characteristics such as sex, age, multifocality, body mass index, education, income, and marital status in either class A or class B (p > 0.05 for all; Table 2).
Table 1. Clinical Characteristics of Patients Who Are Under Active Surveillance and Those Who Underwent Immediate Surgery
Time from initial questionnaires evaluation to last follow-up, median value (range).
Time from the initial diagnosis of the nodules to the last follow-up.
AS, active surveillance; AUS/FLUS, atypia of undetermined significance/follicular lesion of undetermined significance; BMI, body mass index; FNAB, fine-needle aspiration biopsy; IS, immediate surgery; Tg-Ab, anti-thyroglobulin antibodies; Tpo-Ab, thyroid peroxidase antibodies; TSH, thyrotropin.
Table 2. Baseline Clinical Characteristics, Before and After Propensity-Score Matching
Class A: PSM was conducted on all patients in the AS and IS groups; Class B: PSM was conducted on patients in the AS and IS groups, after excluding patients with disease progression or those who changed their preference in the AS group, and patients who underwent a reoperation in the IS group.
PSM, propensity-score matching.
Comparison of QoL and psychological status between the AS and IS groups at initial assessment
The results of the comparative analysis between the AS and IS groups, in terms of QoL and psychological status at the initial assessment, are presented in Table 3. The analysis was conducted for both the overall patients and those matched by PSM (Class A and B), and the results of these analyses were similar. In class A, there were significant differences between the groups in several domains of the THYCA-QoL questionnaire, including voice (1.162 ± 0.338 vs. 1.503 ± 0.505, p < 0.001), throat/mouth (1.374 ± 0.393 vs. 1.780 ± 0.482, p < 0.001), psychological (1.668 ± 0.511 vs. 1.554 ± 0.499, p = 0.012), sensory (1.644 ± 0.558 vs. 1.408 ± 0.472, p < 0.001), problems with scar (1.010 ± 0.126 vs. 1.760 ± 0.730, p < 0.001), and weight gain (1.380 ± 0.614 vs. 1.140 ± 0.384, p < 0.001).
Table 3. Comparative Analysis of Quality of Life and Psychological Status Between the Active Surveillance and Immediate Surgery Groups at Initial Assessment
Class A: PSM was conducted on all patients in the AS and IS groups. Class B: PSM was conducted on patients in the AS and IS groups, after excluding patients with disease progression or those who changed their preference in the AS group, and patients who underwent a reoperation in the IS group.
EORTC QLQ-C30, European Organization for Research and Treatment of Cancer (EORTC) Quality of Life Core Questionnaire; HADS, Hospital Anxiety and Depression Scale; SD, standard deviation; THYCA-QoL, Thyroid Cancer-Specific Quality of Life.
However, there were no significant differences between the two groups in terms of anxiety and depression scores on the HADS questionnaire. Similarly, the proportion of patients who tested positive for anxiety (12.5% vs. 13.1%, p = 0.847) and depression (12.9% vs. 11.3%, p = 0.584) did not differ significantly between the groups. The EORTC QLQ-C30 revealed significant differences in physical function (87.354 ± 12.934 vs. 83.015 ± 14.874, p = 0.001), role function (92.112 ± 15.732 vs. 83.434 ± 20.831, p < 0.001), social function (92.145 ± 15.195 vs. 85.022 ± 17.643, p < 0.001), global health status (78.171 ± 18.851 vs. 69.938 ± 21.171, p < 0.001), pain (14.024 ± 17.112 vs. 19.646 ± 18.172, p < 0.001), appetite loss (13.247 ± 21.270 vs. 18.640 ± 22.980, p = 0.008), and diarrhea (21.726 ± 24.809 vs. 17.051 ± 19.632, p = 0.013) between the groups (Table 3).
Mixed linear model analysis of QoL and psychological status in the AS and IS groups
The results of the mixed linear model analysis of QoL and psychological status in the overall patients who underwent AS and those matched by PSM (Class A and B) are shown in Figures 2–4, Supplementary Figures S1–S6, and Supplementary Table S3. Among the overall AS patients, the majority of the indicators evaluated using the THYCA-QoL, HADS, and EORTC QLQ-C30 questionnaires showed no significant changes over time, except for voice (estimate: 0.021, p = 0.009), concentration (estimate: 0.038, p = 0.001), problems with scar (estimate: 0.011, p = 0.014), and gained weight (estimate: 0.036, p = 0.008) in the THYCA-QoL questionnaire. Additionally, a significant change was observed in the emotional function (estimate: 1.359, p = 0.009) domain of the EORTC QLQ-C30 (Figs. 2–4 and Supplementary Table S3). The results of the mixed linear model analysis of QoL and psychological status between AS patients with delayed surgery and those without delayed surgery are presented in Supplementary Table S4.

Figure 2. Mixed linear model analysis of the THYCA-QoL Questionnaire between the AS and IS patients after propensity-score matching (Class A). Parameters:

Figure 3. Mixed linear model analysis of the HADS Questionnaire between the AS and IS patients after propensity-score matching (Class A). Parameters:

Figure 4. Mixed linear model analysis of the EORTC QLQ-C30 between the AS and IS patients after propensity-score matching (Class A). Parameters:
The results of the mixed linear model analysis of QoL and psychological status in the overall patients who underwent IS and those matched by PSM (Class A and B) are shown in Figures 2–4, Supplementary Figures S1–S6, and Supplementary Table S5. Among the overall IS patients, there were significant changes in QoL over time in several domains of the THYCA-QoL questionnaire, including voice (estimate: −0.053, p = 0.026), concentration (estimate: 0.098, p < 0.001), throat/mouth (estimate: −0.136, p < 0.001), psychological (estimate: 0.050, p = 0.030), sensory (estimate: 0.063, p = 0.009), tingling hands/feet (estimate: 0.054, p = 0.007), and weight gain (estimate: 0.171, p < 0.001). Additionally, significant changes were observed in the physical function (estimate: 1.810, p = 0.004), cognitive function (estimate: −2.135, p = 0.036), social function (estimate: 2.126, p = 0.014), nausea and vomiting (estimate: −2.015, p < 0.001), pain (estimate: −2.348, p = 0.006), and appetite loss (estimate: −1.953, p = 0.048) domains of the EORTC QLQ-C30 (Figs. 2–4 and Supplementary Table S5).
The results of the mixed linear model analysis of QoL and psychological status between IS patients with a second surgery and those without a second surgery are presented in Supplementary Table S6.
Mixed linear model analysis of QoL and psychological status between the AS and IS groups
The results of the mixed linear model analysis between the AS and IS groups, in terms of QoL and psychological status at all four time points, are presented in Figures 2–4, Supplementary Figures S1–S6, and Supplementary Tables S7 and S8. The analysis was conducted for both the overall patient population and the patients matched by PSM (Class A and B); the results of these analyses were similar. In class A, statistically significant differences between the AS and IS patients were found in the following domains of the THYCA-QoL questionnaire: voice (estimate: −0.206, p < 0.001), concentration (estimate: −0.055, p = 0.040), sympathetic (estimate: −0.070, p = 0.008), throat/mouth (estimate: −0.202, p < 0.001), sensory (estimate: 0.117, p < 0.001), and problems with scar appearance (estimate: −0.692, p < 0.001) (Fig. 2 and Supplementary Table S8).
However, no differences were observed in the HADS questionnaire between the two groups (Fig. 3 and Supplementary Table S8). Furthermore, statistically significant differences were observed in the physical function (estimate: 1.567, p = 0.029), role function (estimate: 4.492, p < 0.001), social function (estimate: 4.223, p < 0.001), global health status (estimate: 5.022, p < 0.001), fatigue (estimate: −2.955, p = 0.012), pain (estimate: −2.034, p = 0.028), appetite loss (estimate: −2.631, p = 0.017), constipation (estimate: 4.476, p = 0.001), diarrhea (estimate: 5.044, p < 0.001), and financial difficulties (estimate: −3.352, p < 0.001) domains of the EORTC QLQ-C30 (Fig. 4 and Supplementary Table S8).
Discussion
This study evaluated the QoL and psychological status of patients with highly suspicious thyroid nodules who underwent AS or IS. The mixed linear model analyses, in both the overall patients and those matched by PSM, demonstrated that the AS group had a better QoL than the IS group, particularly in areas related to surgical outcomes. After PSM, both at the initial evaluation and during long-term follow-up, significant differences were observed in certain parameters of the THYCA-QoL (voice, sympathetic, throat/mouth, and problems with scar) and EORTC QLQ-C30 (physical function, role function, social function, global health status, fatigue, pain, appetite loss, and financial difficulties) questionnaires between the AS and the IS groups. Although patients undergoing IS showed gradual improvement in most QoL parameters and no significant changes in anxiety/depression levels, they consistently showed worse outcomes in terms of scar-related issues, role function, social function, global health status, fatigue, and financial difficulties compared to the AS group over long-term follow-up.
The results of the THYCA-QoL questionnaire demonstrated that during the initial assessment, the IS group experienced more discomfort compared to the AS group. This can be primarily attributed to the surgery-induced decline in QoL, as reflected in areas such as voice, throat/mouth, and the presence of neck scars. Similar results were seen in our prior research 22 and in research conducted in Korea and Japan, 13,14 wherein patients who underwent IS had poorer QoL compared to those monitored under AS. However, in the continuous observation process in this study, most of the QoL parameters for the IS group showed considerable improvements, reaching a level similar to that of the AS group 1 year after surgery. This finding is consistent with previous studies indicating that the postoperative QoL of patients with differentiated thyroid cancer can be comparable to that of the general population after a long-term follow-up. 23,24 Nevertheless, it is important to note that the parameter reflecting surgical scars in the IS group remained consistently worse than that of the AS group, indicating that scar-related issues may persist for an extended period.
Although the QoL of patients in the IS group can eventually become comparable to that of patients in the AS group, recovery takes about 1 year in the IS group, whereas QoL in the AS group remains stable throughout. Furthermore, our analysis of the EORTC QLQ-C30 indicated that patients who underwent IS showed poorer performance in physical function, role function, social function, global health status, fatigue, appetite loss, and financial difficulties compared to those who underwent AS. This suggests that IS may have a greater impact on patients' daily and social activities. Although gradual improvements were noted in certain aspects over time, there remained a difference compared to patients who underwent AS. It is worth exploring whether this long-term difference is related to surgical scars or other factors. However, currently, there is no clear threshold to define clinically significant differences in the assessment of QoL parameters. Therefore, further discussion is needed to determine which patients require additional interventions such as psychological counseling.
The evaluation of psychological status is crucial and can significantly influence treatment decisions and outcomes. Previous research has suggested that patients who underwent AS had better psychological status compared to those who underwent IS. 14,25,26 However, in our study, the results of the HADS questionnaire showed that both the AS and IS groups had low average anxiety/depression scores during the initial evaluation, and there was no significant difference between the groups. Additionally, the proportion of patients who showed positive results for anxiety/depression was also similar between the groups. While the AS group had slightly higher scores in the psychological domain of the THYCA-QoL questionnaire, the difference in scores between the two groups was minimal and fell within the same grade based on the evaluation criteria. Throughout the follow-up period, the psychological status of patients in both the AS and IS groups remained relatively stable, with no statistically significant difference.
Additionally, a post hoc secondary analysis showed that, in the AS group, patients with Bethesda V/VI FNAB results did not differ significantly in QoL and psychological status from those who did not undergo FNAB or had uncertain FNAB results (see details in Supplementary Table S9). Furthermore, there were no significant differences in the QoL and psychological status among patients who underwent different surgical procedures (lobectomy or total thyroidectomy) in the IS group (see details in Supplementary Table S10). However, it is important to note that owing to a lack of preoperative psychological assessment data for patients in the IS group in this study, it was not possible to determine whether the preoperative psychological status influenced treatment selection or whether the psychological state of the patients changed after surgery. Further investigation into this aspect is warranted.
This study has some limitations that should be considered. First, most previous studies involved low-risk PTMCs that were confirmed through FNAB. In contrast, this study primarily focused on highly suspicious thyroid nodules that were assessed through ultrasound, which may have a slight misdiagnosis rate for malignancy. Additionally, the duration of this study might have been insufficient to observe differences in the psychological health aspects of the patients. Therefore, future studies with longer follow-up periods are warranted to assess the long-term effects on the psychological status of these patients.
Moreover, the results could potentially be influenced by self-selection bias, given that the patients were not randomly assigned owing to ethical considerations. To accurately evaluate the impact of different treatment options on patients, this study excluded patients who experienced disease progression or changed treatment preferences in the AS group and those who underwent secondary surgery in the IS group. Alterations in these conditions may also influence the QoL and psychological state of the patients. Therefore, further in-depth research and exploration are warranted for a detailed understanding in this regard.
Conclusions
Over a long-term follow-up period, patients with highly suspicious thyroid nodules who choose IS have poorer QoL compared to those who choose AS. Although the situation may gradually improve, challenges related to surgical scars, role function, social function, global health status, fatigue, and financial difficulties persist, continually affecting the daily life and social activities of affected patients. In light of these findings, AS emerges as a reasonable treatment option for patients with highly suspicious thyroid nodules.
Footnotes
Acknowledgment
The authors thank Mr. Yanlong Li for providing statistical guidance.
Authors' Contributions
C.L.: conceptualization, methodology, validation, formal analysis, data curation, visualization, writing (original draft); H.Z.: investigation, formal analysis, data curation; Y. Lu: investigation, formal analysis, data curation; Y.X.: conceptualization, methodology, writing (review and editing); Y.C.: investigation, data curation; L.Z.: investigation, data curation; Y.Z.: investigation, data curation; L.G.: investigation, data curation; Y. Liu: conceptualization, methodology, writing (review and editing); H.L.: investigation, data curation; Z.K.: data curation; S.L.: data curation; Q.S.: data curation; X.L.: conceptualization, investigation, methodology, project administration, supervision, funding acquisition, writing (review and editing).
Author Disclosure Statement
The authors declare that this research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Funding Information
This work was supported by the Non-profit Central Research Institute Fund of the Chinese Academy of Medical Sciences (grant No. 2019XK320011) and the National High Level Hospital Clinical Research Funding (grant No. 2022-PUMCH-B-003).
Supplementary Material
Supplementary Table S1
Supplementary Table S2
Supplementary Table S3
Supplementary Table S4
Supplementary Table S5
Supplementary Table S6
Supplementary Table S7
Supplementary Table S8
Supplementary Table S9
Supplementary Table S10
Supplementary Figure S1
Supplementary Figure S2
Supplementary Figure S3
Supplementary Figure S4
Supplementary Figure S5
Supplementary Figure S6
