Abstract
Background:
The value of serum thyroglobulin (Tg) and antithyroglobulin antibody (ATg) for papillary thyroid carcinoma (PTC) surveillance after lobectomy was investigated. We aimed to examine the association between postlobectomy serum Tg/ATg and PTC structural recurrence and to define applicable values for stratification.
Methods:
PTC patients who underwent lobectomy with adequate serum Tg/ATg data during 2000–2014 were selected. Predictive classifiers of recurrence using random forest were established combining different variables related to serum Tg (ATg-negative patients) or ATg (ATg-positive patients). Cutoff values were determined with receiver operating characteristic curves when applicable. Kaplan–Meier curve and Cox regression were performed to examine the predictive value of elevated Tg/ATg.
Results:
Of 1451 patients enrolled, 66 (6.3%) of 1050 patients in the ATg-negative group and 26 (6.5%) of 401 patients in the ATg-positive group developed recurrence. The established classifier of serum Tg (n = 1050) showed a favorable association with recurrence (AUC = 0.81), while that of serum ATg did not (AUC = 0.72). The optimal cutoff values of the first Tg (FTg, measured 6–12 months after lobectomy) and last Tg (LTg, measured most recently) were 5.3 and 11.0 ng/mL, respectively. Patients with elevated LTg had a significantly higher recurrence rate than patients with normal LTg (23.5% vs. 4.4%, p < 0.05). Patients with elevated FTg had significantly lower recurrence-free survival rates than patients with normal FTg among all ATg-negative patients, low-risk patients, and intermediate- to high-risk patients (according to the American Thyroid Association initial risk stratification) (n = 1050, 583, and 467, all p < 0.05). Multivariate analysis indicated that patients with elevated FTg had about twice the risk of recurrence of those with normal FTg (hazard ratio = 2.052).
Conclusions:
Postlobectomy serum Tg has favorable value for predicting recurrence in PTC patients, and reasonable thresholds could identify patients at higher risk for recurrence during follow-up.
Introduction
Differentiated thyroid carcinoma (DTC) arises from thyroid follicular epithelial cells and accounts for >95% of all thyroid malignancies. DTC retains some functions of normal thyroid follicular cells, such as thyroglobulin (Tg) secretion and iodine uptake, which have been applied in its diagnosis and treatment (1). Serum Tg measurement is one of these common applications because it reflects, to a certain degree, the amount of Tg released (2). However, as serum Tg levels can be elevated in most thyroid diseases and are insensitive and nonspecific for thyroid cancer, their current application is limited to surveillance after total thyroidectomy (TT) and radioactive iodine (RAI) ablation rather than the evaluation of patients with residual thyroid tissue (3,4).
Papillary thyroid carcinoma (PTC) is the most frequent subtype of DTC with the most indolent biological behavior (2). With increasingly conservative management approaches to the treatment of PTC, lobectomy is considered a reliable option for properly selected patients (1,5,6). Despite the low mortality risk, precise and reliable life-long surveillance is still important for these patients. Some previous studies have suggested that higher serum Tg levels can similarly indicate recurrent disease in postlobectomy patients, which facilitates a response-to-therapy assessment for dynamic evaluation in patients after lobectomy (4,7,8). However, the prognostic value of postlobectomy serum Tg has always been questioned, as evidence is lacking that an association could be established between serum Tg and recurrence when residual normal thyroid exists (2,9,10).
Theoretically, as already mentioned, the situation can be more complicated after lobectomy, and many factors related to residual thyroid tissue may influence serum Tg levels. Thus, to comprehensively evaluate the prognostic value of postlobectomy serum Tg, in addition to the basal Tg level, the dynamic trends and range of Tg levels as well as serum antithyroglobulin antibody (ATg) when present should also be examined to define their potential association with recurrent disease.
We designed this study using machine learning to comprehensively examine the association between different variables related to postlobectomy serum Tg/ATg and the structural recurrence of PTC and to define a feasible measurement of these serum markers for the risk stratification of patients with PTC after thyroid lobectomy.
Materials and Methods
After approval by the institutional review board, this retrospective cohort study included patients who underwent lobectomy for PTC at the National Cancer Center, Cancer Institute and Hospital, Chinese Academy of Medical Sciences, between 2000 and 2014. Patients with preoperative evidence of distant metastasis, positive surgical margins, incomplete Tg or ATg data, or elevated thyrotropin (TSH) levels (>4 mIU/L) during follow-up were excluded. If preoperative ultrasound indicated the presence of suspicious thyroid nodules in the contralateral lobe, fine-needle aspiration cytology was performed. Only patients with unilateral malignancy were included. The patient demographics and characteristics of the tumors were obtained from the institutional database.
The size of the tumor was determined based on pathology reports. The presence of minor extrathyroidal extension and gross extrathyroidal extension (gETE) was identified in pathological reports and surgical findings, respectively. Hashimoto thyroiditis (HT) was diagnosed when the pathology report documented lymphocytic thyroiditis. The patients were staged according to the American Joint Committee on Cancer staging system (8th edition). The initial risk stratification was performed according to the 2015 American Thyroid Association (ATA) guidelines. The primary endpoint of the study was structural recurrence. Regional and local recurrences were proven by either cytology or pathology, while distant metastasis was determined by imaging. Recurrence-free survival (RFS) was calculated from the time of surgery for PTC to recurrence.
All patients underwent TSH suppression therapy after surgery. Serum Tg and ATg levels were measured every 6 months within the first 5 years and every 12 months thereafter. Because the interval between surgery and the last follow-up differed among patients, the study evaluated no more than five follow-up data points per patient, spaced at equal time intervals of no more than two years. Every patient had at least two follow-up data points. Serum Tg and ATg concentrations were determined using an automated electrochemiluminescence immunoassay (Cobas e 601; Roche Diagnostics, Mannheim, Germany). The minimum detectable Tg and ATg concentrations were 0.04 ng/mL and 10 IU/mL, respectively. Positive ATg was defined as a level exceeding 60 IU/mL. The study cohort was divided into ATg-negative and ATg-positive groups, in which serum Tg and ATg levels, respectively, were analyzed.
Serum Tg (ATg) levels during follow-up were defined as follows: FTg (FATg), PTg (PATg), and LTg (LATg) represented the Tg (ATg) levels measured at the first (6–12 months after surgery), penultimate, and last follow-up, respectively; a deviation was any remaining Tg (ATg) level that fell outside the range formed by FTg (FATg) and LTg (LATg). In the ATg-negative group, the predictive classifier of recurrence was based on random forest (RF) and incorporated the following variables: FTg; LTg; LTg-FTg (LTg minus FTg); LTg/FTg (LTg divided by FTg); LTg-PTg (LTg minus PTg); LTg/PTg (LTg divided by PTg); peak (a deviation 20% higher than the higher of FTg and LTg); valley (a deviation 20% lower than the lower of FTg and LTg); high deviation (the highest deviation minus the higher of FTg and LTg); low deviation (the lower of FTg and LTg minus the lowest deviation); and standard deviation (SD).
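The variable definitions above can be illustrated with a minimal Python sketch. It is not the authors' code: the function name `tg_features`, the dictionary keys, the use of the population SD, the strict 20% comparisons, and the handling of a zero denominator or of patients with no deviation are all assumptions made for illustration.

```python
from statistics import pstdev


def tg_features(levels, tol=0.20):
    """Derive per-patient variables from chronologically ordered Tg levels (ng/mL).

    Hypothetical helper: names, tie handling, and defaults are assumptions.
    """
    if len(levels) < 2:
        raise ValueError("at least two follow-up measurements are required")
    ftg, ltg = levels[0], levels[-1]
    ptg = levels[-2]  # penultimate value; equals FTg when only two points exist
    hi, lo = max(ftg, ltg), min(ftg, ltg)
    middle = levels[1:-1]
    above = [x for x in middle if x > hi]  # deviations above the FTg-LTg range
    below = [x for x in middle if x < lo]  # deviations below the FTg-LTg range

    def ratio(a, b):
        # Assumption: an undefined ratio (zero denominator) is reported as NaN.
        return a / b if b > 0 else float("nan")

    return {
        "FTg": ftg,
        "LTg": ltg,
        "LTg-FTg": ltg - ftg,
        "LTg/FTg": ratio(ltg, ftg),
        "LTg-PTg": ltg - ptg,
        "LTg/PTg": ratio(ltg, ptg),
        "peak": any(x > hi * (1 + tol) for x in middle),
        "valley": any(x < lo * (1 - tol) for x in middle),
        "high_deviation": max(above) - hi if above else 0.0,
        "low_deviation": lo - min(below) if below else 0.0,
        "SD": pstdev(levels),  # assumption: population SD over all points
    }


features = tg_features([4.0, 10.0, 2.0, 5.0])
```

For the ATg-positive group, the same function could be applied to ATg values in IU/mL.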
Similarly, the variables of the ATg-positive group replaced the Tg data with the ATg data. The enrolled patients were randomly assigned to a training cohort and a testing cohort to build and evaluate the classifier. Data imbalance was handled by oversampling. To assess classifier discrimination, the area under the receiver operating characteristic (ROC) curve (AUC) was calculated. Continuous and categorical variables were compared across groups using the t-test and the Pearson χ² test, respectively. RFS rates were analyzed using Kaplan–Meier curves and the log-rank test. Univariate and multivariate Cox proportional hazards models were used to analyze the relationship between clinicopathological variables and RFS. Variables with p < 0.1 in the univariate analyses were selected for the multivariate analyses. All analyses were conducted using the Scikit-learn version 0.22 package in Python (Python Software Foundation). p < 0.05 was considered statistically significant.
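Two of the steps named above, oversampling and AUC, can be sketched in plain Python. The text does not specify the oversampling variant or whether it was restricted to the training cohort, so the random oversampling shown here and the hypothetical helper names `oversample_minority` and `auc` are assumptions. The AUC is computed from its rank interpretation: the probability that a randomly chosen patient with recurrence receives a higher score than a randomly chosen patient without.

```python
import random


def oversample_minority(rows, labels, seed=0):
    """Random oversampling: duplicate minority-class rows (drawn with
    replacement) until both classes have the same number of rows."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    idx = list(range(len(labels))) + extra
    return [rows[i] for i in idx], [labels[i] for i in idx]


def auc(scores, labels):
    """AUC as P(score of a positive > score of a negative); ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data only: four patients without recurrence, one with recurrence.
X, y = oversample_minority([[1.2], [0.8], [2.0], [3.1], [15.0]], [0, 0, 0, 0, 1])
toy_auc = auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
```

Oversampling is commonly applied to the training cohort only, so that duplicated patients do not appear in the testing cohort.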
Results
Baseline characteristics of the patients
Between 2000 and 2014, a total of 1451 patients (1050 in the ATg-negative group and 401 in the ATg-positive group) were enrolled based on the inclusion and exclusion criteria. The median ages of the ATg-negative group and ATg-positive group were 42 and 41 years (range, 8–77 and 11–71 years), respectively. The female/male ratios were 2.5 and 11.5, and the median tumor sizes in the resected lobe were 1.0 and 0.8 cm, respectively (Table 1). Compared with ATg-negative patients, ATg-positive patients were more likely to be female and have HT.
Table 1. Study Cohort Characteristics
ATA, American Thyroid Association; ATg, antithyroglobulin antibody; gETE, gross extrathyroidal extension.
Dynamics of serum Tg and ATg levels during follow-up
With a median follow-up of 72.0 (range, 14–135) months and 71.7 (range, 12–124) months, 66 (6.3%) and 26 (6.5%) patients in the ATg-negative group and ATg-positive group experienced recurrence, including 9 and 1 distant metastasis, respectively. None of the patients died from PTC. In the ATg-negative group, the mean FTg and LTg levels were 12.3 ± 54.4 ng/mL (median, 3.4 ng/mL; range, 0–872 ng/mL) and 9.3 ± 59.1 ng/mL (median, 2.5 ng/mL; range, 0–1356 ng/mL), respectively. A mean TSH ≤0.5 mIU/L, 0.5 mIU/L <TSH ≤2 mIU/L, and 2 mIU/L <TSH ≤4 mIU/L were observed in 320 (30.5%), 624 (59.4%), and 106 (10.1%) patients, respectively. The Tg levels stratified by TSH are shown in Figure 1A. During the follow-up period, Tg declined >20% in 45.7% of patients, remained stable in 24.6%, and increased in 29.7%.
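The three trend categories reported above can be expressed as a small rule. This sketch assumes that the change is measured from FTg to LTg, that "increased" means a rise of more than 20%, and that everything in between is "stable"; only the 20% threshold for decline is stated explicitly in the text, and `tg_trend` is a hypothetical name.

```python
def tg_trend(first, last, tol=0.20):
    """Classify the relative change from the first to the last level.

    Assumptions: symmetric 20% band; change measured relative to the first value.
    """
    if first <= 0:
        raise ValueError("relative change is undefined for a non-positive baseline")
    change = (last - first) / first
    if change < -tol:
        return "declined"
    if change > tol:
        return "increased"
    return "stable"


trends = [tg_trend(10.0, 7.0), tg_trend(10.0, 11.0), tg_trend(10.0, 13.0)]
```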

Figure 1. Mean Tg stratified by TSH in the ATg-negative group.
Patients with recurrence tended to have a higher LTg than patients without recurrence (59.5 ± 195.6 ng/mL vs. 6.6 ± 29.1 ng/mL, p < 0.001), while no significant difference was found in FTg. In patients with recurrence, serum Tg was relatively stable overall and rose sharply before recurrence, whereas it decreased in patients without recurrence (Fig. 1B). In the ATg-positive group, the mean FATg and LATg levels were 430.3 ± 729.8 IU/mL (median, 220.0 IU/mL; range, 10–4000 IU/mL) and 340.2 ± 732.6 IU/mL (median, 112.7 IU/mL; range, 10–4000 IU/mL), respectively. During the follow-up period, ATg declined >20% in 54.1% of patients, remained stable in 23.7%, and increased in 22.2%. There was no significant difference in FATg or LATg levels between patients with and without recurrence (432.7 ± 629.1 IU/mL vs. 441.0 ± 754.9 IU/mL and 385.8 ± 481.6 IU/mL vs. 358.4 ± 796.8 IU/mL, respectively; both p > 0.1). In patients with recurrence, serum ATg trended downward and rose slightly before recurrence, whereas patients without recurrence showed an overall downward trend (Fig. 1C).
Performance of the classifier based on serum Tg in the ATg-negative group
After random sampling, 324 patients (with 59 recurrences) and 106 patients (with 7 recurrences) were included in the training cohort and the testing cohort, respectively. The classifier was developed in the training cohort using RF. In the testing cohort, the AUC was 0.81 (confidence interval [CI], 0.66–0.96) (Fig. 2A). The relative importance of each variable for separating patients with recurrence from patients without recurrence is shown in Figure 2B. Based on the ROC curve, the classifier performed best at a cutoff value of 31%. At this cutoff, 22% of the testing cohort was classified as having abnormal Tg, and the remaining patients were classified as having normal Tg. Patients with abnormal Tg levels had a significantly higher recurrence rate (5/23, 21.7%) than patients with normal Tg levels (2/83, 2.4%) (sensitivity 71.4%; specificity 81.8%; accuracy 81.1%).
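The sensitivity, specificity, and accuracy reported above follow directly from the counts in the testing cohort. The short check below rebuilds the 2×2 table from the stated figures (106 patients, 7 recurrences, 5/23 recurrences among abnormal Tg, 2/83 among normal Tg); the function name `diagnostic_metrics` is a hypothetical helper, not part of the study's code.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Standard 2x2 diagnostic metrics from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "ppv": tp / (tp + fp),  # recurrence rate among abnormal-Tg patients
        "npv": tn / (tn + fn),
    }


# Abnormal Tg: 23 patients, 5 with recurrence -> tp = 5, fp = 18
# Normal Tg:   83 patients, 2 with recurrence -> fn = 2, tn = 81
metrics = diagnostic_metrics(tp=5, fp=18, fn=2, tn=81)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

The computed values (71.4%, 81.8%, and 81.1%) agree with those in the text, and the positive predictive value equals the 21.7% recurrence rate in the abnormal Tg group.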

Figure 2. Differentiation of recurrence based on the classifier in the ATg-negative group.
According to the classifier, LTg and its difference/ratio with PTg had the best predictive value, followed by SD, FTg, and its difference/ratio with LTg (Fig. 2B). Considering basal and dynamic evaluation, LTg and FTg were then analyzed with ROC curves in the overall cohort. The optimal cutoff value of LTg was identified as 11.0 ng/mL with a sensitivity of 36.4% and specificity of 92.1% (Fig. 3A). When stratified by this value, the recurrence rates were 23.5% and 4.4% in the LTg ≥11.0 ng/mL group (n = 102) and the LTg <11.0 ng/mL group (n = 948), respectively (p < 0.001) (Table 2). According to another ROC curve, the optimal cutoff value of FTg was 5.3 ng/mL (Fig. 3A). There was a significant difference between the FTg ≥5.3 ng/mL group (n = 338) and the FTg <5.3 ng/mL group (n = 712) with respect to the 10-year RFS rates (76.2% vs. 93.3%, p < 0.001) (Fig. 3B).
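The text does not state which criterion defined the "optimal" point on the ROC curve. One common choice is Youden's index, J = sensitivity + specificity − 1, and the sketch below uses it purely as an assumption. The function `youden_cutoff` and the toy values are hypothetical and are not study data.

```python
def youden_cutoff(values, labels):
    """Return (cutoff, sensitivity, specificity) maximizing Youden's J.

    A patient is test-positive when value >= cutoff; labels: 1 = recurrence.
    Ties in J are resolved in favor of the lowest cutoff.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present")
    best = None
    for c in sorted(set(values)):
        tp = sum(1 for v, y in zip(values, labels) if y == 1 and v >= c)
        tn = sum(1 for v, y in zip(values, labels) if y == 0 and v < c)
        sens, spec = tp / n_pos, tn / n_neg
        j = sens + spec - 1
        if best is None or j > best[0]:
            best = (j, c, sens, spec)
    return best[1:]


# Toy Tg levels (ng/mL), invented for illustration.
tg = [1.0, 2.0, 3.0, 4.0, 6.0, 12.0, 15.0, 2.5]
recurred = [0, 0, 0, 0, 1, 1, 1, 1]
cutoff, sens, spec = youden_cutoff(tg, recurred)
```

A criterion of this kind weights sensitivity and specificity equally. A cutoff with high specificity and modest sensitivity, as reported for LTg, is the kind of trade-off such a rule can produce when most patients without recurrence have low values.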

Figure 3. Evaluation of prognostic performance in the overall cohort.
Table 2. Predictive Performance of Last Thyroglobulin (11.0 ng/mL as the Cutoff) in the Overall Cohort
LTg, last thyroglobulin.
After further stratification into different risk groups (according to the ATA initial risk stratification), elevated FTg (≥5.3 ng/mL) was associated with significantly worse RFS in both the low-risk and intermediate- to high-risk subgroups (both p < 0.05). The RFS difference between the low-risk and intermediate- to high-risk subgroups was significant in patients with normal FTg (<5.3 ng/mL, p < 0.05) but not in patients with elevated FTg (≥5.3 ng/mL, p = 0.07) (Fig. 3C). Based on the univariate analysis, sex, age, primary tumor size, gETE, N stage, and elevated FTg were entered into the multivariate analysis. FTg ≥5.3 ng/mL (hazard ratio [HR] = 2.052 [CI 1.231–3.421], p = 0.006) was significantly associated with lower RFS (Table 3).
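The RFS rates compared above are Kaplan–Meier estimates. As a minimal illustration of the product-limit calculation, the sketch below estimates the recurrence-free proportion from follow-up times and event indicators. The function `kaplan_meier` and the toy data are hypothetical, and a dedicated survival-analysis library would normally be used for the log-rank test and Cox regression.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of recurrence-free survival.

    times:  follow-up in months; events: 1 = recurrence, 0 = censored.
    Returns a list of (time, survival probability) at each recurrence time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        recurrences = leaving = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            recurrences += data[i][1]
            leaving += 1
            i += 1
        if recurrences:
            surv *= 1 - recurrences / at_risk
            curve.append((t, surv))
        at_risk -= leaving  # events and censored patients both leave the risk set
    return curve


# Toy follow-up data, invented for illustration.
curve = kaplan_meier([6, 12, 12, 24, 36], [1, 0, 1, 0, 1])
```

In this toy example the estimate steps down only at months 6, 12, and 36, because censored patients reduce the number at risk without lowering the curve.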
Table 3. Cox Proportional Hazards Model for Recurrence-Free Survival
CI, confidence interval; FTg, first thyroglobulin; HR, hazard ratio.
Performance of the classifier based on ATg in the ATg-positive group
After random assignment, 115 patients (with 21 recurrences) and 39 patients (with 3 recurrences) were included in the training cohort and the testing cohort, respectively. The predictive performance of ATg during follow-up was evaluated using the same approach. The AUC value was 0.72 [CI 0.53–0.91] in the testing cohort (Fig. 4A). The relative importance of a variable for separating patients with recurrence from patients without recurrence is shown in Figure 4B. Further evaluation was not performed considering the unfavorable predictive value according to the AUC.

Figure 4. Differentiation of recurrence based on the classifier in the ATg-positive group.
Discussion
In this study, we examined the role of serum Tg and ATg in the follow-up of PTC patients treated with lobectomy in a large cohort. Serum Tg was found to have reasonable value for surveillance in patients after lobectomy (AUC >0.8), while the value of ATg was not significant. At the ROC-derived cutoff, over a fifth of the patients classified as having abnormal Tg in the testing cohort had recurrence, compared with about 2% of the patients with normal Tg. To facilitate basal and dynamic evaluations, we selected the basal Tg (FTg, 6–12 months after lobectomy) and the most recent Tg (LTg) as potential markers. The optimal cutoff values of FTg and LTg were determined to be 5.3 and 11.0 ng/mL, respectively, by ROC analysis. When applied for stratification, both cutoff values differentiated patients well. Notably, at basal evaluation, FTg status distinguished different recurrence risks in both the low-risk and intermediate- to high-risk subgroups, whereas the difference in recurrence between the low-risk and intermediate- to high-risk subgroups lost significance among patients with elevated FTg. This indicates that, in the study cohort, FTg identified patients at risk of recurrence better than the current risk stratification system.
As the only potential biomarker of PTC, serum Tg has been analyzed to establish an association with recurrence in postlobectomy patients. However, its utility has not been well established. Previous studies have shown that several change patterns in serum Tg may be a clue for recurrence. For example, Harvey et al. reported that a consistent increase was associated with a higher probability of recurrence (7). Patients with a Tg increase >20% tended to have more recurrent disease than those without (80% vs. 22%, p < 0.01) (11). However, in a study by Ritter et al., serum Tg declined ≥1 ng/mL in 45% of postlobectomy patients, remained stable in 36%, and increased in 18%. Though half of the patients with recurrence had a steady rise in serum Tg levels during follow-up, this was also observed in 34% of the patients with no recurrence (10).
The changes in Tg/TSH ratios in postlobectomy patients were similar to those in Tg. Park et al. demonstrated that the proportion of low-risk PTC patients who underwent lobectomy with a Tg/TSH ratio <2.5 decreased gradually, while the proportion of patients with Tg/TSH ratios >5 increased gradually. The Tg/TSH ratios in 19 patients with recurrence did not differ significantly (9). Momesso et al. suggested a nonstimulated Tg threshold of ≥30 ng/mL for the definition of biochemical incomplete response (4). In a cohort of patients treated with lobectomy, Tg ≥30 ng/mL was observed in 6.4% (n = 12/187) of patients and was associated with a higher rate of recurrent disease of 50%, compared with 0% in patients with Tg <30 ng/mL (12). Recently, several studies have evaluated whether 30 ng/mL is a valid threshold.
The proportion of variation in patients who underwent lobectomy was confirmed to be only 32%, and the risk of recurrence of patients with Tg ≥30 ng/mL did not differ significantly from that of patients with Tg <30 ng/mL (13). Ritter et al. demonstrated an overlap between Tg levels in patients with and without recurrence (22.5 ± 22.3 ng/mL vs. 11.3 ± 13.8 ng/mL, nonsignificant), with no threshold that could distinguish between the two groups (10). Cho et al. speculated that the nonstimulated Tg cutoff value of 30 ng/mL is based on the assumption that this value constitutes ∼50% of the amount of Tg (20–60 ng/mL) that would be expected from a normal thyroid gland, which is not evidence based (13). Although consistently increasing ATg levels are also classified as a biochemical incomplete response, studies related to the performance of ATg in postlobectomy patients are scarce, and the clinical importance of ATg is still unclear (4,10).
The main reasons for the inconsistent findings in previous studies may be the low rate of recurrence and the confounding factors caused by residual thyroid tissue, both of which were addressed in this study. First, a large cohort allows a better understanding of prognosis, especially in postlobectomy patients, who often have a good prognosis. In addition, the patients in this study had a relatively higher recurrence rate (6.3% vs. 3.1–7.2%) than those in previous studies (8,10,13,14), because nearly half of them had an intermediate to high risk for recurrence. The increased number of events improved the reliability and feasibility of the analysis. Although thyroid lobectomy for relatively advanced PTC is still debated, the tumor characteristics do not influence the present analysis when a structural complete response is achieved.
Moreover, our previous studies reported a similar oncological outcome for lobectomy compared with TT (15,16). Second, unlike in patients treated with TT and RAI ablation, the remaining thyroid tissue in patients who underwent lobectomy prevents postoperative Tg from falling to zero. The TSH level at the time of measurement, the basal level of Tg measured in the first 6–12 months after lobectomy, and coexistent HT all influenced the performance of Tg/ATg during follow-up. To our knowledge, this is hard to analyze using common statistical methods. The application of machine learning allowed a comprehensive consideration of variables related to Tg/ATg. The predictive value of the whole classifier may better support the use of a single serum Tg level and minimize the selection bias of a single measurement.
This study has several limitations. First, because follow-up periods differed, the intervals between measurements were not uniform, and normal fluctuations may have influenced serum Tg levels. However, the analysis of serum Tg rests on three main elements: the basal value, the last value, and the change between them. The first two were included for the overall cohort, and the third can be derived from the remaining data. Although data obtained at similar intervals during follow-up would be better to analyze, the method used in this retrospective study with a large sample size is acceptable, and the survival validation indicated that the Tg changes caused by recurrence generally exceed normal fluctuation. Second, the influence of TSH was not adjusted for. Serum Tg levels change with TSH, and some studies adjusted for this variable using Tg/TSH (9); however, the exact relationship between the two values is not clear in patients after lobectomy, and Tg/TSH may overadjust the true Tg status. In this study, we included only patients without elevated TSH levels (≤4 mIU/L). Despite these limitations, serum Tg data, including values and dynamics, still distinguished patients who tend to have higher recurrence rates.
Despite the presence of residual thyroid tissue, the measurement of serum Tg has favorable value to predict and detect the presence of recurrence in patients after lobectomy for PTC. The proposed classifier and the cutoff value of FTg/LTg can help guide the frequency of monitoring and enable informed clinical decision making.
Footnotes
Authors' Contributions
S.X. participated in the design of the study, data collection, and article writing. H.H. participated in the design of the study and article writing. X.Z. participated in article writing and statistical analysis. Y.H. participated in the data collection and helped to draft the article. B.G. participated in the statistical analysis and quality control of data. J.Q. participated in data collection. X.W. and S.L. participated in the design of study and article editing. Z.X. participated in data analysis and interpretation and article review. J.L. participated in the design of the study and helped to revise the article. All authors read and approved the final article.
Statement of Ethics
The study was approved by the ethics committee of the Cancer Hospital, Chinese Academy of Medical Sciences.
Author Disclosure Statement
No competing financial interests exist.
Funding Information
The study was funded by the CAMS Innovation Fund for Medical Sciences (CIFMS) (Grant No. 2016-I2m-1-002) and Beijing Hope Run Special Fund of Cancer Foundation of China (Grant No. LC2018A26).
