Abstract
Background:
The utility of serum thyroglobulin (Tg) measurement following partial thyroidectomy or total/near-total thyroidectomy without radioactive iodine (RAI) for differentiated thyroid cancer is unclear. This systematic review examines the diagnostic accuracy of serum Tg measurement for persistent, recurrent, and/or metastatic cancer in these situations.
Methods:
Ovid MEDLINE, Embase, and Cochrane Central were searched in October 2021 for studies on Tg measurement following partial thyroidectomy or total/near-total thyroidectomy without or before RAI. Quality assessment was performed, and evidence was synthesized qualitatively.
Results:
Thirty-seven studies met inclusion criteria. Four studies (N = 561) evaluated serum Tg measurement following partial thyroidectomy, five studies (N = 751) evaluated Tg measurement following total/near-total thyroidectomy without RAI, and 28 studies (N = 7618) evaluated Tg measurement following total or near-total thyroidectomy before RAI administration. Following partial thyroidectomy, Tg measurement was not accurate for diagnosing recurrence or metastasis, or estimates were imprecise. Following total/near-total thyroidectomy without RAI, evidence was limited due to few studies with very low rates of recurrence or metastasis, but indicated that Tg levels were usually stable and low.
For Tg measurements before RAI administration, diagnostic accuracy for metastatic disease or persistence varied, although sensitivity appeared high (but specificity low) at a cutoff of >1 to 2.5 ng/mL. However, applicability to patients who do not undergo RAI is uncertain because patients selected for RAI are likely to represent a higher risk group. The evidence was very low quality for all scenarios. All studies had methodological limitations, and there was variability in the Tg thresholds evaluated, patient populations, outcomes assessed, and other factors.
Conclusions:
Very limited evidence suggests low utility of Tg measurement for identifying recurrent or metastatic disease following partial thyroidectomy. Following total/near-total thyroidectomy, Tg levels using a cutoff of 1–2.5 ng/mL might identify patients at low risk for persistent or metastatic disease. Additional research is needed to clarify the role of Tg measurement in these settings, determine optimal Tg thresholds, and determine appropriate measurement intervals.
Introduction
Thyroglobulin (Tg) is a protein produced by the thyroid follicular cells roughly in proportion to the amount of thyroid gland tissue present (1,2). In patients with differentiated thyroid cancer (DTC) who undergo total or near-total thyroidectomy and receive radioactive iodine (RAI) for remnant ablation or therapy, postoperative serum Tg levels are monitored to identify persistent or recurrent disease, to detect disease progression, and to provide prognostic information (3). However, the role of postoperative Tg measurement in patients who undergo partial thyroidectomy, in whom noncancerous thyroid tissue is not removed, is uncertain. Similarly, the role of Tg measurement in patients who have undergone total or near-total thyroidectomy but have not received RAI is uncertain, because Tg-producing noncancerous residual thyroid tissue will be present.
A 2015 American Thyroid Association guideline recommends that periodic serum Tg measurement on thyroid hormone therapy be considered during follow-up of patients with DTC who have undergone less than total thyroidectomy and in patients who have had a total thyroidectomy but who have not received postoperative RAI (3). Although the guideline states that optimal Tg cutoff levels to distinguish normal residual thyroid tissue from persistent thyroid cancer are unknown, it notes that rising Tg values over time may indicate recurrence. To inform an updated guideline, the American Thyroid Association commissioned a systematic review examining Tg testing of patients following partial thyroidectomy or total/near-total thyroidectomy without RAI. The purpose of this systematic review is to address the utility of Tg testing in persons with DTC following (a) partial thyroidectomy or (b) total or near-total thyroidectomy who have not received postoperative RAI.
Methods
In conjunction with the American Thyroid Association's Guidelines Task Force for the management of adult patients with DTC, we determined the Key Question for this review: In adult patients with DTC, what is the accuracy of serum Tg measurement for diagnosing or predicting persistent, recurrent, or metastatic disease following (a) partial thyroidectomy or (b) total or near-total thyroidectomy without or before RAI remnant ablation? This review is reported in accordance with the Preferred Reporting Items for Systematic reviews and Meta-Analyses (PRISMA) 2020 statement (4).
Search strategies
We searched the Cochrane Central Register of Controlled Trials, Elsevier Embase®, and Ovid MEDLINE® (through October 2021) for relevant studies. Search strategies utilized keywords and terms for DTC and Tg measurement (detailed search strategies are shown in Supplementary Appendix SA). Searches were supplemented by reference list review of relevant articles.
Study selection
Abstracts and full-text articles were evaluated using prespecified eligibility criteria. The population was adults with DTC who underwent Tg measurement following partial thyroidectomy or following total or near-total thyroidectomy without RAI. We also included studies of patients who underwent total or near-total thyroidectomy and had Tg tested before RAI administration, as few studies evaluated patients who did not receive RAI, and Tg measurement before RAI ablation may provide some information about the usefulness of Tg monitoring. We included randomized controlled trials, nonrandomized clinical trials, and cohort studies (retrospective or prospective) that reported diagnostic accuracy of Tg measurement for detection of residual disease, DTC recurrence, and/or metastatic disease, as these outcomes were defined in the studies. Inclusion was restricted to English language studies, and studies published only as conference abstracts were excluded. We did not restrict inclusion based on the reference standard used.
Data abstraction
We extracted the following data from studies: author, year, country, study dates, data collection method (retrospective or prospective), sample size, age, percent female, DTC type and stage, surgery type, RAI use, thyroid stimulation status at time of Tg measurement, timing of Tg measurement, Tg antibody status, duration of follow-up, reference standard, proportion experiencing outcomes, and results (sensitivity, specificity, positive predictive value, and negative predictive value). Data were extracted by one investigator and verified by a second.
Assessing methodological quality of individual studies
The quality (risk of bias) of each study was rated as “good,” “fair,” or “poor” using predefined criteria for studies on diagnostic accuracy adapted from the U.S. Preventive Services Task Force criteria (Supplementary Appendix SB). Studies rated “good quality” are generally considered valid, with unbiased patient selection methods; low attrition or missing data; prespecified Tg cutoffs; no data discrepancies; and use of an appropriate reference standard in all patients, interpreted without knowledge of the Tg result. Studies rated “poor quality” have a significant flaw or combination of flaws that may invalidate the results. These include biased selection methods; high attrition or missing data; no prespecified Tg cutoff; significant data discrepancies; inadequate reference standard; inconsistent application of the reference standard; or interpretation of the reference standard with knowledge of the Tg results.
Studies rated “fair quality” have some methodological limitations but not enough to warrant a “poor” rating. We broadly defined an appropriate reference standard as one that utilized some combination of pathological findings, imaging, iodine scan, and/or clinical follow-up; a reference standard based solely on ultrasonography or iodine scan or based solely or primarily on Tg measurement was considered inadequate.
Synthesizing the evidence
The evidence was synthesized qualitatively; we planned to conduct meta-analysis if there were sufficient poolable data, but this was not done because few studies were identified for the key populations (partial thyroidectomy and total or near-total thyroidectomy without RAI ablation), with methodological limitations in the studies and differences in populations studied, Tg monitoring strategies, and thresholds used to define an elevated Tg level. The overall quality of evidence (indicating the confidence in findings) was graded “high,” “moderate,” “low,” or “very low” using GRADE methods, based on methodological limitations, consistency, directness, precision, and reporting bias (5,6).
Results
Literature searches
Database searches resulted in 843 potentially relevant articles (Fig. 1). After dual review of abstracts and titles, 96 articles were selected for full-text review. Of these, 37 met inclusion criteria (7–43). Among the 59 excluded articles, the most common reasons for exclusion following full-text review were ineligible population (e.g., underwent total or near-total thyroidectomy and received RAI, or mixed population of patients who did and did not receive RAI; 27 studies), diagnostic accuracy not reported (15 studies), ineligible outcome (e.g., prediction of ablation success; 7 studies), or Tg not obtained before RAI or timing of Tg testing unclear (7 studies) (see Supplementary Appendix SC for full list of excluded studies with reasons for exclusions).

Fig. 1. Literature flow diagram.
Tg measurement following partial thyroidectomy
Four retrospective studies of unstimulated Tg measurement following partial thyroidectomy met inclusion criteria (Tables 1 and 2) (7,30,33,39). Tg measurement was performed every 3–6 months in 2 studies and at least 3 months after surgery and then annually in 1 study; one study (7) did not report timing of Tg measurement. Sample sizes ranged from 70 to 223 (N = 561). The procedure was lobectomy in two studies (30,33), lobectomy with or without isthmusectomy in one study (39), and a variety of partial thyroidectomy procedures (most commonly, unilateral lobectomy [36%], hemithyroidectomy [35%], and subtotal thyroidectomy [13.9%]) in one study (7). Mean or median age ranged from 35 to 53 years, and the proportion that was female ranged from 77% to 94%. In three studies, 96–100% of DTCs were papillary, and in one study (39), 80% were papillary. The majority of cancers were classified as T1 or Union for International Cancer Control/American Joint Committee on Cancer (AJCC) stage I or II.
Table 1. Studies of Thyroglobulin Testing: Population Characteristics
Reported as 13% in the journal publication, but author communication verified that the proportion of female was 87%.
AJCC, American Joint Committee on Cancer; FTC, follicular thyroid cancer; PTC, papillary thyroid cancer; RAI, radioactive iodine; Tg, thyroglobulin.
Table 2. Studies of Thyroglobulin Testing: Intervention and Test Characteristics
CT, computerized tomography; MRI, magnetic resonance imaging; PET, positron emission tomography; RAI, radioactive iodine; rhTSH, recombinant human thyrotropin; SPECT, single-photon emission computerized tomography; Tg, thyroglobulin; TgAb, thyroglobulin antibody; TSH, thyrotropin; US, ultrasound; WBS, whole body scan.
All studies excluded patients with Tg antibodies or reported a low proportion (11%) of patients with Tg antibody. The thyrotropin (TSH) level at the time of Tg measurement was <0.5 mIU/L in one study; one study reported median TSH level of 3.25 mIU/L; and two studies did not report the TSH level. One study focused on patients who underwent Tg measurement following partial thyroidectomy and before completion thyroid surgery (mean 3.2 months following initial surgery); outcomes were persistent disease and persistent or recurrent cervical lymph node metastatic disease based on pathological findings from completion surgery (7). In this study, completion thyroidectomy was performed to remove residual thyroid tissue as a definitive cancer therapy, not due to pathological findings on partial thyroidectomy. The other studies evaluated Tg measurement in unselected patients who underwent partial thyroidectomy (did not necessarily undergo completion surgery) and evaluated risk of persistence or recurrence, based on imaging and cytology/histology [two studies (30,39)] or an unreported reference standard [one study (33)].
Follow-up ranged from 6 months to 6.9 years. Two studies (7,39) were rated fair quality and two studies (30,33) were rated poor quality (Table 3). Methodological shortcomings included potential selection bias, assessment of clinical outcomes not blinded to Tg results, unclear or high attrition or missing data, and lack of prespecified Tg thresholds to define a positive result.
Table 3. Quality Assessment
Study reports 14 false-positive patients, which would result in specificity of 0.78 (51/65) rather than 0.80 as reported in study.
Discrepancy between reported diagnostic accuracy and data reported in study; diagnostic accuracy calculated from data in study.
One fair-quality study (n = 101) of patients who underwent completion surgery (89% with modified neck dissection) following partial thyroidectomy found 39% had residual thyroid cancer, and 40% had cervical lymph node metastasis (Table 4) (7). Tg >20 ng/mL before completion surgery was associated with a sensitivity of 0.44–0.47 for diagnosing residual thyroid cancer or lymph node metastasis and a specificity of 0.79–0.80. The positive predictive value was 0.57–0.60, and negative predictive value was 0.69.
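The predictive values above follow from sensitivity, specificity, and prevalence through Bayes' rule. The sketch below is illustrative only: the function name is hypothetical, and the pairing of inputs (sensitivity 0.44, specificity 0.80, prevalence 0.39) is an assumption drawn from the reported ranges, not a reconstruction of the study's 2 × 2 table.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) from sensitivity, specificity, and prevalence (Bayes' rule)."""
    tp = sensitivity * prevalence              # true-positive fraction of the cohort
    fn = (1 - sensitivity) * prevalence        # false-negative fraction
    fp = (1 - specificity) * (1 - prevalence)  # false-positive fraction
    tn = specificity * (1 - prevalence)        # true-negative fraction
    return tp / (tp + fp), tn / (tn + fn)


# Assumed inputs: one combination of the values reported for Tg >20 ng/mL.
ppv, npv = predictive_values(sensitivity=0.44, specificity=0.80, prevalence=0.39)
print(f"PPV = {ppv:.2f}, NPV = {npv:.2f}")  # PPV = 0.58, NPV = 0.69
```

Under these assumed inputs the results fall within the reported ranges (positive predictive value 0.57–0.60, negative predictive value 0.69).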
Table 4. Studies of Thyroglobulin Testing: Results
Ab, antibody; AUROC, area under the receiver operating characteristic curve; CI, confidence interval; SD, standard deviation; SE, standard error.
In the three other studies, the proportion of patients who experienced DTC recurrence ranged from 7.1% at 6–13 months to 8.5% at 6.9 years (30,33,39). Evidence on Tg accuracy for identifying patients with recurrence was limited. One study found a “rising” (undefined) Tg associated with sensitivity of 0.80 and specificity of 0.80 (positive predictive value 0.24 and negative predictive value 0.98), but there were only 5 cases of recurrence (39). Another study found that Tg levels were not associated with high sensitivity and specificity for recurrence at various thresholds (≥20% Tg increase associated with sensitivity of 0.74 and specificity of 0.08; ≥100% increase associated with sensitivity of 0.26 and specificity of 0.75) (30). Positive predictive values ranged from 0.07 to 0.09, and negative predictive values ranged from 0.76 to 0.92. The third study did not report diagnostic accuracy at 1 year, but found that Tg levels at that time did not differ between recurrence and nonrecurrence groups (22.5 ng/mL vs. 11.3 ng/mL, p = 0.16); at 2 years, 3 of 6 patients with recurrence had rising Tg levels (33).
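To illustrate why a sensitivity estimate based on only 5 recurrences is imprecise, the sketch below computes a Wilson 95% confidence interval. It assumes that a sensitivity of 0.80 with 5 cases corresponds to 4 of 5 recurrences detected; the interval is not reported in the study, and the function name is hypothetical.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score confidence interval for a binomial proportion."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width


# Assumption: sensitivity 0.80 with 5 recurrences means 4 of 5 were detected.
low, high = wilson_interval(successes=4, n=5)
print(f"Sensitivity 0.80, 95% CI {low:.2f} to {high:.2f}")
```

Under that assumption the interval runs from roughly 0.38 to 0.96, so the point estimate is compatible with both poor and excellent sensitivity.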
Tg measurement following total or near-total thyroidectomy without or before RAI
Tg following total or near-total thyroidectomy, without RAI
Five studies evaluated Tg measurement in patients who underwent total or near-total thyroidectomy without RAI (Tables 1 and 2) (9,14,24,27,40). All studies were retrospective, except for one (14). Sample sizes ranged from 47 to 290 (N = 751). Mean or median age ranged from 39 to 50 years. The proportion female ranged from 68% to 89%. Three studies reported that all or nearly all cancers were T1; one study (14) restricted inclusion to T1 and T2 tumors, but did not report the proportion of tumors by stage, and one study (40) did not report tumor stage. In 3 studies, 78–98% of cancers were papillary; 2 studies (9,40) did not report DTC type. Timing of initial Tg measurement ranged from 3 to 9 months after surgery in 3 studies [timing not reported in 2 studies (9,40)]. Thyroid stimulation before Tg measurement was not reported in any study, except for one (14), which reported no stimulation at 3 months after total thyroidectomy, recombinant TSH stimulation at 6 months, and Tg measurement with TSH 0.5–2.0 mIU/L at 18 and 24 months.
Four studies excluded patients with Tg antibodies or reported a low proportion of patients with Tg antibodies; one study (40) did not report Tg antibody status. The duration of follow-up ranged from 2 to 6 years. Outcomes were persistent or recurrent disease, based on whole-body iodine scan and other imaging. Two studies (9,14) were rated fair quality and three studies (24,27,40) poor quality (Table 3). No study reported assessment of outcomes blinded to results of Tg measurement, and no study reported the proportion of patients with missing data. Other methodological limitations included unclear application of the same reference standard to all patients and no prespecification of the threshold used to define a positive Tg level. The reference standard was neck ultrasound (US) (with other imaging as indicated) in two studies (24,27); 123I scan and US in one study (14); a combination of clinical, ultrasonography, and Tg findings in one study (9); and not reported in one study (40) (Table 2).
Evidence on the accuracy of Tg measurement in patients who underwent total or near-total thyroidectomy without RAI was limited due to very low prevalence or incidence of recurrence or persistence across studies (Table 4). No cases of persistence or recurrence occurred in two studies [n = 57 and n = 271 (14,24)], and two studies (9,27) reported one case each (n = 86 and n = 290); the fifth study (40) did not report the number of persons with recurrence. One study (14) found Tg >1 ng/mL associated with specificity of 0.95, and 1 study (27) found that the Tg level was 11 ng/mL in a single patient with recurrent disease at 7 months. Otherwise, information on diagnostic accuracy was not reported, and findings were largely descriptive. The studies generally found Tg levels were stable and low (usually defined as <1 ng/mL) or undetectable following thyroidectomy.
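The sketch below shows why cohorts with no events cannot yield a sensitivity estimate: the denominator (true positives plus false negatives) is zero, so only specificity is defined. The counts are hypothetical and are not taken from any included study.

```python
def sensitivity_specificity(tp, fn, fp, tn):
    """Return (sensitivity, specificity); a value is None when its denominator is zero."""
    sensitivity = tp / (tp + fn) if (tp + fn) > 0 else None
    specificity = tn / (tn + fp) if (tn + fp) > 0 else None
    return sensitivity, specificity


# Hypothetical cohort of 100 patients with no persistence or recurrence:
# 5 have Tg above the cutoff (false positives), 95 do not (true negatives).
sens, spec = sensitivity_specificity(tp=0, fn=0, fp=5, tn=95)
print(sens, spec)  # None 0.95
```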
One study (9) found that 97.9% of patients had a final (median 5 years) Tg ≤1 ng/mL, and 1 study (14) reported a mean Tg level at 18 months of 0.28 ng/mL. One study (24) found postoperative Tg levels were stable in most patients, although there was some variability according to first postoperative Tg level (78% in those with first postoperative Tg <0.2 ng/mL and 51% in those with first postoperative Tg >1 ng/mL). One study with mean follow-up of 60 months found that Tg was consistently undetectable in 62% of patients, and that 85% had a level <5 ng/mL (40).
Tg following total or near-total thyroidectomy, before RAI
Twenty-eight studies of patients who underwent total or near-total thyroidectomy and subsequently received RAI evaluated the accuracy of Tg measurement obtained before RAI administration; these studies may also provide some information on the utility of Tg measurement in patients who do not undergo RAI (Tables 1 and 2) (8,10–13,15–23,25,26,28,29,31,32,34–38,41–43). All studies were retrospective, except for one (20). Sample sizes ranged from 42 to 1033 (N = 7618). Mean or median age ranged from 40 to 53 years, and the proportion female ranged from 64% to 90% in studies that reported sex. The proportion of tumors that were papillary ranged from 55% to 97%; in studies that reported stage, the proportion with T1 or T2 tumors ranged from 40% to 100%, and the proportion with stage I or II tumors ranged from 54% to 69%. Stimulation of TSH with thyroid hormone withdrawal before Tg testing appears to have occurred in all studies, except for six (10,19–21,25,36). When reported, the timing of Tg measurement ranged from 4–5 weeks to 3 months following surgery, and the duration of follow-up ranged from 4–6 weeks to 54 months following surgery.
Outcomes assessed were progression, persistent local disease, local recurrence, and metastatic disease (distant, lymph node, or both) (Table 4). Seven studies (8,10,16–18,25,35) were rated fair quality, and the rest were rated poor quality (Table 3). No study reported assessment of outcomes blinded to results of Tg measurement, and no study reported the proportion of patients with missing data. Other methodological limitations included failure to apply the same reference standard to all patients and no prespecification of the threshold used to define a positive Tg level. The reference standards used in the studies varied (Table 2). Eight studies used reference standards considered inadequate: one study (37) used ultrasonography alone, and seven studies (8,10–12,28,41,43) used whole-body scan alone. In the other studies, the reference standard was cytological or pathological findings or some combination of pathology, imaging, or 131I scan.
Fifteen studies assessed the accuracy of postoperative Tg measurement for diagnosing metastatic or persistent disease before administration of RAI (8,10–12,18,20,21,25,28,29,32,35,37,42,43). Ten studies evaluated accuracy for lymph node or distant metastatic disease (Table 4) (11,12,18,20,21,25,29,32,42,43). The proportion of patients with lymph node metastases ranged from 0.8% to 66.0% (6 studies), and the proportion with distant metastases ranged from 1.7% to 22.7% (9 studies); in 1 study the proportion of patients with lymph node or distant metastasis was 5.0% (12). Sensitivity of Tg for any metastatic disease (lymph node or distant) ranged from 0.41 to 1.00 [8 studies (11,12,18,20,25,29,42,43)], and specificity ranged from 0.24 to 1.00 [7 studies (11,20,21,25,29,42,43)]. Tg thresholds ranged from “detectable” [sensitivity 0.80, specificity not reported (18)] or ≥0.89 ng/mL [sensitivity 1.00, specificity 0.42 (29)] to >12.35 ng/mL [sensitivity 0.90, specificity 0.83 (42)].
One additional study reported higher accuracy of pre-RAI Tg for distant metastasis (sensitivity 0.98, specificity 0.88, Tg threshold >61.87 ng/mL) than for lymph node metastasis (sensitivity 0.19, specificity 0.71, Tg threshold >32.13 ng/mL) (32). In studies that did not report TSH stimulation before Tg measurement, sensitivity was 0.41 and 0.73 [2 studies (20,25)], and specificity ranged from 0.88 to 0.95 [3 studies (20,21,25)]. Seven studies assessed accuracy for persistent disease or the composite outcome of persistence or metastatic disease before administration of RAI (8,10–12,28,35,37). One study (8) restricted inclusion to patients with persistent or metastatic disease; in the other studies, the proportion with persistence or the composite outcome ranged from 5.4% to 58%. Sensitivity of Tg ranged from 0.38 to 1.0 [7 studies (8,10–12,28,35,37)], and specificity ranged from 0.33 to 1.0 [6 studies (8,11,12,28,35,37)]. Tg thresholds ranged from >1 or 1.10 ng/mL (sensitivity 0.67–1.0 and specificity 0.57–0.66) (10,35,37) to >10 ng/mL (sensitivity 0.50 and 1.0 and specificity 0.68 and 0.93) (12,35).
In four studies that evaluated diagnostic accuracy for persistence or the composite outcome at different Tg threshold levels, sensitivity decreased and specificity increased at higher thresholds (12,16,31,35). However, no Tg testing threshold was associated with both high sensitivity and high specificity. In these studies, at a Tg threshold of >1 to >2.5 ng/mL, sensitivity ranged from 0.90 to 1.0 (median 0.93) and specificity ranged from 0.35 to 0.58 (median 0.48); at a Tg threshold of >10 ng/mL, sensitivity ranged from 0.69 to 0.77 (median 0.71), and specificity ranged from 0.66 to 0.93 (median 0.77). One study reported a sensitivity of 1.00 and specificity of 0.24 for metastatic disease at a Tg threshold of ≥1 ng/mL and sensitivity of 0.83 and specificity of 0.57 at a Tg threshold of >5 ng/mL (43). In one study, in which Tg was obtained without prior TSH stimulation, the sensitivity was 0.90 (specificity not reported) (8).
Thirteen studies assessed the accuracy of postoperative, pre-RAI Tg measurement for predicting outcomes that occurred following RAI (mean or median duration of follow-up, 6–72 months) (Table 4) (13,15–17,19,22,23,26,31,34,36,38,41). However, results are more difficult to interpret than for outcomes assessed at the time of RAI administration, because they could be affected by response to RAI or other intervening factors. For predicting metastatic disease, sensitivity ranged from 0.57 to 0.94 and specificity from 0.54 to 0.96 in 4 studies (13,22,34,41), based on pre-RAI Tg thresholds of >8 to >38.1 ng/mL. In one study that evaluated different Tg testing thresholds, sensitivity was high (≥0.94) but specificity was low (0.10 or 0.54) at thresholds of >2.25 to >11.05 ng/mL; however, a testing threshold of >30.25 ng/mL was associated with high sensitivity and specificity [0.84 and 0.86, respectively (34)].
For predicting recurrence, sensitivity ranged from 0.50 to 1.0, and specificity ranged from 0.55 to 0.92 in 8 studies (13,16,17,19,26,31,36,38), based on pre-RAI Tg thresholds of >0.7 to >34.6 ng/mL. Two studies (15,36) found postoperative, preablation Tg associated with an area under the receiver operating characteristic curve (AUROC) of 0.82 and 0.87 for recurrence following RAI treatment, and one study (22) found Tg associated with an AUROC of 0.77 (confidence interval 0.66–0.89) for metastatic disease following RAI treatment.
Discussion
Evidence for the utility of serum Tg measurement in persons with DTC following partial thyroidectomy or following total or near-total thyroidectomy without administration of RAI after surgery is limited. Due to imprecision, methodological limitations, and inconsistency, the evidence on diagnostic accuracy was graded as very low for all outcomes (Table 5). One study of patients who underwent partial thyroidectomy and subsequent completion surgery found Tg >20 ng/mL associated with sensitivity of <0.50 and specificity of ∼0.80 for detection of cervical lymph node metastasis or residual thyroid cancer based on pathological findings at completion surgery (7), for a positive likelihood ratio of 2.4 and negative likelihood ratio of 0.66. Based on these estimates, in a hypothetical cohort of patients who underwent partial thyroidectomy with a 10% pretest probability of cervical lymph node metastasis or residual thyroid cancer, the post-test probability in those with a Tg level >20 ng/mL would be 21% and with a Tg level ≤20 ng/mL would be 7%, indicating modest utility, given the relatively small changes in diagnostic probabilities.
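The likelihood ratios and post-test probabilities for this study can be reproduced with the odds form of Bayes' rule. The sketch below assumes the upper ends of the reported ranges (sensitivity 0.47, specificity 0.80), chosen because they reproduce the likelihood ratios stated above; function names are hypothetical.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) for a dichotomous test."""
    lr_pos = sensitivity / (1 - specificity)
    lr_neg = (1 - sensitivity) / specificity
    return lr_pos, lr_neg


def post_test_probability(pretest, likelihood_ratio):
    """Convert probability to odds, apply the likelihood ratio, convert back."""
    pre_odds = pretest / (1 - pretest)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)


# Assumed inputs for Tg >20 ng/mL before completion surgery.
lr_pos, lr_neg = likelihood_ratios(sensitivity=0.47, specificity=0.80)

for pretest in (0.10, 0.40):
    after_positive = post_test_probability(pretest, lr_pos)
    after_negative = post_test_probability(pretest, lr_neg)
    print(f"pretest {pretest:.0%}: Tg >20 -> {after_positive:.0%}, "
          f"Tg <=20 -> {after_negative:.0%}")
```

With these inputs the likelihood ratios are about 2.35 and 0.66, and a 10% pretest probability moves to about 21% after a positive result and about 7% after a negative result.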
Table 5. Overall Quality of Evidence, Diagnostic Accuracy of Thyroglobulin Measurement
Formal assessment for small sample effects and potential publication bias was not performed, due to the small number of studies (partial thyroidectomy and total/near-total thyroidectomy without RAI), very serious methodological limitations, and heterogeneity in populations, Tg thresholds, Tg methods, and outcomes.
The overall quality of evidence on diagnostic accuracy for all clinical outcomes (metastasis, recurrence, persistence, or a composite) was graded very low.
Downgraded for indirectness because of reduced generalizability to patients who undergo total/near-total thyroidectomy and do not receive RAI.
Similarly, in a hypothetical cohort with a 40% pretest probability, the post-test probability following a Tg level >20 ng/mL would be 61% and the post-test probability following a Tg level ≤20 ng/mL would be 31%. However, these estimates are based on a single study of patients who underwent completion surgery, with uncertain applicability to other partial thyroidectomy populations. Three other studies of Tg testing after partial thyroidectomy that did not restrict enrollment to persons who underwent completion surgery and used an imaging or histological reference standard did not identify patients with recurrence or were limited by small sample size, and it was not possible to estimate diagnostic accuracy or likelihood ratios (30,33,39). In these studies, decreases in Tg levels were observed in some patients who experienced recurrence following partial thyroidectomy, potentially related to natural fluctuations in Tg or TSH levels.
For patients who underwent total or near-total thyroidectomy and did not receive RAI, very low-quality evidence from five studies was insufficient to determine the diagnostic accuracy of Tg measurement for recurrence, persistence, or metastatic disease, due to very low rates of these outcomes. In these cohorts, Tg levels were low (usually <1 ng/mL) and stable in most patients during follow-up. Evidence for postoperative Tg measurement before RAI therapy suggests high specificity but variable (moderate to high) sensitivity for diagnosing metastatic disease or recurrence. Some variability was due to the Tg threshold used, with higher Tg thresholds associated with lower sensitivity and higher specificity. Although no Tg threshold was associated with both high sensitivity and high specificity, the utility of Tg testing depends on the Tg threshold used and the purpose of Tg testing. For example, four studies that compared different Tg thresholds found that at a Tg threshold of >1 to 2.5 ng/mL, median sensitivity for persistence or a composite outcome (persistence or metastatic disease) was 0.93 and median specificity was 0.48, resulting in a modest positive likelihood ratio (1.8) but strong negative likelihood ratio (0.15).
In a hypothetical cohort with a pretest probability of 10%, the post-test probability for the outcomes following a Tg level ≤1 to 2.5 ng/mL would decline fivefold, to 2%, suggesting potential usefulness for ruling out these outcomes. However, a Tg level >1 to 2.5 ng/mL would have only a modest impact on the post-test probability, increasing it to 17%. At a Tg threshold of >10 ng/mL, the median sensitivity was 0.71 and median specificity was 0.78, for a positive likelihood ratio of 3.2 and negative likelihood ratio of 0.37. A Tg value >10 ng/mL would result in a greater increase in the post-test probability (26%) than using the lower threshold, while a Tg value ≤10 ng/mL would decrease the post-test probability to 4%. The clinical utility of using a Tg threshold >10 ng/mL would depend on whether a post-test probability for these outcomes of 4% is low enough to rule out the need for additional evaluation or otherwise alter the clinical approach. Other studies reported variable accuracy of postoperative, pre-RAI Tg for predicting outcomes following RAI and are difficult to interpret due to potential effects of RAI and other intervening factors on subsequent outcomes.
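The trade-off between the two thresholds can be checked by applying Bayes' rule directly to the median sensitivity and specificity values quoted in this discussion, at the 10% pretest probability used above. This is a minimal sketch with hypothetical names, not the authors' calculation.

```python
# Median (sensitivity, specificity) at each pre-RAI Tg threshold, as quoted in the text.
THRESHOLDS = {
    ">1 to >2.5 ng/mL": (0.93, 0.48),
    ">10 ng/mL": (0.71, 0.78),
}


def post_test(pretest, sensitivity, specificity):
    """Return (P(disease | Tg above threshold), P(disease | Tg at or below threshold))."""
    true_pos = sensitivity * pretest
    false_pos = (1 - specificity) * (1 - pretest)
    false_neg = (1 - sensitivity) * pretest
    true_neg = specificity * (1 - pretest)
    return true_pos / (true_pos + false_pos), false_neg / (false_neg + true_neg)


results = {
    label: post_test(0.10, sens, spec) for label, (sens, spec) in THRESHOLDS.items()
}
for label, (above, below) in results.items():
    print(f"{label}: above threshold {above:.0%}, at or below threshold {below:.0%}")
```

The lower threshold gives the lower post-test probability after a negative result (about 2% versus 4%), whereas the higher threshold gives the larger increase after a positive result (about 26% versus 17%).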
Our review had limitations. First, we restricted inclusion to English language articles, which could result in language bias. However, only one study (44) was excluded due to non-English language; it evaluated pre-RAI Tg and was unlikely to impact conclusions. Second, we did not assess for potential publication bias, due to the small number of studies and variability in Tg thresholds used and other factors, which complicate interpretation of graphical and statistical tests for small sample effects (45). Third, the protocol was not registered before initiating the review. However, the scope and methods were developed before conducting the review, and no protocol changes occurred. Fourth, we did not address other potential uses of Tg measurement, such as assessing the adequacy of thyroid hormone dose or predicting response to RAI.
Despite these limitations, our review is the first to synthesize the evidence around Tg testing in patients who have undergone partial thyroidectomy or total/near-total thyroidectomy who have not received RAI. A prior systematic review evaluated Tg measurement following thyroidectomy but did not address patients who had undergone partial thyroidectomy, did not report findings from studies of patients who underwent total or near-total thyroidectomy separately, included fewer studies, and did not evaluate studies of patients who underwent Tg measurement before RAI administration separately (46). A major limitation of the evidence in this review is the presence of methodological shortcomings in all studies. Almost all studies were retrospective, no study was rated good quality, and over half were rated poor quality. No study reported assessment of outcomes blinded to Tg results, and few reported attrition or missing data. Other common methodological shortcomings included failure to report enrollment of a consecutive or random sample, no prespecification of the Tg threshold to define a positive test, lack of clarity regarding TSH levels at the time of Tg testing, and unclear timing of Tg measurement or follow-up in relation to surgery.
Interpretation of the evidence is also challenging due to low event rates (particularly for patients who underwent total/near-total thyroidectomy without RAI) and differences across studies in patient populations, outcomes assessed, duration of follow-up, reference standards for outcomes, and use of TSH-stimulated versus non-TSH-stimulated Tg levels, among other factors. Serum Tg concentrations also vary depending on the measurement method used and study year (studies indicate less variability in more recent studies) (47). Some studies did not define outcomes well, and for the outcome of metastatic disease, studies did not distinguish among persistent disease, recurrent disease, and incident development in the contralateral lobe. In addition, due to study methodological limitations and heterogeneity, we did not perform meta-analysis, to avoid misleading pooled results. In patients who have undergone total or near-total thyroidectomy, the applicability of studies in which patients had Tg measurement before RAI to patients who do not receive RAI is uncertain, because the former are likely to represent a higher risk group.
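To illustrate why low event rates limit the precision of accuracy estimates, the following is a minimal sketch that computes a Wilson score 95% confidence interval for sensitivity. It is not part of the methods of this review; the function name `wilson_interval` and the counts used (4 of 5 vs. 80 of 100 events detected) are hypothetical and are not drawn from any included study.

```python
import math


def wilson_interval(detected, events, z=1.96):
    """Wilson score interval for a proportion such as sensitivity.

    detected: number of events (e.g., recurrences) with a positive test
    events:   total number of events in the study
    z:        normal quantile (1.96 gives an approximate 95% interval)
    """
    if events <= 0 or not 0 <= detected <= events:
        raise ValueError("need 0 <= detected <= events and events > 0")
    p = detected / events
    denom = 1 + z**2 / events
    center = (p + z**2 / (2 * events)) / denom
    half_width = z * math.sqrt(p * (1 - p) / events + z**2 / (4 * events**2)) / denom
    return max(0.0, center - half_width), min(1.0, center + half_width)


# Hypothetical counts: the same point estimate (80%) with few vs. many events.
few_events = wilson_interval(4, 5)
many_events = wilson_interval(80, 100)

for label, (low, high) in [("4/5", few_events), ("80/100", many_events)]:
    print(f"sensitivity {label}: 95% CI {low:.2f} to {high:.2f}")
```

With the same point estimate, the interval based on five events is several times wider than the interval based on 100 events, which is the sense in which estimates from studies with very few recurrences are imprecise.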
Future research is needed to clarify the accuracy of Tg measurement in these situations, how the utility of Tg measurement varies according to patient or tumor factors, and optimal approaches to Tg monitoring (including timing, intervals, interpretation of single values vs. change over time, and optimal Tg thresholds). Additionally, because the impact of Tg measurement depends on the actions taken in response to Tg test results and the downstream effects of these actions, studies that assess the effects of Tg measurement on clinical decision making (e.g., additional testing, RAI administration, or surgery) and patient outcomes are needed. If Tg levels are obtained, anti-Tg antibodies should also be measured for appropriate interpretation.
In conclusion, very limited evidence suggests low utility of Tg measurement for identifying recurrent or metastatic disease following partial thyroidectomy. In persons who have undergone total or near-total thyroidectomy, incidence of recurrence is low, and Tg levels appear to be stable and low in most patients who do not receive RAI. Tg levels using a low cutoff (e.g., 1–2.5 ng/mL) might be useful to identify patients at low risk of persistent disease or metastasis. Therefore, in patients who have undergone total or near-total thyroidectomy without RAI, measuring Tg levels in conjunction with other monitoring may be helpful for identifying patients not requiring additional evaluation. Additional research is needed to clarify the role of Tg measurement in these settings, determine optimal Tg thresholds, and determine appropriate testing intervals.
Footnotes
Authors' Contributions
All authors conceived the study. R.C. designed the study and R.C. and T.D. carried out the review. R.C. prepared the first draft of the article. All authors were involved in the revision of the draft article and have agreed to the final content.
Author Disclosure Statement
J.A.S.: member of the Data Monitoring Committee of the Medullary Thyroid Cancer Consortium Registry supported by GlaxoSmithKline, Novo Nordisk, AstraZeneca, and Eli Lilly. Institutional research funding was received from Exelixis and Eli Lilly. W.G.: Institutional research funding was received from Roche and Siemens. All other authors reported no conflicts of interest.
Funding Information
This work is supported by the American Thyroid Association.
Supplementary Material
Supplementary Appendix SA
Supplementary Appendix SB
Supplementary Appendix SC
