Abstract

In contrast to biochemistry, ultrasonography (US) is highly operator-dependent. 6 Previous studies have shown that, in adult ATA low-risk patients, US is more likely to yield false-positive findings than to detect true structural recurrence. 7,8 Inflammatory cervical lymph nodes are common in children, further contributing to false-positive US findings. Indeterminate or suspicious US findings, in the absence of biochemical evidence of disease, may contribute to patient and provider anxiety, as well as to unnecessary investigations and/or interventions. It is unclear whether, in the context of undetectable or low-detectable Tg, such findings are actionable, and whether they are associated with an independent risk of RRD.
We conducted a retrospective study of all patients aged <18 years who underwent total thyroidectomy for DTC at The Hospital for Sick Children from 2010 to 2020 to explore the value of routine US surveillance in the follow-up of pediatric DTC (Supplementary Fig. S1 and Supplementary Table S1).
Data were abstracted from US reports, without re-review of the primary images, as we sought to assess the “real-world” impact of the radiologists' impression on decision making. Reports were categorized as (1) “reassuring” (there were no findings concerning for sRRD), (2) “indeterminate,” or (3) “suspicious” for sRRD. Indeterminate reports included those describing (1) lesions without comment regarding significance, (2) lesions “of uncertain significance,” or (3) lesions that “may” reflect sRRD per the reporting radiologist.
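The three report categories were assigned by reading each report. Purely as an illustration of the decision order, the rule can be sketched in Python, assuming each report has already been reduced to whether a lesion is described plus a hypothetical `Impression` label. These names, and the treatment of an explicitly benign-appearing lesion as reassuring, are assumptions for the sketch, not the study's abstraction form.

```python
from enum import Enum, auto


class Impression(Enum):
    """Hypothetical labels for the radiologist's stated impression of a lesion."""
    BENIGN = auto()      # lesion described as not concerning (assumed -> reassuring)
    NO_COMMENT = auto()  # lesion described without comment regarding significance
    UNCERTAIN = auto()   # lesion "of uncertain significance"
    POSSIBLE = auto()    # lesion that "may" reflect sRRD
    SUSPICIOUS = auto()  # lesion reported as suspicious for sRRD


def categorize_report(lesion_described: bool, impression: Impression) -> str:
    """Map a pre-abstracted US report to reassuring / indeterminate / suspicious."""
    if not lesion_described or impression is Impression.BENIGN:
        # No findings concerning for sRRD
        return "reassuring"
    if impression is Impression.SUSPICIOUS:
        return "suspicious"
    # No comment, uncertain significance, or "may" reflect sRRD
    return "indeterminate"
```

Checking the suspicious label before falling through keeps all three indeterminate wordings in a single default branch, mirroring how the text lists them.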
Criteria used at our institution for suspicion of lymph node metastases have been previously described. 5,9 US was performed 6 months after thyroidectomy and then every 6–12 months, with contemporaneous measurement of serum Tg and Tg antibodies (TgAb). This study was approved by the research ethics board (No. 1000035326) with waiver of informed consent. Additional imaging or interventions performed within six months of a US study were considered to reflect direct sequelae of that study and/or disease present at the time of US. Risk stratification was based on the ATA Guidelines 1 with modifications as outlined in Supplementary Data S1. Disease status was classified as previously defined. 10 Detailed methods are provided in Supplementary Data S1.
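The six-month attribution rule amounts to a date-window check. A minimal Python sketch follows, assuming calendar-month arithmetic with an inclusive upper bound; a day-count cut-off (e.g., 183 days) would be an equally plausible reading, and the function names are hypothetical.

```python
import calendar
from datetime import date


def add_months(d: date, months: int) -> date:
    """Shift a date by whole calendar months, clamping to the last valid day."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


def attributable_to_us(us_date: date, event_date: date, window_months: int = 6) -> bool:
    """True if imaging or an intervention falls within the window after a US study.

    The inclusive end of the window is an assumption of this sketch.
    """
    return us_date <= event_date <= add_months(us_date, window_months)
```

For example, `add_months(date(2020, 8, 31), 6)` clamps to `date(2021, 2, 28)`, so short months do not silently shift the window.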
In total, 260 US studies in 56 patients were assessed. The mean number of US studies per patient per year was 2.1 over a median follow-up of 2.3 years (range 0.6–9.5). Overall, 112 of 260 (43.1%) US reports were classified as reassuring, 110 (42.3%) as indeterminate, and 38 (14.6%) as suspicious (Fig. 1 and Supplementary Table S2). sRRD was confirmed in 11 of 56 (19.6%) patients (Supplementary Table S3).

Outcomes within six months of surveillance investigations.
Reassuring US Findings
sRRD was never diagnosed in the context of a reassuring US study. Twenty-five patients had a reassuring initial postoperative US; none developed structural recurrence during follow-up, regardless of initial ATA risk stratum. At last follow-up, 16/25 had no evidence of disease, 6/25 had an indeterminate response (4 with TgAb and 2 with low-detectable Tg on LT4 therapy, LT4-Tg <1 μg/L), and 3/25 had a biochemically incomplete response.
Eighteen individuals progressed from reassuring to indeterminate or suspicious US findings during follow-up. In 1 of these 18 patients (Supplementary Table S3, patient 39), presumed sRRD was diagnosed based on elevated Tg, indeterminate US, and absence of radioiodine uptake. A schematic of subsequent US studies relative to the prior US is shown in Supplementary Fig. S2.
Indeterminate US Findings
sRRD was diagnosed within 6 months of 5/107 (4.7%) indeterminate studies, all coincident with elevated Tg or TgAb. These were considered to reflect accurate detection of structural disease by US.
Suspicious US Findings
Of 38 suspicious studies, structural disease was confirmed within 6 months on 10 occasions (26.3%). Among the 20 patients with suspicious US findings at any point during follow-up, 7 (35%) developed sRRD; conversely, the remaining 13 (23% of all 56 patients) had ≥1 suspicious US but never developed sRRD. Overall, 31 of 56 (55.4%) patients had at least 1 indeterminate or suspicious US and did not develop sRRD.
Outcomes Relative to Tg
Undetectable Tg
In patients with undetectable Tg, reassuring, indeterminate, and suspicious US findings were documented in 49.4% (42/85), 42.4% (36/85), and 8.2% (7/85) of contemporaneous studies. Undetectable Tg was never associated with sRRD within 6 months of US.
Seven patients had an undetectable Tg that subsequently became (low-)detectable. One of them developed structural recurrence (Supplementary Table S3, patient 39); in retrospect, this patient's undetectable Tg had been measured with a prior-generation assay with a functional sensitivity (FS) of 0.9 μg/L (vs. FS = 0.1 μg/L for the current assay).
Low-Detectable Tg (LT4-Tg <1.0 μg/L or Stimulated Tg <2.0 μg/L)
Among patients with low-detectable Tg, reassuring, indeterminate, and suspicious findings were present in 57.4% (31/54), 35.2% (19/54), and 7.4% (4/54) of US studies. Reassuring or indeterminate US findings in the context of low-detectable Tg were never associated with sRRD within six months of US.
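The Tg categories used in these subsections can be sketched as a small classifier. This is a minimal sketch, assuming that "undetectable" means below the assay's functional sensitivity, that a value at or above the cut-off counts as elevated, and that any TSH ≤30 mIU/L is treated as an on-LT4 measurement (a simplification, since the stimulated-Tg definition only specifies TSH >30 mIU/L). The function and parameter names are hypothetical.

```python
def classify_tg(tg_ug_per_l: float, tsh_miu_per_l: float,
                assay_fs_ug_per_l: float = 0.1) -> str:
    """Assign a Tg result to undetectable / low-detectable / elevated (boundaries assumed)."""
    stimulated = tsh_miu_per_l > 30.0        # sTg: thyrotropin >30 mIU/L
    cutoff = 2.0 if stimulated else 1.0      # sTg <2.0 or LT4-Tg <1.0 ug/L
    if tg_ug_per_l < assay_fs_ug_per_l:
        return "undetectable"                # below assay functional sensitivity
    if tg_ug_per_l < cutoff:
        return "low-detectable"
    return "elevated"
```

With the default FS of 0.1 μg/L, `classify_tg(0.5, 0.5)` returns `"low-detectable"`, whereas passing `assay_fs_ug_per_l=0.9` returns `"undetectable"`, illustrating how a prior-generation assay could report a low-detectable Tg as undetectable.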
Four patients had low-detectable Tg and a suspicious US. Recurrence was confirmed histologically in one. Another transitioned to adult care immediately thereafter; final disease status could not be ascertained, and this patient was censored. The remaining 2 did not develop sRRD during 3.5 and 7.2 years of follow-up, respectively. With low-detectable Tg, the false-positive rate of indeterminate or suspicious US findings (i.e., the proportion not associated with sRRD) was 21 of 22 (95.5%).
sRRD was confirmed during follow-up in 3 patients with low-detectable Tg at any point (Supplementary Table S3, patients 3, 35, 53). All three had suspicious or indeterminate findings on their initial postoperative US. Test characteristics of US based on Tg are summarized in Table 1 and Supplementary Table S4.
Table 1. Test Characteristics of Cervical Ultrasonography in Follow-Up of Structural Residual or Recurrent Disease Based on Thyroglobulin Levels
No patient had detectable Tg antibodies.
FU, follow-up; LT4-Tg, Tg measurement on LT4 replacement therapy; NPV, negative predictive value; PPV, positive predictive value; sRRD, structural residual or recurrent disease; sTg, stimulated thyroglobulin (thyrotropin >30 mIU/L); US, ultrasonography.
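The test characteristics abbreviated above follow the standard 2×2 definitions. As a minimal Python sketch (function names are illustrative), the two proportions quoted in the Results can be reproduced from their stated counts. Note that the "false-positive rate" used in the text is FP/(TP + FP), i.e., 1 − PPV (sometimes called the false discovery rate), which differs from the classical 1 − specificity.

```python
def _ratio(num: int, den: int) -> float:
    """Return num/den, or NaN when the denominator is zero."""
    return num / den if den else float("nan")


def sensitivity(tp: int, fn: int) -> float:
    return _ratio(tp, tp + fn)


def specificity(tn: int, fp: int) -> float:
    return _ratio(tn, tn + fp)


def ppv(tp: int, fp: int) -> float:
    return _ratio(tp, tp + fp)


def npv(tn: int, fn: int) -> float:
    return _ratio(tn, tn + fn)


def false_positive_share(tp: int, fp: int) -> float:
    """FP / (TP + FP) = 1 - PPV; what the text calls the false-positive rate."""
    return _ratio(fp, tp + fp)


# Suspicious studies with sRRD confirmed within 6 months: 10 of 38
print(f"PPV of suspicious US: {100 * ppv(10, 28):.1f}%")  # prints 26.3%
# Indeterminate/suspicious US with low-detectable Tg: 21 of 22 without sRRD
print(f"False-positive share: {100 * false_positive_share(1, 21):.1f}%")  # prints 95.5%
```

Returning NaN for an empty denominator avoids a division error in strata with no positive or no negative studies.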
Outcomes Among Patients with Elevated TgAb
Presence of TgAb was not associated with sRRD (p = 0.263). In all five patients with elevated TgAb who developed sRRD, the initial US after primary therapy showed indeterminate or suspicious findings.
Outcomes by ATA Risk Stratum
Supplementary Table S5 summarizes US findings by ATA risk stratum. The distribution of findings did not differ significantly between strata (χ² = 9.37, p = 0.053). On a per-patient basis, sRRD was found during follow-up in 17.9% (5/28) of low-risk, 37.5% (3/8) of intermediate-risk, and 15.0% (3/20) of high-risk patients.
Discussion
Sonographically indeterminate findings are common during surveillance of DTC and are infrequently diagnostic of sRRD. This highlights the challenges inherent to interpretation of postoperative US for surveillance.
Our data illustrate that, in patients with undetectable Tg, US is associated with a high false-positive rate. Even among patients with low-detectable Tg, only 4.3% of indeterminate or suspicious US findings were associated with sRRD. Moreover, a reassuring initial postoperative US was a strong favorable prognostic indicator: no patient with a reassuring initial US developed sRRD during follow-up. Notably, 35% of patients progressed to undetectable Tg without any therapeutic intervention after ≥1 indeterminate or suspicious US. Thus, decisions to pursue intervention should not be based on imaging alone.
We were intrigued to find a similar distribution of US findings across the initial low-, intermediate-, and high-risk strata. There are several possible explanations. First, we excluded low-risk patients who underwent lobectomy; these patients would a priori constitute the lowest-risk group and the least likely to experience sRRD. Moreover, among high-risk patients, those with distant metastatic disease and R2 resection were excluded. Thus, the patients at lowest and highest risk of RRD were excluded. Nevertheless, it is meaningful that ATA high-risk patients did not appear to have abnormal US studies more frequently.
There are several limitations to this study. Because patients transition to adult care at 18 years of age, this cohort has a relatively brief median follow-up of 2.3 years, and we could not assess late recurrence after transition. Over the course of the study, US images were interpreted by 12 different pediatric radiologists without a standardized reporting template, leading to potential heterogeneity of interpretation and reporting. Nonetheless, these studies reflect typical clinical practice.
The high rate of indeterminate US reports supports the need for a “common language” in reporting surveillance US. We would endorse the development and validation of a standardized reporting tool for post-thyroidectomy investigations, akin to the Thyroid Imaging Reporting and Data System used preoperatively for thyroid nodules. 11
Although the aforementioned limitations preclude over-arching recommendations, several practice modifications may be considered. First, we continue to endorse an initial postoperative US for all patients. For patients whose initial postoperative US is clearly reassuring and who do not have elevated TgAb, it may be reasonable to adopt a “Tg-first” approach, reserving subsequent US studies for the context of rising Tg. This approach may also apply to patients with undetectable or low-detectable Tg during follow-up, given the low rate of sRRD in these groups. Applying it to the current cohort would have reduced the number of US studies by 101 (39%) without missing sRRD.
We advocate US surveillance for individuals with elevated TgAb, at least until TgAb titers are within the reference range, whereupon circulating Tg can be accurately assessed.
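The practice modifications in the two paragraphs above can be summarized as a per-visit decision rule. The following is a minimal Python sketch, not a validated protocol: the `Visit` fields are hypothetical, "rising Tg" is left as a clinician-supplied flag because the text does not define it, and the order of checks is an interpretation of the suggestions as written.

```python
from dataclasses import dataclass


@dataclass
class Visit:
    """Hypothetical per-visit inputs; field names are illustrative only."""
    is_initial_postop: bool      # first surveillance visit after thyroidectomy
    initial_us_reassuring: bool  # result of the initial postoperative US
    tgab_elevated: bool          # TgAb above the reference range
    tg_rising: bool              # clinician-judged rising Tg (definition assumed)
    tg_category: str             # "undetectable", "low-detectable", or "elevated"


def us_indicated(v: Visit) -> bool:
    """Sketch of the suggested 'Tg-first' option; not a validated protocol."""
    if v.is_initial_postop:
        return True   # initial postoperative US for all patients
    if v.tgab_elevated:
        return True   # Tg cannot be accurately assessed while TgAb is elevated
    if v.tg_rising:
        return True   # rising Tg triggers US under a Tg-first approach
    if v.initial_us_reassuring:
        return False  # reassuring initial US without elevated TgAb: Tg first
    # The text suggests Tg-first may also apply with undetectable/low-detectable Tg
    return v.tg_category not in {"undetectable", "low-detectable"}


# Reported effect in this cohort: 101 of 260 US studies avoided
print(f"{100 * 101 / 260:.0f}% fewer US studies")  # prints 39% fewer US studies
```

Placing the TgAb check before the Tg-based branches reflects the stated rationale that circulating Tg is only interpretable once TgAb titers are within the reference range.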
Validation of these observations in a larger (and/or multicenter) cohort, together with collaboration with adult institutions to extend follow-up after transition of care, will be important to establish the generalizability of these findings.
Footnotes
Authors' Contributions
Conceptualization of the study was carried out by C.A.L., H.M.v.S., and J.D.W.; methodology was developed by C.A.L., H.M.v.S., and J.D.W.; validation of the study was performed by C.A.L., H.M.v.S., and J.D.W.; formal analysis was performed by C.A.L., H.M.v.S., and J.D.W.; investigation was by all authors; resources were provided by C.A.L., H.M.v.S., and J.D.W.; data curation was performed by C.A.L., H.M.v.S., and J.D.W.; data analysis and interpretation were by all authors; writing of the original draft was performed by C.A.L., H.M.v.S., and J.D.W.; review and editing of the manuscript were performed by all authors; visualization was by all authors; supervision was by H.M.v.S. and J.D.W.; project administration was done by C.A.L., H.M.v.S., and J.D.W.; all authors have read and agreed to the published version of the article.
Author Disclosure Statement
No competing financial interests exist.
Funding Information
This project was supported by the “Nijbakker-Morra Stichting,” the “Hendrik Muller's Vaderlandsch Fonds,” the “Girard de Mielet van Coehoorn Stichting,” and the Foundation “De Drie Lichten” in The Netherlands (C.A.L.).
Supplementary Material
Supplementary Data S1
Supplementary Figure S1
Supplementary Figure S2
Supplementary Table S1
Supplementary Table S2
Supplementary Table S3
Supplementary Table S4
Supplementary Table S5
