Abstract
Background:
Recalibrating diagnostic thresholds or using alternative labels may mitigate overdiagnosis and overtreatment of papillary microcarcinoma (mPTC). We aimed to identify and collate relevant epidemiological evidence on mPTC, to assess the case for recalibration and/or new labels.
Methods:
We searched EMBASE and PubMed databases from inception to December 2020 for natural history, autopsy, diagnostic drift, and diagnostic reproducibility studies. Where a relevant systematic review was pre-identified, only new articles were additionally included. Non-English articles were excluded. One author screened titles and abstracts. Two authors screened full text articles, performed quality assessments, and extracted data. We undertook narrative synthesis of included evidence (pooled estimates from systematic reviews and single estimates from primary studies).
Results:
One systematic review of patients undergoing active surveillance found that after 5 years of follow-up, 5.3% (95% confidence interval [CI] 4.4–6.4%) of the mPTC lesions had increased in size by ≥3 mm, and 1.6% [CI 1.1–2.4%] of patients had lymph node metastases. Among 7 new primary studies (including 3 updates on 2 studies included in the systematic review), 1–5% of patients undergoing active surveillance had lymph node metastases after a median follow-up of 1–10 years. One systematic review found that subclinical thyroid cancer incidentally discovered at autopsy is relatively common, with a pooled prevalence of 11.2% [CI 6.7–16.1%] among studies that examined the whole thyroid. Four diagnostic drift studies evaluated the new classification of non-invasive follicular thyroid neoplasm with papillary-like nuclear features (NIFTP). Three studies of cases previously diagnosed as papillary thyroid cancer found 1.3–2.3% were reclassified as NIFTP (reclassifications were from follicular variant of papillary thyroid cancer [FVPTC]). One study of 48 cases previously diagnosed as mPTC found that 23.5% were reclassified as NIFTP. Thirteen reproducibility studies of papillary thyroid lesions found substantial variation in the histopathological diagnosis of thyroid lesions, including FVPTC and NIFTP classifications (no study evaluated mPTC).
Conclusions:
This review supports consideration of recalibrating diagnostic thresholds and/or alternative labels for low-risk mPTC.
Introduction
In recent decades, there has been a dramatic increase in the incidence of thyroid cancer in many countries (1–4), driven by increases in papillary thyroid cancer (PTC) (1,2,5). While there is some controversy on the extent of real underlying increases (2,6), there is strong evidence that overdiagnosis likely accounts for a large proportion of the increases observed (1,6,7). The rapid increase in diagnosis rates has been accompanied by flat or only slightly increasing thyroid cancer mortality rates, which is the epidemiological signature of overdiagnosis (8,9). Of particular concern is the marked increase in the diagnosis of PTCs smaller than 1 cm, also known as papillary microcarcinoma (mPTC). These lesions, which are impalpable and asymptomatic, are being increasingly detected as incidental findings on imaging tests requested because of thyroid function test abnormalities or for other clinical indications (2,4,6).
Recognizing the risk of overtreatment from mPTC, updated or new guidelines recommend against biopsy of smaller or lower risk thyroid nodules (10,11), and they promote hemi-thyroidectomy as a possible alternative to total thyroidectomy. Active surveillance has also been suggested for managing very low-risk cancers; however, uptake has largely been limited to clinical trials (12) or specific centers in Japan (13), and acceptance of active surveillance for mPTC among patients and clinicians is generally low (14–17). Clinicians may be reluctant to offer active surveillance because of concerns about the risk of metastases and safety (16), and because they believe that patients have a strong preference for surgery as a definitive treatment (17–19). Although patients would be more likely to accept active surveillance if recommended by their clinician (20,21), many report that they would decline due to anxiety about a “cancer” being present in their body (18), and the potential for progression (22).
To facilitate the acceptance of more conservative management options, consideration may be given to recalibration of the diagnostic threshold. This might be done by tightening the criteria by which small thyroid lesions are considered to be “cancer,” or by adopting alternative diagnostic labels to describe low-risk mPTCs, such as the “papillary microtumor” term suggested in the Porto Proposal (23,24), to avoid using the term “cancer” (25,26). There is evidence that re-labeling mPTC without using the word “cancer” could increase uptake of active surveillance and other conservative management options (27–31). Preventing overtreatment would prevent harms from total thyroidectomy, including the need for lifelong thyroid hormone therapy, and surgical complications such as vocal cord paralysis or hypoparathyroidism.
In this review, we systematically sought evidence to support or refute the need to recalibrate diagnostic thresholds and/or adopt alternative non-cancer labels for mPTC lesions. Specifically, we sought evidence to answer the following 4 research questions: (i) What is the natural history of mPTC if left untreated? (ii) What is the size of the “reservoir” of sub-clinical thyroid cancer in people who were not known to have thyroid cancer during their life and who died of other causes [noting that PTC accounts for more than 80% of thyroid cancers (2)]? (iii) Is there evidence of diagnostic drift over time, for example resulting in cancer diagnoses for lesions that would have previously been classified as non-cancer? and (iv) What is the reproducibility of the diagnosis of mPTC and other related thyroid lesions?
Methods
Literature search
We searched PubMed and EMBASE databases from inception to December 2020 for studies in 4 categories: (i) natural history studies where patients underwent active surveillance or watchful waiting instead of undergoing immediate surgery, (ii) autopsy studies that reported prevalence of incidental thyroid cancer, (iii) diagnostic drift studies, and (iv) diagnostic reproducibility studies (Search strategy in Supplementary Appendix SA1).
Inclusion criteria
We included studies that provided data relevant to mPTC, but we did not restrict inclusion based on the size of the tumor. The natural history category included studies where patients were diagnosed with PTC but not actively treated with surgery. The autopsy category included studies reporting results of autopsies of individuals with no known premorbid history of thyroid cancer. The diagnostic drift category included studies where there were two or more independent diagnostic classifications of the same histopathological slides at two or more different time points separated by years. The diagnostic reproducibility category included studies where there were two or more independent diagnostic classifications of the same histopathological slides at approximately the same time point.
A preexisting systematic review was identified for each of the natural history (12) and autopsy (32) categories, and only additional studies published after the end date of the relevant review were considered from the search.
Exclusion criteria
For all categories, we excluded papers that were not in English, as well as abstracts, reviews, and protocols of planned studies. Studies where patients with thyroid cancer had extrathyroidal extension or metastatic disease were also excluded. For natural history, we excluded studies that did not report on clinical outcomes relevant to disease progression (e.g., tumor enlargement, lymph node metastases, conversion to surgery). For autopsy, we excluded studies where patients were found to have died of thyroid cancer that was not detected during life. For diagnostic drift and diagnostic reproducibility, we excluded studies that reported only readings by non-pathologists, studies that did not report on diagnostic classification (e.g., only reported mitotic counts), and studies that only included cases diagnosed as non-PTC thyroid cancer.
Study selection
One author (C.R.S.) screened the titles and abstracts, and two authors (T.M. and C.R.S.) independently screened full text articles for inclusion. Discrepancies were resolved through discussion, with the involvement of two additional authors (K.J.L.B. and B.N.).
Data extraction, quality assessment, and synthesis
Data from included studies were extracted and summarized in a spreadsheet by one author (T.M.) and checked by another author (B.N. or K.J.L.B.). All studies were assessed for risk of bias by one author (T.M.) and checked by another author (B.N. or K.J.L.B.), by using a list of standardized items adapted from the ROBINS-I tool (33) (natural history studies), Hoy and colleagues' tool (34) (autopsy studies), and the QUADAS-2 (35) and QAREL (36) tools (diagnostic drift and diagnostic reproducibility studies).
We undertook a narrative synthesis of the evidence in each category. A PRISMA checklist for this review is available in Supplementary Appendix PRISMA-1.
Results
Natural history studies (n = 13) (details in Table 1)
We retrieved 226 papers, of which 46 were potentially relevant after title and abstract screening, and 13 were published after the end date of the identified systematic review (12). After full text review, we included 7 new studies (8 papers) in addition to the 6 studies included in the systematic review, for a total of 13 included studies (Supplementary Appendix SA2a shows selection of studies).
The systematic review included data collected from 1993 to 2017 from 4 active surveillance cohorts in Japan (Kuma Hospital and Cancer Institute Hospital), South Korea (multicenter), and North America (Memorial Sloan-Kettering Cancer Center), with a total of 2256 patients (range 291–1235 per study), the vast majority with mPTC (12). Pooled data from 4 cohorts found that 5.3% (95% confidence interval [CI] 4.4–6.4%) had enlargement of ≥3 mm at 5 years. Pooled data from 2 cohorts found that 1.6% [CI 1.1–2.4%] had lymph node metastasis at 5 years. The review also found that lesion size enlargement was steepest in years 2 to 4, and that many delayed surgeries were performed for reasons other than tumor enlargement or lymph node metastasis. There were no data reported on distant metastases or death.
The eight more recent articles identified from the search included two new papers on the Japanese Kuma Hospital cohort, and one new paper on the South Korean cohort. One of the papers on the Kuma cohort examined data from a subset of 824 patients enrolled in 2005–2011 (37), and it reported that the 10-year enlargement-free survival rate was 86.7% (enlargement was an increase in maximal tumor size of ≥3 mm). For many of the patients who initially had tumor enlargement, tumors either subsequently shrank or remained stable in size. Another paper on the Kuma cohort (38) retrospectively examined tumor volume doubling rates in a subset of 169 patients enrolled in 2000–2004 with a median 10.1 years of follow-up. Tumor growth ranged from rapid growth to shrinkage, but most tumors either remained stable in size (57%) or were slow growing (22%), and some decreased in size (17%). Only 3% of tumors had rapid growth. The new Korean paper (39) reported that of the 273 patients enrolled in 2002–2016, 71.8% of patients had a slow-growing tumor (volume doubling time >5 years). Younger age and microcalcification in the initial ultrasound were associated with a shorter doubling time, implying more aggressive disease.
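The volume doubling time used to describe growth in these cohorts (for example, the >5 year threshold for a slow-growing tumor) can be illustrated with a minimal sketch. It assumes an ellipsoid approximation of nodule volume and exponential growth between two ultrasound measurements; the cohort papers' exact calculation methods are not described in this review, so the function names and the example measurements below are hypothetical.

```python
import math


def ellipsoid_volume_mm3(d1_mm: float, d2_mm: float, d3_mm: float) -> float:
    """Approximate nodule volume from three orthogonal diameters (assumed ellipsoid)."""
    return math.pi / 6.0 * d1_mm * d2_mm * d3_mm


def doubling_time_years(v_start: float, v_end: float, interval_years: float) -> float:
    """Volume doubling time under an assumed exponential growth model.

    Returns math.inf when the volume is unchanged and a negative value
    when the tumor has shrunk over the interval.
    """
    if v_start <= 0 or v_end <= 0 or interval_years <= 0:
        raise ValueError("volumes and interval must be positive")
    if v_end == v_start:
        return math.inf
    return interval_years * math.log(2) / math.log(v_end / v_start)


# Hypothetical measurements, for illustration only (not data from any cohort).
baseline = ellipsoid_volume_mm3(6.0, 5.0, 5.0)
follow_up = ellipsoid_volume_mm3(7.0, 6.0, 5.0)
dt = doubling_time_years(baseline, follow_up, interval_years=3.0)

# Compare against the >5 year threshold mentioned in the text.
print(f"doubling time: {dt:.1f} years; exceeds 5 years: {dt > 5.0}")
```

A negative result here signals shrinkage rather than growth, which is why a single threshold comparison is not enough to separate stable, shrinking, and slow-growing tumors.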
The remaining five more recent papers reported findings from four active surveillance cohorts. Molinaro et al. reported on a prospective Italian cohort of 93 patients (40), of whom only 3 (3%) had disease progression requiring surgery after a median follow-up of 19 months. Another 19 patients (20%) elected to have surgery despite no evidence of disease progression. Rosario et al. reported on a prospective Brazilian cohort of 77 patients (41), of whom only 1 patient had disease progression requiring surgery, whereas 2 more patients elected to have surgery. However, some patients had <18 months of follow-up, which would mean that they had <2 surveillance ultrasounds. Sanabria reported on a prospective Colombian cohort of 102 patients (42), of whom 11 (11%) had tumor growth of ≥3 mm. Smulever et al. reported on a prospective Argentinian cohort of 41 patients (22,43), of whom 6 (15%) had tumor growth of ≥3 mm after a median 37.5 months of follow-up, 31 (75%) were stable, and 4 (10%) had a decrease in tumor size by ≥3 mm. Two patients (4.8%) had cervical lymph node metastases, and no patients had distant metastases. The last two studies included patients with relatively larger tumors, and both were assessed to be at high risk of bias. Table 1 presents the main findings of the seven new studies and the systematic review, and Supplementary Appendix SA3a presents the quality assessments.
Table 1. Characteristics and Key Findings of the Natural History Studies That Were Included in This Review
See Supplementary Appendix SA3 for complete risk-of-bias assessment.
Also included older patients (>65 years of age) with severe comorbidities with incidentally discovered nodules >15 mm.
Including five who were retrospectively included as active surveillance due to diagnosis after surveillance of thyroid nodule.
Systematic review including four different cohorts.
Based on risk-of-bias assessment of primary studies reported in the systematic review.
IQR, interquartile range.
Autopsy studies (n = 35) (details in Table 2)
We retrieved 294 papers, of which 36 were potentially relevant after title and abstract screening, and 1 was published after the end date of the identified systematic review (32). This article was not eligible for inclusion, as it was not in English. Therefore, we included one systematic review in this category (Supplementary Appendix SA2b shows selection of studies).
The systematic review included 35 studies published from 1955 to 2011, with 42 datasets from Europe, Asia, North America, and South America, and a total of 12,834 autopsies (range 57–1102 hospital or forensic autopsies per study) (32). The identification of incidental sub-clinical thyroid cancer on autopsy was relatively common, with a 4.1% [CI 3.0–5.4%] pooled prevalence of incidental differentiated thyroid cancer (iDTC) among studies with partial thyroid examination, and 11.2% [CI 6.7–16.1%] among studies with whole thyroid examination. The prevalence of iDTC on autopsy had remained stable since the 1970s, and the authors suggested that the apparent increase in the incidence of thyroid cancer may be due to increased detection.
The most common biases in the primary studies were non-representativeness of the national population and a non-systematic cancer detection method. Of the studies that examined the whole thyroid gland, the authors assessed 5 to be at low risk of bias for 8 out of 9 domains (no study was at low risk of bias for all domains), of which 2 reported on a series of >100 forensic autopsies where the whole thyroid was examined. A Singaporean study of 444 autopsies found that 43 (9.7%) had iDTC (44), while an Icelandic study of 199 autopsies found that 13 (6.5%) had iDTC (45). Table 2 summarizes the key findings of all studies in the systematic review.
Table 2. Characteristics and Key Findings of the Studies in the Autopsy Systematic Review
iDTC, incidental differentiated thyroid cancer.
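The single-study prevalences quoted above for the two forensic autopsy series (43 of 444 and 13 of 199) can be reproduced, together with an approximate uncertainty range, using a short sketch. The Wilson score interval used here is an assumption chosen for illustration; the review does not report confidence intervals for these individual studies, and the pooled estimates come from the meta-analysis in the cited systematic review, which this sketch does not attempt to replicate. The function name is hypothetical.

```python
import math


def wilson_interval(events: int, n: int, z: float = 1.959964) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z defaults to a 95% level)."""
    if n <= 0 or not 0 <= events <= n:
        raise ValueError("need n > 0 and 0 <= events <= n")
    p = events / n
    denom = 1.0 + z * z / n
    centre = (p + z * z / (2.0 * n)) / denom
    half_width = z * math.sqrt(p * (1.0 - p) / n + z * z / (4.0 * n * n)) / denom
    return centre - half_width, centre + half_width


# Counts as quoted in the text for the two whole-thyroid forensic autopsy series.
series = {"Singapore": (43, 444), "Iceland": (13, 199)}

for name, (events, n) in series.items():
    low, high = wilson_interval(events, n)
    print(f"{name}: {events}/{n} = {events / n:.1%} (approx. 95% interval {low:.1%} to {high:.1%})")
```

The Wilson interval is used instead of the simpler normal approximation because it behaves better for small proportions, which is the typical situation for incidental cancer prevalence.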
Diagnostic drift studies (n = 4) (details in Table 3)
We retrieved 128 papers, of which 43 were potentially relevant after title and abstract screening. After full text screening, we included three articles. In addition, 1 article (46) was transferred from the diagnostic reproducibility search because it fitted this category better (Supplementary Appendix SA2c shows the selection of studies).
All four studies in the diagnostic drift category were retrospective studies related to the reclassification of the encapsulated follicular variant of papillary thyroid cancer (FVPTC) into the non-malignant category non-invasive follicular thyroid neoplasm with papillary-like nuclear features (NIFTP). Three studies published in 2020–2021 re-examined 69, 136, and 115 cases of previously diagnosed FVPTC for potential reclassification into the non-malignant category NIFTP (46–48). Using the new more stringent diagnostic criteria, they found that NIFTP represented 2.3%, 1.3%, and 1.5% of previously diagnosed PTCs, respectively, and no patient with a lesion reclassified as NIFTP experienced any adverse oncologic events post-surgery. The fourth study specifically examined the impact of applying the NIFTP classification, previously proposed for lesions >1 cm only, to smaller lesions (49). Out of the 48 cases of mPTC diagnosed at a U.S. pathology department between 1996 and 2000, 8 (23.5%) were reclassified as NIFTP on review, and no patient with a reclassified lesion had a recurrence after 7–19 years follow-up. The authors concluded that applying the NIFTP classification to appropriate thyroid lesions that are ≤1 cm could lead to the avoidance of overtreatment.
Table 3 presents the main findings of the four studies, and the Supplementary Appendix SA3b presents the quality assessments.
Table 3. Characteristics of the Diagnostic Drift Studies That Were Included in This Review
See Supplementary Appendix SA3 for complete risk-of-bias assessment.
FVPTC, follicular variant of papillary thyroid carcinoma; mPTC, papillary microcarcinoma; NIFTP, non-invasive follicular thyroid neoplasm with papillary-like nuclear features; PTC, papillary thyroid cancer.
Diagnostic reproducibility studies (n = 13) (details in Table 4)
We retrieved 161 papers, of which 50 were potentially relevant after title and abstract screening, and 13 were included after full text screening (Supplementary Appendix SA2d shows the selection of studies).
None of the included studies examined the diagnostic reproducibility of mPTC. In general, studies found substantial variability in the histopathological diagnosis of thyroid lesions, including classification as malignant or benign, and sub-classification of malignant lesions (50–54). Compared with other sub-types of thyroid cancer, there appears to be better reproducibility for the diagnosis of PTC, that is, in differentiating this category from benign lesions and other malignant lesions (51,53,55). However, there was high variability in the diagnosis of FVPTC (56–60), as well as the new category NIFTP (54,61). Table 4 presents the main findings of the 13 studies, and Supplementary Appendix SA3c presents the quality assessments.
Table 4. Characteristics of the Diagnostic Reproducibility Studies That Were Included in This Review
See Supplementary Appendix SA3 for complete risk-of-bias assessment.
FTC, follicular thyroid cancer; MTC, medullary thyroid cancer; TCV, tall cell variant.
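Agreement between pathologists classifying the same slides, as in the reproducibility studies above, is commonly summarized with a chance-corrected statistic. The sketch below computes Cohen's kappa for two raters as one such measure. This is an illustrative assumption: the review does not state which agreement statistics the included studies reported, and the slide labels below are invented for the example, not data from any study.

```python
from collections import Counter


def cohens_kappa(rater_a: list[str], rater_b: list[str]) -> float:
    """Cohen's kappa for two raters assigning one category to each item."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed proportion of items on which the two raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    categories = set(counts_a) | set(counts_b)
    expected = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when both raters use a single category")
    return (observed - expected) / (1.0 - expected)


# Hypothetical diagnoses of the same ten slides by two pathologists (illustrative only).
pathologist_1 = ["PTC", "PTC", "FVPTC", "NIFTP", "benign",
                 "PTC", "FVPTC", "NIFTP", "benign", "PTC"]
pathologist_2 = ["PTC", "PTC", "NIFTP", "NIFTP", "benign",
                 "PTC", "FVPTC", "FVPTC", "benign", "PTC"]

kappa = cohens_kappa(pathologist_1, pathologist_2)
print(f"kappa = {kappa:.2f}")
```

In this invented example the two disagreements both involve the FVPTC and NIFTP labels, mirroring the categories in which the included studies found the highest variability.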
Discussion
We found relevant evidence to inform consideration of recalibration of diagnostic thresholds and/or alternative labels for low-risk papillary microcarcinoma. The strongest evidence to support recalibrating diagnostic thresholds and/or alternative labels for mPTC comes from natural history and autopsy studies. Evidence from active surveillance studies of patients with mPTC shows that the natural history is most often indolent. After many years of observation, only a very small proportion of these lesions have been reported to metastasize to local lymph nodes (12). This indolent nature of most mPTC is also consistent with the findings from autopsy studies that there is a relatively large reservoir of sub-clinical thyroid cancer in the general population that does not cause any symptoms or adverse health effects. These lesions would, therefore, never have been detected before the recent development of widely available and sensitive diagnostic imaging (62). Given these findings alone, a strong case may be made to re-classify some mPTC into a lower risk category and/or re-label the condition (25). The objective of the current study is to support robust discussion among the pathology community about this idea, including the issue of what criteria should be used to define a lower risk category, should this strategy be adopted.
Consistent with suggestions in clinical management guidelines (10,11), evidence from this review supports active surveillance as a safe and effective alternative to immediate surgery for many patients diagnosed with mPTC. Ultrasound, with or without fine-needle aspiration cytology, may be used to identify higher risk mPTC subtypes that behave more aggressively and may not be appropriate for active surveillance, for example the tall cell variant (63) that may be identified in cytology specimens (64–66). The absence of such features may be identified as a necessary criterion for a lesion to be designated as low risk, and documenting assessment of this could become standard cytopathology reporting practice. Active surveillance of low-risk lesions may reduce overtreatment (both surgical and non-surgical) and other potential negative impacts on a patient's quality of life (67). A recent study of the 2009 American Thyroid Association guidelines found that despite the guideline's recommendations for more conservative management, there was only a modest decrease in the use of more invasive management for low-risk patients with tumors 2 cm or less (68). Further, acceptance of active surveillance among clinicians and patients remains low (14–17). Re-labeling low-risk mPTC with a terminology that avoids calling it a “cancer” could further encourage the recommendation and acceptance of more conservative management by both clinicians and patients, and therefore help reduce overtreatment. A number of recent studies have now provided evidence that re-labeling can, indeed, change patients' perceptions (27–31).
The reclassification of thyroid lesions into a lower risk category based on clinical evidence is not without precedent. The new diagnostic classification NIFTP (69) was recently established to re-classify a subset of PTC that was found to be generally indolent and non-invasive. The indolent nature of NIFTP has since been confirmed by multiple studies. As noted in this review, a study has since been conducted that supports the application of the NIFTP category to mPTC lesions, as their histology and clinical behavior are similar and re-classification will help to avoid further unnecessary aggressive treatment (49). Although a current NIFTP diagnosis differs from that of mPTC in that it requires excision of the whole thyroid specimen to make a diagnosis, the change in diagnostic classification demonstrates that a change of label may be accepted by pathologists.
We found no evidence on diagnostic reproducibility of mPTC specifically. However, the diagnostic reproducibility of thyroid lesions, in general, was found to be sub-optimal, including the classification of such lesions as benign versus malignant, a problem also seen in other cancers (70–73). If there is to be a recalibration and/or re-labeling of some mPTC into a lower risk category, the new classification will need to be able to be applied consistently, with a high degree of reproducibility.
Strengths of this study include the systematic search, which is unlikely to have missed important evidence, and the robust methods of data extraction and quality assessment. Limitations include that some of the data from included studies were from decades ago, were poorly reported, and had a high risk of bias. Nevertheless, this review is the first to shed light on whether recalibrating diagnostic thresholds and/or the consideration of alternative labels for low-risk PTC is supported by available evidence.
Strategies are needed to help rethink and minimize the impact of overdiagnosis and overtreatment on patients diagnosed with low-risk cancers, and on overburdened health care systems. The World Health Organization's classification of tumors has recognized that “there is an urgent need to integrate [new understanding of cancer] into cancer classification internationally” (74). The evidence from this review provides empirical support for the consideration of recalibrating diagnostic thresholds and/or alternative labels as possible strategies for low-risk PTC.
Footnotes
Authors' Contributions
Conception and design: K.J.L.B. and B.N. Collection and assembly of data: T.M. and C.R.S. Data analysis: K.J.L.B., B.N., T.M., and C.R.S. Data interpretation: all authors. Article writing: T.M. wrote the first draft. All authors revised it critically for important intellectual content. Final approval of article: all authors. Responsible for integrity of the data and accuracy of the analysis: K.J.L.B., B.N., T.M., and C.R.S.
Author Disclosure Statement
A.B. is a member of the Scientific Committee of the International Preventing Overdiagnosis Conference. All other authors have no conflict of interest to disclose.
Funding Information
National Health and Medical Research Council (NHMRC) Investigator Grant (no. 1174523).
Supplementary Material
Supplementary Appendix SA1
Supplementary Appendix SA2
Supplementary Appendix SA3
Supplementary Appendix PRISMA-1
