Abstract
Background:
Papillary thyroid carcinoma (PTC) is the most common endocrine malignancy. Despite its low mortality rate, the disease has a recurrence rate of up to 30%. The mainstay of treatment for PTC is surgery, followed by radioiodine ablation and thyroxine therapy in appropriately selected patients. PTC can appear as a unifocal solitary tumor, but also as two or more anatomically separate foci. The prognostic significance of multifocality is controversial, yet multifocality is often considered a poor prognostic factor that prompts more aggressive treatment. The aim of this study was to investigate the prognostic value of tumor multifocality for disease recurrence and mortality in PTC patients.
Methods:
Data of 1039 consecutive PTC patients from two tertiary medical centers were reviewed. Baseline characteristics and short- and long-term outcomes were analyzed to evaluate the prognostic significance of multifocal disease. Multivariate analysis was followed by the application of two different propensity score models.
Results:
The median follow-up was 10.1 years; 534 (51.4%) patients had multifocal disease and 505 (48.6%) had unifocal disease. Patients with multifocal disease were significantly older, were more frequently male, and had more extrathyroidal extension, more lymph node metastases, more advanced disease (stage III/IV), and a higher American Thyroid Association recurrence risk. Multifocal PTC patients had more persistent disease at one year (26.6% vs. 16.4%; p < 0.001), more recurrence during follow-up (12.7% vs. 6.6%; p = 0.002), and a higher overall mortality rate (15.5% vs. 9.7%; p = 0.002). However, after adjustment for confounding variables using propensity score matching, there were no significant differences in recurrence, persistent disease at last visit, or mortality.
Conclusion:
This propensity score–matching study provides the best available data to support the assertion that multifocality in PTC patients is a marker of more extensive disease at presentation, but not an independent prognostic factor for long-term outcomes.
Introduction
Thyroid cancer is the most common endocrine malignancy, and its global incidence has increased dramatically in recent decades. Despite the rising incidence, the thyroid cancer–related death rate is reported to be stable at approximately 0.5 deaths per 100,000 persons (1,2). Nevertheless, recurrent/persistent disease can reach 30% and is associated with increased morbidity and mortality (3).
Papillary thyroid carcinoma (PTC) accounts for 80–85% of all thyroid cancers in iodine-sufficient areas (2). It may appear as a unifocal solitary tumor, but it can also consist of two or more anatomically separate foci within the thyroid gland, a pattern described as multifocal PTC. The reported prevalence of multifocality ranges from 18% to 87% (4,5).
The mainstay of treatment for PTC is total thyroidectomy or hemithyroidectomy, followed by selective administration of radioactive iodine (RAI) and thyroxine therapy (6). Appropriate use of these treatment modalities can reduce disease recurrence and patient mortality, but they can also cause treatment-associated adverse events (7). For this reason, potential benefits must be weighed against possible adverse effects to define the type and extent of initial treatment accurately. Many studies have investigated the accuracy of different risk-stratification systems to help identify the appropriate treatment for a specific patient. These systems are continually evolving and routinely debated, because risk stratification has become the cornerstone of individualized thyroid cancer management (6,8).
The prognostic significance of multifocality in PTC remains controversial (5,9–17). Current guidelines issued by the American Thyroid Association (ATA) and the European Thyroid Association include patients with multifocal PTC within the low-risk category (6,7). However, recent studies have shown that multifocality is associated with an increased risk of nodal metastases, as well as of persistent/recurrent disease (5,9–11). Moreover, multifocality is often managed empirically as a high-risk factor for disease progression, prompting more aggressive treatment (12). Among the wide variety of staging systems, only two (the Memorial Sloan Kettering Cancer Center and the National Thyroid Cancer Treatment Cooperative Study systems) address the prognostic value of multifocality for survival (18).
The inconsistent results regarding the significance of multifocality in PTC probably stem from biases and from the confounding effects of measured and unmeasured baseline characteristics. Propensity score methods allow an observational study to be analyzed in a pseudo-randomized manner and reduce the effect of measured confounders (19,20). The propensity score is typically estimated using a multivariable logistic regression model and represents the probability that a patient has the exposure of interest (here, multifocal disease) given the observed baseline covariates. Patients can be stratified into groups by propensity score (e.g., quintiles), and outcomes are then compared between exposed and unexposed patients within each stratum. The score also provides a way to match patients with similar characteristics and thereby overcome confounding bias. The more covariates enter the matching process, the more similar the matched groups become, and the more likely it is that any remaining difference in outcome stems from the exposure itself (19,20).
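As a rough illustration of the two uses of the propensity score described above, the following Python sketch estimates each patient's probability of multifocal disease with a plain logistic regression and then assigns patients to score quintiles. It is a minimal sketch under stated assumptions: the gradient-ascent fit stands in for the estimation routine of a statistics package, covariates are assumed to be numerically encoded, and the function names and toy data are hypothetical, not taken from this study.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(X: Sequence[Sequence[float]], y: Sequence[int],
                 lr: float = 0.1, n_iter: int = 5000) -> List[float]:
    """Fit P(exposed | x) = sigmoid(w0 + w.x) by batch gradient ascent on the mean log-likelihood."""
    n, p = len(X), len(X[0])
    w = [0.0] * (p + 1)  # w[0] is the intercept
    for _ in range(n_iter):
        grad = [0.0] * (p + 1)
        for xi, yi in zip(X, y):
            err = yi - sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)))
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj + lr * gj / n for wj, gj in zip(w, grad)]
    return w


def propensity_scores(X: Sequence[Sequence[float]], w: Sequence[float]) -> List[float]:
    """Estimated probability of exposure (here: multifocal disease) for each patient."""
    return [sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))) for xi in X]


def quintile_strata(scores: Sequence[float]) -> List[int]:
    """Assign each patient to a propensity score quintile (0 = lowest, 4 = highest) by rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    strata = [0] * len(scores)
    for rank, i in enumerate(order):
        strata[i] = rank * 5 // len(scores)
    return strata


# Hypothetical toy data: covariates are (age - 55) / 10 and male sex (0/1);
# exposure is multifocal (1) vs. unifocal (0) disease.
ages = [35, 40, 45, 50, 55, 60, 65, 70, 72, 75]
male = [0, 1, 0, 1, 0, 0, 1, 0, 1, 0]
multifocal = [0, 1, 0, 0, 1, 0, 1, 1, 0, 1]
X = [[(a - 55) / 10, m] for a, m in zip(ages, male)]

weights = fit_logistic(X, multifocal)
scores = propensity_scores(X, weights)
strata = quintile_strata(scores)
```

Outcomes would then be compared between multifocal and unifocal patients within each quintile. The hand-written fit only keeps the example dependency-free; in practice the regression would come from a statistics package.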
Given the dilemma physicians face when deciding on the best treatment for the individual patient, and the current controversy over multifocality as a predictor of poor outcome, this study aimed to compare baseline characteristics and response to therapy in patients with multifocal and unifocal PTC by using logistic regression analysis and propensity score matching.
Methods
Study design
A retrospective cohort study was carried out at the Assaf Harofeh and Rabin Medical Centers, two tertiary-care and academic medical centers in central Israel. After excluding aggressive histopathologic variants, the cohort included 1039 patients (Fig. 1). The median follow-up of the study was 10 years. The study followed the tenets of the Declaration of Helsinki and was approved by the local Institutional Ethics Review Board.

Figure 1. Study design flow chart.
Patient cohort
The thyroid cancer registries of the two medical centers were searched for all PTC patients registered, retrospectively or prospectively, since 2005. Patients who had undergone total thyroidectomy, had information on tumor focality, and had sufficient data for analysis were included. Pathologic types other than the classical (CPTC) and follicular (FVPTC) PTC variants were excluded, because tumors with more aggressive histology could confound clinical outcomes. Patients who had undergone hemithyroidectomy were also excluded, because the pathologic status of the unresected lobe, and therefore tumor focality, could not be verified.
Clinicopathologic variables
The following clinical variables were collected: age, sex, family history, radiation exposure, preoperative thyroid function tests, thyroglobulin (Tg) levels, type and time of surgery, pathology findings, extent of primary disease, focality, extrathyroidal extension (ETE), tumor-node-metastasis (TNM) staging, 2015 ATA recurrence risk stratification, RAI therapy, additional treatments, follow-up time, disease recurrence, and mortality. Patients undergoing near-total, subtotal, or completion thyroidectomy were grouped together as having total thyroidectomy. Neck dissection was recorded as a single variable (central or lateral), because data on the dissected compartment were incomplete. Bilateral disease and multifocal disease were considered together for statistical analysis. Staging was based on the seventh edition of the American Joint Committee on Cancer (AJCC)/Union for International Cancer Control TNM system (rather than the eighth edition), because the AJCC recommended that all cases diagnosed through December 31, 2017, be staged using the seventh edition (21). The clinical endpoints were (i) disease persistence one year after initial treatment, (ii) disease recurrence during follow-up, (iii) persistent disease at last visit, and (iv) overall and disease-specific mortality. Persistent/recurrent disease was diagnosed based on histopathology findings, imaging studies, and Tg levels. Disease-free status was defined as the absence of suspicious structural findings combined with undetectable Tg in the absence of antithyroid antibodies.
Criteria for response to initial treatment followed those established by the 2015 ATA thyroid cancer guidelines: excellent response (no clinical, biochemical, or structural evidence of disease); biochemical incomplete response (abnormal Tg or rising anti-Tg antibodies in the absence of localizable disease); structural incomplete response (persistent or newly identified locoregional or distant metastases); and indeterminate response (nonspecific biochemical and/or structural findings not reliable enough to differentiate between complete and incomplete response) (6).
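The response categories above can be read as a small decision rule. The Python sketch below encodes them, assuming the one-year assessment has already been reduced to four yes/no findings; the class, field, and function names are hypothetical, and the Tg and antibody thresholds that define "abnormal" are deliberately left out. Structural disease is checked first because biochemical incomplete response is defined only in the absence of localizable disease; checking biochemical findings before nonspecific ones is an assumption of this sketch.

```python
from dataclasses import dataclass
from enum import Enum


class Response(Enum):
    EXCELLENT = "excellent"
    BIOCHEMICAL_INCOMPLETE = "biochemical incomplete"
    STRUCTURAL_INCOMPLETE = "structural incomplete"
    INDETERMINATE = "indeterminate"


@dataclass(frozen=True)
class OneYearAssessment:
    """Hypothetical yes/no findings at one year; the underlying thresholds are not modelled."""
    structural_disease: bool    # persistent or newly identified locoregional/distant metastases
    abnormal_tg: bool           # abnormal thyroglobulin
    rising_anti_tg: bool        # rising anti-Tg antibodies
    nonspecific_findings: bool  # nonspecific biochemical and/or structural findings


def classify_response(a: OneYearAssessment) -> Response:
    # Structural disease takes precedence over any biochemical finding.
    if a.structural_disease:
        return Response.STRUCTURAL_INCOMPLETE
    # Abnormal Tg or rising antibodies without localizable disease.
    if a.abnormal_tg or a.rising_anti_tg:
        return Response.BIOCHEMICAL_INCOMPLETE
    # Assumed ordering: nonspecific findings only matter if nothing above applies.
    if a.nonspecific_findings:
        return Response.INDETERMINATE
    return Response.EXCELLENT


def is_non_excellent(a: OneYearAssessment) -> bool:
    """Composite group: biochemical incomplete, structural incomplete, or indeterminate."""
    return classify_response(a) is not Response.EXCELLENT
```

The `is_non_excellent` helper mirrors the composite non-excellent group used when patients are compared by their response one year after initial treatment.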
Statistical analysis
Fisher's exact test and the chi-square test were used to compare categorical variables, and a t-test was used to compare continuous variables, both between patients with multifocal and unifocal disease and between patients with persistent disease and those with no evidence of disease. Univariate Cox regression was used to evaluate the crude association between each predictor and the study's long-term outcomes (mortality and recurrence). Kaplan–Meier analysis with the log-rank test was used for survival analysis. Categorical variables are expressed as numbers and percentages, and continuous variables as the mean and standard deviation or the median and interquartile range. The reverse censoring (reverse Kaplan–Meier) method was used to calculate the length of follow-up. Univariate and multivariate logistic regression were performed to evaluate the crude and adjusted associations between multifocality and first-year persistence. The multivariate regression analysis included age at diagnosis, sex, tumor size, ETE, TNM stage, and ATA risk of recurrence.
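The reverse censoring method mentioned above estimates follow-up by running a Kaplan–Meier analysis with the censoring indicator flipped, so that being alive at last contact counts as the "event" and death counts as censoring. The Python sketch below illustrates this with hypothetical data and function names; it uses the common convention that the median is the first time at which the estimated curve reaches 0.5 or below.

```python
from typing import List, Optional, Sequence, Tuple


def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> List[Tuple[float, float]]:
    """Kaplan-Meier estimate as (time, survival) pairs at each time with at least one event."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = n_leaving = 0
        # Group all observations tied at time t (events and censorings alike).
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events:
            surv *= 1.0 - n_events / at_risk
            curve.append((t, surv))
        at_risk -= n_leaving
    return curve


def km_median(curve: List[Tuple[float, float]]) -> Optional[float]:
    """First time at which estimated survival is 0.5 or below (None if never reached)."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None


def median_follow_up(times: Sequence[float], died: Sequence[int]) -> Optional[float]:
    """Reverse Kaplan-Meier: censoring becomes the 'event' and death becomes censoring."""
    reversed_events = [0 if d else 1 for d in died]
    return km_median(kaplan_meier(times, reversed_events))


# Hypothetical follow-up times (years) and death indicators for eight patients.
years = [2, 4, 5, 7, 8, 10, 11, 12]
died = [1, 0, 0, 1, 0, 0, 0, 0]
print(median_follow_up(years, died))  # -> 10
```

For comparison, the naive median of the observed times in this toy example is 7.5 years; the reverse method yields a longer estimate because deaths are treated as censored observations rather than as completed follow-up.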
The propensity score was calculated as the probability of a patient having multifocal disease and was used to control for differences in baseline characteristics between the two groups. It was estimated by multivariate logistic regression including age, sex, radiation exposure, family history, tumor size, TNM stage, pathological vessel invasion, ATA risk of recurrence, and ETE. For the stratified analysis, patients were grouped into propensity score quintiles, and the multivariate analyses were repeated using the stratified model. Multivariate logistic regression analysis was then performed on the subgroup of patients matched according to their propensity score, with a maximum absolute difference of 0.05 (5 percentage points) in the propensity score accepted for matching. After matching, baseline characteristics were compared using the McNemar test for categorical variables and the paired t-test or Wilcoxon signed-rank test for continuous variables. The odds ratio (OR) for disease persistence at one year after surgery was estimated using conditional logistic regression, and the hazard ratio (HR) for disease recurrence and mortality using stratified Cox regression. All statistical tests were two-sided, and a p-value <0.05 was considered statistically significant. All statistical analyses were performed using IBM SPSS Statistics for Windows v24 (IBM Corp., Armonk, NY).
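The caliper matching step can be sketched as follows. The Methods do not state which matching algorithm was used, so this Python example assumes greedy 1:1 nearest-neighbour matching without replacement, processing multifocal patients in ascending score order; the 0.05 caliper corresponds to the 5% absolute difference stated above, and the function name and scores are hypothetical.

```python
from typing import List, Sequence, Tuple


def greedy_caliper_match(scores: Sequence[float], exposed: Sequence[int],
                         caliper: float = 0.05) -> List[Tuple[int, int]]:
    """Greedy 1:1 nearest-neighbour matching on the propensity score, without replacement.

    Returns (exposed_index, unexposed_index) pairs. Exposed patients with no unexposed
    patient within the caliper remain unmatched.
    """
    available = {i for i, e in enumerate(exposed) if not e}
    pairs = []
    # Assumption: exposed patients are processed in ascending order of score.
    for i in sorted((k for k, e in enumerate(exposed) if e), key=lambda k: scores[k]):
        best_j, best_d = None, None
        for j in sorted(available):  # sorted so that ties resolve to the lowest index
            d = abs(scores[i] - scores[j])
            if d <= caliper and (best_d is None or d < best_d):
                best_j, best_d = j, d
        if best_j is not None:
            pairs.append((i, best_j))
            available.remove(best_j)  # without replacement
    return pairs


# Hypothetical scores: 1 = multifocal (exposed), 0 = unifocal.
scores = [0.10, 0.12, 0.40, 0.43, 0.80, 0.95, 0.50]
exposed = [1, 0, 1, 0, 1, 0, 0]
print(greedy_caliper_match(scores, exposed))
```

Greedy matching depends on the processing order, and exposed patients without a partner inside the caliper are dropped (patient 4 in this toy example), which is why a matched cohort is smaller than the full cohort. Optimal matching, which minimizes the total distance across all pairs, is a common alternative.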
Results
Baseline characteristics of the study cohort
After excluding patients with missing baseline information, pathological types other than CPTC or FVPTC, and those who underwent hemithyroidectomy, the study population comprised 1039 patients with enough data for analysis. Multifocal and unifocal disease accounted for 51.4% and 48.6% of the cohort, respectively. Table 1 shows the baseline clinical characteristics for all patients and by group. Compared with patients with unifocal disease, those with multifocal disease were significantly older, more likely to be male, and had more radiation exposure, more microscopic vascular invasion, more ETE, and more lymph node (LN) involvement. Moreover, multifocality was significantly associated with a higher risk of mortality as estimated by TNM staging (p = 0.029) and a higher risk of recurrence according to the ATA risk-stratification system (p = 0.008). Regarding primary treatment, patients with multifocal disease underwent more neck dissections (36.3% vs. 30%; p = 0.033), received RAI more often (96.8% vs. 91.4%; p < 0.001) and at higher mean first doses (121.6 ± 45 mCi vs. 106.2 ± 49 mCi; p < 0.001), and received more external beam radiotherapy (EBRT; 2.7% vs. 0.8%; p = 0.025).
Table 1. Baseline Characteristics and Primary Treatment of 1039 Patients with Unifocal and Multifocal Papillary Thyroid Cancer
Data shown are n (%) unless indicated otherwise.
PTC, papillary thyroid carcinoma; SD, standard deviation; TNM, tumor-node-metastasis; ATA, American Thyroid Association; stTg, stimulated thyroglobulin; ns, not significant.
Disease outcome at one year after primary treatment
An excellent response was recorded in 78.3% of the whole cohort and was significantly more frequent in the unifocal than in the multifocal group (83.6% vs. 73.4%; p < 0.001). The non-excellent response group (n = 225) comprised structural persistence in 145 patients, biochemical-only persistence in 75, and an indeterminate response in 5. Table 2 shows the variables associated with a worse response at one year after initial treatment (multifocal disease, male sex, larger tumor, advanced TNM stage, higher ATA risk, ETE, LN metastases, and distant metastases). The biochemical assessment at this time revealed that >10-fold postoperative stimulated Tg (stTg) levels were associated with an incomplete response. Interestingly, the worse one-year outcome in the multifocal group occurred even though these patients received more intensive initial therapy.
Table 2. Comparison of PTC Patients With and Without an Excellent Response Assessed One Year After Initial Treatment
Data shown are n (%) unless indicated otherwise. Excellent response: no clinical, biochemical, or structural evidence of disease; non-excellent response: composite of incomplete biochemical, structural, and indeterminate response. Definitions follow the 2015 ATA thyroid cancer guidelines (6).
Additional treatments for recurrent/persistent diseases
Of the 814 patients with an excellent response at one year, 764 had sufficient follow-up data, and recurrent disease was found in 73 (9.6%) of them (Table 3). Almost 80% of these 73 patients had some evidence of structural recurrence. Overall, 250 (24%) patients in the cohort received additional treatment, 225 because of persistent disease and 73 because of recurrent disease. Additional treatments included reoperation (n = 94), further RAI treatment (n = 241), and/or EBRT (n = 27). Additional treatments were given to 17.6% (88/500) of patients with unifocal disease compared with 30.6% (162/529) of those with multifocal disease (p < 0.001); a similar pattern was seen for RAI retreatment (17.2% vs. 29.4%; p < 0.001) and reoperation (6.2% vs. 11.8%; p = 0.002).
Table 3. Baseline Characteristics of PTC Patients with Recurrent Disease
Data shown are n (%) unless indicated otherwise. Recurrence criteria according to 2015 ATA guidelines (6).
HR, hazard ratio; CI, confidence interval; na, not available; ns, not significant.
Long-term disease outcome
Persistent disease at the last visit was more frequent in the multifocal than in the unifocal group (18.4% vs. 11.2%; p = 0.002), with similar results when only structural persistence was considered (12.6% in multifocal vs. 6.8% in unifocal patients; p = 0.003).
Recurrence was more frequent in patients with multifocal than with unifocal disease (12.7% vs. 6.6%; p = 0.002), with an HR of 2.17 [confidence interval (CI) 1.34–3.51]. Other factors predictive of recurrence in univariate analysis were sex, tumor stage, ETE, LN metastases, TNM stage, and ATA recurrence risk (Table 3). Similar results were seen when only structural recurrence was considered (data not shown).
During a median follow-up of 10.1 years, 132 all-cause deaths were observed, 22 of which were related to advanced thyroid cancer, corresponding to overall and disease-related mortality rates of 12.7% and 2.1%, respectively, for the whole cohort. Compared with the unifocal group, patients with multifocal disease had a higher rate of all-cause mortality (15.5% vs. 9.7%; p = 0.002), with an HR of 1.75 [CI 1.22–2.51]. Other factors predictive of overall mortality in univariate analysis were age, tumor stage, ETE, TNM stage, ATA recurrence risk, and postoperative stTg (Table 4).
Table 4. Overall Mortality of PTC Patients According to their Baseline Characteristics
Kaplan–Meier survival analysis comparing mortality and disease-free survival by tumor focality with the log-rank test is shown in Figure 2. Compared with unifocal disease, multifocal disease had significantly higher all-cause mortality (p = 0.002), higher disease-related mortality (p = 0.011), and lower disease-free survival (p = 0.001). For overall and disease-specific mortality, the curves separated more clearly beyond five years after diagnosis.

Figure 2. Kaplan–Meier curves for disease outcome of papillary thyroid cancer patients with multifocal versus unifocal disease.
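The log-rank comparison used for these curves can be illustrated with a small two-group implementation. This Python sketch is hypothetical (the function name and data are illustrative) and computes the standard statistic: at each event time it accumulates the observed minus expected number of events in one group and the hypergeometric variance, then refers the squared sum divided by the variance to a chi-square distribution with one degree of freedom.

```python
import math
from typing import Sequence, Tuple


def logrank_test(times: Sequence[float], events: Sequence[int],
                 group: Sequence[int]) -> Tuple[float, float]:
    """Two-group log-rank test. Returns (chi-square statistic with 1 df, p-value)."""
    event_times = sorted({t for t, e in zip(times, events) if e})
    o_minus_e = 0.0  # observed minus expected events in group 1
    variance = 0.0
    for t in event_times:
        n = sum(1 for ti in times if ti >= t)                             # at risk, both groups
        n1 = sum(1 for ti, g in zip(times, group) if ti >= t and g == 1)  # at risk, group 1
        d = sum(1 for ti, e in zip(times, events) if ti == t and e)       # events at t
        d1 = sum(1 for ti, e, g in zip(times, events, group)
                 if ti == t and e and g == 1)                             # group-1 events at t
        o_minus_e += d1 - d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    chi2 = o_minus_e ** 2 / variance
    # Upper tail of chi-square(1 df) equals P(|Z| > sqrt(chi2)) = erfc(sqrt(chi2 / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))


# Hypothetical data: group 1 has all its events before group 0.
times = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
events = [1] * 10
group = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
chi2, p = logrank_test(times, events, group)
```

A real analysis would also report the direction of the difference (here, the sign of the observed minus expected events), which the chi-square statistic alone does not show.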
Multivariate analysis for clinical outcomes
Adjusted and unadjusted estimates (ORs for persistence at one year; HRs for recurrence and mortality) comparing multifocal with unifocal disease are shown in Table 5. After adjusting for age, sex, tumor size, ETE, TNM stage, and ATA recurrence risk, multifocal disease was significantly associated with more persistent disease at one year (OR = 1.85; p = 0.001) and more disease recurrence during follow-up (HR = 2.13; p = 0.005). However, in the adjusted multivariate analysis, multifocality was not a significant independent predictor of overall mortality (HR = 1.26; p = 0.249).
Table 5. Multivariate Analysis and Propensity Score for Multifocal Disease Adjusted for Age, Sex, Tumor Size, Extrathyroidal Extension, TNM Stage, and ATA Risk of Recurrence
Odds ratio.
Analysis using propensity score matching
Given the observed heterogeneity in baseline characteristics between the multifocal and unifocal groups, as well as the potential biases inherent in any retrospective study, propensity score analyses were used to reduce the impact of these biases (Tables 5 and 6). When the analysis was stratified by propensity score quintiles, multifocality was an independent predictor of persistence at one year (OR = 1.57; p = 0.008) but did not predict disease recurrence or all-cause mortality during follow-up. Next, the cohort was matched 1:1 according to the patients' estimated propensity score. Of the 1039 patients in the study, 690 could be matched into 345 pairs; their baseline characteristics are shown in Table 6. After matching, the association between multifocality and persistent disease at one year was slightly attenuated but remained significant (OR = 1.52; p = 0.017), whereas multifocality no longer significantly predicted recurrent disease (HR = 1.26 [CI 0.49–3.17]) or all-cause mortality (HR = 1.00 [CI 0.5–2.0]). Differences in persistent disease at last visit were also no longer significant after matching (13.8% [47/340] for multifocal vs. 11.3% [38/336] for unifocal disease; not significant), nor were they when only structural persistence was considered (10.2% [34/332] vs. 6.9% [23/331], respectively; not significant).
Table 6. Baseline Characteristics of PTC Patients After Matching Using a Propensity Score Method
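For 1:1 matched pairs and a binary outcome, the conditional logistic regression OR for a model containing only the exposure term reduces to the ratio of discordant pairs, and the McNemar test used for categorical baseline variables after matching is built from the same discordant counts. The Python sketch below shows both under that simplifying assumption, using the McNemar statistic without continuity correction; the function name and pair counts are hypothetical, and it is not intended to reproduce the adjusted estimates reported above.

```python
import math
from typing import Sequence, Tuple


def matched_pair_analysis(pairs: Sequence[Tuple[int, int]]) -> Tuple[float, float, float]:
    """pairs: (outcome in exposed member, outcome in unexposed member), coded 0/1.

    Returns (conditional OR, McNemar chi-square without continuity correction, p-value).
    Concordant pairs carry no information in a 1:1 matched design.
    """
    b = sum(1 for e, u in pairs if e and not u)  # only the exposed member had the outcome
    c = sum(1 for e, u in pairs if u and not e)  # only the unexposed member had the outcome
    if b + c == 0:
        return math.nan, 0.0, 1.0  # no discordant pairs: OR undefined, no evidence either way
    odds_ratio = b / c if c else math.inf
    chi2 = (b - c) ** 2 / (b + c)
    # Chi-square with 1 df: p = erfc(sqrt(chi2 / 2)).
    return odds_ratio, chi2, math.erfc(math.sqrt(chi2 / 2))


# Hypothetical matched pairs (exposed = multifocal member, outcome = persistence at one year).
pairs = [(1, 0)] * 6 + [(0, 1)] * 2 + [(1, 1)] * 3 + [(0, 0)] * 10
odds_ratio, chi2, p = matched_pair_analysis(pairs)
```

In this toy example six pairs are discordant in favour of the exposed member and two in the opposite direction, so the conditional OR is 6/2 = 3.0; the 13 concordant pairs do not affect either the OR or the McNemar statistic.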
Discussion
Personalized medicine is achieved by adjusting treatment to individual clinicopathologic characteristics. Risk stratification is carried out to tailor a therapeutic approach to each patient by using established clinicopathologic risk factors, such as advanced age at diagnosis, male sex, LN metastasis, ETE, and distant metastasis (6,8). Although many studies have investigated the prognostic value of multifocality for disease outcomes, the issue remains controversial (5,9–17). This study extensively evaluated the differences in baseline characteristics and disease outcomes between multifocal and unifocal disease after total thyroidectomy, using two large hospital-based registries. To the best of the authors' knowledge, this is the first study on this subject to use propensity score matching to correct for confounding variables.
The wide range of previously reported prevalence rates of multifocal disease (18–87%) (5) may result from differences in study design. In the present study, multifocal disease was diagnosed in 51.4% of the whole cohort. The clinicopathologic characteristics of the study group are similar to those of previously reported studies, and the same clear associations were found between multifocal disease and older age (5,10), male sex (22), and ETE (5,11,12). However, most previous studies included multifocal PTC patients diagnosed at a more advanced TNM stage (11,12,14). Although the prognostic value of bilateral disease was beyond the scope of our research, a recent study by Kim et al. (11) found that multifocality, but not bilaterality, predicted disease recurrence in PTC patients.
Interestingly, in the present study, multifocal disease was associated with a higher ATA risk of recurrence than unifocal disease. This is noteworthy because the ATA initial risk-stratification system includes multifocality as a clinicopathologic feature of low-risk significance (6). It is also notable that despite the difference in the extent of disease, RAI treatment was used similarly in both groups, which can be explained by the time period covered by the registries, reaching back to an era when empiric RAI therapy was given routinely.
Disease persistence at one year has been reported in 6–48% of patients (23–26). The variation among studies appears to derive from differences in design, in the definition of persistent/recurrent disease, and in disease stage at entry. In the present study, persistence at one year was 21.7% for the whole cohort and was higher in the multifocal group (26.6% vs. 16.4%; p < 0.001).
That multifocality remains an independent predictor of persistence at one year after multivariate analysis and propensity score matching raises several possibilities. Patients with multifocal disease present with more advanced illness and are therefore less likely to respond well to initial treatment. They are also considered to have a worse outcome and receive more aggressive initial treatment. Moreover, the extent of disease is not always fully evident when the initial treatment is chosen. Thus, differences in initial treatment are inherent to the study groups and could not be included in the propensity score matching.
At last visit, after a median follow-up of 10 years, the present study found more recurrent/persistent disease among patients with multifocal disease; the rate was about twice as high when only residual structural disease was considered (12.5% vs. 6%; p = 0.003). These patients also developed more distant metastases and had higher overall and disease-related mortality. Nevertheless, after propensity score matching, multifocality lost its statistical significance as an independent predictor of long-term outcome.
Two recently published meta-analyses reported opposite findings. Guo et al. (22) included 13 studies with a total of 7048 patients and found no prognostic value of multifocality for recurrent disease. In contrast, by pooling five studies with recurrence data on 178,550 thyroid cancer patients (116,775 excluding a Surveillance, Epidemiology, and End Results analysis), Joseph et al. (5) found an HR of 2.81 for recurrent disease in multifocal PTC. In 2017, Wang et al. (12) reported on 2638 patients from 11 medical centers, with a complementary database of 89,680 patients for replication and validation analysis. That study demonstrated an association between multifocality and disease recurrence in univariate but not multivariate analysis, and found no association between multifocality and distant metastasis or patient mortality. The present study is the first to use propensity score matching in a large cohort of unifocal and multifocal PTC patients, and its findings corroborate those of Wang et al. (12).
This study should be viewed in light of several limitations. The first is inherent to its retrospective design. The management of PTC is a dynamic process; disease-stage classifications, the extent of surgery, the role of RAI, methods of follow-up, and definitions of persistent or recurrent disease were all subject to change over time, and the conclusions should therefore be considered carefully. Second, PTC is associated with a very low mortality rate; large population-based studies are therefore needed to investigate the association between multifocality and mortality, and the present mortality findings should be interpreted with caution. Additionally, because data on the number of foci were lacking, the effect of this variable could not be assessed in this cohort. Propensity score matching has limitations too. Variables that were not measured during the disease course could not be adjusted for, so their potential confounding effect remains. Also, the more variables are used to pair patients, the smaller the matched group available for comparison. Treatment could not be included as a matching variable because multifocal disease is usually treated more aggressively by default.
Strengths of the study are: (i) the comparison made between groups using propensity score matching; (ii) the prolonged follow-up, allowing assessment of long-term outcomes; (iii) stratification of focality in relation to the ATA recurrence risk system; and (iv) a relatively large and homogeneous study sample with extensive available data on clinicopathologic characteristics and individual treatments.
In summary, this study shows that multifocality is associated with more aggressive disease, as reflected in baseline characteristics, treatment intensity, persistence/recurrence rates, and mortality. However, when the multifocal and unifocal groups were matched by propensity score, there were no significant differences in long-term outcomes. It is concluded that in PTC patients, the association of multifocal disease with worse short- and long-term outcomes should be regarded as a marker of more extensive disease at presentation rather than as an independent predictor of disease outcome.
Footnotes
Acknowledgments
This work was part of the requirements for the Medical Doctor degree at the Sackler Faculty of Medicine, Tel-Aviv University.
Author Disclosure Statement
The authors confirm that there is no conflict of interest.
