Abstract
Thyroid cancer (THCA) is a prevalent health burden, and unpacking its biological and social determinants is a public health priority. Previous studies have reported inconsistent findings regarding the effects of race and ethnicity on the incidence and presentation of THCA. It remains unclear whether racial differences manifest at the molecular level. Using the Cancer Genome Atlas papillary THCA dataset, this study drew on genetic ancestry estimates derived from single nucleotide polymorphism array genotyping and exome sequencing data. Five ancestral groups (Europeans, East Asians, Africans, Native/Latin Americans, and South Asians) were included in the analysis. We found good agreement between genetic ancestry and reported race (Cramer’s V = 0.730). Although differences in tumor size and patient age were observed, overall survival, progression-free interval, and disease-free interval were similar across the ancestral groups. Furthermore, the distribution of oncogenic drivers did not differ significantly among these groups. Weighted gene co-expression network analysis identified several ancestry-associated signatures. In conclusion, this study suggests that hereditary ancestral traits likely have little biological significance in papillary THCA. Instead, racial disparities in this type of cancer may be attributed to lifestyle factors, environmental exposures, and social and political power asymmetries in society and healthcare infrastructure.
Introduction
Papillary thyroid cancer (THCA) is the most common type of thyroid malignancy and has shown the largest increase in incidence among histological types (Cheng et al., 2023). In addition to classical papillary THCA, there are several histological subtypes, including follicular, tall cell, diffuse sclerosing, and solid/trabecular variants (Bai et al., 2020). Molecular studies have advanced our knowledge of the genetic correlates of tumor morphology. Cancers with a papillary architecture are characterized by the activating BRAF V600E mutation and exhibit a gene expression profile known as ‘BRAF-like’ (Fagin and Wells, 2016). In contrast, those with follicular growth patterns typically have RAS mutations (or, less commonly, EIF1AX and BRAF K601E mutations) and display a gene expression profile categorized as ‘RAS-like.’ Other cancers arise from gene fusions (involving, for example, RET, NTRK, and ALK). These oncogenic drivers are mutually exclusive (Fagin and Wells, 2016).
Diagnosis, treatment, and clinical outcomes differ among patients with THCA. Previous studies seeking to explain this heterogeneity have reported inconsistent findings on the effects of race and ethnicity on the incidence and presentation of THCA (Zhao and Wilhelm, 2025; Zmijewski et al., 2025). It remains unclear whether racial differences manifest at the molecular level.
An analysis of the ‘Surveillance, Epidemiology, and End Results Program’ registries showed that the incidence of THCA increased more in Whites than in Asians/Pacific Islanders (Magreni et al., 2015). Nevertheless, non-Hispanic Black Americans were reported to have a significantly higher risk of all-cause mortality than non-Hispanic Whites (Fwelo et al., 2025). The phenotypes of THCA also vary across racial groups. Whites were more likely to have classical papillary THCA, whereas African Americans had a higher percentage of the follicular variant of the disease (Tang et al., 2018). According to Minas et al. (2021), these discrepancies in clinical observations may largely be explained by cultural barriers, lifestyle factors, healthcare accessibility, and varying exposures to carcinogens and pathogens. The role of genetic susceptibility has also been gaining attention in both research and clinical settings.
Recent advancements in computational biology have made it possible to accurately and reliably infer ancestry from cancer-derived molecular profiles (Belleau et al., 2023). Genetic ancestry provides a more precise and unbiased assessment of a person’s lineage compared to self-identified race and ethnicity (Lord et al., 2022). Notably, genetic ancestry may have varying effects on phenotypes and genotypes across different tissues. Research has demonstrated that ancestry-associated variations in genomic alterations, methylation status, and gene expression are specific to particular tissues (Carrot-Zhang et al., 2020; Oak et al., 2020).
This study aimed to investigate the association of genetic ancestry with the phenotypes and genotypes of papillary THCA using data from the Cancer Genome Atlas (TCGA) database. The findings have the potential to enhance our understanding of the biological factors that contribute to racial disparities in THCA, as well as the social and biological determinants of the disease.
Materials and Methods
Institutional review board approval and informed consent were not applicable to the present study, which used publicly available data.
Data retrieval
We downloaded clinical data, mutation status, thyroid differentiation score (TDS), and RNA sequencing profiles from the TCGA papillary THCA database via the Genomic Data Commons data portal (https://portal.gdc.cancer.gov/) as previously reported (Chien et al., 2017). The reported race of the patients was based on the initial global analysis (Cancer Genome Atlas Research Network, 2014). Driver mutations were identified as BRAF V600E, RAS (HRAS/NRAS/KRAS), and fusions (such as RET and NTRK). The status of TERT promoter mutations was determined in our previous report (Chien et al., 2018).
Ancestry inference
Genetic ancestry was based on ancestry calls published by Carrot-Zhang et al. (2020). Ancestral groups were determined using five independent classification approaches, three based on single nucleotide polymorphism (SNP) array genotyping calls and two based on exome sequencing. Five ancestral groups were identified: African, East Asian, European, Native/Latin American, and South Asian. The estimated proportion of each of these five ancestries was obtained for every individual. Each individual was assigned to the primary ancestry unless the proportion of the secondary ancestry exceeded 20%, in which case the individual was considered admixed.
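The assignment rule described above can be illustrated with a minimal Python sketch. This is not the code used by Carrot-Zhang et al. (2020) or in this study. The function name `assign_ancestry`, the dictionary input format, and the reading of "primary" and "secondary" as the largest and second-largest estimated proportions are assumptions made for illustration only.

```python
def assign_ancestry(proportions, admixture_threshold=0.20):
    """Assign one ancestry label from estimated ancestral proportions.

    `proportions` maps an ancestry label to its estimated proportion
    (hypothetical input format). The largest proportion is treated as
    the primary ancestry; if the second-largest proportion exceeds
    `admixture_threshold`, the individual is labeled "Admixed".
    """
    if not proportions:
        raise ValueError("at least one ancestry proportion is required")
    # Rank ancestries from the largest to the smallest proportion.
    ranked = sorted(proportions.items(), key=lambda kv: kv[1], reverse=True)
    primary_label = ranked[0][0]
    secondary_prop = ranked[1][1] if len(ranked) > 1 else 0.0
    # Strictly greater than the threshold, mirroring "exceeded 20%".
    if secondary_prop > admixture_threshold:
        return "Admixed"
    return primary_label


# Toy proportions, not taken from the TCGA cohort.
example = {"European": 0.90, "African": 0.06, "East Asian": 0.04}
print(assign_ancestry(example))
```

Under this reading, an individual whose secondary proportion equals exactly 20% keeps the primary label, because the rule is stated as "exceeded 20%".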
Clinicopathological assessment
The tumor-node-metastasis (TNM) stage was based on the seventh edition of the American Joint Committee on Cancer staging manual (Suh et al., 2017). For clinical outcome endpoints, three major outcomes (overall survival, progression-free interval, and disease-free interval) were included in the analysis for the THCA cohort, following the published endpoint usage recommendations (Liu et al., 2018).
Transcriptomic appraisal
Principal component analysis (PCA) was conducted on the gene expression profiles to reduce dimensionality and assess the degree of clustering. A PCA biplot was used to visualize the first and second principal components, which explain the greatest amount of variance among all samples (Lee et al., 2022). The plot was generated in R using the prcomp() function and the ggbiplot package, based on the expression levels of all genes.
Weighted gene co-expression network analysis (WGCNA) and pathway analysis
WGCNA was used to study the relationships between co-expression modules and ancestral groups (Liu et al., 2022a). Genetic admixture proportions were incorporated as a continuous variable. The coefficients for the correlations of the color-coded module eigengenes with a given ancestral trait were calculated, with a positive coefficient indicating that the genes in the module exhibit a corresponding pattern of increasing expression as the trait values increase. Pathway analyses were performed using the WEB-based GEne SeT AnaLysis Toolkit (WebGestalt 2024; https://www.webgestalt.org/) to explore functional enrichment among the genes in modules that showed significant correlations with specific ancestral groups (Elizarraras et al., 2024; Hsu et al., 2016).
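The module-trait step can be sketched as a Pearson correlation between each module eigengene and a continuous ancestry proportion. The sketch below is an illustration in Python, not the WGCNA R code used in the study. It assumes the module eigengenes have already been computed, and the names `pearson_r` and `module_trait_correlations` as well as the toy values are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def module_trait_correlations(eigengenes, trait):
    """Correlate each module eigengene with one continuous trait.

    `eigengenes` maps a module color to its per-sample eigengene values;
    `trait` holds the per-sample ancestry proportion in the same order.
    """
    return {module: pearson_r(values, trait) for module, values in eigengenes.items()}


# Toy data for four samples (not study data).
eigengenes = {"darkgrey": [0.1, 0.2, 0.3, 0.4], "green": [0.4, 0.3, 0.2, 0.1]}
african_proportion = [0.05, 0.10, 0.60, 0.95]
for module, r in module_trait_correlations(eigengenes, african_proportion).items():
    print(module, round(r, 3))
```

A positive coefficient means that the module eigengene rises as the ancestry proportion rises; a negative coefficient means the opposite.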
Statistical analyses
Cramer’s V was employed as a measure of effect size for association levels, ranging from 0 (indicating no association between the variables) to 1 (indicating a perfect association). Categorical variables were compared using a two-sided Fisher’s exact test, while ordered variables (extrathyroidal extension and TNM stage) were analyzed using the Cochran–Armitage trend test. Given the small sample size and skewness in some ancestral groups, medians (with interquartile ranges) were reported for continuous variables, and the non-parametric Kruskal–Wallis test was used for comparisons (Konishi and Kakimoto, 2023). Dunn’s test was utilized for post-hoc pair-wise comparisons. Data were analyzed using R software version 4.4.1, and a p-value of < 0.05 was considered statistically significant.
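For readers unfamiliar with Cramer's V, the sketch below computes it from a contingency table using the standard definition, V = sqrt(chi-square / (n × (min(rows, columns) − 1))). It is written in Python for illustration, whereas the study used R, and it applies no bias correction; whether a correction was used in the original analysis is not stated. The function name `cramers_v` and the toy tables are assumptions.

```python
import math


def cramers_v(table):
    """Cramer's V for a contingency table given as a list of rows of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Degrees-of-freedom term uses only rows and columns that are not empty.
    k = min(sum(1 for t in row_totals if t > 0),
            sum(1 for t in col_totals if t > 0)) - 1
    if n == 0 or k < 1:
        raise ValueError("need at least a 2 x 2 table with non-empty margins")
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:
                chi2 += (observed - expected) ** 2 / expected
    return math.sqrt(chi2 / (n * k))


# Toy tables (not the study's Table 1).
print(cramers_v([[10, 0], [0, 10]]))  # perfect association
print(cramers_v([[5, 5], [5, 5]]))    # no association
```

The first toy table gives V = 1 and the second gives V = 0, matching the interpretation of the two endpoints described above.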
Results
Ancestral characteristics
Following integration, 470 patients with papillary THCA had sufficient transcriptome and genetic ancestry data for analysis. Three were categorized as admixtures and excluded from the study. Therefore, a total of 467 individuals were included for further analysis based on estimated molecular ancestry: 361 of European descent, 52 of East Asian descent, 32 of African descent, 11 of Native/Latin American descent, and 11 of South Asian descent. The correlations between genetic ancestry and reported race are shown in Table 1. Overall, genetic ancestry is in good agreement with the reported race, with a Cramer’s V of 0.730.
Table 1. Correlations Between Genetic Ancestry and Reported Race in 467 Patients with Papillary Thyroid Cancer
Note: Genetic ancestry estimates are derived from single nucleotide polymorphism array genotyping and exome sequencing data.
Clinicopathological differences
We compared clinicopathological data across ancestral groups (Table 2). Individuals of East Asian and Native/Latin American descent were significantly younger at diagnosis than those of European and African descent (Kruskal–Wallis with post-hoc Dunn’s test: East Asian vs. European and African, p = 0.018 and 0.004; Native/Latin American vs. European and African, p = 0.026 and 0.006). In addition, tumors were significantly larger in individuals of African and Native/Latin American descent than in those of European and East Asian descent (African vs. European and East Asian, both p < 0.001; Native/Latin American vs. European and East Asian, p = 0.009 and 0.008). No other parameters, such as sex ratio and disease stage, differed significantly among the ancestral groups. Clinical outcomes were also comparable among the five ancestral groups, including overall survival (log-rank p = 0.246), progression-free interval (log-rank p = 0.963), and disease-free interval (log-rank p = 0.685; data not shown).
Table 2. Clinicopathological Features Stratified by Genetic Ancestry in 467 Patients with Papillary Thyroid Cancer
Data are reported as numbers (percentages) or medians (interquartile ranges).
Some data are missing.
TNM, tumor-node-metastasis.
Genetic differences
As shown in Table 3, there were no differences in the distribution of histological subtypes across ancestral groups (p = 0.839). For the proportion of the tall cell variant, one of the aggressive variants of papillary THCA, individuals of East Asian descent had the highest percentage (10%) of this variant. There was also no association between oncogenic drivers and genetic ancestry (p = 0.662). Among the ancestral groups, fusion drivers were present in 21% of individuals of East Asian descent and 27% of individuals of South Asian descent, which corresponds to the relatively younger ages of these groups. However, fusion drivers were not present in individuals of Native/Latin American descent. The percentage of TERT promoter mutations ranged from 7.69% to 12.50%, with no statistical difference among the groups (p = 0.626; data not shown).
Table 3. Genetic Alterations and Thyroid Differentiation Scores Stratified by Genetic Ancestry in 467 Patients with Papillary Thyroid Cancer
Data are reported as numbers (percentages) or medians (interquartile ranges).
TDS was derived from mRNA expression levels of 16 thyroid function genes (DIO1, DIO2, DUOX1, DUOX2, FOXE1, GLIS3, NKX2-1, PAX8, SLC26A4, SLC5A5, SLC5A8, TG, THRA, THRB, TPO, and TSHR), with higher scores indicating a more differentiated status. Previous studies indicate that BRAF-like tumors typically exhibit lower TDS (Fagin and Wells, 2016). In this study, a marginal difference (p = 0.088) in TDS across ancestral groups was observed. Individuals of Native/Latin American descent had the lowest median TDS of −1.14, likely due to the highest proportion of BRAF V600E mutations in this population.
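One way to derive a score of this kind is sketched below in Python. The source states only that TDS was derived from the mRNA levels of the 16 listed genes, so the formula here is an assumption: each gene's log2 expression is median-centered across samples, and the centered values are averaged across the 16 genes for each sample. The function name `thyroid_differentiation_scores` and the input layout are hypothetical.

```python
from statistics import mean, median

# The 16 thyroid function genes listed in the text.
TDS_GENES = [
    "DIO1", "DIO2", "DUOX1", "DUOX2", "FOXE1", "GLIS3", "NKX2-1", "PAX8",
    "SLC26A4", "SLC5A5", "SLC5A8", "TG", "THRA", "THRB", "TPO", "TSHR",
]


def thyroid_differentiation_scores(log2_expr):
    """Return one score per sample.

    `log2_expr` maps a gene symbol to a list of log2 expression values,
    one per sample, in the same sample order for every gene.
    Assumed formula: median-center each gene across samples, then take
    the mean of the centered values over the 16 genes.
    """
    missing = [gene for gene in TDS_GENES if gene not in log2_expr]
    if missing:
        raise KeyError(f"missing genes: {missing}")
    n_samples = len(log2_expr[TDS_GENES[0]])
    centered = {}
    for gene in TDS_GENES:
        values = log2_expr[gene]
        if len(values) != n_samples:
            raise ValueError(f"{gene} has a different number of samples")
        gene_median = median(values)
        centered[gene] = [value - gene_median for value in values]
    return [mean(centered[gene][i] for gene in TDS_GENES) for i in range(n_samples)]


# Toy input: three samples with identical values for every gene.
toy = {gene: [1.0, 2.0, 3.0] for gene in TDS_GENES}
print(thyroid_differentiation_scores(toy))
```

With this construction, a sample whose thyroid function genes are expressed below the cohort median receives a negative score, which matches the direction of the interpretation given above (higher scores indicate a more differentiated status).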
A PCA plot of the bulk transcriptome showed substantial overlap among ancestral groups (Fig. 1). This indicates that genetic ancestry had little impact on overall gene expression, consistent with the similarity of oncogenic drivers.

Fig. 1. Principal component analysis plot of the bulk transcriptome from 467 patients with papillary thyroid cancer, grouped by genetic ancestry. PC1 and PC2 refer to the first and second principal components, respectively. PC, principal component.
Functional enrichment for specific ancestral groups
We performed WGCNA to identify potential molecular characteristics associated with specific ancestral groups. The correlation coefficients and significance levels are shown in Supplementary Figure S1 and Supplementary Table S1. For individuals of European descent, a negative correlation was observed in the ‘salmon4’ module (r = −0.093; p = 0.045), with enriched Reactome pathways including FGFR1 and G protein-coupled receptor signaling (Fig. 2). For individuals of East Asian descent, the ‘salmon4’ and ‘brown4’ modules displayed positive correlations (r = 0.127 and r = 0.114, respectively), with enriched pathways such as olfactory transduction and neuroactive ligand-receptor interaction. Individuals of African descent showed a positive correlation with the ‘darkgrey’ module (r = 0.159; p = 0.001), which was enriched for actin and/or actin-binding cytoskeletal proteins and voltage-gated ion channel molecules. The most notable pathways for individuals of African descent were the nicotinic acetylcholine receptor signaling pathway and cytoskeletal regulation by Rho GTPase.

Fig. 2. The top three Reactome pathways enriched in significant WGCNA modules related to specific genetic ancestry among 467 patients with papillary thyroid cancer. The dashed line indicates a false discovery rate of 5%. WGCNA, weighted gene co-expression network analysis.
Individuals of Native/Latin American descent were positively associated with the ‘purple,’ ‘yellow-green,’ ‘black,’ and ‘white’ modules (r = 0.269, 0.141, 0.101, and 0.100, respectively) and negatively associated with the ‘green’ module (r = −0.100; p = 0.031). These modules were enriched with microtubule motor proteins and intermediate filaments, linking the cell cycle and integrin signaling pathways. Lastly, individuals of South Asian descent showed a positive association with the ‘maroon’ module (r = 0.264; p < 0.001), which was enriched with the olfactory signaling pathway and organic anion transporters.
Discussion
THCA is a prevalent health burden, and unpacking its biological and social determinants is a public health priority. In this study, the inferred genetic ancestry demonstrated relatively good agreement with the reported race of patients with papillary THCA (Table 1). Genetic ancestry is an objective assessment of genetic signatures related to cancer. However, reported race/ethnicity still holds value in uncovering the complex relationship between socioeconomic and biological determinants of disease (Lord et al., 2022). These factors are particularly relevant to THCA, the majority of which follows an indolent course with a large subclinical reservoir of occult disease. Thus, the diagnosis of THCA may be influenced by differences in healthcare access across racial groups. In general, non-White patients are more likely to experience delays in referral and to present with more advanced disease, are less likely to receive appropriate treatment, and typically have lower overall and disease-specific survival rates than White patients (Gillis et al., 2024).
Epidemiological studies have revealed varying effects of race and ethnicity on THCA. In the United States, a population-based study showed that Whites had the highest age-adjusted incidence rates, while African Americans had the lowest rates of THCA (Weeks et al., 2018). Furthermore, Asian patients, particularly Asian women, have the highest incidence-based mortality rates compared to other gender and racial groups (Patel et al., 2020). In a retrospective review at Henry Ford Health System, among patients with incidental thyroid nodules, 12.9% of White patients were found to have a thyroid malignancy compared to 4.9% of non-White patients (Iwata et al., 2018). By contrast, Asian and Black ethnic groups had higher incidence rates of THCA than the White ethnic group in England (Delon et al., 2022).
Importantly, in the present study, there were no significant differences in the major outcome endpoints (overall survival, progression-free interval, and disease-free interval) among the ancestral groups. This finding is consistent with a previous review suggesting that the outcomes of THCA are greatly affected by socioeconomic status (Keane et al., 2017). Although racial and ethnic disparities can occur at multiple points along the cancer care continuum, including access to cancer prevention, prompt detection, time to treatment, and quality of treatment (Haque et al., 2023), ancestral variation at the genetic level does not appear to determine outcomes for THCA.
Although the differences in oncogenic drivers did not reach statistical significance (Table 3), our analysis generally corresponds to a recent report showing that patients of African ancestry had fewer BRAF and more NRAS mutations in THCA (Arora et al., 2022). Among patients identified as African-American with radioiodine-refractory disease, the incidence of BRAF V600E was 21% in those with over 80% African ancestry, compared to 67% in those with less African ancestry (Hurst et al., 2019). In a recent analysis of the Genomics Evidence Neoplasia Information Exchange (GENIE) cohort, Black patients had more frequent CDKN2A and NRAS alterations, while Asian patients had a higher prevalence of ERBB4 mutations (Muquith and Hsiehchen, 2024). Furthermore, the proportions of TERT promoter mutations in THCA were comparable across ancestral groups in this study. This finding is consistent with an analysis of the GENIE registry, which found no significant differences in TERT promoter mutations across racial and sex subgroups among patients with THCA (El Zarif et al., 2024).
In addition, we observed that individuals of East Asian descent were associated with the highest median TDS in this study. This corroborates a study of patients with BRAF-mutant papillary THCA, which found that Blacks and Hispanics were overrepresented in the TDS-low group, while Asians comprised a fifth of the TDS-high group (Boucai et al., 2022). However, given the lack of significant differences in the distribution of oncogenic drivers and TDS, our findings suggest that genetic ancestry has a subtle effect on carcinogenesis and tumor differentiation.
Our WGCNA identified several ancestry-associated signatures (Fig. 2), although PCA indicated substantial overlap in the transcriptome among ancestral groups. Our previous study revealed that gene expression profiles of THCA primarily depend on the distinction between BRAF-like and RAS-like classes (Hsu et al., 2019). Despite similarities in oncogenic drivers, differences in transcriptome clustering may arise from variations in the allele frequencies of specific genetic variants across different ancestral genomes. A case-control study from the EPITHYR consortium detected several susceptibility loci for THCA, including 2q35, 8p12, 9q22.33, and 14q13.3 (Truong et al., 2021). In another pan-cancer analysis, non-White patients had a lower rate of pathogenic germline variants compared to non-Hispanic Whites and patients of Ashkenazi Jewish heritage (Liu et al., 2022b). Furthermore, ancestry-associated genetic landscapes may interact intricately with environmental influences. For instance, a recent study disclosed that smoking status interacts with TP53 mutation rates in an ancestry-specific manner (Jiagge et al., 2023).
There are several caveats and limitations of the present study. The small number of patients of Native/Latin American and South Asian descent is likely not representative. This limited sample size for minority groups may affect the statistical power and generalizability of our findings. In fact, real-world research often suffers from the issue of limited ancestral diversity (Cheung et al., 2023). In addition, the phenotypes and genotypes of tumors can arise from interactions between inherited genetic predispositions and external influences. The tissue sources for the TCGA program were obtained from multiple centers across the United States. We are unable to thoroughly explore potential interactions between genetic ancestry and environmental or lifestyle factors. In a longitudinal cohort study using healthcare administrative data from all residents of Ontario, immigrants from Southeast and East Asia had a significantly higher incidence of THCA than non-immigrants (Shah et al., 2017). Interestingly, immigrants had fewer ambulatory primary care visits and diagnostic imaging tests than non-immigrants, alleviating concerns about differences in healthcare-seeking behavior.
It should be noted that our findings were not validated in independent cohorts, which limits the robustness and comparability of the results. In addition, we did not perform functional validation of these findings. The biological significance of these signatures thus remains speculative without further experimental evidence. Nevertheless, a strength of this study is that it included analyses of all genetic ancestry groups, which is consistent with current recommendations (Feero et al., 2024). Moreover, the TCGA dataset used in this study was well-curated and validated, providing a solid foundation for comprehensive genomic and transcriptomic analyses.
Conclusions
The present multi-layered analyses suggest that hereditary ancestral traits may be biologically insignificant for papillary THCA. The phenotypes of THCA are likely determined primarily by oncogenic drivers, which are influenced to a lesser extent by inherited genetic predispositions. Racial disparities may be more attributable to lifestyle factors, environmental exposures, and healthcare infrastructure.
Footnotes
Authors’ Contributions
S.Y.C.: Conceptualization, data curation, formal analysis, and writing – original draft. Y.C.H.: Conceptualization, formal analysis, validation, and writing – review and editing. S.P.C.: Conceptualization, data curation, formal analysis, validation, and writing – original draft. All authors have read and agreed to the published version of the article.
Author Disclosure Statement
The authors declare they have no conflicting financial interests.
Funding Information
This work was supported by research grants from the National Science and Technology Council of Taiwan (NSTC-113-2314-B-195-015) and MacKay Memorial Hospital (MMH-11404 and MMH-E-114-08).
References
