Abstract

The diagnosis of papillary thyroid carcinoma (PTC) primarily relies on routine histopathological evaluation. Pathologists assess nuclear features, growth patterns, and other microscopic attributes to classify PTC into its appropriate subtype—an essential step, as certain subtypes are more likely to exhibit aggressive clinical behavior. 1,2 Moreover, the degree of nuclear atypia often correlates with the underlying genetic alteration. 1 For example, PTCs harboring BRAF-like aberrations—most commonly the BRAFV600E point mutation—typically show marked nuclear changes, whereas RAS-like PTCs generally display only mild nuclear atypia. 1,2 In addition, fusion-driven PTCs often show multinodularity and prominent intratumoral fibrosis. 3 These morphological-genotypic correlations allow experienced endocrine pathologists to make informed predictions about the tumor’s molecular profile, which has direct implications for prognosis and treatment planning. 4
In addition to morphology, pathologists may employ mutation-specific immunohistochemical stains (such as BRAF p.V600E and pan-RAS p.Q61R) or submit tissue for molecular testing. 1 Genetic alterations, including BRAF mutations and gene fusions involving RET, ALK, or NTRK, serve not only as prognostic markers but also as therapeutic targets in the era of precision oncology. 4 However, detecting these alterations typically requires next-generation sequencing, polymerase chain reaction, or fluorescence in situ hybridization, all of which are costly, labor-intensive, and time-consuming. In this context, artificial intelligence (AI) offers a promising and efficient alternative. By extracting molecular-level insights directly from routine histopathology slides, AI has the potential to accelerate diagnosis, reduce cost, and expand access to advanced diagnostics, especially in resource-limited settings.
In the study under discussion, the authors introduce an AI pipeline based on Vision Transformers (ViTs) capable of predicting key genomic alterations directly from hematoxylin and eosin (H&E)-stained slides of PTC. 5 Using a development cohort of 496 patients from The Cancer Genome Atlas (TCGA) 6 and an independent external test cohort of 166 patients from the University Medical Center Mainz in Germany, the study assessed the model’s ability to predict BRAF, panRAS, and gene fusion status. All cases were annotated by pathologists and had undergone molecular characterization using standard protocols.
The choice of ViTs is noteworthy. These models, built on the transformer architecture originally developed for natural language processing, can offer improved interpretability and robustness over traditional convolutional neural networks. By dividing whole-slide images into small, annotated tiles and applying image augmentations, the authors created a balanced and standardized training set. The model was trained using fivefold cross-validation on the TCGA data and then evaluated on the Mainz cohort to assess external validity. 5
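The tiling and cross-validation steps described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the tile size, the non-overlapping grid, the helper names `tile_grid`, `patient_level_folds`, and `train_validation_splits`, and the decision to split at the patient level (so that tiles from one patient never appear in both training and validation data) are all assumptions made for illustration.

```python
import random
from typing import Iterator, List, Sequence, Tuple


def tile_grid(width: int, height: int, tile_size: int) -> List[Tuple[int, int]]:
    """Top-left (x, y) coordinates of non-overlapping tiles lying fully inside the slide."""
    if tile_size <= 0:
        raise ValueError("tile_size must be positive")
    return [
        (x, y)
        for y in range(0, height - tile_size + 1, tile_size)
        for x in range(0, width - tile_size + 1, tile_size)
    ]


def patient_level_folds(
    patient_ids: Sequence[str], k: int = 5, seed: int = 0
) -> List[List[str]]:
    """Shuffle the unique patient IDs and deal them into k disjoint folds."""
    unique_ids = sorted(set(patient_ids))
    if k < 2 or k > len(unique_ids):
        raise ValueError("k must be between 2 and the number of patients")
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    rng.shuffle(unique_ids)
    return [unique_ids[i::k] for i in range(k)]


def train_validation_splits(
    folds: List[List[str]],
) -> Iterator[Tuple[List[str], List[str]]]:
    """Yield (train_ids, validation_ids), holding out one fold at a time."""
    for i, held_out in enumerate(folds):
        train = [pid for j, fold in enumerate(folds) if j != i for pid in fold]
        yield train, held_out
```

With 496 hypothetical patient IDs and `k=5`, this yields one fold of 100 patients and four folds of 99, and each of the five splits holds out a different fold for validation.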
The results were notable. In the external test cohort, the model achieved area under the receiver operating characteristic curve (AUROC) values of 0.882 for BRAF mutations, 0.876 for panRAS mutations, and 0.858 for gene fusions. 5 Accuracy scores ranged from approximately 79% to 89% across the different genotypes. These figures suggest that AI can approach, and potentially match, traditional pathology-based assessment in performance without requiring additional laboratory tests.
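For readers less familiar with these metrics, the following Python sketch shows how the area under the receiver operating characteristic curve and accuracy can be computed from binary labels and predicted scores. The function names, the toy data, and the 0.5 decision threshold are illustrative assumptions; the study's own evaluation code and thresholds are not described here.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive case scores higher than a random
    negative case, counting ties as one half (the Mann-Whitney formulation)."""
    pos = [s for label, s in zip(labels, scores) if label == 1]
    neg = [s for label, s in zip(labels, scores) if label == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy(labels: Sequence[int], scores: Sequence[float], threshold: float = 0.5) -> float:
    """Fraction of cases whose thresholded score matches the true label."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    correct = sum(int(p == label) for p, label in zip(predictions, labels))
    return correct / len(labels)


# Toy example with made-up values (1 = mutation present, 0 = absent).
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
print(auroc(toy_labels, toy_scores))     # 0.75
print(accuracy(toy_labels, toy_scores))  # 0.75
```

Unlike accuracy, this area under the curve does not depend on a single cutoff, which is one reason both metrics are commonly reported together.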
What sets this study apart is not just its predictive accuracy but also its use of techniques to uncover novel morphological features associated with genetic alterations. By identifying the most predictive tiles for each mutation class, the authors found that fusion-associated tumors were more likely to exhibit clear cytoplasm, microfollicular patterns, and calcifications. 5 In contrast, wild-type tumors showed darker cytoplasm and larger follicles. To test whether these AI-discovered features could improve diagnostic performance, pathologists were asked to predict fusion-positive cases on whole-slide images before and after being trained on these new features. Their accuracy improved significantly, rising from near-random levels to over 80% for fusion prediction. This demonstrates a powerful synergy between AI and human expertise, in which machine-generated insights meaningfully enhance diagnostic accuracy. Notably, the histological features identified—though not traditionally recognized as indicators of gene fusions—could be integrated into routine practice pending proper validation. The study also included a comparison between AI-educated and “untrained” pathologist performance for BRAF and panRAS mutations, but no significant differences were observed, suggesting that AI-guided education did not enhance diagnostic accuracy for these alterations, likely due to the already well-established morphological features associated with them. 1,2,5
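The idea of retrieving the most predictive tiles for each class can be sketched in Python as follows. This is only an illustration under stated assumptions: the `TilePrediction` record, the class names, and ranking by raw predicted probability are hypothetical, and the study's actual selection procedure may differ.

```python
import heapq
from typing import Dict, List, NamedTuple, Sequence


class TilePrediction(NamedTuple):
    """Hypothetical record: one tile and its predicted class probabilities."""
    slide_id: str
    x: int
    y: int
    probs: Dict[str, float]


def top_tiles_per_class(
    tiles: Sequence[TilePrediction], classes: Sequence[str], k: int = 3
) -> Dict[str, List[TilePrediction]]:
    """For each class, return the k tiles with the highest predicted probability."""
    return {
        cls: heapq.nlargest(k, tiles, key=lambda t: t.probs[cls])
        for cls in classes
    }


# Made-up predictions for three tiles from two slides.
example_tiles = [
    TilePrediction("S1", 0, 0, {"fusion": 0.9, "wild_type": 0.1}),
    TilePrediction("S1", 256, 0, {"fusion": 0.2, "wild_type": 0.8}),
    TilePrediction("S2", 0, 0, {"fusion": 0.6, "wild_type": 0.4}),
]
top = top_tiles_per_class(example_tiles, ["fusion", "wild_type"], k=2)
for cls, ranked in top.items():
    print(cls, [(t.slide_id, t.x, t.y) for t in ranked])
```

Tiles ranked in this way could then be reviewed by pathologists to look for recurring morphological patterns, which is the kind of human review the study describes.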
The clinical implications are broad. In settings where access to subspecialized endocrine pathologists and molecular testing is not readily available, an AI-assisted review of histology slides could help prioritize cases for further testing or treatment planning. Even in well-resourced environments, the ability to flag likely mutation-positive cases could streamline workflows and reduce time to treatment. As molecular-targeted therapies become more common, rapid and affordable identification of actionable mutations will become increasingly important. 4
A limitation of the study is that the model did not differentiate between mutation subtypes—such as BRAFV600E versus non-V600E—or between specific fusion partners. 5 Grouping all gene fusions into a single category overlooks the distinct therapeutic implications of individual alterations; for example, RET and NTRK fusions are targeted by different agents. 4 Future models should aim to classify these mutations with greater granularity, as such distinctions are clinically important. Furthermore, although the model demonstrated strong performance on an external test cohort, its generalizability to broader populations remains unproven, particularly in light of geographic variability in mutation prevalence. The study also did not evaluate the model’s prognostic capabilities across different PTC subtypes. Given that certain subtypes are enriched for specific genetic alterations, such as the well-established association between diffuse sclerosing PTC and gene fusions, the lack of histological subtype stratification represents an additional limitation. 1
Despite these drawbacks, the study marks a significant step forward. It not only confirms the feasibility of predicting molecular alterations from standard histology but also shows that AI can discover previously unrecognized features with diagnostic relevance. While subspecialized pathologists are proficient at predicting BRAF and panRAS mutations, the study provides compelling evidence that deep learning models can also accurately identify clinically significant mutations and gene fusions in PTC from routine H&E slides. More importantly, these models can enhance human diagnostic performance through explainable and teachable outputs. Although further refinements are needed, this work offers a promising glimpse into a future where AI supports pathologists in assessing thyroid cancer and extracting greater diagnostic value from standard histological slides.
Footnotes
Author’s Contributions
C.C.J. conceptualized, wrote the original draft, and revised this article.
Author Disclosure Statement
No competing financial interests exist.
Funding Information
No funding was received for this article.
