Risk Stratification Tools for Thyroid Cancer: A Systematic Review of Models Combining Ultrasound, Cytology, and Clinical Risk Factors

Abstract

Background:

The rising incidence of thyroid cancer presents a growing diagnostic and therapeutic challenge. Various risk stratification systems have sought to integrate clinical, ultrasonographic, and, in some cases, cytological features to aid prediction of malignancy. This systematic review critically evaluates risk stratification tools (RSTs) that incorporate multimodal inputs for patients with thyroid nodules, assessing their diagnostic performance and clinical utility in supporting surgical decision-making.

Methods:

PubMed, Embase, and Cochrane databases were searched from inception to April 13, 2026, identifying studies evaluating multivariable risk prediction models for adult patients undergoing assessment of thyroid nodules. Studies were excluded if the proposed tool failed to incorporate clinical features, ultrasound findings, and cytology results or was not validated with histology. Data extraction encompassed methodology of model development, performance metrics, and approaches to validation. Risk of bias was assessed using the PROBAST+AI tool.

Results:

Seven studies describing five distinct RSTs met inclusion criteria: Thyroid Nodule App (TNAPP), the McGill Thyroid Nodule Score (MTNS), CUT Score, Memorial Sloan Kettering Cancer Centre (MSKCC) nomogram, and Thyroid Prediction Score (TiPS). TiPS demonstrated the highest sensitivity (96.2%) and specificity (97.5%) with area under the curve (AUC) >0.9. The CUT score also showed strong performance (AUC >0.9), particularly in low-to-intermediate risk nodules. TNAPP underperformed (accuracy 50.5%; specificity 27.5%) despite broad clinical inputs. The MTNS and MSKCC, although promising for indeterminate cytology, lacked robust validation. Most models were derived from single-center, retrospective cohorts, limiting generalizability.

Conclusions:

RSTs integrating multimodal data may improve thyroid nodule risk stratification, particularly in cases of indeterminate cytology. However, methodological limitations and lack of external validation currently restrict clinical utility. Prospective evaluation in diverse populations is required to identify the most effective and generalizable tools. Until then, RSTs should be used as adjuncts to, not replacements for, clinical judgment and shared decision-making in thyroid nodule assessment.

Keywords

thyroid neoplasms; risk assessment; ultrasonography; cytodiagnosis; biopsy, fine-needle

Introduction

The rising global incidence of thyroid cancer poses significant challenges for international health systems and policy.¹ Improved access to ultrasonography and technological advancements have led to a surge in the detection of incidental nodules.^2,3 However, fewer than 10% of thyroid nodules are malignant, and the majority of these are indolent microcarcinomas that carry favorable prognoses and do not impact overall survival.^2,4,5 Despite the rising incidence of thyroid cancer, mortality rates have remained stable, implying a trend toward overdiagnosis and overtreatment.⁶

Accurately distinguishing benign from malignant thyroid nodules prior to surgery remains a significant challenge. No single ultrasonographic parameter can reliably predict malignancy. However, combinations of sonographic characteristics (including taller-than-wide shape, irregular margins, and microcalcifications) correlate with an increased risk of cancer.^7–9 These high-risk ultrasound features have been integrated into variable international classification systems.^10–13 While fine-needle aspiration (FNA) cytology augments estimations of malignancy risk for suspicious nodules, its diagnostic value is limited.^14,15 Despite its high reported sensitivity and specificity,¹⁶ FNA is limited in its ability to conclusively rule out malignancy, especially for indeterminate results.¹⁷ Clinical factors, including age,^18,19 sex,²⁰ family history,^21,22 biochemical markers,^23,24 and prior radiation exposure,^25,26 are associated with thyroid cancer but vary in their predictive value, and no single factor is definitive.²⁷ Consequently, many patients undergo thyroidectomy for nodules later proven benign on final histopathology.^28,29

Several risk prediction models have been proposed, combining clinical, imaging, and cytological features into composite scoring systems. These models offer the potential to streamline decision-making, standardize risk assessment, support individualized care, reduce costs, and ultimately improve patient outcomes by reducing unnecessary interventions.^30–32 However, many of these tools are derived from narrowly defined patient subsets and lack rigorous external validation; hence, none are currently advocated for routine clinical use in the UK.¹¹

This systematic review aims to evaluate clinically usable multimodal risk stratification tools (RSTs) for thyroid nodules that integrate clinical, ultrasonographic, and cytological variables. In addition to comparing reported diagnostic performance, this review also examines the methodological quality, heterogeneity, and real-world applicability of these models.

Methods

This systematic review was conducted in accordance with the 2020 Preferred Reporting Items for Systematic Reviews and Meta-Analysis (PRISMA) guidelines.³³ A comprehensive search of PubMed, Embase, and Cochrane databases was conducted from inception to April 13, 2026 (Supplementary Data). Duplicate citations were removed after export using manual bibliographic matching within the reference management software. After removing duplicates, two blinded reviewers independently screened titles and abstracts, resolving all discrepancies through discussion. Full-text articles were assessed independently by the same two blinded reviewers, with any remaining discrepancies resolved by consultation with a third independent reviewer. Additional studies were identified through snowballing search strategies.

Inclusion criteria

This review included adult patients undergoing assessment for thyroid nodules. The intervention of interest was defined as thyroid RSTs, which combined ultrasound findings, cytology, and clinical risk factors as predictor variables to estimate individual risk of malignancy. The primary outcome was the presence of histologically confirmed malignancy.

Exclusion criteria

Studies were excluded if they involved pediatric patients, animals, or disease processes unrelated to thyroid cancer, such as toxic nodules. In addition, case reports, literature reviews, and articles not published in English were excluded. Multivariable models incorporating only two of the three key predictor domains (ultrasound findings, cytology, and clinical risk factors) were excluded from the formal review dataset and PRISMA synthesis but were considered separately in the discussion for contextual comparison. Studies focusing on risk stratification within narrowly defined patient subsets (such as nodules with indeterminate cytology or specific ultrasound classification categories) were handled in the same way. Studies focusing on RSTs designed to predict cervical lymph node metastasis, rather than the risk of primary thyroid malignancy, were not included. RSTs developed specifically to predict a single histological subtype of thyroid malignancy (e.g., papillary or medullary thyroid cancer), rather than overall thyroid cancer risk, were excluded from formal analysis. Finally, for the purposes of this review, a clinically usable tool was defined as one that provided a reproducible patient-level output using specified inputs and an explicit framework for risk estimation or decision-making. Machine learning models were therefore excluded if their output could not be translated into a reproducible patient-level risk estimate or decision aid suitable for routine clinical use. Eligible formats included a point-based score, nomogram, clinical calculator, or explicit decision rule; studies reporting multivariable logistic regression alone, or a machine learning model without a directly applicable clinical output, did not qualify.

Data extraction

Data were extracted regarding study design, methodology, population characteristics, predictor variables, model performance, and validation. Patient characteristics (including mean age, gender distribution, and prevalence of malignancy) were collated to evaluate the comparability of baseline populations. Data extraction was performed independently by both reviewers.

Data synthesis and analysis

Descriptive statistics were used to summarize model characteristics and identify the most frequently utilized predictor variables. Diagnostic performance was assessed using area under the curve (AUC), C-statistics, sensitivity, and specificity, where available. In accordance with established thresholds, AUC or C-statistic values of 0.5–0.6 indicated poor performance, 0.6–0.7 adequate performance, and values above 0.7 good performance.³⁴
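The performance bands above can be written as a simple lookup. The following is a minimal sketch rather than part of the review's methods: the function name is hypothetical, and the handling of the exact boundary values (0.6 and 0.7) and of values below 0.5 is an assumption, since the stated ranges do not specify them.

```python
def classify_discrimination(auc: float) -> str:
    """Map an AUC or C-statistic to the performance bands used in this review.

    Assumptions (not stated in the source): a value of exactly 0.6 or 0.7 is
    treated as "adequate", and values below 0.5 are labeled separately.
    """
    if not 0.0 <= auc <= 1.0:
        raise ValueError("AUC must lie between 0 and 1")
    if auc < 0.5:
        return "worse than chance"  # band not defined in the source
    if auc < 0.6:
        return "poor"
    if auc <= 0.7:
        return "adequate"
    return "good"


# Example: the C-index of 0.91 reported for the MSKCC nomogram.
mskcc_band = classify_discrimination(0.91)
```

Under these assumptions, any model reporting an AUC above 0.9 falls in the "good" band; the bands do not distinguish further among high-performing models.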

Risk of bias assessment

Included studies were assessed for their risk of bias using the Prediction Model Risk Of Bias Assessment Tool (PROBAST+AI).³⁵ An overall low-risk rating was only assigned when all domains were judged as low risk. Two reviewers independently conducted the risk of bias assessments. For studies that developed or analyzed multiple similar models, a single risk of bias assessment was conducted.

Institutional Review Board waiver

Institutional Review Board approval was not required for this systematic review, as it does not involve human subjects or identifiable personal data.

Results

Study selection

In total, 3124 records were identified, as outlined in the PRISMA flow diagram (Fig. 1). Following removal of duplicates and articles published in languages other than English, 2830 articles were eligible for screening. Title and abstract screening excluded 2524 records, leaving 306 articles for full-text review. Following full-text review, the most common reasons for exclusion were studies limited to multivariable statistical analysis without describing an RST (n = 241), models incorporating only two of the three key predictor variables (n = 40), and studies focused on narrowly defined patient subgroups (n = 37).
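The screening counts above can be checked with simple arithmetic. This is an illustrative sketch only: the variable names are hypothetical, and the final step assumes that all seven included studies came from the database search rather than from snowballing.

```python
# Counts taken from the study selection paragraph and the included-study total.
records_identified = 3124
records_screened = 2830
excluded_at_screening = 2524
studies_included = 7

# Records removed as duplicates or non-English before screening.
removed_before_screening = records_identified - records_screened

# Articles passing title and abstract screening to full-text review.
full_text_reviewed = records_screened - excluded_at_screening

# Assumption: every included study was among the full-text articles.
excluded_at_full_text = full_text_reviewed - studies_included

print(removed_before_screening, full_text_reviewed, excluded_at_full_text)
```

The second figure reproduces the 306 full-text articles reported in the text.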

FIG. 1.

PRISMA flow diagram of exclusion criteria. PRISMA, Preferred Reporting Items for Systematic Reviews and Meta-Analysis.

Characteristics of included studies

Seven studies met criteria for inclusion, describing five distinct RSTs for thyroid cancer: TNAPP,^36,37 the CUT Score,³⁸ the McGill Thyroid Nodule Score (MTNS),^39,40 TiPS,⁴¹ and the MSKCC nomogram⁴² (Table 1).

Table 1.

Characteristics of Included Studies

RST model	Author (year)	Setting/country	Method of development of RST	Validation type	No. of patients (no. of nodules)	Nodule size (mm)	Age (years)	Male participants (%)	Malignancy rate (%)
TNAPP	Garber et al. (2021)³⁶	Multicenter 3 US 2 European	Expert consensus	Internal, retrospective	95 (95)	—	—	—	—
TNAPP	Triggiani et al. (2023)³⁷	Single-center Italy	—	External, retrospective	112 (188)	14^a	55^a	19	42
McGill Thyroid Nodule Score	Sands et al. (2011)³⁹	Single-center US	Expert consensus	Internal, retrospective	844 (–)	—	53.1^b	17.2	63
McGill Thyroid Nodule Score	Varshney et al. (2015)⁴⁰	Single-center US	—	External, retrospective	437 (–)	—	54.4^b	21.5	59
CUT Score	Ianni et al. (2016)³⁸	Single-center Italy	Meta-analysis-based modeling	Internal, prospective	683 (705)	17.5^b	54^b	23.9	53
MSKCC Score	Nixon et al. (2010)⁴²	Single-center US	Multivariate logistic regression modeling	Internal, retrospective	158 (172)	18^a	55^a	33	45
TiPS	Sarayu et al. (2025)⁴¹	Single-center India	Threshold modeling	Internal, prospective	200 (295)	21.03^b	43^b	11.4	12.5

^a Median.

^b Mean.

MSKCC, Memorial Sloan Kettering Cancer Centre; RST, risk stratification tool; TiPS, Thyroid Prediction Score; TNAPP, Thyroid Nodule App.

TNAPP³⁶ and MTNS³⁹ were developed through expert consensus, rather than statistical modeling. TNAPP was validated on 95 cases submitted from multiple centers to test interface usability.³⁶ However, the absence of clear case selection criteria raises concerns about the potential for selection bias. The MTNS underwent more robust internal validation, with retrospective application to 844 patients who underwent surgery. Ianni et al. derived their CUT score following meta-analysis of over 30,000 nodules, with a small prospective validation (n = 110).³⁸ Most tools were developed or validated at single centers, raising concerns about generalizability and potential overfitting.⁴³

Malignancy rates varied markedly across studies, from 12.5% in the TiPS cohort⁴¹ to 53–63% in CUT and MTNS cohorts,^38–40 likely reflecting referral and selection biases. Age distributions were generally comparable (mean/median 53–55 years),^37–40,42 except for TiPS (mean age 43 years)⁴¹ and TNAPP, which did not report demographics.³⁶ A predominantly female population distribution was noted across all studies, although male representation ranged from 11.4%⁴¹ to 33%,⁴² which may influence predictive performance. Baseline nodule size, where reported, was also broadly comparable across cohorts. However, the MTNS development cohort reported by Sands et al. appeared to include relatively larger nodules, with 73% measuring >2 cm; this may partially account for the higher malignancy rate of 63% observed in that study. Although several other potentially relevant baseline features were captured in individual studies, including TSH, family history, prior irradiation, and multiplicity, their reporting was too sporadic to allow meaningful comparison across cohorts. Papillary thyroid carcinoma (PTC) formed the predominant malignancy in all cohorts, although the spectrum of histological subtypes differed. The MSKCC cohort included the broadest range of thyroid cancers, likely due to its tertiary referral setting.⁴² Some studies did not report histological subtypes, limiting comparability and applicability to diverse patient populations.^36,40

Eligibility criteria were heterogeneous and often restrictive. TNAPP excluded high-risk features to focus on low-to-moderate risk nodules.^36,37 The MSKCC nomogram⁴² and MTNS^39,40 were restricted to surgical cohorts. While histopathology offers a definitive gold standard outcome, this limits the generalizability of RSTs for broader outpatient practice. TiPS⁴¹ excluded patients who were pregnant, were thyrotoxic, or had previous thyroid cancer. In contrast, eligibility criteria for the CUT score were insufficiently reported,³⁸ limiting assessment of its external applicability.

Characteristics of risk prediction models

All five models incorporated clinical, ultrasound, and cytological variables, though with differing breadth and methodological approaches (Table 2). The MTNS produced a 22-item weighted scoring system based on expert consensus, while the MSKCC nomogram was the only model developed purely through multivariable logistic regression. TiPS represented the simplest approach, combining basic clinical data with the established ACR TI-RADS and Bethesda classification systems. The TNAPP tool included the broadest range of clinical factors, while the CUT score limited clinical variables to those supported through prior meta-analysis. TSH was the most consistently used clinical parameter, employed in four out of five models.

Table 2.

Overview of Risk Stratification Tool Components

RST model	Clinical risk factors	Ultrasound features	Cytology
TNAPP	Factors favoring FNA: • Previous head and neck irradiation • Compressive symptoms (dysphonia, dysphagia, dyspnea) • Nodule position (posterior, capsular, or tracheal) • History of documented growth (i.e., >50% increase in volume in <1 year, or 20% increase in one dimension) • Sudden nodule enlargement • Planned thyroid or parathyroid surgery • Cosmetic concern • Patient preference or anxiety • Protocol requiring documentation of cancer Factors against FNA: • Low TSH • Autonomous nodule on imaging • Prior benign FNA of the same nodule • Pregnancy or other medical condition takes precedence • Significant comorbidity • Life expectancy <1 year • Previous lobectomy with vocal cord paralysis	AACE/AME/TNAPP (US 1–3)	Bethesda category (I–VI) Molecular result
McGill Thyroid Nodule Score	Male sex Age >45 years Palpable nodule TSH >1.4 mIU/L Hard consistency Ionizing radiation exposure Family history of thyroid cancer Ethnicity (Filipino, Hawaiian, Icelandic)	Hypoechoic Increased vascularity Taller-than-wide shape Coarse calcifications or microcalcifications Enlarging >10% or >30% Lymphadenopathy Size 2–2.9 cm or 3–3.9 cm PET scan focally positive	Hurthle cell lesion Favors neoplasm Atypia (mild, moderate, or severe) HBME-1 positivity BRAF mutation present
CUT Score	Family history of thyroid cancer Previous head and/or neck irradiation Male sex	Nodular height greater than width Absent halo sign Microcalcifications Irregular margins Hypoechogenicity Solid nodule structure Intranodular vascularization Nodule size ≥4 cm Single nodule	Italian Thy Score (Thy1–Thy5)
MSKCC Score	TSH	Shape Echotexture Vascularity	Nuclear grooves Pseudoinclusions Cellularity Colloid
TiPS	Age <40 years TSH >2 mIU/L (or on treatment for hypothyroidism)	ACR TI-RADS Score (1–5)	Bethesda category (II–VI)

FNA, fine-needle aspiration; MSKCC, Memorial Sloan Kettering Cancer Centre; RST, risk stratification tool; TiPS, Thyroid Prediction Score; TNAPP, Thyroid Nodule App.

Ultrasound features including hypoechogenicity, irregular margins, and microcalcifications were common across all models, although weighting differed. TiPS relies entirely on ACR TI-RADS, boosting interobserver consistency. By contrast, the CUT score assigns odds-ratio-based weights to each sonographic parameter. The MTNS adopts a broader approach, including PET positivity and lymphadenopathy, while the MSKCC nomogram narrows its focus to shape, echotexture, and vascularity, emphasizing cytology instead. TNAPP blends the AACE/AME and ACR TI-RADS frameworks into a simplified three-tier system.

When incorporating cytology, most models relied on established classification systems: TNAPP, MTNS, and TiPS integrate the Bethesda system to varying extents, while the CUT score uses the Italian Thy system.⁴⁴ The MSKCC nomogram was the only tool to incorporate detailed cytological parameters not formally aligned with any traditional classification system. The TNAPP and MTNS also incorporated selected molecular markers (such as BRAF) and immunohistochemical markers.

The primary outcome across all RSTs was the estimation of malignancy risk. The TNAPP tool provided the most granular output, including malignancy probability ranges and management recommendations. However, its malignancy probability estimates were partly based on data that predate the reclassification of Noninvasive Follicular Thyroid Neoplasm with Papillary-like Nuclear Features and may overestimate true malignancy rates. The CUT score stratifies nodules into low, intermediate, or high risk but without providing management guidance. The MTNS, TiPS, and MSKCC nomogram offer numerical risk estimates to support decision-making without recommending management.

Model performance

Considerable heterogeneity was observed in diagnostic performance across models (Table 3). Diagnostic performance varied according to the thresholds applied within each model. TiPS demonstrated excellent performance at a score threshold ≥6 with a sensitivity of 96.2% and excellent predictive values [negative predictive value (NPV) 99.5%, positive predictive value (PPV) 83.3%], outperforming ACR TI-RADS alone in the same cohort. The CUT score showed similarly high sensitivity (95% for scores >2.5) with AUC values >0.9, indicating strong discrimination. In contrast, TNAPP demonstrated lower overall accuracy (50.5%) and low specificity (27.5%), offering limited improvement over existing guideline-based approaches.
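Because predictive values depend on the malignancy rate of the cohort as well as on sensitivity and specificity, the relationship can be sketched with Bayes' rule. The example below is a hedged illustration, not the method used by any included study: the function name is hypothetical, and it assumes the 12.5% malignancy rate reported for the TiPS cohort in Table 1 applies to the same unit of analysis as the reported sensitivity and specificity.

```python
def predictive_values(sensitivity: float, specificity: float, prevalence: float):
    """Return (PPV, NPV) for a test applied at a given disease prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# TiPS sensitivity and specificity at the TiPS cohort malignancy rate (assumed 12.5%).
ppv_tips, npv_tips = predictive_values(0.962, 0.975, 0.125)

# The same sensitivity and specificity applied at a 53% malignancy rate,
# as seen in the CUT cohort, purely to show the effect of case mix.
ppv_high, npv_high = predictive_values(0.962, 0.975, 0.53)

print(round(ppv_tips, 3), round(npv_tips, 3))
print(round(ppv_high, 3), round(npv_high, 3))
```

Under these assumptions the first calculation gives a PPV of roughly 85% and an NPV above 99%, close to, though not identical with, the reported 83.3% and 99.5%. At the higher malignancy rate the PPV rises and the NPV falls, which is one reason predictive values from cohorts with different case mix are not directly comparable.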

Table 3.

Summary of Comparative Risk Stratification Tool Performance

RST model	Sensitivity (%)	Specificity (%)	Accuracy (%)	AUC
TNAPP	82.5	27.5	50.5	—
CUT Score				>0.9
CUT Score >2.5	95	60	77
CUT Score >5	69	96	95
MSKCC Score	—	—	—	0.91 (C-index)
TiPS Score >6	96.2	97.5	—	>0.9

AUC, area under the curve; MSKCC, Memorial Sloan Kettering Cancer Centre; RST, risk stratification tool; TiPS, Thyroid Prediction Score; TNAPP, Thyroid Nodule App.

Specificity was highest for TiPS (97.5% for score thresholds ≥6). Similarly, the CUT score achieved specificities of up to 95% for higher thresholds, although this was accompanied by reduced sensitivity. In contrast, TNAPP displayed the weakest specificity (27.5%) and predictive values (PPV 60.4%, NPV 44.2%), suggesting utility primarily for ruling in rather than ruling out malignancy. The MTNS and MSKCC models were developed specifically for indeterminate cytology (Bethesda III/IV). MTNS provided clear risk gradients, with scores >9 corresponding to malignancy rates of 63% and scores of 7 corresponding to a 32% risk of malignancy. However, no formal sensitivity, specificity, accuracy, or receiver operating characteristic data were reported. By contrast, the MSKCC nomogram achieved excellent discrimination (C-index 0.91), outperforming cytology alone.

This comparative analysis highlights the differing clinical priorities of each system: ACR TI-RADS and TNAPP prioritize sensitivity, whereas TiPS, CUT, and the MSKCC nomogram provide more balanced performance profiles with higher specificity and overall diagnostic utility.
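One way to summarize the balance between sensitivity and specificity in Table 3 is Youden's J (sensitivity + specificity − 1). The sketch below computes it for the models that reported both measures. It is illustrative only: the dictionary and function names are hypothetical, and the calculation inherits the caveat that the underlying cohorts and thresholds differ between studies.

```python
# Sensitivity and specificity as proportions, taken from Table 3.
# MTNS and MSKCC are omitted because these measures were not reported.
table3 = {
    "TNAPP": (0.825, 0.275),
    "CUT Score >2.5": (0.95, 0.60),
    "CUT Score >5": (0.69, 0.96),
    "TiPS": (0.962, 0.975),
}


def youden_j(sensitivity: float, specificity: float) -> float:
    """Youden's J statistic: 0 means no better than chance, 1 is perfect."""
    return sensitivity + specificity - 1


j_values = {name: youden_j(sens, spec) for name, (sens, spec) in table3.items()}

# Rank models from the highest to the lowest J.
ranked = sorted(j_values.items(), key=lambda item: item[1], reverse=True)

for name, j in ranked:
    print(f"{name}: J = {j:.3f}")
```

On these figures TiPS has the highest J (about 0.94) and TNAPP the lowest (about 0.10), with the two CUT thresholds in between.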

External validation of model performance

External validation of MTNS has shown consistent performance in intermediate nodules, with accuracy up to 91.4% for higher score thresholds.^45,46 Scheffler et al. proposed MTNS+ as an extended version of the score, demonstrating that the incorporation of thyroglobulin may further improve sensitivity by up to 10.5%, particularly at lower score thresholds >7.^47,48 However, specificity and positive predictive value were not substantially improved.

The CUT score has shown mixed results on external validation. While some cohorts demonstrated moderate discriminative ability,^49,50 others reported poor performance.⁵¹ Thus, although the CUT score showed promise in select clinical cohorts, its generalizability beyond the original Italian cohort appears limited, underscoring the need for population-specific external validation and contextual calibration for RSTs.

Risk of bias

The risk of bias and applicability of the five included prediction models, described across seven studies, were independently assessed by two authors using the PROBAST+AI tool³⁵ (Fig. 2 and Tables 4 and 5). Most models exhibited an unclear risk of bias, with a high risk of bias predominantly in the analysis domain. Common issues included small sample sizes, inadequate handling of missing data, retrospective designs without blinding, and lack of appropriate internal or external validation. Several models did not adequately address potential overfitting or provide sufficient statistical justification for predictor selection and weighting. While predictor and outcome definitions were generally consistent and clinically appropriate (low risk), the use of case-cohort designs and exclusion of nonsurgical cases in several studies raised concerns surrounding representativeness.

FIG. 2.

(A) Risk of bias assessment using PROBAST+AI for the five included prediction models. (B) Summary of applicability assessment using PROBAST+AI for the five included prediction models.

Table 4.

Study-Level Risk of Bias Assessment Using PROBAST+AI for the Five Included Prediction Models

RST	Participants	Predictors	Outcomes	Statistical analysis
TiPS	Low	Low	Unclear	Unclear
MTNS	Low	Unclear	Low	Unclear
TNAPP	High	Unclear	Unclear	High
MSKCC	Unclear	Low	Low	Unclear
CUT	High	Unclear	Unclear	High

MSKCC, Memorial Sloan Kettering Cancer Centre; MTNS, McGill Thyroid Nodule Score; TiPS, Thyroid Prediction Score; TNAPP, Thyroid Nodule App.
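The domain-level ratings in Table 4 can be combined into an overall judgment per model. The sketch below is a hedged illustration: the Methods state only that an overall low-risk rating required all domains to be low risk, so the rules for "High" (any domain high) and "Unclear" (otherwise) are assumptions based on the usual PROBAST convention, and the names used are hypothetical.

```python
# Domain ratings from Table 4, in the order:
# participants, predictors, outcomes, statistical analysis.
table4 = {
    "TiPS": ["Low", "Low", "Unclear", "Unclear"],
    "MTNS": ["Low", "Unclear", "Low", "Unclear"],
    "TNAPP": ["High", "Unclear", "Unclear", "High"],
    "MSKCC": ["Unclear", "Low", "Low", "Unclear"],
    "CUT": ["High", "Unclear", "Unclear", "High"],
}


def overall_risk_of_bias(domains: list[str]) -> str:
    """Combine domain ratings into one overall rating.

    "Low" only if every domain is low (as stated in the Methods).
    "High" if any domain is high, else "Unclear" (assumed convention).
    """
    if any(rating == "High" for rating in domains):
        return "High"
    if all(rating == "Low" for rating in domains):
        return "Low"
    return "Unclear"


overall = {model: overall_risk_of_bias(d) for model, d in table4.items()}
print(overall)
```

Under these assumed rules, three models are rated unclear overall and two are rated high, and none is rated low.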

Table 5.

Study-Level Applicability Assessment Using PROBAST+AI for the Five Included Prediction Models

RST	Participants	Predictors	Outcomes
TiPS	Low	Low	Low
MTNS	Unclear	Low	Low
TNAPP	High	Low	Low
MSKCC	Unclear	Unclear	Low
CUT	High	Low	Low

MSKCC, Memorial Sloan Kettering Cancer Centre; MTNS, McGill Thyroid Nodule Score; TiPS, Thyroid Prediction Score; TNAPP, Thyroid Nodule App.

Discussion

Summary of main findings

By focusing specifically on clinically usable multimodal tools, rather than isolated predictors or broader predictive modeling studies, this review highlights the small number of models that have progressed to patient-level application and clarifies the key barriers preventing wider clinical implementation.

Across the five included RSTs, ultrasound features like hypoechogenicity, microcalcifications, irregular margins, and nodule shape were the most consistently used predictors. TSH emerged as the most frequently incorporated clinical variable, and cytological assessment remained central to most models. Among the models, TiPS demonstrated the strongest diagnostic performance, with a high Youden Index indicating an excellent balance between sensitivity and specificity. The CUT score performed well at lower thresholds but lost sensitivity as specificity increased. TNAPP demonstrated poor overall accuracy, with little benefit beyond existing guidelines. External validation was limited: only TNAPP and MTNS were evaluated outside of their development cohorts, and then only in restricted, single-center populations.

Overall, TiPS appeared to be the most promising tool in terms of reported diagnostic performance. This may reflect its use of a smaller number of standardized, high-yield variables, which could improve reproducibility compared with more complex tools incorporating numerous heterogeneous parameters. However, this finding should be interpreted cautiously, given that the model is recently published, single-center, and has not yet undergone external validation. Its apparent superiority may therefore reflect cohort-specific performance, and prospective multicenter validation is required to establish its generalizability. More broadly, direct comparison of reported sensitivity, specificity, and AUC across models is limited because the included RSTs were developed for different clinical scenarios, target populations, and purposes, including general thyroid nodule assessment, indeterminate cytology, and decision-making. Apparent differences in performance may therefore reflect differences in case mix and intended use, rather than true superiority of one model over another.

Implications for practice

RSTs aim to improve cancer diagnosis and management by providing personalized, objective estimates of malignancy risk, with benefits proven across multiple other medical specialties.^52–56 These tools purport to enhance patient communication, reduce overtreatment, support shared decision-making, and ensure timely intervention for high-risk cases.^57–59 However, real-world implementation faces challenges, including model complexity, data input requirements, patient acceptability, and integration into clinical workflows.⁵⁷

Risk stratification holds particular value in overstretched cancer pathways, where the rising incidence of thyroid cancer demands more efficient, balanced approaches to diagnosis.⁶⁰ Such models could help avoid unnecessary FNAs and operations, reduce health care expenditure, and minimize patient harm.^61,62 Beyond individual patient management, RSTs could inform broader population health strategies,⁵⁷ improve cost-effectiveness,³¹ and serve as valuable tools for medical education and research.⁶³

However, applicability may differ across health care systems. In the United States, molecular testing is now increasingly integrated into the management of indeterminate thyroid nodules and may reduce the incremental use of nonmolecular RSTs in routine practice.^64,65 While molecular testing can improve diagnostic stratification, its widespread availability may also encourage reliance on expensive adjunctive testing in situations where structured risk assessment and clinical judgment may otherwise be sufficient. By contrast, in resource-limited settings where molecular testing is less available, less affordable, or not routinely reimbursed, multimodal RSTs may offer greater practical value.^66–68 This further emphasizes the importance of robust external validation across different health care systems and resource settings.

Risk stratification tools incorporating multimodal inputs in restricted patient cohorts

Eleven other multimodal RSTs were identified but excluded from formal analysis due to their narrowly defined patient cohorts, despite incorporating all three predictor domains.^69–79 Most models focused on patients with indeterminate cytology, demonstrating moderate to good discrimination (AUC 0.721–0.757)⁷⁰ and high negative predictive values (99.5%).⁶⁹

Other studies incorporated more detailed cytological criteria or used machine learning techniques, again demonstrating moderate to good discrimination (AUC 0.784–0.84).^71,72,79 Integrating molecular testing further improved accuracy. Models combining clinical, sonographic, cytological, and molecular data achieved superior discrimination, with AUCs up to 0.88, which outperformed molecular testing in isolation.^76,78 This underscores the potential role of multimodal RSTs in triaging indeterminate nodules suitable for molecular analysis. A smaller subset of RSTs targeted specific histological subtypes with mixed results. While one study found no reliable predictors for differentiating follicular adenoma from carcinoma,⁷³ another machine-learning model achieved high diagnostic accuracy when predicting follicular thyroid carcinoma (AUC 0.97).⁷⁵

Risk stratification tools using two key parameters

This systematic review identified over 30 RSTs incorporating only two of the three predictor domains. Most combined ultrasound features with select clinical variables like age, TSH, autoimmune status, or family history. These models generally demonstrated good discriminative performance (AUCs 0.80–0.95), implying the integration of select clinical risk factors can meaningfully enhance ultrasound-based risk stratification and reduce unnecessary FNAs.^80–82 This moderate-to-strong performance is reflected in more recent studies, with the strongest RSTs integrating age as a clinical variable (AUC 0.84–0.948).^83–85 A few studies paired cytology with ultrasound to refine risk assessment following FNA.^86–89 For instance, the BETH-TR score integrated ACR TI-RADS with Bethesda scores, achieving strong diagnostic performance (92% sensitivity, 74% specificity, AUC 0.88).⁹⁰

Despite promising diagnostic performance, these high AUCs should be interpreted cautiously given most models were retrospective, single-center, and lacked external validation. Many RSTs were trained on high-risk surgical cohorts, which likely inflated diagnostic accuracy.^77,91,92 Advanced machine-learning or multimodal approaches often relied on niche biomarkers or nonroutine imaging (such as elastography or Doppler), limiting real-world applicability.^77,92–95 Extremely high AUCs (>0.95) reported in some studies likely reflect overfitting, small sample sizes, and restricted cohort selection.^43,77,92 Overall, while combining clinical data with ultrasound can improve predictive performance, two-parameter models should be used cautiously and primarily as decision-support tools until validated prospectively across diverse outpatient populations.

Limitations of existing models

Current best practice for evaluating suspicious thyroid nodules involves US FNA, although its utility is limited by interobserver variability,⁹⁶ operator subjectivity,⁹⁷ and modest diagnostic accuracy.⁴⁰ Cytology alone cannot reliably differentiate benign from malignant thyroid nodules, often necessitating diagnostic surgery.¹⁶

RSTs seek to address these limitations by integrating clinical, ultrasonographic, and cytological data. However, diagnostic accuracy alone does not establish clinical utility. For an RST to influence practice meaningfully, it must also demonstrate adequate calibration, provide clinically actionable and validated decision thresholds, and show downstream benefit in patient management, particularly in reducing unnecessary diagnostic surgery. These features were inconsistently reported across the included studies, limiting confidence in the practical applicability of otherwise promising models.

The widespread use of RSTs is further constrained by limited external validation, reliance on single-center retrospective cohorts, and uncertain calibration and performance across different health care contexts. Some models, like TNAPP, offer little improvement over established classification systems or depend on extensive clinical inputs, which may hinder routine use.

Although many models may improve malignancy risk estimation, their true clinical value hinges on defining actionable thresholds to ensure real-world applicability and meaningful integration into clinical decision-making. For instance, models like the MTNS provide stratified malignancy risk estimates, but there was no consensus on what thresholds should trigger intervention. Without clearly defined and clinically validated cut-off values, these estimates risk being academically interesting yet practically ambiguous.
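One commonly used way to examine whether a candidate threshold is clinically useful is the net benefit statistic from decision curve analysis. It is shown here as a minimal sketch, not as a method applied in this review or prescribed by any of the RSTs. Net benefit at a threshold probability p_t is TP/n − (FP/n) × p_t/(1 − p_t), compared against a "treat all" strategy. The function names and the toy data are hypothetical.

```python
def net_benefit(predicted_risks, outcomes, threshold):
    """Net benefit of intervening when predicted risk >= threshold.

    outcomes: 1 = malignant on histology, 0 = benign.
    """
    n = len(outcomes)
    pairs = list(zip(predicted_risks, outcomes))
    true_pos = sum(1 for risk, y in pairs if risk >= threshold and y == 1)
    false_pos = sum(1 for risk, y in pairs if risk >= threshold and y == 0)
    odds = threshold / (1 - threshold)  # weight given to a false positive
    return true_pos / n - (false_pos / n) * odds


def net_benefit_treat_all(outcomes, threshold):
    """Net benefit of operating on every nodule, regardless of the model."""
    prevalence = sum(outcomes) / len(outcomes)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


if __name__ == "__main__":
    # Toy data for illustration only (not from any included study).
    risks = [0.05, 0.10, 0.15, 0.30, 0.40, 0.60, 0.70, 0.90]
    histology = [0, 0, 0, 0, 1, 0, 1, 1]
    for p_t in (0.10, 0.25, 0.50):
        model = net_benefit(risks, histology, p_t)
        treat_all = net_benefit_treat_all(histology, p_t)
        print(f"threshold {p_t:.2f}: model {model:.3f}, treat-all {treat_all:.3f}")
```

A threshold at which the model's net benefit exceeds both "treat all" and "treat none" (net benefit of zero) is a candidate for an actionable cut-off, although any such threshold would still require clinical validation.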

Strengths and limitations

This systematic review employed a rigorous and transparent methodology and a contemporary risk of bias assessment to comprehensively examine all RSTs for thyroid cancer that incorporate clinical, ultrasound, and cytological features. By restricting inclusion to models with explicit patient-level application, the review moves beyond descriptive predictor studies and focuses on tools with potential relevance to clinical decision-making.

However, several limitations must be acknowledged. First, this review excluded machine learning models without interpretable outputs, which may have omitted emerging tools with significant predictive potential. Second, the restriction to English-language publications introduces a risk of language bias, and publication bias is also possible because studies reporting poor model performance may be underreported in the literature. Third, the included RSTs span a broad publication period, during which diagnostic pathways and practice patterns in thyroid nodule assessment have changed substantially. This temporal heterogeneity makes direct comparison more difficult and may limit the present-day applicability of older models.

A further challenge lies in the substantial heterogeneity of study design, inclusion criteria, outcome measures, statistical analysis, and predictor variables across included papers, rendering direct comparison of diagnostic performance difficult between RSTs. In addition, the clinical relevance of nonmolecular RSTs may vary by health care setting. In the United States, widespread access to molecular testing for indeterminate nodules may limit their incremental utility, whereas in resource-limited systems such tools may be of greater practical importance. This variability complicates the generalizability of findings, posing a challenge when considering which model is best suited for broad clinical implementation across diverse settings.

Future directions and research priorities

The significant heterogeneity identified across studies suggests that a single, universally applicable RST for all thyroid nodules and malignant subtypes may be difficult to achieve. Future research may therefore be more productive if directed toward clinically specific decision contexts, particularly indeterminate cytology and follicular-patterned lesions, where the main challenge is uncertainty around the need for diagnostic surgery. In this setting, the principal clinical value of RSTs is likely to lie in reducing unnecessary diagnostic hemithyroidectomy.

Robust, prospective, multicenter external validation is essential to establish the reliability of existing RSTs. Formal head-to-head comparisons of multiple RSTs within the same study population are needed to determine relative performance. Future tools should prioritize usability, providing clear, actionable guidance that goes beyond probabilistic estimates to support evidence-based clinical recommendations. Beyond diagnostic performance, future research should also assess the broader health economic impact of RSTs, including their role in resource optimization and cost-effectiveness.

In parallel, there is potential for the development of user-friendly, digital decision-support tools to integrate validated models into everyday clinical workflows.⁹⁸ These platforms could allow for dynamic updates as new evidence emerges, including molecular markers. Artificial intelligence (AI) represents a promising avenue for enhancing risk prediction, as noted in other medical fields.^99–102 As AI-based models become more robust, interpretable, and clinically validated, they may alter future diagnostic pathways and reduce reliance on conventional rule-based RSTs based on current clinical, ultrasound, and cytological criteria. In addition, AI-driven image analysis of ultrasound images and cytology slides may uncover patterns not readily discernible with human assessment.¹⁰³ However, these approaches also face important limitations, including reduced interpretability, dependence on large high-quality datasets, risks of overfitting, and uncertainty regarding generalizability across different health care settings.^102,104,105 Such approaches should seek to complement, rather than replace, clinical judgment to support shared decision-making.

Take-Home Messages

• Multimodal RSTs may improve thyroid nodule malignancy risk estimation, particularly for indeterminate cytology.

• Current models are limited by retrospective design, single-center derivation, and limited external validation.

• Simpler tools with standardized inputs may offer better reproducibility, but high reported performance requires cautious interpretation.

• At present, RSTs should be used as adjuncts to clinical judgment, rather than stand-alone decision tools.

• Prospective multicenter validation and comparison with molecular and AI-based approaches are needed.

Conclusion

As the health care landscape continues to evolve, simple, reliable, and widely applicable RSTs represent a pragmatic, evidence-based approach to optimize thyroid cancer care while addressing the growing demands on health systems. Overall, the limitations of both conventional diagnostic pathways and existing RSTs highlight the need for prospective, external, multicenter validation studies in demographically diverse populations. While current RSTs show promise, further robust external validation is essential before they can be recommended for routine clinical use.

Authors’ Contributions

E.W.: Conceptualization (lead), methodology, formal analysis (lead), investigation (literature screening, narrative analysis, risk of bias assessment), and writing—original draft. Z.S.: Investigation (literature screening, risk of bias assessment), and writing—review and editing. K.B.: Conceptualization, methodology (lead), writing—review and editing, and supervision. N.S.: writing—review and editing (lead) and supervision (lead).

Footnotes

Author Disclosure Statement

No competing financial interests exist.

Funding Statement

No funding was received for this article.

Supplemental Material

References

1. Sanabria, Kowalski, Shah, et al. Growing incidence of thyroid carcinoma in recent years: Factors underlying overdiagnosis. Head Neck 2018;40(4):855–866; doi: 10.1002/hed.25029

2. Alexander, Cibas. Diagnosis of thyroid nodules. Lancet Diabetes Endocrinol 2022;10(7):533–539; doi: 10.1016/S2213-8587(22)00101-2

3. Morris LGT, Sikora, Tosteson, et al. The increasing incidence of thyroid cancer: The influence of access to care. Thyroid 2013;23(7):885–891; doi: 10.1089/thy.2013.0045

4. Welch, Doherty. Saving thyroids—Overtreatment of small papillary cancers. N Engl J Med 2018;379(4):310–312; doi: 10.1056/NEJMp1804426

5. Al-Chalabi, Karthik, Vaidyanathan. Radiological–pathological correlation of the British Thyroid Association ultrasound classification of thyroid nodules: A real-world validation study. Clin Radiol 2019;74(9):702–711; doi: 10.1016/j.crad.2019.05.026

6. Davies. Overdiagnosis of thyroid cancer. BMJ 2016;355:i6312; doi: 10.1136/bmj.i6312

7. Iannuccilli, Cronan, Monchik. Risk for malignancy of thyroid nodules as assessed by sonographic criteria: The need for biopsy. J Ultrasound Med 2004;23(11):1455–1464; doi: 10.7863/jum.2004.23.11.1455

8. Bonavita, Mayo, Babb, et al. Pattern recognition of benign nodules at ultrasound of the thyroid: Which nodules can be left alone? AJR Am J Roentgenol 2009;193(1):207–213; doi: 10.2214/AJR.08.1820

9. Fish, Langer, Mandel. Sonographic imaging of thyroid nodules and cervical lymph nodes. Endocrinol Metab Clin North Am 2008;37(2):401–417, ix; doi: 10.1016/j.ecl.2007.12.003

10. Haugen, Alexander, Bible, et al. 2015 American Thyroid Association management guidelines for adult patients with thyroid nodules and differentiated thyroid cancer: The American Thyroid Association guidelines task force on thyroid nodules and differentiated thyroid cancer. Thyroid 2016;26(1):1–133; doi: 10.1089/thy.2015.0020

11. Perros, Boelaert, Colley, et al.; British Thyroid Association. Guidelines for the management of thyroid cancer. Clin Endocrinol (Oxf) 2014;81 (Suppl 1):1–122; doi: 10.1111/cen.12515

12. Russ, Bonnema, Erdogan, et al. European Thyroid Association guidelines for ultrasound malignancy risk stratification of thyroid nodules in adults: The EU-TIRADS. Eur Thyroid J 2017;6(5):225–237; doi: 10.1159/000478927

13. Lee, Park, Jung, et al. A narrative review of the 2023 Korean Thyroid Association management guideline for patients with thyroid nodules. Endocrinol Metab 2024;39(1):61–72; doi: 10.3803/EnM.2024.1938

14. Kloos, Reynolds, Walsh, et al. Does addition of BRAF V600E mutation testing modify sensitivity or specificity of the Afirma gene expression classifier in cytologically indeterminate thyroid nodules? J Clin Endocrinol Metab 2013;98(4):E761–E8; doi: 10.1210/jc.2012-3762

15. Lee, Chung K-W, Min, et al. Preoperative serum thyroglobulin as a useful predictive marker to differentiate follicular thyroid cancer from benign nodules in indeterminate nodules. J Korean Med Sci 2012;27(9):1014–1018; doi: 10.3346/jkms.2012.27.9.1014

16. Tee, Lowe, Brand, et al. Fine-needle aspiration may miss a third of all malignancy in palpable thyroid nodules: A comprehensive literature review. Ann Surg 2007;246(5):714–720; doi: 10.1097/SLA.0b013e3180f61adc

17. Brooks, Shaha, DuMornay, et al. Role of fine-needle aspiration biopsy and frozen section analysis in the surgical management of thyroid tumors. Ann Surg Oncol 2001;8(2):92–100; doi: 10.1007/s10434-001-0092-7

18. Zaydfudim, Feurer, Griffin, et al. The impact of lymph node involvement on survival in patients with papillary and follicular thyroid carcinoma. Surgery 2008;144(6):1070–1077; discussion 1077-8; doi: 10.1016/j.surg.2008.08.034

19. Welch Dinauer, Tuttle, Robie, et al. Clinical features associated with metastasis and recurrence of differentiated thyroid cancer in children, adolescents and young adults. Clin Endocrinol (Oxf) 1998;49(5):619–628; doi: 10.1046/j.1365-2265.1998.00584.x

20. Rahbari, Zhang, Kebebew. Thyroid cancer gender disparity. Future Oncol 2010;6(11):1771–1779; doi: 10.2217/fon.10.127

21. Byun S-H, Min, Choi H-G, et al. Association between family histories of thyroid cancer and thyroid cancer incidence: A cross-sectional study using the Korean genome and epidemiology study data. Genes (Basel) 2020;11(9):1039; doi: 10.3390/genes11091039

22. Hemminki, Eng, Chen. Familial risks for nonmedullary thyroid cancer. J Clin Endocrinol Metab 2005;90(10):5747–5753; doi: 10.1210/jc.2005-0935

23. Wang, Chang, Jia, et al. The blood biomarkers of thyroid cancer. Cancer Manag Res 2020;12:5431–5438; doi: 10.2147/CMAR.S261170

24. Boelaert. The association between serum TSH concentration and thyroid cancer. Endocr Relat Cancer 2009;16(4):1065–1072; doi: 10.1677/ERC-09-0150

25. Thompson, Mabuchi, Ron, et al. Cancer incidence in atomic bomb survivors. Part II: Solid tumors, 1958–1987. Radiat Res 1994;137(2 Suppl):S17–S67.

26. Richardson. Exposure to ionizing radiation in adulthood and thyroid cancer incidence. Epidemiology 2009;20(2):181–187; doi: 10.1097/EDE.0b013e318196ac1c

27. Campanella, Ianni, Rota, et al. Quantification of cancer risk of each clinical and ultrasonographic suspicious feature of thyroid nodules: A systematic review and meta-analysis. Eur J Endocrinol 2014;170(5):R203–R11; doi: 10.1530/EJE-13-0995

28. Mathonnet, Cuerq, Tresallet, et al. What is the care pathway of patients who undergo thyroid surgery in France and its potential pitfalls? A national cohort. BMJ Open 2017;7(4):e013589; doi: 10.1136/bmjopen-2016-013589

29. Bartsch, Dotzenrath, Vorländer, et al. Current practice of surgery for benign goitre-an analysis of the prospective DGAV StuDoQ|Thyroid Registry. J Clin Med 2019;8(4):477; doi: 10.3390/jcm8040477

30. Haas, Takahashi, Shah, et al. Risk-stratification methods for identifying patients for care coordination. Am J Manag Care 2013;19(9):725–732.

31. Pashayan, Morris, Gilbert, et al. Cost-effectiveness and benefit-to-harm ratio of risk-stratified screening for breast cancer: A life-table model. JAMA Oncol 2018;4(11):1504–1510; doi: 10.1001/jamaoncol.2018.1901

32. Bernstein. New arrows in the quiver for targeting care management: High-risk versus high-opportunity case identification. J Ambul Care Manage 2007;30(1):39–51; doi: 10.1097/00004479-200701000-00007

33. Page, McKenzie, Bossuyt, et al. The PRISMA 2020 statement: An updated guideline for reporting systematic reviews. BMJ 2021;372:n71; doi: 10.1136/bmj.n71

34. Šimundić A-M. Measures of diagnostic accuracy: Basic definitions. EJIFCC 2009;19(4):203–211.

35. Moons KGM, Damen JAA, Kaul, et al. PROBAST+AI: An updated quality, risk of bias, and applicability assessment tool for prediction models using regression or artificial intelligence methods. BMJ 2025;388:e082505; doi: 10.1136/bmj-2024-082505

36. Garber, Papini, Frasoldati, et al. American Association of Clinical Endocrinology and associazione medici endocrinologi thyroid nodule algorithmic tool. Endocr Pract 2021;27(7):649–660; doi: 10.1016/j.eprac.2021.04.007

37. Triggiani, Lisco, Renzulli, et al. The TNAPP web-based algorithm improves thyroid nodule management in clinical practice: A retrospective validation study. Front Endocrinol (Lausanne) 2022;13:1080159; doi: 10.3389/fendo.2022.1080159

38. Ianni, Campanella, Rota, et al. A meta-analysis-derived proposal for a clinical, ultrasonographic, and cytological scoring system to evaluate thyroid nodules: The “CUT” score. Endocrine 2016;52(2):313–321; doi: 10.1007/s12020-015-0785-5

39. Sands, Karls, Amir, et al. McGill Thyroid Nodule Score (MTNS): “Rating the risk,” a novel predictive scheme for cancer risk determination. J Otolaryngol Head Neck Surg 2011;40 Suppl 1:S1–S13.

40. Varshney, Forest V-I, Mascarella, et al. The Mcgill thyroid nodule score—does it help with indeterminate thyroid nodules? J Otolaryngol Head Neck Surg 2015;44(1):2; doi: 10.1186/s40463-015-0058-6

41. Sarayu, Nair, Khader, et al. Prospective validation of accuracy of American College of Radiologists- Thyroid Imaging Reporting and Data System (ACR-TIRADS) in diagnosing malignancy in Thyroid nodule and a Prediction Score (TiPS) for thyroid malignancy. Indian J Endocrinol Metab 2025;29(1):101–107; doi: 10.4103/ijem.ijem_324_23

42. Nixon, Ganly, Hann, et al. Nomogram for predicting malignancy in thyroid nodules using clinical, biochemical, ultrasonographic, and cytologic features. Surgery 2010;148(6):1120–1128; doi: 10.1016/j.surg.2010.09.030

43. Collins, Dhiman, Ma, et al. Evaluation of clinical prediction models (part 1): From development to external validation. BMJ 2024;384:e074819; doi: 10.1136/bmj-2023-074819

44. Nardi, Basolo, Crescenzi, et al. Italian consensus for the classification and reporting of thyroid cytology. J Endocrinol Invest 2014;37(6):593–599; doi: 10.1007/s40618-014-0062-0

45. Al-Hakami, Al-Mohammadi, Al-Mutairi, et al. McGill thyroid nodule score in differentiating thyroid nodules in total thyroidectomy cases of indeterminate nodules. Indian J Surg Oncol 2020;11(2):268–273; doi: 10.1007/s13193-020-01053-2

46. Keleşoğlu, Aydoğdu, Büyükkasap, et al. Investigating the applicability of the McGill Thyroid Nodule Score (MTNS) in patients undergoing surgery for thyroid nodules: A comparison between patients with and without papillary thyroid cancer. J Clin Pract Res 2024;46(4):363–369; doi: 10.14744/cpr.2024.16948

47. Scheffler, Forest, Leboeuf, et al. Serum thyroglobulin improves the sensitivity of the McGill thyroid nodule score for well-differentiated thyroid cancer. Thyroid 2014;24(5):852–857; doi: 10.1089/thy.2013.0191

48. Khalife, Bouhabel, Forest V-I, et al. The McGill Thyroid Nodule Score’s (MTNS+) role in the investigation of thyroid nodules with benign ultrasound guided fine needle aspiration biopsies: A retrospective review. J Otolaryngol Head Neck Surg 2016;45(1):29; doi: 10.1186/s40463-016-0141-7

49. Ianni, Pascucci, Paragliola, et al. Follow-up or surgery for indeterminate thyroid nodules: Could the CUT score application be a support for decision-making in the preoperative assessment? Thyroid 2020;30(1):65–71; doi: 10.1089/thy.2018.0649

50. Pinhas, Tessler, Bizer, et al. Validating the “CUT score” risk stratification tool for indeterminate thyroid nodules using the Bethesda system for reporting thyroid cytopathology. Eur Arch Otorhinolaryngol 2022;279(1):383–390; doi: 10.1007/s00405-021-06783-9

51. Shihabi, Hussein, Toraih, et al. Accuracy of the “CUT” score for assessing malignancy in Bethesda 3 and 4 thyroid nodules in North American population: A retrospective study. Cancer Invest 2022;40(8):693–699; doi: 10.1080/07357907.2022.2077956

52. Gross, Patel, Carvalho, et al. Nomogram for deciding adjuvant treatment after surgery for oral cavity squamous cell carcinoma. Head Neck 2008;30(10):1352–1360; doi: 10.1002/hed.20879

53. Kattan, Gönen, Jarnagin, et al. A nomogram for predicting disease-specific survival after hepatic resection for metastatic colorectal cancer. Ann Surg 2008;247(2):282–287; doi: 10.1097/SLA.0b013e31815ed67b

54. Motzer, Bukowski, Figlin, et al. Prognostic nomogram for sunitinib in patients with metastatic renal cell carcinoma. Cancer 2008;113(7):1552–1558; doi: 10.1002/cncr.23776

55. Specht, Kattan, Gonen, et al. Predicting nonsentinel node status after positive sentinel lymph biopsy for breast cancer: Clinicians versus nomogram. Ann Surg Oncol 2005;12(8):654–659; doi: 10.1245/ASO.2005.06.037

56. Moussa, Kattan, Berglund, et al. A nomogram for predicting upgrading in patients with low- and intermediate-grade prostate cancer in the era of extended prostate sampling. BJU Int 2010;105(3):352–358; doi: 10.1111/j.1464-410X.2009.08778.x

57. Taylor, Law, Hutchinson, et al. Acceptability of risk stratification within population-based cancer screening from the perspective of healthcare professionals: A mixed methods systematic review and recommendations to support implementation. PLoS One 2023;18(2):e0279201; doi: 10.1371/journal.pone.0279201

58. Fakhari, Scherr, Moe, et al. From calculation to communication: Using risk score calculators to inform clinical decision making and facilitate patient engagement. Med Decis Making 2024;44(8):900–913; doi: 10.1177/0272989X241285036

59. Hanna, Ranasinghe, Lawrentschuk. Risk stratification and avoiding overtreatment in localized prostate cancer. Curr Opin Urol 2019;29(6):612–619; doi: 10.1097/MOU.0000000000000672

60. Anonymous. Diagnostics: Recovery and renewal. In: Report of the Independent Review of Diagnostic Services for NHS England. NHS England; 2020.

61. White, Weinstein, Fingeret, et al. Is less more? A microsimulation model comparing cost-effectiveness of the revised American Thyroid Association’s 2015 to 2009 guidelines for the management of patients with thyroid nodules and differentiated thyroid cancer. Ann Surg 2020;271(4):765–773; doi: 10.1097/SLA.0000000000003074

62. Luo, Stone, Sakaguchi, et al. Using computational approaches to improve risk-stratified patient management: Rationale and methods. JMIR Res Protoc 2015;4(4):e128; doi: 10.2196/resprot.5039

63. Girwar S-AM, Jabroer, Fiocco, et al. A systematic review of risk stratification tools internationally used in primary care settings. Health Sci Rep 2021;4(3):e329; doi: 10.1002/hsr2.329

64. Ringel, Sosa, Baloch, et al. 2025 American Thyroid Association management guidelines for adult patients with differentiated thyroid cancer. Thyroid 2025;35(8):841–985; doi: 10.1177/10507256251363120

65. Fumagalli, Serio. Molecular testing in indeterminate thyroid nodules: An additional tool for clinical decision-making. Pathologica 2023;115(4):205–216; doi: 10.32074/1591-951X-887

66. MacKay, Turner, Clarke, et al. Cost-effectiveness analysis of molecular testing for indeterminate thyroid nodules in Nova Scotia. J Otolaryngol Head Neck Surg 2024;53; doi: 10.1177/19160216241291806

67. Fung MHM, Tang, Kwok, et al. High rates of unnecessary surgery for indeterminate thyroid nodules in the absence of molecular test and the cost-effectiveness of utilizing molecular test in an Asian population: A decision analysis. Thyroid 2025;35(2):166–176; doi: 10.1089/thy.2024.0436

68. Uppal, Collins, James. Thyroid nodules: Global, economic, and personal burdens. Front Endocrinol (Lausanne) 2023;14:1113977; doi: 10.3389/fendo.2023.1113977

69. Rago, Scutari, Latrofa, et al. The large majority of 1520 patients with indeterminate thyroid nodule at cytology have a favorable outcome, and a clinical risk score has a high negative predictive value for a more cumbersome cancer disease. J Clin Endocrinol Metab 2014;99(10):3700–3707; doi: 10.1210/jc.2013-4401

70. Yoon, Lee, Kim E-K, et al. A nomogram for predicting malignancy in thyroid nodules diagnosed as atypia of undetermined significance/follicular lesions of undetermined significance on fine needle aspiration. Surgery 2014;155(6):1006–1013; doi: 10.1016/j.surg.2013.12.035

71. D’Andréa, Gal, Mandine, et al. Application of machine learning methods to guide patient management by predicting the risk of malignancy of Bethesda III-V thyroid nodules. Eur J Endocrinol 2023;188(3):249–257; doi: 10.1093/ejendo/lvad017

72. Öcal, Korkmaz, Yılmazer, et al. The malignancy risk assessment of cytologically indeterminate thyroid nodules improves markedly by using a predictive model. Eur Thyroid J 2019;8(2):83–89; doi: 10.1159/000494720

73. Macias, Arumugam, Arlow, et al. A risk model to determine surgical treatment in patients with thyroid nodules with indeterminate cytology. Ann Surg Oncol 2015;22(5):1527–1532; doi: 10.1245/s10434-014-4190-8

74. Ito, Kawakami, Hirokawa, et al. Management of thyroid tumors diagnosed cytologically as follicular neoplasms in a high-volume center: Utility of a scoring system using serum thyroglobulin level, tumor size, ultrasound testing, and cytological diagnosis. Endocr J 2025;72(2):161–170; doi: 10.1507/endocrj.EJ24-0364

75. Zheng, Zhang, Lu, et al. Diagnostic value of an interpretable machine learning model based on clinical ultrasound features for follicular thyroid carcinoma. Quant Imaging Med Surg 2024;14(9):6311–6324; doi: 10.21037/qims-24-601

76. Figge, Gooding, Steward, et al. Do ultrasound patterns and clinical parameters inform the probability of thyroid cancer predicted by molecular testing in nodules with indeterminate cytology? Thyroid 2021;31(11):1673–1682; doi: 10.1089/thy.2021.0119

77. Tang Z-W, Li X-X, Luo. Development and validation of the nomogram based on ultrasound, thyroid stimulating hormone, and inflammatory marker in papillary thyroid carcinoma: A case-control study. Transl Cancer Res 2023;12(3):490–501; doi: 10.21037/tcr-22-2478

78. Wu, Stewardson, Eszlinger, et al. Development of a nomogram to integrate molecular testing and clinical variables to improve malignancy risk assessment among cytologically indeterminate thyroid nodules. Thyroid 2025;35(5):508–515; doi: 10.1089/thy.2024.0481

79. Rorive, D’Haene, Fossion, et al. Ultrasound-guided fine-needle aspiration of thyroid nodules: Stratification of malignancy risk using follicular proliferation grading, clinical and ultrasonographic features. Eur J Endocrinol 2010;162(6):1107–1115; doi: 10.1530/EJE-09-1103

80. Chen, Zhang, Meng, et al. A new ultrasound nomogram for differentiating benign and malignant thyroid nodules. Clin Endocrinol (Oxf) 2019;90(2):351–359; doi: 10.1111/cen.13898

81. San Laureano, Alba JJF, Heras JMJ, et al. Development and internal validation of a predictive model for individual cancer risk assessment for thyroid nodules. Endocr Pract 2020;26(10):1077–1084; doi: 10.4158/EP-2020-0004

82. Li, Hong, Fang, et al. Incorporation of a machine learning pathological diagnosis algorithm into the thyroid ultrasound imaging data improves the diagnosis risk of malignant thyroid nodules. Front Oncol 2022;12:968784; doi: 10.3389/fonc.2022.968784

83. Fernández Alba, Carral, Ayala Ortega, et al. External validation of a predictive model for thyroid cancer risk with decision curve analysis. Diagnostics (Basel) 2025;15(6):686; doi: 10.3390/diagnostics15060686

84. Ni, Liu, Li, et al. Using dynamic nomogram to modify TI-RADS and reduce the unnecessary FNA of thyroid nodules. Clin Hemorheol Microcirc 2025;90(3):131–144; doi: 10.1177/13860291251357792

85. He, Liang, Zou, et al. Development and optimisation strategies for a nomogram-based predictive model of malignancy risk in thyroid nodules. Hong Kong Med J 2026;32(1):30–40; doi: 10.12809/hkmj2512718

86. Lin, Xiang, Qiao, et al. A predictive model for selecting malignant thyroid nodules in patients with nondiagnostic or indeterminate fine-needle aspiration cytologic findings. J Ultrasound Med 2015;34(7):1245–1251; doi: 10.7863/ultra.34.7.1245

87. Colombo, Muzza, Pogliaghi, et al. The Thyroid Risk Score (TRS) for nodules with indeterminate cytology. Endocr Relat Cancer 2021;28(4):225–235; doi: 10.1530/ERC-20-0511

88. Zhang, Esebua, Layfield. American College of Radiology thyroid imaging reporting system and cytopathologic classification: Integration for improved assessment of malignancy risk. J Am Soc Cytopathol 2026;15(1):73–78; doi: 10.1016/j.jasc.2025.08.001

89. Aydin, Colakoglu, Kayhan, et al. Prospective optimization of malignancy risk prediction in indeterminate thyroid nodules: Diagnostic synergy of ACR TI-RADS and the 2023 Bethesda system. Endocrines 2026;7(1):12; doi: 10.3390/endocrines7010012

90. Mehta, Kannan. Approaching indeterminate thyroid nodules in the absence of molecular markers: “The BETH-TR score.” Indian J Endocrinol Metab 2020;24(2):170–175; doi: 10.4103/ijem.IJEM_620_19

91. Liu, Zheng, Li, et al. A predictive model of thyroid malignancy using clinical, biochemical and sonographic parameters for patients in a multi-center setting. BMC Endocr Disord 2018;18(1):17; doi: 10.1186/s12902-018-0241-7

92. Xu, Ni, Zhou, et al. Development and validation of a novel diagnostic tool for predicting the malignancy probability of thyroid nodules: A retrospective study based on clinical, B-mode, color doppler and elastographic ultrasonographic characteristics. Front Endocrinol (Lausanne) 2022;13:966572; doi: 10.3389/fendo.2022.966572

93. Gu, Xie, Zhao, et al. A machine learning-based approach to predicting the malignant and metastasis of thyroid cancer. Front Oncol 2022;12:938292; doi: 10.3389/fonc.2022.938292

94. Mehanna, Deeks, Boelaert, et al. Real-time ultrasound elastography in the diagnosis of newly identified thyroid nodules in adults: The ElaTION RCT. Health Technol Assess 2024;28(46):1–51; doi: 10.3310/PLEQ4874

95. Maddaloni, Briganti, Crescenzi, et al. Usefulness of color Doppler ultrasonography in the risk stratification of thyroid nodules. Eur Thyroid J 2021;10(4):339–344; doi: 10.1159/000509325

96. Weller, Sharif, Qarib, et al. British Thyroid Association 2014 classification ultrasound scoring of thyroid nodules in predicting malignancy: Diagnostic performance and inter-observer agreement. Ultrasound 2020;28(1):4–13; doi: 10.1177/1742271X19865001

97. Kim, Kim E-K, Park, et al. US-guided fine-needle aspiration of thyroid nodules: Indications, techniques, results. Radiographics 2008;28(7):1869–1886; discussion 1887; doi: 10.1148/rg.287085033

98. Solomon, Dauber-Decker, Richardson, et al. Integrating clinical decision support into electronic health record systems using a novel platform (EvidencePoint): Developmental study. JMIR Form Res 2023;7:e44065; doi: 10.2196/44065

99. Ellahham. Artificial intelligence: The future for diabetes care. Am J Med 2020;133(8):895–900; doi: 10.1016/j.amjmed.2020.03.033

100. Mekov, Miravitlles, Petkov. Artificial intelligence and machine learning in respiratory medicine. Expert Rev Respir Med 2020;14(6):559–564; doi: 10.1080/17476348.2020.1743181

101. Kaul, Enslin, Gross. History of artificial intelligence in medicine. Gastrointest Endosc 2020;92(4):807–812; doi: 10.1016/j.gie.2020.06.040

102. Sorrenti, Dolcetti, Radzina, et al. Artificial intelligence for thyroid nodule characterization: Where are we standing? Cancers (Basel) 2022;14(14):3357; doi: 10.3390/cancers14143357

103. Cao C-L, Li Q-L, Tong, et al. Artificial intelligence in thyroid ultrasound. Front Oncol 2023;13:1060702; doi: 10.3389/fonc.2023.1060702

104. Toro-Tobon, Loor-Torres, Duran, et al. Artificial intelligence in thyroidology: A narrative review of the current applications, associated challenges, and future directions. Thyroid 2023;33(8):903–917; doi: 10.1089/thy.2023.0132

105. Rao, Fernandez-Alvarez, Guntinas-Lichius, et al. The limitations of artificial intelligence in head and neck oncology. Adv Ther 2025;42(6):2559–2568; doi: 10.1007/s12325-025-03198-4
