Abstract
Background:
The introduction of molecular testing (MT) of cytologically indeterminate thyroid nodules (ITNs) alone has not impacted thyroidectomy rates. We therefore evaluated the incremental diagnostic value of adding clinical variables to MT results in predicting the risk of malignancy (ROM) among ITNs.
Methods:
This prospective observational study included 1024 consecutive ITNs that underwent reflexive ThyroSPEC MT between Jul 30, 2020, and Oct 30, 2023. A multivariable logistic regression model was built to assess the relationship between histology outcomes and clinical variables, including nodule discovery by palpation, ultrasound risk categories, maximum nodule size, Bethesda category, Bethesda atypia, and ThyroSPEC categories. Of the 1024 patients, the 332 who underwent surgery and had complete data for all variables were included in the model. A nomogram was subsequently developed based on the model.
Results:
The model achieved a cross-validated AUC of 0.831 (95% confidence interval: 0.787–0.874). Patients with high-risk mutations or malignant molecular markers had significantly higher odds of malignancy (odds ratio 152.79) than those with mutation-negative or benign molecular marker results. Patients with a maximum nodule size >5 cm had 4.34 times the odds of malignancy of those with nodules of 0–2 cm. The presence of nuclear atypia was associated with 4.26 times the odds of malignancy, and ultrasound malignancy risk category 5 with 2.89 times the odds compared with categories 1–3. Discovery by palpation was associated with 1.83 times the odds. The integrated ROM estimated from the regression model was significantly associated with surgery type (p < 0.001). In the low (0–30%) and intermediate (31–70%) ROM categories, lobectomy alone was the most common surgery (61% and 70%, respectively), whereas in the high ROM (>70%) category, total thyroidectomy predominated (62%).
Conclusions:
Although MT alone played an important role in decision-making regarding surveillance versus surgery in our study population, integrating MT results with additional clinical variables improved the malignancy risk prediction for ITNs. Our results highlight the importance of contextualizing MT results within an integrated interdisciplinary thyroid nodule diagnostic pathway.
Introduction
Contrary to predictions that high negative predictive value and high benign call rate of molecular testing (MT) decrease diagnostic resections, MT of cytologically indeterminate thyroid nodules (ITNs) alone has not impacted thyroidectomy rates. 1,2 This is perhaps unsurprising given that intermediate-risk mutations such as RAS (Rat sarcoma, which can be present in both malignant and benign nodules) are the most prevalent mutation category in mutation-positive ITNs.
Previous MT validation studies provided no data on upstream malignancy risk stratification or the impact of MT on clinical decision-making. As such, the impact of MT needs to be evaluated for its incremental value in the context of other clinical variables that impact thyroid nodule malignancy risk stratification. In this manner, malignancy risk estimates are not based on a single feature such as Thyroid Imaging Reporting and Data System (TIRADS) score or molecular alteration, but rather, the risk of malignancy (ROM) is a result of the integration of various features within a locally optimized, integrated, interdisciplinary thyroid nodule diagnostic pathway. 3–5 Studies by Hu et al. and Ghaznavi et al. describe our local efforts to optimize the thyroid nodule diagnostic pathway. 6,7
The objective of this study was to evaluate the incremental diagnostic value of ThyroSPEC MT within an optimized, integrated thyroid nodule diagnostic pathway. We aimed to build a multivariable logistic regression model (lrm) to assess the relationship between several clinical variables and histology outcomes using a larger patient population than our previous study. 3
In addition, we developed a nomogram to illustrate the contribution of each diagnostic component within our integrated interdisciplinary thyroid nodule diagnostic pathway (Fig. 1). This nomogram aimed to estimate the integrated ROM by combining ThyroSPEC results with other clinical variables in our optimized diagnostic pathway, thus facilitating clinical decision-making as recommended in recent guidelines and textbooks. 8,9

Fig. 1. Summary of the local integrated interdisciplinary thyroid nodule diagnostic pathway.
We also examined the relationship between surgery type and integrated ROM to evaluate the impact of integrated ROM on clinical decision-making.
Methods
Study population
This prospective observational study represents an update to our prior study, with increased patient numbers (n = 1024) and longer follow-up (median 20 months). 3 We included all patients in Southern Alberta diagnosed with a Bethesda III (BIII) or Bethesda IV (BIV) thyroid nodule between Jul 30, 2020, and Oct 30, 2023, who underwent reflexive ThyroSPEC MT as previously described. 3 The study was approved by the Health Research Ethics Board of Alberta (HREBA.CHC-20–0068) in accordance with the 2013 Declaration of Helsinki. Patient consent was waived as the study posed no risks to participants.
Data collection
This study utilized electronic health records to gather patient clinical data with a cutoff date of Oct 30, 2023 (Fig. 2). All surgeries scheduled before the cutoff date were included, even when they were performed after the cutoff date. Noninvasive follicular thyroid neoplasm with papillary-like nuclear features, well-differentiated tumors of uncertain malignant potential (WDT-UMP), and follicular tumors of UMP (FT-UMP) were classified as neoplasms requiring surgery and, therefore, included in the malignant cohort.

Fig. 2. Flow chart. USMR category 1–3: ATA benign-low risk, ACR-TIRADS 1–3. USMR category 4: ATA intermediate risk, ACR-TIRADS 4. USMR category 5: ATA high risk, ACR-TIRADS 5. USMR, ultrasound malignancy risk; ACR-TIRADS, American College of Radiology-Thyroid Imaging Reporting and Data System; ATA, American Thyroid Association.
The target nodule was identified by matching its location and size with the pre-ThyroSPEC ultrasound reports to ensure that the ultrasound, fine-needle aspiration (FNA), MT, and pathology reports referred to the same nodule. Incidental microcarcinomas were excluded; only microcarcinomas that corresponded to the target nodules were included.
MT as part of an optimized integrated thyroid nodule diagnostic pathway
Out of 1024 residual FNA materials, 959 (94%) passed quality control (Fig. 2) and were analyzed by ThyroSPEC, a locally developed MassARRAY-based diagnostic test. 3 ThyroSPEC test results fall into one of five categories: no mutation detected, benign molecular markers, intermediate-risk mutations, malignant molecular markers, and high-risk mutations. The summary of each mutation category is provided in Supplementary Table S1.
Statistical analyses
Statistical analyses were performed using the R statistical software package. Univariable analyses were performed on variables included in our previous study and in other similar studies. 3 Variables demonstrating stronger associations with histological outcomes in the univariable analyses were then selected for multivariable analysis. The final variables were palpation discovery (nodules discovered by palpation rather than radiographically), ultrasound risk categories, maximum nodule size, Bethesda category, Bethesda nuclear or architectural atypia, and ThyroSPEC MT categories. In addition, a nomogram based on the model was provided as a graphical tool for approximating the model’s predictions. 10
The model and nomogram were developed using the lrm() and nomogram() functions from the rms (Regression Modeling Strategies) package. The model predicts a binary outcome (malignant vs. benign) using data with known histological outcomes. For each predictor variable, the model estimates a coefficient, which represents the change in the log odds of malignancy associated with that predictor. These regression coefficients are then scaled and converted into a points-based system for each predictor in the nomogram. Of the 375 patients who underwent surgery (Fig. 2), 332 had complete data for all clinical parameters and were included in the model.
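The scaling from coefficients to nomogram points can be illustrated with a minimal Python sketch. It is not the rms workflow used in the study. It assumes that each coefficient equals the natural logarithm of the odds ratio reported in the text and that the largest coefficient is assigned 100 points, which is a common nomogram convention; the function name and dictionary are hypothetical. Under these assumptions, the resulting values come close to the points quoted in the worked example in the Discussion (35 for an intermediate-risk mutation, 29 for nuclear atypia, and 12 for palpation discovery).

```python
import math

# Odds ratios as reported in the text for the resected-nodule model.
# The reference level of every predictor has OR = 1, i.e. 0 points.
reported_odds_ratios = {
    "ThyroSPEC high-risk mutation / malignant marker": 152.79,
    "ThyroSPEC intermediate-risk mutation": 5.67,
    "Maximum nodule size > 5 cm": 4.34,
    "Nuclear atypia": 4.26,
    "USMR category 5": 2.89,
    "Palpation discovery": 1.83,
    "Bethesda IV": 1.57,
}


def odds_ratios_to_points(odds_ratios, max_points=100.0):
    """Scale log odds ratios so that the largest one maps to max_points.

    Assumption: coefficient = ln(OR), and the predictor level with the
    largest coefficient defines the top of the points scale.
    """
    coefficients = {name: math.log(value) for name, value in odds_ratios.items()}
    largest = max(abs(beta) for beta in coefficients.values())
    return {name: max_points * beta / largest for name, beta in coefficients.items()}


points = odds_ratios_to_points(reported_odds_ratios)
for name, value in points.items():
    print(f"{name}: {value:.1f} points")
```

Because points are proportional to log odds, adding points across predictors corresponds to multiplying odds ratios, which is why a nomogram can replace the regression equation for approximate bedside calculation.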
The area under the receiver operating characteristic (ROC) curve (AUC) was used to quantify the model’s ability to discriminate between malignant and benign histological outcomes. All reported AUCs were calculated using 10-fold cross-validation and are given with 95% confidence intervals (CIs). A p value of < 0.05 was considered statistically significant.
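As an illustration of the quantities described above, the following Python sketch computes an AUC as the proportion of malignant-benign pairs in which the malignant nodule receives the higher predicted probability, and deals sample indices into 10 folds. It is a simplified stand-in rather than the R code used in the study: the function names are hypothetical, the toy labels and scores are made up, and the text does not state how fold-level results were combined, so that step is left out.

```python
import random


def roc_auc(labels, scores):
    """AUC as the probability that a malignant case outranks a benign one.

    labels: 1 for malignant, 0 for benign; scores: predicted probabilities.
    Ties between a malignant and a benign score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one case of each class")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices and deal them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[fold::k] for fold in range(k)]


# Toy, made-up values used only to exercise the function.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.10, 0.40, 0.35, 0.80]
print(roc_auc(toy_labels, toy_scores))  # 0.75 (3 of 4 pairs ranked correctly)

# 332 matches the number of resected nodules in the model.
folds = k_fold_indices(332, k=10)
print([len(fold) for fold in folds])
```

In a cross-validated analysis, the model would be refitted on nine folds and scored on the held-out fold each time, so that every predicted probability entering the AUC comes from a model that never saw that nodule.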
Some patients had changes in their management decisions before the data cutoff. However, all patients were categorized based on their last management status at the time of the data cutoff.
Results
In the post-ThyroSPEC period, BIII/IV cytologies accounted for 15% (1454/9949) of FNA cytologies. After exclusions, we included 1024 ITNs: 887 (87%) were BIII, and 137 (13%) were BIV (Fig. 2). Of all patients with ITNs, 48% (495/1024) remained under ongoing surveillance, with a median follow-up of 21 months (interquartile range [IQR] 12–30 months) after ThyroSPEC testing (Fig. 2). Notably, 60 of the 495 (12%) patients under ongoing surveillance tested positive for a RAS mutation (Supplementary Figure S1), with a median follow-up of 21 months (IQR 11.8–30.3 months).
Supplementary Table S2 summarizes all relevant clinical variables and their associations with histology outcomes from the univariable analyses. Age and sex were excluded from the subsequent multivariable regression analysis due to a lack of significant association with histology outcomes. Maximum nodule size, although not significant in univariable analysis, became significant in the multivariable model and was thus included. Bethesda Category was also included based on its significance in our previous study.
Table 1 presents the results of the multivariable logistic regression model, which was used to examine the relationship between histologically confirmed malignancy and the six clinical parameters included as covariates. Patients with high-risk mutations or malignant molecular markers had significantly higher odds of malignancy (odds ratio [OR] 152.79) than those with mutation-negative or benign molecular marker results. Patients with intermediate-risk mutations also had significantly higher odds (OR 5.67) relative to the same reference group. Moreover, patients with a maximum nodule size greater than 5 cm had significantly higher odds (OR 4.34) than those with nodules between 0 and 2 cm. Bethesda nuclear atypia mentioned in the FNA report was also associated with higher odds of malignancy (OR 4.26) compared with reports that did not mention it. In addition, patients in ultrasound malignancy risk (USMR) category 5 (American Thyroid Association [ATA] high risk or American College of Radiology [ACR]-TIRADS 5) had significantly higher odds (OR 2.89) than those in categories 1–3. The odds of malignancy were also significantly elevated (OR 1.83) for nodules discovered by palpation compared with those that were not. Although the difference was not significant, patients in the BIV category had higher odds of malignancy (OR 1.57, p = 0.16) than those in the BIII category.
Table 1. Multivariable Logistic Regression Model (Resected Nodules Only)
USMR category 1–3: ATA benign-low risk, ACR-TIRADS 1–3.
USMR category 4: ATA intermediate risk, ACR-TIRADS 4.
USMR category 5: ATA high risk, ACR-TIRADS 5.
OR, odds ratio; CI, confidence interval; ATA, American Thyroid Association; BIII, Bethesda III; BIV, Bethesda IV.
The multivariable model integrating all six variables yielded an AUC of 0.831 (CI: 0.787–0.874), which suggests good discriminatory power in distinguishing between benign and malignant outcomes. The optimal threshold point on the ROC curve had a sensitivity of 0.761 (CI: 0.681–0.829) and a specificity of 0.727 (CI: 0.658–0.788).
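The sketch below shows how sensitivity and specificity follow from a probability threshold, and one common way to choose an "optimal" threshold. The text does not say which criterion was used; the Youden index (sensitivity + specificity − 1) is assumed here purely for illustration, and the function names are hypothetical.

```python
def sensitivity_specificity(labels, scores, threshold):
    """Classify score >= threshold as malignant and compare with histology.

    labels: 1 for malignant, 0 for benign. Both classes must be present.
    """
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def youden_optimal_threshold(labels, scores):
    """Return the observed score that maximizes sensitivity + specificity - 1.

    Assumption: the Youden index is the optimality criterion; the first
    threshold reaching the maximum is kept when several tie.
    """
    best_threshold, best_j = None, -1.0
    for candidate in sorted(set(scores)):
        sens, spec = sensitivity_specificity(labels, scores, candidate)
        j = sens + spec - 1.0
        if j > best_j:
            best_threshold, best_j = candidate, j
    return best_threshold, best_j


# Youden index implied by the reported operating point (0.761, 0.727).
reported_j = 0.761 + 0.727 - 1.0
print(round(reported_j, 3))  # 0.488
```

Other criteria, such as the point closest to the top-left corner of the ROC curve or a threshold fixed by clinical cost considerations, would generally give a different operating point.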
A model with the ThyroSPEC test alone achieved an AUC of 0.740 (CI: 0.663–0.818). With the progressive inclusion of additional variables (maximum nodule size, Bethesda atypia, USMR category, and palpation discovery), the AUC increased to 0.753 (CI: 0.693–0.813), 0.790 (CI: 0.739–0.842), 0.820 (CI: 0.775–0.866), and 0.829 (CI: 0.785–0.874), respectively. This trend indicates that each additional variable contributes to the model’s ability to discriminate between malignant and non-malignant cases. A model that includes all other variables but excludes the ThyroSPEC test had the lowest AUC, 0.693 (CI: 0.638–0.748), demonstrating the incremental diagnostic value of the ThyroSPEC test. The highest AUC was achieved with the comprehensive model that includes all six variables, indicating its superior performance compared to the more limited models.
The nomogram in Figure 3 is based on the comprehensive model in Table 1. Total points are calculated by summing the points assigned to each of the six variables. The total is then located on the total points axis and read against the aligned probability axis to obtain the corresponding probability of malignancy.

Fig. 3. Nomogram for the multivariable logistic regression model (resected nodules only). USMR category 1–3: ATA benign-low risk, ACR-TIRADS 1–3. USMR category 4: ATA intermediate risk, ACR-TIRADS 4. USMR category 5: ATA high risk, ACR-TIRADS 5. If an aspirate exhibits both types of Bethesda atypia, only the higher point value is assigned to cytological atypia, rather than combining the points for both. USMR, ultrasound malignancy risk; ACR-TIRADS, American College of Radiology-Thyroid Imaging Reporting and Data System; BIII, Bethesda III; BIV, Bethesda IV; ATA, American Thyroid Association.
An expanded regression model and corresponding nomogram (Table 2, Fig. 4) were developed to incorporate 182 nodules under surveillance that were assumed to be benign on the basis of at least one year of follow-up and stable or improved ultrasound. These 182 nodules were added to the 332 resected nodules from the original model. The AUC of this expanded model is 0.838 (CI: 0.801–0.876).

Fig. 4. Nomogram for the multivariable logistic regression model (resected and surveillance nodules). USMR category 1–3: ATA benign-low risk, ACR-TIRADS 1–3. USMR category 4: ATA intermediate risk, ACR-TIRADS 4. USMR category 5: ATA high risk, ACR-TIRADS 5. If an aspirate exhibits both types of Bethesda atypia, only the higher point value is assigned to cytological atypia, rather than combining the points for both. USMR, ultrasound malignancy risk; ACR-TIRADS, American College of Radiology-Thyroid Imaging Reporting and Data System; BIII, Bethesda III; BIV, Bethesda IV; ATA, American Thyroid Association.
Table 2. Multivariable Logistic Regression Model (Resected and Surveillance Nodules)
USMR category 1–3: ATA benign-low risk, ACR-TIRADS 1–3.
USMR category 4: ATA intermediate risk, ACR-TIRADS 4.
USMR category 5: ATA high risk, ACR-TIRADS 5.
OR, odds ratio; CI, confidence interval; ATA, American Thyroid Association; BIII, Bethesda III; BIV, Bethesda IV.
The odds ratios (ORs) for most covariates are consistent between the two models, though some notable differences exist. The OR for ThyroSPEC high-risk mutations or malignant molecular markers decreased in the expanded model, as the broader range of nodule types reduces the relative strength of its association with malignancy; however, the OR remained high, indicating the continued strong predictive value of the ThyroSPEC test. The inclusion of assumed benign nodules likely reduced variability in USMR risk, while Bethesda cytology played a stronger role in malignancy assessment within the broader dataset. As a result, USMR categories showed reduced significance, while Bethesda cytology showed increased significance in the expanded model.
To evaluate the relationship between surgery type and the integrated ROM calculated from the regression model (Table 1), we classified ROM into three categories, low (0–30%), intermediate (31–70%), and high (>70%), and tabulated the distribution of surgery types within each ROM category (Table 3). The chi-square test demonstrated that the ROM category was significantly associated with surgery type (p < 0.001). In the low and intermediate ROM categories, lobectomy alone was the most common surgery (61% and 70%, respectively), whereas in the high ROM category, total thyroidectomy predominated (62%).
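The categorization and the association test described above can be sketched in Python as follows. This is an illustrative sketch only: the handling of values that fall between the stated bands (for example, 30.5%) is an assumption, the function names are hypothetical, and the counts are invented placeholders rather than the contents of Table 3.

```python
def rom_category(rom):
    """Map an integrated ROM (a probability between 0 and 1) to a category.

    Assumption: categories are assigned by upper bound, so any value above
    0.30 and up to 0.70 is treated as intermediate.
    """
    if not 0.0 <= rom <= 1.0:
        raise ValueError("ROM must be a probability between 0 and 1")
    if rom <= 0.30:
        return "low"
    if rom <= 0.70:
        return "intermediate"
    return "high"


def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for a count table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


# Hypothetical counts, NOT the study data.
# Rows: low, intermediate, high ROM; columns: lobectomy, total thyroidectomy.
hypothetical_counts = [[20, 10], [20, 10], [10, 20]]
statistic, dof = chi_square_statistic(hypothetical_counts)
print(rom_category(0.77), round(statistic, 2), dof)  # high 9.0 2
```

The statistic is then compared with a chi-square distribution with the returned degrees of freedom to obtain the p value.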
Table 3. Distribution of Surgery Types in Each ROM Category (Resected Nodules Only)
ROM, risk of malignancy.
The distribution of management type (surveillance, lobectomy, or total thyroidectomy) in each ROM category for the expanded model is provided in Table 4. The type of management remains significantly associated with the ROM category (p < 0.001). In the low ROM category, 45% of patients remained under surveillance. In the intermediate ROM category, lobectomy alone was the most common surgery type (56%). Conversely, the high ROM category was predominantly managed with total thyroidectomy (62%).
Table 4. Distribution of Management Type (Surveillance, Lobectomy, or Total Thyroidectomy) in Each ROM Category (Resected and Surveillance Nodules)
ROM, risk of malignancy.
Discussion
A model excluding the ThyroSPEC test has the lowest AUC, highlighting the incremental value of MT results beyond the other five variables. However, the integration of all six variables provides the best discriminatory performance.
Integrating the MT result with other clinical variables can substantially improve the diagnostic value of MT results, thereby offering additional guidance for shared decision-making and illustrating the advantage of optimizing each step of the integrated diagnostic pathway. For example, based on the nomogram presented in Figure 3, for a patient with an intermediate-risk mutation (35 points), a maximum nodule size of 3 cm (4 points), BIII cytology (0 points), nuclear atypia reported in the cytology report (29 points), an ATA intermediate-risk or ACR-TIRADS 4 ultrasound classification (12 points), and nodule discovery by palpation (12 points), the overall estimated score is 92 points. This corresponds to an integrated malignancy probability of 77%, in contrast to 53% for isolated RAS mutations. Given that RAS and other intermediate-risk mutations represent the most common test-positive finding, it is important to optimize their discriminatory value by integrating the MT result with other relevant clinical variables, rather than assessing malignancy risk based on the MT result alone.
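The point total in this example can be checked with a few lines of Python. The point values are those quoted in the paragraph above; the dictionary keys are shorthand labels. The conversion from total points to a probability is read from the nomogram itself and is not reproduced here, since the model intercept is not given in the text.

```python
# Points quoted in the text for the example patient (read from the nomogram).
example_points = {
    "Intermediate-risk mutation": 35,
    "Maximum nodule size 3 cm": 4,
    "Bethesda III cytology": 0,
    "Nuclear atypia reported": 29,
    "USMR category 4 (ATA intermediate risk / ACR-TIRADS 4)": 12,
    "Palpation discovery": 12,
}

total_points = sum(example_points.values())
print(total_points)  # 92

# Share of the total contributed by the molecular test result alone.
mt_share = example_points["Intermediate-risk mutation"] / total_points
print(f"{mt_share:.0%}")  # 38%
```

In this example, well over half of the points come from variables other than the molecular test, which is the sense in which the clinical context modifies the interpretation of an intermediate-risk mutation.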
We provided a blueprint for developing a nomogram within local thyroid nodule diagnostic pathways, including data collection through a local database, optimization of pathway components, integration of the ROM of MT results and other clinical variables, and appropriate statistical methods. This framework allows other institutions to create their own version of the nomogram tailored to their specific thyroid nodule diagnostic pathways. Due to variability in data and model variable coefficients in different practice environments, our nomogram—based on coefficients unique to our diagnostic pathway—should not be directly applied to other settings. Our nomogram estimates the integrated ROM by combining ThyroSPEC results with other clinical variables for our diagnostic pathway rather than relying solely on ThyroSPEC results, especially for intermediate-risk mutations and rearrangements. Ultimately, if a similar nomogram is used in other practice environments, the nomogram should be informed by local outcome data specific to the respective environment. Furthermore, to optimize diagnostic impact, it is important to optimize each variable included in the nomogram, similar to what we have done in our setting. 6,7,11
A total of 515 of 1024 (50%) ITNs had nuclear atypia noted in the FNA report, while architectural atypia was described in 279/1024 (27%) reports. This represents a significant increase compared to our 2-year ThyroSPEC data (p < 0.001), 3 likely as a result of knowledge translation efforts undertaken with our cytopathologists at the time of the initial data analysis to encourage reporting of these variables. Our multivariable regression model demonstrated that nuclear atypia is a strong predictor of malignancy, with an OR of 4.26 (CI: 2.19–8.30, p < 0.001) compared to cases without nuclear atypia mentioned. In addition, univariable analysis revealed that nuclear atypia is also significantly more predictive of malignancy than architectural atypia, with an OR of 1.90 (CI: 1.06–3.42, p = 0.030). These findings are consistent with the results of previous studies conducted by Valderrabano et al. and Gan et al. 12,13
USMR categories demonstrated a significant incremental impact on the overall ROM, aligning with the findings of Hu et al. who assessed USMR categories in combination with MT, 4 and with Larcher et al. and Ahmadi et al. who demonstrated that combining ultrasound risk classification with cytological subcategorization can further stratify the ROM for ITNs. 14,15
Conversely, a study by Figge et al. reported that neither the ATA nor ACR-TIRADS ultrasound scoring systems provided additional ROM information beyond MT for ITNs. 16 However, Figge et al. retrospectively assigned the USMR category to each nodule, and it is unclear how the ultrasound category was used for patient assessment. Chaigneau et al. also reported a limited value of the TIRADS score in risk stratification. However, their sample predominantly consisted of ACR-TIRADS 4 nodules with minimal representation of ACR-TIRADS 3 and no ACR-TIRADS 5 nodules. 17 Such differences in sample composition could potentially account for the discrepant results.
Figge et al. also used multivariable regression models along with nomograms to evaluate the association between ROM and clinical variables. 16 However, in contrast to the current study, theirs was a post hoc retrospective study in which surgical decisions were made without MT results available. In addition, Figge et al. included Bethesda V nodules in their regression models.
The integrated ROM combining MT results and other variables played an important role in guiding clinical management decisions for patients. This highlights the value of the regression model and its corresponding nomogram as tools for personalized patient management.
Limitations
Our median follow-up period of 21 months for ongoing surveillance patients is limited, considering the slow growth of most thyroid cancers. However, given our centralized cancer registry and electronic medical record system across the health region and generally low rates of loss to follow-up, we do not expect a high chance of missed cancers, although this has not been directly studied. Our single-center design potentially limits the generalizability of our findings to other populations with differing thyroid nodule diagnostic pathways. The ThyroSPEC assay is a hotspot panel designed to cover most malignancy-associated mutations in thyroid carcinoma. Certain rare pathogenic mutations are not covered, including a subset of fusions with uncommon breakpoints and/or gene partners. External validation is not feasible as the nomogram is tailored to our specific diagnostic setting. However, we employed 10-fold cross-validation to calculate all AUCs.
Conclusion
Our multivariable regression model quantifies the contribution of clinical variables and MT as integrated constituents of an optimized integrated interdisciplinary thyroid nodule diagnostic pathway to predict histology outcomes for ITNs. This enabled the development of a nomogram, a visually intuitive and clinically useful graphical tool, to illustrate the contribution of each diagnostic component within our integrated interdisciplinary thyroid nodule diagnostic pathway. Our study highlights the importance of appropriately interpreting the MT result as an adjunct to an optimized diagnostic pathway to further improve diagnostic impact.
Footnotes
Authors’ Contributions
J.W.: Conceptualization, data analysis, investigation, methodology, validation, visualization, and writing (original draft). P.S.: Conceptualization, investigation, methodology, validation, writing (review and editing). M.E.: Conceptualization, investigation, methodology, validation, and writing (review and editing). M.K.: Investigation, resources, and writing (review and editing). S.G.: Investigation, methodology, validation, and writing (review and editing). E.N.: Investigation and writing (review and editing). A.B.: Investigation and writing (review and editing). R.P.: Conceptualization, funding acquisition, investigation, methodology, project administration, supervision, validation, and writing (review and editing).
Author Disclosure Statement
M.E. and R.P. report receiving licensing fees for the ThyroSPEC test. None of the other authors report any potential conflicts of interest or disclosures.
Funding Information
No funding to report.
Supplementary Material
Supplementary Figure S1
Supplementary Table S1
Supplementary Table S2
