Abstract
Background
Myocardial fibrosis, which causes left ventricular (LV) dysfunction and tachyarrhythmias, is often detected in patients with hypertrophic cardiomyopathy (HCM).
Purpose
To evaluate the potential value of a machine learning (ML) approach that uses radiomic features from late gadolinium enhancement (LGE) and cine images for the prediction of ventricular tachyarrhythmia (VT) in patients with HCM.
Material and Methods
Hyperenhancing areas of LV myocardium on LGE images were manually segmented, and the segmentation was propagated to corresponding areas on cine images. Radiomic features were extracted using the PyRadiomics library. The least absolute shrinkage and selection operator (LASSO) method was employed for radiomic feature selection. Our model development employed the TabPFN algorithm, an adapted Prior-Data Fitted Network design. Model performance was evaluated graphically and numerically over five-repeat fivefold cross-validation. SHapley Additive exPlanations (SHAP) were employed to determine the relative importance of selected radiomic features.
Results
Our cohort consisted of 60 patients with HCM (73.3% male; median age = 51.5 years), among whom 17 had documented VT during the follow-up. A total of 1612 radiomic features were extracted for each patient. The LASSO algorithm led to a final selection of 18 radiomic features. The model achieved a mean area under the receiver operating characteristic curve of 0.877, demonstrating good discrimination, and a mean Brier score of 0.119, demonstrating good calibration.
Conclusion
Radiomics-based ML models are promising for predicting VT in patients with HCM during the follow-up period. Developing predictive models as clinically useful decision-making tools may significantly improve risk assessment and prognosis.
Introduction
Hypertrophic cardiomyopathy (HCM) encompasses a broad spectrum of clinical presentations, ranging from asymptomatic disease to sudden cardiac death (SCD). In addition to increased myocardial wall thickness and left ventricular pressures, myocardial fibrosis is central to the clinical progression of HCM (1). Previous studies have clearly demonstrated that myocardial fibrosis is associated with left ventricular (LV) systolic dysfunction, ventricular tachyarrhythmias (VT), and SCD (2,3). Detecting myocardial fibrosis is, therefore, critical in patients with HCM. Because of its unique ability to identify fibrosis, cardiac magnetic resonance imaging (CMRI) is recommended for all patients with HCM per current clinical guidelines (4,5). To date, late gadolinium enhancement (LGE) by CMRI remains the sole non-invasive imaging approach that distinguishes normal myocardium from fibrotic tissue. Beyond detecting fibrosis, LGE facilitates the characterization of fibrotic properties, such as extension and percentage of LV mass.
Medical images contain many quantitative parameters formed by analyzing pixel/voxel relationships and distributions. While these quantitative parameters may have subtle differences imperceptible to the human eye, radiomic-based machine learning (ML) algorithms can now detect such differences. Clinical use of radiomic features has steadily risen in recent years (6). Multiple studies have demonstrated the diagnostic utility of radiomic features in patients with HCM, including etiological differentiation of LV hypertrophy, diagnosis of HCM itself, and discrimination between cardiac amyloidosis and HCM (7–13).
Given the research indicating the potential benefits of leveraging LGE and cine images in radiomics for HCM risk stratification, the aim of the present study was to investigate the utility of an ML approach utilizing radiomic features derived from LGE and cine images. Specifically, we focused on the prediction of VT occurrence in patients with HCM.
Material and Methods
A step-by-step representation of the study workflow is depicted in Fig. 1.

Fig. 1. Step-by-step representation of the study workflow. AUC, area under the ROC curve; AUPRC, area under the precision-recall curve; CMRI, cardiac magnetic resonance imaging; ICC, intraclass correlation coefficient; LASSO, least absolute shrinkage and selection operator; LGE, late gadolinium enhancement; MCC, Matthews correlation coefficient; PRC, precision-recall curve; ROC, receiver operating characteristic; SHAP, SHapley Additive exPlanations; SMOTE, synthetic minority over-sampling technique.
Study design, patients, and patient selection
This retrospective analysis used prospectively collected data. The institutional review board approved the study protocol with a waiver of written informed consent. We retrospectively reviewed records of patients presenting to our hospital between January 2019 and January 2023. HCM diagnosis aligned with current guidelines (5). Briefly, patients with myocardial wall thickness ≥15 mm without increased afterload (hypertension, aortic stenosis, etc.) and those with myocardial wall thickness ≥13 mm with a first-degree relative with HCM received a diagnosis of HCM. The inclusion criteria were as follows: (i) confirmed new diagnosis of HCM; and (ii) presence of LGE in the LV myocardium.
The following criteria were used to exclude patients from the analysis: (i) previous diagnosis of HCM; (ii) hypertensive heart disease; (iii) implantable cardioverter-defibrillator (ICD) implantation before CMRI; (iv) lack of 24-h Holter monitoring; (v) chronic kidney disease with an estimated glomerular filtration rate <30 mL/min; (vi) atrial fibrillation; (vii) prior myocardial infarction; (viii) extracardiac diseases that may infiltrate myocardium (e.g. Fabry disease, amyloidosis); and (ix) loss to follow-up.
Relevant data were collected for all patients, including demographic characteristics (age and sex), HCM-related clinical parameters (HCM type, family history, history of syncope, and left ventricular outflow tract obstruction [LVOTO]), laboratory parameters (brain natriuretic peptide [BNP], and troponin), CMRI parameters (ejection fraction [%, EF], end-diastolic volume index [EDVi; mL/m2], end-systolic volume index [ESVi; mL/m2], stroke volume index [SVi; mL/m2], cardiac index [Ci; L/min/m2], myocardial mass index [g/m2], maximum LV wall thickness [mm], extent of LGE [%], and left atrial volume index [LAVi; mL/m2]). A comprehensive echocardiographic assessment was performed on every patient using an echocardiography machine (EPIQ; Philips, Andover, MA, USA) by a board-certified cardiologist, and images were reviewed by a second board-certified senior cardiologist.
Assessment of VT
Patients underwent regular 1-year follow-ups after the initial HCM diagnosis. VT, including sustained ventricular tachycardia (a ventricular rhythm of 100–250 bpm) and ventricular fibrillation (a ventricular rhythm exceeding 250 bpm), was diagnosed via surface electrocardiogram (ECG), Holter monitoring, or ICD records within 1 year after CMRI.
CMRI acquisition
All CMRI studies were performed at initial HCM diagnosis using a 1.5-T scanner (Magnetom; Siemens, Germany) with 12-channel phased-array body coils. Retrospectively gated balanced steady-state free precession cine images were obtained at rest without contrast in two-chamber, three-chamber, four-chamber, and short-axis orientations at intervals of 1 cm. LGE images were acquired 10 min after intravenous administration of 0.2 mmol/kg gadopentetate dimeglumine (Magnevist; Bayer Healthcare, Germany). LGE parameters were as follows: TR/TE = 732/1.0 ms; slice thickness = 8 mm; and inversion time = 200–300 ms, adjusted to null normal myocardium signal.
CMRI analysis
Quantitative CMRI measurements were made by two radiologists (authors 4 and 5), blinded to the outcome of interest, by consensus. In the event of disagreement, a third senior radiologist (author 1) evaluated the images, and the final decision was made based on the majority opinion. These measurements were carried out using commercially available software and included EF (%), EDVi (mL/m2), ESVi (mL/m2), SVi (mL/m2), Ci (L/min/m2), myocardial mass index (g/m2), maximum LV wall thickness (mm), extent of LGE (%), and LAVi (mL/m2).
Region of interest (ROI) segmentation
Two radiologists (authors 4 and 5), blinded to the outcome of interest, then independently performed image segmentation using the open-source 3D Slicer software (www.slicer.org). The radiologists manually delineated the areas of hyperenhancement in the LV myocardium based on LGE imaging. This 3D myocardium segmentation was then applied to corresponding regions on short-axis cine end-diastolic phase images (Fig. 2).

Fig. 2. (a) Illustration of hyperenhancing areas of myocardium on LGE images; (b) depiction of the segmentation process applied to hyperenhancing areas of myocardium on LGE images; (c) depiction of the segmentation process applied to corresponding areas on cine images. LGE, late gadolinium enhancement.
Radiomic feature extraction
The PyRadiomics library was utilized to extract a broad array of radiomic features from both cine and LGE images using the segmentations from both radiologists (14). Image preprocessing utilized the PyRadiomics library example settings file, enabling intensity normalization and isotropic 1 × 1 × 1 mm grid resampling. The radiomic feature extraction process yielded nine distinct categories of features: the original images plus eight wavelet-transformed versions, namely low-low-low (LLL), high-high-high (HHH), LLH, LHH, LHL, HLL, HHL, and HLH. Here, “L” denotes a low-pass filter, while “H” represents a high-pass filter, indicating the nature of the wavelet transform applied.
Each feature category encompassed five separate feature classes. These classes included first-order statistics (18 features), the Gray Level Co-occurrence Matrix (GLCM; 24 features), the Gray Level Size Zone Matrix (GLSZM; 16 features), the Gray Level Run Length Matrix (GLRLM; 16 features), and the Gray Level Dependence Matrix (GLDM; 14 features). An additional shape class (14 features) was included within the original feature category only.
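The feature counts above can be checked with a few lines of arithmetic. The sketch below is only a bookkeeping aid: it assumes one original image plus the eight wavelet decompositions named in the text, and the dictionary keys are illustrative labels rather than identifiers taken from the study code.

```python
# Feature counts per class, as listed in the text above.
PER_CATEGORY = {
    "firstorder": 18,
    "glcm": 24,
    "glszm": 16,
    "glrlm": 16,
    "gldm": 14,
}
SHAPE_FEATURES = 14  # computed once, on the original image only

# One original image plus the eight wavelet sub-bands named in the text.
CATEGORIES = ["original", "LLL", "HHH", "LLH", "LHH", "LHL", "HLL", "HHL", "HLH"]


def features_per_sequence() -> int:
    """Number of radiomic features extracted from one image sequence."""
    per_category = sum(PER_CATEGORY.values())  # 18 + 24 + 16 + 16 + 14 = 88
    return per_category * len(CATEGORIES) + SHAPE_FEATURES


per_sequence = features_per_sequence()  # 88 * 9 + 14
total = 2 * per_sequence                # cine + LGE
print(per_sequence, total)
```

Under these assumptions the count per sequence is 806 and the total for cine plus LGE is 1612, which matches the number of features reported in the Results.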
These features encapsulate a diverse range of information from the image data, including intensity distributions, textures, shapes, and wavelet transforms, enhancing the comprehensiveness of the data analyzed by our ML models. For a detailed explanation of these radiomic features, refer to the PyRadiomics documentation: https://pyradiomics.readthedocs.io/en/latest/features.html.
Feature selection
The initial step in our feature selection process involved the computation of intraclass correlation coefficients (ICCs) to evaluate the inter-observer reliability of the extracted radiomic features. ICC values in the range of 0.75–1 were deemed “excellent,” serving as a benchmark of high reproducibility. For the features yielding an ICC ≥0.75, the values from both radiologists’ segmentations were averaged. In the next stage, we applied the least absolute shrinkage and selection operator (LASSO) regression algorithm (alpha = 0.005) and retained the features with non-zero coefficients.
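The reliability filter can be sketched as follows. The text does not state which ICC form was used, so the two-way random-effects, absolute-agreement, single-rater form, ICC(2,1), is an assumption here; the function names and the toy feature names ("stable", "noisy") are hypothetical.

```python
from statistics import mean


def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings: one row per patient, each row holding one value per reader.
    """
    n = len(ratings)     # patients
    k = len(ratings[0])  # readers
    grand = mean(v for row in ratings for v in row)
    row_means = [mean(row) for row in ratings]
    col_means = [mean(row[j] for row in ratings) for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in ratings for v in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


def reliable_feature_means(reader_a, reader_b, threshold=0.75):
    """Keep features with ICC >= threshold and average the two readers' values."""
    kept = {}
    for name in reader_a:
        pairs = list(zip(reader_a[name], reader_b[name]))
        if icc_2_1(pairs) >= threshold:
            kept[name] = [(a + b) / 2 for a, b in pairs]
    return kept


# Toy values for five patients and two readers (illustrative only).
reader_a = {"stable": [1.0, 2.0, 3.0, 4.0, 5.0], "noisy": [1.0, 2.0, 3.0, 4.0, 5.0]}
reader_b = {"stable": [1.1, 2.0, 2.9, 4.1, 5.0], "noisy": [5.0, 1.0, 4.0, 2.0, 3.0]}
kept = reliable_feature_means(reader_a, reader_b)
print(sorted(kept))
```

In this toy input only the feature on which the two readers closely agree passes the 0.75 threshold, and its per-patient values are replaced by the average of the two readings.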
Model development and evaluation
Our model development employed TabPFN, an adapted Prior-Data Fitted Network design. TabPFN uses a meta-learning approach that enables adaptation to new, unseen data by learning from varied datasets (15). Prior-data fitted networks such as TabPFN are pre-trained on simulated data to approximate Bayesian inference on real-world data (15, 16). This pre-training allows TabPFN to capture intricate patterns in tabular data and to transfer readily to novel datasets.
We evaluated model effectiveness using a five-repeat fivefold stratified cross-validation framework. In each iteration, the data was separated into five approximately equal folds with different random splits each time, equalizing outcome class ratios (stratification) to ensure class equilibrium across folds. Inside each fold during every iteration, the preliminary training dataset (80% of data) was additionally divided into a final training subset (70% of total data) and a validation subset (10% of total data). This produced a final 70:10:20 proportion for training to validation to hold-out testing.
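The splitting scheme can be sketched in plain Python. This is an illustrative reconstruction, not the study code: the helper names, the seed, and the round-robin way of stratifying are assumptions, and the label vector simply mirrors the 17 VT and 43 non-VT patients of the cohort.

```python
import random


def stratified_parts(indices, labels, n_parts, rng):
    """Deal indices into n_parts groups of near-equal size and class ratio."""
    pos = [i for i in indices if labels[i] == 1]
    neg = [i for i in indices if labels[i] == 0]
    rng.shuffle(pos)
    rng.shuffle(neg)
    parts = [[] for _ in range(n_parts)]
    # Dealing positives first and continuing the same round-robin with the
    # negatives keeps both part sizes and class ratios balanced.
    for position, index in enumerate(pos + neg):
        parts[position % n_parts].append(index)
    return parts


def repeated_splits(labels, n_repeats=5, n_folds=5, seed=0):
    """Yield (train, validation, test) index lists for every repeat and fold."""
    rng = random.Random(seed)
    everyone = list(range(len(labels)))
    for _ in range(n_repeats):
        folds = stratified_parts(everyone, labels, n_folds, rng)
        for test in folds:
            held_out = set(test)
            rest = [i for i in everyone if i not in held_out]
            # One eighth of the remaining 80% is 10% of the full cohort.
            parts = stratified_parts(rest, labels, 8, rng)
            validation = parts[0]
            train = [i for part in parts[1:] for i in part]
            yield train, validation, test


labels = [1] * 17 + [0] * 43  # 17 VT and 43 non-VT patients
splits = list(repeated_splits(labels))
print(len(splits), [len(part) for part in splits[0]])
```

With 60 patients this gives 25 train/validation/test triples of 42, 6, and 12 patients, that is, the 70:10:20 proportion described above.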
We employed the synthetic minority over-sampling technique (SMOTE) within each training fold to tackle potential class imbalance in the training data. SMOTE rectifies skewed class distributions by synthetically generating novel cases belonging to the minority class rather than duplicating existing samples (17). Implementing SMOTE guaranteed adequate examples of both classes and prevented learning bias during training that tends to prefer the majority class.
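The interpolation idea behind SMOTE can be illustrated with a simplified sketch. It is not the implementation used in the study: the function name, the neighbour count, and the toy points are assumptions, and a production analysis would normally rely on an established library.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def smote_sketch(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples by neighbour interpolation."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base_index = rng.randrange(len(minority))
        base = minority[base_index]
        # k nearest minority-class neighbours of the base sample (itself excluded)
        others = [p for j, p in enumerate(minority) if j != base_index]
        neighbours = sorted(others, key=lambda p: squared_distance(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


# Toy training fold: 4 minority samples against 10 majority samples.
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
n_majority = 10
new_samples = smote_sketch(minority, n_new=n_majority - len(minority), k=3)
print(len(minority) + len(new_samples), n_majority)
```

Each synthetic sample lies on a line segment between two existing minority samples, so oversampling adds new cases without duplicating existing ones. Applying it only inside the training fold, as described above, keeps the validation and test data free of synthetic cases.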
The validation subsets enabled sigmoid calibration to align predicted risks with observed outcomes (18). The scikit-learn CalibratedClassifierCV class handled the fitting of this sigmoid calibration model on the validation data (19). We then assessed model discrimination, calibration, and accuracy on the held-out test folds.
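Sigmoid calibration fits a logistic curve that maps raw model scores to probabilities. The sketch below shows the idea with plain gradient descent on the log-loss; it is a stand-in written for illustration, not the fitting routine used inside scikit-learn, and the scores, outcomes, learning rate, and iteration count are made-up values.

```python
import math


def fit_sigmoid_calibrator(scores, outcomes, lr=0.5, n_iter=20000):
    """Fit p = 1 / (1 + exp(-(a * score + b))) by gradient descent on log-loss."""
    a, b = 0.0, 0.0
    n = len(scores)
    for _ in range(n_iter):
        grad_a = grad_b = 0.0
        for s, y in zip(scores, outcomes):
            p = 1.0 / (1.0 + math.exp(-(a * s + b)))
            grad_a += (p - y) * s
            grad_b += p - y
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b


def calibrate(score, a, b):
    """Map a raw score to a calibrated probability."""
    return 1.0 / (1.0 + math.exp(-(a * score + b)))


# Hypothetical validation-set scores and observed outcomes (1 = VT).
val_scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
val_outcomes = [0, 0, 0, 1, 0, 1, 1, 1]
a, b = fit_sigmoid_calibrator(val_scores, val_outcomes)
calibrated = [calibrate(s, a, b) for s in val_scores]
print(round(sum(calibrated) / len(calibrated), 3))
```

Because the calibrator includes an intercept, the mean calibrated probability on the validation data converges to the observed event rate, which is the alignment between predicted and observed risk that calibration aims for.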
The calibrated TabPFN model produced predictions and probabilities on each test fold over the five cross-validation repetitions. We assessed overall model performance by combining results across all folds and iterations. Cross-validation permitted dependable evaluation of generalizable predictive ability. To enhance interpretability, SHapley Additive exPlanations (SHAP) were used to ascertain relative feature impact (20). The SHAP plot ranked the selected features by importance, with the most important at the top. The model code can be accessed in the study GitHub repository (https://github.com/mertkarabacak/HCMP) for complete transparency.
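The quantity that SHAP estimates can be shown on a toy model by enumerating every feature ordering. This is a conceptual sketch only: practical SHAP implementations rely on faster approximations, and the model, its coefficients, and the all-zero baseline below are hypothetical.

```python
from itertools import permutations


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one sample; absent features take baseline values."""
    n = len(x)
    phi = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)
        previous = predict(current)
        for i in order:
            current[i] = x[i]           # feature i joins the coalition
            value = predict(current)
            phi[i] += value - previous  # marginal contribution of feature i
            previous = value
    return [p / len(orderings) for p in phi]


def toy_model(features):
    # Hypothetical risk score with an interaction between features 0 and 1;
    # feature 2 is ignored by the model.
    return 0.5 * features[0] + 0.2 * features[1] + 0.1 * features[0] * features[1]


x = [2.0, 3.0, 1.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
ranking = sorted(range(len(phi)), key=lambda i: abs(phi[i]), reverse=True)
print(phi, ranking)
```

The contributions sum to the difference between the prediction for the sample and the prediction for the baseline, and sorting features by absolute contribution gives the kind of ordering shown in a SHAP importance plot.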
We visually assessed model performance using a receiver operating characteristic (ROC) curve, a precision-recall curve (PRC), a calibration plot, and a confusion matrix aggregating predictions across all folds and repetitions. Numerically, we used several metrics: precision; recall; F1-score; the Matthews correlation coefficient (MCC); the area under the ROC curve (AUC); the area under the PRC (AUPRC); and the Brier score, which measures the average squared deviation between predicted probabilities and observed outcomes, with a lower score signifying a more accurate and better-calibrated model. A 95% confidence interval (CI) for each metric was derived using a bootstrap approach with 1000 resampled datasets.
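Two of these metrics and the bootstrap interval can be written out directly. The sketch assumes a percentile bootstrap and a 0.5 decision threshold, neither of which is specified in the text, and the outcomes and probabilities are made-up values.

```python
import math
import random


def brier_score(outcomes, probabilities):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(outcomes, probabilities)) / len(outcomes)


def matthews_corrcoef(outcomes, predictions):
    """Matthews correlation coefficient from binary outcomes and predictions."""
    tp = sum(1 for y, z in zip(outcomes, predictions) if y == 1 and z == 1)
    tn = sum(1 for y, z in zip(outcomes, predictions) if y == 0 and z == 0)
    fp = sum(1 for y, z in zip(outcomes, predictions) if y == 0 and z == 1)
    fn = sum(1 for y, z in zip(outcomes, predictions) if y == 1 and z == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def bootstrap_ci(metric, outcomes, values, n_boot=1000, seed=0):
    """Percentile 95% interval from n_boot resamples drawn with replacement."""
    rng = random.Random(seed)
    n = len(outcomes)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(metric([outcomes[i] for i in idx], [values[i] for i in idx]))
    stats.sort()
    return stats[int(0.025 * n_boot)], stats[int(0.975 * n_boot) - 1]


# Hypothetical pooled test-fold outcomes and calibrated probabilities.
y_true = [0, 0, 0, 0, 1, 1, 0, 1, 0, 1]
y_prob = [0.1, 0.2, 0.1, 0.3, 0.8, 0.7, 0.6, 0.4, 0.2, 0.9]
y_pred = [1 if p >= 0.5 else 0 for p in y_prob]

brier = brier_score(y_true, y_prob)
mcc = matthews_corrcoef(y_true, y_pred)
low, high = bootstrap_ci(brier_score, y_true, y_prob)
print(brier, mcc, low, high)
```

The same `bootstrap_ci` helper can wrap any metric that takes outcomes and predictions, which is how an interval could be attached to each reported value.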
Statistical analysis
In the descriptive statistics, the mean ± standard deviation (SD) values were reported for continuous variables that conformed to a normal distribution, while the median values (interquartile ranges) were utilized for continuous variables that were not normally distributed. Categorical variables were presented as the count and their respective percentages. For comparing differences between VT and non-VT groups, the independent t-test was used for normally distributed continuous variables with equal variances, while Welch's t-test was applied for normally distributed continuous variables with unequal variances. For non-normally distributed continuous variables, the Mann–Whitney U test was used. For categorical variables, Pearson's chi-square test was employed. The Shapiro–Wilk test was used to assess the normality of data, and the equality of variances for a variable was checked using Levene's test. A P value <0.05 was considered statistically significant. Python (version 3.7.15) on Google Colab was used for all statistical computations.
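The choice among the three tests for continuous variables follows a simple decision rule, sketched below together with Welch's statistic. The function names are hypothetical, the boolean inputs stand for the outcomes of the Shapiro–Wilk and Levene tests (which are not reimplemented here), and the two samples are made-up numbers.

```python
import math
from statistics import mean, variance


def choose_test(is_normal, equal_variance):
    """Name of the group comparison described in the text for a continuous variable."""
    if not is_normal:
        return "Mann-Whitney U test"
    return "independent t-test" if equal_variance else "Welch's t-test"


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va = variance(a) / len(a)  # squared standard error of the mean, group a
    vb = variance(b) / len(b)  # squared standard error of the mean, group b
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


group_a = [1.0, 2.0, 3.0, 4.0, 5.0]
group_b = [2.0, 4.0, 6.0, 8.0, 10.0]
print(choose_test(is_normal=True, equal_variance=False))
print(welch_t(group_a, group_b))
```

Welch's version does not pool the two variances, which is why it is the option chosen when Levene's test indicates unequal variances.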
Results
Patient characteristics
Our cohort consisted of 60 patients with HCM (44 men, 16 women; median age = 51.5 years), among whom 17 had documented VT during the follow-up (Fig. 3). In the comparison between the VT and non-VT groups, we found no significant differences in numerous parameters, including age, sex, HCM type, family history, LVOTO, serum BNP levels, EF, SVi, Ci, myocardial mass index, maximum LV wall thickness, and LAVi. However, certain factors, such as a history of syncope, troponin levels, EDVi, ESVi, and the extent of LGE, were found to be significantly associated with the occurrence of VT. Detailed patient characteristics and corresponding CMRI data are compiled in Table 1.

Fig. 3. Patient cohort flowchart.
Table 1. Patient characteristics.
Values are given as n (%), mean ± SD, or median (IQR).
BNP, brain natriuretic peptide; Ci, cardiac index; EDVi, end-diastolic volume index; EF, ejection fraction; ESVi, end-systolic volume index; HCM, hypertrophic cardiomyopathy; IQR, interquartile range; LAVi, left atrial volume index; LGE, late gadolinium enhancement; LV, left ventricle; LVOTO, left ventricular outflow tract obstruction; SD, standard deviation; SVi, stroke volume index; VT, ventricular tachyarrhythmia.
Table 2. Model performance metrics.
AUC, area under the receiver operating characteristic curve; AUPRC, area under the precision-recall curve; CI, confidence interval.
Feature extraction and feature selection
A total of 1612 radiomic features, evenly split with 806 each from cine and LGE images, were initially extracted for each patient. Of these, 1214 features exhibited strong inter-observer agreement, as reflected by ICC values in the range of 0.75–0.98. Further refinement using the LASSO algorithm led to a final selection of 18 radiomic features, characterized by non-zero coefficients. The majority of these selected features, precisely 11, were from cine images, while the remaining seven were from LGE images.
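Selection by non-zero coefficients can be illustrated with a minimal coordinate-descent LASSO. The objective scaling, the absence of an intercept, and the two toy features are assumptions made for this sketch; only alpha = 0.005 is taken from the Methods.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; values inside the band become 0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, alpha=0.005, n_iter=100):
    """Minimise (1 / (2n)) * ||y - Xw||^2 + alpha * ||w||_1 (no intercept)."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(n_iter):
        for j in range(d):
            # Correlation of feature j with the residual that ignores feature j
            rho = sum(
                X[i][j] * (y[i] - sum(X[i][m] * w[m] for m in range(d) if m != j))
                for i in range(n)
            ) / n
            scale = sum(X[i][j] ** 2 for i in range(n)) / n
            w[j] = soft_threshold(rho, alpha) / scale
    return w


# Toy data: feature 0 drives the outcome, feature 1 contributes almost nothing.
x0 = [-2.5, -1.5, -0.5, 0.5, 1.5, 2.5]
x1 = [1.0, -1.0, 0.0, 0.0, -1.0, 1.0]
X = [[a, b] for a, b in zip(x0, x1)]
y = [2.0 * a + 0.001 * b for a, b in zip(x0, x1)]

weights = lasso_coordinate_descent(X, y)
selected = [j for j, coefficient in enumerate(weights) if coefficient != 0.0]
print(weights, selected)
```

The penalty shrinks the weak feature's coefficient to exactly zero while the informative feature keeps a slightly reduced coefficient, which is the behavior that reduces a large feature set to a small selected subset.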
Model evaluation
The model achieved a precision of 0.719 (95% CI = 0.577–0.848), recall of 0.607 (95% CI = 0.483–0.730), F1-score of 0.627 (95% CI = 0.497–0.744), accuracy of 0.846 (95% CI = 0.813–0.875), MCC of 0.583 (95% CI = 0.456–0.694), and Brier score of 0.119 (95% CI = 0.093–0.147). The AUC was 0.877 (95% CI = 0.823–0.924), denoting good discrimination (21). The AUPRC of 0.862 (95% CI = 0.817–0.910) also indicated strong precision-recall performance (Table 2). The model ROC curve, PRC, calibration curve, and confusion matrix are shown in Fig. 4a–d, respectively. Fig. 5 displays relative feature importance according to SHAP values.

Fig. 4. The model's (a) ROC curve, (b) precision-recall curve, (c) calibration curve, and (d) confusion matrix. AUC, area under the ROC curve; AUPRC, area under the precision-recall curve; ROC, receiver operating characteristic; VT, ventricular tachyarrhythmia.

Fig. 5. SHAP plot of the model sorting features by their relative importance. SHAP, SHapley Additive exPlanations.
Discussion
Our study presents an ML approach that accurately predicts VT occurrence in patients with HCM. Model performance achieved a mean AUC of 0.877 and a mean AUPRC of 0.862. In total, 18 radiomic features were utilized: 11 from cine images and 7 from LGE images. These findings indicate that LGE and cine radiomic feature-based ML models may assist clinicians in predicting VT occurrence to guide preventative measures. As HCM constitutes a relatively common inherited disease with increasing clinical recognition, effective complication management strategies, such as VT and SCD risk stratification, are paramount.
Current guidelines recommend CMRI for initial HCM patient assessment, enabling evaluation of maximum wall thickness, apical aneurysm detection, and fibrosis identification through LGE – closely linked to VT (5). Prior evidence shows that LGE presence, extent, and heterogeneity are associated with significant clinical endpoints. As such, LGE was proposed as a potential HCM risk marker. Our study demonstrates ML models leveraging cine and LGE image-derived radiomic features may prove useful for predicting VT occurrence in patients with HCM.
SCD is the leading cause of death in young patients with HCM, threatening approximately 5% of them. Mortality risk stratification is, therefore, fundamental, as ICDs effectively prevent SCD in HCM. Current guidelines recommend using the European Society of Cardiology (ESC) 6% 5-year risk threshold to categorize high-risk patients as candidates for ICD therapy. However, adverse events frequently affect low-risk patients with HCM as well (22). Therefore, additional strategies to predict adverse outcomes still warrant development since standard risk scoring has limitations.
Recent evidence from Fahmy et al. shows that LGE radiomics is significantly associated with SCD risk in HCM, improving risk stratification beyond standard ESC or American College of Cardiology/American Heart Association models (11). These findings align with those of Wang et al., in which LGE image-derived radiomic features held independent prognostic value for identifying high sudden death risk in patients with HCM (9). Another noteworthy study in the context of SCD and ICD usage is that of Augusto et al., which demonstrated that ML-facilitated measurement of left ventricular maximum wall thickness in hypertrophic cardiomyopathy surpasses the accuracy of human experts. This advance may have clinical implications, especially considering the critical cutoff of over 30 mm for ICD implantation (23).
In this study, we used a novel ML algorithm to investigate the predictive value of cine and LGE CMRI radiomic features. Similar initiatives exist, like Kochav et al.'s examination of clinical and imaging feature integration in patients with HCM (24). Their ML models outperformed traditional risk stratification accuracy. Smole et al. likewise underscored superior, feasible ML-based risk stratification (25). Uniquely, our study predicted VT solely from CMRI, without additional clinical variables. Moving forward, ML models hold the potential to significantly advance HCM management and risk stratification if barriers to clinical integration diminish.
Alis et al. similarly assessed VT occurrence in HCM using ML models and LGE image-derived radiomic features (12). Their best model achieved a slightly higher AUC of 0.92 with SMOTE. However, our study has several advantages. First, Alis et al. did not share source code for data preprocessing and classification, preventing independent reproduction of their results. Second, they omitted calibration metrics and curves. Calibration, the agreement between estimated and observed event numbers, has been called the “Achilles heel” of predictive analytics (26). Systematic reviews highlight that poor calibration can produce misleading predictions (27), yet calibration assessment remains underutilized (28–33). We displayed near-ideal calibration (Fig. 4c) with a Brier score of 0.119. Third, we additionally incorporated cine image-based radiomic features. Notably, 11 of 18 selected features were cine-derived. We also integrated 2D and 3D shape descriptors, whereas Alis et al. only used textural features.
The present study has some limitations. Our major limitation was the small sample size. As we only included patients with recently diagnosed HCM, our work constitutes an initial hypothesis-generating study. Further research with expanded cohorts is still needed to validate our findings. In addition, we did not perform external validation, which would be a crucial next step for the generalizability of our findings. Class imbalance was another notable limitation, which we aimed to address using the SMOTE.
In conclusion, growing recognition and understanding of hypertrophic cardiomyopathy (HCM) underscore the need for improved risk stratification to prevent complications like VT and SCD. Our study presents an ML model that predicts VT occurrence in patients with HCM, achieving a mean AUC of 0.877 and a mean AUPRC of 0.862. This promising performance leverages 18 radiomic features from LGE and cine CMRI. Therefore, our work enhances CMRI's potential as a pivotal tool for VT prevention in patients with HCM through non-invasive imaging.
Footnotes
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
