Abstract
Hemorrhagic progression of contusion (HPC) often occurs early in patients with cerebral contusions (CC) and significantly impacts their prognosis. Prompt assessment of HPC and prediction of outcomes are vital for tailoring interventions and thereby improving prognosis in these patients. We used the Attention-3DUNet neural network to semi-automatically segment hematomas from computed tomography (CT) images of 452 CC patients with 695 hematomas. Subsequently, 1502 radiomic features were extracted from 358 hematomas in 261 patients. After a selection process, these features were used to calculate the radiomic signature (Radscore). The Radscore, along with clinical features such as medical history, physical examinations, laboratory results, and radiological findings, was used to develop predictive models. For prognosis (discharge Glasgow Outcome Scale score), the radiomic features of each patient's hematomas were aggregated and fused at the patient level. We employed various machine learning methodologies to create both a combined model, integrating radiomics and clinical features, and a clinical-only model. Nomograms based on logistic regression were constructed to visually represent the predictive procedure, and external validation was performed on 170 patients from three additional centers. The results showed that for HPC, the combined model, incorporating hemoglobin levels, Rotterdam CT score of 3, multi-hematoma fuzzy sign, concurrent subdural hemorrhage, international normalized ratio, and Radscore, achieved area under the receiver operating characteristic curve (AUC) values of 0.848 and 0.836 in the test and external validation cohorts, respectively. The combined model for prognosis, using age, Abbreviated Injury Scale for the head, Glasgow Coma Scale Motor component, Glasgow Coma Scale Verbal component, albumin, and Radscore, attained AUC values of 0.846 and 0.803 in the test and external validation cohorts, respectively.
Selected radiomic features indicated that irregularly shaped and highly heterogeneous hematomas increased the likelihood of HPC, while larger weighted axial lengths and lower densities of hematomas were associated with a higher risk of poor prognosis. Predictive models that combine radiomic and clinical features exhibit robust performance in forecasting HPC and the risk of poor prognosis in CC patients. Radiomic features complement clinical features in predicting HPC, although their ability to enhance the predictive accuracy of the clinical model for adverse prognosis is limited.
Introduction
Traumatic brain injury (TBI) is a severe neurological disorder 1 and is expected to persist as one of the top three causes of injury-related human mortality and disability by 2030. 2 Secondary hemorrhagic progression of contusion (HPC) significantly impacts the prognosis of cerebral contusions (CC) patients. 3 -5 Precise early prediction of HPC and prognosis in CC patients is crucial for implementing personalized interventions to mitigate secondary brain injury, thereby improving patient prognosis.
The current practice trend and consensus involve a thorough and personalized assessment of patients by integrating various characteristic features. 6,7 Radiomics, facilitating the high-throughput extraction of quantitative features from patient images, is increasingly combined with feature engineering and machine learning techniques to assist in clinical decision-making. 8 –11
In this study, we employed a three-dimensional (3D) segmentation network for the semi-automatic segmentation of traumatic intracerebral hematomas, focusing on three main categories of clinical features: historical and physical examination features, laboratory results, and radiological features. Radiomic features were extracted from the hematomas to complement and predict HPC and prognosis. The results indicate that our combined model demonstrates substantial predictive efficacy. This facilitates early precision prediction and individualized intervention for CC patients, aiming to improve prognosis.
Methods
Patient cohort
We retrospectively included patients treated at the Neurosurgery Department of Chongqing Emergency Medical Center from September 2015 to October 2022. Eligible patients were non-obstetric patients with CC and a discharge diagnosis of TBI coded as S06 in the International Classification of Diseases, 10th Revision (ICD-10). For external validation, we included similar patients admitted to hospitals in Yubei District, Qianjiang District, and Bishan District of Chongqing from October 2022 to October 2023.
Further inclusion criteria were age of 18 years or older, availability of a 5-mm head computed tomography (CT) scan within 12 h post-trauma, and a follow-up 5-mm head CT within 72 h of the baseline CT. Both baseline and follow-up CT scans had to show clear traumatic intracerebral hematomas, and laboratory tests had to be performed within 12 h post-injury. Exclusion criteria included incomplete patient medical records, severe artifacts on baseline or follow-up CT scans, surgical intervention before the follow-up CT scan, a history of craniotomy, pre-existing neurological deficits, and anticoagulant therapy before the trauma. The inclusion and exclusion process, as well as the study workflow, is illustrated in Figure 1. All patients received treatment according to the internal standard operating procedures of their respective hospitals and traumatic brain injury management guidelines. 12

Figure 1. Inclusion flowchart for patients with CC and workflow of the study. Center 1: Chongqing Emergency Medical Center; Center 2: Yubei District Hospital of Traditional Chinese Medicine; Center 3: Chongqing Qianjiang Central Hospital; Center 4: Bishan hospital of Chongqing Medical University. CC, cerebral contusions.
Criteria for defining targets are as follows: an increase in traumatic intracerebral hematoma volume by 6 mL or more, or an increase of 33% or greater compared with baseline measurements, is considered indicative of hematoma expansion. 13,14 If any hematoma in a patient shows such expansion, the patient is determined to have experienced HPC. Prognosis is assessed using the Glasgow Outcome Scale (GOS) score at discharge, with scores of 1-3 indicating poor prognosis and scores of 4-5 indicating a favorable prognosis.
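The two target definitions above can be written as a short labeling sketch. The function names and the input layout (pairs of baseline and follow-up volumes in mL) are hypothetical and chosen only for illustration; the thresholds are those stated in the text.

```python
def hematoma_expanded(baseline_ml, followup_ml, abs_ml=6.0, rel=0.33):
    """True if the hematoma grew by >= abs_ml mL or by >= rel of its baseline volume."""
    growth = followup_ml - baseline_ml
    if growth >= abs_ml:
        return True
    # Relative growth is only defined for a non-zero baseline volume.
    return baseline_ml > 0 and growth / baseline_ml >= rel


def patient_has_hpc(volume_pairs):
    """A patient has HPC if any (baseline, follow-up) hematoma pair expanded."""
    return any(hematoma_expanded(b, f) for b, f in volume_pairs)


def poor_prognosis(gos):
    """Discharge GOS 1-3 is a poor prognosis, 4-5 a favorable one."""
    if gos not in (1, 2, 3, 4, 5):
        raise ValueError("GOS must be an integer from 1 to 5")
    return gos <= 3
```

For example, a patient with one stable 20 mL hematoma and one hematoma growing from 3 mL to 5 mL would be labeled as HPC, because the second hematoma grew by more than 33%.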
Patients were randomly assigned to either the training or test set in a 7:3 ratio. To prevent data leakage, hematoma data from patients were allocated to the same set as the patients themselves.
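A minimal sketch of such a patient-level split is given below. It assumes each hematoma record carries a `patient_id` field (a hypothetical layout) and uses an arbitrary seed; it illustrates the grouping principle and is not the authors' actual splitting code.

```python
import random


def split_by_patient(hematomas, train_frac=0.7, seed=42):
    """Split hematoma records so that all hematomas of a patient share one set.

    hematomas: list of dicts with a 'patient_id' key (assumed layout).
    """
    patient_ids = sorted({h["patient_id"] for h in hematomas})
    rng = random.Random(seed)
    rng.shuffle(patient_ids)
    n_train = round(train_frac * len(patient_ids))
    train_ids = set(patient_ids[:n_train])
    # Hematomas follow their patient, so no patient contributes to both sets.
    train = [h for h in hematomas if h["patient_id"] in train_ids]
    test = [h for h in hematomas if h["patient_id"] not in train_ids]
    return train, test
```

Splitting on patients rather than on hematomas matters because hematomas from the same patient share scanner, injury mechanism, and clinical covariates, and would otherwise leak information into the test set.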
This retrospective study solely collected de-identified patient data and has been reviewed and approved by the medical ethics committee of the Chongqing Emergency Medical Center (Chongqing University Central Hospital), by the medical ethics committee of Yubei District Hospital of Traditional Chinese Medicine, by the medical ethics committee of Chongqing Qianjiang Central Hospital (Chongqing University Qianjiang Hospital), and by the medical ethics committee of Bishan hospital of Chongqing Medical University. In accordance with local laws and the requirements of the above-mentioned institutions, written informed consent from the relevant patients/participants or their legal guardians/relatives was waived for this study.
Data collection
Clinical history and physical examination data, encompassing age, heart rate, Glasgow Coma Scale (GCS) score, Injury Severity Score (ISS), and more were extracted from the medical record system. Additionally, laboratory test results, including platelet count, hemoglobin concentration, prothrombin time, and international normalized ratio (INR), were collected.
Original baseline and follow-up CT data in Digital Imaging and Communications in Medicine (DICOM) format were obtained for all patients. For the selection of follow-up CT scans, in cases of hematoma expansion, the scan showing the largest hematoma within 72 h was chosen. If no expansion was evident, the latest scan within the same timeframe was selected. CT imaging of the head used the orbitomeatal line as the scanning baseline, with a uniform slice thickness of 5 mm, tube voltage ranging from 120 to 140 kV, and image resolution of 512 × 512 pixels. The DICOM data originated from various equipment produced by multiple manufacturers, and detailed CT scanning parameters and equipment models are provided in the Supplementary Material (Supplementary Material S1).
Radiological features were independently diagnosed by two physicians, A and B, each with 15 years of experience in radiology. Any discrepancies were resolved by a third radiologist, C, with 20 years of experience. Assessed radiological features included the presence or absence of concurrent hematomas, the status of the basal cisterns and third ventricle, Marshall CT score, Rotterdam CT score, and the multi-hematoma fuzzy sign (MFS). 15
Upon completing model development, patient data required for the models were collected from three external validation centers to assess the external validity of the models. The identification of radiological features in these patients followed the aforementioned protocol.
Region of interest segmentation
In this study, the region of interest (ROI) was defined as the area of traumatic intracerebral hematoma identified by a radiologist on the patient's CT scan. The segmentation process included manual segmentation, training of an automatic segmentation model, and semi-automatic segmentation.
Patient imaging data in DICOM format were loaded into 3D-Slicer software (version 5.0.3; http://www.slicer.org), with window settings adjusted to W100 L50. Hematoma segmentation was performed by setting thresholds between 40-100 Hounsfield units (HU). 16 -18 Physician A used this methodology to segment hematomas for all patients in the automatic segmentation model development cohort. Subsequently, Physician C reviewed the segmentation results for all patients.
The segmentation outcomes were split into training and test sets in a 7:3 ratio. Imaging data and manual segmentation masks from the training set were used to build an automatic segmentation model. Transfer learning was applied to the publicly available parameters of the open-source segmentation model SSL4MIS (https://github.com/HiLab-git/SSL4MIS), training popular convolutional encoder-decoder network models such as 3DUNet, Attention-3DUNet, and 3DVNet. Model performance was compared on the test set, and the optimal model was chosen to segment hematomas for all patients in the predictive model development cohort and external validation cohort.
The original images and segmentation results from the model development cohort and external validation cohort were converted to NRRD files and loaded into 3D-Slicer. The Paint and Erase commands were used for review and correction, and the Islands tool was employed to separate non-contiguous hematoma regions. Post-segmentation, the ROI volume was measured in the Volume Information module. Physician A, along with Physician B, corrected the automatic segmentation results. Sixty-six hematomas from 50 patients were corrected by both physicians, and their individual correction results were used to calculate the intraclass correlation coefficient (ICC) to determine the stability of radiomic features throughout the segmentation process.
Feature extraction and selection
Radiomic feature extraction
Patient CT images and segmentation masks were saved as NRRD files and processed using the Pyradiomics package (version 3.0.1; https://pyradiomics.readthedocs.io/) for feature extraction. Before extraction, images were normalized (normalizeScale = 1) and discretized based on gray level (binWidth set at 25). 19 The interpolation method “sitkBSpline” was used to resample images to a resolution of 1 × 1 × 1 mm³, while “sitkNearestNeighbor” was employed for resampling masks to the same resolution. 20
A total of 1502 features, comprising shape, first-order, and texture features, were extracted from each ROI's original and filtered images (wavelet, Laplacian of Gaussian (LoG), and Local Binary Pattern (LBP) filters). The detailed composition of radiomic features is provided in the Supplementary Material (Supplementary Material S2).
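The extraction settings described above could be collected as follows. This is a sketch only: the setting keys follow PyRadiomics' documented names, but the LoG sigma values and the choice of the 3D local binary pattern filter are assumptions that the text does not specify, and the helper name is hypothetical.

```python
def build_extraction_config():
    """Return (settings, image_types) mirroring the preprocessing stated in the text."""
    settings = {
        "normalize": True,                  # images were normalized
        "normalizeScale": 1,                # as stated in the text
        "binWidth": 25,                     # gray-level discretization
        "resampledPixelSpacing": [1, 1, 1], # 1 x 1 x 1 mm resampling
        "interpolator": "sitkBSpline",      # image interpolator named in the text
    }
    image_types = {
        "Original": {},
        "Wavelet": {},
        "LoG": {"sigma": [1.0, 2.0, 3.0]},  # ASSUMPTION: sigma values are not reported
        "LBP3D": {},                        # ASSUMPTION: 2D vs 3D LBP is not reported
    }
    return settings, image_types


# Intended use with PyRadiomics (not executed here; file names are placeholders):
#   from radiomics import featureextractor
#   settings, image_types = build_extraction_config()
#   extractor = featureextractor.RadiomicsFeatureExtractor(**settings)
#   extractor.enableImageTypes(**image_types)
#   features = extractor.execute("image.nrrd", "mask.nrrd")
```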
Fusion of multi-lesional radiomic features for prognosis
The expansion probability of each hematoma was analyzed individually, with the maximum probability across all hematomas assigned as the patient's overall risk of HPC. However, considering prognosis at the patient level requires thoughtful approaches. While consensus on the optimal method for combining multi-lesional radiological features is lacking, literature reports valuable approaches. 21 –24
Consequently, the following additional features were calculated for each patient: (i) radiomic features of the largest hematoma; (ii) the sum of radiomic features across all hematomas; (iii) the average of radiomic features across all hematomas; and (iv) the volume-weighted sum of radiomic features across all hematomas.
These features, generated using the four aforementioned methods, were integrated as individual patient radiomic features and analyzed with patient prognosis as the label.
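The four aggregation rules can be sketched for a single patient as below. The input layout and the output name prefixes are hypothetical, and the volume-weighted sum is assumed here to use each hematoma's share of the total hematoma volume as its weight; the text does not state the exact weighting.

```python
def fuse_lesion_features(lesions):
    """Fuse per-hematoma features into patient-level features.

    lesions: list of (volume_ml, {feature_name: value}) for one patient.
    Returns four fused values per original feature.
    """
    if not lesions:
        raise ValueError("patient has no segmented hematoma")
    names = list(lesions[0][1])
    total_volume = sum(volume for volume, _ in lesions)
    _, largest = max(lesions, key=lambda item: item[0])
    fused = {}
    for name in names:
        values = [feats[name] for _, feats in lesions]
        fused[f"largest_{name}"] = largest[name]
        fused[f"sum_{name}"] = sum(values)
        fused[f"mean_{name}"] = sum(values) / len(values)
        # Assumed weighting: volume fraction of each hematoma.
        fused[f"vwsum_{name}"] = (
            sum(volume * feats[name] for volume, feats in lesions) / total_volume
        )
    return fused
```

Because every retained radiomic feature yields four fused variants, a set of 1226 stable features expands to 4904 patient-level features.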
Radiomic feature selection and Radscore construction
Initially, radiomic features with an ICC less than 0.8 were discarded. The remaining features were standardized and subjected to univariate analysis to eliminate those not showing significant associations with the outcome variable. Subsequently, the top 200 features with the highest Fisher coefficients were selected for further analysis. The least absolute shrinkage and selection operator (LASSO) method was employed to refine the feature set, with the penalty parameter determined through 5-fold cross-validation to minimize overfitting.
After LASSO, the 10 features with the largest coefficients were extracted. To address multi-collinearity, Spearman's correlation coefficients were computed for all pairs of these features. Pairs with a correlation coefficient greater than 0.8 were identified, and from each pair, the feature with the smaller LASSO coefficient was removed. The remaining features were used to construct the radiomic signature (Radscore), calculated by the formula:
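$$\mathrm{Radscore} = \beta_0 + \sum_{i=1}^{n} \beta_i X_i$$
where $X_i$ represents the value of each feature for a patient, $\beta_i$ is the corresponding LASSO coefficient, $\beta_0$ is the intercept, and $n$ is the number of retained features.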
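As a minimal sketch, the Radscore can be computed as a coefficient-weighted sum of a patient's standardized feature values. The function name and the dictionary layout are hypothetical, and the optional intercept is an assumption; the fitted coefficients themselves are reported in the Supplementary Material and are not reproduced here.

```python
def radscore(features, coefficients, intercept=0.0):
    """Linear radiomic signature: intercept + sum of coefficient * feature value.

    features: {feature_name: standardized value} for one patient or hematoma.
    coefficients: {feature_name: LASSO coefficient} for the retained features.
    """
    missing = [name for name in coefficients if name not in features]
    if missing:
        raise KeyError(f"missing feature values: {missing}")
    return intercept + sum(coefficients[name] * features[name] for name in coefficients)
```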
Clinical feature selection
For the selection of clinical features, categorical variables were one-hot encoded and combined with continuous variables to form a comprehensive feature set, which then underwent feature selection. Initially, the top five features with the highest Fisher scores were identified. Spearman's correlation coefficients were calculated for each pair of these features, and for any pairs with a correlation coefficient greater than 0.8, the feature with the lower Fisher score was discarded. Subsequently, additional features from the remaining set were selected based on their Fisher scores to reconstitute a set of five features. This process was repeated until the selected features exhibited no strong pairwise correlations. All feature selection procedures were performed on the training dataset.
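The procedure can be approximated by a greedy pass over features ranked by Fisher score, as sketched below. This is a simplified reading of the text, not the authors' code: the Fisher score uses the common between-class over within-class scatter definition, the correlation threshold is applied to the absolute Spearman coefficient, and all function names are hypothetical.

```python
def fisher_score(values, labels):
    """Between-class scatter divided by within-class scatter for one feature."""
    overall = sum(values) / len(values)
    between = within = 0.0
    for cls in set(labels):
        group = [v for v, y in zip(values, labels) if y == cls]
        mean = sum(group) / len(group)
        between += len(group) * (mean - overall) ** 2
        within += sum((v - mean) ** 2 for v in group)
    return between / within if within > 0 else float("inf")


def _ranks(values):
    """Average ranks (1-based), with ties sharing their mean rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation computed as the Pearson correlation of the ranks."""
    rx, ry = _ranks(x), _ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0


def select_features(columns, labels, k=5, max_rho=0.8):
    """Keep the k highest-scoring features that are not strongly correlated."""
    ranked = sorted(columns, key=lambda n: fisher_score(columns[n], labels), reverse=True)
    selected = []
    for name in ranked:
        if all(abs(spearman(columns[name], columns[s])) <= max_rho for s in selected):
            selected.append(name)
        if len(selected) == k:
            break
    return selected
```

In this sketch a candidate is skipped whenever it correlates strongly with an already accepted, higher-scoring feature, and the next candidate by Fisher score takes its place.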
Model construction and evaluation
Using the chosen five clinical features and Radscore, various machine learning techniques—including multiple logistic regression (LR), linear discriminant analysis (LDA), support vector machine (SVM), stochastic gradient descent (SGD), Gaussian Naive Bayes (GNB), decision tree (DT), and random forest (RF)—were employed to build models for HPC and prognosis. These models comprised a radiomics model, a clinical model, and a combined model. Additionally, a logistic model was employed to create a nomogram. In the training set, a 5-fold cross-validation approach was employed to determine the optimal hyperparameters, selecting those yielding the highest mean area under the receiver operating characteristic curve (AUC) across validation sets. Subsequently, the final model was fitted using these parameters.
Model evaluation used metrics such as accuracy, sensitivity, specificity, F1-score, positive predictive value (PPV), and negative predictive value (NPV) to gauge model performance. The AUC was employed to assess model discrimination, while the calibration curve and Brier scores were used to evaluate model calibration. Decision Curve Analysis (DCA) was employed to showcase the clinical utility of the model.
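For reference, the threshold-based metrics and the Brier score can be derived from predicted probabilities as in the sketch below. The function name and the default threshold of 0.5 are illustrative assumptions; in the study the threshold was taken from Youden's index in the training cohort.

```python
def classification_metrics(y_true, y_prob, threshold=0.5):
    """Confusion-matrix metrics plus the Brier score for binary outcomes (0/1)."""
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, y_prob):
        predicted = 1 if p >= threshold else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == 0:
            fp += 1
        elif predicted == 0 and y == 0:
            tn += 1
        else:
            fn += 1

    def ratio(num, den):
        return num / den if den else float("nan")

    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
        # Brier score: mean squared difference between probability and outcome.
        "brier": sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true),
    }
```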
Statistical analysis
Statistical analyses were carried out using the open-source Python package “Scipy.stats.” The details of univariate significance testing are as follows. Initially, the Shapiro-Wilk test assessed the normality of feature distributions. For features deviating from a normal distribution, the Mann-Whitney U test determined their significance concerning the outcome of interest. Features adhering to a normal distribution underwent Levene's test to examine homogeneity of variances. Those with equal variances were further analyzed using Student's t-test for significance, while those with unequal variances were assessed using Welch's t-test. AUC values and their confidence intervals (CIs) for the 5-fold cross-validation were computed using the “cvAUC” package in R. 25 Nomograms were generated using the “simpleNomo” Python package. 26 Confidence intervals for the remaining AUC values and Brier scores were obtained through bootstrapping with 1000 resamples of the respective datasets. The threshold probabilities for all models were determined using Youden's index at its maximum from the training cohort. Throughout the study, two-tailed p values less than 0.05 were considered statistically significant.
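The threshold selection and the bootstrap confidence intervals can be sketched as follows. This is an illustration under stated assumptions: the interval uses the percentile method and skips resamples containing a single class, neither of which is specified in the text, and the function names and seed are hypothetical.

```python
import random


def auc(y_true, y_score):
    """AUC as the probability that a positive case outscores a negative one."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are required")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def youden_threshold(y_true, y_score):
    """Threshold maximizing J = sensitivity + specificity - 1."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    best_t, best_j = None, -1.0
    for t in sorted(set(y_score)):
        tp = sum(1 for y, s in zip(y_true, y_score) if y == 1 and s >= t)
        tn = sum(1 for y, s in zip(y_true, y_score) if y == 0 and s < t)
        j = tp / n_pos + tn / n_neg - 1
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j


def bootstrap_auc_ci(y_true, y_score, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the AUC (assumed method)."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [y_true[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain only one class
            stats.append(auc(ys, [y_score[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper
```

Deriving the threshold on the training cohort only, and then applying it unchanged to the test and external cohorts, keeps the reported sensitivity and specificity free of optimistic bias.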
Results
Clinical data
The automatic segmentation model development cohort comprised 452 patients with CC, involving 695 hematomas. Among these, the training set included 496 hematomas from 315 patients, while the test set encompassed 199 hematomas from 137 patients.
The predictive model development cohort involved 261 patients with 358 hematomas. Within this group, 139 patients experienced HPC, and 106 were determined to have a poor prognosis. The training set for this cohort included 182 patients with 248 hematomas, and the test set included 79 patients with 110 hematomas.
The external validation cohort comprised 170 individuals with 234 hematomas. Among these individuals, 64 exhibited HPC, while 69 were deemed to have a poor prognosis.
Detailed statistics of patients' clinical features can be found in the Supplementary Material (Supplementary Material S3). The distribution of various features between the training and test sets is outlined in the Supplementary Material (Supplementary Material S4). Features of the external validation cohort are provided in the Supplementary Material (Supplementary Material S5).
Regarding HPC, factors such as Radscore for hematoma expansion (Radscore-HE), hematoma location in the frontal lobe or deep brain, concurrent subdural hemorrhage (SDH), concurrent skull fracture, status of the basal cisterns, Marshall CT score, Rotterdam CT score, MFS, multiple cerebral contusions, INR, and hemoglobin all demonstrated significance (p < 0.05).
Concerning prognosis, factors including Radscore for prognosis (Radscore-PS), presence of diabetes, bilateral pupillary reactivity, hematoma in the parietal lobe, concurrent SDH, status of the basal cisterns, age, admission systolic blood pressure, GCS score and its three sub-scores, ISS score and Abbreviated Injury Scale for the head (AIS-Head), body temperature, total volume of intracerebral hematomas, D-dimer, INR, prothrombin time, prealbumin, lactate dehydrogenase, hemoglobin, platelet count, albumin, and white blood cell count all exhibited significance (p < 0.05).
Comparison of automated segmentation model performance
As shown in Table 1, among various automated segmentation models, the 3D U-Net demonstrated the highest Dice similarity coefficient (DSC), registering a value of 0.83 ± 0.22. This outperformed the Attention-3D U-Net (0.82 ± 0.30) and V-Net (0.79 ± 0.30). However, the 95th percentile Hausdorff distance (HD95) for the Attention-3D U-Net was significantly lower than that of the 3D U-Net. Considering these metrics comprehensively, the Attention-3D U-Net was chosen as the segmentation model for subsequent analyses. Figure 2 illustrates the segmentation effects achieved by the automated models.
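For readers unfamiliar with the DSC, a minimal sketch on voxel sets is given below. Representing each mask as a collection of voxel index tuples is an illustrative choice, and the convention of returning 1.0 for two empty masks is an assumption; HD95 is omitted because its definition varies between implementations.

```python
def dice(pred_voxels, truth_voxels):
    """Dice similarity coefficient between two sets of voxel coordinates."""
    pred, truth = set(pred_voxels), set(truth_voxels)
    if not pred and not truth:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2 * len(pred & truth) / (len(pred) + len(truth))
```

A DSC of 1 indicates identical masks and 0 indicates no overlap, so it rewards volumetric agreement, whereas HD95 penalizes boundary outliers. This is why the two metrics can rank models differently.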

Figure 2. Display of the Attention-3D U-Net automatic segmentation model performance.
Table 1. Performance of Automated Segmentation Models
DSC, Dice similarity coefficient; HD95, 95th percentile Hausdorff distance; Std, standard deviation.
Selected features
From an initial set of 1502 radiomic features, 1226 exhibited an ICC above 0.8. Concerning HPC, out of these 1226 features, 445 persisted after univariate significance analysis. Subsequent Fisher feature selection and LASSO dimensionality reduction resulted in 12 features. After eliminating linearly correlated features, seven features remained (Fig. 3A). For prognosis, 4904 individual radiomic features were derived from the initial 1226 features, with 1234 remaining post-univariate analysis. Following Fisher selection and LASSO reduction, 17 features were selected, ultimately reducing to seven features after excluding linearly correlated ones (Fig. 3B). Detailed heatmaps of the Spearman correlation coefficients for the two LASSO reduction processes and the selected radiomic features are provided in the Supplementary Material (Supplementary Material S6).

Figure 3. LASSO coefficients of the selected radiomic features.
For HPC, clinical features selected through Fisher scores and Spearman pairing included hemoglobin, a Rotterdam CT score of 3, MFS, concurrent SDH, and the INR. For prognosis, selected clinical features included age, AIS-Head, Glasgow Coma Scale Motor component (GCS-M), Glasgow Coma Scale Verbal component (GCS-V), and albumin levels. The Spearman correlation coefficients between each feature are detailed in the Supplementary Material (Supplementary Material S7). The formula for calculating the Radscore is provided in the Supplementary Material (Supplementary Material S8).
Model performance
Considering model predictive efficiency (Supplementary Material S9), fit, complexity, and interpretability, LR was selected as the final model and illustrated in a nomogram.
Model for HPC
The radiomics model achieved an AUC of 0.774 (95% CI: 0.679-0.863) in the test set (Table 2; Fig. 4C) and 0.800 (95% CI: 0.745-0.857) in the external validation set (Supplementary Material S10) for predicting HPC. In contrast, the clinical model, employing predictors such as a Rotterdam CT score of 3, presence of MFS, hemoglobin levels, concurrent SDH, and INR, attained AUCs of 0.803 (95% CI: 0.713-0.890) in the test set (Table 2; Fig. 4B) and 0.773 (95% CI: 0.712-0.833) in the external validation set (Supplementary Material S10).

Figure 4. Performance of the hematoma expansion model.
Table 2. Predictive Effect of Each Selected Logistic Model for HPC
AUC, area under the receiver operating characteristic curve; ACC, accuracy; SEN, sensitivity; SPE, specificity; PPV, positive predictive value; NPV, negative predictive value.
The combined model, integrating clinical and radiomic features, reached AUCs of 0.848 (95% CI: 0.765-0.924) in the test set and 0.836 (95% CI: 0.780-0.888) in the external validation set (Table 2; Fig. 4A, 4F). The calibration curves indicate that the combined model is well calibrated, with Brier scores of 0.158 (95% CI: 0.111-0.207) in the test set (Fig. 4D) and 0.151 (95% CI: 0.123-0.179) in the external validation cohort (Fig. 4F). DCA curves suggest that both the clinical and combined models have substantial clinical utility, with the addition of radiomic features being of significant value (Supplementary Material S11).
Model for prognosis
The clinical model, consisting of patient age, AIS-Head, GCS-V, GCS-M, and albumin levels, achieved AUC values of 0.840 (95% CI: 0.757-0.915) and 0.781 (95% CI: 0.722-0.840) in the test set (Table 3; Fig. 5B) and external validation set (Supplementary Material S10), respectively. Because older patients often experience worse prognosis, the performance of the clinical model excluding age is shown in the Supplementary Material (Supplementary Material S12); the remaining clinical variables still exhibit predictive utility, with AUC values of 0.793 (95% CI: 0.704-0.878) in the test set and 0.759 (95% CI: 0.697-0.819) in the external validation set. However, the radiomics model demonstrated suboptimal performance, with AUCs of only 0.686 (95% CI: 0.579-0.781) and 0.665 (95% CI: 0.595-0.732) in the test set (Table 3; Fig. 5C) and external validation set (Supplementary Material S10), respectively.

Figure 5. Performance of the prognosis model.
Table 3. Predictive Effect of Each Logistic Model Selected for Prognosis
AUC, area under the receiver operating characteristic curve; ACC, accuracy; SEN, sensitivity; SPE, specificity; PPV, positive predictive value; NPV, negative predictive value.
The combination of Radscore-PS with the clinical model resulted in marginal improvements, raising the AUCs to 0.846 (95% CI: 0.769-0.921) and 0.803 (95% CI: 0.747-0.860) in the test and external validation cohorts, respectively (Table 3; Fig. 5A, 5F). The calibration curves indicate that the combined model for predicting prognosis is well calibrated, with Brier scores of 0.156 (95% CI: 0.115-0.197) and 0.191 (95% CI: 0.159-0.226) in the test and external validation sets, respectively (Fig. 5D, 5F). DCA curves suggest that the combined model holds clinical value (Supplementary Material S11). Nonetheless, the inclusion of radiomic features offers only a modest enhancement to the predictive performance of the model when compared with the clinical model alone.
Discussion
Previous investigations have explored the risks associated with HPC and poor prognosis in patients with CC. 6,27 –30 Despite these efforts, no single biomarker has exhibited the required accuracy and reliability for prognostication. 7 Additionally, models integrating multiple features have not yet met the necessary standards for clinical application. 2,6,28,31 Certain features, especially those extracted from radiological reports, are susceptible to subjective interpretations by physicians, introducing human variability into the results. 32 –34 In this study, we leveraged the Attention-3DUNet to aid physicians in the semi-automatic segmentation of intracerebral hematomas. By integrating various radiomic features from CC patients with clinical features, we aimed to predict the risk of hematoma expansion and poor prognosis. Our findings suggest that radiomic features are effective in predicting hematoma expansion, with the combined model outperforming the clinical model, although their utility for prognosis prediction is comparatively weaker.
The implementation of Attention-3DUNet aimed to alleviate the burden of manual segmentation on physicians, consequently reducing the impact of human factors on the outcomes. Current radiomic analyses heavily rely on the accurate delineation of the ROI in patients, and the manual layer-by-layer segmentation of three-dimensional images is both time-consuming and introduces subjectivity. 31,35,36 In our study, various mainstream 3D segmentation neural networks were trained based on CT images of CC patients. After comparison, Attention-3DUNet was selected to assist in segmentation.
While the training set performance of 3DUNet was suboptimal, its testing set segmentation performance, evaluated by the DSC, surpassed that of the other two networks. This discrepancy may stem from the complex architecture of Attention-3DUNet, which incorporates residual and attention mechanisms and may therefore need a larger dataset for optimal performance. Although the mean DSC for Attention-3DUNet in the testing set was slightly lower than that of 3DUNet, its mean HD95 was 32% lower, suggesting that with an expanded training set, Attention-3DUNet could potentially surpass the performance of 3DUNet. Post-training, all three networks achieved a DSC of around 0.8, accurately localizing hematomas in the majority of cases and approaching the segmentation proficiency of radiological experts. However, manual correction revealed challenges in distinguishing extracerebral hemorrhages, such as epidural and subarachnoid hemorrhages, from intracerebral hematomas. Additionally, the networks exhibited limitations in delineating certain hematoma boundaries, highlighting areas for improvement.
This study adopted a comprehensive approach by integrating radiomic features from multiple hematomas to analyze patient prognosis, providing a more holistic consideration of the influence of multiple hematomas. Currently, no standardized method exists for summarizing a patient's features from multiple lesions, 24,37,38 although some literature suggests effective combination methods. 21 –24 In this study, four aggregation methods (maximal lesion, sum of lesions, mean of lesions, and volume-weighted lesions) were employed and combined to identify the most predictive features. To mitigate redundancy and collinearity resulting from these aggregation methods, we employed univariate significance analysis and Fisher screening to pinpoint features with clear predictive roles. Subsequently, LASSO, coupled with correlation coefficient pairing, was applied to suppress feature collinearity. Results indicated the superiority of this fusion method over using only the largest lesion's radiomic features. Evaluation using the variance inflation factor (VIF) suggested that the selected features did not suffer from severe collinearity, and the final model exhibited no signs of overfitting.
In the context of extensive prior research, 32,39 -42 this study constructed predictive models based on a larger and more diverse patient cohort from multiple centers, incorporating an extensive array of clinical features from patients with CC. By carefully selecting the most salient features, robust discriminative models with high calibration for predicting HPC and prognosis were established. The combined prediction models for HPC and prognosis demonstrated excellent performance, showing good discrimination and calibration in both the test set and the external validation set (AUC >0.80, Brier score <0.20). Notably, the specificity and PPV of the HPC model were high, effectively identifying patients at low risk for HPC (with specificity of 0.861 and PPV of 0.865 in the test set, and specificity of 0.830 and PPV of 0.723 in the external validation set). The DCA curves in Supplementary Material S11 highlight a significant advantage of the combined HPC model over the clinical model alone. While the introduction of radiomics features did not markedly enhance the prognosis model, the combined model still holds substantial clinical utility.
For the prediction of HPC, we selected five clinical features (hemoglobin levels, a Rotterdam CT score of 3, MFS, concurrent SDH, and INR) to construct our clinical prediction model. Lower baseline hemoglobin levels emerged as a significant predictor of higher HPC risk, potentially linked to the role of red blood cells in hemostasis and their impact on platelet radial transport efficiency to the vessel wall. 13 The Rotterdam CT score, a measure of brain injury extent on CT scans, demonstrated relevance to post-operative hematoma expansion in TBI patients undergoing decompressive craniectomy. Notably, patients with a Rotterdam CT score of 3 exhibited a higher propensity for HPC in our study. The risk distribution across scores from 1 to 5 was 25% (5/20), 35.7% (25/70), 74.8% (83/111), 43.4% (23/53), and 42.9% (3/7), respectively. These findings slightly deviate from prior research, possibly owing to our more stringent definition of hematoma expansion. 43,44 In the external validation cohort, though the smaller dataset may limit its generalizability, the percentages of HPC observed in patients with Rotterdam scores from 1 to 6 were 11.9% (5/42), 28% (14/50), 56.5% (26/46), 57.1% (12/21), 60% (6/10), and 100% (1/1), respectively.
MFS, 15 indicative of a hematoma containing both clots and fresh blood, suggests ongoing bleeding and a lack of stabilization and is therefore associated with secondary hemorrhage. 45 Concurrent SDH, reflecting head injury severity, was hypothesized to be linked with damage to cortical veins, bridging veins, and venous sinuses, elevating HPC risk. 30,45,46 Our study revealed a higher INR as a significant predictor of HPC, suggesting that patients with coagulopathy are more prone to HPC. 2,47 However, some literature posits a lack of correlation between coagulopathy and HPC in CC patients. 42,46,48 Notably, in our study, only INR emerged as a robust predictor of HPC. Although patients with HPC exhibited longer prothrombin time and activated partial thromboplastin time, and lower platelet counts and fibrinogen levels, these differences did not reach statistical significance (p values of 0.076, 0.507, 0.332, and 0.661, respectively). Further validation in a larger patient sample is warranted.
For prognosis, we identified five pivotal features for constructing a clinical prediction model: age, AIS-Head, GCS-M, GCS-V, and albumin levels. The paramount predictor was patient age, a consistent and well-established factor widely supported by the literature. 6,49 Baseline GCS scores, reflective of the patient's consciousness level, have consistently emerged as prognostic indicators for TBI patients in multiple studies, signifying injury severity. 2,6,47 In our study, both the AIS score and the overall ISS demonstrated prognostic efficacy, with the former proving more effective, aligning with the observations of Tsai and colleagues. 50 The heightened metabolic stress in TBI patients, vascular damage leading to hemorrhage, blood–brain barrier dysfunction, and the release of inflammatory factors contributing to increased vascular permeability may collectively contribute to lower baseline serum albumin levels. Sustained hypoalbuminemia can result in reduced plasma oncotic pressure, inducing cerebral edema and elevated intracranial pressure, thereby exacerbating secondary brain injury and influencing patient prognosis. 51-53
In predicting HPC, our study shows that radiomic features are a valuable complement to the clinical model. We identified seven radiomic features, comprising one shape feature and six texture features. The shape feature, original_shape_Sphericity, gauges the sphericity of the ROI; lower sphericity of the hematoma correlated with an increased risk of expansion. Irregularly shaped hematomas thus exhibit a higher propensity for progression, with the shape feature contributing 42.161% of the importance in the Radscore-HE. The texture features, namely ShortRunHighGrayLevelEmphasis, ShortRunLowGrayLevelEmphasis, ShortRunEmphasis, LongRunLowGrayLevelEmphasis, RunLengthNonUniformityNormalized, and DependenceNonUniformityNormalized, evaluate the joint distribution of run lengths at different gray levels. They provide insights into the texture of the ROI by measuring similarities in run lengths and dependencies, and together contribute 57.839% of the importance in the Radscore-HE. By assessing the ROI's shape and texture, these radiomic features serve as indicators of HPC risk.
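To make the sphericity feature concrete, the sketch below assumes the standard radiomics definition, sphericity = (36πV²)^(1/3) / A, where V is the ROI volume and A its surface area. The section does not state which implementation the study used, so the formula and the example shapes are illustrative assumptions rather than the authors' code.

```python
import math


def sphericity(volume, surface_area):
    """Standard sphericity: 1.0 for a perfect sphere, lower for irregular shapes."""
    return (36 * math.pi * volume ** 2) ** (1 / 3) / surface_area


# Idealized round lesion: sphere of radius 10 mm.
r = 10.0
round_lesion = sphericity(4 / 3 * math.pi * r ** 3, 4 * math.pi * r ** 2)

# Hypothetical elongated lesion, modeled as a 40 x 10 x 5 mm box.
a, b, c = 40.0, 10.0, 5.0
elongated_lesion = sphericity(a * b * c, 2 * (a * b + a * c + b * c))

print(f"sphere: {round_lesion:.3f}, elongated box: {elongated_lesion:.3f}")
```

Under this definition the elongated shape scores well below the sphere, which matches the direction of the reported association: less spherical hematomas carried a higher risk of HPC.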
In our study, the predictive effect of radiomic features on patient prognosis was relatively weaker. We identified seven radiomic features for forecasting the risk of poor prognosis, encompassing two first-order intensity features, one shape feature, and four texture features. The shape feature, WEIGHT_original_shape_MajorAxisLength, which represents the major axis length across multiple ROIs weighted by each hematoma's volume, played a significant role in the Radscore-PS, contributing 18.55% of its importance. Our findings suggest that a larger weighted axis length is associated with a poorer prognosis, possibly reflecting the mass effect of the hematoma. First-order features, including lbp-3D-k_firstorder_Minimum, MEAN_log-sigma-4-0-mm-3D_firstorder_90Percentile, and MEAN_wavelet-LLH_firstorder_Mean, describing the minimum gray level, 90th percentile, and mean of gray levels across ROIs, respectively, contributed 54.982% of the importance in the Radscore-PS. This indicates that lower HU values in hematomas are associated with a higher risk of poor prognosis.
Additionally, features such as SUM_log-sigma-3-0-mm-3D_gldm_LowGrayLevelEmphasis, MEAN_log-sigma-3-0-mm-3D_gldm_LowGrayLevelEmphasis, and SUM_log-sigma-4-0-mm-3D_glszm_GrayLevelNonUniformity signify that lower and more non-uniform gray levels within ROIs correlate with a worse prognosis. Despite some efficacy in prognostic prediction, the pure radiomics model could not surpass the performance of clinical features alone. This suggests that the predictive power of radiomic information, derived from baseline CT scans, is relatively limited in capturing the multifactorial nature of patient prognosis throughout the treatment process.
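The SUM_, MEAN_, and WEIGHT_ prefixes above indicate that each per-hematoma feature was fused into a single patient-level value. The exact fusion formulas are not given in this section, so the sketch below is an assumption: SUM and MEAN are the plain sum and average across lesions, and WEIGHT is a volume-weighted mean. The function name and the example values are hypothetical.

```python
def fuse_lesion_feature(values, volumes):
    """Collapse one radiomic feature measured on several hematomas into patient-level values.

    values  -- the feature value for each hematoma
    volumes -- the volume of each hematoma, used as weights (assumed weighting scheme)
    """
    if not values or len(values) != len(volumes):
        raise ValueError("values and volumes must be non-empty and of equal length")
    total_volume = sum(volumes)
    return {
        "SUM": sum(values),
        "MEAN": sum(values) / len(values),
        "WEIGHT": sum(v * w for v, w in zip(values, volumes)) / total_volume,
    }


# Hypothetical patient with two hematomas: major axis lengths (mm) and volumes (mL).
fused = fuse_lesion_feature(values=[30.0, 10.0], volumes=[9.0, 1.0])
print(fused)
```

With volume weighting, the large hematoma dominates the fused value, which is consistent with the interpretation that the weighted major axis length reflects mass effect.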
Our study exhibits certain limitations. Firstly, despite conducting multi-center validation, the generalizability of our model warrants further confirmation in a larger patient cohort. Secondly, the three mainstream 3D automatic segmentation neural networks we tested rely on extensive manual segmentation. Future research could explore semi-supervised models, such as generative adversarial networks, to mitigate manual segmentation workload and enhance results. Thirdly, the need persists for more rational approaches to integrate features from multiple lesions. A promising avenue involves employing deep learning strategies based on multi-target attention mechanisms, capable of assigning appropriate weights to each lesion in a patient. 38 This approach, however, necessitates validation on a larger sample size to ensure robustness. Lastly, patient prognosis is influenced by numerous factors during the treatment process. Future endeavors should involve incorporating post-baseline patient features into prediction models. Ongoing predictive models could be developed by following up with patients throughout their treatment processes, providing a more comprehensive understanding of prognostic factors.
Conclusion
In this study, we successfully constructed a predictive model by amalgamating radiomic and clinical features, encompassing medical history, physical examination findings, and laboratory and imaging results. The model exhibits commendable efficacy in forecasting the risk of HPC and poor prognosis in patients with CC. Notably, the incorporation of radiomic features markedly enhances the predictive capacity of clinical attributes for HPC. However, in the realm of prognostication for poor outcomes, radiomic features exhibit limited incremental improvement in predictive performance over clinical features alone.
Transparency, Rigor, and Reproducibility Summary
This study is a retrospective analysis registered with the Chinese Clinical Trial Registry (www.chictr.org.cn) subsequent to its commencement.
While the analytical plan was not formally preregistered, team members primarily responsible for the analysis, Haoyue He and Jinxin Liu, have attested to the plan being predetermined.
As outlined in Supplementary Material S14, sample size calculations adhered to standards set forth in the literature concerning machine learning and radiomics. The predictive models for hematoma expansion and poor prognosis had events per variable (EPV) values of 23.167 and 17.000, respectively, in line with previously documented requirements for predictive models. Additionally, contemporary metrics such as Root Mean Squared Percentage Error (rMSPE) and Mean Absolute Percentage Error (MAPE) were employed to assess whether the sample size met the demands for model accuracy. The sample size was deemed sufficient to ensure the models' robust predictive performance.
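Reading EPV as events per variable (outcome events divided by the number of predictors), the reported value for the hematoma expansion model can be reproduced from figures given elsewhere in the article. The sketch assumes six predictors (the five clinical features plus the Radscore) and takes the 139 HPC events from the Rotterdam score breakdown in the Discussion. The event count for the prognosis model is not stated in this section, so it is only back-calculated here.

```python
def events_per_variable(n_events, n_predictors):
    """Events per variable: number of outcome events divided by number of model predictors."""
    return n_events / n_predictors


# HPC events summed over Rotterdam scores 1 to 5, as reported in the Discussion.
hpc_events = 5 + 25 + 83 + 23 + 3
n_predictors = 6  # assumption: five clinical features plus the Radscore

epv_hpc = events_per_variable(hpc_events, n_predictors)
print(f"EPV (HPC model) = {epv_hpc:.3f}")

# Back-calculation only: an EPV of 17.000 with six predictors implies this many events.
implied_prognosis_events = 17.000 * n_predictors
print(f"Implied poor-prognosis events = {implied_prognosis_events:.0f}")
```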
From a cohort of 2406 patients, 452 were selected to develop an automated segmentation model, and data from 261 patients were used to construct the clinical prediction model. External validation was conducted with 170 individuals out of 581 from three separate hospitals. The detailed patient inclusion flowchart is depicted in Figure 1.
Image acquisition and analysis were conducted by team members who maintained confidentiality regarding the participants' characteristics; similarly, clinical outcomes were assessed by team members blinded to the patients' disease progression and outcomes.
As shown in the Supplementary Material (Supplementary Material S1), imaging data were collected from September 2015 to October 2023 across nine different CT scanner models from four medical centers, representing four distinct manufacturers. The imaging data underwent preprocessing (resampling and normalization, details in section “Radiomic Feature Extraction”) before radiomic analysis. The imaging acquisition parameters are detailed in the Supplementary Material (Supplementary Material S1).
All equipment and software utilized for imaging and preprocessing are commercially available. Inclusion criteria and outcome assessments were based on established standards, with references provided within the text.
Normality of features was assessed using the Shapiro-Wilk test. Non-normally distributed features were compared with the Mann-Whitney U test. For normally distributed features, homogeneity of variances was evaluated using Levene's test; Student's t-test was applied when variances were equal and Welch's t-test when they were unequal. Chi-squared tests were used for categorical features.
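The test-selection procedure for continuous features described above amounts to a short decision rule. The sketch below encodes only that rule; it takes the Shapiro-Wilk and Levene p-values as inputs (for example, as returned by `scipy.stats.shapiro` and `scipy.stats.levene`) instead of computing them. The 0.05 threshold, the per-group normality check, and the function name are assumptions, since this section does not specify them.

```python
def choose_two_group_test(p_shapiro_a, p_shapiro_b, p_levene, alpha=0.05):
    """Pick the two-group comparison test for one continuous feature.

    p_shapiro_a, p_shapiro_b -- Shapiro-Wilk p-values for each group
    p_levene                 -- Levene's test p-value for equality of variances
    alpha                    -- significance threshold (assumed 0.05)
    """
    both_normal = p_shapiro_a > alpha and p_shapiro_b > alpha
    if not both_normal:
        return "Mann-Whitney U test"
    if p_levene > alpha:
        return "Student's t-test"
    return "Welch's t-test"


# Hypothetical p-values for three features, for illustration only.
print(choose_two_group_test(0.01, 0.50, 0.90))  # normality rejected in one group
print(choose_two_group_test(0.30, 0.40, 0.60))  # normal, equal variances
print(choose_two_group_test(0.30, 0.40, 0.01))  # normal, unequal variances
```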
The models developed in this study were tested in three external centers, demonstrating good predictive capabilities. De-identified data from this study are not available in public archives.
To protect intellectual property, the analytical code used in this study is not publicly available; however, a detailed description of the algorithms is provided in the manuscript.
The authors agree to provide the full content of the manuscript on request by contacting Yongbing Deng.
Footnotes
Acknowledgments
We sincerely thank Prof. Songtao Guo's team at the College of Computer Science, Chongqing University for providing guidance on the programming of this research.
Authors' Contributions
Haoyue He: Conceptualization, data curation (equal), formal analysis (equal), investigation, methodology (equal), software (equal), visualization (equal), writing—original draft (equal); Jinxin Liu: Conceptualization, data curation (equal), formal analysis (equal), investigation, methodology (equal), software (equal), visualization (equal), writing—original draft (equal); Chuanming Li: Investigation; Yi Guo: Investigation; Kaixin Liang: Validation; Jun Du: Validation; Jun Xue: Validation; Yidan Liang: Investigation; Peng Chen: Investigation; Liu Liu: Investigation; Min Cui: Investigation; Jia Wang: Investigation; Ye Liu: Investigation; Shanshan Tian: Project administration, supervision, validation (lead), writing—review and editing; Yongbing Deng: Conceptualization (lead), funding acquisition, investigation (lead), project administration (lead), resources, supervision (lead), writing—review and editing (lead).
Hao-Yue He and Jin-Xin Liu contributed equally to this work.
Funding Information
This work was supported by the Fundamental Research Funds for the Central Universities (2023CDJYGRH-ZD06), the Natural Science Foundation Project of Chongqing Science and Technology Commission (cstc2020jcyj-msxmX0769), and the Project of Science and Technology Bureau of Yuzhong District, Chongqing (20200135).
Author Disclosure Statement
No competing financial interests exist.
Supplementary Material
References
