Abstract
Background
Patients with colorectal liver metastases (CRLM) who undergo thermal ablation are at risk of developing new CRLM after ablation. Identification of these patients might enable individualized treatment.
Purpose
To investigate whether an existing machine-learning model with radiomics features based on pre-ablation computed tomography (CT) images of patients with colorectal cancer can predict development of new CRLM.
Material and Methods
In total, 94 patients with CRLM who were treated with thermal ablation were analyzed. Radiomics features were extracted from the healthy liver parenchyma of CT images in the portal venous phase, before thermal ablation. First, a previously developed radiomics model (Original model) was applied to the entire cohort to predict new CRLM after 6 and 24 months of follow-up. Next, new machine-learning models were developed (Radiomics, Clinical, and Combined), based on radiomics features, clinical features, or a combination of both.
Results
The external validation of the Original model reached an area under the curve (AUC) of 0.57 (95% confidence interval [CI]=0.56–0.58) and 0.52 (95% CI=0.51–0.53) for 6 and 24 months of follow-up. The new predictive radiomics models yielded a higher performance at 6 months compared to 24 months. For the prediction of CRLM at 6 months, the Combined model had slightly better performance (AUC=0.60; 95% CI=0.59–0.61) compared to the Radiomics and Clinical models (AUC=0.55–0.57), while all three models had a low performance for the prediction at 24 months (AUC=0.52–0.53).
Conclusion
Neither the Original model nor the newly developed radiomics models were able to predict new CRLM based on healthy liver parenchyma in patients who will undergo ablation for CRLM.
Introduction
Approximately 50% of patients with colorectal cancer (CRC) develop liver metastases (1). Thermal ablation can be a valuable alternative to surgery, with comparable long-term control and overall survival (OS), when at most five lesions of up to 3 cm are ablated (2–6). After ablation, 24%–48% of patients develop new colorectal liver metastases (CRLM) (7–9). Knowledge about the risk of new metastases provides an opportunity to improve patient selection before ablation. Clinically, it is most relevant to identify patients at risk of new CRLM up to six months after ablation, because ablative treatment could then be avoided or alternative therapies, such as adjuvant chemotherapy or systemic treatment alone, could be considered to avoid futile treatment and improve patient outcome. However, no tool is available for the prediction of new CRLM before ablation.
In radiomics studies, a large number of quantitative imaging features are extracted from regions of interest and mined with a machine-learning (ML) algorithm. This algorithm selects representative features and classifies them to predict outcome (10,11). Radiomics has been investigated increasingly in recent years and has shown potential in the prediction of new CRLM, based on contrast-enhanced computed tomography (CECT) scans before any treatment for CRC (12,13). The rationale of these studies is that occult CRLM are already present, but not (yet) visible on radiological images. Radiomics could detect microstructural changes due to visually occult metastases. CECT is most widely used for the diagnosis of CRLM, planning of thermal ablation, and follow-up (14), and is thus readily available for analysis. Since radiomics studies have been successful in the prediction of metachronous CRLM in patients with CRC at diagnosis, we hypothesize that a radiomics model may also have potential to identify patients at risk of developing new CRLM after ablation.
The aim of the present study was to predict, before treatment, the development of new CRLM during follow-up after local ablative therapy by investigating the apparently non-diseased liver parenchyma on pre-ablation CT images, using a previously developed ML model (13).
Material and Methods
Patients
Patients with CRLM, who were treated with thermal ablation (radiofrequency ablation [RFA] or microwave ablation [MWA]) at our institute between 2008 and 2018, were retrospectively analyzed. The inclusion criteria were as follows: (i) histopathologically confirmed colorectal adenocarcinoma; (ii) presence of suspected CRLM on CECT; (iii) thermal ablation of one or more CRLM (≤5 lesions per patient); (iv) CRLM ≤3 cm; and (v) presence of a CECT scan in the portal venous phase (PVP) ≤3 months before ablation. Exclusion criteria were as follows: (i) incomplete ablation, defined as visible residual tumor at the first CECT scan after ablation; (ii) history of diffuse liver diseases such as steatosis or cirrhosis; (iii) uncertainty about colorectal origin of the metastasis; (iv) history of liver treatment (radiotherapy, transarterial chemoembolization [TACE], or selective internal radiation therapy [SIRT]); and (v) delineation problems (including artefacts or insufficient quality of the CECT scan). Patients were not excluded if ablation was combined with hepatic resection or if they had a history of systemic chemotherapy or previous hepatic resection. Follow-up consisted of liver imaging 4–8 weeks after ablation, with subsequent follow-ups every 3–4 months during the first year. In the second year, patients returned to the regular CRC follow-up, with liver imaging every six months (15). New CRLM were defined as the development of any new lesion during follow-up (CT, magnetic resonance imaging [MRI], and/or positron emission tomography) after ablation.
This research study was conducted retrospectively from data obtained for clinical purposes. We consulted extensively with the institutional review board (IRB) of the Netherlands Cancer Institute Antoni van Leeuwenhoek (IRB AVL), which determined that formal consent is not required for this type of study. An official waiver was granted by the IRB AVL (18.313/IRBd18066): the application reviewed by the IRB does not meet the WMO criteria and can be considered a non-WMO statement.
Ablation
RFA was performed either with a single or cluster antenna Kit E series (tip 2–4 cm; 17 Gauge) with the Cool-tip™ RF Ablation System E Series (Medtronic, Dublin, Ireland) for a duration of 12 min. For MWA, an Emprint microwave antenna with the Emprint™ Ablation System with Thermosphere™ Technology (Medtronic, Dublin, Ireland) was used at 100 W, in the range of 3–7 min, depending on the size of the tumor.
Image acquisition and segmentation
CECT scans in the PVP were acquired ≤3 months before ablation. Detailed information about the scanners is provided in Supplemental Material S1. The volume of interest (VOI) was defined as the liver parenchyma, excluding all visible metastases, benign lesions, and main vessels or bile ducts. To obtain this VOI, CRLM were manually segmented on each consecutive slice by a dedicated reader (FS) with 3D Slicer (version 4.8.1) (16). The whole liver, portal vein, and hepatic arteries were automatically segmented using Philips Intellispace Portal software (version 10.1). Next, the delineated visible benign/malignant lesions, portal vein, hepatic arteries, and the border of the liver were excluded automatically, leaving only the “clean” liver parenchyma. All segmentations were checked and adjusted, if deemed necessary, by one of four board-certified radiologists with expertise in liver imaging (EKH, EK, FGM, FI).
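The exclusion step described above can be illustrated as simple mask arithmetic. The following is a minimal sketch, not the software used in this study: the function name `clean_parenchyma_mask`, the use of binary erosion to remove the liver border, and the `border_voxels` parameter are assumptions made for illustration only.

```python
import numpy as np
from scipy import ndimage


def clean_parenchyma_mask(liver, lesions, vessels, border_voxels=2):
    """Boolean mask of liver parenchyma without lesions, vessels, or liver rim.

    All inputs are 3D arrays of the same shape. Removing the border by
    erosion is an assumption of this sketch; the study does not state how
    the border was excluded.
    """
    liver = np.asarray(liver, dtype=bool)
    # Erode the liver mask to drop a thin rim at the organ border.
    eroded = ndimage.binary_erosion(liver, iterations=border_voxels)
    # Remove every voxel that belongs to a lesion or a main vessel.
    return eroded & ~np.asarray(lesions, dtype=bool) & ~np.asarray(vessels, dtype=bool)


# Toy example: a 6 x 6 x 6 "liver" with one lesion voxel and one vessel voxel.
liver = np.zeros((10, 10, 10), dtype=bool)
liver[2:8, 2:8, 2:8] = True
lesions = np.zeros_like(liver)
lesions[4, 4, 4] = True
vessels = np.zeros_like(liver)
vessels[5, 5, 5] = True
voi = clean_parenchyma_mask(liver, lesions, vessels, border_voxels=1)
```

In this toy example the resulting VOI contains only interior liver voxels that are neither lesion nor vessel; in practice the masks would come from the manual and automatic segmentations.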
Radiomics workflow
The feature extraction pipeline is explained in detail in Supplemental Material S2 and was in line with the pipeline of a previously published model by the authors (13). In that study, an ML model was constructed that could accurately predict new CRLM at diagnosis (area under the curve [AUC] = 0.86). Three models were constructed in that study: a radiomics model (based on radiomics features only); a clinical model (based on clinical variables only); and a combined model (radiomics + clinical features). Because the combined model was not superior to the radiomics model and no clinical features were selected by the combined model, the radiomics model was used for the present study.
Radiomics feature extraction was performed in 3D, using the Pyradiomics package (version 2.2) (17) in Python (version 3.7). In total, 1767 radiomics features were extracted for each patient. For the validation of the previously published model (13) on our dataset, further referred to as the “Original model,” 101 radiomics features were used that were stable across different hospitals and not highly intercorrelated in the previous multicenter study (13). To account for the differences between the Original and the current (ablation) patient cohort, sub-analyses were performed.
In addition, three alternative ML models were newly designed: a Radiomics model (radiomics features only); a Clinical model (clinical variables only); and a Combined model (combination of radiomics and clinical features). For the development of these new ML models, the dataset was randomly split into training and validation sets, taking into account the percentage of patients from each scanner. For the Clinical and Combined models, the following clinical variables were considered: age; sex; cT stage; cN stage; primary tumor location; ablation modality (RFA/MWA); metachronous (disease-free interval [DFI] >12 months)/synchronous (DFI <12 months) CRLM; carcinoembryonic antigen (CEA) level; number of CRLM; thermal ablation combined with hepatic resection; administration of neoadjuvant chemotherapy (≤3 months before ablation); and adjuvant chemotherapy (administered ≤2 years after ablation, before occurrence of new CRLM). Categorical variables were converted to numerical variables (vectors of zeros and ones). The cT/cN stage and/or CEA level were not available and not retrievable for 3 and 18 patients, respectively. Multivariate imputation by chained equations was used to impute these missing values (18).
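The conversion of categorical variables and the scanner-balanced split can be sketched as follows. This is an illustrative example only: the toy table, its column names, and the 50/50 split ratio are hypothetical and do not reflect the study data or the split ratio actually used, and the imputation step is not shown.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy table; column names and values are illustrative only.
clinical = pd.DataFrame({
    "age": [61, 70, 55, 66, 72, 58, 63, 69],
    "modality": ["RFA", "MWA", "RFA", "MWA", "RFA", "MWA", "RFA", "MWA"],
    "scanner": ["A", "A", "A", "A", "B", "B", "B", "B"],
})

# Convert the categorical variable to vectors of zeros and ones.
encoded = pd.get_dummies(
    clinical[["age", "modality"]], columns=["modality"], dtype=int
)

# Random split that keeps the proportion of each scanner equal in both sets.
train, valid = train_test_split(
    encoded, test_size=0.5, stratify=clinical["scanner"], random_state=0
)
```

Stratifying on the scanner label is one straightforward way to take the percentage of patients from each scanner into account; other balancing schemes are possible.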
Two steps of unsupervised feature reduction were applied before training of the Radiomics and Combined models. First, a Kruskal–Wallis test was applied to remove unstable radiomics features across scanners. Next, highly correlated features were removed by pairwise correlation of 0.90 or higher; for each pair of features F1 and F2, the one with the largest mean absolute correlation was removed.
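The two reduction steps can be sketched as below. This is a minimal illustration under stated assumptions, not the study code: the function names, the significance level `alpha=0.05` for the Kruskal–Wallis test, and the iterative handling of correlated pairs are assumptions, since these details are not specified in the text.

```python
import numpy as np
from scipy.stats import kruskal


def stable_across_scanners(X, scanner, alpha=0.05):
    """Column indices of features that do not differ between scanners.

    X is an (n_patients, n_features) array; scanner holds one label per
    patient. The threshold alpha is an assumption of this sketch.
    """
    groups = [X[scanner == s] for s in np.unique(scanner)]
    keep = []
    for j in range(X.shape[1]):
        _, p = kruskal(*[g[:, j] for g in groups])
        if p >= alpha:
            keep.append(j)
    return keep


def drop_correlated(X, threshold=0.90):
    """Column indices kept after removing highly correlated features."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    np.fill_diagonal(corr, 0.0)
    keep = list(range(X.shape[1]))
    while True:
        sub = corr[np.ix_(keep, keep)]
        i, j = np.unravel_index(np.argmax(sub), sub.shape)
        if sub[i, j] < threshold:
            return keep
        # Of the pair, drop the feature with the larger mean absolute
        # correlation with the remaining features.
        drop = i if sub[i].mean() >= sub[j].mean() else j
        keep.pop(drop)
```

A typical use would first call `stable_across_scanners` and then pass only the stable columns to `drop_correlated`, so that the correlation filter operates on scanner-independent features.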
ML models were trained by a three-step pipeline. First, features were standardized to zero mean and unit variance. Next, wrapper feature selection was used to select the subset of features most relevant to the classification model. Third, a random forest (RF) was trained as the classification model. Bayesian hyperparameter optimization was used to tune the hyperparameters in the training set (19). The radiomics workflow is presented in Fig. 1. The hyperparameters that were used in the ML pipeline are explained in Supplemental Material S3. Models were built in Python (version 3.7) using the scikit-learn library (version 0.19.2).
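A three-step pipeline of this kind can be sketched with scikit-learn as follows. This is an illustration under assumptions rather than the study implementation: recursive feature elimination (`RFE`) is used here as one example of a wrapper method, the function name `build_pipeline` and all parameter values are hypothetical, and the Bayesian hyperparameter optimization is omitted.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def build_pipeline(n_features=10, random_state=0):
    """Standardization, wrapper feature selection, and random forest."""
    # The wrapper repeatedly refits this estimator and discards the least
    # important feature until n_features remain (RFE is an assumed choice).
    selector = RFE(
        RandomForestClassifier(n_estimators=50, random_state=random_state),
        n_features_to_select=n_features,
    )
    return Pipeline([
        ("scale", StandardScaler()),    # zero mean, unit variance
        ("select", selector),           # wrapper feature selection
        ("forest", RandomForestClassifier(
            n_estimators=100, random_state=random_state)),
    ])
```

Calling `build_pipeline().fit(X_train, y_train)` fits all three steps on the training data only, so that the scaling parameters and the selected features are not informed by the validation set.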

Fig. 1. Overview of the radiomics machine-learning workflow. (a) Semi-automatic segmentation of healthy liver parenchyma on contrast-enhanced computed tomography. (b) Feature extraction with and without filtering (Laplacian of Gaussian [LoG] and non-linearity). (c) Application of the Original model. (d) Feature selection and development of new machine-learning models with random forest.
All three predictive models were trained on the training set via fivefold cross-validation, with the hyperparameters tuned over 500 iterations. The combination of hyperparameters that led to the highest predictive performance (i.e. AUC) was selected as the final model. This model was validated on the validation set. The 95% confidence intervals (CI) were estimated via bootstrapping with 1000 resamples.
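Bootstrapped confidence intervals for the AUC can be obtained as in the sketch below. It assumes a percentile bootstrap over patients; the function name `bootstrap_auc_ci`, the fixed seed, and the percentile method are assumptions, as the exact bootstrap procedure is not described in the text.

```python
import numpy as np
from sklearn.metrics import roc_auc_score


def bootstrap_auc_ci(y_true, y_score, n_boot=1000, alpha=0.05, seed=0):
    """Return (AUC, lower, upper) using a percentile bootstrap."""
    rng = np.random.default_rng(seed)
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score)
    aucs = []
    while len(aucs) < n_boot:
        # Resample patients with replacement.
        idx = rng.integers(0, len(y_true), len(y_true))
        if np.unique(y_true[idx]).size < 2:
            continue  # the AUC is undefined if only one class was drawn
        aucs.append(roc_auc_score(y_true[idx], y_score[idx]))
    lower, upper = np.percentile(aucs, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return roc_auc_score(y_true, y_score), lower, upper
```

Here `y_true` would hold the observed outcome (new CRLM yes/no) and `y_score` the predicted probability for each patient in the validation set.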
Results
Several methods and hypotheses were explored to find the best predictive model, as visualized in Fig. 2.

Fig. 2. The hypotheses that were investigated to find the best model to predict the development of CRLM in the ablation cohort. AUC, area under the curve; CRLM, colorectal liver metastases; H, hypothesis; ML, machine-learning model.
Application of an existing machine learning model to predict new metastases
As a first step, the Original model was applied to predict new CRLM six months after ablation, for which the patient cohort was divided into 66 patients without new CRLM in the liver during the six months of follow-up and 28 patients who developed new CRLM within six months (Table 1) (13). Application of the Original model yielded a low predictive performance for the development of new CRLM: AUC = 0.57 (95% CI = 0.56–0.58).
Table 1. Patient characteristics for each dataset for the prediction of new CRLM at six months.
Values are given as n (%), mean ± SD, or median (IQR). For categorical variables actual numbers are reported (A/B/A). Comparison between the training and validation sets was computed using the chi-square test (categorical variables) or Wilcoxon rank sum test (continuous variables).
CEA, carcinoembryonic antigen; CRLM, colorectal liver metastases; MWA, microwave ablation; NA, not available; RFA, radiofrequency ablation.
Several hypotheses were explored to explain the poor performance of the model. First, the analyses were repeated for prediction of CRLM after 24 months instead of 6 months (as the Original model was initially developed for the prediction of CRLM at 24 months (13)), for which the patient cohort was divided into 28 patients without new CRLM during 24 months of follow-up and 54 patients who developed new CRLM within 24 months (Table 2). The performance at 24 months was also low: AUC = 0.52 (95% CI = 0.51–0.53).
Table 2. Patient characteristics for each dataset for the prediction of new CRLM at 24 months.
Values are given as n (%), mean ± SD, or median (IQR). For categorical variables actual numbers are reported (A/B/A). Comparison between the training and validation sets was computed using the chi-square test (categorical variables) or Wilcoxon rank sum test (continuous variables).
CEA, carcinoembryonic antigen; CRLM, colorectal liver metastases; MWA, microwave ablation; NA, not available; RFA, radiofrequency ablation.
Next, it was evaluated whether the poor performance of the Original model was related to the administration of neoadjuvant chemotherapy, because no patients in the Original model cohort received neoadjuvant treatment. Therefore, the Original model was applied to subsets of patients (59 for the outcome CRLM after 6 months, 45 for the outcome CRLM after 24 months) who did not receive any neoadjuvant chemotherapy before ablation, again yielding a low performance to predict new CRLM at both 6 months and 24 months (AUC = 0.56; 95% CI = 0.54–0.57 and AUC = 0.53; 95% CI = 0.52–0.54, respectively).
The three features that were selected as most predictive in the Original model (13) were compared between patients who developed CRLM within 6 months and patients who remained disease-free for 24 months. No significant differences were found between these patient groups, as tested with the Mann–Whitney U-test (Supplementary Table S5).
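A group comparison of this kind can be sketched with SciPy as follows. The function name `compare_feature`, the two-sided alternative, and the significance level of 0.05 are assumptions made for illustration; the feature values shown are not study data.

```python
import numpy as np
from scipy.stats import mannwhitneyu


def compare_feature(values_a, values_b, alpha=0.05):
    """Two-sided Mann-Whitney U-test for one feature between two groups.

    Returns the P value and whether it falls below alpha.
    """
    _, p = mannwhitneyu(values_a, values_b, alternative="two-sided")
    return p, p < alpha


# Hypothetical feature values for two patient groups.
early_crlm = np.arange(10, dtype=float)
disease_free = np.arange(10, dtype=float) + 0.5
p_value, significant = compare_feature(early_crlm, disease_free)
```

The test would be repeated for each of the three selected features, comparing patients with new CRLM within 6 months against patients who remained disease-free for 24 months.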
Development of new machine-learning models
The new ML models were developed using the same pipeline that was used to develop the Original model. There were no significant differences in clinical variables between the patients in the training and validation sets (Tables 1 and 2).
Of 1676 radiomics features, 753 (45%) showed no confounding related to the scanner variations, of which only 187 (25%) radiomics features had correlation coefficients <0.9. These 187 features were considered stable and available for selection by the radiomics model. Further details, regarding the listed hyperparameters and tuned hyperparameters for each model, are provided in Supplementary Tables S1–S5. Table 3 describes the model performance for the Radiomics, Clinical, and Combined models, respectively. For the prediction of CRLM within six months, the Combined model had slightly better performance (AUC = 0.60; 95% CI = 0.59–0.61) compared to the Radiomics and Clinical models (AUC = 0.57; 95% CI = 0.56–0.58 and AUC = 0.55; 95% CI = 0.53–0.54, respectively). For the prediction of new CRLM at 24 months, all three models had a low performance in the validation set (AUC = 0.52–0.53) (Table 3).
Table 3. Performance of each predictive model.
Values are given as AUC (95% CI).
AUC, area under the curve; CI, confidence interval.
Discussion
The present study evaluated the predictive value of a radiomics ML model applied to baseline CT images acquired before ablation to predict the development of new CRLM during short-term (≤6 months) and long-term (≤24 months) follow-ups after thermal ablation. Both a previously published model (13) and newly developed ML models based on radiomics features derived from CT of the healthy liver parenchyma on pre-ablation scans were unable to predict which patients would develop new CRLM.
The previously published model yielded a high performance for the prediction of new CRLM within 24 months of follow-up in patients with non-metastasized CRC at diagnosis (13). It seems that this promising ML model is not able to predict new CRLM in patients who already have CRLM and are scheduled for thermal ablation. This finding is further supported by the poor performance yielded in the subgroup analyses, which accounted for differences in the patient population between the Original model and current cohort (i.e. previous treatment, short-term vs. long-term development of CRLM, presence of CRLM in the liver). It was hypothesized that the presence of CRLM may change the whole liver parenchyma and that, as such, the Original model that was developed in “clean” liver parenchyma might not perform well in the ablation cohort with CRLM. Since the existing model yielded a poor performance, new ML models were developed to specifically predict new CRLM in patients with CRLM who will undergo thermal ablation. However, even these ML models had a poor performance for the prediction of CRLM during both short-term (≤6 months) and long-term (≤24 months) follow-ups.
Since no accurate ML model could be developed, it was hypothesized that the presence of CRLM affects the whole liver texture (or micro-environment) permanently. In other words, the healthy liver parenchyma is permanently changed as soon as patients have CRLM in situ, regardless of the development of new metastases in the liver. This hypothesis is supported by the absence of differences in the radiomics features that were selected by the Original model between patients with new CRLM within 6 months and patients without new CRLM for more than 24 months, the two patient groups that are expected to have the largest difference in liver parenchyma. This finding suggests that there are no actual differences in the liver parenchyma on CT that are detectable by radiomics. The hypothesis is also supported by two studies that reported significant differences in the liver texture between patients with and without visible liver metastases (20,21). Even though this is circumstantial evidence, it could explain why the Original model and the new ML models were unable to distinguish patients who would develop new metastases from patients who would remain disease-free.
It is striking that our Clinical model yielded such a poor performance for the prediction of new CRLM, while a clinical risk score (CRS) was previously constructed to select patients who may benefit from thermal ablation (7). This CRS was a significant predictor of local tumor progression-free survival (LTPFS) and OS, but the development of new CRLM specifically was not studied. Some of the factors included in the ablation CRS were also considered in our Clinical model (cN status, DFI, CEA level), yet none was selected as a relevant predictor. Furthermore, the ablation CRS includes large CRLM as a predictive variable (i.e. size of largest tumor >3 cm), while the current CIRSE guidelines do not recommend ablation for CRLM >3 cm and these were, therefore, excluded from our population. Thus, while the other clinical variables may be predictive factors for LTPFS and OS based on the study by Shady et al. (7), our study suggests that these variables are less valuable for assessing the risk of new CRLM after ablation.
The low performance of the radiomics ML models could also be explained by other factors that have not been explored in this study. It is possible that CT imaging is not the most suitable modality to provide accurate information about the liver parenchyma. It is known that MRI has a higher sensitivity for the detection of CRLM compared to CT (22), which could also mean that MRI might perform better for the assessment of liver parenchyma with radiomics. Therefore, to further explore our hypothesis on changed liver parenchyma due to the presence of CRLM, an MRI-based radiomics model should be studied.
The aforementioned considerations suggest that radiomics models may not only be tumor-specific, but also highly specific for the patient population under investigation (including their disease and treatment history) and the outcomes studied. This aspect might be overlooked because negative results are rarely published in the field of radiomics. The majority of published studies (98%) contain a positive result and only a few negative results are published, as reported by Song et al. (23), which highlights publication bias (24). Negative studies can provide important insight into which fields may be less promising and encourage researchers to focus research on different paths.
The present study has some limitations. It is a retrospective single-center study with a relatively small sample size and multiple CT scanners. However, the effect of multiple CT scanners was minimized by applying a preprocessing step before feature extraction and by using radiomics features that were stable across scanners (25,26). Moreover, it was not possible to add details on ablation techniques (i.e. inter-procedural variability), due to the retrospective nature of the study and the long inclusion period (i.e. 10 years). Ablation techniques and follow-up have improved over time, and this might have some influence on the results, even though the influence on the development of new CRLM outside the ablation zone is expected to be low.
In conclusion, this study shows that both an existing radiomics ML model, developed for the prediction of metachronous CRLM in patients at primary staging for CRC (13), and newly developed ML models, based on healthy liver parenchyma on pre-treatment CECT, were unable to predict new CRLM during follow-up in patients who underwent thermal ablation. Furthermore, it underlines that radiomics ML models may be very population-specific and that validation is key to assess their usefulness in clinical practice. Future studies may focus on further assessment of changes in the liver parenchyma in patients with metastases and on other imaging modalities (e.g. MRI) or other artificial intelligence techniques (including deep learning), to clarify whether radiomics can be helpful in predicting new CRLM in patients with metastatic CRC.
Supplemental Material
Supplemental material, sj-pdf-1-acr-10.1177_02841851211060437 for CT radiomics models are unable to predict new liver metastasis after successful thermal ablation of colorectal liver metastases by Marjaneh Taghavi, Femke CR Staal, Rita Simões, Eun K Hong, Doenja MJ Lambregts, Uulke A van der Heide, Regina GH Beets-Tan and Monique Maas in Acta Radiologica
Footnotes
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
Supplemental material
Supplemental material for this article is available online.
