Deep survival learning for prognosis prediction in non-metastatic castration-resistant prostate cancer

Abstract

Background

Non-metastatic, castration-resistant prostate cancer (nmCRPC) is an advanced state of prostate cancer with variable prognosis; early identification of patient risk is crucial, so that clinicians can recommend optimal treatment.

Objective

Compare predictive models in identifying patient risk; evaluate the value of electronic healthcare record (EHR) time-series (TS) information in prediction.

Methods

We evaluated SurvTRACE, Weibull Time to Event Recurrent Neural Network (WTTE-RNN), and traditional Cox proportional hazards (CPH) models’ performance on EHR data from 12,819 nmCRPC patients in the Veterans Health Administration, using area under the receiver operating characteristic curve and Brier score.

Results

WTTE-RNN, which intrinsically uses EHR TS information, outperformed the other models without TS information. Feature-engineered TS information improved performances of CPH and especially SurvTRACE; with TS information, SurvTRACE outperformed WTTE-RNN.

Conclusion

Deep learning methods, whether intrinsically able to handle TS data or enhanced with TS information, can outperform traditional survival analysis in predicting risk.

Introduction

Prostate cancer (PC) is the second most common cancer in men, and accounts for a large proportion of all cancer-related deaths worldwide.¹,² PC patients who stop responding to androgen deprivation therapy (ADT), a state referred to as castration resistance, but have no evidence of metastatic disease on radiographic imaging are classified as having non-metastatic, castration-resistant PC (nmCRPC).³ These patients are typically of advanced age, have chronic comorbidities, and generally face high risk of developing metastasis and morbidity.⁴ The prognosis for nmCRPC patients can vary, and early intervention and effective treatment are crucial, especially for individuals with a higher risk of metastatic disease and mortality. Early identification of high-risk patients could help clinicians adjust treatment plans and thus prolong patients’ progression-free survival.

The most common clinical guideline for identifying high-risk nmCRPC patients relies on tracking prostate-specific antigen doubling time (PSADT),⁵ which does not account for other patient characteristics. In practice, clinicians lack a standard tool for calculating PSADT and often must make a rough estimate, rendering PSADT a less-than-ideal criterion for identifying risk. Electronic healthcare record (EHR) data are a rich source of information on patient characteristics and individual responses to treatment. However, EHR data are also complex, with a high degree of missingness, irregular intervals of measurement, and heterogeneity among patients in quantity and quality of information collected. While risk classification tools for PC have been developed, they have not fully explored optimal methods of summarizing and standardizing EHR time series. It remains unclear whether models capable of directly ingesting time-series data offer advantages in risk prediction for nmCRPC patients over models reliant on summaries of EHR data.
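For reference, PSADT is conventionally estimated as ln(2) divided by the slope of a least-squares line fit to the natural logarithm of PSA over time. The following is a minimal sketch of that calculation and not a tool used in this study; the function name, the 30.4375-day month used for unit conversion, and the handling of flat or falling PSA are illustrative assumptions.

```python
import math

DAYS_PER_MONTH = 30.4375  # assumption: average month length for unit conversion


def psa_doubling_time_months(days, psa):
    """Estimate PSADT in months as ln(2) / slope of ln(PSA) versus time.

    days: measurement times in days; psa: positive PSA values.
    Returns math.inf when PSA is flat or falling (assumption: no doubling).
    """
    if len(days) != len(psa) or len(psa) < 2:
        raise ValueError("need at least two paired measurements")
    months = [d / DAYS_PER_MONTH for d in days]
    log_psa = [math.log(v) for v in psa]
    mean_t = sum(months) / len(months)
    mean_y = sum(log_psa) / len(log_psa)
    sxx = sum((t - mean_t) ** 2 for t in months)
    if sxx == 0:
        raise ValueError("measurements must span more than one time point")
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(months, log_psa))
    slope = sxy / sxx  # change in ln(PSA) per month
    return math.log(2) / slope if slope > 0 else math.inf


# PSA that doubles every 180 days has a doubling time of roughly 5.9 months.
psadt = psa_doubling_time_months([0, 180, 360], [1.0, 2.0, 4.0])
flat = psa_doubling_time_months([0, 30], [2.0, 2.0])
```

With only two or three irregularly spaced measurements the fitted slope is sensitive to each value, which is one reason a rough bedside estimate can be unreliable.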

A comparison of PC risk classification tools⁶ found that the MSKCC nomogram,⁷ a simple predictive model developed by the Memorial Sloan Kettering Cancer Center, had the best prognostic performance for predicting death in PC patients prior to starting any treatments. Ni et al.⁸ developed a machine learning model to stratify nmCRPC patients by risk for metastasis or death based on data from the SPARTAN⁹ and ARAMIS¹⁰ clinical trials. However, models based on clinical trial data cannot be assumed to be applicable to the general PC patient population, as healthier individuals are overrepresented in clinical trials. Work exploring the optimal processing of EHR time-series data has largely been conducted in more general healthcare contexts than PC. In Johnson et al.,¹¹ careful feature engineering of time-series data greatly improved the predictive performance of logistic regression, which outperformed a long short-term memory neural network directly processing time-series data. That study focused on a general prediction task of in-hospital mortality and validated results using the MIMIC-III dataset. Similarly, Wu et al.¹² fit logistic regression with summarized time-series features that outperformed a long short-term memory model, also trained with MIMIC-III data, in predicting vasopressor onset in intensive care units. Both studies used only the publicly available MIMIC-III critical-care dataset, and the models’ performance on other real-world EHR data is unknown. 
Another study compared performance in prediction of systemic lupus erythematosus in 925 patients using an ensemble machine learning approach with feature-engineered time-series data versus a long short-term memory model with raw time-series data, concluding that each approach had strengths and limitations.¹³ A study focusing on PC showed that a cross-sectional DL survival model performed better using feature-engineered time-series data to predict a composite outcome of adverse events for PC patients, while a DL survival model that automatically processed time-series data performed better at predicting PC mortality, though the differences in performance were not significant for either outcome.¹⁴ This study focused generally on PC patients, yielding a large cohort of 110,000 patients; this is an easier predictive problem than risk prediction in nmCRPC, where the cohort is smaller and patients are less heterogeneous.

We sought to develop a model using data from the Veterans Health Administration (VHA) to predict patients’ prognoses automatically to identify high-risk patients and thus facilitate closer surveillance and more tailored treatment plans. We also investigated whether time-series information helps improve prediction accuracy; whether being able to process longitudinal data directly confers an advantage in predictive performance; and whether feature engineering can be used to overcome limitations in models that do not traditionally handle temporal data.

The VHA is the largest integrated healthcare system in the US and contains the biggest cohort of nmCRPC patients. However, there are still patients lost to follow-up, or censored. To address censoring, we implemented models designed for survival analysis. The Cox proportional hazards (CPH) model is one of the most popular models for risk prediction and stratification in survival contexts.¹⁵,¹⁶ However, the CPH model must satisfy linearity and proportional hazards assumptions, and it requires domain-specific knowledge to account for interactions between variables.¹⁷ Deep learning (DL) survival models have been developed as an alternative to traditional statistical methods like CPH, and are not subject to the same limitations.¹⁸ These models have been successfully used in risk prediction contexts for PC patients,¹⁴,¹⁹ making them an attractive choice for our comparison.

DeepSurv,²⁰ DeepHit,²¹ and deep proportional hazards models are all popular architectures for deep survival analysis. DeepSurv and the broader family of deep proportional hazards models make the same proportional hazards assumption as the traditional CPH model, but allow for non-linear relationships between event risk and covariates. DeepHit directly learns the survival distribution by discretizing time into distinct intervals and predicting the event probability for each interval. All these models are designed to ingest tabular data, but versions leveraging recurrent neural networks (RNNs) exist for each of them that can directly process longitudinal data. However, for DeepSurv and similar models, the proportional hazards assumption may not hold in practice, and violations can lead to biased estimates. Multiple alternative architectures that do not rely on this assumption exist, including DeepHit, Weibull Time to Event Recurrent Neural Network²² (WTTE-RNN), and SurvTRACE,¹⁸ a transformer-based neural network for survival contexts. WTTE-RNN is capable of directly processing temporal data, whereas SurvTRACE requires tabular data. Previous work has shown success using WTTE-RNN to predict survival outcomes for nmCRPC patients.²³ SurvTRACE has been found to outperform DeepHit and DeepSurv¹⁸ and has been used successfully in risk stratification and prediction of recurrent cardiovascular events for patients with ischemic heart disease.²⁴ Given the success of SurvTRACE over DeepHit and DeepSurv, we included it in our comparison. We also included WTTE-RNN, due to its ability to ingest time-series data and its proven success with treatment recommendation for nmCRPC patients,²³ as well as a regularized CPH model to establish a baseline of performance.

Using SurvTRACE, WTTE-RNN, and regularized CPH, we compared predictive performance in risk stratification for nmCRPC patients resulting from different methods for handling time-series data.

Methods

This was a retrospective cohort study designed to develop and compare machine learning models for predicting time to metastasis or all-cause mortality in a nationwide cohort of U.S. Veterans with nmCRPC. Using data from the Department of Veterans Affairs (VA) health system, we trained and evaluated several survival analysis models, including a regularized CPH model and two DL models (SurvTRACE and WTTE-RNN).

Study cohort

Using data from the VA Cancer Registry System and pharmacy dispensation records from the VA Corporate Data Warehouse, we identified a nationwide cohort of 13,557 patients diagnosed with PC from January 1, 2006, through December 31, 2019, who later developed nmCRPC. Three months post-nmCRPC was selected as a landmark date, and patients who developed metastatic disease or died within the first 3 months post-nmCRPC were excluded. We also excluded 173 patients whose PC diagnosis date could not reliably be determined. Figure 1 provides the patient cohort flow diagram. The final cohort consisted of 12,819 nmCRPC patients.
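The landmark exclusion described above can be sketched as follows. This is an illustrative reading rather than the study's cohort-building code: approximating "3 months" as 90 days, excluding events that fall exactly on the landmark date, and the function names are all assumptions.

```python
from datetime import date, timedelta

LANDMARK_DAYS = 90  # assumption: "3 months" approximated as 90 days


def landmark_date(nmcrpc_date):
    """Date at which follow-up for prediction begins."""
    return nmcrpc_date + timedelta(days=LANDMARK_DAYS)


def keep_patient(nmcrpc_date, first_event_date):
    """Keep patients with no metastasis or death on or before the landmark.

    first_event_date: date of first metastasis or death, or None if neither
    was observed. Treating an event on the landmark day itself as an
    exclusion is an assumption.
    """
    if first_event_date is None:
        return True
    return first_event_date > landmark_date(nmcrpc_date)


landmark = landmark_date(date(2015, 1, 1))
early_event = keep_patient(date(2015, 1, 1), date(2015, 2, 1))
late_event = keep_patient(date(2015, 1, 1), date(2015, 6, 1))
no_event = keep_patient(date(2015, 1, 1), None)
```

Conditioning on survival to the landmark means all predictions describe risk among patients still event-free 3 months after nmCRPC.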

Figure 1.

Study cohort.

Data processing

Features

We characterized features as either static or time-varying. Static features included race, time from PC diagnosis to nmCRPC, Charlson comorbidity index (CCI) 6 months prior to nmCRPC date, as well as age, body mass index (BMI), and Gleason score at the time of nmCRPC diagnosis. Time-varying features included prostate-specific antigen (PSA), number of days from PC diagnosis to the record time, treatment, and an nmCRPC status indicator. We only included measurements of time-varying features recorded between PC diagnosis and initiation of a first line of treatment or the landmark date, if no treatments were initiated prior to the landmark date.

Feature engineering was used to condense time-varying features to a single row per patient. Longitudinal PSA values were summarized by the minimum, maximum, median, and slope. Treatments were summarized by treatment type and duration. Further details are provided in the appendix.
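As a minimal sketch of the PSA summaries named above, the function below collapses one patient's series into a single row. The exact definitions used in the study are given in its appendix; here the slope is assumed to be an ordinary least-squares slope of PSA against days since PC diagnosis, and the feature names are hypothetical.

```python
from statistics import median


def summarize_psa(days, psa):
    """Collapse a PSA time series into min, max, median, and slope.

    days: days from PC diagnosis to each record; psa: PSA values.
    Slope is an ordinary least-squares slope (assumption), in PSA units per day.
    """
    if len(days) != len(psa) or not psa:
        raise ValueError("days and psa must be non-empty and of equal length")
    n = len(psa)
    mean_t = sum(days) / n
    mean_y = sum(psa) / n
    sxx = sum((t - mean_t) ** 2 for t in days)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(days, psa))
    # Assumption: a series with a single time point gets a slope of zero.
    slope = sxy / sxx if sxx > 0 else 0.0
    return {
        "psa_min": min(psa),
        "psa_max": max(psa),
        "psa_median": median(psa),
        "psa_slope": slope,
    }


row = summarize_psa([0, 30, 60, 90], [2.0, 3.0, 4.0, 5.0])
```

Each patient contributes exactly one such row, which is what allows tabular models to use information from an irregularly sampled series.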

Data split

Data were split into 10 folds, with each fold generated by randomly sampling without replacement and stratified by treatments patients received over time, ensuring the treatment distribution in each fold reflected the original dataset. Models were trained and evaluated using 10-fold cross-validation based on these splits. For each iteration of the cross-validation, 8 folds were used for training, 1 for testing, and 1 for validation. Repeating for 10 iterations resulted in 10 validation sets. Performance was evaluated by averaging metrics over these validation sets with confidence intervals generated using bootstrapping. Further details around model configuration are included in the appendix.
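The splitting scheme can be sketched as below. This is a hedged illustration, not the study's code: the round-robin stratification, the choice of which fold plays which role in each iteration, and the percentile bootstrap of the mean metric are all assumptions, as are the function names.

```python
import random
from collections import defaultdict


def stratified_folds(strata, k=10, seed=0):
    """Assign each patient to one of k folds, balancing every stratum."""
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for idx, label in enumerate(strata):
        by_stratum[label].append(idx)
    fold_of = [None] * len(strata)
    offset = 0
    for label in sorted(by_stratum):
        members = by_stratum[label]
        rng.shuffle(members)  # sampling without replacement within a stratum
        for j, idx in enumerate(members):
            fold_of[idx] = (offset + j) % k  # deal patients out round-robin
        offset += len(members)
    return fold_of


def rotation(i, k=10):
    """Iteration i: k-2 training folds, one validation fold, one test fold."""
    test = i
    val = (i + 1) % k  # assumption: the next fold serves as validation
    train = [f for f in range(k) if f not in (test, val)]
    return train, val, test


def bootstrap_mean_ci(values, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap interval for the mean of per-fold metrics."""
    rng = random.Random(seed)
    n = len(values)
    means = sorted(sum(rng.choices(values, k=n)) / n for _ in range(n_boot))
    lo_idx = int((1 - level) / 2 * n_boot)
    hi_idx = int((1 + level) / 2 * n_boot) - 1
    return means[lo_idx], means[hi_idx]


# Hypothetical treatment strata for 100 patients.
strata = ["ADT"] * 60 + ["none"] * 40
fold_of = stratified_folds(strata, k=10, seed=1)
last_split = rotation(9)
fold_metrics = [0.70, 0.72, 0.71, 0.69, 0.73, 0.71, 0.70, 0.72, 0.71, 0.70]
ci_low, ci_high = bootstrap_mean_ci(fold_metrics)
```

Rotating the roles across 10 iterations lets every patient appear in a held-out fold exactly once for each role.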

Missing data imputation

We imputed missing data for static features using the mean values within each fold of the data split. For time-varying features, the missing values were filled in using the latest available non-missing value prior to the missing value. The missing values at the first visit were imputed with the mean first values within each fold. We adopted this imputation approach to avoid temporal information leakage and to reduce information leakage from the testing set to the training set.
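A minimal sketch of the two imputation rules follows, with missing values represented as `None`. The function names are hypothetical, and which rows supply the mean (passed here as `reference_values` and `first_value_mean`) is an assumption about how "within each fold" is applied.

```python
def impute_static(reference_values, values):
    """Fill missing static values with the mean of observed reference values."""
    observed = [v for v in reference_values if v is not None]
    if not observed:
        raise ValueError("no observed values to compute a mean from")
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]


def impute_series(series, first_value_mean):
    """Last observation carried forward for one patient's time-varying feature.

    A missing first visit takes first_value_mean, the mean of patients' first
    recorded values (assumption: computed within the same fold). Only earlier
    values are ever used, so no future information leaks backward in time.
    """
    filled, last = [], None
    for value in series:
        if value is None:
            value = first_value_mean if last is None else last
        filled.append(value)
        last = value
    return filled


static_filled = impute_static([1.0, None, 3.0], [None, 5.0])
series_filled = impute_series([None, 4.0, None, None, 6.0], first_value_mean=3.0)
```

Carrying values forward rather than interpolating keeps each imputed value a function of the past only, which matches the stated goal of avoiding temporal leakage.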

Models

The outcome of interest was time to metastatic PC or all-cause mortality. Our prediction models generated progression-free survival curves for each individual. The baseline time was taken to be either the initiation of first-line treatment post-nmCRPC diagnosis, or the landmark date, if no treatment was initiated within 3 months of nmCRPC diagnosis. Metastasis was ascertained from clinical documents using natural language processing techniques described previously.²⁵,²⁶ Models incapable of directly processing time-varying data were trained both with and without summaries of time-varying features, as shown in Figure 2.
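The composite outcome can be expressed as a duration and an event indicator per patient. The sketch below is illustrative only: taking the earlier of metastasis and death as the event time, censoring at last follow-up, and the function and argument names are assumptions.

```python
from datetime import date


def progression_free_outcome(baseline, metastasis, death, last_followup):
    """Return (days_from_baseline, event) for metastasis or all-cause death.

    metastasis and death are dates or None. The earlier observed event
    defines the outcome (assumption); patients with neither event are
    censored at last_followup with event = 0.
    """
    event_dates = [d for d in (metastasis, death) if d is not None]
    if event_dates:
        return (min(event_dates) - baseline).days, 1
    return (last_followup - baseline).days, 0


with_event = progression_free_outcome(
    date(2015, 4, 1), date(2016, 4, 1), date(2017, 1, 1), date(2017, 1, 1)
)
censored = progression_free_outcome(
    date(2015, 4, 1), None, None, date(2015, 4, 11)
)
```

The `(duration, event)` pair is the standard input format for survival models and is what distinguishes censored patients from those with an observed event.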

Figure 2.

Data pipelines for compared models. (A) Displays the pipeline for Weibull Time to Event Recurrent Neural Network (WTTE-RNN) that directly ingests time series data and static data; (B) displays the pipeline for Regularized Cox Proportional Hazards (Regularized CPH) and SurvTRACE that ingest static data; (C) displays the pipeline for time series (TS) Regularized CPH and TS SurvTRACE that ingest summaries of time series data and static data. All models output progression-free survival curves.

Regularized CPH

We regularized the CPH model with an elastic net penalty,²⁷ which combines ridge²⁸ and lasso²⁹ regularization. The lasso component of the penalty performs automatic feature selection by shrinking some coefficients exactly to zero.³⁰ Regularized CPH cannot handle time-varying data; thus, we refer to the regularized CPH model trained using summaries of time-series data as TS regularized CPH.
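For orientation, one common way to write the elastic-net-penalized Cox objective is shown below. The notation is an assumption and does not come from the study: $x_i$ is the covariate vector, $\delta_i$ the event indicator, $R(t_i)$ the set of patients still at risk at time $t_i$, $\lambda \ge 0$ the overall penalty strength, and $\alpha \in [0, 1]$ the mixing weight. Software implementations differ in how they scale the likelihood and penalty terms.

```latex
\hat{\beta} = \arg\min_{\beta}\;
  \Bigl[ -\ell(\beta)
  + \lambda \Bigl( \alpha \lVert \beta \rVert_1
  + \tfrac{1-\alpha}{2} \lVert \beta \rVert_2^2 \Bigr) \Bigr],
\qquad
\ell(\beta) = \sum_{i:\,\delta_i = 1}
  \Bigl[ x_i^{\top}\beta
  - \log \sum_{j \in R(t_i)} \exp\bigl(x_j^{\top}\beta\bigr) \Bigr]
```

Setting $\alpha = 1$ recovers the lasso and $\alpha = 0$ recovers ridge; intermediate values keep the lasso's ability to zero out coefficients while stabilizing estimates for correlated features.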

SurvTRACE

SurvTRACE is a transformer-based model that encodes each feature in a low-dimensional embedding and uses self-attention to account for full interactions between features.¹⁸ The main architecture includes a baseline covariate embedding module, a deep-stacked attentive encoder module, and an alignment and subnetwork prediction module. Categorical variables and numerical variables are embedded and concatenated to represent features. Multi-head self-attention is used to enable sufficient interactions between covariate embeddings. For single-event survival analysis, the loss function is defined as the piecewise constant hazard loss proposed by Kvamme et al.³¹ Like regularized CPH, SurvTRACE cannot process time-series data. We refer to the SurvTRACE model trained using summaries of time-series data as TS SurvTRACE.
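To illustrate the idea behind a piecewise constant hazard likelihood, the sketch below computes a survival curve and a per-patient negative log-likelihood from interval hazards. It shows the general form only and is not SurvTRACE's implementation; in SurvTRACE the hazards would come from the network, whereas here the interval boundaries `cuts` and the `hazards` values are hypothetical inputs.

```python
import math
from bisect import bisect_right


def cumulative_hazard(t, cuts, hazards):
    """Integrate a piecewise constant hazard from 0 to t.

    cuts: increasing boundaries [0, c1, ..., cK];
    hazards[j] applies on the interval [cuts[j], cuts[j + 1]).
    """
    total = 0.0
    for j, h in enumerate(hazards):
        time_in_interval = max(0.0, min(t, cuts[j + 1]) - cuts[j])
        total += h * time_in_interval
    return total


def survival(t, cuts, hazards):
    """Probability of remaining event-free through time t."""
    return math.exp(-cumulative_hazard(t, cuts, hazards))


def neg_log_likelihood(t, event, cuts, hazards):
    """Negative of [event * log h(t) - H(t)] for one patient."""
    j = min(bisect_right(cuts, t) - 1, len(hazards) - 1)  # interval holding t
    log_hazard_term = math.log(hazards[j]) if event else 0.0
    return cumulative_hazard(t, cuts, hazards) - log_hazard_term


cuts = [0.0, 1.0, 2.0, 3.0]   # hypothetical yearly intervals
hazards = [0.1, 0.2, 0.4]     # hypothetical hazards per interval
cum_haz = cumulative_hazard(1.5, cuts, hazards)
surv_start = survival(0.0, cuts, hazards)
nll_event = neg_log_likelihood(1.5, 1, cuts, hazards)
nll_censored = neg_log_likelihood(1.5, 0, cuts, hazards)
```

A censored patient contributes only the cumulative hazard term, which is how this family of losses uses partial follow-up without treating censoring as an event.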

WTTE-RNN

Weibull Time to Event Recurrent Neural Network (WTTE-RNN) is a DL prediction model that incorporates survival analysis. It discretizes time into steps, with time to the next event assumed to follow a Weibull distribution. An RNN is used to learn the scale and shape parameters of this Weibull distribution. Our implementation used a Gated Recurrent Unit (GRU) as the RNN structure. WTTE-RNN can directly process time-varying features due to its RNN architecture.
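The discrete Weibull likelihood underlying this approach can be sketched as follows. This is a simplified illustration, not the study's implementation: in WTTE-RNN the scale and shape would be outputs of the GRU at each time step, whereas here they are passed in as plain hypothetical numbers.

```python
import math


def weibull_survival(t, scale, shape):
    """S(t) = exp(-(t / scale) ** shape) for a Weibull time to event."""
    return math.exp(-((t / scale) ** shape))


def discrete_weibull_loglik(t, observed, scale, shape):
    """Log-likelihood of one patient with time discretized into unit steps.

    observed=True: the event fell in [t, t + 1), so the contribution is
    log(S(t) - S(t + 1)). observed=False: the patient was censored and only
    known to be event-free through t + 1, so the contribution is log S(t + 1).
    """
    cum_now = (t / scale) ** shape          # cumulative hazard at t
    cum_next = ((t + 1) / scale) ** shape   # cumulative hazard at t + 1
    loglik = -cum_next
    if observed:
        loglik += math.log(math.expm1(cum_next - cum_now))
    return loglik


# Hypothetical parameters: scale of 10 steps, shape 1.5 (rising hazard).
p_event = math.exp(discrete_weibull_loglik(3, True, 10.0, 1.5))
p_interval = weibull_survival(3, 10.0, 1.5) - weibull_survival(4, 10.0, 1.5)
p_censored = math.exp(discrete_weibull_loglik(3, False, 10.0, 1.5))
s_next = weibull_survival(4, 10.0, 1.5)
```

Because the two parameters define a full distribution, a predicted survival probability at any milestone follows directly from `weibull_survival`.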

Model performance metrics

Models were evaluated based on area under the receiver operating characteristic curve (AUROC) and Brier score. AUROC is a discrimination metric that indicates how well models separate patients who experience the outcome from those who do not. It ranges from 0 to 1, where 0.5 corresponds to chance-level discrimination and higher values indicate better discrimination. Brier score is the mean squared difference between the predicted event probability and the observed outcome. It is commonly used to assess calibration and can be considered a proximity measure.³² A perfect model gives a Brier score of 0, whereas a non-informative reference model that predicts a probability of 0.5 for every patient yields a Brier score of 0.25.
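Both metrics can be illustrated for a binary outcome at a fixed milestone. The sketch below ignores censoring, which is a simplifying assumption; evaluation of survival models may instead rely on time-dependent, censoring-adjusted versions of these metrics, and the study's exact procedure is not reproduced here.

```python
def auroc(y_true, y_score):
    """Probability that a random event case outscores a random non-event case.

    Ties count as half a win (the Mann-Whitney formulation).
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one event and one non-event")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


# Hypothetical outcomes (1 = event by the milestone) and predicted risks.
outcomes = [0, 0, 1, 1]
risks = [0.1, 0.4, 0.35, 0.8]
auc_value = auroc(outcomes, risks)

# Predicting 0.5 for everyone gives the non-informative reference of 0.25.
reference_brier = brier_score([0, 1], [0.5, 0.5])
# A model that ranks every pair the wrong way round scores 0 on AUROC.
reversed_auc = auroc([0, 1], [0.9, 0.1])
```

AUROC depends only on how patients are ranked, while the Brier score also penalizes predicted probabilities that are too high or too low, which is why the two metrics can favor different models.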

Results

Data characteristics

Among the cohort of 12,819 patients, 6,944 patients (54.2%) experienced death or metastasis, the events of interest, during the observation period. Across the full cohort, 2,860 patients (22.3%) developed metastatic disease and 6,012 (46.9%) died during the observation period; some patients experienced both events. The median progression-free survival time was 2.4 years (interquartile range [IQR]: 1–4.5 years). The median overall survival time was 3.6 years (IQR: 1.9–6.3 years). The baseline characteristics of the study cohort are summarized in Table 1. The median number of records per patient was 10 (IQR: 6–15 records) and the maximum number of records per patient was 75.

Table 1.

Baseline characteristics.

Patients (N = 12,819)
Self-identified race/ethnicity, n (%)
Hispanic	812 (6.3)
Non-Hispanic Black	3,628 (28.3)
Non-Hispanic White	7,220 (56.3)
Other	1,159 (9.0)
Age at nmCRPC, median [IQR] ¹	72 [66–80]
Treatment ² , n (%)
Abiraterone	215 (1.7)
Androgen deprivation therapy	7,774 (60.6)
Bicalutamide	3,050 (23.8)
Enzalutamide	158 (1.2)
Finasteride	400 (3.1)
Flutamide	128 (1.0)
Others	80 (0.6)
No treatment	1,014 (7.9)
Body mass index, mean (sd ³ )	29.4 (6.0)
Missing, n	15
Gleason Score, n (%)
≤ 6	1,862 (14.5)
7	4,246 (33.1)
8	2,176 (17.0)
9	2,196 (17.1)
10	124 (1.0)
Missing	2,215 (17.3)
PSADT ⁴ ≤ 10 months, n (%)
0	3,295 (60.1)
1	2,187 (39.9)
Missing	7,337 (57.2)
Metastasis, n (%)
0	9,959 (77.7)
1	2,860 (22.3)
Death, n (%)
0	6,807 (53.1)
1	6,012 (46.9)

¹IQR: interquartile range.

²Treatment initiated within 3 months of date of non-metastatic, castration-resistant status.

³Standard deviation.

⁴Prostate specific antigen doubling time.

Model performance

Model performance is summarized in Table 2. TS SurvTRACE had the best calibration with the lowest Brier score at both the 1-year and 2-year milestones after nmCRPC date. Calibration plots are provided in the appendix. WTTE-RNN had similar performance to TS SurvTRACE and achieved slightly better discrimination than TS SurvTRACE at the 2-year milestone, with a higher AUROC, though the difference was not significant.

Table 2.

Model performance.

Model	Metrics at 1 year		Metrics at 2 years
Model	AUROC¹ (95% CI²)	Brier score (95% CI)	AUROC (95% CI)	Brier score (95% CI)
Regularized CPH³	0.639 (0.638, 0.641)	0.134 (0.133, 0.134)	0.654 (0.653, 0.655)	0.196 (0.196, 0.197)
TS⁴ Regularized CPH	0.648 (0.646, 0.649)	0.133 (0.132, 0.133)	0.663 (0.662, 0.664)	0.195 (0.194, 0.195)
SurvTRACE	0.688 (0.687, 0.689)	0.129 (0.128, 0.129)	0.689 (0.689, 0.690)	0.188 (0.187, 0.188)
TS SurvTRACE	0.712 (0.710, 0.713)	0.125 (0.124, 0.125)	0.723 (0.722, 0.724)	0.180 (0.180, 0.181)
WTTE-RNN⁵	0.702 (0.701, 0.703)	0.126 (0.126, 0.127)	0.724 (0.723, 0.725)	0.186 (0.185, 0.186)

¹Area under the receiver operating characteristic curve.

²Confidence interval.

³Cox Proportional Hazards.

⁴Time Series.

⁵Weibull Time to Event Recurrent Neural Network.

TS regularized CPH outperformed regularized CPH, and TS SurvTRACE outperformed SurvTRACE. SurvTRACE received a greater benefit from time-series summaries compared with regularized CPH.

Discussion

We developed a risk prediction approach using structured EHR data from a nationwide cohort of Veterans with nmCRPC and explored the contribution of time-series data to the predictive performance of survival models. Our findings indicate the inclusion of time-varying features significantly enhances the predictive performance of survival models compared with using static features alone. While WTTE-RNN can intrinsically use time-series data, TS SurvTRACE achieved superior performance in Brier scores at the 1- and 2-year milestones. This suggests the capacity to process time-series data does not necessarily offer an advantage over creating summaries of time-series data. Specifically, summarized approaches like TS SurvTRACE may offer better model calibration, whereas automated models like WTTE-RNN may provide marginal gains in discrimination as shown by its higher AUROC over longer follow-up periods. Consequently, the optimal choice of architecture depends on whether a clinician prioritizes precise calibration or long-term risk discrimination.

Our work demonstrates the capacity of DL models such as WTTE-RNN or SurvTRACE to provide more individualized guides than PSADT for clinicians when recommending treatment. Thus, the practical impact of this work lies in its potential to refine treatment selection and guide therapy timing in real-world practice, and inform risk stratification in clinical trials. For instance, by integrating our models with EHR, a clinical decision support system (CDSS) could provide clinicians with real-time, personalized risk predictions. This would enable more nuanced decision-making—escalating therapy for high-risk patients while avoiding overtreatment for those at low risk.⁸

Limitations and future work

This work has several limitations. One key challenge was the significant missingness in certain features, notably Gleason score, which may have impacted model performance. Although PSADT also had a high degree of missingness, this is common in EHR data and was a motivating factor for our approach. Consequently, we did not include PSADT as a predictive feature. Our primary focus was to leverage the richer information available in the PSA time series directly. PSADT is presented in the baseline characteristics table only to provide a conventional and more interpretable summary of patient PSA kinetics.

While we provide foundational results for developing a CDSS for managing nmCRPC, more work is required prior to implementation. Our models relied exclusively on structured EHR data for prediction; future work will involve incorporating unstructured clinical notes, which is expected to enhance model performance. Furthermore, for these models to be clinically useful, their predictions must be interpretable. The “black box” nature of complex models such as SurvTRACE and WTTE-RNN is a significant barrier to clinician trust and adoption. Explainable machine learning is an active area of research,³³ and multiple explainers, such as SHAP³⁴ and LIME,³⁵ can offer insight into model decisions. SHAP values quantify each feature’s marginal contribution to an individual patient’s predicted risk and support personalized, case-level explanations, while permutation feature importance estimates a feature’s global relevance by measuring the degradation in predictive performance when its values are randomly shuffled. Partial dependence plots and individual conditional expectation curves can further characterize how predicted risk varies with individual features such as PSA slope, Gleason score, or treatment type. For WTTE-RNN, time-step attributions derived from integrated gradients or attention weights can highlight which points in a patient’s PSA trajectory drove a prediction; for the transformer-based SurvTRACE, self-attention weights over covariate embeddings offer a native path to explanation. 
Translating these technical outputs into clinician-facing explanations—for example, by surfacing the top contributors to an individual’s risk estimate within the EHR interface—will be essential to support clinical trust and appropriate use.³⁶ In a companion study currently under review, we benchmark interpretable or “glass-box” methods against black box models using the same cohort of prostate cancer patients.
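As a concrete illustration of permutation feature importance, the sketch below shuffles one feature column across patients and reports the mean drop in a score. It is not part of this study's analysis; the toy model, the toy data, and the use of a negative Brier score as the performance measure are all hypothetical.

```python
import random


def neg_brier(y_true, y_prob):
    """Higher-is-better score: negative mean squared error of the risks."""
    return -sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def permutation_importance(predict, rows, y_true, col, n_repeats=20, seed=0):
    """Mean drop in score after shuffling column `col` across patients."""
    rng = random.Random(seed)
    baseline = neg_brier(y_true, [predict(r) for r in rows])
    drops = []
    for _ in range(n_repeats):
        shuffled = [r[col] for r in rows]
        rng.shuffle(shuffled)
        permuted = [r[:col] + [v] + r[col + 1:] for r, v in zip(rows, shuffled)]
        score = neg_brier(y_true, [predict(r) for r in permuted])
        drops.append(baseline - score)
    return sum(drops) / n_repeats


def toy_predict(row):
    """Hypothetical model: risk depends on column 0 and ignores column 1."""
    return 0.9 if row[0] > 0.5 else 0.1


rows = [[0.0, 5.0], [0.0, 7.0], [1.0, 6.0], [1.0, 8.0], [0.0, 9.0], [1.0, 4.0]]
y = [0, 0, 1, 1, 0, 1]
importance_used = permutation_importance(toy_predict, rows, y, col=0)
importance_ignored = permutation_importance(toy_predict, rows, y, col=1)
```

A feature the model never uses has an importance of exactly zero, while shuffling a feature the model relies on can only leave this toy score unchanged or lower it.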

Next, successful integration into clinical practice is a complex task. A CDSS must be seamlessly embedded within existing EHR workflows to avoid disrupting care or increasing clinician workload. This involves not only technical interoperability but also careful consideration of human factors to prevent issues like alert fatigue and overreliance on automation.³⁷

Finally, the lack of external validation limits the generalizability of our findings beyond the VHA. The VHA cohort is composed almost entirely of male U.S. Veterans and differs from community-based populations in demographics, comorbidity burden, treatment access, and ascertainment of outcomes. Consequently, both the absolute performance and the relative ordering of models may shift when they are applied to other healthcare systems, and site-specific differences in laboratory platforms, PSA measurement cadence, diagnostic coding practices, and unstructured documentation of metastasis could all induce measurable data drift. The next steps toward real-world application therefore include rigorous external validation on diverse, multi-institutional datasets, ideally through federated evaluations across academic medical centers, community oncology networks, and large claims-linked EHR consortia, combined with fairness audits across racial, age, and comorbidity strata, followed by prospective clinical studies to determine whether the CDSS improves clinical outcomes. This translational process will require a collaborative effort between data scientists, clinicians, and informatics experts to create a tool that is accurate, interpretable, and meaningfully integrated into the fabric of patient care.³⁸ Until such validation is completed, the results reported here should be interpreted as evidence of the relative value of time-series representations within a single integrated health system rather than as a deployable risk model.

Despite limitations, this work provides essential groundwork for implementing a tool to help clinicians in their treatment decisions for patients with nmCRPC, offering valuable insight into which model architectures and features are effective for identifying high-risk nmCRPC patients.

Conclusion

DL survival methods are useful in predicting individual nmCRPC patients’ risk, demonstrating superior performance compared to the traditional CPH survival analysis approach. Prediction using DL on EHR data should leverage its time-series nature, either via models that can intrinsically utilize time-series information or through careful engineering of time-series features.

Supplemental material

Supplemental material for Deep survival learning for prognosis prediction in non-metastatic castration-resistant prostate cancer by Chunyang Li, Julia Bohman, Vikas Patil, Richard Mcshinsky, Christina Yong, Zach Burningham and Ahmad Halwani in Health Informatics Journal.

Footnotes

Acknowledgments

We would like to thank Dr. Siamack Ayandeh for the creation of the analytics study mart environment and the Veterans Health Administration Office of Research and Development for funding cloud credits.

ORCID iD

Julia Bohman

Ethical considerations

The University of Utah Institutional Review Board approved this study under IRB_00129907.

Consent to participate

The requirement for informed consent was waived by the University of Utah Institutional Review Board.

Author Contributions

Conceptualization, Chunyang Li, Ahmad Halwani, Zach Burningham, Julia Bohman; methodology, Chunyang Li; formal analysis, Chunyang Li; data curation, Vikas Patil, Richard McShinsky; writing—original draft preparation, Chunyang Li, Julia Bohman; writing—review and editing, Chunyang Li, Julia Bohman, Christina Yong; supervision, Ahmad Halwani, Zach Burningham.

Funding

The authors received no financial support for the research, authorship, and/or publication of this article.

Declaration of conflicting interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Data Availability Statement

Data include protected health information (PHI) and are not available. Detailed information on model tuning and implementation is available upon request.

References

Bergengren

Pekala

Matsoukas

, et al. 2022 update on prostate cancer epidemiology and risk factors—a systematic review. European urology 2023; 84: 191–206. https://doi.org/10.1016/j.eururo.2023.04.021

Gandaglia

Leni

Bray

, et al. Epidemiology and prevention of prostate cancer. European urology oncology 2021; 4: 877–892. https://doi.org/10.1016/j.euo.2021.09.006

Halwani

Patil

Morreall

, et al. Real-world treatment patterns among veterans with nonmetastatic castration-resistant prostate cancer (nmCRPC). American Society of Clinical Oncology, 2022.

Wang

Y-J

Tseng

C-S

Huang

C-Y

, et al. Prognostic Outcomes and Predictive Factors in Non-Metastatic Castration-Resistant Prostate Cancer Patients Not Treated with Second-Generation Antiandrogens. Biomedicines 2024; 12: 2275. https://doi.org/10.3390/biomedicines12102275

Berruti

Bracarda

Caffo

, et al. nmCRPC, a look in the continuous care of prostate cancer patients: state of art and future perspectives. Cancer Treatment Reviews 2023; 115: 102525. https://doi.org/10.1016/j.ctrv.2023.102525

Zelic

Garmo

Zugna

, et al. Predicting prostate cancer death with different pretreatment risk stratification tools: a head-to-head comparison in a nationwide cohort study. European urology 2020; 77: 180–188. https://doi.org/10.1016/j.eururo.2019.09.027

Kattan

Eastham

Stapleton

, et al. A preoperative nomogram for disease recurrence following radical prostatectomy for prostate cancer. JNCI: Journal of the National Cancer Institute 1998; 90: 766–771. https://doi.org/10.1093/jnci/90.10.766

Wang

, et al. Development and validation of a machine learning-based risk model for metastatic disease in nmCRPC patients: a tumor marker prognostic study. International Journal of Surgery 2025; 111: 3331–3341. https://doi.org/10.1097/JS9.0000000000002321

Small

Saad

Chowdhury

, et al. SPARTAN, a phase 3 double-blind, randomized study of apalutamide (APA) versus placebo (PBO) in patients (pts) with nonmetastatic castration-resistant prostate cancer (nmCRPC). American Society of Clinical Oncology, 2018.

10.

Fizazi

Shore

Tammela

, et al. Overall survival (OS) results of phase III ARAMIS study of darolutamide (DARO) added to androgen deprivation therapy (ADT) for nonmetastatic castration-resistant prostate cancer (nmCRPC). American Society of Clinical Oncology, 2020.

11.

Johnson

Parbhoo

Ross

, et al. Learning predictive and interpretable timeseries summaries from ICU data. AMIA Annual Symposium Proceedings, 2022, p. 581.

12.

Parbhoo

Havasi

, et al. Learning optimal summaries of clinical time-series with concept bottleneck models. Machine Learning for Healthcare Conference. PMLR, 2022, pp. 648–672.

13.

Zhao

Smith

Jorge

. Comparing two machine learning approaches in predicting lupus hospitalization using longitudinal data. Scientific Reports 2022; 12: 16424. https://doi.org/10.1038/s41598-022-20845-w

14.

Dai

Park

Yoo

, et al. Survival analysis of localized prostate cancer with deep learning. Scientific reports 2022; 12: 17821. https://doi.org/10.1038/s41598-022-22118-y

15.

McLernon

Giardiello

Van Calster

, et al. Assessing performance and clinical usefulness in prediction models with survival outcomes: practical guidance for Cox proportional hazards models. Annals of internal medicine 2023; 176: 105–114. https://doi.org/10.7326/M22-0844

16.

Kim

Park

, et al. Impact of high dose radiotherapy for breast tumor in locoregionally uncontrolled stage IV breast cancer: a need for a risk-stratified approach. Radiation Oncology 2023; 18: 168. https://doi.org/10.1186/s13014-023-02357-7

17.

Patil

Rasmussen

, et al. Predicting survival in veterans with follicular lymphoma using structured electronic health record information and machine learning. International journal of environmental research and public health 2021; 18: 2679. https://doi.org/10.3390/ijerph18052679

18.

Wang

Sun

. Survtrace: Transformers for survival analysis with competing events. Proceedings of the 13th ACM international conference on bioinformatics, computational biology and health informatics, 2022, pp. 1–9.

19. Roffman, Hart, Leapman, et al. Development and validation of a multiparameterized artificial neural network for prostate cancer risk prediction and stratification. JCO Clinical Cancer Informatics 2018; 2: 1–10. https://doi.org/10.1200/CCI.17.00119

20. Katzman, Shaham, Cloninger, et al. DeepSurv: personalized treatment recommender system using a Cox proportional hazards deep neural network. BMC Medical Research Methodology 2018; 18: 24. https://doi.org/10.1186/s12874-018-0482-1

21. Lee, Zame, Yoon, et al. DeepHit: A deep learning approach to survival analysis with competing risks. Proceedings of the AAAI Conference on Artificial Intelligence, 2018.

22. Martinsson. WTTE-RNN: Weibull time to event recurrent neural network: a model for sequential prediction of time-to-event in the case of discrete or continuous censored data, recurrent events or time-varying covariates, 2017.

23. Bohman, Patil, et al. Deep Learning Treatment Recommendations for Patients Diagnosed with Non-Metastatic Castration-Resistant Prostate Cancer Receiving Androgen Deprivation Treatment. BioMedInformatics 2025; 5: 42. https://doi.org/10.3390/biomedinformatics5030042

24. Shinohara, Kodera, Nagae, et al. The potential of the transformer-based survival analysis model, SurvTrace, for predicting recurrent cardiovascular events and stratifying high-risk patients with ischemic heart disease. PLOS ONE 2024; 19: e0304423. https://doi.org/10.1371/journal.pone.0304423

25. Patil, Rasmussen, Morreall, et al. RWD140 Using Machine Learning to Identify Non-Metastatic Castration-Resistant Prostate Cancer (NMCRPC) Patients from Electronic Health Record Data. Value in Health 2022; 25: S603. https://doi.org/10.1016/j.jval.2022.04.1663

26. Halwani, Rasmussen, Patil, et al. Real-world practice patterns in veterans with metastatic castration-resistant prostate cancer. Urologic Oncology: Seminars and Original Investigations. Elsevier, 2020, pp. 1.e1–1.e10.

27. Zou, Hastie. Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society Series B: Statistical Methodology 2005; 67: 301–320. https://doi.org/10.1111/j.1467-9868.2005.00503.x

28. Verweij, Van Houwelingen. Penalized likelihood in Cox regression. Statistics in Medicine 1994; 13: 2427–2436. https://doi.org/10.1002/sim.4780132307

29. Tibshirani. The lasso method for variable selection in the Cox model. Statistics in Medicine 1997; 16: 385–395. https://doi.org/10.1002/(sici)1097-0258(19970228)16:4<385::aid-sim380>3.0.co;2-3

30. Sill, Hielscher, Becker, et al. c060: Extended inference with lasso and elastic-net regularized Cox and generalized linear models. Journal of Statistical Software 2015; 62: 1–22. https://doi.org/10.18637/jss.v062.i05

31. Kvamme, Borgan. Continuous and discrete-time survival prediction with neural networks. Lifetime Data Analysis 2021; 27: 710–736. https://doi.org/10.1007/s10985-021-09532-6

32. Haider, Hoehn, Davis, et al. Effective ways to build and evaluate individual survival distributions. Journal of Machine Learning Research 2020; 21: 1–63.

33. Bohman, Patil, Yong, et al. Predicting Metastasis, Death, or Escalation of Treatment in Non-Metastatic Castration-Resistant Prostate Cancer Patients: Do Glass Box Methods Offer an Advantage? Under Review 2026.

34. Lundberg, Lee S-I. A unified approach to interpreting model predictions. Advances in Neural Information Processing Systems 2017; 30.

35. Ribeiro, Singh, Guestrin. “Why should I trust you?” Explaining the predictions of any classifier. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2016, pp. 1135–1144.

36. Yang, Rhoads, Sepulveda, et al. Building the model: challenges and considerations of developing and implementing machine learning tools for clinical laboratory medicine practice. Archives of Pathology & Laboratory Medicine 2023; 147: 826–836.

37. Yan, Guo, Inoue, et al. A roadmap to implementing machine learning in healthcare: from concept to practice. Frontiers in Digital Health 2025; 7: 1462751. https://doi.org/10.3389/fdgth.2025.1462751

38. Subasri, Krishnan, Kore, et al. Detecting and Remediating Harmful Data Shifts for the Responsible Deployment of Clinical AI Models. JAMA Network Open 2025; 8: e2513685. https://doi.org/10.1001/jamanetworkopen.2025.13685

Supplementary Material

Please find the following supplemental material available below.