Abstract
Health technology assessments of interventions impacting survival often require extrapolating current data to gain a better understanding of the interventions’ long-term benefits. Both a comprehensive examination of the trial data up to the maximum follow-up period and the fitting of parametric models are required for extrapolation. It is standard practice to visually compare the parametric curves to the Kaplan-Meier survival estimate (or comparison of hazard estimates) and to assess the parametric models using likelihood-based information criteria. In place of these two steps, this work demonstrates how to minimize the squared distance of parametric estimators to the Kaplan-Meier estimate. This is in line with the selection of the model using Mean Squared Error, with the modification that the unknown true survival is replaced by the Kaplan-Meier estimate. We would assure the internal validity of the extrapolated model and its appropriate representation of the data by adhering to this procedure. We use both simulation and real-world data with a scenario where no model that properly fits the data could be found to illustrate how this process can aid in model selection.
Introduction
Researchers frequently encounter the issue of choosing the optimal model from a pool of candidates that offers the best balance between efficiency and interpretability. The Bayesian Information Criterion (BIC) and the Akaike Information Criterion (AIC) are the most frequently used criteria for choosing models. While AIC and BIC have different theoretical motivations (Kuha, 2004), they are both limited to parametric models and necessitate the definition and computation of likelihood. With the possible exception of economic modeling (Gallacher et al., 2021a; Gallacher et al., 2021b; Sweeting et al., 2023) or extrapolation of clinical trail data (Soikkeli et al., 2019; Langford et al., 2022), which requires parametric methods, survival analysis is characterized by the dominance of non- or semiparametric models (Miller Jr., 1983). This is primarily because it is challenging to select parametric models that outperform the Kaplan-Meier estimator (Meier et al., 2004). The selection process includes a pre-selection of models based on subject-specific knowledge and a visual comparison of parametric survival curves to the Kaplan-Meier estimate, with the goal of selecting models with survival estimates that closely follow estimates from the Kaplan-Meier estimator and can help identify discrepancies between the model and the data (Latimer, 2013). Deviations from the observed data may indicate issues with the chosen model. The Kaplan-Meier estimator is consistent (i.e. unbiased) (Stute & Wang, 1993), thus if a parametric survival estimator closely follows the Kaplan-Meier estimate we could assume that its distribution concentrated in some neighborhood of the true value. It is necessary to quantify the proximity and concentration of the parametric survival estimate distribution around the Kaplan-Meier estimate, and there are a number of distance measurements that one might take into account. The squared distance is the most basic and widely used of these measurements. A suitable way to assess concentration for unbiased estimators is to use the variance, which is equal to the expected mean squared distance from the real value (Cramer, 1946). For biased estimators variance is built around a biased estimate; thus in itself is not a proper measure of concentration around the true value. In addition to the variance of the estimate one needs to consider the squared distance between the true and estimated value, the squared bias. Model selection based on the sum of variance and squared bias, i.e. Mean Squared Error (MSE) has a long history in statistics (Trenkler & Toutenburg, 1990) and offers an objective framework to compare nonparametric and parametric estimators (Jullum & Hjort, 2017; Jullum & Hjort, 2019). In this paper we outline the framework for selection parametric survival models based on MSE where the unknown true value is replaced by the Kaplan-Meier estimate.
Estimators of the survival function
We assume that survival times
where
and asymptotic variance
with finite sample variance given by the Greenwood variance estimator
Next, we establish the notation for the maximum likelihood parametric estimators. The subscript ml will be used to note that the survival or associated variance estimate refers to the maximum likelihood estimator. The parametric survival function is given by
As the censoring distribution
In addition we define the score function
Under mild regularity conditions the maximum likelihood estimator is consistent and converges in probability
with
where
The last we present is the covariance between the non-parametric and parametric estimate of the survival function. The derivation of the covariance (Jullum & Hjort, 2017; Nemes, 2023) is based on the influence components (Reid, 1981; Hjort, 1992) and is given by
where
The MSE for the Kaplan-Meier,
The Kaplan-Meier estimator’s consistency (Stute & Wang, 1993) assures that, as sample size increases, small sample bias quickly converges to zero. In addition with an upper bound of
If
The literature generally agrees that the estimate with the lowest MSE should be chosen; that is, we should favor the parametric estimator only in cases where
Equation (11) can be rewritten as
This equation can be factored into two components. First, we have the ratio of variances. i.e. the Asymptotic Relative Efficiency (ARE) of the two competing estimators. If both estimators are unbiased then
This performance generally refers to statistical power, or the width of approximate Wald-type confidence intervals, while preserving the nominal coverage levels. ARE can also be interpreted as an estimate of the limiting correlation (
where
is the expectation of the non-central
In summary, MSE based selection with known true parameter value selects the true model with a limit probability inverse to the ARE of the considered parametric and non-parametric estimators
While here we have used MSE as the criterion of choice, it is not the only alternative. As MSE partition in variance and bias it has easy interpretation and connection to first order approximations. Alternatives such as Mean Absolute Error (MAE) or Mean Squared Logarithmic Error (MSLE) are attractive alternatives specially if outliers are present. True, both with increased computational complexities.
While
Using the above-described corrected squared bias, the relationship between standard normal and
This limit probability is a special case for a direct comparison. i.e.
the limit probability of selecting the true parametric model does depend on the Asymptotic Relative Efficiency. The Kaplan-Meier estimator’s ARE(t) varies with follow-up time; ARE is close to 0 at the start of the follow-up and as time approaches infinity (Miller Jr., 1983; Nemes, 2023). However, at the extremes of the follow-up time, this does not favor parametric estimators. This due to the fact that the variance of the estimated bias is inversely proportional to the ARE between the parametric and nonparametric estimator,
Selection framework
While the numerical comparison of MSE for scalar parameters is straightforward; in this instance, we are comparing two time indexed functions. We propose a graphical method that plots the MSE ratio against follow-up time to help simplify this. We can conclude that the parametric model should be favored over the Kaplan-Meier estimator in intervals where this ratio is less than 1. In this data-driven approach, the model with the lowest
Naturally, the lower and upper limits of integration can be set to best suit the purpose of the study. Similarly, integration can be carried out with respect to a weight function if the study necessitates emphasis on a particular interval of the follow-up period. This can also be data-driven by concentrating on stable regions within the data set rather than attempting to achieve a good fit throughout the entire follow-up. Alternatively, one could estimate Kaplan-Meier integrals (i.e. restricted mean survival time) and then calculate the MSE (Nemes et al., 2022).

Claeskens and Hjort (Claeskens et al., 2008) identified precise tolerance limits within the local misspecification framework where a narrow parametric model outperforms a wider model in terms of
Modeling examples from the literature
Global mean squared efficiency of parametric estimators of survival compared to the Kaplan-Meier curve for overall survival males and females at the Channing House between 1964–75. The Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC) of the parametric models is provided as comparison. Parametric estimators perform better when the global MSE ratio is less than one
Global mean squared efficiency of parametric estimators of survival compared to the Kaplan-Meier curve for overall survival males and females at the Channing House between 1964–75. The Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC) of the parametric models is provided as comparison. Parametric estimators perform better when the global MSE ratio is less than one
Probability of death in the Channing House retieremnt home (
The Channing House is a retirement center in California (USA) that served 97 men and 365 women between 1964 and 1975, with and 46 men and 130 women dying (Fig. 2a). Together with the Kaplan-Meier estimator, we have fitted four parametric estimators to the data: the exponential, weibull, gamma, and log-logistic estimators (Fig. 2b & c). For both sexes, the bias was larger than the variance reduction when using the exponential survival estimator. The Weibull survival model had the best bias-variance trade-off when compared to the Kaplan-Meier estimator. (2 b & c). Together with the graphical presentation, we also calculated the global scores, AIC, BIC, and the integrated ratio MSE(t) normalized to the duration of follow-up (Table 1). The Weibull distribution had the best bias-variance trade-off compared to the Kaplan-Meier estimator, and this was evident in the comparison based on MSE. The Weibull distribution was also found by AIC to be the best fit, but BIC found that the exponential distribution was better for males. This could be the results of the conservative nature of BIC is small samples. For females, the average age of entry into the Channing House is 75.2, while for males it is 76.5. Therefore, the exponential distribution is an unlikely choice for modeling overall survival due to its constant hazard. For the Weibull distribution, the shape parameter exceeded 1 for both males (1.325) and females (1.439). This indicates a increased death rate with time, i.e. a natural aging process.
Probability of death due to Diffuse Large-B-Cell Lymphoma estimated with the Kaplan-Meier estimator (black line) and log-logistic parametric estimator (grey line) (
In cancer studies, where events are expected to increase initially and decrease later, the log-logistic distribution is frequently used. Rosenwald et al. (Rosenwald et al., 2002) investigated the genetic aspects of Diffuse Large-B-Cell Lymphoma survivorship following chemotherapy. We used the log-logistic distribution and the non-parametric Kaplan-Meier estimator to analyze the data that was published by the journal. It appears that the log-logistic distribution overestimated survival at the start of the follow-up and underestimated it at the end (Fig. 3a). The sample size was relatively large (
Although it may be necessary, extrapolating existing survival data carries a risk, and subsequent research may reveal that the parametric assumptions made for this purpose are not always true (Davies et al., 2013, Klijn et al., 2021). The selection process of models should be methodical, with an emphasis on evaluating hazard functions and validating survival functions that have been extrapolated (Bell Gorrod et al., 2019). Information Criterion based model selection methods, do select the best candidate from a pool of competing models, but they are unable to provide an assessment of the chosen model’s appropriateness. The importance of subject specific knowledge in selecting candidate models cannot be overestimated (Kearns et al., 2020) and as we demonstrated here this can be augmented with objective evaluation of the chosen models.
This work builds upon the framework by Claeskens and Hjort (Claeskens & Hjort, 2003), a selection method that does not seek a model-wise balance between efficiency and interpretability, but instead the focus is to obtain estimates for the parameter of interest with minimum possible bias (interpretability) and efficient statistical inference (variance), the Focused Information Criterion. In this case the parameter of interest or focus parameter is the survival curve and the aim is to select a parametric model with that has better or equal bias variance trade off that the non-parametric Kaplan-Meier estimator (Jullum & Hjort, 2017; Jullum & Hjort, 2019). Naturally, this does not completely de-risk extrapolation, but assures that the extrapolation is based on a model with adequate fit to the data. The adequacy in this case is the nearness to the unbiased Kaplan-Meier curve. This is in line with the decision making framework by Latimer (Latimer, 2013) and aims to replace or augment the visual inspection of survival curves or log-cumulative hazards. Given the relationship between cumulative hazards and survival curves decision made on either suffices. Using a rigorous numerical evaluation, rather than a subjective visual inspection, can help determine the most suitable model for extrapolation.
