Abstract
The present study considers non-mixture models based on the discrete Burr XIII distribution to model recurrent event data in the presence of a cure fraction. In this context, we provide an alternative to the standard Cox proportional hazards model using a discretized distribution to analyze lifetime data assuming a non-mixture structure for cure rates. In a Bayesian setting, the proposed methodology was considered for analyzing a real dataset from a retrospective cohort study that aimed to evaluate specific clinical conditions that affect the lifetimes of 299 heart failure patients admitted to the Institute of Cardiology and Allied Hospital – Faisalabad, Pakistan (April-December, 2015). The model validation process was addressed using the Cox-Snell residuals, which allowed us to identify the suitability of the proposed non-mixture cure rate model.
Keywords
Introduction
In medical research, many studies are developed to evaluate how some subject-specific factors impact the probability of an exposed patient experiencing the event of interest (such as death, for example). Conversely, researchers may also be interested in understanding which conditions explain the existence of an underlying cluster of right-censored patients. In this case, it would be essential to access the mechanism responsible for characterizing “cured” patients. Standard survival analysis techniques, as the Cox proportional hazards model (Cox, 1972), provide no straightforward estimation for cure rates, which we acknowledge as the primary motivation for adopting mixture or non-mixture cure fraction model. In this way, several approaches have been proposed for modeling cure rates on univariate lifetime data (Farewell, 1982; De Angelis et al., 1999; Cancho & Bolfarine, 2001; Price & Manatunga, 2001; Yu et al., 2004; Yin & Ibrahim, 2005; Lambert et al., 2006; Lu, 2010; Othus et al., 2012; Achcar et al., 2012). The bivariate setting was treated by Wienke et al. (2003) and Wienke et al. (2006); in each case, the authors have introduced model-based structures for the analysis of bivariate time-to-event data cure rates. Besides, a general multivariate cure rate model was derived, and well-discussed by Cancho et al. (2016).
Lambert (2007) describes a cure rate model that incorporates the expected (or background) mortality for each subject, and so one can estimate cure fractions when some patients die of anything other than the investigated cause. Such an approach may be advantageous in practice since the information of cure at the individual level will rarely be available. Lambert also points out that, usually, the relative survival curve stays at a plateau level after several years in oncology studies. This plateau effect occurs when the mortality rate of diseased subjects is the same as the expected mortality rate in the general population, or equivalently, the excess mortality rate is equal to zero; that is, there is a cure fraction in the target population.
According to Vahidpour (2016), there are two approaches to modeling cure fractions in lifetime data. The first one is based on the mixture cure rate model, also known as the standard cure rate model (De Angelis et al., 1999; Tsodikov et al., 2003; Lambert et al., 2006), which has been widely applied in the analysis of survival datasets with some subjects that do not experience the event of interest before the end of the study. The second approach is based on non-mixture cure rate models, which are not very popular in survival analysis (Achcar et al., 2012; Vahidpour, 2016). In this way, let us denote
where
and so the hazard function (hf) of
where
The primary goal of this paper is to explore the structure of non-mixture cure fraction models for analyzing long-term lifetimes depending on potential risk factors. Specifically, we aim to introduce a non-mixture cure fraction model based on the Discrete Burr XIII (DB-XIII) distribution (Krishna & Pundir, 2009; Akdoğan et al., 2015, Kamari et al., 2015), whose continuous counterpart is widely used in medical applications nowadays, mainly due to the great flexibility of its hazard rate curves. As pointed out by Mazucheli et al. (2018), the DB-XIII distribution preserves some properties from its origin, and so we acknowledge that it may be interesting to consider such a model to analyze discrete lifetime datasets.
Several statistical techniques are commonly used in medical research to analyze univariate survival datasets with censored observations. Beyond the Cox proportional hazard model, we mention, in this context, the Kaplan-Meier estimator (Kaplan & Meier, 1958) for empirical survival functions and the non-parametric hypothesis tests for treatments comparison, such as Wilcoxon, Log-Rank, or Cox-Mantel tests. However, these methods may not provide accurate results when analyzing discrete lifetimes whose underlying population has long-term survivors. We acknowledge this fact as another excellent motivation for considering the DB-XIII model as the baseline distribution for characterizing lifetimes in survival studies.
By using the model to be proposed, the second goal of this paper is to model data from a retrospective cohort study conducted by Ahmad et al. (2017), which consists of 299 patients with cardiovascular heart failure, who were admitted to the Institute of Cardiology and Allied Hospital – Faisalabad, Pakistan (April-December, 2015). The records include patients aged 40 years or above, having left ventricular systolic dysfunction, belonging to the New York Heart Association (NYHA) classes III and IV. We have considered data of such nature since heart failure includes coronary heart disease, diabetes, high blood pressure, and other conditions that arouse great interest in the medical field.
According to Denolin et al. (1983), heart failure is the pathophysiologic state in which an abnormality of cardiac function is responsible for the failure of the heart to pump blood at a rate commensurate with the requirements of the metabolizing tissues. It is essential to point out that not all conditions that lead to cardiovascular heart failure can be reversed, but treatments can improve the signs and the related symptoms. One of these conditions, for example, is the myocardial ischemia (Buja, 2005) that results from severe impairment of coronary blood supply and produces a spectrum of clinical syndromes. For more details of those irreversible heart conditions and the epidemiology of heart failure, the interested reader should consult (Kannel & Belanger, 1991).
This paper is organized as follows. In Section 2, we present the non-mixture DB-XIII cure rate model and its regression structure. We also describe the adopted Bayesian approach, the associated numerical procedures considered for inferential purposes, and the model validation approach based on the Cox-Snell residuals. In Section 3, we assess the empirical properties of the Bayesian estimators through an intensive Monte Carlo simulation study. In Section 4, we present an exploratory data evaluation of the cardiovascular heart failure dataset, and, in the following, the proposed cure rate model is applied for identifying (possible) risk factors for heart failure. General comments and concluding remarks are addressed in Section 5.
Non-mixture DB-XIII cure rate model
A discrete random variable
for
A non-mixture cure rate model based on the DB-XIII distribution has baseline cdf given by
and so we have
where
In order to evaluate the effect of covariates (and risk factors) on the probability of a subject being a long-term survivor, we have to specify a monotonic, invertible, and twice differentiable link function, say
but the users can adopt other specifications (e.g., probit, log-log, cauchit, among others).
Let
where
In a Bayesian framework, one may assume that no specialized information is available to justify the choice of informative priors for the model parameters. In this context, we specify prior distributions such that, even for moderate sample sizes, the information provided by the data should dominate the prior information. The non-informative prior distributions adopted in this work are given by
where
From this point of view, inferences for vector
Step 1: Choose an initial value Step 2: Generate Step 3: Generate Step 4: Generate Step 5: Repeat Steps 2–4 Step 6: Compute the component-wise Monte Carlo posterior estimate of
where
To apply the proposed Bayesian approach to fit the non-mixture DB-XIII cure rate model, we have adopted the MwG algorithm for MCMC sampling. For each generated sample, a chain with
Model validation procedures play an essential role when evaluating the suitability of any fitted model. In general, residual metrics are widely used in such a context, being those measures responsible for indicating departures from the underlying model assumptions by quantifying the data variability that the fitted model did not accommodate. Assessing model suitability using residual metrics is common nowadays; however, deriving appropriate residuals is not always easy for non-normal models typically adopted for describing cure rates in survival studies. In this way, we will consider here a popular residual metric proposed by Cox and Snell (Cox & Snell, 1968), which can be straightforwardly used in our context to assess the appropriateness of the proposed model when used in the analysis of real datasets. The Cox-Snell residuals are defined by
According to Cox and Snell (1968), if the obtained model fit is adequate, then the Cox-Snell residuals should follow an Exponential distribution with mean equals 1 (if
In this work, we have adopted
The overall procedure for obtaining the Cox-Snell residuals is summarized in the following.
Step 1: Fit the adopted model; Step 2: Estimate the Cox-Snell residuals, Step 3: Estimate the empirical survival function of the Cox-Snell residuals, Step 4: Compute Step 5: Plot
After following those steps, one should evaluate the results carefully. In general, if the displayed dots are close to a straight line with unit slope and intercept equals 0, then the fitted model can be considered adequate.
The empirical properties of an estimator can be accessed through Monte Carlo simulations. In this way, we have conducted an intensive simulation study aiming to evaluate the performance of the proposed Bayesian approach in some specific situations. The simulation process was carry out by generating 500 pseudo-random samples of sizes
The inverse-transform method for discrete distributions (Rubinstein & Kroese, 2008) was considered in the process of generating the pseudo-random samples. As previously described, the adopted Bayesian point estimator is the posterior expected value, and here we have evaluated the performance of such an estimator by assessing its bias (B) and its mean squared error (MSE). Thus, using samples of the model parameters, one may obtain approximate Monte Carlo estimators for these measures as
where
where
The obtained results are presented in Table 1. We have noticed in our study that the biases of
Cardiovascular heart failure
Exploratory data analysis
As previously described, one of the goals of this work is to analyze data from a retrospective cohort study (Ahmad et al., 2017) related to 299 patients with heart failure comprising of 105 women (35%) and 194 men (65%) with more than 40 years old and had left ventricular systolic dysfunction, lying in the NYHA classes III and IV. The follow-up time was 4–285 days, with an average of 130 days. As potential risk factors to explain mortality caused by cardiovascular heart disease, the following variables were considered: age, serum sodium, serum creatinine, gender, smoking, blood pressure (BP), ejection fraction (EF), anemia, platelets, creatinine phosphokinase (CPK) and diabetes. The baseline characteristics for dead and censored patients diagnosed with cardiovascular heart disease are summarized in Table 2.
Results of the simulation study for the non-mixture DB-XIII cure rate model
Results of the simulation study for the non-mixture DB-XIII cure rate model
Baseline characteristics for dead and censored patients diagnosed with cardiovascular heart disease
Kaplan-Meier plots for the OS function (left-panel) and the EF-based survival estimates (right-panel).
Figure 1 depicts the Kaplan-Meier plots of the overall survival (OS) function and the survival of patients with EF
The estimated empirical survival functions illustrate a typical case where one should consider cure rate models in the data analysis. Thus, in the present study, we aimed to identify which risk factors are more likely to affect the chance of cure, describing the nature of the probabilities using the non-mixture DB-XIII cure rate model, which accommodates long-term survivors. In this context, accounting for cure fraction is crucial as it allows us to evaluate the different influences of risk factors on the behavior of cured patients (evidenced by a significant odds ratio) and which risk levels may lead patients to death (evidenced by a non-significant hazard ratio). Therefore, for the
where
This section is dedicated to presenting and discussing the main results obtained from the fit of the non-mixture DB-XIII cure rate model adopted for analyzing the cardiovascular heart disease dataset, using the regression structure provided by Eq. (5). Before fitting the proposed model, we were concerned about the multicollinearity issue, so we have performed a preliminary analysis to evaluate whether the covariates are highly correlated to each other. Thus, in Fig. 2 we present a graphical display of the covariates’ correlation matrix based on the Spearman rank coefficient. One can notice that Gender and Smoking Habit are the most correlated covariates, but the coefficient’ magnitude (
Parameter estimates and 95% HPD intervals from the fitted non-mixture DB-XIII cure rate regression model
Parameter estimates and 95% HPD intervals from the fitted non-mixture DB-XIII cure rate regression model
Graphical display of the covariates’ Spearman Rank correlation matrix.
After running the MwG algorithm and performing tests for diagnosing convergence, the stationarity of the generated chains (after burn-in) was revealed. The obtained Bayesian results are summarized in Table 3, where the posterior parameter estimates and the 95% highest posterior density (HPD) intervals are presented. From the results displayed in Table 3, one can notice that age and EF are statistically significant risk factors for developing cardiovascular heart disease. In this setting, the obtained results suggest that increasing age represents a higher risk for the disease since the estimated hazard ratio is 5.731
Half-Normal plot with simulated envelope for the Cox-Snell residuals.
Our findings are corroborated by Solomon et al. (2005) that investigated the influence of ejection fraction on cardiovascular upshots in heart failure patients. In their study, the authors showed that left ventricular ejection fraction (LVEF) is a powerful predictor for heart failure across a broad spectrum of ventricular functions; however, such a measure does not further assess cardiovascular risks of heart failure patients with EF
Watson et al. (2015) used MicroRNA signatures to differentiate preserved from reduced EF, independent of an echocardiography exam, which constitutes a significant challenge for the medical community heart studies. On the other hand, Reddy et al. (2018) developed noninvasive diagnostic criteria for preserved EF on euvolemic patients with dyspnea since diagnosis of cardiovascular heart failure is admittedly challenging in these cases.
Finally, compared to the results obtained by Ahmad et al. (2017) using the standard Cox proportional hazards model, our results are notably different when it comes to the effect of the risk factors. In the Cox model, we have the following covariates as statistically significant: age, EF, serum creatinine, serum sodium, and anemia. Ahmad et al. (2017) interpreted the results regarding the increasing age similarly to our’s previous conclusion. As for EF, they have pointed out that the hazard rate between patients with EF
This work introduced a non-mixture cure fraction model based on the DB-XIII distribution as an alternative to the standard Cox proportional hazard model. The proposed model was considered to analyze a medical survival dataset related to long-term heart failure patients with left ventricular systolic dysfunction. We found promising results in discovering the significant risk factors affecting the survival times and the subjects’ cure rate.
We have considered a fully Bayesian approach for model estimation. The adopted method was based on the MwG algorithm for sampling pseudo-random values from the posterior distribution of model parameters. We acknowledge the remarkable computational simplicity of estimating cure rate models baseline by the DB-XIII distribution. Furthermore, Bayesian techniques allow incorporating prior information from experts, leading to much more insightful inferential results. Besides, such methods also provide straightforward interpretations based on the estimated model parameters, which is a prominent concern in medical applications.
Using the standard Cox model for the same dataset, we found the results obtained by Ahmad et al. (2017) quite different from ours. This fact is directly related to adopting a regression structure for cure fractions based on a flexible discrete distribution. Our study identified only two significant factors for cardiovascular heart failure: age and EF. Noticeably, the observed differences highlight the importance of choosing adequate statistical models, especially in clinical studies, as the widely used Cox proportional hazards model does not accommodate cure fractions. In general, accounting for cure rates may provide much more accurate model-based results when the empirical survival function forms a plateau, so indicating the existence of long-term survivors.
Footnotes
Acknowledgments
We thank the Co-Editor-in-Chief and the anonymous referee for the careful reading and thoughtful suggestions for improving this work’s content.
