Abstract
Recurrent event outcomes are ubiquitous among clinical trial data which encourages a conventional approach to analysis. Yet a common feature of these data has received less attention, that is, survival times often comprise multiple types of events that may imply a disparity in cost and disease severity. Typically, we neglect this feature of the data by combining event-types or analyzing each type separately, thus ignoring any interdependence among them. This practice may reflect a dearth of readily available methods and software that more appropriately acknowledge the true data structure. We provide a review of the literature on multitype recurrent events and frailty modelling which reflects a renewed interest in the topic over the past decade and the emergence of software for estimation. Thus, a review of available methods seems timely, if not overdue.
Introduction and motivating examples
In clinical trials of chronic diseases, we often have survival data including recurrent events that reflect disease progression and quality of life. The events may be upsetting and inhibiting for patients, such as migraines, hypoglycemic episodes in diabetes, hospital readmission in heart failure and so forth. These events may be classified by type, for example, according to severity for migraines, or whether hypoglycemia occurs during the daytime or nighttime, or whether the heart failure patient is admitted to the emergency department (ED) or hospital. Note that in all of these examples the occurrence of one type of event does not preclude the subsequent occurrence of another, that is, they are not competing risks, yet event-types may be interdependent. Recurrent events will often exhibit types (whether nominal or ordinal) in which case we can describe them as multitype recurrent events (MTREs). To further illustrate this feature of survival data, we describe several instances of MTREs we have encountered in our work.
In the mid-1990s, the Nambour Prevention Trial explored the effect of sunscreen use and beta-carotene on the incidence of nonmelanoma skin cancer in a high risk population in Northern Queensland, Australia (Green et al. 1999). About 1600 participants were randomized in a 2 × 2 factorial design, with new skin cancers detected at regular follow-up visits. Two types of skin cancers were noted: basal cell carcinoma (BCC) and squamous cell carcinoma (SCC). These are distinct histological types with different prognoses and implications (SCCs are a more serious and less common lesion) and, thus, we may not want to combine these event-types using a time-to-first BCC/SCC analysis. Thus, event-types are analyzed separately, using Cox regression, despite the very low failure rate (high censoring rate) especially for SCCs (>90%, possibly influenced by poor compliance), which renders statistical significance unattainable.
A meta-analysis of nine parallel-group and crossover studies (1674 patients) was carried out comparing rates of hypoglycemia between human insulin 30 and biphasic insulin aspart 30 (an insulin analog) in patients with Type 2 diabetes (Davidson et al. 2009). Using negative binomial regression, hypoglycemic episodes were analyzed overall and separately for major, minor and symptoms only, and also for nocturnal and daytime hypoglycemia. Patients may experience multiple episodes during the study period and the rate varies for the different event-types, for example, major episodes are less common than minor episodes. The inverse variance method was used to combine treatment effect estimates from the parallel-group and crossover studies. Although rates of overall episodes were not significantly different between the treatment groups, rates were considerably different for nocturnal and major episodes (these are particularly worrying for patients), although their incidence is low even in a meta-analysis of nine studies (14% and 2%, respectively). Overall, this approach seems ambivalent and simplistic.
Example heart failure readmissions data
Example heart failure readmissions data
More recently, we examined the prognostic value of electrocardiogram (ECG) parameters with respect to heart failure related readmissions in a Canadian cohort of 900 patients with acute heart failure (Gouda et al. 2016). We have up to five years of readmissions data for patients who were recruited in the ED. Roughly 30% of patients had at least one ED visit and 20% had at least one hospital visit during the follow-up period (i.e., after discharge). The study design itself insists a more than nominal distinction between ED and hospital admissions, implying different costs, length of stay and severity. Using Cox regression to analyse time-to-events, patients with and without atrial fibrillation at the index visit showed no difference in ED visits (
When confronted with data like those from these three studies, the statistician will typically either (a) consider event-types separately or (b) analyze them together. Regarding (a), if event-types are not independent, that is, if the occurrence of one type is suggestive of the risk of another type, then separate analyses are inadequate. Also, data may be sparse when separated out leading to equivocal results which may tempt us to consider analyzing types together. Regarding (b), although different event-types may all reflect disease progression, treating them as identical denies a potential disparity in cost, relationship to future outcomes and disease severity. Also, a strong treatment effect on one event-type will be diluted when combined with another event-type that shows a weaker effect. Thus, both approaches seem crude: we lose information if we analyze separately and we mask information if we combine them together. A more optimal modelling approach is needed.
Although a number of review articles of recurrent events analysis are available (Pandeya et al. 2005; Cook and Lawless 2007; Castaneda and Gerritse 2010; Rogers et al. 2014b) as far as we can tell no such overview exists for MTREs. A literature search reveals that models incorporating event-types are scarcely evident in medical journals reporting the results of clinical trials, with the topic confined to statistics journals. This implies limited uptake of the methods proposed; perhaps they are considered too esoteric or difficult to implement? Even among the statistics journals there appears to be a dearth of relevant research compared to the long, continuous publication history for univariate recurrent events. (Chen et al. 2012) stated that ‘statistical methods for handling multiple type recurrent events are relatively limited’ and (Zhu et al. 2010) noted that ‘there does not seem to exist an established method’. Although, with the publication of recent monographs (Cook and Lawless 2007) and programming code made available by authors (Rondeau et al. 2012), there now appears to be renewed interest in the topic (Li et al. 2015) and fewer obstacles to implementation. We focus our attention on proportional hazards frailty models because they are prevalent and a simple extension of more familiar models.
In the next section, we introduce the concept of frailty modelling and present an MTRE model by showing how it relates to common models for the analysis of recurrent events data. (The literature on recurrent events is extensive and we do not intend to provide an overview of that here.) In the third section, we describe multivariate frailty models for MTREs and then recent work regarding extensions of these models for particular scenarios. Finally, we cover software applications and use an example to illustrate how we would fit multivariate frailty model for MTREs in SAS.
Survival data are ubiquitous in clinical trials and this encourages a conventional approach to analysis (Chun et al. 2012). For example, a Cox proportional hazards regression model (Cox 1972) is typically used to examine the influence of covariates such as treatment on survival outcomes.
Yet the Andersen–Gill model assumes events within a patient are independent, meaning that the likelihood of an event at time
Despite these extensions to the original Cox model, an important feature of the event data is still ignored: readmissions are comprised of different types (e.g., ED and hospital visits), and the common approach of combining them should be questioned. Thus, we further extend our model by including random effects for event-types (
The relatedness of the recurrent events models given earlier is apparent from their form, that is, intensity-based, non-homogeneous Poisson process models (NHPP)—non-homogenous because the intensity is not constant in time. The
The assumption of independent events for the Andersen–Gill model implies that the baseline intensity is common for all events; thus a new event is unaffected by earlier events experienced by the patient (an assumption that is obviously violated in the context of heart failure hospitalizations). This differs from model (2.2) which includes random effects for patients (
The intensity-based frailty model (2.3) was described by Abu-Libdeh et al. (1990) (henceforth we will simply refer to as Abu-Libdeh) for the analysis of nonmelanoma skin cancer incidence in a randomized clinical trial of selenium supplements and includes random effects for patients
A number of probability distributions have been adopted in practice (see Lindeboom and Van Den Berg (1994) for some background). According to Cook and Lawless (2007), considerations for the choice of distribution include ‘tractability of the integral..., properties of the full intensity function, and the availability of software’ (see section ahead) and for event-types we desire a multivariate distribution. Abu-Libdeh assumed a gamma distribution for the patient effects (as per negative binomial regression) and a multivariate Dirichlet distribution for event-type effects (like the gamma distribution, the Dirichlet distribution is often used as a prior in Bayesian statistics). With two event-types (e.g., ED and hospitalization), the Dirichlet reduces to a beta distribution. However, it can be shown that, owing to the Dirichlet distribution (with
A good option for our purpose is the multivariate log-normal distribution which is promoted and spelled out by Cook and Lawless (2007) for multiple event-types. It is more apt than the gamma when we have multivariate, correlated frailties. The random effects reflect correlation among events for a patient and the correlation between event-types is deduced from the multivariate distribution. Consider then, this alternative model:
Regarding the formulation of model, as in model (2.3), we may have a common
In this review we have so far neglected the likely possibility of a terminal event (competing risk) which adds further complexity to our model, that is, the follow-up time cannot be treated as independent of the recurrent event process. In the examples cited in the ‘Introduction’, it may be reasonable to treat deaths as uninformative censoring in one case, for example, nonmelanoma skin cancer incidence (thus, the competing risk of death is absent in Abu-Libdeh's analysis), but not in another, for example, repeat heart failure related hospitalizations (see Figure 1 where death follows readmissions). Such terminal events preclude further observations of the event of interest and in the latter example we should not treat these as censored survival times since the likelihood of death increases with the number of ED and hospital visits (Lee et al. 2009); thus these processes are not independent and we would like to account for the association between them (to avoid biased estimates). Other examples of terminal events may include consequences of ineffective treatment such as dropout due to adverse events or a switching of treatment. Liu and Huang (2008) state that ‘over the past decade, there has been a growing research interest in modelling correlated failure times in the presence of informative dropout or a dependent terminal event such as death’ which leads us again to the concept of frailty and random effects.
To introduce the ‘joint frailty’ model, let us first consider the simple case without multiple event-types. Joint frailty models are becoming increasingly popular as a means of accounting for terminal events in heart failure (e.g., Rogers et al. 2014b who promote its use, and Greenberg et al. 2014), and thus frailty as a concept and means of associating events is becoming familiar to clinicians. As Rogers et al. explain: ‘a common frailty term, which can be thought of as an unmeasured indication of the severity of illness that affects both hospitalisation rate and hazard for [cardiovascular] death, induces an association between the two processes’ (2014a). Effectively, this means the processes are jointly modelled and thus our clinical understanding of the recurrent events process and mortality is enhanced. Bear in mind we are now accounting for dependence in three ways: between recurrent events within patients, between different types of recurrent events and between recurrent events and the terminal event. Rogers et al. employed a model for which ‘the individual-specific frailties are assumed to affect the rate of heart failure hospitalisations and the hazard for cardiovascular death in the same way’ and this assumption could be challenged (Rogers et al. 2014b). However, a number of authors have described approaches that allow the frailty to differ between these processes (Mazroui et al. 2013). For the joint and shared frailty, both gamma (Mazroui et al. 2012; Rogers et al. 2014b) and log-normal (McGilchrist and Aisbett 1991; Belot et al. 2014) distributions are typically assumed; perhaps gamma is more common because it yields ‘relatively tractable likelihoods’ (Cook and Lawless 2007), although with the extra complexity of this model the marginal likelihood does not have a closed form and thus ‘using other distributions for the frailty, such as log normal... will not induce more difficulties’ (Rondeau et al. 2007). Suffice to say, the choice tends to be a mathematical one rather than a biological one (Wienke et al. 2003). In any case, authors have confirmed that results are robust to a misspecification of the frailty distribution (Pickles and Crouchley 1995; Mazroui et al. 2012; Belot et al. 2014). Note that if the processes are independent (or rather dependence is entirely captured by the covariates), then the distributional parameter will be close to zero and we may adopt the simpler model. Also, if the terminal event rate is low, we may opt for a more parsimonious model because, for example, the results can be more easily presented.
For the case of MTREs and a terminal event see Cook and Lawless (2007). An MTRE model like (3.1) which incorporates a terminal event as a joint frailty may be written as follows:
Further, recent elaborations of the MTREs model have appeared in the literature such as the simultaneous modelling of longitudinal outcomes (Musoro et al. 2015), time-varying coefficients (Mazroui et al. 2016), ratios of intensity functions for types (Ning et al. 2017), interval censored data (Chen et al. 2005), gap time analysis (Lawless et al. 2001), a Bayesian approach (Lin et al. 2017) and handling of missing event-types (Chen and Cook 2009; Dewanji and Sengupta 2003; Lin et al. 2013). We may also require random effects for clustering levels such as patients within centres (Schaubel and Cai, 2005). Typically, such multi-level models lead to intractable likelihoods with high dimensional integration, that is, the frailty terms cannot be integrated out to obtain a closed form of the marginal likelihood (see (Duchateau and Janssen 2008) for multi-level frailty models). For these more complex frailty modelling approaches, we will want to know whether software is available or the statistician must code their analysis from scratch according to their specific requirements.
Developments described earlier, which have appeared since the publication of Abu-Libdeh's original random effects model, coincide with the increasing availability of more able statistical software required for the estimation of parameters, although ‘the current estimation methods for frailty proportional hazards models are not very satisfactory’ (Liu and Huang 2008) and ‘available software is limited’ (Rondeau and Gonzalez 2005). A majority of the papers cited earlier emerged in the recent literature, that is, within the last five years, although the topic of MTREs has seen renewed interest over the past decade: the statistical monograph of time-to-event data by Cook and Lawless (2007) (cited earlier) with a dedicated chapter on the topic is from 2007. Statistical software, naturally, lags behind. However, in a simple case such as the heart failure study where there are two event-types (ED and hospital), and with the baseline intensity specified (e.g., Weibull), we can use standard likelihood methods to obtain the parameter estimates and inference is straightforward, as follows.
The MTRE model (2.3) is a conditional model, events are independent given the random effects (since the random effects explain the dependence between events), and thus the conditional likelihood may be written out by hand as if the frailties were observed. The marginal likelihood is obtained by integrating, that is, averaging over the random effects (see Collett 2003, Section 11.3) to obtain a likelihood equation that contains the distributional parameters for the random effects and the parameters of interest (i.e.,
In cases that are not so straightforward, for example, when we have more event-types, or multivariate lognormal frailties, or incorporating a terminal event, the marginal likelihood contains high dimensional integrals, analytical integration is not possible and other more sophisticated methods (numerical integration or approximations) are required. The Bayesian approach seems a natural choice, a key feature of which is that parameters are unknown and described by a probability distribution (a prior distribution) like the random effects in the frailty model. Chen et al. (2005) postulate a model for interval censored data with multivariate lognormal frailties and implemented according to a Gibbs sampling algorithm. Likewise Lin et al. (also cited earlier) with three event-types and a terminal event used a Bayesian approach (Lin et al. 2017) (flat priors are assumed, e.g.,
Illustration: Heart failure readmissions
In this section, we describe the thought process for planning and implementing an MTRE analysis. Consider the heart failure study described in the ‘Introduction’ with patients recruited when they presented at the ED. Heart failure related ED and hospital visits subsequent to discharge are the event-types of interest. ED visits are more subjective and we can expect they will show a higher incidence. We examined the predictive ability of atrial fibrillation at the index visit for subsequent ED and hospital visits. A simple analysis of such data using Cox regression of time-to-first readmission or death has been presented elsewhere (Gouda et al. 2016).
The model for repeat visits is as follows:
The random effects, or ‘frailties’, for event-types (
For all-cause mortality, a joint frailty model is assumed:
The correlation
Using additional estimate statements we may test the null hypothesis of no association between mortality and the event types, for example,
Hazard ratio estimates and
-values from the MTRE analysis
Hazard ratio estimates and
-values from the MTRE analysis
The heart failure data are simple in that we have only two event-types. We may wish to extend our analysis of readmissions, for example, by including other primary diagnoses rather than restricting to heart failure (since, e.g., acute myocardial infarction increases the risk of heart failure (Velagaleti et al. 2008) and heart failure may comprise less than half of the total visits experienced (Babayan et al. 2003)). Alternatively, readmissions may be extended to three classifications: ED, ED to hospital (transferred to hospital on the same day or the following day) and hospital. In cardiovascular research, in general, we may encounter more than two event-types, for example, transient ischemic attacks or stroke may be classified by location. Likewise, Abu-Libdeh contemplated incorporating site of lesion into their analysis of BCCs and SCCs. We would, therefore, need to extend the model for more than two event-types, and perhaps considerably more. Data could become thin for these additional types and convergence may become difficult to attain.
The simple modelling approach, that is, a Cox regression of time-to-first event, remains a popular choice in clinical trials of chronic diseases. However, there is a recent push to include all events in the analysis and this has prompted a comparison of modelling strategies to identify which among them is optimal (Rogers et al. 2014b). Models for MTREs have been absent from this discussion even though recurrent events often imply some categorization of events into types. There may be extraneous or practical reasons that dissuade statisticians from allocating an MTRE model as the primary analysis. For example, despite the assistance offered by recent publications and statistical programmers, the statistician is required to invest more time in the early stages of the analysis. Also, a time-to-first analysis provides a simple basis for a power calculation; an MTRE model on the other hand requires data simulations. The time-to-first analysis, by amalgamating outcomes, obviates the issue of multiple testing and the adjustment of the nominal significance level (although an MTRE model could produce a weighted average across outcomes to estimate the net effect too). There may also be some resistance to unfamiliar, esoteric results when hoping to reach a broad audience.
Nevertheless, with the increasing relevance of recurrent events and MTREs in various clinical contexts, and with implementation using standard software now a straightforward matter, it will become increasingly difficult to neglect MTRE modelling. By providing a more complete characterization of the data and an increase in statistical efficiency, MTRE modelling could alter the conclusions drawn from clinical trial or registry data. If it is difficult to pre-specify a distribution for the random effects for event-types, or convergence is a concern, then the MTRE model could provide a supportive or sensitivity analysis, bearing in mind the trade-off between statistical rigour and cogency of results. Ultimately, of course, it will depend on the particular circumstances (e.g., the event rate, extent of follow-up, reliability of data collection, etc.) and the purpose of the analysis. A simulation study evaluating how MTRE models perform under differing circumstances would certainly be informative and aid this decision.
In the introduction, we quoted a Lancet paper where nonmelanoma skin cancers (SCC and BCC) were analyzed separately, yet we see Abu-Libdeh analyzing nonmelanoma skin cancer incidence using an MTRE model. This inconsistency implies more than a mere preference for one statistical approach over another. It implies a contradictory understanding of the clinical condition. Today, with increasing attention given to the associations among event-types, the justification for analyzing separately may no longer be readily accepted. A 2002 paper by the eminent cardiologist Dr Salam Yusuf was titled ‘Choice of Clinical Outcomes in Randomized Trials of Heart Failure Therapies: Disease-specific or Overall Outcomes?’ (Yusuf and Negassa 2002). One concern being that when all hospitalizations are combined it can mask a treatment effect that would be seen on disease-specific outcomes. Although it seems there is a third option we are neglecting, that is, the MTRE model.
