Abstract
Abstract:
Longitudinal and survival data are often collected from clinical studies. Mixed-effects joint models are commonly used for the analysis of such data. Nevertheless, the following issues may arise in longitudinal survival data analysis: (a) most joint models assume a simple parametric mixed-effects model for longitudinal outcome, which may obscure the important relationship between response and covariates; (b) clinical data often exhibits asymmetry so that symmetric assumption for model errors may lead to biased estimation of parameters; (c) response may be missing and missingness may be informative. There is little work concerning all of these issues simultaneously. We develop a Bayesian varying coefficient mixed-effects joint model with skewness and missingness to study the simultaneous influence of these features. The proposed methods are applied to an AIDS clinical data. Simulation studies are conducted to assess the performance of the method.
Keywords
Introduction
Joint analysis of longitudinal and survival data has received a great deal of attention in clinical trial studies. Joint modelling and inferential methods have been proposed for analyzing such type of data under various scenarios (Faucett and Thomas, 1996; Song et al., 2002; Brown and Ibrahim, 2003; Elashoff et al., 2008; Hu et al., 2009; Huang et al., 2011; Li et al., 2012). Most of these joint models assume a mixed-effects model for longitudinal process and a proportional hazard (PH) model for competing risks process. There is little work on analyzing the simultaneous impacts induced by asymmetry and non-ignorable missingness which are commonly observed in practice.
In AIDS studies, it is interesting to study the relationship between two important biomarkers: HIV viral load and CD4 counts. Figure 1 illustrates the correlation between viral load and CD4 counts at different time points from a Multicenter AIDS Cohort Study (MACS) (Section 2.1 has the details of this study). It is clear that the relationship between viral load and CD4 counts is varying across the study period. Therefore, a varying coefficient mixed-effects model is more appropriate than a parametric model for the purpose of investigating the changing relationship.
A common assumption for most models is that model errors are symmetrically distributed, which may lack robustness against departures from normality and outliers. Misleading results may occur by making inferences based on symmetric assumptions (Sahu et al., 2003; Ho and Lin, 2010; Lachos et al., 2011). Although a transformation such as logarithm or square root may be adopted for the skewed data which makes feasible the analysis with symmetric assumption, the interpretation of results may be difficult. Sometime, even transformation does not make the validity of symmetric distribution under transformed scale. Figure 2 displays the distribution of residuals based on OLS estimates of the subject-specific regression from 1598 subjects in the MACS study. It is noted that the model errors are highly skewed. Thus, a normality assumption is not quite realistic for this dataset. To model skewed data, it is more appropriate to adopt asymmetric distributions such as skew-t (ST) and skew-normal (SN) distributions. The validity of any statistical inference is based on the assumption that variables that are intended to be measured are actually measured without missingness. However, due to many factors, data are never perfectly collected. Although clinical studies are designed to collect data from each patient at each assessment time, missing observations in the response are often encountered due to many reasons. Missingness may be informative or non-ignorable if it is related to missing values. Without accounting for missingness in analysis may make the inferential results questionable. Many likelihood-based methods have been proposed for handling non-ignorably missing responses. Approaches using selection models include Baker (1995), Fitzmaurice et al. (1996), Molenberghs et al. (1997) and Ibrahim et al. (2001), among others. Approaches based on pattern-mixture models have been taken by Ekholm and Skinner (1998), Fitzmaurice and Laird (2000) and Birmingham and Fitzmaurice (2002), among others. Follmann and Wu (1995) propose a class of shared parameter models for non-ignorable non-response that can be specified as a random effects model for the primary response, combined with a model for the missingness in which the random effects are treated as covariates. They condition on the data that describes missingness and use a conditional model for inference. Rotnitzky et al. (1998) develop a class of estimators for generalized linear models (GLMs) with non-ignorable missing responses that are based on inverse probability weighted estimating equations.
It is unclear how asymmetry and missing observations which are commonly seen in longitudinal-survival data may interact and influence the inferential procedure, especially under the non-parametric framework. We propose a varying coefficient mixed-effects joint model to account for longitudinal survival data with asymmetry and non-ignorable missingness. We develop a Bayesian approach to simultaneously estimate parameters in the joint model based on the ST distribution. Note that normal (N) and skew-normal (SN) distributions are special cases of ST distribution. When degrees of freedom approach infinity, ST distribution is approximate to SN distribution. If the skewness parameter is zero, SN distribution degenerates to normal distribution. Therefore, we built the joint model using ST distributions since it includes important special cases such as SN and normal distributions. We consider multivariate ST and SN distributions (Sahu et al., 2003), which are convenient for Bayesian inference. Appendix 1 briefly discusses these distributions.
Illustration of varying relationship between viral load and CD4 counts at different times in the MACS. A simple linear regression model was used to quantify their correlation at different study times. Data (’+’) are overlaid with the regression line. The estimates of intercept and slope as well as confidence interval are presented above each plot
Illustration of varying relationship between viral load and CD4 counts at different times in the MACS. A simple linear regression model was used to quantify their correlation at different study times. Data (’+’) are overlaid with the regression line. The estimates of intercept and slope as well as confidence interval are presented above each plot
The rest of article is organized as follows. In Section 2, we describe the dataset that motivated this research and the joint statistical models that account for skewness and missingness are introduced. The Bayesian approach that estimates the parameters in the joint model is presented in Section 3. In Section 4, the proposed joint models and inferential method are applied to MACS data and analysis results are presented. To validate the proposed model and method, simulation studies are carried out in Section 5. The article is concluded with discussion in Section 6.
Motivating data
The data that motivates this study was gathered in MACS, an ongoing prospective study of the natural and treated histories of HIV-1 infection in homosexual and bisexual men conducted by multiple sites (Baltimore, Chicago, Pittsburgh and Los Angeles). A total of 6 972 men at risk of HIV infection have been enrolled in three cohorts (first cohort
Figure 1 shows that the relationship between viral load and CD4 counts varies across time during the treatment. Thus, the relationship can not be modelled by a parametrical function. We adopt a varying coefficient mixed-effect joint model in which the non-linear relationship between viral load and CD4 counts is modelled non-parametrially. Further more, Figure 2 shows that the model errors are highly skewed. Therefore, a normality assumption is not quite realistic for this dataset. Alternatively, we adopt an asymmetric ST distribution (Azzalini and Capitanio, 2003; Sahu et al., 2003; Arellano-Valle and Genton, 2005; Azzalini and Genton, 2008; Jara et al., 2008; Ho and Lin, 2010) rather than a symmetric normal distribution.
Histogram of residual of OLS estimates
Note: The skewed distribution of residuals based on OLS estimates of the subject-specific regression from 6 972 patients in the MACS study
Histogram of residual of OLS estimates
We denote the viral load and CD4 counts for the
In the follow up of a study, a subject may experience multiple types of failure or could be right censored. We assume that there are
The corresponding survival function for the
In clinical studies, it is very common that some patients may miss scheduled visits due to drug resistance or other reasons. The probability of missing data may be related to the missing values which renders informative or non-ignorable missingness. Statistical inferences ignoring missing data mechanism may lead to biased estimation when missing mechanism is informative or non-ignorable (Little and Rubin, 2002). To achieve reliable estimates of parameters, we consider a missing data model which relates the missingness in response to the missing values:
The previously discussed longitudinal process, time-to-event process and missing data process are inherently connected. The computational cost associated with the joint likelihood for the longitudinal data with skewness as well as the competing risks survival data is usually high. Often times, the frequentist's approach fails to converge (Wu, 2002). We propose a fully Bayesian approach for the joint models (2.1), (2.5) and (2.7) which handle longitudinal-competing risk survival data with skew distribution and missingness. The Markov chain Monte Carlo (MCMC) technique is employed to estimate all parameters simultaneously based on the joint likelihood.
Through introduction of a
In the Bayesian framework, we specify the models for the observed data and the prior distributions for the unknown model parameters. Subsequently, based on the posterior distributions, we make statistical inference for the unknown parameters. We denote
We denote
Model specification
The dataset that motivates this study was described in Section (2.1). As shown in Figure 1, the relationship between viral load and CD4 counts is varying during the treatment. In addition, the viral load in log
By comparing Models N through ST, we test if the skew distribution assumed for error terms works better than that of symmetric distribution.
To approximate the time-varying coefficients in model (2.4), we use the same basis functions for
In terms of MACS data, we are interested in two cause-specific time-to-events: HIV death and other death. Thus, the competing risk hazard models for the two risks are:
We take following prior distribution for the parameters in the joint models: (a) fixed-effects are taken to be independent normal distributions
The proposed method is implemented with WinBUGS software (Lunn et al., 2000). The MCMC scheme iterates between the Gibbs sampler and the M–H algorithm for drawing samples from posterior distributions of parameters in the joint models. Based on MCMC samples, we are able to draw statistical inference for interested parameters. Specifically, we are interested in the posterior means (PM) and quantiles. Convergence of MCMC algorithm is assessed by standard tools in WinBUGS software such as trace plots and Gelman–Rubin diagnostics (Gelman and Rubin, 1992). We will run multiple chains. After convergency, we keep sampling for certain period and extract samples after every few iterations. The collected samples will be used to make inferences for the posterior distribution of interested parameters.
When convergence was achieved, for each of the three chains, after an initial number of 50 000 burn-in iterations, every 50th MCMC sample is retained from the next 50 000. Thus, we obtain 3 000 samples of targeted posterior distributions of the unknown parameters for statistical inference.
Gelman–Rubin diagnostic plot based on Model ST for representative parameters from three Markov chains. The middle and bottom curves below the dashed line which indicates value one, stand for the pooled posterior variance and average within-sample variance separately, and the top curve represents their ratio
Figure 3 shows the dynamic Gelman–Rubin diagnostics based on model ST as obtained from the WinBUGS output for the representative parameters where the three curves are given: the middle and bottom curves below the dashed horizontal line (indicating value one) represent the pooled posterior variance and average within-sample variance, respectively, and the top curve above the dashed horizontal line represents their ratio. It is seen that the ratio tends to 1, and both pooled posterior variance and average within-sample variance settle down as the number of iterations increases, indicating that the algorithm has reached convergence.
Tables 1 and 2 present the population PM estimates, standard deviation (SI) as well as associated 95% credible intervals CI for fixed-effects, scale and skewness parameters in the three models. We observe quite distinct estimates among three models. In particular, the estimated variance parameter
Figure 4 examines the goodness-of-fit for the three models. Obviously, comparing to Model N which assumes normal distribution for random errors, Models ST and SN that assume asymmetric distribution for random errors provide a better fit to the observed data. In addition, Q–Q plots–of residuals show fewer outliers from Models ST/SN than Model N. Moreover, Model ST fits data better than Model SN since it accounts for heavy tails. Hence, accounting for skewness and heavy tails in data is essential for better model performance.
Diagnostics of model fitting for the three models. Top panel: fitted versus observed values of log
viral load; bottom panel: Q–Q plots for residuals in three models
Diagnostics of model fitting for the three models. Top panel: fitted versus observed values of log
viral load; bottom panel: Q–Q plots for residuals in three models
To select the best model that fits the data, we use a Bayesian model selection criterion: DIC introduced by Spiegelhalter et al. (2002). In addition, we compute and compare the expected predictive deviance (EPD) for each model. EPD is defined as
The estimated PM for parameters of scale, skewness and the corresponding SD and lower limit (
) and upper limit (
) of
equal-tail CI as well as DIC and EPD values for MACS study
Since Model ST is superior to Models SN/N, we carry out further analysis based on Model ST. To assess how the missing data in response influences modelling outcome, we fit another model (Model NM) in which the missing data mechanism is not taken into account in the joint model. It is observed based on Tables 1 and 2 that there are important differences in the estimates of parameters between Models NM and ST. The estimated DIC/EPD values are larger for Model NM (12 166/0.54) than Model ST (8 117/0.16). In addition, the estimates of parameters
The population estimating curve of varying-coefficients
,
for longitudinal model (2.1) based on the best selected Model ST. The estimates (solid curve) along with the 95% CI (dashed curves) are presented
The population estimating curve of varying-coefficients
,
for longitudinal model (2.1) based on the best selected Model ST. The estimates (solid curve) along with the 95% CI (dashed curves) are presented
The primary objective is to investigate the time-varying relationship between viral load and CD4 counts. The smoothing estimate of population varying-coefficients (
Illustration of four represented individual estimating curve (dashed lines) of
based on Model ST. Population curves (solid lines) are shown for comparison
The mixed-effects modelling approach makes it possible for the parameter estimation at both the population and individual level, simultaneously, since it accounts for both within and between subjects variation. The estimates at the individual level may not follow the pattern of the population level exactly if the between subject variation is large. Figure 6 presents a few patterns for the time-varying relationship between viral load and CD4 counts at the individual level. For comparison purpose, the population estimating curve (
We conduct simulation studies to evaluate the performance of the proposed joint models. The design of simulated data is similar to the joint models used for the real data. Specifically, the longitudinal outcomes are simulated from the mixed-effects model (2.4). The measurement time
Simulation results based on 200 simulated dataset. The true values, bias (%) and coverage rate (%) of 95% CI are presented for each parameter
Simulation results based on 200 simulated dataset. The true values, bias (%) and coverage rate (%) of 95% CI are presented for each parameter
For each simulated dataset, we adopted similar models and MCMC sampling schemes that were used for the real data analysis. Priors with large variances are used in the Bayesian inference. We fit each of the Models ST, SN, N and NM to each simulated dataset and compare their performances. Table 3 presents the simulation results. As can be seen, Models ST (bias ranges from
We developed a Bayesian varying coefficient mixed-effects joint models to deal with longitudinal survival data with asymmetry and missingness. This type of joint modelling approach is useful in many application areas, producing accurate estimation of parameters while accounting for skewness and incompleteness in longitudinal survival data. Although we illustrated the method with an AIDS study, the basic idea of the proposed model and method has broader applications whenever technical conditions are met. The proposed joint modelling approach using skew distributions proves that there is potential gain of efficiency and accuracy in parameter estimation when symmetric distribution does not hold. In particular, we assessed the contribution of asymmetric distributions for model errors to model building and parameter estimation, comparing to a symmetric distribution. The findings suggest that to achieve reliable results for skewed data, it is critical to assume a skew distribution in the joint model. We further investigated how missing data affect modelling outcomes. We found that the informative missing data play an important role in parameter estimation. When data are not completely measured, it is crucial to adopt an appropriate missing data model to avoid biased estimation. The proposed model and method is useful for evaluating the relationship between viral load and CD4 count in AIDS studies which are two important indices for treatment assessment.
Note that the varying coefficient mixed-effects models are not typical non-parametric models. Formally, a non-parametric model contains infinite-dimensional parameters and, therefore, not all varying coefficient mixed-effects model are formally non-parametric. As a matter of fact, the model considered in this article, based on a finite number of spline bases, is in fact parametric.
Sensitivity analysis for different sets of hyper-parameters based on Model ST
Sensitivity analysis for different sets of hyper-parameters based on Model ST
We assume an inverse gamma prior distribution,
In Bayesian analysis, it is critical to perform sensitivity analysis to see if the posterior estimates change significantly when priors are different. Towards this end, we carried out sensitivity analysis by employing a few sets of different values for the hyper-parameters in (3.2) and re-run the MCMC sampling scheme. Basically, we increase the hyper-parameter values adopted in Section 4.1 by 10 and 100 times for Model ST and re-run the analysis. The parameter estimates based on different set of hyper-parameter values are presented in Table 4. For comparison purpose, the results based on original set of hyper-parameter values in Section 4.1 are denoted as ‘Set 1’. The findings show that, except for some numerical fluctuation, the parameter estimating results are essentially same based on the different set of hyper-parameter values. Thus, we are confident that the obtained results are robust against hyper-parameter values. In model (2.1), we adopted the regression spline basis to represent the unknown smoothing function. There are a lot of alternative ways for approximating the unknown function such as local polynomial kernel and smoothing splines. It is interesting to compare the modelling results based on various methods.
The proposed model and method rely on the conditional PH assumption on the survival sub model, which is rather strong and could lead to wrong inferences when the PH assumption does not hold. Our model can be extended to handle clustered data. Clustered data arise frequently from multi-site clinical trials in which each site can be viewed as a cluster, from studies with families data, or from studies with recurrent events for each subject. A simple method is to add an additional random effect in our joint model to adjust for heterogeneity across the clusters. Our computational MCMC algorithm can be adopted without much modification.
According to Associate Editor's suggestion, we test a few alternative models: Model T assume a student
Different versions of multivariate skew distributions have been introduced in the literature (Azzalini and Capitanio, 2003; Sahu et al., 2003; Azzalini and Genton, 2008; Jara et al., 2008). A new class of distributions by introducing skewness in multivariate elliptically distributions were developed in publication (Sahu et al., 2003). The class, which is obtained by using transformation and conditioning, contains many standard families including the multivariate SN and skew-
As we know, a normal distribution is a special case of an SN distribution when the skewness parameter is zero, and the ST distribution reduces to the SN distribution when degrees of freedom are large. For completeness, this Appendix briefly summarizes the multivariate ST distribution introduced by (Sahu et al., 2003) to be suitable for a Bayesian inference since it is built using the conditional method. For detailed discussions on properties of ST distribution, see Sahu et al. (2003). Assume that a
A
Note that the ST distribution presented in (A.7) can be reduced to the following three special cases: (a) as
B.1 Full conditional distributions
The full conditional distribution of the coefficients of the mixed-effects submodel (2.4)
The full conditional distribution of the random effects of the mixed-effects submodel (2.4)
The full conditional distribution of the variance of the mixed-effects submodel (2.4)
The full conditional distribution of the covariance–variance of the random effects of the mixed-effects submodel (2.4)
The full conditional distribution of the skewness parameter of the mixed-effects submodel (2.4)
The full conditional distribution of the degree of freedom of the mixed-effects submodel (2.4)
The full conditional distribution of coefficients of the survival submodel (2.5)
The full conditional distribution of the baseline hazards of the survival submodel (2.5)
The full conditional distribution of the coefficients of the missing data submodel (2.7)
B.2 MCMC implementation
Gibbs sampling was combined with M-H sampling for posterior. Particularly, for the parameters Σ, Gibbs sampling was applied since the full conditional distribution is standard. For the other parameters, the M–H algorithm was applied.
