Pairwise estimation of multivariate longitudinal outcomes in a Bayesian setting with extensions to the joint model

Abstract

Multiple longitudinal outcomes are theoretically easily modelled via extension of the generalized linear mixed effects model. However, due to computational limitations in high dimensions, in practice these models are applied only in situations with relatively few outcomes. We adapt the solution proposed by Fieuws and Verbeke (2006) to the Bayesian setting: fitting all pairwise bivariate models instead of a single multivariate model, and combining the Markov Chain Monte Carlo (MCMC) realizations obtained for each pairwise bivariate model for the relevant parameters. We explore importance sampling as a method to more closely approximate the correct multivariate posterior distribution. Simulation studies show satisfactory results in terms of bias, RMSE and coverage of the 95% credible intervals for multiple longitudinal outcomes, even in scenarios with more limited information and non-continuous outcomes, although the use of importance sampling is not successful. We further examine the incorporation of a time-to-event outcome, proposing the use of Bayesian pairwise estimation of a multivariate GLMM in an adaptation of the corrected two-stage estimation procedure for the joint model for multiple longitudinal outcomes and a time-to-event outcome (Mauff et al., 2020, Statistics and Computing). The method does not work as well in the case of the corrected two-stage joint model; however, the results are promising and should be explored further.

Keywords

pairwise, multivariate, mixed effects, Bayesian, joint, importance sampling

1 Introduction

Analysis of repeated measurement data is typically done using the framework of mixed effects models, which includes the linear mixed effects model (LMM; Laird and Ware, 1982), the non-linear mixed effects model (Davidian and Giltinan, 1995) and the generalized linear mixed effects model (GLMM; Breslow and Clayton, 1993). These models account for the more complex correlation structure of this data, whereby we have multiple sources of variability, and correlation both between and within units of observation/subjects. They also manage well with missing data and unbalanced designs, since subjects often have differing numbers of observations, or data collected at differing time points. These models and extensions thereof have been widely used in the analysis of single or univariate outcomes (Brown and Prescott, 1999; Pinheiro and Bates, 2000; Demidenko, 2004). In practice however, repeated measurements may be collected for multiple outcomes. A number of research questions then arise: we may be interested in determining the joint effect of a variable on multiple outcomes, or in the association structure between multiple outcomes and the evolution thereof, or perhaps in the association of the longitudinal evolutions (Fieuws and Verbeke, 2004). We may further be interested in determining the association between multiple longitudinal outcomes and time-to-event data (the so-called multivariate joint model), that is, estimation of the relative risk of an event of interest incorporating multiple endogenous time-varying covariates. This latter methodology is important for understanding the underlying complexities of disease dynamics and also for improvements in prognostication.

While a number of approaches exist for the analysis of multiple longitudinal outcomes, we focus here on the random effect family of models, and specifically on the extension of the GLMM. These models are easily extended to multivariate outcomes of differing types via the imposition of a joint multivariate distribution on the random effects from each model. Additionally, they allow for direct inferences for the marginal characteristics of each outcome and the interpretation of parameters in the multivariate model as per the univariate model. Estimation of the multivariate GLMM is also straightforward using either maximum likelihood or Bayesian approaches. However, although theoretically any number of outcomes may be modelled, in practice these models are applied only in situations where there are relatively few outcomes. As the number of outcomes increases, the consequent increase in the dimensionality of the random effects becomes computationally prohibitive: in the case of maximum likelihood, we are required to numerically approximate the high-dimensional integral over the random effects, and under a Bayesian approach, the number of parameters to simultaneously sample becomes unreasonably large. Sampling from the multivariate posterior conditional distribution is also not trivial. Fieuws and Verbeke (2006) propose a solution to this high-dimensional problem, by separately maximizing the likelihood of each pairwise bivariate model instead of that of the full multivariate model. The resulting estimates may then be combined, and inference for these combined estimates is obtained via pseudo-likelihood theory.

This article explores an adaptation of this pairwise methodology to the Bayesian setting, whereby we combine the Markov chain Monte Carlo (MCMC) realizations obtained for each pairwise bivariate model for the relevant parameters. We assess the possible use of self-normalized importance sampling (SNIS) theory as described by MacKay (2003), in the re-weighting of each realization of the combined pairwise MCMC samples (in order to more closely approximate the full joint posterior distribution). These weights are given by the target distribution, divided by the proposal distribution, evaluated per MCMC realization of this proposal distribution. We compare the Bayesian pairwise and multivariate approaches in each of several simulation scenarios.

Further, we explore the incorporation of a time-to-event outcome. Mauff et al. (2020) detail a corrected two-stage approach for fitting a multivariate joint model for multiple longitudinal outcomes together with a survival outcome. They split the estimation procedure in two: fitting a multivariate GLMM in Stage I and using the output of this model to fit the survival submodel in Stage II, additionally updating the random effects in this second stage, and using importance sampling weights as a correction factor. While this procedure produces unbiased results and substantially reduces the time required to run the model, it remains limited in terms of the number of longitudinal outcomes that may be included, since it relies on the initial estimation of a multivariate GLMM in the first stage. We therefore further propose the use of the Bayesian pairwise estimation of the multivariate GLMM in Stage I.

The rest of the article is organized as follows. In Section 2, we introduce the standard GLMM for multivariate outcomes and discuss estimation of this model. Section 3 describes the Bayesian adaptation of the pairwise estimation technique. The results of a proof-of-concept (POC) simulation are presented in Section 3.1. We then discuss the details of SNIS as applicable in the pairwise approach in Section 3.2, and briefly detail the corrected two-stage joint model as per Mauff et al. (2020) in Section 4. Section 4.1 provides simulation results for the pairwise adaptation of the corrected two-stage joint model. Finally, in Section 5 we replicate the analysis previously performed by Mauff et al. (2020), now using the pairwise estimation technique, on six longitudinal outcomes from the Bio-SHiFT study, a prospective observational study on chronic heart failure (CHF) patients, aimed at determining whether or not disease progression could be assessed using longitudinal measurements of multiple blood biomarkers.

2 A model for multiple longitudinal outcomes

2.1 The generalized linear mixed effects model

Assuming repeated measurements over time, let $y_{ki}$ denote the longitudinal response vector of length $n_{ki}$ for the $i$ th subject for the $k$ th outcome, where $y_{kij}$ denotes the observation for the $i$ th subject for the $k$ th response at time point $t_{kij}$ , for $i = 1, \dots, N; j = 1, \dots, n_{ki}; k = 1, \dots, K$ . We then have the standard formulation for a GLMM for the conditional expectation of $y_{ki}$ given a vector of random effects $b_{ki}$ :

g_{k} [E {y_{k i} ∣ b_{k i}}] = η_{k i} = x_{k i}^{⊤} β_{k} + z_{k i}^{⊤} b_{k i},

(2.1)

where $g_{k} (\cdot)$ is a generic link function for the $k$ th outcome, and $x_{ki}$ and $z_{ki}$ are the time dependent design vectors for the fixed effects $β_{k}$ and the random effects $b_{ki}$ , respectively. The dimensionality and composition of these design vectors are allowed to differ between the outcomes. The random effects $b_{i}$ are assumed to follow a multivariate normal distribution with mean zero and unknown positive-definite variance–covariance matrix $D$ , as follows:

\begin{matrix} b_{i} = [\begin{matrix} b_{1 i} \\ b_{2 i} \\ ⋮ \\ b_{Ki} \end{matrix}] \sim MVN (0, D), \end{matrix}

where $b_{ki}$ is the vector of subject-specific random effects for the $k$ th outcome, which has length $p_{k}$ . The matrix $D$ therefore has dimension $\sum_{k = 1}^{K} p_{k} \times \sum_{k = 1}^{K} p_{k}$ .
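As a minimal numerical sketch of this random-effects structure, the snippet below builds a positive-definite matrix $D$ of dimension $\sum_{k} p_{k} \times \sum_{k} p_{k}$ and draws stacked random effects $b_{i}$ from it. The values of $p_{k}$ and $N$ and the construction of $D$ are illustrative assumptions, not values from the article.

```python
import numpy as np

rng = np.random.default_rng(1)

p = [2, 2, 2]          # assumed p_k: random intercept and slope per outcome
N = 5                  # assumed number of subjects
dim = sum(p)           # D has dimension sum(p_k) x sum(p_k)

# Any positive-definite matrix serves for illustration: A A^T + I.
A = rng.normal(size=(dim, dim))
D = A @ A.T + np.eye(dim)

# One row per subject: b_i ~ MVN(0, D), stacked as (b_1i, ..., b_Ki).
b = rng.multivariate_normal(mean=np.zeros(dim), cov=D, size=N)

# Split each b_i back into its outcome-specific pieces b_ki.
b_split = np.split(b, np.cumsum(p)[:-1], axis=1)
```

Each element of `b_split` holds the $N \times p_{k}$ random effects of one outcome; the off-diagonal blocks of `D` are what link the outcomes.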

2.2 Estimation

The multivariate GLMM provides a very flexible framework for modelling multiple longitudinal outcomes. Both frequentist and Bayesian estimation methods make use of the full joint likelihood derived from the joint distribution of the multiple longitudinal outcomes. In the frequentist paradigm, estimation is based on maximization of this joint likelihood. This is theoretically straightforward despite challenges in the case of non-Gaussian outcomes, where closed-form expressions are not obtainable and analytical approximations to the integrand or methods based on numerical integration are required (Wolfinger and O'Connell (1993); Breslow and Clayton (1993) and Gueorguieva (1999) for the multivariate case, Hedeker and Gibbons (1994, 1996); Pinheiro and Bates (1995, 2000); Fahrmeir and Tutz (1994)). In practice however, the dimensionality of the variance–covariance matrix for the random effects increases substantially as the number of outcomes increases, necessitating the simultaneous estimation of an unfeasibly large number of parameters. Moreover, numerical integration in high dimensions remains very challenging, even for those methods which perform better for higher dimensional data e.g., Monte Carlo methods, and specifically Monte Carlo EM and Monte Carlo Newton Raphson methods (McCulloch, 1997; Booth and Hobert, 1999).

An attractive alternative is Bayesian estimation, using MCMC methods to sample from the posterior distribution of the parameters (Gelfand and Smith, 1990; Robert and Casella, 2004; Gelman et al., 2013; Liu, 2001). For the multivariate GLMM, the posterior distribution is derived under the assumption of full conditional independence, that is, we assume that the longitudinal outcomes $y_{ki}$ are independent of one another conditional on the random effects $b_{i}$ , and that the longitudinal responses of the $i$ th subject, $y_{kij}; j = 1, \dots, n_{ki}$ are independent conditional on the random effects for that subject ( $b_{ki}$ ), for every $k$ . Defining $Ψ$ as the parameter vector for all parameters excluding the random effects and variance–covariance parameters, the complete posterior distribution of the parameters is then given by:

p (Ψ, D, b ∣ y) \propto \prod_{i = 1}^{N} {\prod_{k = 1}^{K} \prod_{j = 1}^{n_{k i}} p (y_{k i j} ∣ b_{k i}, Ψ_{k}) p (Ψ_{k})} p (b_{i} ∣ D) p (D),

(2.2)

where $Ψ_{k}$ is the parameter vector for the $k$ th outcome (excluding the random effects and variance covariance parameters), and

p (y_{k i j} ∣ b_{k i}; Ψ_{k}, D) = \exp \{[y_{k i j} ψ_{k i j} (b_{k i}) - c_{k} \{ψ_{k i j} (b_{k i})\}] / a_{k} (ϕ) - d_{k} (y_{k i j}, ϕ)\} .

(2.3)

$ψ_{kij} (b_{ki})$ and $ϕ$ denote the natural and dispersion parameters in the exponential family, respectively, and $c_{k} (\cdot)$ , $a_{k} (\cdot)$ and $d_{k} (\cdot)$ are known functions specifying the member of the exponential family.

Non-informative priors are typically used, specifically: independent univariate diffuse normal priors for the vector of fixed effects of each longitudinal submodel and an inverse-Gamma prior for the variance of the error terms, or alternatively, a half-Student's t prior with 3 degrees of freedom (when fitting a model with a normally distributed outcome). The covariance matrix of the random effects may be parametrized in terms of a correlation matrix $Ω$ and a vector of standard deviations $σ_{d}$ , in which case, for the correlation matrix we use the LKJ-Correlation prior (Lewandowski et al., 2009; Brilleman et al., 2019) with parameter $ψ = 1.5$ , and a half-Student's t prior with 3 degrees of freedom for each element of $σ_{d}$ . Alternatively, we can specify an inverse Wishart prior for the covariance matrix of the random effects.

It is worth noting that in the Bayesian approach, the random effects are also considered model parameters for which we obtain a posterior sample. As a result, we are no longer required to solve the integral over the random effects. However, we are now required to (simultaneously) estimate an additional $N \times \sum_{k = 1}^{K} p_{k}$ parameters, where $p_{k}$ denotes the number of random effects included in the model for the $k$ th outcome. In addition, the issue of the high-dimensional covariance matrix for the random effects persists, and standard sampling techniques for MCMC tend to become less effective as dimensionality increases and the dimensions become more correlated, leading to low rates of acceptance, slow mixing and highly correlated samples (Guan and Haran, 2016; Girolami and Calderhead, 2011). Alternative sampling techniques have been suggested, for example, Stochastic Newton Sampling (SNS; Mahani et al., 2016), Hybrid or Hamiltonian Monte Carlo (HMC; Betancourt, 2018; Neal, 2011) and extensions thereof, such as Riemann Manifold HMC (Girolami and Calderhead, 2011) or the No-U-Turn sampler (NUTS; Hoffman and Gelman, 2014), which can be implemented via Stan (Stan Development Team, 2014). These techniques are more robust and efficient, requiring fewer samples, and should be faster for larger and more complex models (although this is dependent on a large number of factors; Monnahan et al., 2017). However, they also tend to require substantial tuning, and despite the gains in computational time may still be prohibitively slow in cases where the number of outcomes is very large.
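To make the growth in dimensionality concrete, the following sketch counts the random effects, $N \times \sum_{k} p_{k}$ , and the unique elements of the symmetric matrix $D$ , which number $q (q + 1) / 2$ for $q = \sum_{k} p_{k}$ . The function name and the example with ten outcomes are illustrative assumptions.

```python
def count_parameters(N, p):
    """Return (number of random effects, number of unique elements of D)."""
    q = sum(p)                      # total number of random effects per subject
    n_random_effects = N * q        # N x sum(p_k), sampled alongside the other parameters
    n_cov = q * (q + 1) // 2        # unique entries of a symmetric q x q matrix
    return n_random_effects, n_cov

# Three outcomes with a random intercept and slope each, 300 subjects
small = count_parameters(300, [2, 2, 2])

# Hypothetical case with ten such outcomes
large = count_parameters(300, [2] * 10)
```

With three outcomes this gives 1800 random effects and 21 covariance parameters; with ten outcomes the same design requires 6000 random effects and 210 covariance parameters.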

3 The pairwise approach

Fieuws and Verbeke (2006) and Fieuws et al. (2007) propose a pairwise modelling approach as a potential solution to the computational problems encountered with the full multivariate model in the frequentist context. For $K$ outcomes, they suggest replacing the full model by fitting all $\frac{K \times (K - 1)}{2}$ pairwise bivariate models, that is, models for all pairs:

\begin{matrix} (y_{1}, y_{2}), (y_{1}, y_{3}), \dots, (y_{1}, y_{K}) \\ (y_{2}, y_{3}), \dots, (y_{2}, y_{K}) \\ ⋮ \\ (y_{K - 1}, y_{K}) \end{matrix}

(3.1)

separately maximizing the log-likelihood of each pair $r; s$ , for $r = 1, \dots, K - 1$ , $s = r + 1, \dots, K$ . For each pairwise model, there exists a vector of pair-specific parameters ${\tilde{θ}}_{r, s}$ . The vector $\tilde{θ}$ is then the stacked vector containing all pair-specific parameter vectors ${\tilde{θ}}_{r, s}$ . This vector is not equivalent to the vector of parameters $θ$ obtained from the full multivariate model for all responses $K$ . For certain elements in $θ$ , such as the covariance between random effects for the same outcome, there are $(K - 1)$ counterparts in vector $\tilde{θ}$ . In that case, a single estimate is obtained by averaging all of the corresponding pair-specific maximum likelihood estimates. Inference for these parameters is then obtained via general pseudo-likelihood theory.
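The bookkeeping behind this scheme can be sketched as follows: enumerate all $K (K - 1) / 2$ pairs, then average the pair-specific estimates of any parameter that occurs in more than one pair. The function names, parameter labels and numerical estimates are hypothetical and serve only to illustrate the averaging step.

```python
from collections import defaultdict
from itertools import combinations

def all_pairs(K):
    """All K(K-1)/2 pairs (r, s) with r < s, outcomes labelled 1..K."""
    return list(combinations(range(1, K + 1), 2))

def average_over_pairs(pair_estimates):
    """Average estimates of shared parameters across the pairs in which they occur.

    pair_estimates maps a pair (r, s) to a dict {parameter name: estimate}.
    """
    collected = defaultdict(list)
    for estimates in pair_estimates.values():
        for name, value in estimates.items():
            collected[name].append(value)
    return {name: sum(vals) / len(vals) for name, vals in collected.items()}

pairs = all_pairs(4)   # six pairs for K = 4

# Hypothetical estimates: the variance of the random intercept of outcome 1
# appears in two pairs, the covariance between outcomes 2 and 3 in one only.
toy = {
    (1, 2): {"var_b10": 1.0},
    (1, 3): {"var_b10": 1.2},
    (2, 3): {"cov_b20_b30": 0.3},
}
averaged = average_over_pairs(toy)
```

Parameters that are specific to a single pair, such as the between-outcome covariance, pass through unchanged.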

This technique is easy to fit with standard software, and the resulting estimates are consistent with the maximum likelihood estimates from the full multivariate model (Fieuws and Verbeke, 2006). Inference is somewhat more complicated however, and it should be noted that the estimates obtained are valid only for data that are missing completely at random (and specific cases of missing at random), since missing data might be dependent on specific outcomes not included in every pairwise model, or on combinations thereof.

We propose an adaptation of the work by Fieuws and Verbeke (2006) to the Bayesian context, whereby we fit all of the $\frac{K \times (K - 1)}{2}$ pairwise bivariate models (models for all pairs resulting from $K$ longitudinal outcomes, as in 3.1), now using either MCMC or HMC, and obtain a combined sample from the product of the pairwise posterior distributions.

As is the case for Fieuws and Verbeke (2006), each outcome appears in $K - 1$ pairs; thus, for some of the parameters from the target multivariate distribution $p (Ψ, D, b ∣ y)$ in (2.2), for example, the fixed effect intercept for outcome $k$ , we have $(K - 1) \times M$ realizations from the proposal posterior distribution, where $K$ is the number of outcomes and $M$ is the number of MCMC samples for the corresponding parameter from each pairwise posterior distribution for a model in which outcome $k$ is included. For parameters specific to any individual pairwise bivariate model, we instead have $M$ realizations. For example, in a model for pair $(y_{1}, y_{2})$ in which random intercepts are included for both outcome 1 and outcome 2, the parameter for the covariance between those two random intercepts is specific to that model only, and will thus have only M realizations. Inference is then based on the $(K - 1) \times M$ or $M$ realizations as applicable.
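A minimal sketch of this pooling step, assuming the draws for a shared parameter are simply concatenated across the $K - 1$ pairwise fits and summarized by the posterior mean and an equal-tailed 95% interval. The draws below are simulated placeholders rather than output from a fitted model.

```python
import numpy as np

rng = np.random.default_rng(0)
K, M = 3, 1000

# Hypothetical MCMC draws of the fixed intercept of outcome 1, one chain
# from each pairwise model that contains outcome 1.
draws = {
    (1, 2): rng.normal(2.0, 0.1, M),
    (1, 3): rng.normal(2.0, 0.1, M),
}

# Pool the (K - 1) x M realizations and summarize them.
pooled = np.concatenate([d for pair, d in draws.items() if 1 in pair])
post_mean = pooled.mean()
ci = np.quantile(pooled, [0.025, 0.975])
```

A pair-specific parameter, such as the covariance between the random intercepts of outcomes 1 and 2, would be summarized from its single chain of $M$ draws in the same way.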

Let $k = 1, \dots, K$ , $r = 1, \dots, K - 1$ , $s = r + 1, \dots, K$ and $q = r + I (r \geq k)$ , where $I (\cdot)$ is the indicator function. We now have ${\tilde{Ψ}}_{r, s}$ , which is the vector of all pair-specific parameters from the model for pair $r; s$ , excluding the random effects and the variance–covariance parameters for the random effects. The vector of random effects for pair $r; s$ is given by ${\tilde{b}}_{r, s}$ , and ${\tilde{D}}_{r, s}$ is the pair-specific variance–covariance matrix from the model for pair $r; s$ . We noted in Equation (2.1) that the dimensionality and composition of the design vectors $x_{ki}$ and $z_{ki}$ are allowed to differ between outcomes. The dimensions of ${\tilde{Ψ}}_{r, s}$ , ${\tilde{b}}_{r, s}$ and ${\tilde{D}}_{r, s}$ therefore depend on the nature of the outcomes included in the model for pair $r; s$ , and on the fixed and random effect structure for those outcomes. We further introduce the notation ${\tilde{Ψ}}_{k_{kq}}$ and ${\tilde{b}}_{k_{kq}}$ , to denote outcome-pair-specific parameter vectors (subsets of ${\tilde{Ψ}}_{r, s}$ and ${\tilde{b}}_{r, s}$ ), that is, parameters relating to the $k$ th outcome from the model for a particular pair which includes that outcome (for which the dimensions are once again outcome dependent). The subindex $q$ relates the specific outcome $k$ to the pair whence it comes. For example, in the case of 3 outcomes, there are two outcome-pair-specific parameter vectors for outcome 1: ${\tilde{Ψ}}_{1_{12}}$ and ${\tilde{Ψ}}_{1_{13}}$ . These result from ${\tilde{Ψ}}_{k_{kq}}$ as follows: $k = 1$ , $r = 1, 2$ , and $q = r + I (r \geq k)$ , giving $q = 1 + 1 = 2$ for $r = 1$ and $q = 2 + 1 = 3$ for $r = 2$ .
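The index rule $q = r + I (r \geq k)$ can be checked with a short sketch: for each outcome $k$ it lists the $K - 1$ partner outcomes $q$ , one per value of $r$ . The function names are illustrative.

```python
def q_index(k, r):
    """Partner outcome q = r + I(r >= k) for outcome k and index r."""
    return r + (1 if r >= k else 0)

def outcome_pair_labels(K):
    """For each outcome k, the (k, q) labels of its K - 1 outcome-pair-specific vectors."""
    return {k: [(k, q_index(k, r)) for r in range(1, K)] for k in range(1, K + 1)}

labels = outcome_pair_labels(3)
```

For $K = 3$ this yields the partners (2, 3) for outcome 1, (1, 3) for outcome 2 and (1, 2) for outcome 3, so the rule simply skips $q = k$ .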

We then obtain a combined sample $\{{\tilde{Ψ}}^{(m)}, {\tilde{D}}^{(m)}, {\tilde{b}}^{(m)}, m = 1, \dots, M\}$ of size $M$ , from the posterior:

p (\tilde{Ψ}, \tilde{D}, \tilde{b} ∣ y) \propto \prod_{r = 1}^{K - 1} \{\prod_{k = 1}^{K} p (y_{k} ∣ {\tilde{b}}_{k_{kq}}, {\tilde{Ψ}}_{k_{kq}}) p ({\tilde{Ψ}}_{k_{kq}})\} \times \prod_{s = r + 1}^{K} p ({\tilde{b}}_{r, s} ∣ {\tilde{D}}_{r, s}) p ({\tilde{D}}_{r, s}) .

Note that throughout the article, the superscript $(m)$ denotes a realization from the set of $M$ realizations in the MCMC/HMC sample.

For example, where $K = 3$ , we have:

\begin{matrix} p (\tilde{Ψ}, \tilde{D}, \tilde{b} ∣ y) \propto \{p (y_{1} ∣ {\tilde{b}}_{1_{12}}, {\tilde{Ψ}}_{1_{12}}) p ({\tilde{Ψ}}_{1_{12}}) \times p (y_{2} ∣ {\tilde{b}}_{2_{21}}, {\tilde{Ψ}}_{2_{21}}) p ({\tilde{Ψ}}_{2_{21}}) \\ \times p (y_{3} ∣ \underbrace{{\tilde{b}}_{3_{31}}, {\tilde{Ψ}}_{3_{31}}}_{\text{parameters relating to outcome 3, from the model for outcomes 1 and 3}}) p ({\tilde{Ψ}}_{3_{31}})\} \\ \times \{p (y_{1} ∣ {\tilde{b}}_{1_{13}}, {\tilde{Ψ}}_{1_{13}}) p ({\tilde{Ψ}}_{1_{13}}) \times p (y_{2} ∣ {\tilde{b}}_{2_{23}}, {\tilde{Ψ}}_{2_{23}}) p ({\tilde{Ψ}}_{2_{23}}) \\ \times p (y_{3} ∣ \underbrace{{\tilde{b}}_{3_{32}}, {\tilde{Ψ}}_{3_{32}}}_{\text{parameters relating to outcome 3, from the model for outcomes 2 and 3}}) p ({\tilde{Ψ}}_{3_{32}})\} \\ \times p ({\tilde{b}}_{1, 2} ∣ {\tilde{D}}_{1, 2}) p ({\tilde{D}}_{1, 2}) p ({\tilde{b}}_{1, 3} ∣ {\tilde{D}}_{1, 3}) p ({\tilde{D}}_{1, 3}) p ({\tilde{b}}_{2, 3} ∣ {\tilde{D}}_{2, 3}) p ({\tilde{D}}_{2, 3}) . \end{matrix}

3.1 Proof-of-concept simulation study (Scenario A)

In order to evaluate the performance of the pairwise approach, we performed a POC simulation study, comparing the finite sample properties of the pairwise and full multivariate approaches. In this setting, we simulated 300 subjects with a maximum of 9 repeated measurements per subject in 200 simulated datasets. We included $K = 3$ continuous longitudinal outcomes. Time was assumed to be linear in both the fixed and random effects, and each outcome therefore had the form:

\begin{matrix} y_{ki} (t) = η_{ki} (t) + ε_{ki} (t) = β_{k 0} + b_{ki 0} + (β_{k 1} + b_{ki 1}) \times time + ε_{ki} (t), \end{matrix}

with $ε_{ki} (t) \sim N (0, σ_{k}^{2})$ , $b_{i} = (b_{1 i}^{T}, b_{2 i}^{T}, b_{3 i}^{T})^{T} \sim N (0, D_{6 \times 6})$ , and $b_{ki} = (b_{ki 0}, b_{ki 1})^{T}$ for $k = 1, \dots, 3$ . Further details are available in the supplementary material.

Table 1
Eigenvalues, trace, and determinant of the posterior mean variance–covariance matrix for each approach in the POC simulation (Scenario A)

Approach	Eigenvalues	Trace	Determinant
Multivariate	(10.30, 5.04, 2.91, 2.61, 0.80, 0.66)	22.31	5.34
Pairwise	(10.23, 5.02, 2.91, 2.60, 0.80, 0.65)	22.21	5.31
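The data-generating model of Scenario A can be sketched as below. Only the design (300 subjects, at most 9 measurements, $K = 3$ continuous outcomes with random intercepts and slopes) comes from the text; the values of the fixed effects, residual standard deviations and $D$ , the visit-time distribution, and the use of common measurement times across outcomes are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(2024)
N, max_visits, K = 300, 9, 3

# Assumed values for illustration only (the article's values are in its supplement).
beta = np.array([[1.0, 0.5], [2.0, -0.3], [0.0, 0.2]])   # (beta_k0, beta_k1) per outcome
sigma = np.array([0.5, 0.7, 0.6])                        # residual SDs sigma_k
A = rng.normal(size=(2 * K, 2 * K))
D = A @ A.T / (2 * K) + 0.5 * np.eye(2 * K)              # 6 x 6, positive definite

rows = []
for i in range(N):
    # b_i = (b_1i, b_2i, b_3i), each b_ki = (intercept, slope)
    b_i = rng.multivariate_normal(np.zeros(2 * K), D).reshape(K, 2)
    n_i = rng.integers(3, max_visits + 1)                # unbalanced: 3 to 9 visits
    time = np.sort(rng.uniform(0, 5, size=n_i))
    for k in range(K):
        eta = (beta[k, 0] + b_i[k, 0]) + (beta[k, 1] + b_i[k, 1]) * time
        y = eta + rng.normal(0, sigma[k], size=n_i)
        rows.extend((i, k + 1, t, v) for t, v in zip(time, y))

data = np.array(rows)   # columns: subject, outcome, time, y
```

Each bivariate model would then be fitted to the rows of `data` belonging to one pair of outcomes.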

Per simulation, we obtained the sample marginal posterior distributions for each parameter using the multivariate and pairwise approaches, and calculated the posterior mean and 95% credible intervals. For each parameter, for each approach, we then assessed the relative bias (expressed as a ratio) and coverage of the 95% credible interval over all simulations, together with the root mean squared error (RMSE), and the relative efficiency, where the relative efficiency is defined as the ratio of the mean squared error of the multivariate and the pairwise approaches. The main concern lies with the reconstruction of the variance–covariance matrix for the random effects, since in the pairwise approach estimation proceeds independently for each constituent part of the full multivariate matrix. Figure 1 details the results for all parameters for the 200 simulated datasets. We see that the results from the two approaches are nearly identical, with minimal bias for the majority of parameters. Coverage was good, and there was no loss of efficiency. Based on these results, the performance of the pairwise approach is almost identical to that of the multivariate. We also obtain very similar eigenvalues, trace and determinant values for the variance–covariance matrix (Table 1).
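The performance measures can be computed per parameter as in the sketch below. It assumes that relative bias is the mean of the posterior means divided by the true value (so that 1 is ideal, as in Figure 1) and that relative efficiency is the multivariate MSE divided by the pairwise MSE; the function names and the toy inputs are illustrative.

```python
import numpy as np

def summarize(post_means, lower, upper, truth):
    """Relative bias (ratio), RMSE, MSE and coverage over simulated datasets."""
    post_means = np.asarray(post_means, dtype=float)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    mse = np.mean((post_means - truth) ** 2)
    return {
        "rel_bias": post_means.mean() / truth,
        "rmse": np.sqrt(mse),
        "mse": mse,
        "coverage": np.mean((lower <= truth) & (truth <= upper)),
    }

def relative_efficiency(mse_multivariate, mse_pairwise):
    """Ratio of the MSE of the multivariate approach to that of the pairwise approach."""
    return mse_multivariate / mse_pairwise

# Toy input: two simulated datasets, true value 1
res = summarize([0.9, 1.1], lower=[0.8, 1.05], upper=[1.0, 1.2], truth=1.0)
```

Under this definition, a relative efficiency close to 1 indicates no loss of efficiency for the pairwise approach.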

3.2 Self-normalised importance sampling

The results thus far are not unexpected, given those obtained by Fieuws and Verbeke (2006). However, despite similarities seen in practice, the multivariate and pairwise approaches remain theoretically distinct. SNIS (MacKay, 2003; Owen, 2013) offers a potential solution. Importance sampling (IS) tells us that given a random variable $X$ with pdf $f (x)$ and some integrable function $h (X)$ :

\begin{matrix} E_{f} (h (X)) = \int f (x) h (x) dx = \int g (x) \frac{f (x)}{g (x)} h (x) dx \\ = \int g (x) ϖ (x) h (x) dx = E_{g} (ϖ (X) h (X)) . \end{matrix}

Figure 1:

Relative bias, RMSE and coverage of the 95% credible intervals for (A) fixed effect and residual variance parameters and (B) variance–covariance parameters in POC simulation scenario (Scenario A). Ideal values indicated by the vertical dashed lines are 0.95 for coverage, 1 for relative bias and 0 for RMSE. Variance–covariance parameters are represented by row and column position.

Then the importance sampling estimate ${\hat{μ}}_{g}$ is given by:

\begin{matrix} {\hat{μ}}_{g} = \frac{1}{n} \sum_{i = 1}^{n} ϖ (X_{i}) h (X_{i}) \text{ for } X_{i} \sim g, \text{ and} \\ \lim_{n \to + \infty} {\hat{μ}}_{g} = E_{g} (ϖ (X) h (X)), \end{matrix}

which implies that

\begin{matrix} \lim_{n \to + \infty} {\hat{μ}}_{g} = E_{f} (h (X)) . \end{matrix}

We may only know $f (x)$ up to some normalizing constant $c$ , $f_{u} (x) = c f (x)$ . The same may be true of $g (x)$ for some normalizing constant $b$ , $g_{u} (x) = bg (x)$ . In that case, we can compute the ratio $ϖ_{u} (x) = f_{u} (x) / g_{u} (x) = (c / b) f (x) / g (x)$ , and use the SNIS estimate:

\begin{matrix} \hat{μ} = \frac{\sum_{i = 1}^{n} ϖ_{u} (X_{i}) h (X_{i})}{\sum_{i = 1}^{n} ϖ_{u} (X_{i})} . \end{matrix}
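The self-normalized estimate can be illustrated numerically with a toy example, which is not the article's model: the target $f$ is a standard normal and the proposal $g$ a normal with standard deviation 2, both known only up to their normalizing constants, and $h (x) = x^{2}$ so that $E_{f} (h (X)) = 1$ . All names and distributions here are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200_000

def f_u(x):
    """Unnormalized target: N(0, 1) without its normalizing constant."""
    return np.exp(-0.5 * x**2)

def g_u(x):
    """Unnormalized proposal: N(0, 2^2) without its normalizing constant."""
    return np.exp(-0.5 * (x / 2.0) ** 2)

def snis(h_values, weights):
    """Self-normalized importance sampling estimate of E_f(h(X))."""
    return np.sum(weights * h_values) / np.sum(weights)

x = rng.normal(0.0, 2.0, size=n)      # X_i ~ g
w_u = f_u(x) / g_u(x)                 # unnormalized importance ratios

est = snis(x**2, w_u)                 # should be close to E_f(X^2) = 1
est_scaled = snis(x**2, 7.3 * w_u)    # an arbitrary constant factor leaves it unchanged
```

Because the same constant multiplies the numerator and the denominator, `est` and `est_scaled` agree up to round-off.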

The $(c / b)$ then cancels, and hence we can use $ϖ (\cdot)$ in the above equation. Thus, in situations where the distribution of interest, $f (X)$ , is intractable and difficult to sample, this technique allows for the use of a proposal distribution $g (X)$ , re-weighted such that it more closely approximates a sample obtained from the target distribution. We then define the normalized weights:

\begin{matrix} {\bar{w}}^{(m)} = \frac{ϖ^{(m)}}{\sum_{m} ϖ^{(m)}}, \end{matrix}

where $ϖ^{(m)}$ is the target distribution, here the distribution defined in Equation 2.2, divided by the proposal distribution (given by Equation 3), evaluated for each iteration $m = 1, \dots, M$ of the MCMC sample resulting from the proposal distribution. We therefore have the theoretical expression:

\begin{matrix} ϖ = \frac{\prod_{k = 1}^{K} p (y_{k} ∣ b_{k}, Ψ_{k}) p (Ψ_{k}) p (b ∣ D) p (D)}{\prod_{r = 1}^{K - 1} \{\prod_{k = 1}^{K} p (y_{k} ∣ {\tilde{b}}_{k_{kq}}, {\tilde{Ψ}}_{k_{kq}}) p ({\tilde{Ψ}}_{k_{kq}})\} \prod_{s = r + 1}^{K} p ({\tilde{b}}_{r, s} ∣ {\tilde{D}}_{r, s}) p ({\tilde{D}}_{r, s})}, \end{matrix}

for $k = 1, \dots, K$ , $r = 1, \dots, K - 1$ , $s = r + 1, \dots, K$ , and $q = r + I (r \geq k)$ , where $I (\cdot)$ is the indicator function. Note that in the evaluation of the above expression, the numerator must be evaluated for each of the $(K - 1)$ outcome-pair-specific realizations (per iteration) resulting from the MCMC sample for the proposal distribution in the denominator. This has specific implications for the evaluation of the distribution for $p (b ∣ D)$ .

Recall that in the numerator, $p (b ∣ D) = p (b_{1}, \dots, b_{K - 1}, b_{K} ∣ D)$ , and that $D$ has dimension $\sum_{k = 1}^{K} p_{k} \times \sum_{k = 1}^{K} p_{k}$ , where $p_{k}$ is the number of individual random effects included in the model for outcome $k$ . Recall also that for each $b_{k}$ in $p (b_{1}, \dots, b_{K - 1}, b_{K} ∣ D)$ , we have $(K - 1)$ outcome-pair-specific realizations $b_{k_{kq}}^{(m)}$ per iteration $m$ . There are then $(K - 1)!^{K}$ unique combinations of $p (b)$ , from which we could choose any one as the first combination, following which the choice of the second and third (up to $K - 1$ ) is restricted, in that we may not select the same $b_{k_{kq}}^{(m)}$ more than once for each $b_{k}$ (per iteration). To illustrate, in the case where we have $K = 3$ outcomes, from the set of 8 possible combinations, choosing, for example, either (ignoring conditioning)

\begin{matrix} p ({\tilde{b}}_{1_{12}}^{(m)}, {\tilde{b}}_{2_{21}}^{(m)}, {\tilde{b}}_{3_{31}}^{(m)}) \text{ or } p ({\tilde{b}}_{1_{13}}^{(m)}, {\tilde{b}}_{2_{23}}^{(m)}, {\tilde{b}}_{3_{31}}^{(m)}) \end{matrix}

as the first set, necessitates the choice of

\begin{matrix} p ({\tilde{b}}_{1_{13}}^{(m)}, {\tilde{b}}_{2_{23}}^{(m)}, {\tilde{b}}_{3_{32}}^{(m)}) \text{ or } p ({\tilde{b}}_{1_{12}}^{(m)}, {\tilde{b}}_{2_{21}}^{(m)}, {\tilde{b}}_{3_{32}}^{(m)}), \end{matrix}

as the second set respectively. Defining $A$ as the set of all possible combinations, with size $(K - 1)!^{K}$ , and then $A_{K - 1} = \{a \subset A ∣ | a | = K - 1\}$ as the subset of $A$ with cardinality $K - 1$ with the above restriction, we have:

\begin{matrix} ϖ^{(m)} = \frac{\prod_{r = 1}^{K - 1} \{\prod_{k = 1}^{K} p (y_{k} ∣ b_{k_{kq}}^{(m)}, Ψ_{k_{kq}}^{(m)}) p (Ψ_{k_{kq}}^{(m)})\} \prod_{l \in A_{K - 1}} p (b_{l}^{(m)} ∣ D_{l}^{(m)}) p (D_{l}^{(m)})}{\prod_{r = 1}^{K - 1} \{\prod_{k = 1}^{K} p (y_{k} ∣ {\tilde{b}}_{k_{kq}}^{(m)}, {\tilde{Ψ}}_{k_{kq}}^{(m)}) p ({\tilde{Ψ}}_{k_{kq}}^{(m)})\} \prod_{s = r + 1}^{K} p ({\tilde{b}}_{r, s}^{(m)} ∣ {\tilde{D}}_{r, s}^{(m)}) p ({\tilde{D}}_{r, s}^{(m)})} \end{matrix}

which simplifies to:

\begin{matrix} ϖ^{(m)} = \frac{\prod_{l \in A_{K - 1}} p (b_{l}^{(m)} ∣ D_{l}^{(m)}) p (D_{l}^{(m)})}{\prod_{r = 1}^{K - 1} \prod_{s = r + 1}^{K} p ({\tilde{b}}_{r, s}^{(m)} ∣ {\tilde{D}}_{r, s}^{(m)}) p ({\tilde{D}}_{r, s}^{(m)})} . \end{matrix}
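The selection of combinations for $p (b)$ described above can be enumerated for $K = 3$ as a check. Each combination picks, for every outcome $k$ , the partner index $q$ of one of its outcome-pair-specific realizations; a set of combinations is admissible if no outcome uses the same realization twice. The function names are illustrative, and the pairing step below is written for sets of size 2, which is the case $K - 1 = 2$ only.

```python
from itertools import product

def partner_indices(K, k):
    """Values of q for outcome k: the other outcome in each pair containing k."""
    return [q for q in range(1, K + 1) if q != k]

def all_combinations(K):
    """One partner index per outcome, i.e., one realization chosen for each b_k."""
    return list(product(*(partner_indices(K, k) for k in range(1, K + 1))))

def is_valid_set(combos):
    """True if no outcome re-uses the same realization across the chosen combinations."""
    return all(len(set(col)) == len(col) for col in zip(*combos))

K = 3
A = all_combinations(K)

# All admissible unordered sets of size K - 1 = 2
valid_pairs = [(a, b) for i, a in enumerate(A) for b in A[i + 1:] if is_valid_set([a, b])]
```

For $K = 3$ this gives $2^{3} = 8$ combinations, matching the count quoted in the text, and each first choice has exactly one admissible partner: for instance (2, 1, 1), corresponding to realizations $1_{12}$ , $2_{21}$ , $3_{31}$ , is paired with (3, 3, 2), corresponding to $1_{13}$ , $2_{23}$ , $3_{32}$ .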

Where outcomes are related however, per iteration $m$ of the MCMC sample of the proposal distribution, recreation of the full positive-definite variance–covariance matrix $D_{l}^{(m)}$ required for $p (b_{l}^{(m)} ∣ D_{l}^{(m)})$ is not possible. For example, in the case where $K = 3$ , and where we choose $p ({\tilde{b}}_{1_{12}}^{(m)}, {\tilde{b}}_{2_{21}}^{(m)}, {\tilde{b}}_{3_{31}}^{(m)})$ as the first of 2 combinations for $p (b)$ for a given iteration $m$ , we cannot obtain the covariance for the random effects between outcomes 2 and 3: $b_{2}$ is obtained here from the model for outcomes 1 and 2, and $b_{3}$ from the model for outcomes 1 and 3.

We therefore investigate an alternative approach to obtain the variance–covariance matrix $D_{l}^{(m)}$ . Per iteration of the MCMC sample of the proposal distribution, for each bivariate model-specific variance–covariance matrix, we obtain the inverse (precision matrix), given by ${\tilde{D}}_{r, s}^{- 1 (m)}$ . Using the Cholesky decomposition, we decompose it to give the lower triangular ${\tilde{L}}_{r, s}^{(m)}$ , such that

{\tilde{D}}_{r, s}^{- 1 (m)} = {\tilde{L}}_{r, s}^{(m)} {\tilde{L}}_{r, s}^{T (m)},

where ${\tilde{L}}_{r, s}^{(m)}$ is a lower triangular matrix with real and positive diagonal entries, and ${\tilde{L}}_{r, s}^{T (m)}$ denotes the conjugate transpose of ${\tilde{L}}_{r, s}^{(m)}$ . The matrix ${\tilde{L}}_{r, s}^{(m)}$ is no longer subject to the same constraints as the variance–covariance matrix (or its inverse). Where multiple estimates exist for the elements of $D$ , that is, for the (block) diagonal elements, we are able to average the corresponding $(K - 1)$ realizations from the ${\tilde{L}}_{r, s}^{(m)}$ matrices for each pair $r; s$ per iteration. Using these lower triangular matrices, (and where necessary, the averaged elements thereof) we create a ‘combined’ lower triangular matrix, with dimension $\sum_{k = 1}^{K} p_{k} \times \sum_{k = 1}^{K} p_{k}$ , as in the multivariate case. We then back-transform to obtain the matrix ${\tilde{H}}^{- 1 (m)}$ , and solve for a positive-definite variance–covariance matrix ${\tilde{H}}^{(m)}$ . The equation for the weights then becomes

\begin{matrix} ϖ^{(m)} = \frac{\prod_{l \in A_{K - 1}} p ({\tilde{b}}_{l}^{(m)} ∣ {\tilde{H}}^{(m)}) p ({\tilde{H}}^{(m)})^{K - 1}}{\prod_{r = 1}^{K - 1} \prod_{s = r + 1}^{K} p ({\tilde{b}}_{r, s}^{(m)} ∣ {\tilde{D}}_{r, s}^{(m)}) p ({\tilde{D}}_{r, s}^{(m)})} \end{matrix}

(3.2)

for $r = 1, \dots, K - 1$ , $s = r + 1, \dots, K$ , and $l \in A_{K - 1}$ .
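The construction of ${\tilde{H}}^{(m)}$ for a single iteration can be sketched as follows. This is one possible reading of the procedure, not the authors' implementation: it assumes that the Cholesky factor of each pairwise precision matrix is placed into the combined lower triangular matrix block by block, that diagonal blocks are averaged over the $K - 1$ pairs in which an outcome occurs, and that each off-diagonal block is taken from the single pair that supplies it. Function names and the toy input are hypothetical.

```python
import numpy as np
from itertools import combinations

def block_index(offsets, r, s):
    """Row/column positions of outcomes r and s (0-based) in the full matrix."""
    return np.concatenate((np.arange(offsets[r], offsets[r + 1]),
                           np.arange(offsets[s], offsets[s + 1])))

def combine_pairwise(D_pairs, p):
    """Build H from pairwise covariance matrices D_pairs[(r, s)], with r < s."""
    offsets = np.concatenate(([0], np.cumsum(p)))
    total = int(offsets[-1])
    L_sum = np.zeros((total, total))
    counts = np.zeros((total, total))
    for (r, s), D_rs in D_pairs.items():
        # Lower-triangular Cholesky factor of the pairwise precision matrix.
        L_rs = np.linalg.cholesky(np.linalg.inv(D_rs))
        idx = block_index(offsets, r, s)
        L_sum[np.ix_(idx, idx)] += L_rs
        counts[np.ix_(idx, idx)] += 1
    # Diagonal blocks were filled K - 1 times and are averaged; the rest once.
    L_comb = np.divide(L_sum, counts, out=np.zeros_like(L_sum), where=counts > 0)
    H_inv = L_comb @ L_comb.T           # back-transform to a precision matrix
    H = np.linalg.inv(H_inv)            # solve for the covariance matrix
    return (H + H.T) / 2                # symmetrize against round-off

# Toy check: K = 3 independent outcomes, p_k = 2, so D is block diagonal.
rng = np.random.default_rng(3)
p = [2, 2, 2]
D_true = np.zeros((6, 6))
for k in range(3):
    B = rng.normal(size=(2, 2))
    D_true[2 * k:2 * k + 2, 2 * k:2 * k + 2] = B @ B.T + np.eye(2)

offsets = np.concatenate(([0], np.cumsum(p)))
D_pairs = {}
for r, s in combinations(range(3), 2):
    idx = block_index(offsets, r, s)
    D_pairs[(r, s)] = D_true[np.ix_(idx, idx)]   # stand-in for one pairwise draw

H = combine_pairwise(D_pairs, p)
```

Because the combined factor is lower triangular with a positive diagonal, the resulting matrix is positive definite by construction; in the independent toy case it reproduces the block-diagonal matrix, while for related outcomes it is only an approximation.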

3.2.1 Proof-of-concept simulation study with importance sampling weights

As an informal check of the validity of (3.2), we evaluate it using the POC simulation setting, now specifying the outcomes as independent. In this setting, ${\tilde{H}}^{(m)}$ is not strictly required: We are able to further factorize the variance–covariance matrix $D_{l}^{(m)}$ into its component block-diagonals (and therefore obtain $p (b_{l}^{(m)} ∣ D_{l}^{(m)})$ ). Using either $D_{l}^{(m)}$ or ${\tilde{H}}^{(m)}$ however, the numerator and denominator are equivalent in this setting, and per simulation, we obtain, as expected:

\begin{matrix} {\bar{w}}^{(m)} = \frac{ϖ^{(m)}}{\sum_{m} ϖ^{(m)}} = \frac{1}{M} = 0.001, \quad \text{for } M = 1000 . \end{matrix}
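The normalization step in this check can be reproduced with a few lines of Python. This is a minimal sketch: working with log-weights and subtracting the maximum before exponentiating are implementation choices assumed here for numerical stability, and the function name is hypothetical.

```python
import math

def self_normalise(log_weights):
    """Return importance weights that sum to one, computed from log-weights."""
    top = max(log_weights)
    unnormalised = [math.exp(lw - top) for lw in log_weights]
    total = sum(unnormalised)
    return [u / total for u in unnormalised]

M = 1000
# Independent outcomes: the numerator and denominator of (3.2) coincide,
# so every raw weight equals 1 and every log-weight equals 0.
weights = self_normalise([0.0] * M)
```

Each element of `weights` then equals $1/M$, matching the value stated above.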

Using this approach for the three related outcomes in the POC simulation study, we are able to obtain a positive-definite matrix ${\tilde{H}}^{(m)}$ per iteration of the proposal distribution, per simulation. As expected, for each element of the $\tilde{H}$ matrix, the sample marginal distributions are overly peaked and have shorter tails (supplementary material, Section C), but element-wise the posterior mean of this matrix over all simulations is reasonably similar to that obtained using the multivariate approach.

Despite the reasonable performance of the matrix $\tilde{H}$, the weights obtained as per (3.2) are highly variable. The importance sampling estimates for the various parameters are therefore dominated by a few realizations, corresponding to the few largest weights. While this may be due to the disconnect between the reconstruction of the multivariate distribution for the random effects and the matrix that the distribution conditions on, the instability of importance sampling (and thus SNIS) is a common issue in high-dimensional cases (MacKay, 2003; Ionides, 2008). The stability of the weights may be improved, and their imbalance reduced, by directly altering the computed weights, using

\begin{matrix} ϖ^{(m)} = \min \{ ϖ^{(m)}, \sqrt{M} \times {\bar{ϖ}}^{(m)} \}, \end{matrix}

where ${\bar{ϖ}}^{(m)}$ is the average of the original $M$ weights (Ionides, 2008). Implementing this truncated weight, the imbalance improves slightly, but not sufficiently so. Figure 2 allows us to compare the results for the multivariate, unweighted and weighted pairwise approaches. We see that the posterior means obtained using the weighted and unweighted approaches are very similar for the majority of parameters (which is not unexpected, given that the estimates from the multivariate and unweighted pairwise approaches are so close already). The coverage of the 95% credible intervals for the weighted approach is, however, substantially lower than that shown for the unweighted approach, which speaks to the imbalance of the weights. Several further attempts were made to improve the weights, including, but not limited to, the resampling of the random effects; unfortunately, none were successful.

Figure 2:

Relative bias, RMSE and coverage of the 95% credible intervals for (A) fixed effect and residual variance parameters and (B) variance–covariance parameters in POC simulation scenario (Scenario A) with SNIS. Ideal values indicated by the vertical dashed lines are 0.95 for coverage, 1 for relative bias and 0 for RMSE. Variance–covariance parameters are represented by row and column position.
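The truncation rule of Ionides (2008) used above can be sketched as follows. The effective sample size computed here is a standard diagnostic that is not part of the analysis in this article; it is added only to quantify the imbalance, and the raw weights are made-up values chosen so that a single realization dominates.

```python
import math

def truncate_weights(weights):
    """Cap each weight at sqrt(M) times the average of the original M weights."""
    M = len(weights)
    cap = math.sqrt(M) * (sum(weights) / M)
    return [min(w, cap) for w in weights]

def normalise(weights):
    """Rescale the weights so that they sum to one."""
    total = sum(weights)
    return [w / total for w in weights]

def effective_sample_size(norm_weights):
    """1 / sum of squared normalised weights (equals M for equal weights)."""
    return 1.0 / sum(w * w for w in norm_weights)

# Illustrative raw weights: 99 small weights and one very large weight.
raw = [1.0] * 99 + [1000.0]
truncated = truncate_weights(raw)
ess_before = effective_sample_size(normalise(raw))
ess_after = effective_sample_size(normalise(truncated))
```

In this toy example the effective sample size rises only from roughly 1.2 to roughly 3.6 out of 100 draws, which mirrors the observation that truncation helps slightly but does not remove the imbalance.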

3.3 Further simulations

Given the instability of the weights, and the similarity of the results for the multivariate and unweighted pairwise approaches, we have elected to proceed with the unweighted pairwise approach. While the posterior distributions of the two approaches remain theoretically dissimilar, we believe that there is sufficient promise to the pairwise methodology to warrant further investigation of its performance.

We would expect this methodology to work well when the data is close to normal, since the normal distribution may be entirely described using only the first two moments. This is not necessarily true for other distributions however. We therefore performed two additional, more complex simulation studies, aimed at exploring how well the method copes with mixed outcomes (Scenario B), or settings in which there are no continuous outcomes, and a substantially reduced number of repeated measurements per subject (Scenario C). The results indicate that we are able to use the pairwise approach with minimal bias and loss of efficiency. Full details for both scenarios are available in Section A of the supplementary material.

Since we are able to run the pairwise models in parallel, the computation time is substantially lower than that required for the full multivariate model. Additionally, the pairwise approach enables us to obtain results for scenarios wherein the number of outcomes fully prohibits the use of the multivariate approach using standard software. The Bayesian adaptation of the pairwise approach also simplifies inference substantially.
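The parallel structure mentioned above might be organized along the following lines. This is a schematic sketch only: `fit_pairwise_model` is a hypothetical placeholder for whatever MCMC routine fits one bivariate mixed model, and threads are used purely so that the example is self-contained, whereas CPU-bound samplers would more typically be run in separate processes or cluster jobs.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations

def fit_pairwise_model(pair):
    """Hypothetical placeholder: fit the bivariate model for outcomes (r, s).

    A real implementation would return the posterior draws for that pair."""
    r, s = pair
    return {"pair": (r, s), "draws": []}

def fit_all_pairs(K, max_workers=4):
    """Fit all K(K-1)/2 pairwise models concurrently and collect the results."""
    pairs = list(combinations(range(1, K + 1), 2))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(fit_pairwise_model, pairs))
    return dict(zip(pairs, results))

fits = fit_all_pairs(K=6)
```

With $K = 6$ outcomes this yields 15 independent fitting tasks, each involving only two outcomes.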

4 The multivariate joint model for longitudinal and time-to-event outcomes

In the formulation of a multivariate joint model for multiple longitudinal outcomes and a time-to-event outcome, a GLMM is postulated as per (2.1) to accommodate the multivariate longitudinal outcomes. For the survival process, we denote by $T_{i}^{*}$ the true event time for the $i$th subject, and to accommodate different types of censoring, we introduce $T_{i}$ and $T_{i}^{U}$ for the observed event times, and $δ \in \{0, 1, 2, 3\}$, which denotes the event indicator, with $0$ corresponding to right censoring ($T_{i}^{*} > T_{i}$), $1$ to a true event time ($T_{i}^{*} = T_{i}$), $2$ to left censoring ($T_{i}^{*} < T_{i}$), and $3$ to interval censoring ($T_{i} < T_{i}^{*} < T_{i}^{U}$). We assume that the risk for an event depends on a function of the subject-specific linear predictor $η_{i} (t)$ and/or the random effects. More specifically, we have

\begin{matrix} h_{i} (t ∣ H_{i} (t), w_{i} (t)) = \lim_{Δ t \to 0} \frac{\Pr \{ t \leq T_{i}^{*} < t + Δ t ∣ T_{i}^{*} \geq t, H_{i} (t), w_{i} (t) \}}{Δ t}, \quad t > 0 \\ = h_{0} (t) \exp \left[ γ^{⊤} w_{i} (t) + \sum_{k = 1}^{K} \sum_{l = 1}^{L_{k}} f_{k l} \{ H_{k i} (t), w_{i} (t), b_{k i}, α_{k l} \} \right] \end{matrix}

(4.1)

where $H_{ki} (t) = \{ η_{ki} (s), 0 \leq s < t \}$ denotes the history of the underlying longitudinal process up to $t$ for subject $i$ and outcome $k$, $h_{0} (\cdot)$ denotes the baseline hazard function, and $w_{i} (t)$ is a vector of exogenous, possibly time-varying, covariates with corresponding regression coefficients $γ$. Functions $f_{kl} (\cdot)$, parameterized by vector $α_{kl}$, specify which components/features of each longitudinal outcome are included in the linear predictor of the relative risk model.
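To make the four values of the event indicator concrete, the sketch below returns the likelihood contribution of a single subject under each censoring type, using the standard survival-analysis forms (survival function for right censoring, density for an observed event, and differences of survival functions for left and interval censoring). A constant hazard `rate` is assumed purely for illustration, whereas the hazard in (4.1) depends on time and on the longitudinal history; the function name is hypothetical.

```python
import math

def likelihood_contribution(delta, t, t_upper=None, rate=0.1):
    """Contribution of one subject under a constant hazard (illustrative only)."""
    def surv(u):
        # Survival function S(u) = exp(-rate * u) for a constant hazard.
        return math.exp(-rate * u)

    if delta == 0:   # right censoring: T* > T
        return surv(t)
    if delta == 1:   # observed event: T* = T, density h(T) * S(T)
        return rate * surv(t)
    if delta == 2:   # left censoring: T* < T
        return 1.0 - surv(t)
    if delta == 3:   # interval censoring: T < T* < T^U
        if t_upper is None:
            raise ValueError("interval censoring requires t_upper")
        return surv(t) - surv(t_upper)
    raise ValueError("delta must be 0, 1, 2 or 3")

right = likelihood_contribution(0, 5.0)
left = likelihood_contribution(2, 5.0)
interval = likelihood_contribution(3, 2.0, t_upper=4.0)
```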

As noted in the introduction, Mauff et al. (2020) proposed an adaptation of a two-stage estimation procedure for a multivariate joint model for multiple longitudinal outcomes and a time-to-event outcome. They split the estimation procedure into two parts. In Stage I, they fit a multivariate GLMM for the longitudinal outcomes using either MCMC or HMC, and a sample of size $M$ is obtained from the posterior:

p (θ_{y}, b ∣ y) \propto \prod_{i = 1}^{N} \left\{ \prod_{k = 1}^{K} \prod_{j = 1}^{n_{k i}} p (y_{k i j} ∣ b_{k i}, θ) \right\} p (b_{i} ∣ θ) p (θ_{y}),

(4.2)

where $θ_{y}$ denotes the subset of the parameters that are included in the definition of the longitudinal submodels (including those in the random effects distribution). Using the sample from this first stage, a sample is obtained for the parameters of the survival submodel. Importantly, both the parameters of the survival submodel $θ_{t}$ and the random effects $b$ are updated, that is, a sample is obtained for $θ_{t}$ and $b$ from the joint posterior distribution given by:

p (θ_{t}, b ∣ \tilde{T}, δ, y, θ_{y}^{(m)}) \propto \prod_{i = 1}^{n} \prod_{k = 1}^{K} \prod_{j = 1}^{n_{k i}} p (y_{k i j} ∣ b_{k i}, θ_{y}^{(m)}) p (b_{i} ∣ θ_{y}^{(m)}) p ({\tilde{T}}_{i}, δ_{i} ∣ θ_{t}, b_{i}, θ_{y}^{(m)}) p (θ_{t}),

(4.3)

where $θ_{t}$ denotes the subset of the parameters that are included in the definition of the survival submodel, and $\tilde{T} = (T, T^{U})$. They then use importance sampling weights defined by:

\begin{matrix} ϖ^{(m)} = \frac{p (θ_{t}^{(m)}, b^{(m)} ∣ \tilde{T}, δ, y, θ_{y}^{(m)}) p (θ_{y}^{(m)} ∣ y, \tilde{T}, δ)}{p (θ_{t}^{(m)}, b^{(m)} ∣ \tilde{T}, δ, y, θ_{y}^{(m)}) p (θ_{y}^{(m)}, b^{(m)} ∣ y)}, \end{matrix}

where the numerator is the posterior distribution of the multivariate joint model, and the denominator is the product of the corresponding posterior distributions from the two stages. This becomes

ϖ^{(m)} = \frac{\prod_{i} \iint p (y_{i} ∣ b_{i}, θ_{y}^{(m)}) p ({\tilde{T}}_{i}, δ_{i} ∣ b_{i}, θ_{y}^{(m)}, θ_{t}) p (b_{i} ∣ θ_{y}^{(m)}) p (θ_{t}) d b_{i} d θ_{t}}{\prod_{i} p (y_{i} ∣ b_{i}^{(m)}, θ_{y}^{(m)}) p (b_{i}^{(m)} ∣ θ_{y}^{(m)})} .

(4.4)

Further details may be found in the original article.

This method of estimation for the multivariate joint model is many orders of magnitude faster than the standard joint estimation. However, it remains bottlenecked by the necessity of fitting the full multivariate GLMM in the first stage. Despite the increase in speed, this method may therefore not accommodate situations where the number of longitudinal outcomes is very large. We are thus interested in whether or not the multivariate GLMM in Stage I may instead be estimated using the Bayesian pairwise approach outlined in this article.

The results obtained thus far using the pairwise approach are promising, even without the application of the SNIS correction. However, given that we are as yet unable to obtain a theoretically correct posterior distribution, we investigate its usefulness in the context of the joint model in a purely exploratory manner. We therefore do not adapt the entire corrected two-stage procedure for the joint model to incorporate the posterior distribution detailed in Section 3, and instead proceed as follows: per iteration $m$ of the sample obtained using the pairwise approach, we randomly select one of the $K - 1$ models for each outcome, and use only the realizations resulting from those models for the parameters of interest. For example, in the case of $K = 3$ outcomes, for any iteration $m$, we could use realizations for parameters relating to outcome 1 and outcome 2 from the model for outcomes 1 and 2, and use realizations for parameters relating to outcome 3 from the model for outcomes 1 and 3, or from the model for outcomes 2 and 3. Since this approach will not work for the variance–covariance matrix of the random effects, we use the matrix ${\tilde{H}}^{(m)}$, as previously defined. We then use this adapted sample to replace that from the multivariate GLMM in Stage I of the corrected two-stage approach.
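The per-iteration selection described above can be sketched in Python as follows. The data layout is an assumption made for illustration: `pairwise_draws[(r, s)][m][k]` is taken to hold the realization at iteration `m` of the parameters for outcome `k` from the model for outcomes `r` and `s`. The function names and the seed are hypothetical, and the variance–covariance matrix of the random effects is deliberately left out because it is handled separately through ${\tilde{H}}^{(m)}$.

```python
import random

def select_source_models(K, rng):
    """For each outcome k, pick one of the K - 1 pairwise models containing k."""
    chosen = {}
    for k in range(1, K + 1):
        candidates = [tuple(sorted((k, other)))
                      for other in range(1, K + 1) if other != k]
        chosen[k] = rng.choice(candidates)
    return chosen

def assemble_sample(pairwise_draws, K, M, seed=2020):
    """Build the adapted Stage I sample, one random selection per iteration."""
    rng = random.Random(seed)
    sample = []
    for m in range(M):
        chosen = select_source_models(K, rng)
        sample.append({k: pairwise_draws[pair][m][k]
                       for k, pair in chosen.items()})
    return sample

# Toy input: each "draw" simply records which pair and iteration it came from.
K, M = 3, 5
pairs = [(1, 2), (1, 3), (2, 3)]
pairwise_draws = {pair: [{k: (pair, m) for k in pair} for m in range(M)]
                  for pair in pairs}
sample = assemble_sample(pairwise_draws, K, M)
```

Each element of `sample` then contains one realization per outcome, always taken from a model that includes that outcome and from the same iteration index.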

4.1 Simulation Scenarios D and E

Simulation Scenarios D and E include 3 and 6 longitudinal outcomes, respectively, together with a survival outcome. For the survival outcome, adjusting for a binary group allocation, we used

\begin{matrix} h_{i} (t) = h_{0} (t) \exp [γ_{1} \times group + \sum_{k = 1}^{K} α_{k} η_{ki} (t)] \end{matrix}

where

\begin{matrix} η_{ki} (t) = β_{k 0} + b_{ki 0} + (β_{k 1} + b_{ki 1}) \times time \end{matrix}

for each outcome $k$ . Further details are available in Section A of the supplementary material. In each scenario, we compared the finite sample properties of the corrected two-stage multivariate joint model, using both the multivariate GLMM and the pairwise approach in the first stage. We did this both with and without the use of the importance sampling weights defined for the joint model in Equation (4.4) (although updating the random effects in the second stage each time). With three longitudinal outcomes (Scenario D), bias is minimal for the majority of parameters, and RMSE is low. However, where the importance sampling weights for the joint model are applied, the coverage of the 95% credible intervals is poor for both the multivariate and pairwise approaches. This is once again due to large variability in the weights. Similar results are obtained for Scenario E; however, in this scenario, the variance–covariance parameters resulting from the pairwise approach are substantially different to those obtained using the multivariate approach. Prior to fitting the joint model, the matrix $\tilde{H}$ was markedly different from the original multivariate variance–covariance matrix, possibly due to the small scale of the variance–covariance matrix used to simulate the data. As a result, we had very poor convergence for these parameters in the estimation of the joint model using the pairwise approach in the first stage.
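The hazard specification used for Scenarios D and E can be evaluated numerically as in the sketch below. All parameter values, the constant baseline hazard of 0.05 and the trapezoidal integration are assumptions made for illustration; they are not the settings of the simulation study, which are given in the supplementary material.

```python
import math

def eta(t, beta0, beta1, b0, b1):
    """Subject-specific linear predictor: random intercept and random slope."""
    return beta0 + b0 + (beta1 + b1) * t

def hazard(t, group, gamma1, alphas, long_params, baseline=lambda u: 0.05):
    """h_i(t) = h_0(t) * exp(gamma1 * group + sum_k alpha_k * eta_ki(t))."""
    linear = gamma1 * group + sum(
        a * eta(t, *params) for a, params in zip(alphas, long_params))
    return baseline(t) * math.exp(linear)

def survival(t, n_steps=1000, **kwargs):
    """exp(-cumulative hazard), integrating the hazard by the trapezoidal rule."""
    step = t / n_steps
    values = [hazard(i * step, **kwargs) for i in range(n_steps + 1)]
    cumulative = step * (sum(values) - 0.5 * (values[0] + values[-1]))
    return math.exp(-cumulative)

# One hypothetical subject with K = 3 outcomes; tuples are (beta0, beta1, b0, b1).
subject = dict(
    group=1,
    gamma1=0.5,
    alphas=[0.2, -0.1, 0.3],
    long_params=[(1.0, 0.1, 0.2, 0.0), (0.5, -0.05, 0.0, 0.01), (2.0, 0.0, -0.3, 0.02)],
)
h_at_2 = hazard(2.0, **subject)
s_at_2 = survival(2.0, **subject)
```

Setting all association parameters and `gamma1` to zero reduces the model to the baseline hazard alone, which gives a simple way to check the integration.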

Figure 3:

Longitudinal evolution of continuous biomarkers (log base 2) for a randomly selected subset of individuals from the Bio-SHiFT cohort study that did/did not experience the primary event of interest

5 Bio-SHiFT study

In Mauff et al. (2020), the authors presented an analysis of the Bio-SHiFT study. The analysis focused on the association between the composite primary event of interest (consisting of hospitalization for heart failure, cardiac death, LVAD placement and heart transplantation) and each of six longitudinal biomarkers: Cystatin C (CysC), urinary N-acetyl-beta-D-glucosaminidase (NAG), kidney-injury-molecule (KIM)-1, N-terminal proBNP (NT-proBNP), cardiac troponin T (HsTNT) and C-reactive protein (CRP) (Figure 3).

As detailed in Mauff et al. (2020), for each of CysC, NAG and KIM-1 ( $k = 1, 2, 3$ ), we fit

\begin{matrix} y_{ki} (t) = η_{ki} (t) + ε_{ki} (t) = β_{k 0} + b_{ki 0} + (β_{k 1} + b_{ki 1}) \times time + ε_{ki} (t) . \end{matrix}

For the remaining outcomes ( $k = 4, 5, 6$ ), we have

\begin{matrix} y_{ki} (t) = η_{ki} (t) + ε_{ki} (t) = (β_{k 0} + b_{ki 0}) + \sum_{p} (β_{kp} + b_{kip}) B_{kn} (time, λ_{p}) + ε_{ki} (t), \end{matrix}

where $ε_{ki} (t) \sim N (0, σ_{k}^{2})$, and $B_{kn} (time, λ_{p})$ denotes the B-spline basis matrix for a natural cubic spline of time with two internal knots placed at the 25th and 75th percentiles of the follow-up times for NT-proBNP ($p = 1, 2, 3$), and one internal knot placed at the 50th percentile of the follow-up times for each of HsTNT and CRP ($p = 1, 2$). Boundary knots were set at the 5th and 95th percentiles. We assume a multivariate normal distribution for the random effects, $b_{i} = (b_{1 i}^{T}, b_{2 i}^{T}, \dots, b_{6 i}^{T})^{T} \sim MVN (0, D)$, where $D$ is a $16 \times 16$ unstructured variance–covariance matrix. For the survival process, we fit the below model, using the global–local ridge-type shrinkage prior described in the previous article:

\begin{matrix} h_{i} (t) = h_{0} (t) \exp [γ^{⊤} w_{i} (t) + \sum_{k = 1}^{K} α_{k} η_{ki} (t)], \end{matrix}

and included the baseline variables: age, sex, NYHA class (class III/IV vs. class I/II), use of diuretics, presence or absence of ischemic heart disease (IHD), diabetes mellitus, BMI and the estimated glomerular filtration rate (eGFR) value.
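The dimension of the random effects in this application, and the reduction offered by the pairwise models, can be checked with a short calculation. The counts below follow from the longitudinal submodels specified above (a random intercept and slope for the three linear outcomes, and a random intercept plus one random coefficient per spline term for the remaining three); the variable names are hypothetical.

```python
from itertools import combinations

# Number of random effects per outcome, read off the longitudinal submodels.
random_effects = {
    "CysC": 2, "NAG": 2, "KIM-1": 2,   # random intercept + random slope
    "NT-proBNP": 4,                    # random intercept + 3 spline terms
    "HsTNT": 3, "CRP": 3,              # random intercept + 2 spline terms
}

full_dim = sum(random_effects.values())               # dimension of D
n_cov_params = full_dim * (full_dim + 1) // 2         # unique elements of D
pairs = list(combinations(random_effects, 2))         # all pairwise models
pair_dims = {pair: random_effects[pair[0]] + random_effects[pair[1]]
             for pair in pairs}
largest_pair_dim = max(pair_dims.values())
```

The full model therefore requires a $16 \times 16$ matrix with 136 unique elements, whereas each of the 15 pairwise models involves a random-effects matrix of dimension at most $7 \times 7$.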

We compared the results obtained for this multivariate joint model using the corrected two-stage approach with the full multivariate GLMM, to those obtained using the pairwise approach as described, with and without the importance sampling weights for the joint model (once again updating the random effects in both cases). Full results are available in the supplementary material. As before, we are unable to obtain 95% credible intervals for the weighted results. We therefore limit our comparisons to the unweighted case: once again, we see very similar results regardless of which approach is used. However, as in simulation Scenario E, the variance–covariance parameters for the two approaches differ quite substantially. These parameters did not converge for the joint model when using the pairwise approach.

6 Discussion

In this article, we presented a Bayesian adaptation of the pairwise approach introduced by Fieuws and Verbeke (2006). We demonstrated that our approach works satisfactorily in the case of multiple longitudinal outcomes, with minimal bias, low RMSE and good coverage even in scenarios where we have mixed or non-continuous outcomes. The Bayesian version is easy to implement, and inference is simpler than in the frequentist case. The use of this technique allows for faster computational times, since we are able to run the pairwise models in parallel. Moreover, it allows for the fitting of models in scenarios where the number of outcomes is so large as to entirely prohibit the use of the multivariate approach. We also explored the use of importance sampling weights in an attempt to obtain a theoretically proper multivariate posterior distribution. This was not successful: weights were obtained after several modifications but suffered from instability in the simulation settings presented here.

Incorporation of a time-to-event outcome was demonstrated using the corrected two-stage approach introduced in Mauff et al. (2020). The pairwise approach appears to work satisfactorily in the joint model context in only one of the three analyses presented here. Comparing the unweighted results for the pairwise and multivariate approaches for simulation Scenario D, we found that they were very similar, with minimal bias, low RMSE and good coverage of the 95% credible intervals for the majority of parameters, including the variance–covariance matrix of the random effects. However, the weights calculated for the joint model were once again unstable, which resulted in poor coverage. Interestingly, this occurred for both the multivariate and pairwise approaches. We anticipated potentially inferior results using the pairwise approach in the weighted case, since the formulation of these weights assumes a proper multivariate posterior distribution for the longitudinal submodel, which we do not have. However, the pairwise approach does not appear to have performed any worse than the multivariate approach. It is worth noting that the weights (while theoretically important) provide little in the way of meaningful practical change to the results for the corrected two-stage joint model, provided that the random effects are updated in the second stage.

In settings with a time-to-event outcome and a larger number of longitudinal outcomes however (simulation Scenario E and the analysis of the Bio-SHiFT data), comparing the unweighted results for the two approaches quite clearly demonstrates very large biases and RMSE for the variance–covariance parameters obtained using the pairwise approach. In these settings, the matrix $\tilde{H}$ resulting from the Cholesky decomposition process described in the article was substantially different from the original multivariate variance–covariance matrix. We therefore saw very poor convergence for these parameters in the estimation of the joint model using the pairwise approach.

The large differences observed for these variance–covariance matrices may be due to issues in the construction of the matrix. Alternatively, the scale of the matrix prior to Cholesky decomposition may play a role. It could also be that the particular simulation settings chosen here were problematic: simulation of joint models with multiple longitudinal outcomes is challenging, especially the choice of values for the variance–covariance matrix. Interestingly, the remaining parameters of the longitudinal and survival submodels seem to be fairly robust to changes in the variance–covariance matrix, and consequently, the results for most of the remaining parameters from both the longitudinal and survival submodels were once again very similar for the two approaches.

A further issue of note is the instability of the importance sampling weights for both the multiple longitudinal outcomes case and the corrected two-stage joint model. The weights have previously been shown to work in the context of the joint model (Mauff et al., 2020). Additional exploration of the sensitivity of these weights is therefore warranted, as is further work on the incorporation of time-to-event outcomes when using the pairwise approach.

Footnotes

Supplementary materials

All relevant supplementary material and code referred to herein may be found through the link: http://www.statmod.org/smij/archive.html.

Acknowledgements

The first and last authors acknowledge support by the Netherlands Organization for Scientific Research VIDI grant nr. 016.146.301.

Declaration of conflicting interests

The authors declared no potential conflicts of interest with respect to the research, authorship and/or publication of this article.

Funding

The authors received no financial support for the research, authorship and/or publication of this article.

References

Betancourt (2018) A conceptual introduction to Hamiltonian Monte Carlo. arXiv preprint arXiv:1701.02434.

Booth and Hobert (1999) Maximizing generalized linear mixed model likelihoods with an automated Monte Carlo EM algorithm. Journal of the Royal Statistical Society, Series B, 61, 265–85.

Breslow and Clayton (1993) Approximate inference in generalized linear mixed models. Journal of the American Statistical Association, 88, 9–25.

Brilleman, Crowther, Moreno-Betancur, Novik, Dunyak, AlHuniti, Fox, Hammerbacher and Wolfe (2019) Joint longitudinal and time-to-event models for multilevel hierarchical data. Statistical Methods in Medical Research, 28, 3502–15.

Brown and Prescott (1999) Applied Mixed Models in Medicine. New York, NY: John Wiley and Sons.

Davidian and Giltinan (1995) Nonlinear Models for Repeated Measurement Data. London: Chapman and Hall.

Demidenko (2004) Mixed Models: Theory and Application. New York, NY: John Wiley and Sons.

Fahrmeir and Tutz (1994) Multivariate Statistical Modelling Based on Generalized Linear Models. New York, NY: Springer-Verlag.

Fieuws and Verbeke (2004) Joint modelling of the multivariate longitudinal profiles: Pitfalls of the random-effects approach. Statistics in Medicine, 23, 3093–3104.

Fieuws and Verbeke (2006) Pairwise fitting of mixed models for the joint modeling of multivariate longitudinal profiles. Biometrics, 62, 424–31.

Fieuws, Verbeke and Molenberghs (2007) Random-effects models for multivariate repeated measures. Statistical Methods in Medical Research, 16, 387–97.

Gelfand and Smith (1990) Sampling-based approaches to calculating marginal densities. Journal of the American Statistical Association, 85, 398–409.

Gelman, Carlin, Stern, Dunson, Vehtari and Rubin (2013) Bayesian Data Analysis, 3rd Edition. Boca Raton, FL: CRC Press.

Girolami and Calderhead (2011) Riemann manifold Langevin and Hamiltonian Monte Carlo methods. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 73, 123–214.

Guan and Haran (2016) A computationally efficient projection-based approach for spatial generalized linear mixed models. Journal of Computational and Graphical Statistics, 27. doi: 10.1080/10618600.2018.1425625.

Gueorguieva (1999) Models for repeated measures of a multivariate response. PhD Dissertation, Department of Statistics, University of Florida, Gainesville.

Hedeker and Gibbons (1994) A random-effects ordinal regression model for multilevel analysis. Biometrics, 50, 933–44.

Hedeker and Gibbons (1996) Mixor: A computer program for mixed-effects ordinal regression analysis. Computer Methods and Programs in Biomedicine, 49, 157–76.

Hoffman and Gelman (2014) The No-U-Turn sampler: Adaptively setting path lengths in Hamiltonian Monte Carlo. The Journal of Machine Learning Research, 15, 1593–1623.

Ionides (2008) Truncated importance sampling. Journal of Computational and Graphical Statistics, 17, 295–311.

Laird and Ware (1982) Random effects models for longitudinal data. Biometrics, 38, 963–74.

Lewandowski, Kurowicka and Joe (2009) Generating random correlation matrices based on vines and extended onion method. Journal of Multivariate Analysis, 100, 1989–2001.

Liu (2001) Monte Carlo Strategies in Scientific Computing. New York, NY: Springer-Verlag.

MacKay (2003) Information Theory, Inference, and Learning Algorithms. Cambridge: Cambridge University Press.

Mahani, Hasan, Jiang and Sharabiani (2016) Stochastic Newton sampler: R package sns. Journal of Statistical Software, Code Snippets, 74, 1–33.

Mauff, Steyerberg, Kardys, Boersma and Rizopoulos (2020) Joint models with multiple longitudinal outcomes and a time-to-event outcome. Statistics and Computing. doi:10.1007/s11222-020-09927-9.

McCulloch (1997) Maximum likelihood algorithms for generalized linear mixed models. Journal of the American Statistical Association, 92, 162–70.

Monnahan, Thorson and Branch (2017) Faster estimation of Bayesian models in ecology using Hamiltonian Monte Carlo. Methods in Ecology and Evolution, 8, 339–48.

Neal (2011) MCMC using Hamiltonian dynamics. In Handbook of Markov Chain Monte Carlo, edited by Brooks, Gelman, Jones and Meng, pages 113–162. Boca Raton, FL: CRC Press.

Owen (2013) Monte Carlo theory, methods and examples. URL https://statweb.stanford.edu/owen/mc/Ch-var-is.pdf (last accessed 17 August 2020).

Pinheiro and Bates (1995) Approximation to the loglikelihood function in the nonlinear mixed effects model. Journal of Computational and Graphical Statistics, 4, 12–35.

Pinheiro and Bates (2000) Mixed-Effects Models in S and S-PLUS. New York, NY: Springer-Verlag.

Robert and Casella (2004) Monte Carlo Statistical Methods. New York, NY: Springer-Verlag.

Stan Development Team (2014) Stan: A C++ library for probability and sampling, version 2.8.0. URL http://mc-stan.org/ (last accessed 17 August 2020).

Wolfinger and O'Connell (1993) Generalized linear mixed models: A pseudo-likelihood approach. Journal of Statistical Computation and Simulations, 48, 233–43.