Abstract
In medicine, multiple continuous outcomes are often repeatedly measured on each subject over time to assess disease severity. Usually, it is of interest to investigate the association between those outcomes, which may be measured at different time points, resulting in unbalanced data. The multivariate linear mixed-effects model (MLMM) is a popular framework for this analysis. It considers the unbalanced nature of the data and accounts for the association of the outcomes via the random effects, often assuming a multivariate normal distribution. However, measuring and understanding the degree of connection between longitudinal outcomes remains challenging. We propose to enhance the MLMM by incorporating various interpretable association structures. Specifically, we consider that multiple longitudinal outcomes are related to the primary outcome through their current value, cumulative effect (total or partial), or both. Our research is motivated by Pompe disease, a rare, inheritable, progressive metabolic myopathy. Clinically, it is important to investigate how patient-reported outcome measures (primary outcomes) are associated with physical outcomes to determine whether improvements in physical outcomes are accompanied by improvements in health-related quality of life and other patient experiences. We found a positive association between them. The proposed models are fitted under the Bayesian framework using Hamiltonian Monte Carlo.
Keywords
Abbreviations
ERT: enzyme replacement therapy
HMC: Hamiltonian Monte Carlo
Introduction
In numerous clinical studies, multiple longitudinal outcomes are collected at regular intervals from each subject to monitor the progression of disease severity. Improving decision-making in clinical settings requires a thorough investigation into the potential relationships among different outcomes. For instance, in prospective Pompe disease studies, which is a rare, inheritable, and progressive metabolic myopathy, the interest lies in the association between continuous physical outcomes and continuous patient-reported outcome measures (PROMs). Knowledge of how these outcomes are associated is essential for assessing their reflection in the patient’s health-related quality of life (hrQoL).
Our study is motivated by Pompe disease data obtained from two prospective follow-up studies conducted at the Center for Lysosomal and Metabolic Diseases, Erasmus MC University Medical Center, Rotterdam-the national referral center for Pompe disease in the Netherlands.1,2 Integrating such data presents several challenges. In Figure 1, we present the evolution of one of the primary clinical outcomes, Forced Vital Capacity in the upright seated position expressed as a percentage of its predicted values (

Plot of Forced vital Capacity in the upright seated position expressed as a percentage of its predicted values (
The multivariate linear mixed-effects model (MLMM) is a popular framework for jointly analyzing multiple continuous longitudinal outcomes while accounting for the unbalanced nature of the data. This model accounts for the association between the longitudinal outcomes through the random effects, often assuming a multivariate normal distribution.5–11 Computational challenges may appear due to the dimension of the variance–covariance matrix of the joint random effects when the number of outcomes and (or) the number of random effects per outcome increases. Fieuws and Verbeke 12 proposed using the pairwise approach for modeling four or more outcomes where the standard MLMM would become computationally challenging. In particular, they suggested fitting all possible bi-variate models and the average over the duplicate parameter estimates to be calculated. This approach offers greater flexibility than assuming the complete multivariate linear mixed-effects model, potentially resulting in increased variability and some efficiency loss for the shared parameters. Another method to address the association among multiple longitudinal outcomes is to link the error terms rather than the random effects.5,13–17 However, this can lead to challenges related to high dimensionality, especially when assuming numerous longitudinal outcomes or complex random effects structures. Moreover, connecting the outcomes via error terms might pose difficulties when measurements are taken at dissimilar time points.
Motivated by questions regarding Pompe disease, our research aims to establish relationships between continuous longitudinal outcomes from different data sources. While the aforementioned methods connect the different outcomes, a direct interpretation of the association parameters is not always feasible. It is crucial for clinicians to quantify the relationship between the outcomes. Moreover, the degree of association of random effects does not necessarily imply a clinical improvement reflective of the patient’s condition, especially in cases where complex structures are assumed. One possible approach to quantifying and interpreting the relationship between outcomes is to include the functional forms of one outcome as endogenous covariates in the model for the primary outcome of interest. This solution offers flexibility by allowing for different specifications of the outcomes that align with their biological interpretation. A time-varying covariate is considered endogenous if its value at time
It has been previously recognized that different ways exist to associate an outcome with a time-varying covariate, for example, by relating them at the same time point or assuming a lag effect between the covariate and the impact on the outcome.
18
Different association structures are investigated and extensively discussed in the literature of joint models for longitudinal and survival data.23,24 The longitudinal outcomes can be related via the random effects and connected to the hazard, assuming different functional forms. In particular, their current value at a specific time point
To investigate the strength of the association between the continuous longitudinal outcome
This article is constructed as follows: Section 2 introduces the standard multivariate linear mixed-effects model and extensions of it by assuming different association structures between the longitudinal outcomes. Section 3 presents the Bayesian approach, including the likelihood and priors of the parameters. An application of the extended multivariate linear mixed-effects model to the Pompe disease data is given in Section 4. Section 5 presents our simulation study and its results. Section 6 contains a discussion.
In many studies, multiple repeated measurements are collected from each subject for various types of outcomes, which may be recorded at different time points. Mixed effects models are commonly used to analyze longitudinal data and can be extended to address multiple longitudinal outcomes.
Multivariate linear mixed-effects model
The multivariate mixed-effects model accounts for the association between multiple longitudinal outcomes through the random effects, often by assuming a multivariate normal distribution. Motivated by our application on Pompe disease data, where both the clinical outcome and the patient-reported outcomes are measured on a continuous scale, we focus on the multivariate linear mixed-effects model. Let’s assume that we have
Connecting longitudinal outcomes through random effects does not directly measure the strength of the association and lacks clinical relevance, especially when a complex random effects structure is involved. Therefore, in many applications, one of the outcomes is considered the primary outcome of interest, while the other outcomes serve as time-varying predictors. Without loss of generality, we assume the
In particular, we assume different functional forms of the longitudinal outcomes

Graph of associations between
Given the above, the proposed multivariate linear mixed-effects model of the primary outcome takes the form:
Connecting the outcomes through different functional forms and random effects may lead to computational difficulties and make it challenging to interpret their associations, hence assuming independent (unstacked random effects vector) may be preferred and then
In longitudinal studies with time-varying covariates, several challenges emerge regarding their modeling. From a clinical perspective, accurately characterizing the association is important to reflect biological assumptions, as it can significantly impact the study results. Thus, it is essential to determine which summary measure of the covariate is linked to the outcome of interest. Table 1 summarizes various forms of associations guided by our clinical context. Each of the
Functional forms of longitudinal outcome(s) associated with the outcome of main interest in the multivariate linear mixed-effects model.
Functional forms of longitudinal outcome(s) associated with the outcome of main interest in the multivariate linear mixed-effects model.
A straightforward way to summarize a longitudinal outcome is to assume its underlying value at a specific time point. In this context, the underlying values of the longitudinal outcomes
The present value holds biological significance for our application and can be readily understood by clinicians. This association parameter interprets that a unit increase in the
In specific research contexts, relying solely on a single value may not adequately capture the overall outcomes. For patients with Pompe disease, consistently low physical scores over an extended period (such as six months to a year) can hold more clinical significance than isolated instances of low scores. For example, prolonged periods of physical pain or fatigue have a greater impact on patient-reported outcomes compared to brief moments of severe pain. Therefore, incorporating historical data on outcomes provides deeper insights into disease progression and better informs physicians about the effectiveness of interventions. Thus, we can assume that the longitudinal outcomes
In alternative contexts, varying functional forms may be more suitable for modelling the relationship between longitudinal outcomes and the primary outcome. For example, considering the trajectory’s slope (i.e., the rate of change of an outcome relative to another) or weighted cumulative effect (assuming that the more recent values have a greater influence)30,31 could be relevant. Additionally, in certain clinical applications, a combination of different functional forms and association parameters might be applicable. The choice of association structure(s) should align with biological assumptions
Estimation and likelihood
Under the Bayesian framework and using the Hamiltonian Monte Carlo (HMC) algorithm,32–36 we estimate the parameters of the model and the inference is based on the posterior distribution of these parameters. If
Priors, posterior and diagnostics
We assume that the parameters
Application
Data
Our motivation stems from two prospective observational cohort studies conducted at the Center for Lysosomal and Metabolic Diseases, Erasmus MC University Medical Center in Rotterdam—the national referral center for Pompe disease in the Netherlands.1,2 Pompe disease, a rare, progressive metabolic myopathy, has been the focus of research. Since 2006, Enzyme Replacement Therapy (ERT) with recombinant human alpha-glucosidase (alglucosidase alfa, Myozyme) has been approved as a treatment for Pompe disease. Notably, ERT has demonstrated beneficial effects on various aspects, including physical outcomes (such as motor performance, muscle strength, and pulmonary function), patient-reported outcomes (PROMs), and overall survival.38–44
Multiple longitudinal physical and patient-reported outcomes are available from the two prospective observational cohort studies. Data are combined, and only data from Dutch adult patients with Pompe disease included in both studies, treated with ERT and assessed for physical outcomes during ERT (population of study interest) are retained. One of the most important physical outcomes is Forced Vital Capacity in the upright seated position,
In previous work, Yuan et al.
48
found that
Table 2 presents some of the demographic and clinical characteristics of the Dutch adult patients with Pompe disease eligible for the analyses. For the extended multivariate linear mixed-effects model of the
Characteristics of the patients eligible for the multivariate linear mixed-effects model of the Forced vital capacity in the upright position expressed as percentage of its predicted values (
) with the Physical Component Summary (PCS) score (n
100) and the Rasch-Built Pompe-Specific Activity R-PAct scale (n
94), respectively.
Characteristics of the patients eligible for the multivariate linear mixed-effects model of the Forced vital capacity in the upright position expressed as percentage of its predicted values (
In the model of PCS and R-PAct, we include the variables sex, the standardized age at the start of ERT (

Graph illustrating the relationships between variables and outcomes within the context of Pompe disease.

Graph illustrating the relationships between variables and outcomes within the context of Pompe disease.
The submodel of the clinical outcome
For the error terms, we set
Vague priors are used for all parameters. Specifically, we assume that
From the multivariate linear-mixed effects models considering both the current value and (total or partial) cumulative effect of the
Based on the multivariate linear-mixed effects models considering only the current value of
The effect plots of the average patient and the trace plots are presented in the Supplemental material Part C Figures 1-14. As average patient, we define a female patient who has median standardized disease duration at start of ERT, median standardized age at start of ERT and the current value of
Simulation Study
Objective
In practice, the linkage between outcomes remains unclear, often leading to ignoring their association or misspecifying the relationship of the random effects. With our simulation study, we aim to confirm firstly the unbiasedness of estimates resulting from the proposed extensions of the multivariate linear mixed-effects model, and secondly, to explore potential biases arising from disregarding additional association parameters (current value, total cumulative effect, or both) or misspecifying the association of random effects. Additionally, we aim to examine how other parameters may be affected when the association between outcomes is not appropriately addressed.
Simulated data
To mimic our application, we assume two continuous longitudinal outcomes
Design
The extended multivariate linear mixed-effects model of the simulation study takes the form
In the model of the variable
The actual values of the coefficients are set to be
Analysis
First, we fit the model using the same specifications as those that generated the simulated data set. Next, we fit models in which either the associations within the random effects are misspecified or the additional association parameters—such as current value, cumulative effect, or both—are omitted. Our primary focus is on evaluating the impact of these misspecifications on the estimations. Table 3 provides an overview of the scenarios explored. The model that generated the simulated data is referred to as the simulation model. In Scenario I, the same model is used for both simulation and fitting. In Scenario II, the association between the random effects of the simulation model is misspecified when fitting the data, and in Scenario III, the additional association parameter of the simulation model is disregarded when fitting the data. We explained the models denoted as V+D, C+D, V+C+D, V+D1D2, C+D1D2 and V+C+D1D2 in the final paragraph of the objective subsection within the simulation study section. The models denoted as D and D1D2 refer to multivariate linear mixed-effects models which do not include functional forms of
Vague priors are used for all parameters. In particular, we assume that
Out of the 3600 generated data sets, 6.8% were excluded due to convergence issues (
It is observed that all fitted models in Scenario II (where the relationship between the random effects is misspecified) yield unbiased estimates for all parameters, regardless of the inclusion of the extra association parameter in the model. However, an exception occurs when the simulation model is V+D1D2, but the model V+D is fitted instead. In this case, the variance of the random slope (
The models in Scenario III produce biased estimates for most parameters. Specifically, all models yield biased estimates for the coefficients and variances of the random effects included in the linear mixed-effects model for outcome
Scenario III models fitted on data generated by C+D and V+C+D yield biased estimates for most parameters. Specifically, the Scenario III model fitted on data generated by C+D results in biased estimates for all parameters except
Overall, similar results are observed when calculating the bias and MSE values of the parameters. Notably, in the Scenario III models, the variance–covariance matrix parameters exhibit greater bias and higher MSE values compared to the other parameters discussed in this section (Supplemental material Part E, Tables 1

Box plots of the estimated current value of the outcome

Box plots of the estimated variance of the random slope in the model of
In summary, we conclude that when the relationship between the random effects is incorrectly specified, the models generally produce unbiased estimates for most parameters. However, if the additional association parameter (whether current value, cumulative effect, or both) is excluded, biased estimates arise for many parameters. Specifically, the V+C+D model results in biased estimates for nearly all parameters when the additional association parameters are omitted. Therefore, it may be more prudent to consider including multiple association parameters rather than disregarding a relevant one and potentially excluding any clinically irrelevant parameters later on. Finally, incorporating a full variance–covariance matrix
Discussion
This study introduces an extension of the standard multivariate linear mixed-effects model within the Bayesian framework using Hamiltonian Monte Carlo. In the standard model, the random effects account for the interdependence among multiple longitudinal outcomes by assuming a multivariate normal distribution. In the current study, we assume that the primary longitudinal outcome of interest is the K-
Analysis of Pompe disease data using the extended model indicates that PROMs are associated only with the current value of
Based on our simulation study findings, we recommend fitting the multivariate linear mixed-effects model with various association structures, such as current values and total cumulative effects, linking the longitudinal outcomes, and subsequently eliminating association structures that are deemed less critical. This recommendation aligns with Andrinopoulou et al. (2016)
23
, who advocate for including multiple association structures in the model when uncertainty exists regarding the most appropriate association structure or when no specific structure is of particular interest. Including a full variance–covariance matrix and association parameters can introduce unnecessary complexity when the data-generating simulation model has identical and non-identical specifications (Supplemental material Part F, Tables 25
The proposed extended model requires substantial computational resources, and their runtime can vary considerably depending on dataset characteristics (e.g., size, degree of imbalance, and measurement frequency), model complexity (e.g., correlation structure among random effects and number of association parameters), hardware specifications, and implementation details such as parallelization strategies. Unbalanced datasets, in particular, add complexity because they require additional steps to predict longitudinal trajectories at non-aligned time points. While these models offer flexibility and richer inference, their computational intensity may limit practical applicability for very large datasets. Future research should focus on improving scalability and efficiency, for example, by exploring alternative estimation engines (e.g., JAGS), leveraging parallel computing, and investigating dimension-reduction. Such developments will be essential to ensure that these models remain feasible for real-world applications involving high-dimensional or unbalanced data.
Also, future research could explore these extensions within the context of generalized mixed-effects models (e.g., logistic models) and Bayesian shrinkage techniques could be employed to identify the most suitable association structure for the interdependence of outcomes.
In this study, the coefficients measuring the strength of the association between longitudinal outcomes are kept constant, but outcomes are time-dependent and their associations may vary over time. For example, in Pompe disease, the clinical outcome
Supplemental Material
sj-pdf-1-smm-10.1177_09622802261455689 - Supplemental material for Bayesian multivariate linear mixed-effects models with varied association structures
Supplemental material, sj-pdf-1-smm-10.1177_09622802261455689 for Bayesian multivariate linear mixed-effects models with varied association structures by Aglina Lika, Dimitris Rizopoulos, Michelle E Kruijshaar, Ans T van der Ploeg and Eleni-Rosalina Andrinopoulou in Statistical Methods in Medical Research
Footnotes
Acknowledgments
We thank the Sophia Children's Hospital for providing the Pompe data and are grateful to all the clinical staff who assisted in collecting the physical outcomes over time.
ORCID iDs
Ethical approval and informed consent statements
Both studies were approved by the ethics committee of the Erasmus MC University Medical Center and have been performed in accordance with the 1964 Declaration of Helsinki and its later amendments. Written informed consent was obtained from all participants prior to their inclusion.
Funding
The Pompe disease study was supported by Sanofi-Genzyme; ZonMW-The Netherlands Organization for Health Research and Development (projects 152001005, 80-83600-98-13007, and 05-09-2007); Prinses Beatrix Spierfonds (projects OP07-08, W. OR13-21, and W.OR15-10); TKI – Health Holland (project LSHM16008); SSWO-Sophia Children’s Hospital Foundation (project 687). Several of the authors of this publication are members of the European Reference Networks for Hereditary Metabolic Disorders (Metab-ERN) and/or for Rare Neuromuscular Diseases (EURO-NMD) and/or of the Netherlands Neuromuscular Center (NL-NMD).
Declaration of conflicting interests
The authors declared the following potential conflicts of interest with respect to the research, authorship, and/or publication of this article: A. van der Ploeg received funding for research, clinical trials and providing advise to various industries working on ERT or next-generation therapies in the field of Pompe disease, other lysosomal storage diseases, and neuromuscular disorders under agreements with Erasmus MC University Medical Center and the relevant industry. The other authors declare no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Data availability statement
Supplemental material
Supplemental material for this article is available online.
References
Supplementary Material
Please find the following supplemental material available below.
For Open Access articles published under a Creative Commons License, all supplemental material carries the same license as the article it is associated with.
For non-Open Access articles published, all supplemental material carries a non-exclusive license, and permission requests for re-use of supplemental material or any part of supplemental material shall be sent directly to the copyright owner as specified in the copyright notice associated with the article.
