Abstract
Index measures are commonly used in medical research and clinical practice, primarily for quantification of health risks in individual subjects or patients. The utility of an index measure is ultimately contingent on its ability to predict health outcomes. Construction of medical indices has largely been based on heuristic arguments, although the acceptance of a new index typically requires objective validation, preferably with multiple outcomes. In this article, we propose an analytical tool for index development and validation. We use a multivariate single-index model to ascertain the best functional form for risk index construction. Methodologically, the proposed model represents a multivariate extension of the traditional single-index models. Such an extension is important because it assures that the resultant index simultaneously works for multiple outcomes. The model is developed in the general framework of longitudinal data analysis. We use penalized cubic splines to characterize the index components while leaving the other subject characteristics as additive components. The splines are estimated directly by penalizing nonlinear least squares, and we show that the model can be implemented using existing software. To illustrate, we examine the formation of an adiposity index for prediction of systolic and diastolic blood pressure in children. We assess the performance of the method through a simulation study.
Introduction
Index measures are commonly used in medical research and clinical practice. By combining information from an array of observed characteristics into a single value, an index quantifies a certain important yet unobserved trait in a given subject. With a few exceptions, currently used medical indices are mostly developed on empirical grounds. The acceptance of an index, however, depends on its conceptual validity and its ability to predict health outcomes. Previously, we described a single-index model for the construction of indices that correlate with a given outcome (Wu and Tu, 2013). The current article extends that method to situations of multiple outcomes. This extension is practically important because no indices are considered truly useful unless they work with multiple outcomes.
The purpose of this article is to present a research tool that aids the development of index measures by directly linking the index functions to pre-specified health outcomes through a multivariate single-index model. In presenting the method, we discuss a general approach for model development as well as related model-fitting procedures. To illustrate, we construct an adiposity index for predicting systolic and diastolic blood pressure (SBP and DBP) in children.
A multivariate single-index model
Univariate single-index model
Univariate single-index models take a very simple form: suppose
In practice, values of the index coefficients
Specification of a multivariate single-index model
We construct an index
Let
Here,
In this model, we include a covariate matrix
Model (2.1) presents a system of simultaneous equations for
In the following sections, we use p-splines to estimate the non-parametric index functions, which allows us to present the model in a mixed-effects model format. We fit the spline models by minimizing weighted penalized least squares criteria. The random effects and error variances are obtained through best linear prediction and restricted maximum likelihood (REML), based on the observed data.
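As a minimal sketch of the penalized least squares idea, the following Python/NumPy code fits a single cubic p-spline with a truncated power basis. The polynomial coefficients are left unpenalized (the "fixed" part) and a ridge penalty is applied to the knot coefficients (the "random" part). The function names, knot placement, simulated curve, and fixed smoothing parameter `lam` are illustrative assumptions; in the article the amount of smoothing is determined through REML rather than fixed in advance.

```python
import numpy as np

def cubic_pspline_design(u, knots):
    """Truncated power basis: unpenalized [1, u, u^2, u^3], penalized (u - k)_+^3."""
    u = np.asarray(u, dtype=float)
    X = np.column_stack([np.ones_like(u), u, u**2, u**3])
    Z = np.clip(u[:, None] - knots[None, :], 0.0, None) ** 3
    return X, Z

def fit_pspline(u, y, knots, lam):
    """Minimize ||y - X b - Z c||^2 + lam * ||c||^2 (penalty on c only)."""
    X, Z = cubic_pspline_design(u, knots)
    C = np.hstack([X, Z])
    # Penalty matrix: zeros for the polynomial part, ones for the knot part.
    D = np.diag([0.0] * X.shape[1] + [1.0] * Z.shape[1])
    coef = np.linalg.solve(C.T @ C + lam * D, C.T @ y)
    return coef, C @ coef

# Illustrative data (assumed, not from the article): a noisy sine curve.
rng = np.random.default_rng(0)
u = np.sort(rng.uniform(0.0, 1.0, 200))
truth = np.sin(2.0 * np.pi * u)
y = truth + rng.normal(0.0, 0.1, u.size)

knots = np.linspace(0.0, 1.0, 12)[1:-1]  # 10 equally spaced interior knots
coef, fitted = fit_pspline(u, y, knots, lam=1e-4)
rmse = float(np.sqrt(np.mean((fitted - truth) ** 2)))
```

Treating the knot coefficients as random effects with a common variance leads to the same kind of ridge-penalized criterion, which is why a mixed model representation is convenient here.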
Writing the index values as
It is well known that a p-spline can be expressed in a mixed model representation with unpenalized (fixed) and penalized (random) components (Ruppert et al., 2002). We write
Writing
To estimate the index parameters
Robinson (1991) described an alternative method for deriving the best linear unbiased prediction (BLUP) of
For given values of
This is equivalent to solving a weighted penalized least squares problem
When fitting a
Mathematically, this is also equivalent to solving a generalized weighted penalized least squares problem
Herein,
We consider the computation of index components in a joint model setting with multivariate longitudinal data. The estimation process has three steps:
Step 1: Set the initial values of the index parameters to
Step 2: Given a specific set of values of
Step 3: We iteratively obtain
Maximization of the likelihood function in Step 2 is implemented by using the
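The idea behind the three steps, fitting the index functions for given index parameters and then updating those parameters, can be illustrated with a deliberately simplified Python/NumPy sketch. It is not the estimation procedure of this article: it replaces the iterative likelihood-based update with a grid search, uses a fixed smoothing parameter instead of REML, omits covariates and random effects, and assumes two index components with non-negative coefficients normalized to unit length. All function names and simulated data are hypothetical.

```python
import numpy as np

def spline_design(u, knots):
    """Cubic truncated power basis evaluated at the index values u."""
    u = np.asarray(u, dtype=float)
    X = np.column_stack([np.ones_like(u), u, u**2, u**3])
    Z = np.clip(u[:, None] - knots[None, :], 0.0, None) ** 3
    return np.hstack([X, Z])

def penalized_rss(u, y, knots, lam):
    """Residual sum of squares of a ridge-penalized cubic spline fit of y on u."""
    C = spline_design(u, knots)
    D = np.diag([0.0] * 4 + [1.0] * len(knots))
    coef = np.linalg.solve(C.T @ C + lam * D, C.T @ y)
    return float(np.sum((y - C @ coef) ** 2))

def profile_index(W, Y, lam=1e-3, n_grid=181, n_knots=8):
    """Grid search over beta = (cos t, sin t), shared by all outcome columns of Y."""
    best_rss, best_beta = np.inf, None
    for t in np.linspace(0.0, np.pi / 2.0, n_grid):
        beta = np.array([np.cos(t), np.sin(t)])
        u = W @ beta                                   # candidate index values
        knots = np.quantile(u, np.linspace(0.0, 1.0, n_knots + 2)[1:-1])
        # Sum the criterion over outcomes so one index must serve all of them.
        total = sum(penalized_rss(u, Y[:, k], knots, lam) for k in range(Y.shape[1]))
        if total < best_rss:
            best_rss, best_beta = total, beta
    return best_beta

# Hypothetical data: two outcomes driven by the same index with a 2:1 coefficient ratio.
rng = np.random.default_rng(1)
n = 400
W = rng.uniform(0.0, 1.0, size=(n, 2))
beta_true = np.array([2.0, 1.0]) / np.sqrt(5.0)
u_true = W @ beta_true
Y = np.column_stack([np.sin(2.0 * u_true), u_true**2]) + rng.normal(0.0, 0.1, size=(n, 2))

beta_hat = profile_index(W, Y)
```

Summing the penalized criterion across outcomes is what forces a single set of index coefficients to work for every outcome simultaneously; each outcome still gets its own spline.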
Confidence interval estimate of the mean responses
We construct confidence intervals for the mean responses. Suppose
Because
Simulation
We conduct an extensive simulation study to assess the finite-sample performance of the proposed method under various parameter settings. We report the bias and standard errors (SEs) of the parameter estimates, and we compare the estimated index function curves with the true index curves. The overall fit of the model is characterized by the mean squared error (MSE).
Data generation
We consider a scenario involving three correlated outcome variables. In each simulation, data are generated from a trivariate normal distribution. The three true index functions are
The independent variables
We consider four different sample sizes: Number of subjects varies between
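A data-generating scheme of this general kind can be sketched in Python/NumPy as follows. Every numeric value here is a placeholder chosen for illustration: the sample sizes, index coefficients, index functions, and covariance matrix are hypothetical and are not the settings used in the simulation study.

```python
import numpy as np

rng = np.random.default_rng(2024)

# Hypothetical design: subjects with repeated visits and three outcomes.
n_subjects, n_visits = 100, 5
beta = np.array([2.0, 1.0]) / np.sqrt(5.0)        # hypothetical index coefficients
index_functions = [np.sin, np.square, np.exp]     # hypothetical index functions

# Hypothetical positive-definite covariance shared by subject effects and errors.
Sigma = 0.25 * np.array([[1.0, 0.5, 0.3],
                         [0.5, 1.0, 0.4],
                         [0.3, 0.4, 1.0]])

subject = np.repeat(np.arange(n_subjects), n_visits)   # subject id for each row
W = rng.uniform(0.0, 1.0, size=(subject.size, 2))      # index components
u = W @ beta                                           # index values

# Trivariate normal random subject effects and visit-level errors.
b = rng.multivariate_normal(np.zeros(3), Sigma, size=n_subjects)
e = rng.multivariate_normal(np.zeros(3), Sigma, size=subject.size)

signal = np.column_stack([g(u) for g in index_functions])
Y = signal + b[subject] + e                            # one row per visit, one column per outcome
```

Indexing `b[subject]` repeats each subject's random effect across that subject's visits, which induces the within-subject correlation of longitudinal data.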
Performance assessment
We compare the estimated parameter values against the true values. The parameter estimation results, including the means of the parameter estimates (Mean), SE, bias and MSE, are summarized in Table 1. The simulation shows that the estimated coefficient values are close to the true values in all cases, and the SEs of the estimates are generally small. In addition, the correlation structure and heteroscedasticity among the outcomes are correctly recovered. Not surprisingly, the MSE of each parameter estimate decreases as the sample size and the number of repeated measures increase.
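The summary measures named above can be computed from replicated estimates as in the following Python/NumPy sketch. It assumes that SE denotes the empirical standard deviation of the estimates across simulation replicates; the function name and the simulated estimates are hypothetical.

```python
import numpy as np

def summarize_estimates(estimates, truth):
    """Mean, SE, bias and MSE per parameter; rows are simulation replicates."""
    est = np.asarray(estimates, dtype=float)
    truth = np.asarray(truth, dtype=float)
    mean = est.mean(axis=0)
    se = est.std(axis=0, ddof=1)               # empirical SD across replicates (assumed SE)
    bias = mean - truth
    mse = np.mean((est - truth) ** 2, axis=0)
    return {"mean": mean, "se": se, "bias": bias, "mse": mse}

# Hypothetical replicates: 200 simulated estimates of two index coefficients.
rng = np.random.default_rng(3)
truth = np.array([0.8, 0.6])
estimates = truth + rng.normal(0.0, 0.05, size=(200, 2))
summary = summarize_estimates(estimates, truth)
```

With these definitions the MSE decomposes into squared bias plus variance, so a falling MSE at larger sample sizes reflects shrinking variance when the bias is already small.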
Parameter estimation: True
Figure 1 presents the average cubic-spline estimates for the three correlated outcomes and the corresponding
Estimated index functions and the corresponding confidence bands. Solid curves are the true functions; the dashed curves are the average cubic-spline fit over 200 simulations. The dot-dashed curves are the corresponding 2.5% and 97.5% quantiles.
In summary, the simulation study shows that both the index components and the curvature of the index functions are accurately recovered; other parameters associated with the multivariate linear models are also reliably estimated. The coverage probabilities of the confidence bands are close to the nominal level, confirming that the proposed algorithm works well in the tested data settings.
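The average fitted curve and the pointwise 2.5% and 97.5% quantile curves shown in Figure 1 can be obtained from a matrix of fitted curves, as in this Python/NumPy sketch. The fitted curves are replaced here by a noisy stand-in, and the function name and true curve are hypothetical.

```python
import numpy as np

def simulation_band(curves, level=0.95):
    """Pointwise mean and quantile band; rows are replicates, columns grid points."""
    curves = np.asarray(curves, dtype=float)
    alpha = (1.0 - level) / 2.0
    mean_curve = curves.mean(axis=0)
    lower = np.quantile(curves, alpha, axis=0)
    upper = np.quantile(curves, 1.0 - alpha, axis=0)
    return mean_curve, lower, upper

# Stand-in for 200 fitted index curves evaluated on a common grid (assumed data).
rng = np.random.default_rng(4)
grid = np.linspace(0.0, 1.0, 50)
truth = np.sin(2.0 * np.pi * grid)
curves = truth + rng.normal(0.0, 0.1, size=(200, grid.size))

mean_curve, lower, upper = simulation_band(curves)
# Fraction of grid points at which the quantile band contains the true curve.
inside = float(np.mean((lower <= truth) & (truth <= upper)))
```

Note that this band describes the spread of the estimates across replicates; it is a different quantity from the coverage probability of a confidence band computed within each replicate.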
In the current simulated datasets, only three positively correlated outcomes are considered. A separate simulation is conducted to evaluate the performance of the proposed method with a larger number of outcomes (six) with both positive and negative correlations. Again, the method performs as expected (additional simulation results are shown in Table 2).
Parameter estimation: Models for six outcomes with both positive and negative correlations. True index functions are all the same for the six outcomes:
To illustrate the proposed method, we construct an adiposity index based on waist girth (WG) and subscapular skinfold (SS) for the prediction of SBP and DBP. Data were obtained from a prospective observational study; participants were children recruited from schools in Indianapolis, Indiana. The detailed study protocol was described by Tu et al. (2009). Briefly, blood pressure, WG and SS were assessed repeatedly during the course of follow-up. Preliminary data exploration showed that both WG and SS were positively associated with SBP and DBP.
We assume that the index takes the form
We fit the model using a subset of the study data, where all subjects had at least seven follow-up visits.
The example dataset included 468 children (224 males). The mean age of the children at study entry was 13 years. Besides WG and SS as index components, we included in the model age and sex as fixed effect covariates. A random subject effect was also included to accommodate the within-subject correlations.
Spline estimates and
confidence bands for the systolic and diastolic blood pressure, stratified by sex. Solid curves are for systolic blood pressure and dashed curves are for diastolic blood pressure.
Summary of the fitted index models
Penalized parameter estimates of
Model fitting results are presented in Table 3. The associations between the values of the new index and SBP and DBP in male and female subjects are graphically presented in Figure 2. The derived index is positively correlated with both SBP and DBP. The
Scientifically, our data showed that both WG and SS are positively associated with elevated blood pressure. However, WG contributes more to blood pressure than SS, as indicated by the magnitudes of their respective index coefficients, which are in an approximately 2:1 ratio. In the current study, WG serves as a proxy for abdominal fat, while SS measures subcutaneous fat. Previously published data have consistently shown that body fat distribution plays an important role in the development of obesity-related hypertension, and that increased visceral adiposity (such as that measured by WG), more than subcutaneous adiposity, contributes to increased risk of hypertension (Chandra et al., 2014). Although the precise mechanisms have not been fully elucidated, they most likely involve increased insulin resistance (Fain et al., 2004; Fox et al., 2007) and possibly alteration of the renin–aldosterone axis (Yu et al., 2013). Regardless of the mechanisms, our data once again confirm the harm of central fat accumulation.
Derivation of useful medical indices that correlate with multiple health outcomes is an issue of great practical importance. In this research, we provide a new tool to assist index development. By extending the partially linear single-index model to a multivariate setting, we have developed a single-index model that allows investigators to analytically derive clinical indices that work for multiple clinical outcomes.
In this article, we present the basic construction of the index model, as well as related model-fitting procedures. Our simulation study shows that the new method performs well in terms of estimation accuracy and computational efficiency. The model formulation is quite general and can accommodate longitudinal measures of multiple outcomes. Besides the index function, the model also includes other fixed and random effects. The index functions are modelled by cubic splines and estimated using a penalized least squares method. As we have shown in our simulation studies, both the index components and the curvature of the index functions are recovered accurately. The relatively narrow confidence bands associated with the fitted curves further attest to the model's estimation efficiency. Finally, as an index development tool, the method can be implemented on most computing platforms with existing software; thus, it has the potential to be used by practitioners in a wide variety of applications.
Acknowledgments
This work was supported by National Institutes of Health Grants R01-HL095086, U54 CA 190151, and P30 HS024384.
Appendix A
Consider a general multivariate p-spline model with
We consider fitting
Here
Matrix
Using the Lagrange multiplier method, the above minimization is equivalent to solving
This has the solution
On the other hand, if we assume
Indeed,
Extension to random effect models, where the subject-specific random vector
Appendix B
The example data are from an ongoing study, and we do not have permission to publish the raw data. We will, however, be happy to help interested parties obtain access to the data under a signed data use agreement. For further details, please contact Wanzhu Tu at Indiana University School of Medicine (
For implementation, we use the
The dataset is structured as a data frame (named
We obtain the initial values for index component parameters
We set up object
We assume that the random subject effect is normally distributed with a compound symmetry variance matrix, and each one of the random effects for penalized elements follows a normal distribution with an identity variance matrix
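For readers unfamiliar with the term, a compound symmetry variance matrix has a common variance on the diagonal and a common covariance everywhere else. The Python/NumPy sketch below builds such a matrix for illustration only; it is not the software used for the implementation described here, and the dimension, variance, and correlation values are hypothetical.

```python
import numpy as np

def compound_symmetry(dim, variance, rho):
    """variance * [(1 - rho) * I + rho * J]: equal variances, equal correlations."""
    return variance * ((1.0 - rho) * np.eye(dim) + rho * np.ones((dim, dim)))

# Hypothetical values: two outcomes with variance 4 and correlation 0.5.
G = compound_symmetry(dim=2, variance=4.0, rho=0.5)
eigenvalues = np.linalg.eigvalsh(G)   # all positive for a valid covariance matrix
```

The structure needs only two parameters regardless of dimension, which keeps the random-effect specification parsimonious.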
We define a function
