A multivariate single-index model for longitudinal data

Abstract

Index measures are commonly used in medical research and clinical practice, primarily for quantification of health risks in individual subjects or patients. The utility of an index measure is ultimately contingent on its ability to predict health outcomes. Construction of medical indices has largely been based on heuristic arguments, although the acceptance of a new index typically requires objective validation, preferably with multiple outcomes. In this article, we propose an analytical tool for index development and validation. We use a multivariate single-index model to ascertain the best functional form for risk index construction. Methodologically, the proposed model represents a multivariate extension of the traditional single-index models. Such an extension is important because it assures that the resultant index simultaneously works for multiple outcomes. The model is developed in the general framework of longitudinal data analysis. We use penalized cubic splines to characterize the index components while leaving the other subject characteristics as additive components. The splines are estimated directly by penalizing nonlinear least squares, and we show that the model can be implemented using existing software. To illustrate, we examine the formation of an adiposity index for prediction of systolic and diastolic blood pressure in children. We assess the performance of the method through a simulation study.

Keywords

mixed effect model; multivariate setting; penalizing nonlinear least squares; P-spline; single-index model

1 Introduction

Index measures are commonly used in medical research and clinical practice. By combining information from an array of observed characteristics into a single value, an index quantifies a certain important yet unobserved trait in a given subject. With a few exceptions, currently used medical indices are mostly developed on empirical grounds. The acceptance of an index, however, depends on its conceptual validity and its ability to predict health outcomes. Previously, we described a single-index model for the construction of indices that correlate with a given outcome (Wu and Tu, 2013). The current article extends that method to situations of multiple outcomes. This extension is practically important because no indices are considered truly useful unless they work with multiple outcomes.

The purpose of this article is to present a research tool that aids the development of index measures by directly linking the index functions to pre-specified health outcomes through a multivariate single-index model. In presenting the method, we discuss a general approach for model development as well as related model-fitting procedures. To illustrate, we construct an adiposity index for predicting systolic and diastolic blood pressure (SBP and DBP) in children.

2 A multivariate single-index model

2.1 Univariate single-index model

Univariate single-index models take a very simple form: suppose $Y$ is the outcome of interest, and $X \in ℝ^{d}$ is a $d$ -dimensional vector of independent variables, a single-index model takes the form $E (Y | X) = η (α^{T} X)$ , where $α$ is the coefficient vector of $X$ , and $η (\cdot)$ is an unspecified nonlinear function. Through $α^{T} X$ , one reduces the dimension of the independent variables from $d$ to 1; and through $η (\cdot)$ , one retains the ability to accommodate potential nonlinearity in the relationship between $X$ and $Y$ .

In practice, values of the index coefficients $α$ and the functional form of the index function $η (\cdot)$ are estimated from observed data (Li, 1991). There is a sizable literature on the estimation of $α$ and $η (\cdot)$ , including semi-parametric weighted least squares methods (Härdle et al., 1993; Ichimura, 1993), average derivative estimation (Stoker, 1986; Härdle and Stoker, 1989), minimum average variance estimation (Xia et al., 2002; Xia and Härdle, 2006) and the p-spline method (Yu and Ruppert, 2002). In our previous work, we used a penalized likelihood method to fit a univariate single-index model for longitudinal data (Wu and Tu, 2013). In the current work, we extend this model-fitting approach to a multivariate setting.

2.2 Specification of a multivariate single-index model

We construct an index $α^{T} x_{ij}$ based on a $d$ -dimensional vector of independent variables $x_{ij} \in ℝ^{d}$ , and we link the index function to $M$ different outcomes. The data come from $N$ subjects, $i = 1, 2, \dots, N$ , and the $i$ th subject has $n_{i}$ longitudinal observations, $j = 1, \dots, n_{i}$ .

Let $Y_{m} = (y_{m; i 1}, \dots, y_{m; i n_{i}})_{1 \leq i \leq N}^{T}$ , $\forall m = 1, \dots, M$ , be the response vector for the $m$ th outcome. We write the model for the $m$ th outcome as follows:

Y_{m} = η_{m} (X^{*} α) + W ψ_{m} + B_{m} + Ξ_{m}, \forall m = 1, \dots, M .

(2.1)

Here, $α$ is a $d \times 1$ vector of index coefficients for $x_{ij} = (x_{ij 1}, \dots, x_{ijd})^{T}$ , and $X^{*} = [x_{ij}^{T}]_{1 \leq j \leq n_{i}; 1 \leq i \leq N}$ is a matrix of all index elements.

In this model, we include a covariate matrix $W = [w_{ij}^{T}]_{1 \leq j \leq n_{i}; 1 \leq i \leq N}$ , as well as its coefficients $ψ_{m} \in ℝ^{q}$ , where $w_{ij} = (w_{ij 1}, \dots, w_{ijq})^{T}$ are the fixed-effect covariates of the $i$ th subject at the $j$ th measurement. For maximal flexibility, we let the index functions $η_{1}, \dots, η_{M}$ be outcome-specific, although in specific applications one may restrict the index functions to a common form across all outcomes. We further assume that the index functions are twice-differentiable smooth functions. Similarly, we let $B_{m} = (b_{m; 1} \otimes 1_{n_{1}}^{T}, \dots, b_{m; N} \otimes 1_{n_{N}}^{T})^{T}$ and $Ξ_{m} = (ε_{m; i 1}, \dots, ε_{m; i n_{i}})_{1 \leq i \leq N}^{T}$ , $\forall m = 1, \dots, M$ , be the vectors of subject-specific random effects and random errors, where $b_{m; i}$ and $ε_{m; ij}$ are the subject-specific random effect and the random error for the $m$ th outcome, respectively. Here, $\otimes$ represents the Kronecker (tensor) product and $1_{n_{i}}$ is a vector of $1$ s of length $n_{i}$ .

Model (2.1) presents a system of simultaneous equations for $M$ longitudinal outcomes. To link these equations in a unified structure, we use a shared random effect ${\tilde{B}}_{i} = (b_{1; i}, \dots, b_{M; i})^{T}$ and we assume that ${\tilde{B}}_{i}$ follows a multivariate normal distribution, written as MVN $(0, Σ_{b})$ . We further assume that the diagonal elements of $Σ_{b}$ are $σ_{b_{m}}^{2}$ , $m = 1, \dots, M$ , and that the off-diagonal elements are $ρ_{st} σ_{b_{s}} σ_{b_{t}}$ for $s \neq t$ . Such a formulation gives an intuitive interpretation: $ρ_{st}$ is the correlation between the subject-specific random effects of the paired outcomes $y_{s; ij}$ and $y_{t; ij}$ , whereas $σ_{b_{s}}^{2}$ and $σ_{b_{t}}^{2}$ are the corresponding variance components of the random effects. Of note, such a formulation induces not only dependency among multiple outcomes at a given time point but also longitudinal correlations within the same outcome. Similarly, the random error vector, ${\tilde{Ξ}}_{ij} = (ε_{1; ij}, \dots, ε_{M; ij})^{T}$ , is assumed to be independent of ${\tilde{B}}_{i}$ and to follow a zero-mean Gaussian distribution with covariance matrix $Σ_{ε}$ . In general, $Σ_{ε}$ is a positive definite matrix whose components are determined by the underlying serial correlations within and between the $M$ outcomes. Herein, we assume that the errors are independent, although this constraint could be relaxed to allow a dependent structure. Under this assumption, $Σ_{ε}$ has the diagonal form $Σ_{ε} = σ_{ε}^{2}$ diag $(1, δ_{2}, \dots, δ_{M})$ . The dispersion parameters $σ_{ε_{r}}^{2}, r = 2, \dots, M$ , are expressed as products of a common variance component and outcome-specific scale parameters $δ_{r}$ , i.e., $σ_{ε_{r}}^{2} = δ_{r} σ_{ε}^{2}$ .
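As a numerical illustration of this covariance structure, the following Python sketch assembles $Σ_{b}$ from variances and correlations and the diagonal $Σ_{ε}$ from a common variance and scale parameters. The function names `random_effect_cov` and `error_cov` are hypothetical, NumPy is assumed to be available, and the numerical values are those used later in the simulation study (Section 3.1); the sketch is not part of the authors' implementation.

```python
import numpy as np

def random_effect_cov(variances, correlations):
    """Sigma_b: sigma_m^2 on the diagonal, rho_st * sigma_s * sigma_t off it."""
    sd = np.sqrt(np.asarray(variances, dtype=float))
    corr = np.eye(sd.size)
    for (s, t), rho in correlations.items():
        corr[s, t] = corr[t, s] = rho          # keep the matrix symmetric
    return corr * np.outer(sd, sd)

def error_cov(sigma2_eps, deltas):
    """Sigma_eps = sigma_eps^2 * diag(1, delta_2, ..., delta_M), independent errors."""
    scales = np.concatenate(([1.0], np.asarray(deltas, dtype=float)))
    return sigma2_eps * np.diag(scales)

# Outcomes are indexed from 0 here, so the key (0, 1) stands for rho_12, and so on.
Sigma_b = random_effect_cov([0.4, 0.2, 0.3],
                            {(0, 1): 0.25, (0, 2): 0.50, (1, 2): 0.75})
Sigma_eps = error_cov(0.1, [0.8, 0.6])

# A valid covariance matrix must be positive definite.
assert np.all(np.linalg.eigvalsh(Sigma_b) > 0)
```

The eigenvalue check is worth keeping when the correlations are user-supplied, because an arbitrary set of pairwise correlations does not always yield a positive definite matrix.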

In the following sections, we use p-splines to estimate the non-parametric index functions, which allows us to present the model in a mixed effect model format. We fit the spline models by minimizing the weighted penalized least square functions. The random effects and random errors are calculated via best linear prediction and restricted maximum likelihood (REML) methods based on the observed data.

2.3 Mixed effect model representation and estimation

Writing the index values as $v = X^{*} α$ , we express $η_{m} (\cdot)$ as cubic spline functions $η_{m} (v) = \sum_{p = 0}^{3} γ_{m; p} v^{p} + \sum_{k = 1}^{K} γ_{m; 3 + k} (v - κ_{m; k})_{+}^{3}$ , where $\{κ_{m; k}\}_{k = 1}^{K}$ are the $K$ knots and $(v - κ_{m; k})_{+}^{3}$ are the truncated cubic basis functions. For simplicity, we write the splines as $η_{m} (v) = G (v) γ_{m}$ , where $γ_{m} = (γ_{m; 0}, \dots, γ_{m; 3 + K})^{T}$ are the spline coefficients and $G (v) = (1, v, v^{2}, v^{3}, (v - κ_{1})_{+}^{3}, \dots, (v - κ_{K})_{+}^{3})$ are the basis functions. We choose cubic splines because of their simple implementation and twice differentiability (O'Sullivan, 1986; Eilers and Marx, 1996).
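The truncated cubic basis can be made concrete with a short Python sketch. It builds the rows $G (v)$ for a vector of index values and places knots at equally spaced sample quantiles of $v$ , as is done later in the simulation study. The function names `quantile_knots` and `cubic_basis` and the coefficient values are illustrative assumptions, and NumPy is assumed to be available.

```python
import numpy as np

def quantile_knots(v, K):
    """K knots at equally spaced sample quantiles of the index values v."""
    probs = np.arange(1, K + 1) / (K + 1)
    return np.quantile(np.asarray(v, dtype=float), probs)

def cubic_basis(v, knots):
    """Rows G(v) = (1, v, v^2, v^3, (v - k_1)_+^3, ..., (v - k_K)_+^3)."""
    v = np.asarray(v, dtype=float)
    knots = np.asarray(knots, dtype=float)
    poly = np.column_stack([np.ones_like(v), v, v**2, v**3])
    # (v - k)_+^3: zero to the left of each knot, cubic to the right.
    trunc = np.clip(v[:, None] - knots[None, :], 0.0, None) ** 3
    return np.hstack([poly, trunc])

v = np.array([0.0, 0.5, 1.0])
G = cubic_basis(v, knots=[0.25, 0.75])
gamma = np.array([1.0, 0.0, 0.0, 2.0, -1.0, 0.0])   # arbitrary illustrative coefficients
eta = G @ gamma                                     # spline values eta(v) = G(v) gamma
```

The first four columns of `G` correspond to the unpenalized polynomial part and the remaining $K$ columns to the truncated cubic terms.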

It is well known that p-spline can be expressed in a mixed model representation with unpenalized (fixed) and penalized (random) components (Ruppert et al., 2002). We write $G_{F}^{i} = (1, v_{ij}, v_{ij}^{2}, v_{ij}^{3})_{1 \leq j \leq n_{i}}$ and $G_{R}^{i} = ((v_{ij} - κ_{1})_{+}^{3}, \dots, (v_{ij} - κ_{K})_{+}^{3})_{1 \leq j \leq n_{i}}$ , so that $G_{F} = [G_{F}^{i}]_{1 \leq i \leq N}$ and $G_{R} = [G_{R}^{i}]_{1 \leq i \leq N}$ represent the ‘fixed’ effects and ‘random’ effects, with corresponding parameter vectors $γ_{Fm} = (γ_{m; 0}, \dots, γ_{m; 3})^{T}$ and $γ_{Rm} = (γ_{m; 4}, \dots, γ_{m; 3 + K})^{T}$ . By combining the model-fixed parameter vector $ψ_{m}$ and subject-specific random effect vector $b_{m} = (b_{m; 1}, \dots, b_{m; N})$ , we have $X = [I_{M} \otimes G_{F}, I_{M} \otimes W]$ and $Z = [I_{M} \otimes 1_{R}, I_{M} \otimes G_{R}] = [Z_{B}, Z_{R}]$ , where $I_{M}$ is the identity matrix of dimension $M$ and $1_{R} =$ diag $[1_{n_{i}}]_{1 \leq i \leq N}$ .

Writing $Y = (Y_{1}^{T}, \dots, Y_{M}^{T})^{T}$ , $ε = (Ξ_{1}^{T}, \dots, Ξ_{M}^{T})^{T}$ , we express the multivariate outcome model as $Y = X β + Z u + ε$ , where the fixed parameter vector $β = (γ_{Fm}^{T}, ψ_{m}^{T})_{1 \leq m \leq M}^{T}$ consists of model parameters of both parametric and non-parametric components; the random effects vector $u = (b_{m}^{T}, γ_{Rm}^{T})_{1 \leq m \leq M}^{T} = (B^{T}, γ^{T})^{T}$ contains the random effects, including the penalized spline coefficients. The random effects $u$ follow a multivariate normal distribution MVN $(0, Σ_{u})$ , with $Σ_{u} =$ diag $(Σ_{b} \otimes I_{N},$ diag $(Γ_{1}, \dots, Γ_{M}))$ = diag $(Σ_{B}, Σ_{Γ})$ . The $K \times K$ submatrices, $Γ_{m}$ , control the amount of smoothing in the estimation of the index function $η_{m}$ , such that $γ_{Rm} \sim$ MVN $(0, Γ_{m})$ with $Γ_{m} = σ_{γ_{m}}^{2} I_{K}$ . The random errors follow a multivariate normal distribution MVN $(0, R)$ , with $R = Σ_{ε} \otimes I_{N^{*}}$ , where $N^{*} = \sum_{i = 1}^{N} n_{i}$ is the total number of multivariate observations.

To estimate the index parameters $α$ , we impose the usual constraints $∥ α ∥ = 1$ and $α_{1} > 0$ to ensure identifiability. For convenience, we write $θ = (α^{T}, β^{T})^{T}$ . Writing $τ = (σ_{γ_{m}}, σ_{b_{m}}, ρ_{st}, σ_{ε}, δ_{r})_{1 \leq m, s, t \leq M, 2 \leq r \leq M}^{T}$ and $e = Z u + ε$ , we express the multivariate model as $Y = X θ + e$ , where $e \sim$ MVN $(0, V)$ , with $V = Z Σ_{u} Z^{T} + R$ being a function of the variance components $τ$ . The computational methods follow the general framework described by Lindstrom and Bates (1988). Specifically, given $V$ , the standard estimator for $θ$ is the generalized least squares estimator $\hat{θ} (τ) = (X^{T} V^{- 1} X)^{- 1} X^{T} V^{- 1} Y$ , and the random effects are predicted by the posterior mean $\hat{u} (τ) = Σ_{u} Z^{T} V^{- 1} (Y - X \hat{θ} (τ))$ . We use REML estimators for the variance components $τ$ , since they take into account the loss in degrees of freedom from the estimation of $θ$ . Estimation of variance components can, thus, be conducted via an iterative expectation-maximization (EM) algorithm on the transformed profile log-likelihoods given by $ℓ_{R} (τ) \propto \log | V | + \log | X^{T} V^{- 1} X | + (Y - X \hat{θ})^{T} V^{- 1} (Y - X \hat{θ})$ .

Robinson (1991) described an alternative method for deriving the best linear unbiased prediction (BLUP) of $β$ . A simple albeit ad hoc way is to obtain the ‘joint maximum likelihood estimate’ of both the fixed and random effects $β$ and $u$ using ‘Henderson's justification’ (Henderson, 1975). In the linear mixed model framework, $Y | u \sim MVN (X β + Z u, R)$ , $u \sim MVN (0, Σ_{u})$ and $[u, ε]^{T} \sim MVN (0, diag (Σ_{u}, R))$ . Maximizing the likelihood of $(Y, u)$ over unknown parameters $β$ and $u$ leads to minimizing the criterion

(Y - X β - Z u)^{T} R^{- 1} (Y - X β - Z u) + u^{T} Σ_{u}^{- 1} u .

(2.2)
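For known variance components, the generalized least squares estimator and posterior mean given in the preceding paragraphs coincide with the minimizer of criterion (2.2). The Python sketch below checks this numerically on a small random-intercept example. The function names `gls_blup` and `henderson`, the toy data and the variance values are illustrative assumptions, and the variance components are treated as known rather than estimated by REML.

```python
import numpy as np

def gls_blup(Y, X, Z, Sigma_u, R):
    """GLS estimate of the fixed effects and posterior mean of u, for known V."""
    V = Z @ Sigma_u @ Z.T + R
    Vinv_X = np.linalg.solve(V, X)
    Vinv_Y = np.linalg.solve(V, Y)
    beta_hat = np.linalg.solve(X.T @ Vinv_X, X.T @ Vinv_Y)
    u_hat = Sigma_u @ Z.T @ np.linalg.solve(V, Y - X @ beta_hat)
    return beta_hat, u_hat

def henderson(Y, X, Z, Sigma_u, R):
    """Minimize (Y-Xb-Zu)' R^{-1} (Y-Xb-Zu) + u' Sigma_u^{-1} u via its normal equations."""
    C = np.hstack([X, Z])
    p = X.shape[1]
    Lam = np.zeros((C.shape[1], C.shape[1]))
    Lam[p:, p:] = np.linalg.inv(Sigma_u)       # fixed effects are not penalized
    Rinv_C = np.linalg.solve(R, C)             # R^{-1} C (R is symmetric)
    zeta = np.linalg.solve(C.T @ Rinv_C + Lam, Rinv_C.T @ Y)
    return zeta[:p], zeta[p:]

# Toy data: N subjects with n observations each and a random intercept.
rng = np.random.default_rng(0)
N, n = 4, 3
x = rng.uniform(size=N * n)
X = np.column_stack([np.ones(N * n), x])
Z = np.kron(np.eye(N), np.ones((n, 1)))
Sigma_u = 0.5 * np.eye(N)
R = 0.2 * np.eye(N * n)
Y = 1.0 + 2.0 * x + Z @ rng.normal(scale=0.7, size=N) + rng.normal(scale=0.45, size=N * n)

beta_gls, u_blup = gls_blup(Y, X, Z, Sigma_u, R)
beta_h, u_h = henderson(Y, X, Z, Sigma_u, R)
```

The second route avoids forming and inverting the full marginal covariance $V$ , which is why the penalized least squares form is convenient for computation.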

For given values of $τ$ , the estimates of $β$ , $B$ and $γ$ are obtained by minimizing weighted penalized least square function

(Y - X β - Z_{B} B - Z_{R} γ)^{T} R^{- 1} (Y - X β - Z_{B} B - Z_{R} γ) + B^{T} Σ_{B}^{- 1} B + γ^{T} Σ_{Γ}^{- 1} γ,

where $Σ_{B}$ is the variance–covariance matrix of the subject-specific random vector $B$ , and $Σ_{Γ}$ is the penalized smoothing matrix of $γ$ . This is equivalent to solving a weighted penalized least square problem

\hat{ζ} = \underset{β, B, γ}{\arg\min} \left\{ (Y - X β - Z_{B} B - Z_{R} γ)^{T} R^{- 1} (Y - X β - Z_{B} B - Z_{R} γ) + B^{T} Σ_{B}^{- 1} B + \sum_{m = 1}^{M} λ_{m}^{2 p} ∥ γ_{Rm} ∥^{2} \right\}

(2.3)

When fitting a $p$ th-order spline, the smoothing parameters, $λ_{m}$ , control the amount of trade-off between the goodness of fit of $η_{m}$ and the smoothness by imposing a penalty on the coefficients of $γ_{Rm}$ .

Mathematically, this is also equivalent to solving a generalized-weighted penalized least square problem

{\hat{ζ}}^{*} = \underset{β, γ}{\arg\min} \left\{ (Y - X β - Z_{R} γ)^{T} V^{- 1} (Y - X β - Z_{R} γ) + \sum_{m = 1}^{M} λ_{m}^{2 p} ∥ γ_{Rm} ∥^{2} \right\}

(2.4)

Herein, $V = Z_{B} Σ_{B} Z_{B}^{T} + R$ is a working covariance matrix, which depends on one or more parameters in $τ$ in the case of heteroscedastic and correlated errors. Comparing with the linear mixed model representation, the smoothing parameters in the penalized weighted least square equation satisfy $λ_{m}^{2 p} = 1 / σ_{γ_{m}}^{2}$ . See Appendix A for a sketch of the derivation of this expression.

2.4 Computation

We consider the computation of index components in a joint model setting with multivariate longitudinal data. The estimation process has three steps:

Step 1: Set the initial values of the index parameters to ${\hat{α}}^{(0)}$ . In the absence of information about the unknown parameters, we use ordinary least squares estimates from the linear mixed effect model $Y = X β + Z u + ε$ as initial values. To ensure model identifiability, we normalize ${\hat{α}}^{(0)}$ so that $∥ {\hat{α}}^{(0)} ∥ = 1$ . We restrict the first element of ${\hat{α}}^{(0)}$ to a positive value.

Step 2: Given a specific set of values of ${\hat{α}}^{(0)}$ , we calculate the index $\{v_{i}^{(0)} = X_{i}^{*} {\hat{α}}^{(0)} : i = 1, \dots, N\}$ and obtain the BLUP estimates of $θ^{(0)} = (β^{(0) T} | α^{(0) T})^{T}$ . Hence, the likelihood given by equation (2.2) is maximized over the parameter of interest. Equivalently, the penalized weighted least square values given by equation (2.3) are minimized and denoted as ${\hat{L}}^{(0)}$ .

Step 3: We iteratively obtain $θ^{(k)}$ and ${\hat{L}}^{(k)}$ by updating the index values $v_{i}^{(k)}$ until ${\hat{L}}^{(k)} - {\hat{L}}^{(k - 1)}$ converges to zero. This step involves the entire domain of $α$ in the optimization procedure. The knots used for the basis functions depend on $α$ since they are sample quantiles of $\{X_{i}^{*} α : i = 1, \dots, N\}$ .

Maximization of the likelihood function in Step 2 is implemented by using the R function lme(). The standard varFunc classes included in the nlme library are used to model the heteroscedastic variance functions across multivariate outcome measurements (Pinheiro and Bates, 2000). Step 3 is implemented by using the R function optim() to ensure the penalized weighted least squares function converges over the entire domain of $α$ . See Appendix B for additional details of the model-fitting algorithm.
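The three steps can be illustrated with a deliberately simplified Python sketch. It is not the authors' lme()/optim() implementation: it assumes a single outcome, no random subject effects, a fixed smoothing penalty `lam`, and a two-dimensional index parametrized by an angle so that $∥ α ∥ = 1$ and $α_{1} > 0$ hold automatically, and it replaces the numerical optimizer by a grid search. All function names and tuning values are hypothetical; NumPy is assumed to be available.

```python
import numpy as np

def cubic_basis(v, knots):
    """Truncated cubic basis (1, v, v^2, v^3, (v - k)_+^3 for each knot)."""
    poly = np.column_stack([np.ones_like(v), v, v**2, v**3])
    trunc = np.clip(v[:, None] - knots[None, :], 0.0, None) ** 3
    return np.hstack([poly, trunc])

def penalized_criterion(phi, X, y, K=5, lam=1e-3):
    """Minimized penalized least squares value for alpha = (cos phi, sin phi)."""
    alpha = np.array([np.cos(phi), np.sin(phi)])            # unit norm, alpha_1 > 0
    v = X @ alpha                                           # index values
    knots = np.quantile(v, np.arange(1, K + 1) / (K + 1))   # knots depend on alpha
    G = cubic_basis(v, knots)
    # Ridge penalty on the truncated-cubic coefficients only, written as an
    # augmented least squares problem for numerical stability.
    D = np.zeros((K, 4 + K))
    D[:, 4:] = np.sqrt(lam) * np.eye(K)
    G_aug = np.vstack([G, D])
    y_aug = np.concatenate([y, np.zeros(K)])
    gamma = np.linalg.lstsq(G_aug, y_aug, rcond=None)[0]
    resid = y_aug - G_aug @ gamma
    return float(resid @ resid), alpha     # RSS + lam * ||gamma_R||^2

def fit_index(X, y, n_grid=361):
    """Search the whole admissible range of the angle, phi in (-pi/2, pi/2)."""
    grid = np.linspace(-np.pi / 2, np.pi / 2, n_grid + 2)[1:-1]
    values = [penalized_criterion(phi, X, y)[0] for phi in grid]
    best = grid[int(np.argmin(values))]
    return penalized_criterion(best, X, y)[1]

# Synthetic check with an exponential index function.
rng = np.random.default_rng(1)
X = rng.uniform(size=(500, 2))
alpha_true = np.array([1.0, -2.0]) / np.sqrt(5.0)
y = np.exp(X @ alpha_true) + rng.normal(scale=0.05, size=500)
alpha_hat = fit_index(X, y)
```

In the full method, the inner least squares fit would be replaced by the mixed model fit of Step 2 (with REML-estimated variance components and smoothing parameters), and the grid search by a general-purpose optimizer over $α$ .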

2.5 Confidence interval estimate of the mean responses

We construct confidence intervals for the mean responses. Suppose $C_{x} = [X_{x}, Z_{x}]$ and ${\hat{y}}_{x} = X_{x} \hat{β} + Z_{x} \hat{u} = C_{x} \hat{ζ}$ , where $\hat{β}$ and $\hat{u}$ are the estimated BLUP of $β$ and $u$ and ${\hat{y}}_{x}$ is the estimated BLUP of $y_{x} = X_{x} β + Z_{x} u$ . Variability of the predicted value can be written as $var [{\hat{y}}_{x} | u] = [X_{x}, Z_{x}] Cov ([\hat{β}, \hat{u}]^{T} | u) [X_{x}, Z_{x}]^{T} = C_{x} Cov ([\hat{β}, \hat{u}]^{T} | u) C_{x}^{T}$ , and $[\hat{β}, \hat{u}]^{T}$ can be expressed as $(C^{T} R^{- 1} C + Λ)^{- 1} C^{T} R^{- 1} y$ , where $C \equiv [X, Z]$ and $Λ =$ diag $(0, Σ_{u}^{- 1})$ .

Because $Cov (y | u) = R$ , we have $Cov ([\hat{β}, \hat{u}]^{T} | u) = (C^{T} R^{- 1} C + Λ)^{- 1} C^{T} R^{- 1} C (C^{T} R^{- 1} C + Λ)^{- 1}$ , which suggests that $Cov ([\hat{β}, \hat{u}]^{T} | u) ≅ (C^{T} {\hat{R}}^{- 1} C + \hat{Λ})^{- 1} C^{T} {\hat{R}}^{- 1} C (C^{T} {\hat{R}}^{- 1} C + \hat{Λ})^{- 1}$ . Denoting this plug-in estimate by $\widehat{Cov} ([\hat{β}, \hat{u}]^{T} | u)$ , we have $\widehat{st.dev.} [{\hat{y}}_{x} | u] = \sqrt{C_{x} \widehat{Cov} ([\hat{β}, \hat{u}]^{T} | u) C_{x}^{T}}$ . It then follows that an approximate $100 (1 - α) %$ confidence interval (where $α$ here denotes the significance level, not the index coefficients) can be written as ${\hat{y}}_{x} \pm z_{1 - \frac{α}{2}} \cdot \widehat{st.dev.} [{\hat{y}}_{x} | u]$ .
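The interval construction can be sketched in Python as follows. The function name `mean_response_ci` and the toy random-intercept data are illustrative assumptions; $R$ and $Σ_{u}$ are treated as known here, whereas in practice their REML estimates would be plugged in. NumPy is assumed to be available, and the normal quantile comes from the standard library.

```python
from statistics import NormalDist
import numpy as np

def mean_response_ci(Y, X, Z, Sigma_u, R, X_x, Z_x, level=0.95):
    """Point estimate and approximate pointwise interval for X_x beta + Z_x u."""
    C = np.hstack([X, Z])
    C_x = np.hstack([X_x, Z_x])
    p = X.shape[1]
    Lam = np.zeros((C.shape[1], C.shape[1]))
    Lam[p:, p:] = np.linalg.inv(Sigma_u)             # Lambda = diag(0, Sigma_u^{-1})
    A = C.T @ np.linalg.solve(R, C)                  # C' R^{-1} C
    M = np.linalg.inv(A + Lam)
    zeta_hat = M @ (C.T @ np.linalg.solve(R, Y))     # stacked (beta_hat, u_hat)
    cov = M @ A @ M                                  # sandwich form of Cov(. | u)
    fit = C_x @ zeta_hat
    sd = np.sqrt(np.einsum("ij,jk,ik->i", C_x, cov, C_x))   # sqrt of diag(C_x cov C_x')
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return fit, fit - z * sd, fit + z * sd

# Toy data: 4 subjects, 3 observations each, random intercepts.
rng = np.random.default_rng(0)
N, n = 4, 3
x = rng.uniform(size=N * n)
X = np.column_stack([np.ones(N * n), x])
Z = np.kron(np.eye(N), np.ones((n, 1)))
Y = 1.0 + 2.0 * x + Z @ rng.normal(scale=0.7, size=N) + rng.normal(scale=0.45, size=N * n)

# Interval for the first subject at covariate value x = 0.5.
fit, lower, upper = mean_response_ci(
    Y, X, Z, Sigma_u=0.5 * np.eye(N), R=0.2 * np.eye(N * n),
    X_x=np.array([[1.0, 0.5]]), Z_x=np.array([[1.0, 0.0, 0.0, 0.0]]))
```

Evaluating the function on a grid of index values yields a pointwise confidence band of the kind used for the fitted index curves.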

3 Simulation

We conduct an extensive simulation study to assess the finite sample performance of the proposed method under various parameter settings. We report the bias and standard errors (SEs) of the parameter estimates. We also compare the estimated index function curves and the true index curves. The overall fitness of the model is characterized by mean square error (MSE).

3.1 Data generation

We consider a scenario involving three correlated outcome variables. In each simulation, data are generated from a trivariate normal distribution. The three true index functions are $η_{1} (v) = e^{v}$ , $η_{2} (v) = v$ and $η_{3} (v) = v^{2}$ , which are chosen to represent both linear and nonlinear effects of the index on the outcomes. Specifically, the three outcome variables $y_{1; ij}$ , $y_{2; ij}$ and $y_{3; ij}$ are generated from the following models:

\begin{cases} y_{1; ij} = \exp (α_{1} x_{1 ij} + α_{2} x_{2 ij}) + β_{1} z_{i} + b_{1 i} + ε_{1; ij}, \\ y_{2; ij} = (α_{1} x_{1 ij} + α_{2} x_{2 ij}) + β_{2} z_{i} + b_{2 i} + ε_{2; ij}, \\ y_{3; ij} = (α_{1} x_{1 ij} + α_{2} x_{2 ij})^{2} + β_{3} z_{i} + b_{3 i} + ε_{3; ij} . \end{cases}

The independent variables $x_{1 ij}$ and $x_{2 ij}$ are generated from Uniform $[0, 1]$ . Binary variable $z_{i}$ is generated from a Bernoulli distribution with Pr $(z_{i} = 1) = 0.3$ . The random subject effects $(b_{1 i}, b_{2 i}, b_{3 i})^{T} \sim$ MVN $(0, Σ_{b})$ , where $Σ_{b}$ has elements $σ_{1}^{2} = 0.4$ , $σ_{2}^{2} = 0.2$ and $σ_{3}^{2} = 0.3$ and $ρ_{12} = 0.25$ , $ρ_{13} = 0.50$ and $ρ_{23} = 0.75$ , which are chosen to represent various levels of association among the multiple outcomes (Hinkle et al., 2003). The random error vectors $(ε_{1; ij}, ε_{2; ij}, ε_{3; ij})^{T}$ are generated from MVN $(0, Σ_{ε})$ to introduce heteroscedasticity. For simplicity, only independent errors are considered, with $σ_{ε}^{2} = 0.1$ and two scale parameters $δ_{2} = 0.8$ and $δ_{3} = 0.6$ .

We consider four different sample sizes: the number of subjects is either $N = 50$ or $N = 100$ , and each subject has either $n_{i} = 5$ or $n_{i} = 10$ repeated observations. For each sample size, we generate 200 datasets. We use 20 knots to fit cubic spline models. For well-behaved functions, such a number of knots is considered sufficient to ensure the desired flexibility (Crainiceanu et al., 2005). In each iteration, the knots are computed and selected at equally spaced quantiles of the estimated index values $v$ (Ruppert et al., 2002). The choice of the smoothing parameters $λ_{1}$ , $λ_{2}$ and $λ_{3}$ in this procedure is based on the mixed model representation and computed by the inverse of the estimated variability of truncated line functions.
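The data-generating mechanism described in this section can be sketched in Python as follows. The function name `simulate` and the returned dictionary layout are illustrative assumptions; the parameter values are those stated in the text and in the caption of Table 1, and NumPy is assumed to be available.

```python
import numpy as np

def simulate(N, n_i, rng):
    """One simulated dataset with three correlated longitudinal outcomes."""
    alpha = np.array([1.0, -2.0]) / np.sqrt(5.0)        # index coefficients
    beta = np.array([1.6, 0.5, -2.7])                   # effects of the binary covariate
    sd_b = np.sqrt([0.4, 0.2, 0.3])
    corr = np.array([[1.00, 0.25, 0.50],
                     [0.25, 1.00, 0.75],
                     [0.50, 0.75, 1.00]])
    Sigma_b = corr * np.outer(sd_b, sd_b)               # random-effect covariance
    Sigma_eps = 0.1 * np.diag([1.0, 0.8, 0.6])          # heteroscedastic, independent errors

    subject = np.repeat(np.arange(N), n_i)              # subject label of each row
    x = rng.uniform(size=(N * n_i, 2))                  # x_1ij, x_2ij ~ Uniform[0, 1]
    z = rng.binomial(1, 0.3, size=N)                    # subject-level binary covariate
    b = rng.multivariate_normal(np.zeros(3), Sigma_b, size=N)
    eps = rng.multivariate_normal(np.zeros(3), Sigma_eps, size=N * n_i)

    v = x @ alpha                                       # true index values
    eta = np.column_stack([np.exp(v), v, v**2])         # eta_1, eta_2, eta_3
    y = eta + z[subject, None] * beta + b[subject] + eps
    return {"subject": subject, "x": x, "z": z, "v": v, "y": y}

data = simulate(N=50, n_i=5, rng=np.random.default_rng(0))
```

Calling `simulate` repeatedly with different generators (or one generator advanced across calls) gives the replicated datasets for one sample-size setting.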

3.2 Performance assessment

We compare the estimated values of parameters against the true values. The parameter estimation results, including the mean values of the parameter estimates (Mean), SE, bias and MSE, are summarized in Table 1. The simulation shows that the estimated coefficient values are close to the true values in all cases, and the SEs of the estimates are generally small. In addition, the correlation structure and heteroscedasticity among the outcomes are correctly recovered. Not surprisingly, the MSE of each parameter estimate tends to decrease as the sample size and the number of repeated measures increase.

Table 1:
Parameter estimation: True $(α_{1}, α_{2}) = \frac{1}{\sqrt{5}} (1, - 2) = (0.4472, - 0.8944), β_{1} = 1.6, β_{2} = 0.5, β_{3} = - 2.7, ρ_{12} = 0.25, ρ_{13} = 0.50, ρ_{23} = 0.75, δ_{2} = 0.8, δ_{3} = 0.6$

		$n_{i} = 5$				$n_{i} = 10$
Parameter	N	Mean	SE	Bias	MSE( $10^{- 4}$ )	Mean	SE	Bias	MSE( $10^{- 4}$ )
$α_{1}$	50	0.4387	0.0030	-0.0085	0.8158	0.4414	0.0021	-0.0058	0.3837
	100	0.4462	0.0030	-0.0010	0.0721	0.4438	0.0015	-0.0034	0.1353
$α_{2}$	50	-0.8974	0.0014	-0.0030	0.1131	-0.8967	0.0010	-0.0023	0.0646
	100	-0.8941	0.0012	0.0003	0.0163	-0.8958	0.0007	-0.0014	0.0252
$β_{1}$	50	1.5998	0.0148	-0.0002	2.1926	1.6138	0.0135	0.0138	3.7421
	100	1.5912	0.0106	-0.0088	1.8852	1.5864	0.0097	-0.0136	2.8092
$β_{2}$	50	0.5061	0.0095	0.0061	1.2699	0.4936	0.0099	-0.0064	1.3924
	100	0.4959	0.0068	-0.0041	0.6325	0.5035	0.0071	0.0035	0.6224
$β_{3}$	50	-2.6948	0.0116	0.0052	1.6168	-2.7099	0.0119	-0.0099	2.3841
	100	-2.7053	0.0084	-0.0053	0.9869	-2.7049	0.0089	-0.0049	1.0286
$ρ_{12}$	50	0.2442	0.0107	-0.0058	1.4827	0.2566	0.0098	0.0066	1.3909
	100	0.2423	0.0071	-0.0077	1.0912	0.2533	0.0075	0.0033	0.6650
$ρ_{13}$	50	0.4974	0.0079	-0.0026	0.6924	0.5045	0.0077	0.0045	0.7885
	100	0.5045	0.0059	0.0045	0.5474	0.4963	0.0058	-0.0037	0.4709
$ρ_{23}$	50	0.7447	0.0055	-0.0053	0.5802	0.7421	0.0053	-0.0079	0.8977
	100	0.7496	0.0038	-0.0004	0.1430	0.7533	0.0032	0.0033	0.2155
$δ_{2}$	50	0.8047	0.0081	0.0047	0.8649	0.8075	0.0049	0.0075	0.8001
	100	0.8063	0.0059	0.0063	0.7406	0.8002	0.0038	0.0002	0.1449
$δ_{3}$	50	0.5947	0.0054	-0.0005	0.5717	0.6050	0.0040	0.0050	0.4152
	100	0.6013	0.0044	0.0013	0.2082	0.5980	0.0025	-0.0020	0.1023

Figure 1 presents the average cubic-spline estimates for the three correlated outcomes and the corresponding $2.5 %$ and $97.5 %$ quantiles based on $200$ simulated datasets. As depicted, the average cubic-spline fit obtained from the proposed procedure correctly captures both nonlinear and linear features of the true functions (exponential function in Figure 1a, linear function in Figure 1b and square function in Figure 1c). The average integrated MSE over the three fitted index functions is $7.24 \times 10^{- 2}$ , the average integrated squared bias is $0.46 \times 10^{- 2}$ and the average integrated variance is $6.78 \times 10^{- 2}$ . At the same time, both $2.5 %$ and $97.5 %$ quantiles are close to the true curves, showing a low level of variation in the estimates. As expected, wider confidence bands are observed for $z = 1$ compared to $z = 0$ , reflecting the relative variability from the categorical covariate, $z$ , with Bernoulli probability $p = 0.3$ .

Figure 1:

Estimated index functions and the corresponding confidence bands. Solid curves are the true functions; the dashed curves are the average cubic-spline fit over 200 simulations. The dot-dashed curves are the corresponding 2.5% and 97.5% quantiles.

In summary, the simulation study shows that both the index components and curvature of index functions are accurately recovered; other parameters associated with the multivariate linear models are also reliably estimated. The coverage probabilities of the confidence band are close to the nominal level, thus confirming that the proposed algorithm works well in tested data settings.

In the current simulated datasets, only three positively correlated outcomes are considered. A separate simulation is conducted to evaluate the performance of the proposed method with a larger number of outcomes (six) with both positive and negative correlations. Again, the method performs as expected (additional simulation results are shown in Table 2).

Table 2:

Parameter estimation: Models for six outcomes with both positive and negative correlations. True index functions are all the same for the six outcomes: $η (α^{T} X) = exp (α^{T} X)$ . Results are based on $200$ simulations

	$N = 50$				$N = 100$
Parameter	Mean	SE	Bias	MSE( $10^{- 4}$ )	Mean	SE	Bias	MSE( $10^{- 4}$ )
$α_{1} = 0.5547$	0.5500	0.0207	-0.0047	4.5058	0.5545	0.0150	-0.0002	2.2504
$α_{2} = - 0.8320$	-0.8346	0.0014	-0.0026	0.0872	-0.8319	0.0009	0.0001	0.0082
$ρ_{56} = 0.6$	0.5846	0.0069	-0.0154	2.8477	0.5861	0.0052	-0.0139	2.2025
$ρ_{46} = 0.6$	0.5928	0.0066	-0.0072	1.9540	0.5877	0.0049	-0.0123	1.7530
$ρ_{45} = 0.6$	0.5856	0.0077	-0.0144	2.6665	0.5928	0.0054	-0.0072	0.8100
$ρ_{36} = - 0.2$	-0.2148	0.0104	-0.0148	3.2720	-0.2051	0.0068	-0.0051	0.7225
$ρ_{35} = - 0.2$	-0.2024	0.0118	-0.0024	1.4500	-0.1966	0.0068	0.0034	0.5780
$ρ_{34} = - 0.2$	-0.2110	0.0107	-0.0110	2.3549	-0.1999	0.0070	0.0001	0.4901
$ρ_{26} = - 0.2$	-0.1792	0.0119	0.0208	5.7425	-0.1979	0.0072	0.0021	0.5625
$ρ_{25} = - 0.2$	-0.1845	0.0121	0.0155	3.8667	-0.1900	0.0069	0.0100	1.4761
$ρ_{24} = - 0.2$	-0.1997	0.0115	0.0003	1.3234	-0.1968	0.0070	0.0032	0.5924
$ρ_{23} = 0.4$	0.4068	0.0102	0.0068	1.5028	0.3938	0.0062	-0.0062	0.7688
$ρ_{16} = - 0.2$	-0.1762	0.0112	0.0238	6.9188	-0.1928	0.0072	0.0072	1.0368
$ρ_{15} = - 0.2$	-0.1697	0.0122	0.0303	1.0669	-0.1930	0.0063	0.0070	0.8869
$ρ_{14} = - 0.2$	-0.1906	0.0122	0.0094	2.3720	-0.1958	0.0067	0.0042	0.6253
$ρ_{13} = 0.4$	0.3851	0.0102	-0.0149	3.2605	0.4010	0.0063	0.0010	0.4069
$ρ_{12} = 0.4$	0.4023	0.0090	0.0023	1.8629	0.4079	0.0061	0.0079	0.9962
$δ_{1} = 0.80$	0.8083	0.0087	0.0083	1.4458	0.7999	0.0063	-0.0001	0.3970
$δ_{2} = 0.75$	0.7714	0.0096	0.0214	5.5012	0.7530	0.0058	0.0030	0.4264
$δ_{3} = 0.70$	0.7111	0.0085	0.0111	1.9546	0.7047	0.0056	0.0047	0.5345
$δ_{4} = 0.65$	0.6654	0.0076	0.0154	2.9492	0.6590	0.0051	0.0090	1.0701
$δ_{5} = 0.60$	0.6178	0.0069	0.0178	3.3533	0.6079	0.0046	0.0079	0.8357

4 Application

To illustrate the proposed method, we construct an adiposity index based on waist girth (WG) and subscapularis skinfold (SS) for the prediction of SBP and DBP. Data were obtained from a prospective observational study; participants were children recruited from schools in Indianapolis, Indiana. The detailed study protocol was described by Tu et al. (2009). Briefly, blood pressure, WG and SS were assessed repeatedly during the course of follow-up. Preliminary data exploration showed that both WG and SS were positively associated with SBP and DBP.

We assume that the index takes the form $f [α_{1} log (WG) + α_{2} log (SS)]$ without specifying the values of $α_{1}$ and $α_{2}$ . The functional forms of $η_{s} (\cdot)$ and $η_{d} (\cdot)$ are to be estimated from the observed data:

\begin{cases} {SBP}_{ij} = η_{s} (α_{1} \log {WG}_{ij} + α_{2} \log {SS}_{ij}) + β_{s}^{T} W_{ij} + U_{i}^{s} + ε_{ij}^{s}, \\ {DBP}_{ij} = η_{d} (α_{1} \log {WG}_{ij} + α_{2} \log {SS}_{ij}) + β_{d}^{T} W_{ij} + U_{i}^{d} + ε_{ij}^{d} . \end{cases}

We fit the model using a subset of the study data, where all subjects had at least seven follow-up visits.

The example dataset included 468 children (224 males). The mean age of the children at study entry was 13 years. Besides WG and SS as index components, we included in the model age and sex as fixed effect covariates. A random subject effect was also included to accommodate the within-subject correlations.

Figure 2:

Spline estimates and $95 %$ confidence bands for the systolic and diastolic blood pressure, stratified by sex. Solid curves are for systolic blood pressure and dashed curves are for diastolic blood pressure.

Table 3:

Summary of the fitted index models

	Parameter (Standard Error)
Response	logWG	logSS	Age	Male
SBP^a	0.8951 (0.0006)	0.4460 (0.0014)	-0.1569 (0.0520)	6.9844 (0.6781)
DBP^b	0.8951 (0.0006)	0.4460 (0.0014)	-0.1646 (0.0513)	1.0307 (0.6059)

Note:

^a The estimated penalty parameter $λ$ is 0.8671.

^b The estimated penalty parameter $λ$ is 0.8623.

Model fitting results are presented in Table 3. The associations between the values of the new index and SBP and DBP in male and female subjects are graphically presented in Figure 2. The derived index is positively correlated with both SBP and DBP. The $95 %$ pointwise confidence band, stratified by sex, is quite narrow, suggesting generally good precision of the mean estimates. The residual mean squared error (ReMSE) of the model is 54.1410. Using a modified $R^{2}$ for mixed effect models developed by Xu (2003), we obtained an $R^{2}$ value of $0.44$ , which conveys a sense of the proportion of variance explained by the model. The index functions in the current application are all well behaved and the number of parameters is relatively small in comparison with the sample size, so the risk of overfitting is minimal.

Scientifically, our data showed that both WG and SS are positively associated with the elevation of blood pressure. However, WG makes a greater contribution to blood pressure than SS, as indicated by the magnitude of their respective index coefficients, which are at an approximately 2:1 ratio. In the current study, WG approximates abdominal fat, while SS measures subcutaneous fat. Previously published data have consistently shown that body fat distribution plays an important role in the development of obesity-related hypertension, and that increased visceral adiposity (such as that measured by WG), more than subcutaneous adiposity, contributes to increased risk of hypertension (Chandra et al., 2014). Although the precise mechanisms have not been fully elucidated, they most likely involve the increase of insulin resistance (Fain et al., 2004; Fox et al., 2007) and possibly the alteration of the renin–aldosterone axis (Yu et al., 2013). Regardless of the mechanisms, our data once again confirm the harm of central fat accumulation.

5 Discussion

Derivation of useful medical indices that correlate with multiple health outcomes is an issue of great practical importance. In this research, we provide a new tool to assist index development. By extending the partially linear single-index model to a multivariate setting, we have developed a single-index model that allows investigators to analytically derive clinical indices that work for multiple clinical outcomes.

In this article, we present the basic construction of the index model, as well as the related model fitting procedures. Our simulation study shows that the new method performs well in terms of estimation accuracy and computational efficiency. The model formulation is quite general and can accommodate longitudinal measures of multiple outcomes. Besides the index function, the model also includes other fixed and random effects. The index functions are modelled by cubic splines and estimated using a penalized least squares method. As we have shown in our simulation studies, both the index components and the curvature of the index functions are recovered accurately. The relatively narrow confidence bands associated with the fitted curves further attest to the model's estimation efficiency. Finally, as an index development tool, the method can be implemented on most computing platforms with existing software; thus, it has the potential to be used by practitioners in a wide variety of applications.

Footnotes

Acknowledgments

This work was supported by National Institutes of Health Grants RO1-HL095086, U54 CA 190151, and P30 HS024384.

Appendix A

Consider a general multivariate p-spline model with $K$ knots:

$$y_{m;i} = f_{m}(x_{i}) + ε_{m;i}, \quad i = 1, \dots, N; \; m = 1, \dots, M,$$

where the $i$th response of the $m$th outcome, $y_{m;i}$, is generated by a function $f_{m}$ of a covariate $x_{i}$ plus a random error $ε_{m;i}$. We assume $ε_{m;i} \sim$ i.i.d. $N(0, σ_{ε}^{2} δ_{m})$ for each fixed $m$. We estimate each smooth function $f_{m}$ by using a p-spline of degree $p$, written as

$$f_{m}(x_{i}) = β_{0}^{m} + β_{1}^{m} x_{i} + \dots + β_{p}^{m} x_{i}^{p} + \sum_{k = 1}^{K} β_{pk}^{m} (x_{i} - κ_{k})_{+}^{p}. \tag{A.1}$$
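As an illustration of equation A.1, the truncated power basis can be assembled in a few lines of R. The sketch below is not taken from the application code; the helper name `tp_basis`, the simulated covariate and the choice of five knots are assumptions made only for this example.

```r
# Hypothetical helper: truncated power basis of degree p for one covariate.
# X holds the polynomial part (1, x, ..., x^p); Z holds (x - kappa_k)_+^p.
tp_basis <- function(x, knots, p = 3) {
  X <- outer(x, 0:p, "^")        # polynomial columns
  Z <- outer(x, knots, "-")      # x_i - kappa_k
  Z <- (Z * (Z > 0))^p           # truncate at zero, then raise to power p
  list(X = X, Z = Z)
}

set.seed(1)
x <- runif(50)                   # simulated covariate (assumption)
K <- 5                           # number of interior knots (assumption)
knots <- quantile(unique(x), seq(0, 1, length.out = K + 2))[-c(1, K + 2)]
B <- tp_basis(x, knots, p = 3)
dim(B$X)                         # 50 rows, p + 1 = 4 columns
dim(B$Z)                         # 50 rows, K = 5 columns
```

Placing the knots at sample quantiles mirrors the knot rule used in Appendix B; any other reasonable knot placement could be substituted.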

We fit the $p$th-degree spline by penalized least squares. We denote $y_{m} = (y_{m; 1}, \dots, y_{m; N})^{T}$, $\mathbf{x}_{i} = (1, x_{i}, \dots, x_{i}^{p})$, $\mathbf{z}_{i} = ((x_{i} - κ_{1})_{+}^{p}, \dots, (x_{i} - κ_{K})_{+}^{p})$, $X_{m} = (\mathbf{x}_{1}^{T}, \dots, \mathbf{x}_{N}^{T})^{T}$, $Z_{m} = (\mathbf{z}_{1}^{T}, \dots, \mathbf{z}_{N}^{T})^{T}$, $β_{m} = (β_{0}^{m}, \dots, β_{p}^{m})^{T}$, $γ_{m} = (β_{p 1}^{m}, \dots, β_{pK}^{m})^{T}$ and $ζ_{m} = (β_{m}^{T}, γ_{m}^{T})^{T}$. We further define $Y = (y_{1}^{T}, \dots, y_{M}^{T})^{T}$, $C = (I_{M} \otimes X_{m}, I_{M} \otimes Z_{m}) \equiv (X ∣ Z)$ and $ζ = [(β_{m}^{T})_{1 \leq m \leq M}, (γ_{m}^{T})_{1 \leq m \leq M}]^{T}$, so that the weighted penalized least squares fit can be written as

$$\text{minimize } (Y - C ζ)^{T} V^{-1} (Y - C ζ) \quad \text{subject to } ζ_{m}^{T} D ζ_{m} < d_{m}, \; m = 1, \dots, M.$$

Here $V$ is the covariance matrix of the heteroscedastic random errors, $V = Σ_{ε} \otimes I_{N}$, where $Σ_{ε}$ is the $M \times M$ variance–covariance matrix whose diagonal elements are $σ_{ε_{1}}^{2}, \dots, σ_{ε_{M}}^{2}$ and whose off-diagonal elements are the pairwise covariances between outcomes.

Matrix $D$ has a specific structure,

$$D = \begin{bmatrix} 0_{(p + 1) \times (p + 1)} & 0_{(p + 1) \times K} \\ 0_{K \times (p + 1)} & I_{K \times K} \end{bmatrix},$$

which corresponds to the constraints on $γ_{m}$, so that $\lVert γ_{m} \rVert^{2} < d_{m}$.

Using the Lagrange multiplier method, the above minimization is equivalent to solving for the $ζ$ that minimizes

$$(Y - C ζ)^{T} V^{-1} (Y - C ζ) + λ_{1}^{2 p} ζ_{1}^{T} D ζ_{1} + \dots + λ_{M}^{2 p} ζ_{M}^{T} D ζ_{M}$$

for some $λ_{m} > 0$. In other words, each term $λ_{m}^{2 p} ζ_{m}^{T} D ζ_{m}$ penalizes the fit of the function $f_{m}$ associated with the $m$th outcome $y_{m}$. This is equivalent to solving the following minimization problem with regard to $ζ$:

$$(Y - C ζ)^{T} V^{-1} (Y - C ζ) + ζ^{T} Λ ζ,$$

where $Λ$ has a block-diagonal form,

$$Λ = \mathrm{diag}(0 \cdot 1_{M (p + 1)}, λ_{1}^{2 p} \cdot 1_{K}, \dots, λ_{M}^{2 p} \cdot 1_{K}).$$

This has the solution

$$\hat{ζ} = (C^{T} V^{-1} C + Λ)^{-1} C^{T} V^{-1} Y. \tag{A.2}$$
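The closed form in equation A.2 can be checked numerically. The following R sketch is an illustration only: the two simulated outcomes, the error covariance `Sigma_eps` and the smoothing parameters `lambda` are assumed values, not quantities from the application. It stacks $M = 2$ outcomes, builds $C$, $V^{-1}$ and $Λ$ as defined above, and solves the penalized normal equations.

```r
set.seed(2)
N <- 80; M <- 2; p <- 3; K <- 6
x <- sort(runif(N))
knots <- quantile(x, seq(0, 1, length.out = K + 2))[-c(1, K + 2)]
Xm <- outer(x, 0:p, "^")                       # polynomial part, N x (p + 1)
Zm <- outer(x, knots, "-")
Zm <- (Zm * (Zm > 0))^p                        # truncated power part, N x K

# C = (I_M %x% X_m, I_M %x% Z_m), with zeta ordered as (beta_1, beta_2, gamma_1, gamma_2)
C <- cbind(diag(M) %x% Xm, diag(M) %x% Zm)

Sigma_eps <- matrix(c(1.0, 0.3, 0.3, 0.5), 2, 2)   # assumed error covariance
Vinv <- solve(Sigma_eps) %x% diag(N)               # inverse of V = Sigma_eps %x% I_N

# Simulated outcomes with correlated errors, stacked outcome by outcome
E <- matrix(rnorm(N * M), N, M) %*% chol(Sigma_eps)
Y <- c(sin(2 * pi * x) + E[, 1], x^2 + E[, 2])

lambda <- c(0.5, 0.8)                              # assumed smoothing parameters
Lambda <- diag(c(rep(0, M * (p + 1)), rep(lambda^(2 * p), each = K)))

A <- t(C) %*% Vinv %*% C + Lambda
b <- t(C) %*% Vinv %*% Y
zeta_hat <- solve(A, b)                            # equation A.2
fitted_values <- C %*% zeta_hat
```

Only the spline coefficients $γ_{m}$ are shrunk, because the first $M(p+1)$ diagonal entries of `Lambda` are zero.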

On the other hand, if we assume $β_{pk}^{m} \sim N (0, σ_{γ_{m}}^{2})$, the aforementioned multivariate p-spline model (equation A.1) fits within the mixed effect model framework. Considering the BLUP under the mixed model criterion, we have

$$\begin{bmatrix} \hat{β} \\ \hat{γ} \end{bmatrix} = (C^{T} V^{-1} C + Q)^{-1} C^{T} V^{-1} Y, \tag{A.3}$$

where $C \equiv (X, Z)$ and

$$Q \equiv \begin{bmatrix} 0 & 0 \\ 0 & Σ_{γ}^{-1} \end{bmatrix}.$$

Indeed, $Σ_{γ} = Cov (γ) = \mathrm{blockdiag}_{1 \leq m \leq M} (Σ_{γ_{m}})$, where $Σ_{γ_{m}} = \mathrm{diag}(σ_{γ_{m}}^{2} \cdot 1_{K})$ and $γ_{m} \sim N (0, Σ_{γ_{m}})$. Therefore, if we set $λ_{m}^{2 p} = 1 / σ_{γ_{m}}^{2}$, the solutions of equations A.2 and A.3 coincide.
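The equivalence can also be verified numerically for a single outcome. In the R sketch below, which uses simulated data and assumed variance components `sig2_eps` and `sig2_gam`, the penalized solution with penalty $1 / σ_{γ}^{2}$ is compared with the BLUP computed from the marginal covariance of $y$ (generalized least squares for $β$, followed by the usual predictor for $γ$).

```r
set.seed(3)
N <- 60; p <- 3; K <- 5
x <- sort(runif(N))
knots <- quantile(x, seq(0, 1, length.out = K + 2))[-c(1, K + 2)]
X <- outer(x, 0:p, "^")
Z <- outer(x, knots, "-")
Z <- (Z * (Z > 0))^p
y <- sin(2 * pi * x) + rnorm(N, sd = 0.5)

sig2_eps <- 0.25; sig2_gam <- 1                 # assumed variance components
C <- cbind(X, Z)
Vinv <- diag(N) / sig2_eps
Q <- diag(c(rep(0, p + 1), rep(1 / sig2_gam, K)))

# Penalized form, as in equations A.2 and A.3 with lambda^(2p) = 1 / sig2_gam
zeta_pen <- solve(t(C) %*% Vinv %*% C + Q, t(C) %*% Vinv %*% y)

# BLUP from the marginal model y ~ N(X beta, sig2_gam * Z Z' + sig2_eps * I)
Sy_inv <- solve(sig2_gam * Z %*% t(Z) + sig2_eps * diag(N))
beta_hat <- solve(t(X) %*% Sy_inv %*% X, t(X) %*% Sy_inv %*% y)
gamma_hat <- sig2_gam * t(Z) %*% Sy_inv %*% (y - X %*% beta_hat)

# The two routes should agree up to numerical error
max_diff <- max(abs(zeta_pen - rbind(beta_hat, gamma_hat)))
```

Note that the identity $λ^{2 p} = 1 / σ_{γ}^{2}$ holds because $V^{-1}$ appears inside the criterion; if the residual variance is factored out instead, the penalty becomes the ratio $σ_{ε}^{2} / σ_{γ}^{2}$.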

Extension to random effect models, where the subject-specific random vector $B$ has variance–covariance matrix $Σ_{B} = (Σ_{b} \otimes I_{N})$, does not change the equivalence relationship between $λ_{m}^{2 p}$ and $σ_{γ_{m}}^{2}$. This follows directly because such a model assumes

$$Cov (γ) = \begin{bmatrix} B^{-1} & 0 \\ 0 & Σ_{γ}^{-1} \end{bmatrix}.$$

The block-diagonal structure ensures that the relationship $λ_{m}^{2 p} = 1 / σ_{γ_{m}}^{2}$ remains unchanged in random effect models.

Appendix B

Example data are from an ongoing study. We do not have permission to publish the raw data, but we will be happy to assist interested parties in obtaining data access under a signed data use agreement. For further details, please contact Wanzhu Tu at Indiana University School of Medicine (wtu1@iu.edu).

For implementation, we use the lme function from the R package nlme to fit the application data under the proposed multivariate single-index model setting (Pinheiro et al., 2015). The Comprehensive R Archive Network (CRAN, retrieved from https://cran.r-project.org/) provides all downloadable code and documentation.

The dataset is structured as a data frame (named bpdataall), where one subject (indicated by Id) may have multiple visits. Two columns of blood pressure measures (one for systolic and one for diastolic) are indicated by type. Other columns include index components lgwaistgir, lgsubscap and subject-level covariates gender, age.

We obtain the initial values for the index component parameters ${\hat{α}}^{(0)}$ by fitting linear models in R using the lm function. After ${\hat{α}}^{(0)}$ is normalized, we can compute the initial index values xsim. Furthermore, we follow Durbán et al. (2005) to set the location of the knots used to construct the basis of the two p-splines and the design matrix:

    fix.x <- cbind(age, gender)
    nknots <- 20
    # interior knots at equally spaced quantiles of the unique index values
    knots <- quantile(unique(xsim), seq(0, 1, length.out = nknots + 2))[-c(1, nknots + 2)]
    # cubic truncated power basis (xsim - knot)_+^3
    Z <- outer(xsim, knots, "-")
    Z.fit1 <- Z^3 * (Z > 0)
    # pad with zero blocks so that each outcome has its own spline random effects
    Z.fit1.bin <- rbind(Z.fit1, matrix(0, length(y) / 2, nknots))
    Z.fit2.bin <- rbind(matrix(0, length(y) / 2, nknots), Z.fit1)
    null.zero <- rep(0, length(Id))
    null.one <- rep(1, length(Id))
    zero.4 <- matrix(0, length(Id), 4)
    zero.2 <- matrix(0, length(Id), 2)
    # fixed-effect design: one block of columns per outcome, stacked by rows
    design.x1 <- cbind(null.one, xsim, I(xsim^2), I(xsim^3), zero.4, fix.x, zero.2)
    design.x2 <- cbind(zero.4, null.one, xsim, I(xsim^2), I(xsim^3), zero.2, fix.x)
    design.x <- rbind(design.x1, design.x2)
    # the model formula is built from the column names of design.x,
    # so every column needs a unique name before the formula is created
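The initial-value step that precedes this listing can be sketched as follows. The data are simulated stand-ins for the index components, and the unit-norm scaling with a positive first coefficient is an assumed normalization convention for this illustration; it is not confirmed by the listing above.

```r
set.seed(4)
n <- 200
# Simulated stand-ins for the two index components (not the study data)
lgwaistgir <- rnorm(n)
lgsubscap <- rnorm(n)
y <- 2 * lgwaistgir + 1 * lgsubscap + rnorm(n)

# Step 1: ordinary linear model for the raw starting coefficients
fit0 <- lm(y ~ lgwaistgir + lgsubscap)
alpha0 <- coef(fit0)[c("lgwaistgir", "lgsubscap")]

# Step 2: normalize (assumed convention: unit norm, first coefficient positive)
alpha0 <- alpha0 / sqrt(sum(alpha0^2))
if (alpha0[1] < 0) alpha0 <- -alpha0

# Step 3: initial index values used to place the knots
xsim <- as.vector(cbind(lgwaistgir, lgsubscap) %*% alpha0)
```

Some normalization of this kind is needed because the scale of the index coefficients cannot be separated from the scale of the unknown index function.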

We set up the object fmla, which contains a symbolic model formula: as.formula(paste("y ~", paste(dimnames(design.x)[[2]], collapse = "+"), "-1")). To perform the model-fitting procedure at each iterative step, the following command is executed after loading the nlme package in an R environment:

    fit2.bp <- lme(fmla, data = bpdataall,
                   random = list(subject = pdSymm(~ type - 1),
                                 Id = pdBlocked(list(pdIdent(~ Z.fit1.bin - 1),
                                                     pdIdent(~ Z.fit2.bin - 1)))),
                   weights = varIdent(form = ~ 1 | type),
                   control = lmeControl(opt = "optim"))
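For readers unfamiliar with fitting a penalized spline through lme, the following reduced example may help. It is a sketch on simulated data with a single outcome and no subject effects, so it is far simpler than the model above; the single-level grouping factor `grp` and all variable names are assumptions made for the illustration. The spline coefficients enter as random effects with a common variance (pdIdent), so the amount of smoothing is estimated as a variance component.

```r
library(nlme)

set.seed(5)
n <- 200
nknots <- 10
x <- sort(runif(n))
y <- sin(2 * pi * x) + rnorm(n, sd = 0.3)

# Cubic truncated power basis with knots at sample quantiles
knots <- quantile(unique(x), seq(0, 1, length.out = nknots + 2))[-c(1, nknots + 2)]
Z <- outer(x, knots, "-")
Z <- Z^3 * (Z > 0)

# One group containing all observations, so the spline coefficients
# are shared random effects rather than subject-specific ones
grp <- factor(rep(1, n))

fit <- lme(y ~ x + I(x^2) + I(x^3),
           random = list(grp = pdIdent(~ Z - 1)),
           control = lmeControl(opt = "optim"))

sig2_eps <- fit$sigma^2                              # residual variance
sig2_u <- as.numeric(VarCorr(fit)[1, "Variance"])    # spline coefficient variance
ratio <- sig2_eps / sig2_u                           # implied penalty weight
f_hat <- fitted(fit)
```

Here `ratio` plays the role of the smoothing parameter when the residual variance is factored out of the criterion, which is the single-outcome counterpart of the relationship derived in Appendix A.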

We assume that the random subject effect is normally distributed with a general (unstructured) positive-definite variance matrix, as specified by pdSymm, and that the random effects for the penalized spline coefficients of each outcome follow a normal distribution with a variance matrix proportional to the identity, as specified by pdIdent, so that $Σ_{u} =$ diag $(Σ_{B}, Σ_{Γ})$. The distribution of the combined random effect $u$ is specified by the argument random=list(), which includes two random components, subject=pdSymm(~type-1) and Id=pdBlocked(list(pdIdent(~Z.fit1.bin-1), pdIdent(~Z.fit2.bin-1))). We allow different residual variances for SBP and DBP via weights=varIdent(form=~1|type). The estimation algorithm is controlled by control=lmeControl(opt="optim"), under which the optimizer uses a quasi-Newton method (the BFGS algorithm) (Broyden, 1970; Fletcher, 1970; Goldfarb, 1970; Shanno, 1970).

We define a function pwresim that returns the negative log-likelihood from lme at each iterative step, and we pass this user-defined function as an argument to the function optim from the stats package. The optimization is specified by the command optim(pars, pwresim, ...). optim evaluates the function at the supplied argument and minimizes it over the two-element parameter vector pars=c( $α_{1}, α_{2}$ ).
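The outer optimization can be illustrated with a small stand-in. The sketch below does not reproduce pwresim: the function `profile_nll` is hypothetical, the data are simulated, and the inner lme fit is replaced by an ordinary cubic polynomial regression so that the example runs with base R only. It shows the general pattern of normalizing the index coefficients inside the objective, refitting the inner model, and returning a negative log-likelihood to optim.

```r
set.seed(6)
n <- 300
w1 <- rnorm(n)
w2 <- rnorm(n)
alpha_true <- c(2, 1) / sqrt(5)              # unit-norm index coefficients
u <- alpha_true[1] * w1 + alpha_true[2] * w2
y <- sin(u) + 0.5 * u + rnorm(n, sd = 0.2)

# Hypothetical stand-in for pwresim: profile negative log-likelihood in alpha
profile_nll <- function(pars) {
  alpha <- pars / sqrt(sum(pars^2))          # normalize inside the objective
  idx <- alpha[1] * w1 + alpha[2] * w2       # current index values
  inner <- lm(y ~ poly(idx, 3))              # simplified inner fit (not lme)
  -as.numeric(logLik(inner))
}

opt <- optim(c(1, 1), profile_nll, method = "BFGS")

# Report the estimate on the normalized scale with a positive first element
alpha_hat <- opt$par / sqrt(sum(opt$par^2))
if (alpha_hat[1] < 0) alpha_hat <- -alpha_hat
```

In the actual procedure the inner fit would be the lme call shown earlier, with the spline basis rebuilt from the updated index values at every evaluation.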

References

Broyden (1970) The convergence of single-rank quasi-Newton methods. Mathematics of Computation, 24, 365–82.

Chandra, Neeland, Berry, Ayers, Rohatgi, Das, Khera, McGuire, de Lemos and Turer (2014) The relationship of body mass and fat distribution with incident hypertension: Observations from the Dallas Heart Study. Journal of the American College of Cardiology, 64, 997–1002.

Crainiceanu, Ruppert and Wand (2005) Bayesian analysis for penalized spline regression using WinBUGS. Journal of Statistical Software, 14, 1–24.

Durbán, Harezlak, Wand and Carroll (2005) Simple fitting of subject-specific curves for longitudinal data. Statistics in Medicine, 24, 1153–67.

Eilers and Marx (1996) Flexible smoothing with B-splines and penalties. Statistical Science, 11, 89–121.

Fain, Madan, Hiler, Cheema and Bahouth (2004) Comparison of the release of adipokines by adipose tissue, adipose tissue matrix, and adipocytes from visceral and subcutaneous. Endocrinology, 145, 2773–82.

Fletcher (1970) A new approach to variable metric algorithms. Computer Journal, 13, 317–22.

Fox, Massaro, Hoffmann, Pou, Maurovich-Horvat, Liu, Vasan, Murabito, Meigs, Cupples, D'Agostino and O'Donnell (2007) Abdominal visceral and subcutaneous adipose tissue compartments: Association with metabolic risk factors in the Framingham Heart Study. Circulation, 116, 39–48.

Goldfarb (1970) A family of variable metric updates derived by variational means. Mathematics of Computation, 24, 23–26.

Härdle, Hall and Ichimura (1993) Optimal smoothing in single-index models. The Annals of Statistics, 21, 157–78.

Härdle and Stoker (1989) Investigating smooth multiple regression by the method of average derivatives. Journal of the American Statistical Association, 84, 986–95.

Henderson (1975) Best linear unbiased estimation and prediction under a selection model. Biometrics, 31, 423–47.

Hinkle, Wiersma and Jurs (2003) Applied Statistics for the Behavioral Sciences. Boston, MA: Houghton Mifflin.

Ichimura (1993) Semiparametric least squares (SLS) and weighted SLS estimation of single-index models. Journal of Econometrics, 58, 71–120.

Li (1991) Sliced inverse regression for dimension reduction. Journal of the American Statistical Association, 86, 316–42.

Lindstrom and Bates (1988) Newton-Raphson and EM algorithms for linear mixed-effects models for repeated-measures data. Journal of the American Statistical Association, 83, 1014–22.

O'Sullivan (1986) A statistical perspective on ill-posed inverse problems. Statistical Science, 1, 502–18.

Pinheiro and Bates (2000) Mixed-effects Models in S and S-PLUS. Rensselaer, NY: Springer.

Pinheiro, Bates, DebRoy, Sarkar and R Core Team (2015) nlme: Linear and nonlinear mixed effects models. Retrieved 5 May 2015, from http://CRAN.R-project.org/package=nlme. (R package version 3.1–122).

Robinson (1991) That BLUP is a good thing: The estimation of random effects. Statistical Science, 6, 15–32.

Ruppert, Wand and Carroll (2002) Semiparametric Regression. Cambridge: Cambridge University Press.

Shanno (1970) Conditioning of quasi-Newton methods for function minimization. Mathematics of Computation, 24, 647–56.

Stoker (1986) Consistent estimation of scaled coefficients. Econometrica, 54, 1461–81.

Tu, Eckert, Saha and Pratt (2009) Synchronization of adolescent blood pressure and pubertal somatic growth. The Journal of Clinical Endocrinology and Metabolism, 94, 5019–22.

Wu and Tu (2013) Development of a pediatric body mass index using longitudinal single-index models. Statistical Methods in Medical Research (Epub on 8 January 2013).

Xia and Härdle (2006) Semi-parametric estimation of partially linear single index models. Journal of Multivariate Analysis, 97, 1162–84.

Xia, Tong, Li and Zhu (2002) An adaptive estimation of dimension reduction space. Journal of the Royal Statistical Society: Series B, 64, 363–410.

Xu (2003) Measuring explained variation in linear mixed effects models. Statistics in Medicine, 22, 3527–41.

Yu and Ruppert (2002) Penalized spline estimation for partially linear single-index models. Journal of the American Statistical Association, 97, 1042–54.

Yu, Eckert, Liu, Pratt and Tu (2013) Adiposity has unique influence on the renin-aldosterone axis and blood pressure in black children. The Journal of Pediatrics, 163, 1317–22.


Figure 1: Estimated index functions and the corresponding confidence bands. Solid curves are the true functions; the dashed curves are the average cubic-spline fit over 200 simulations. The dot-dashed curves are the corresponding 2.5% and 97.5% quantiles.

Figure 2: Spline estimates and 95% confidence bands for the systolic and diastolic blood pressure, stratified by sex. Solid curves are for systolic blood pressure and dashed curves are for diastolic blood pressure.
