A Spatial-Correlated Multitask Linear Mixed-Effects Model for Imaging Genetics

Abstract

Imaging genetics aims to uncover the hidden relationship between imaging quantitative traits (QTs) and genetic markers [e.g., single nucleotide polymorphism (SNP)] and brings valuable insights into the pathogenesis of complex diseases, such as cancers and cognitive disorders (e.g., Alzheimer’s disease). However, most linear models in imaging genetics did not explicitly model the inner relationship among QTs, which might miss some potential efficiency gains from information borrowing across brain regions. In this work, we developed a novel Bayesian regression framework for identifying significant associations between QTs and genetic markers while explicitly modeling spatial dependency between QTs, with the main contributions as follows. First, we developed a spatial-correlated multitask linear mixed-effects model to account for dependencies between QTs. We incorporated a population-level mixed-effects term into the model, taking full advantage of the dependent structure of brain imaging-derived QTs. Second, we implemented the model in the Bayesian framework and derived a Markov chain Monte Carlo (MCMC) algorithm to achieve the model inference. Further, we incorporated the MCMC samples with the Cauchy combination test to examine the association between SNPs and QTs, which avoided computationally intractable multitest issues. The simulation studies indicated improved power of our proposed model compared with classical models where inner dependencies of QTs were not modeled. We also applied the new spatial model to an imaging dataset obtained from the Alzheimer’s Disease Neuroimaging Initiative database (https://adni.loni.usc.edu). The implementation of our method is available at https://github.com/ZhibinPU/spatialmultitasklmm.git.

Keywords

Bayesian inference; imaging genetics; linear mixed-effects model; spatial dependency

1. INTRODUCTION

Imaging genetics uncovers potential risk genes by analyzing their effects on organisms (e.g., brain structure and lung cancer) and further explains the pathogenic mechanism of target disorders (Gerber et al., 2009). The past decade has witnessed the discovery of numerous gene markers in this research field. The single nucleotide polymorphism (SNP) rs1344706 was shown to impact gray matter volumes in several brain regions and further contribute to schizophrenia (Lencz et al., 2010); rs2365715, rs3762515, and rs67827860 were found to be associated with white matter lesions causing major depressive disorder (Elliott et al., 2018); rs4348791 ranked at the top in the relationship analysis, with phenotypes derived from the left caudate nucleus, which accounted for impaired neural development (Wang et al., 2022). Genome-wide association studies (GWAS) have been widely used in imaging genetics by replacing the one-hot encoding (case–control status) with quantitative traits (QTs) derived from images. Unlike the univariate phenotype in classical GWAS, QTs derived from images are usually multidimensional and spatially correlated. Some studies applied voxel-wise approaches that treat imaging traits separately, such as voxel-wise GWAS (vGWAS) and its faster version, a univariate linear model (LM) for critical correlations between each SNP and voxel (Stein et al., 2010; Hibar et al., 2011; Huang et al., 2015). However, QTs in these models were considered independent, potentially overlooking interactive connections among them. Besides, QTs in most imaging genetic studies were summarized from a voxel-wise level into a coarse level, for example, regions of interest (Wang et al., 2012; Greenlaw et al., 2017; Zhu et al., 2014). This “trick” reduces the computational burden at the cost of some of the spatial information in the whole-brain images.

Besides the massive univariate analysis, multivariate analysis has also been widely used in genetic studies to explore the joint effect of correlated SNPs on a single phenotype. These models usually require a significant reduction in the dimensionality of the data; thus, sparse regression techniques are introduced. One line of research adds regularization to select variables of interest, on the premise that only a relatively small subset of the massive number of genetic variants has significant effects on the phenotype of interest. This is usually achieved by imposing $L_{1}$ or $L_{2}$ norm constraints on the regression model, such as the Group Lasso (Vounou et al., 2010; Yuan and Lin, 2006). The sparsity in these models is often controlled by a hyperparameter called the regularization factor.

Another line of research performs decomposition techniques and then iterates in a greedy way to obtain the coefficient matrix, such as the low-rank linear regression model (L2RM) (Kong et al., 2020). To be precise, sparse constraints are imposed on the coefficient matrix, where sparsity is implicitly determined by singular values instead of regularization. These multivariate methods for GWAS offer new benefits over the univariate framework: they not only bring enhanced predictive performance by exploiting correlated genotypes but also alleviate the multiple testing problem, avoiding the loss of efficiency in handling high-throughput datasets.

Despite these advantages, a limitation of most association studies in imaging genetics is that they furnish only a point estimate of the coefficient matrix and dismiss its uncertainty. Little and Rubin (1987) proposed a Bayesian multivariate normal model that obtained high efficiency by fitting the covariance between QTs. However, its performance suffers in datasets with high relatedness because correlations between samples are ignored. Dahl et al. (2016) proposed a Bayesian multiple-phenotype mixed model, which uses matrix normal distributions with the Kronecker products of row and column covariance matrices. Though this approach considered the correlations between both samples and traits and achieved better performance than other approaches on five simulated datasets, it mainly focused on the imputation of missing phenotypes rather than on the efficiency of computing effect sizes. Besides, Song et al. (2022) proposed a Bayesian spatial model to accommodate the correlation structures in brain images by a proper bivariate conditional autoregressive (CAR) process. The model was designed for assessing the association between a moderate number of QTs and SNPs (a few hundred to a few thousand), and it may lose its efficiency in handling large-scale input data. Nevertheless, these models are valuable attempts to perform Bayesian inference in imaging genetic studies.

Association tests often identify significant loci associated with given phenotypes from a statistical perspective. The test statistics usually vary from model to model and depend on the model assumptions. Univariate approaches obtain a standard p value concerning effect size for each SNP–QT pair, where the effect size is 0 under the null hypothesis (Stein et al., 2010; Hibar et al., 2011). In multivariate settings, test statistics are carefully designed to avoid computationally intractable multitest issues. For example, the sequence kernel association test (SKAT) only fits the null model where phenotypes are regressed on the covariates alone (Wu et al., 2011). Furthermore, Cao et al. (2022) proposed a test named “overall” to boost power by aggregating the information from three types of traditional association tests including SKAT. In the Bayesian framework, each effect size is considered a random variable rather than a fixed scalar, and the association tests are quite different. One feasible choice is to use the credible interval of each effect size approximated from the posterior distribution, with significance determined by whether 0 lies within the credible interval (Hespanhol et al., 2019; Lu et al., 2012; Eberly and Casella, 2003). The difficulty in dealing with p values in the Bayesian framework is that the samples obtained from the posterior distribution of each effect size are usually dependent and thus do not satisfy the independence assumptions commonly used in frequentist hypothesis tests. Recently, Liu et al. (2020) and Liu et al. (2024) used the Cauchy combination test (CCT) to aggregate a set of individual p values, which are not necessarily independent, into a single test statistic via Cauchy transformations. Inspired by this, we adopted the CCT in our model and used the credible interval approach as a benchmark.
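To make the aggregation step concrete, the following is a minimal sketch of the equal-weight Cauchy combination of a set of p values. The function name `cauchy_combination` and the input p values are hypothetical; how the individual p values are obtained from the MCMC output is not specified here, so the sketch only covers the combination itself.

```python
import numpy as np

def cauchy_combination(p_values, weights=None):
    """Combine possibly dependent p values via the Cauchy combination test."""
    p = np.asarray(p_values, dtype=float)
    if weights is None:
        # Assumption: equal, nonnegative weights that sum to one.
        weights = np.full(p.shape, 1.0 / p.size)
    w = np.asarray(weights, dtype=float)
    # Under the null, tan((0.5 - p) * pi) is a standard Cauchy variate.
    t_stat = np.sum(w * np.tan((0.5 - p) * np.pi))
    # Upper-tail probability of the standard Cauchy distribution at t_stat.
    return 0.5 - np.arctan(t_stat) / np.pi

# Hypothetical per-QT p values for one SNP.
p_combined = cauchy_combination([0.01, 0.20, 0.03, 0.50])
```

A useful sanity check of the transformation is that combining identical p values returns that same value, since the weighted average of identical Cauchy variates is unchanged.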

In this work, we developed a novel Bayesian regression framework for identifying significant associations between QTs and genetic markers while modeling the spatial dependency among QTs explicitly. Our primary contributions are threefold. First, we incorporated a population-level mixed-effects term into the LM, taking full advantage of the dependent structure of brain imaging QTs. This fits the fact that an SNP is interlinked to multiple QTs by pleiotropy, a common biological phenomenon. The population-level mixed-effects term avoids the identifiability issue triggered by the individual-level mixed-effects term. Second, we implemented the model in the Bayesian framework and derived a Markov chain Monte Carlo (MCMC) algorithm to perform model inference. Further, we incorporated the MCMC samples with the CCT in an ensemble framework to explore the significant association between SNPs and QTs, which avoided computationally intractable multitest issues. With this model, we can perform the association analysis between a set of QTs and a given SNP simultaneously instead of testing one QT at a time. The spatial model can effectively utilize spatial information and boost its power in association studies. Our simulation studies indicated improved power of our method in terms of the area under the receiver operating characteristic (ROC) curve (AUC; Ling et al., 2003; Huang and Ling, 2005) compared with the traditional LM. In addition, though we validated our method in a simulation study containing a moderate number of QTs and SNPs, our method can be easily extended to a large-scale SNP dataset by allocating disjoint SNP subsets to multiple compute servers in parallel.

2. METHODS

2.1. A Bayesian spatial-correlated multitask regression model

An LM explains how the outcome variable varies over predictors by a linear function. Let $y_{i}$ be the observed response value of individual $i$, and $x_{i}$ be the one-dimensional predictor of individual $i$, $i = 1, \dots, n$. Given the predictor $x_{i}$ and the coefficients $β_{0}, β_{1}$, the outcome $y_{i}$ is normally distributed with variance $σ^{2}$ and mean

E (y_{i} | β_{1}, β_{0}, x_{i}) = x_{i} β_{1} + β_{0},

(1)

where $β_{1}$ is a scalar representing the effect size of the predictor $x_{i}$ on the response variable, and $β_{0}$ is the intercept term. This univariate LM implicitly assumes that $y_{i}$, $i = 1, \dots, n$, are independent. Thus, Eq. (1) can be equivalently written as

y_{i} | β_{1}, β_{0}, x_{i} \sim N (x_{i} β_{1} + β_{0}, σ^{2}) .

(2)

However, grouping factors, such as populations, species, and regions, exist widely in biological data, which cause the data points not to be truly independent. Thus, a linear mixed model (LMM) is developed to deal with structured data by modeling the relationship of outcomes within groups:

E (y_{i} | β_{1}, β_{0}, x_{i}, b_{i}) = x_{i} β_{1} + β_{0} + b_{i},

(3)

where $b_{i}$ is a mixed-effects term, modeling the potential inner linkage of $y_{1}, \dots, y_{n}$. Compared with an LM, an LMM models the dependency among the $y_{i}$ explicitly, which boosts its power in association studies.

In this work, we extended the traditional univariate LMM by proposing a multitask univariate spatial regression model in the Bayesian framework to accommodate the dependency among phenotypes. For completeness, we used the LM as a comparison. To be specific, let $y_{i} \in R^{p}$ be the vector of $p$ phenotypes of the $i$th individual, and let $x_{i}$ be the observed value of the specified predictor for individual $i$. We start with a simple LM,

y_{i} = x_{i} β_{1} + β_{0} + ϵ_{i}

(4)

where $ϵ_{i} \sim MVN (0, σ^{2} I_{p})$ is the noise term, and $β_{1} = {(β_{11}, \dots, β_{1 p})}^{T}$ and $β_{0} = {(β_{01}, \dots, β_{0 p})}^{T}$, with $β_{1 j}$ and $β_{0 j}$ representing the effect size and intercept term of the predictor for phenotype $j$ ($j = 1, \dots, p$), respectively. The equivalent expression of Eq. (4) is

y_{i} | σ^{2}, β_{1}, β_{0} \sim MVN (x_{i} β_{1} + β_{0}, σ^{2} I_{p}) .

(5)

To accommodate the dependency among phenotypes, we incorporated a mixed-effects term $b_{i}$ in the LM,

y_{i} = x_{i} β_{1} + β_{0} + b_{i} + ϵ_{i},

(6)

where $b_{i} \sim MVN (0, Σ)$, and $Σ$ is an unknown positive definite matrix representing the dependency among phenotypes. However, $b_{i}$ would introduce an identifiability issue in the model inference without further constraints. As $b_{i}$ represents the dependency among phenotypes, for individuals from the same population, it is reasonable to assume that the dependency of all individuals is the same at the population level; thus, Eq. (6) can be adjusted to

y_{i} = x_{i} β_{1} + β_{0} + h + ϵ_{i},

(7)

where $h \sim MVN (0, Σ)$ represents the population-level phenotype dependency, and theoretically, $Σ$ can be any positive definite matrix. To further simplify the structure, in this work, we set $Σ = σ_{p}^{2} G$, where $σ_{p}^{2}$ is unknown, and $G$ is known and can be given by the sample covariance of $y_{1}, \dots, y_{n}$ since $Σ$ represents the group-level phenotype dependency. Further, if $y_{i}$ consists of $p / 2$ paired phenotypes, $G$ can be given by the Kronecker product $A \otimes B$, where $A$ and $B$ represent the correlation within paired QTs and between paired QTs, respectively.
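As an illustration of the Kronecker construction, the sketch below builds a dependency matrix for two pairs of QTs from a within-pair and a between-pair correlation matrix. The numerical correlations are hypothetical and chosen only for illustration; the ordering of phenotypes in the resulting matrix follows NumPy's `np.kron` convention, which is an assumption about how the phenotype vector is arranged.

```python
import numpy as np

# Hypothetical within-pair correlation (e.g., left vs. right hemisphere).
A = np.array([[1.0, 0.6],
              [0.6, 1.0]])
# Hypothetical between-pair correlation (e.g., region 1 vs. region 2).
B = np.array([[1.0, 0.3],
              [0.3, 1.0]])

# With np.kron(A, B), the index of A varies slowest, so the phenotypes are
# ordered (side 1, region 1), (side 1, region 2), (side 2, region 1), ...
G = np.kron(A, B)

# The Kronecker product of positive definite matrices is positive definite;
# its eigenvalues are the pairwise products of the eigenvalues of A and B.
eigvals = np.linalg.eigvalsh(G)

sigma_p2 = 0.2 ** 2          # illustrative value of the scale parameter
Sigma = sigma_p2 * G         # covariance of the population-level term h
```

Because only the scalar $σ_{p}^{2}$ is unknown, $G$ and its inverse need to be computed only once before sampling.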

In the Bayesian framework, for the LMM described in Eq. (7), we used multivariate normal distributions as prior distributions for the parameters $β_{0}, β_{1}$ and Inv-Gamma distributions as priors for $σ_{e}^{2}, σ_{p}^{2}$, where $σ_{e}^{2}$ denotes the noise variance (written $σ^{2}$ in Eq. (7)). The fully Bayesian LMM is then given as below,

\begin{array}{l} y_{i} | β_{1}, β_{0}, h, σ_{e}^{2} \sim MVN (x_{i} β_{1} + β_{0} + h, σ_{e}^{2} I_{p}), \\ β_{1} \sim MVN (0, I_{p}), \\ β_{0} \sim MVN (μ_{0}, I_{p}), \\ h | σ_{p}^{2} \sim MVN (0, σ_{p}^{2} G_{p \times p}), \\ σ_{p}^{2} \sim I G (a, b), \\ σ_{e}^{2} \sim I G (c, d), \end{array}

where a, b, c, d, and

μ_{0}

are hyperparameters. Similarly, the fully Bayesian LM model described in Eq. (4) is given as follows:

\begin{array}{l} y_{i} | β_{1}, β_{0}, σ^{2} \sim MVN (x_{i} β_{1} + β_{0}, σ^{2} I_{p}), \\ β_{1} \sim MVN (0, I_{p}), \\ β_{0} \sim MVN (μ_{0}, I_{p}), \\ σ^{2} \sim I G (c, d) . \end{array}

2.2. MCMC inference

We implemented an MCMC algorithm, i.e., Gibbs sampling in our context, to approximate the posterior of the parameters. The basic idea of MCMC is to generate a Markov chain whose equilibrium distribution is the target distribution. Precisely, with the data denoted as $y$ and the parameters represented by $θ = (θ_{1}, \dots, θ_{d})$, Gibbs sampling approximates the posterior distribution $p (θ | y)$ by iteratively sampling each parameter from its full conditional distribution $p (θ_{i} | θ_{1}, \dots, θ_{i - 1}, θ_{i + 1}, \dots, θ_{d}, y)$. Given the current samples $θ^{(t)} = (θ_{1}^{(t)}, \dots, θ_{d}^{(t)})$ at time $t$, each parameter at time $t + 1$ is drawn in turn from its full conditional distribution, conditioning on the most recent values of the other parameters:

θ_{i}^{(t + 1)} \sim p (θ_{i} | θ_{1}^{(t + 1)}, \dots, θ_{i - 1}^{(t + 1)}, θ_{i + 1}^{(t)}, \dots, θ_{d}^{(t)}, y) .

(8)

Denote the remaining samples after the burn-in stage as $\{θ^{(t)} : t = 1, \dots, m\}$. The posterior distribution of $θ_{i}$ ($i = 1, \dots, d$) can be approximated by the empirical distribution $P (θ_{i}) = \frac{1}{m} \sum_{t = 1}^{m} I (θ_{i} = θ_{i}^{(t)})$, and its point estimate is given by the sample average ${\hat{θ}}_{i} = \frac{1}{m} \sum_{t = 1}^{m} θ_{i}^{(t)}$.
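The posterior summaries described above can be sketched as follows. The draws here are a synthetic stand-in for the retained samples of a single effect size (a real chain would come from the sampler), and the 95% equal-tailed interval is an assumed choice of credible level.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for m = 5000 post-burn-in draws of one effect size.
draws = rng.normal(loc=0.5, scale=0.1, size=5000)

# Point estimate: the sample average of the retained draws.
post_mean = draws.mean()

# Equal-tailed 95% credible interval from the empirical quantiles.
lower, upper = np.percentile(draws, [2.5, 97.5])

# Credible-interval rule: flag the effect if 0 lies outside the interval.
significant = not (lower <= 0.0 <= upper)
```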

2.2.1. Gibbs sampling

The joint posterior distribution for all unknown parameters of our proposed LMM is given by:

\begin{array}{l} p (β_{1}, σ_{e}^{2}, σ_{p}^{2}, h, β_{0} | y_{1}, \dots, y_{n}) \propto \prod_{i = 1}^{n} \left[ | σ_{e}^{2} I_{p} |^{- 1 / 2} \exp \{ - \frac{1}{2 σ_{e}^{2}} ‖ y_{i} - x_{i} β_{1} - h - β_{0} ‖^{2} \} \right] \\ \times \exp \{ - \frac{1}{2} β_{1}^{T} β_{1} \} \\ \times | σ_{p}^{2} G |^{- 1 / 2} \exp \{ - \frac{1}{2 σ_{p}^{2}} h^{T} G^{- 1} h \} \\ \times {(σ_{p}^{2})}^{- a - 1} e^{- \frac{b}{σ_{p}^{2}}} {(σ_{e}^{2})}^{- c - 1} e^{- \frac{d}{σ_{e}^{2}}} \\ \times \exp \{ - \frac{1}{2} {(β_{0} - μ_{0})}^{T} (β_{0} - μ_{0}) \} . \end{array}

(9)

The full conditional distribution of each parameter is derived as follows:

σ_{e}^{2} | rest \sim I G (c + \frac{n p}{2}, \frac{1}{2} \sum_{i = 1}^{n} ‖ y_{i} - x_{i} β_{1} - h - β_{0} ‖^{2} + d),

(10)

σ_{p}^{2} | rest \sim I G (a + \frac{p}{2}, \frac{1}{2} h^{T} G^{- 1} h + b),

(11)

β_{1} | rest \sim MVN (μ_{β_{1}}, Σ_{β_{1}}),

(12)

β_{0} | rest \sim MVN (μ_{β_{0}}, Σ_{β_{0}}),

(13)

h | rest \sim MVN (μ_{h}, Σ_{h}),

(14)

where

μ_{h} = {(\frac{n}{σ_{e}^{2}} I_{p} + \frac{1}{σ_{p}^{2}} G^{- 1})}^{- 1} \frac{\sum_{i} (y_{i} - x_{i} β_{1} - β_{0})}{σ_{e}^{2}},

Σ_{h} = {(\frac{n}{σ_{e}^{2}} I_{p} + \frac{1}{σ_{p}^{2}} G^{- 1})}^{- 1},

μ_{β_{1}} = \frac{\frac{1}{σ_{e}^{2}} \sum_{i} (y_{i} - h - β_{0}) x_{i}}{\frac{1}{σ_{e}^{2}} \sum_{i} x_{i}^{2} + 1},

Σ_{β_{1}} = {(\frac{\sum_{i} x_{i}^{2}}{σ_{e}^{2}} + 1)}^{- 1} I_{p},

μ_{β_{0}} = {(\frac{n}{σ_{e}^{2}} + 1)}^{- 1} (\frac{\sum_{i} (y_{i} - x_{i} β_{1} - h)}{σ_{e}^{2}} + μ_{0}),

and

Σ_{β_{0}} = {(\frac{n}{σ_{e}^{2}} + 1)}^{- 1} I_{p} .

Algorithm 1 depicts the Gibbs sampling algorithm of the LMM. We refer readers to Supplementary Appendix A1 for the derivation details. The Gibbs sampling algorithm for the full Bayesian LM and the derivation details are also given in Supplementary Appendix A1. The implementation of our method is available at https://github.com/ZhibinPU/spatialmultitasklmm.git.
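For readers who want to see the updates in executable form, the full conditionals in Eqs. (10)–(14) can be sketched as a short NumPy sampler for a single SNP. This is a minimal illustration under stated assumptions and is not the released implementation: the function name `gibbs_lmm`, the hyperparameter defaults ($a = b = c = d = 1$, $μ_{0} = 0$), the initial values, and the 0/1/2 genotype coding in the toy example are all hypothetical, and only the draws of $β_{1}$ are stored.

```python
import numpy as np

def gibbs_lmm(y, x, G, n_iter=2000, burn_in=1000,
              a=1.0, b=1.0, c=1.0, d=1.0, mu0=None, seed=0):
    """Gibbs sampler sketch for the LMM; y is (n, p), x is (n,), G is (p, p)."""
    rng = np.random.default_rng(seed)
    n, p = y.shape
    mu0 = np.zeros(p) if mu0 is None else np.asarray(mu0, dtype=float)
    G_inv = np.linalg.inv(G)            # G is known, so invert it once
    sxx = np.sum(x ** 2)
    beta1, beta0, h = np.zeros(p), y.mean(axis=0), np.zeros(p)
    keep = []
    for it in range(n_iter):
        # Eq. (10): IG(shape, scale) drawn as 1 / Gamma(shape, rate=scale).
        resid = y - np.outer(x, beta1) - beta0 - h
        sig_e2 = 1.0 / rng.gamma(c + n * p / 2.0,
                                 1.0 / (d + 0.5 * np.sum(resid ** 2)))
        # Eq. (11)
        sig_p2 = 1.0 / rng.gamma(a + p / 2.0,
                                 1.0 / (b + 0.5 * h @ G_inv @ h))
        # Eq. (12): the covariance is a scalar multiple of the identity.
        var_b1 = 1.0 / (sxx / sig_e2 + 1.0)
        mean_b1 = var_b1 * ((y - h - beta0).T @ x) / sig_e2
        beta1 = mean_b1 + np.sqrt(var_b1) * rng.standard_normal(p)
        # Eq. (13)
        var_b0 = 1.0 / (n / sig_e2 + 1.0)
        mean_b0 = var_b0 * ((y - np.outer(x, beta1) - h).sum(axis=0) / sig_e2 + mu0)
        beta0 = mean_b0 + np.sqrt(var_b0) * rng.standard_normal(p)
        # Eq. (14): draw h through a Cholesky factor of its covariance.
        cov_h = np.linalg.inv(n / sig_e2 * np.eye(p) + G_inv / sig_p2)
        mean_h = cov_h @ ((y - np.outer(x, beta1) - beta0).sum(axis=0) / sig_e2)
        h = mean_h + np.linalg.cholesky(cov_h) @ rng.standard_normal(p)
        if it >= burn_in:
            keep.append(beta1.copy())
    return np.array(keep)

# Toy usage on synthetic data (all values hypothetical).
rng = np.random.default_rng(1)
n, p = 100, 2
G = np.array([[1.0, 0.5], [0.5, 1.0]])
x = rng.binomial(2, 0.3, size=n).astype(float)
beta1_true = np.array([0.5, -0.3])
h_true = rng.multivariate_normal(np.zeros(p), 0.2 ** 2 * G)
y = np.outer(x, beta1_true) + h_true + 0.1 * rng.standard_normal((n, p))
samples = gibbs_lmm(y, x, G, n_iter=600, burn_in=300)
beta1_hat = samples.mean(axis=0)
```

In this sketch the only matrix inversion inside the loop is the $p \times p$ inverse needed for $Σ_{h}$, which is the step that distinguishes the per-iteration cost of the LMM from that of the LM.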

Let $n$ denote the sample size, $m$ the total number of SNPs (not the number of retained MCMC samples in Section 2.2), and $p$ the total number of phenotypes. The computational complexity of a single MCMC iteration was $O (n m p^{2})$ for the LM and $\max \{ O (n m p^{2}), O (m T_{inv}) \}$ for the LMM, where $T_{inv}$ represented the computational complexity of inverting a matrix of size $p \times p$. Tveit (2003) showed that the lower bound of $T_{inv}$ was $O (p^{2} \log (p))$, and the upper bound of $T_{inv}$ was $O (p^{3})$. When $n < \log (p)$, the LMM would be theoretically more expensive than the LM in terms of computational complexity. However, in practice, $p$ (i.e., the number of phenotypes) is often less than $n$ (i.e., the sample size), in which case $T_{inv}$ is at most $O (p^{3}) \le O (n p^{2})$ and the two models have the same order of per-iteration cost.

3. SIMULATION

In this section, we simulated data in different scenarios to evaluate the performance of our method. We generated $d$ independent SNPs of sample size $n$ with the software gG2P, a GWAS simulation tool (Tang and Liu, 2019). Equation (15) was used to generate $p$-dimensional phenotypes with an underlying trait covariance as well as random noise, which were controlled by the parameters $σ_{p}^{2}$, $G$, and $σ_{e}^{2}$.

y_{i} = \sum_{j = 1}^{d} x_{i j} β_{j} + h + ϵ_{i}, i = 1, \dots, n .

(15)

We considered an SNP size of $d = 100$, with each effect size $β_{j}$ ($j = 1, \dots, d$) being a $p$-dimensional random vector that follows a mixture of a Dirac distribution and a normal distribution, $β_{j} \sim π_{1} δ (0) + π_{2} M V N (0, I_{p})$, where $δ (0)$ denotes a point mass at 0, $π_{1} = 0.95$, and $π_{2} = 0.05$. In other words, among the $d$ SNPs, only about 5% had nonzero effects. Besides, the mixed-effects term $h$ and the random noise $ϵ_{i}$ were generated from their prior distributions described in Section 2, with $σ_{e}^{2} = {0.1}^{2}$ and $σ_{p}^{2} = {0.2}^{2}$.
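The data-generating process of Eq. (15) can be sketched as below. The genotypes are drawn here as Binomial(2, maf) counts purely as a stand-in for the dedicated GWAS simulator; the function name `simulate_phenotypes`, the minor allele frequency, and the seed are hypothetical choices.

```python
import numpy as np

def simulate_phenotypes(n, d, G, sigma_e=0.1, sigma_p=0.2,
                        pi_nonzero=0.05, maf=0.3, seed=0):
    """Simulate genotypes X (n, d) and phenotypes Y (n, p) following Eq. (15)."""
    rng = np.random.default_rng(seed)
    p = G.shape[0]
    # Assumption: independent SNPs coded as 0/1/2 minor-allele counts.
    X = rng.binomial(2, maf, size=(n, d)).astype(float)
    # Spike-and-slab effects: a SNP is nonzero with probability pi_nonzero.
    active = rng.random(d) < pi_nonzero
    beta = np.zeros((d, p))
    beta[active] = rng.standard_normal((int(active.sum()), p))
    # Population-level term h: a single draw shared by all individuals.
    h = rng.multivariate_normal(np.zeros(p), sigma_p ** 2 * G)
    # Independent noise for every individual and phenotype.
    eps = sigma_e * rng.standard_normal((n, p))
    Y = X @ beta + h + eps
    return X, Y, beta, active

G = np.array([[1.0, 0.5], [0.5, 1.0]])
X, Y, beta, active = simulate_phenotypes(n=50, d=100, G=G)
```

The boolean vector `active` records which SNPs carry a true effect, which is what the sensitivity and specificity calculations need as ground truth.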

We set the sample size to $n = 50$ (less than the SNP size) and $n = 100$ (equal to the SNP size), respectively. G was set to different values to demonstrate the robustness of our method corresponding to scenarios with independent, weakly, moderately, and strongly dependent phenotypes. We also compared our method with a basic LM with respect to the metric AUC based on credible intervals and CCT p values described in Section 1.

In Cases 1 and 2, we set the number of QTs to $p = 2$ and the dependency matrix $G$ to $[\begin{matrix} 1 & 0.5 \\ 0.5 & 1 \end{matrix}]$ (moderate) and $[\begin{matrix} 1 & 0 \\ 0 & 1 \end{matrix}]$ (zero), respectively. We applied the LMM and LM to the simulated data and repeated the experiment 100 times. We computed the averaged point-wise sensitivity and specificity, resulting in the ROC curves shown in Figure 1 (moderate) and Figure 2 (zero). Figure 1 showed that the LMM outperformed the LM in terms of the AUC, based on both the credible intervals and the aggregated p values, when there was a moderate dependency among phenotypes. When there was no spatial dependency among QTs, as shown in Figure 2, the performance of both models was similar, as the LMM gained no spatial information in this case compared with the LM.
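As a rough illustration of how an ROC curve and its AUC can be obtained from per-SNP evidence, the sketch below ranks SNPs by a hypothetical score (for instance, one minus a p value) and integrates the resulting curve with the trapezoidal rule. The function name `roc_auc`, the scores, and the labels are illustrative assumptions rather than the evaluation code behind the figures, and tied scores are not treated specially.

```python
import numpy as np

def roc_auc(scores, labels):
    """Return (fpr, tpr, auc) for scores where larger means stronger evidence."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    order = np.argsort(-scores)                 # strongest evidence first
    hits = labels[order]
    # Sensitivity and 1 - specificity as the threshold is lowered.
    tpr = np.concatenate([[0.0], np.cumsum(hits) / labels.sum()])
    fpr = np.concatenate([[0.0], np.cumsum(~hits) / (~labels).sum()])
    # Trapezoidal rule written out explicitly.
    auc = np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0)
    return fpr, tpr, auc

# Five hypothetical SNPs, two of which have a true nonzero effect.
fpr, tpr, auc = roc_auc([0.9, 0.8, 0.7, 0.6, 0.5], [1, 0, 1, 0, 0])
```

In this toy example, five of the six (true, null) SNP pairs are ranked correctly, so the AUC equals 5/6.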

FIG. 1.

Case 1. ROC curves of LMM and LM based on credible intervals (a and c) and the CCT (b and d) when there was a moderate dependency among phenotypes, for varying sample sizes. Curves of QT₁ and QT₂ represent the ROC curves for QT₁ and QT₂. The figures in brackets indicate the corresponding AUC for each curve. AUC, area under the curve; CCT, Cauchy combination test; LM, linear model; LMM, linear mixed model; QT, quantitative traits; ROC, receiver operating characteristics.

FIG. 2.

Case 2. ROC curves of LMM and LM based on credible intervals (a and c) and the CCT (b and d) when there was no dependence among phenotypes, for varying sample sizes. Curves of QT₁ and QT₂ represent the ROC curves for QT₁ and QT₂. The figures in brackets indicate the corresponding AUC for each curve.

In Case 3, we simulated a more sophisticated scenario. The phenotypes could be divided into three pairs, and we used a Kronecker product to represent the dependence among these six phenotypes. The dependency matrix $G$ was set to $[\begin{matrix} 1 & 0.6 \\ 0.6 & 1 \end{matrix}] \otimes [\begin{matrix} 1 & 0.1 & 0.4 \\ 0.1 & 0.6 & 0.1 \\ 0.4 & 0.1 & 0.5 \end{matrix}]$. We repeated this experiment 100 times, computed the averaged point-wise sensitivity and specificity, and obtained the ROC curves shown in Figure 3. As indicated in Figure 3, the LMM outperformed the LM with respect to the AUC when complex dependency structures among QTs were present.

FIG. 3.

Case 3. ROC curves of both the LMM and the LM based on credible intervals (a and c) and the CCT (b and d) when there was a complex dependency among phenotypes, for varying sample sizes. Curves of QT₁, …, QT₆ represent the ROC curves for QT₁, …, QT₆. The figures in brackets indicate the corresponding AUC for each curve.

In Cases 4 and 5, we set the number of QTs to $p = 2$ and the dependency matrix $G$ to $[\begin{matrix} 1 & 0.2 \\ 0.2 & 1 \end{matrix}]$ (weak) and $[\begin{matrix} 1 & 0.8 \\ 0.8 & 1 \end{matrix}]$ (strong), respectively. In these two cases, we repeated each experiment 100 times and recorded the AUC of both the LMM and the LM in each experiment.

The AUC results were shown in violin plots in Figure 4, indicating that the LMM performed better than the LM with respect to the AUC. Tables 1 and 2 summarized the mean and standard deviation of the AUCs. We also performed the Wilcoxon Rank Sum and Signed Rank tests to assess whether the AUC differed significantly between the LMM and the LM, and the results were reported in Tables 3 and 4. As Tables 3 and 4 show, the LMM performed significantly better than the LM with respect to the AUC based on both the CCT and the credible intervals. The computation time for each setting of the simulations was reported in Supplementary Table S1 in Supplementary Appendix A2.

FIG. 4.

Cases 4 and 5. Violin plots of averaged AUCs over QTs when correlations among phenotypes were weak (a and b) and strong (c and d) with sample size $n = 100$ .

Table 1.

Mean and Standard Deviation of Area Under the Curve Values of the 100 Repeated Experiments for Both the Linear Model and Linear Mixed Model Based on the Credible Interval

	LMM			LM
	$Q T_{1}$	$Q T_{2}$	$Q T_{avg}$	$Q T_{1}$	$Q T_{2}$	$Q T_{avg}$
Case 4
Mean	0.84	0.84	0.86	0.80	0.83	0.84
SD	0.11	0.11	0.11	0.11	0.10	0.11
Case 5
Mean	0.85	0.85	0.86	0.78	0.81	0.78
SD	0.10	0.11	0.11	0.11	0.12	0.12

Values of $Q T_{1}, Q T_{2}$ , and $Q T_{avg}$ correspond to the values of averaged AUC over 100 experiments concerning QT₁, QT₂, and their average.

AUC, area under the curve; LM, linear model; LMM, linear mixed model; QT, quantitative traits.

Table 2.

Mean and Standard Deviation of Area Under the Curve Values of the 100 Repeated Experiments for Both the Linear Model and Linear Mixed Model Based on the Cauchy Combination Test

	LMM			LM
	$Q T_{1}$	$Q T_{2}$	$Q T_{avg}$	$Q T_{1}$	$Q T_{2}$	$Q T_{avg}$
Case 4
Mean	0.84	0.84	0.86	0.80	0.82	0.84
SD	0.11	0.11	0.10	0.09	0.10	0.10
Case 5
Mean	0.85	0.85	0.86	0.80	0.81	0.83
SD	0.10	0.10	0.11	0.10	0.11	0.10

Values of $Q T_{1}, Q T_{2}$ , and $Q T_{avg}$ correspond to the values of averaged AUC over 100 experiments concerning QT₁, QT₂, and their average.

Table 3.

P Values of the Wilcoxon Rank Sum and Signed Rank Test of Area Under the Curve of the Linear Model Versus Linear Mixed Model Across the 100 Repeated Experiments Based on the Credible Interval

	$Q T_{1}$	$Q T_{2}$	$Q T_{avg}$
Case 4	$4.46 \times 10^{- 5}$	$3.00 \times 10^{- 2}$	$9.20 \times 10^{- 9}$
Case 5	$2.21 \times 10^{- 9}$	$1.37 \times 10^{- 5}$	$4.27 \times 10^{- 9}$

The alternative hypothesis was that the AUC of the LMM was larger than that of the LM. P values of $Q T_{1}, Q T_{2}$ , and $Q T_{avg}$ correspond to the AUC of QT₁, QT₂, and their average.

Table 4.

P Values of the Wilcoxon Rank Sum and Signed Rank Test of Area Under the Curve of the Linear Model Versus Linear Mixed Model Across the 100 Repeated Experiments Based on the Cauchy Combination Test

	$Q T_{1}$	$Q T_{2}$	$Q T_{avg}$
Case 4	$5.51 \times 10^{- 6}$	$1.98 \times 10^{- 2}$	$1.38 \times 10^{- 2}$
Case 5	$3.20 \times 10^{- 9}$	$8.76 \times 10^{- 7}$	$3.12 \times 10^{- 5}$

In addition, we explored the performance of the LMM and LM when the true underlying distributions of the mixed-effects term and error term were not normal but heavy tailed, i.e., followed a multivariate t-distribution. We repeated the data simulation process for Cases 1 to 5 according to Eq. (15), and all the data-generating procedures were the same as before except that

ϵ_{i} \sim Multi - t (v, σ_{e}^{2} I_{p}),

h \sim Multi - t (v, σ_{p}^{2} G) .

Here, we set the degrees-of-freedom parameter $v = 3$ . The results were shown in Supplementary Figures S1–S5 in Supplementary Appendix A3. We did not observe a significant decrease in performance regarding the AUC metric when the model was mis-specified for the LMM and LM, compared with the correctly specified cases. Again, improved performance of the LMM over the LM in terms of the AUC based on both the credible intervals and aggregated p values was observed.
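One standard way to draw such heavy-tailed terms is to divide a multivariate normal draw by the square root of an independent scaled chi-square variate, as sketched below. The function name `sample_multivariate_t` is hypothetical, and interpreting the second argument of Multi-t as a scale matrix with zero location is an assumption.

```python
import numpy as np

def sample_multivariate_t(df, scale, size, rng):
    """Draw `size` vectors from a zero-location multivariate t distribution."""
    p = scale.shape[0]
    z = rng.multivariate_normal(np.zeros(p), scale, size=size)   # (size, p)
    w = rng.chisquare(df, size=size) / df                        # (size,)
    # Dividing by sqrt(w) inflates the tails relative to the normal draw.
    return z / np.sqrt(w)[:, None]

rng = np.random.default_rng(0)
v = 3
G = np.array([[1.0, 0.5], [0.5, 1.0]])
eps = sample_multivariate_t(v, 0.1 ** 2 * np.eye(2), size=100, rng=rng)
h = sample_multivariate_t(v, 0.2 ** 2 * G, size=1, rng=rng)[0]
```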

Moreover, in Case 6, we considered the scenario where the covariance matrix was specified by an adjacency matrix. In this case, $G$ was set to $A + ρ I_{p}$ , where the first term $A = [\begin{matrix} 1 & 1 & 0 & 0 & 0 & 0 \\ 1 & 1 & 1 & 1 & 0 & 0 \\ 0 & 1 & 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 0 & 1 & 1 \end{matrix}]$ was an adjacency matrix, with element 1 indicating the existence of dependency between phenotypes. $ρ$ was a small shift added to the diagonal (satisfying $ρ > - λ$ for every eigenvalue $λ$ of $A$; we set $ρ = 0.8$) to ensure that $G$ was positive definite (Lehmann et al., 2021). We repeated this experiment 100 times, computed the averaged point-wise sensitivity and specificity, and obtained the ROC curves shown in Figure 5, which indicated that the LMM outperformed the LM with respect to the AUC when neighborhood structures among QTs were present.
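The positive definiteness condition on $ρ$ can be checked numerically for the matrix $A$ given above; the sketch below is only a verification of that condition, with variable names chosen for illustration.

```python
import numpy as np

A = np.array([[1, 1, 0, 0, 0, 0],
              [1, 1, 1, 1, 0, 0],
              [0, 1, 1, 0, 0, 0],
              [0, 1, 0, 1, 0, 0],
              [0, 0, 0, 0, 1, 1],
              [0, 0, 0, 0, 1, 1]], dtype=float)
rho = 0.8

# Smallest eigenvalue of A; it equals 1 - sqrt(3), about -0.732.
lam_min = np.linalg.eigvalsh(A).min()

# Adding rho to the diagonal shifts every eigenvalue up by rho.
G = A + rho * np.eye(A.shape[0])
g_min = np.linalg.eigvalsh(G).min()
```

Since the smallest eigenvalue of $A$ is about $-0.732$, any $ρ > 0.732$ suffices, and $ρ = 0.8$ leaves $G$ with a smallest eigenvalue of about 0.068.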

FIG. 5.

Case 6. ROC curves of the LMM and LM based on credible intervals (a and c) and the CCT (b and d) when the covariance between phenotypes was specified by an adjacency matrix, with element 1 indicating the existence of dependence. Curves of QT₁ and QT₂ represented the ROC curves for QT₁ and QT₂. The figures in brackets indicated the corresponding AUC values for each curve.

In summary, the above simulation studies showed that LMM performed better than the LM concerning metric AUC when there was a dependency structure among the phenotypes, and their performance was similar when there was no spatial correlation among QTs. In addition, Figures 1–5 and Tables 1–4 indicated that the association test results based on CCT and the credible intervals were consistent.

4. APPLICATION

We applied our proposed method to an imaging genetics dataset collected from the ADNI-1 database.¹ The imaging genetic data contained 632 individuals. After quality control and imputation, we included 486 SNPs from 33 of the top 40 Alzheimer’s disease candidate genes listed on the AlzGene database as of June 10, 2010 (Song et al., 2022). The imaging-derived QTs included in this application study are thickness of the supramarginal gyrus (Supramarg) and the superior temporal gyrus (SupTemporal) on both the left and right hemispheres. The average correlation between Supramarg and SupTemporal was used to describe the relationship between pairs, and the average correlation between the left and right Supramarg and SupTemporal was used to represent the correlation within each pair. Their Kronecker product was used to describe the dependency among these two QT pairs. To mimic the scenario where the number of SNPs was greater than the sample size, we randomly selected 100 individuals from the 632 individuals. The total computational time for the real data application was 1.08 hours with 5000 MCMC iterations, running with a single core (3.20-GHz AMD Ryzen 7 7735H) on a computing cluster with 16 GB of RAM.

Table 5 listed the 19 SNPs selected by the proposed Bayesian spatial LMM based on the CCT at $α = 0.05$ with a Bonferroni correction. The SNP ID in bold indicated that it was also reported in existing literature, and the corresponding references were listed in the fourth column in Table 5. For example, the genetic marker rs16871157 was also identified in related studies, such as Song et al. (2022), Kundu and Kang (2016), and Choi et al. (2019). Our model also identified several new genetic sites not reported in previous studies, such as rs212515 and rs1251753, which may provide new insights into Alzheimer’s disease. Among the four phenotypes we used, the SupTemporal (left and right) is mainly involved in the production, interpretation, and self-monitoring of language, while dysfunction of this region might cause auditory hallucinations and thought disorder (Sun et al., 2009). The Supramarg (left and right) plays an active role in phonological processing during both language and verbal working memory tasks (Deschamps et al., 2014). Among these 19 SNPs, rs2025935, located in complement C3b/C4b receptor 1 (CR1), was the key significant molecular factor to modulate tau pathology and cause reductions in cortical thickness of the superior temporal gyrus. Thus, it was recognized as an important risk locus for late-onset Alzheimer’s disease (Chibnik et al., 2011; Hazrati et al., 2012; Zhu et al., 2015). Death associated protein kinase 1 (DAPK1) was detected as a significant mediator of cell death and synaptic damage in the central nervous system; thus, it was related to Alzheimer’s disease (Li et al., 2006; Hazrati et al., 2012; Chen et al., 2019). The detected SNPs rs1014306, rs10780849, rs1105384, and rs1473180 indicated that DAPK1 could affect brain regions in the right hemisphere more significantly than those in the left.
Besides, SNPs we found located in endothelin converting enzyme 1 (ECE1), like rs212515, rs213023, rs213025, and rs471359, showed significant effect on the left SupTemporal and Supramarg, indicating that these mutations might play a more important role in the left brain hemisphere than the right. This indication in term of spatial differences across the hemispheres was also mentioned in Hoshi et al. (2023). There was little evidence supporting the relatedness between these genetic mutations and Alzheimer’s disease in literature, and these new findings might warrant further investigation. The protein encoded by neural precursor cell expressed, developmentally down-regulated 9 (NEDD9) was identified as one of the signaling proteins in Alzheimer’s disease (Xing et al., 2011; Beck et al., 2014), and sortilin related VPS10 domain containing receptor 1 (SORCS1) was found genetically associated with hippocampal volume or gray matter density changes accounting for apolipoprotein E (APOE) (Xu et al., 2013). As shown by the results, rs3739784 in gene DAPK1, rs12758257 in gene NEDD9, and rs10787010 in gene SORCS1 were significantly correlated with the SupTemporal phenotype (left and right). In addition, it suggested nonsignificant differences in the genetic effects across hemispheres for these mutations.
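The selection rule used for Table 5 (CCT followed by a Bonferroni correction at $α = 0.05$) can be sketched as follows. The CCT formula follows Liu and Xie (2020); how the p-values are obtained from the MCMC samples is not reproduced here, so the input p-values, the function names, and the equal default weights are assumptions made purely for illustration.

```python
import numpy as np

def cauchy_combination(pvalues, weights=None):
    """Combine p-values with the Cauchy combination test (Liu and Xie, 2020)."""
    p = np.asarray(pvalues, dtype=float)
    # Guard against tan() overflow at exactly 0 or 1 (an implementation choice).
    p = np.clip(p, 1e-15, 1.0 - 1e-15)
    if weights is None:
        w = np.full(p.shape, 1.0 / p.size)  # assumed equal weights
    else:
        w = np.asarray(weights, dtype=float)
        w = w / w.sum()
    # Test statistic: weighted sum of Cauchy-transformed p-values.
    t = np.sum(w * np.tan((0.5 - p) * np.pi))
    # Analytic p-value from the standard Cauchy tail.
    return 0.5 - np.arctan(t) / np.pi

def bonferroni_select(pvalues, alpha=0.05, n_tests=None):
    """Return a boolean mask of p-values below alpha / n_tests."""
    p = np.asarray(pvalues, dtype=float)
    n = p.size if n_tests is None else n_tests
    return p < alpha / n

# Hypothetical p-values of one SNP across the four QTs.
combined = cauchy_combination([0.001, 0.02, 0.3, 0.6])
# With 486 SNPs, the Bonferroni threshold is 0.05 / 486.
selected = bonferroni_select([combined], alpha=0.05, n_tests=486)
print(combined, selected)
```

The analytic form of the combined p-value is what makes the CCT attractive here: no permutation or resampling is needed, even when the individual p-values are dependent.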

Table 5.
Alzheimer's Disease Neuroimaging Initiative Study: Selected Single Nucleotide Polymorphisms and the Corresponding Regions of Interest

SNP	Gene	Phenotype (hemisphere)	Reference
rs2025935	CR1	SupTemporal (right)	Greenlaw et al., 2017; Zhu et al., 2017; Song et al., 2022
rs1014306	DAPK1	Supramarg (right)	Phillips, 2013; Asensio et al., 2022
rs10780849	DAPK1	SupTemporal (right)	Phillips, 2013; Song et al., 2022
rs1105384	DAPK1	SupTemporal (right)	Kundu and Kang, 2016; Song et al., 2022
rs12378686	DAPK1	Supramarg (right)	Watza et al., 2020; Choi et al., 2019
rs1473180	DAPK1	SupTemporal (left)	Greenlaw et al., 2017; Song et al., 2022
rs1558889	DAPK1	Supramarg (left)	Laumet et al., 2010
rs3028	DAPK1	SupTemporal (right)	—
rs3739784	DAPK1	SupTemporal (left, right)	Choi et al., 2019; Beaulac et al., 2023
rs12758257	ECE1	SupTemporal (left, right)	Beaulac et al., 2023
rs212515	ECE1	SupTemporal (left)	—
rs213023	ECE1	SupTemporal (left)	—
rs213025	ECE1	SupTemporal (left)	—
rs471359	ECE1	Supramarg (left)	—
rs10947021	NEDD9	SupTemporal (right)	Laumet et al., 2010
rs16871157	NEDD9	Supramarg (left)	Kundu and Kang, 2016; Choi et al., 2019; Song et al., 2022
rs6912916	NEDD9	SupTemporal (right)	Laumet et al., 2010
rs10787010	SORCS1	SupTemporal (left, right)	Greenlaw et al., 2017; Song et al., 2022
rs1251753	SORCS1	Supramarg (right)	—

SNP, single nucleotide polymorphism.

5. DISCUSSION

We proposed a spatial-correlated multitask LMM to uncover potential gene markers associated with multiple correlated imaging QTs simultaneously in the Bayesian framework. The mixed-effects term accounted for the population-level dependency among QTs, enabling the model to use the spatial information among QTs and thereby boost its statistical power in association studies. In this work, we introduced the population-level dependency among QTs to avoid the identifiability issue usually triggered by an individual-level mixed-effects term. This might sacrifice some spatial information at the individual level, and how to model the individual-level dependency properly will be one of the directions of our future work. We considered three forms of covariance to depict the population-level dependency: a vanilla form represented by a positive definite matrix; a Kronecker product form accounting for correlations between phenotypes both within and across pairs; and an adjacency matrix that uses only the elements 0 and 1 to indicate whether a pair of phenotypes is correlated. The Gaussian kernel is also popular in covariance construction and can be considered in future work. Besides the Gaussian assumption for random errors, we also simulated phenotypes from a multivariate t-distribution to represent scenarios where the distribution of phenotypes was misspecified. The results indicated that, even when the model was misspecified, the performance of the LMM in terms of AUC was comparable with that under correct specification, and the LMM outperformed the LM. In addition, applying a transformation to the phenotypes is a common practice; for example, a log transformation maps positive-valued QTs to $(- \infty, \infty)$. However, the resulting $\log (QTs)$ might violate the Gaussian assumption heavily. How to model the transformed phenotypes accurately is a challenging task. A possible approach might be a generalization of the LMM that infers the transformation function directly from the data (Fusi et al., 2014).
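Two of the ingredients discussed above, a Gaussian-kernel covariance and heavy-tailed errors drawn from a multivariate t-distribution, can be sketched together. This is an illustrative sketch only: the region coordinates, the length scale, the degrees of freedom, and the function names are hypothetical and do not correspond to the settings used in the simulation studies.

```python
import numpy as np

def gaussian_kernel_cov(coords, length_scale):
    """Covariance K[i, j] = exp(-||c_i - c_j||^2 / (2 * length_scale^2))."""
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)
    return np.exp(-sq_dist / (2.0 * length_scale ** 2))

def sample_multivariate_t(scale, df, n, rng):
    """Draw n samples from a zero-mean multivariate t with the given scale matrix."""
    m = scale.shape[0]
    # Gaussian draws with the target dependency structure.
    z = rng.multivariate_normal(np.zeros(m), scale, size=n)
    # Dividing by sqrt(chi2 / df) gives the heavier t tails.
    g = rng.chisquare(df, size=n) / df
    return z / np.sqrt(g)[:, None]

# Hypothetical 3D centroids for four regions (two regions, two hemispheres).
coords = np.array([
    [-50.0, -40.0, 30.0],
    [50.0, -40.0, 30.0],
    [-55.0, -20.0, 5.0],
    [55.0, -20.0, 5.0],
])
K = gaussian_kernel_cov(coords, length_scale=60.0)

rng = np.random.default_rng(0)
errors = sample_multivariate_t(K, df=4, n=100, rng=rng)
print(K.round(3))
print(errors.shape)
```

Unlike a 0/1 adjacency matrix, the Gaussian kernel lets the dependency decay smoothly with distance, at the cost of one extra tuning parameter (the length scale).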

In addition to the standard LM, we also compared the LMM with the Bayesian group sparse multi-task regression (BGSMTR, Song et al., 2022). The BGSMTR was an innovative approach that explicitly modeled phenotypic correlations both within and across cerebral hemispheres using a bivariate conditional autoregressive (CAR) process. This allowed the spatial dependencies of imaging QTs to be considered. In addition, the BGSMTR tested grouped SNPs by encouraging sparsity between and within SNP groupings. While this simultaneous estimation could enhance statistical power when SNPs are strongly correlated (e.g., exhibit strong linkage disequilibrium within groups; Greenlaw et al., 2017), it could also incur significant computational overhead. We conducted a direct comparison with the BGSMTR in Case 3, and the results are given in Supplementary Appendix A4. The results (Supplementary Fig. S6 in Supplementary Appendix A4) indicated that the LMM outperformed the BGSMTR in terms of AUC when the SNPs were independently simulated. Meanwhile, compared with the LM, significant performance improvements were observed for both the LMM and the BGSMTR, as the LM ignores the intertrait dependencies. The computational complexity analysis (see Supplementary Table S2 in Supplementary Appendix A4) shows that, when $m n < p$ , the two methods have comparable theoretical costs. However, in real-world imaging-genetic studies, where $p ≪ m n$ in general, the LMM would be considerably more efficient. This was empirically validated in Case 3 (Supplementary Fig. S7 in Supplementary Appendix A4). In conclusion, the BGSMTR represented a valuable advanced approach to modeling structured genetic effects and spatial phenotype dependencies, whereas our LMM provided a balanced trade-off between statistical performance and computational feasibility for high-dimensional imaging-genetic applications.

Variational Bayes (VB) is also a widely used technique in Bayesian inference and often serves as an alternative to MCMC sampling methods (Tran et al., 2021). In future work, we will consider incorporating VB into the model inference to improve the computational efficiency.

AUTHORS’ CONTRIBUTIONS

Z.P.: Conceptualization, data curation, formal analysis, methodology, software, writing—original draft. S.G.: Conceptualization, data curation, formal analysis, methodology, software, project administration, supervision, writing—review and editing. All authors read and approved the final article.

ACKNOWLEDGMENTS

Data used in preparation of this article were obtained from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database (adni.loni.usc.edu). As such, the investigators within the ADNI contributed to the design and implementation of ADNI and/or provided data but did not participate in analysis or writing of this report. A complete listing of ADNI investigators can be found at .

Data collection and sharing for the ADNI is funded by the National Institute on Aging (National Institutes of Health Grant U19 AG024904). The grantee organization is the Northern California Institute for Research and Education. In the past, ADNI has also received funding from the National Institute of Biomedical Imaging and Bioengineering, the Canadian Institutes of Health Research, and private sector contributions through the Foundation for the National Institutes of Health (FNIH) including generous contributions from the following: AbbVie, Alzheimer’s Association; Alzheimer’s Drug Discovery Foundation; Araclon Biotech; BioClinica, Inc.; Biogen; Bristol-Myers Squibb Company; CereSpir, Inc.; Cogstate; Eisai Inc.; Elan Pharmaceuticals, Inc.; Eli Lilly and Company; EuroImmun; F. Hoffmann-La Roche Ltd. and its affiliated company Genentech, Inc.; Fujirebio; GE Healthcare; IXICO Ltd.; Janssen Alzheimer Immunotherapy Research & Development, LLC.; Johnson & Johnson Pharmaceutical Research & Development LLC.; Lumosity; Lundbeck; Merck & Co., Inc.; Meso Scale Diagnostics, LLC.; NeuroRx Research; Neurotrack Technologies; Novartis Pharmaceuticals Corporation; Pfizer Inc.; Piramal Imaging; Servier; Takeda Pharmaceutical Company; and Transition Therapeutics.

The authors thank ShanghaiTech University for supporting this work through the startup fund and the HPC Platform.

AUTHOR DISCLOSURE STATEMENT

No competing financial interests exist.

FUNDING INFORMATION

This project was supported by the Shanghai Science and Technology Program (No. 21010502500), the National Natural Science Foundation of China (12401383), the startup fund of ShanghaiTech University, and the HPC Platform of ShanghaiTech University.

Supplemental Material

References

1. Asensio, Ortega-Azorín, Barragán, et al. Association between microbiome-related human genetic variants and fasting plasma glucose in a high-cardiovascular-risk Mediterranean population. Medicina, 2022; 58(9):1238.

2. Beaulac, Wu, Gibson, et al. Neuroimaging feature extraction using a neural network classifier for imaging genetics. BMC Bioinformatics, 2023; 24(1):271.

3. Beck, Nicolas, Kopp, et al. Adaptors for disorders of the brain? The cancer signaling proteins NEDD9, CASS4, and PTK2B in Alzheimer’s disease. Oncoscience, 2014; 1(7):486–503.

4. Cao, Wang, Zhang, et al. Gene-based association tests using GWAS summary statistics and incorporating eQTL. Sci Rep, 2022; 12(1):3553.

5. Chen, Zhou, Lee. Death-associated protein kinase 1 as a promising drug target in cancer and Alzheimer’s disease. Recent Pat Anticancer Drug Discov, 2019; 14(2):144–157.

6. Chibnik, Shulman, Leurgans, et al. CR1 is associated with amyloid plaque burden and age-related cognitive decline. Ann Neurol, 2011; 69(3):560–569.

7. Choi, Lu, Beg, et al.; for the Alzheimer’s Disease Neuroimaging Initiative (ADNI). The contribution plot: Decomposition and graphical display of the RV coefficient, with application to genetic and brain imaging biomarkers of Alzheimer’s disease. Hum Hered, 2019; 84(2):59–72.

8. Dahl, Iotchkova, Baud, et al. A multiple-phenotype imputation method for genetic studies. Nat Genet, 2016; 48(4):466–472.

9. Deschamps, Baum, Gracco. On the role of the supramarginal gyrus in phonological processing and verbal working memory: Evidence from rTMS studies. Neuropsychologia, 2014; 53:39–46.

10. Eberly, Casella. Estimating Bayesian credible intervals. J Stat Plan Inference, 2003; 112(1–2):115–132.

11. Elliott, Sharp, Alfaro-Almagro, et al. Genome-wide association studies of brain imaging phenotypes in UK Biobank. Nature, 2018; 562(7726):210–216.

12. Fusi, Lippert, Lawrence, et al. Warped linear mixed models for the genetic analysis of transformed phenotypes. Nat Commun, 2014; 5(1):4890.

13. Gerber, Peterson, Muñoz, et al. Imaging genetics. J Am Acad Child Adolesc Psychiatry, 2009; 48(4):356–361.

14. Greenlaw, Szefer, Graham, et al.; Alzheimer’s Disease Neuroimaging Initiative. A Bayesian group sparse multi-task regression model for imaging genetics. Bioinformatics, 2017; 33(16):2513–2522.

15. Hazrati L-N, Van Cauwenberghe, Brooks, et al. Genetic association of CR1 with Alzheimer’s disease: A tentative disease mechanism. Neurobiol Aging, 2012; 33(12):2949.e5–2949.e12.

16. Hespanhol, Vallio, Costa, et al. Understanding and interpreting confidence and credible intervals around effect estimates. Braz J Phys Ther, 2019; 23(4):290–301.

17. Hibar, Stein, Kohannim, et al.; Alzheimer’s Disease Neuroimaging Initiative. Voxelwise gene-wide association study (vGeneWAS): Multivariate gene-based association testing in 731 elderly subjects. Neuroimage, 2011; 56(4):1875–1891.

18. Hoshi, Kobayashi, Hirata, et al. Decreased beta-band activity in left supramarginal gyrus reflects cognitive decline: Evidence from a large clinical dataset in patients with dementia. Hum Brain Mapp, 2023; 44(17):6214–6226.

19. Huang, Ling. Using AUC and accuracy in evaluating learning algorithms. IEEE Trans Knowl Data Eng, 2005; 17(3):299–310.

20. Huang, Nichols, Huang, et al.; Alzheimer’s Disease Neuroimaging Initiative. FVGWAS: Fast voxelwise genome wide association analysis of large-scale imaging genetic data. Neuroimage, 2015; 118:613–627.

21. Kong, An, Zhang, et al. L2RM: Low-rank linear regression models for high-dimensional matrix responses. J Am Stat Assoc, 2020; 115(529):403–424.

22. Kundu, Kang. Semiparametric Bayes conditional graphical models for imaging genetics applications. Stat (Int Stat Inst), 2016; 5(1):322–337.

23. Laumet, Chouraki, Grenier-Boley, et al. Systematic analysis of candidate genes for Alzheimer’s disease in a French, genome-wide association study. J Alzheimers Dis, 2010; 20(4):1181–1188.

24. Lehmann, Henson, Geerligs, et al. Characterising group-level brain connectivity: A framework using Bayesian exponential random graph models. Neuroimage, 2021; 225:117480.

25. Lencz, Szeszko, DeRosse, et al. A schizophrenia risk gene, ZNF804A, influences neuroanatomical and neurocognitive phenotypes. Neuropsychopharmacology, 2010; 35(11):2284–2291.

26. Li, Grupe, Rowland, et al. DAPK1 variants are associated with Alzheimer’s disease and allele-specific expression. Hum Mol Genet, 2006; 15(17):2560–2568.

27. Ling, Huang, Zhang. AUC: A better measure than accuracy in comparing learning algorithms. In Advances in Artificial Intelligence: 16th Conference of the Canadian Society for Computational Studies of Intelligence, AI 2003, Halifax, Canada, June 11–13, 2003. Springer; 2003, pp. 329–341.

28. Little, Rubin. Statistical analysis with missing data. John Wiley & Sons; 1987.

29. Liu, Xie. Cauchy combination test: A powerful test with analytic p-value calculation under arbitrary dependency structures. J Am Stat Assoc, 2020; 115(529):393–402.

30. Liu, Liu, Lin. Ensemble methods for testing a global null. J R Stat Soc Series B Stat Methodol, 2024; 86(2):461–486.

31. Lu, Ye, Hill. Analysis of regression confidence intervals and Bayesian credible intervals for uncertainty quantification. Water Resour Res, 2012; 48(9).

32. Phillips. A longitudinal study of changes in the mtDNA of Alzheimer’s disease patients. PhD thesis, University of North Texas Health Science Center: Fort Worth; 2013.

33. Song, Ge, Cao, et al. A Bayesian spatial model for imaging genetics. Biometrics, 2022; 78(2):742–753.

34. Stein, Hua, Lee, et al.; Alzheimer’s Disease Neuroimaging Initiative. Voxelwise genome-wide association study (vGWAS). Neuroimage, 2010; 53(3):1160–1174.

35. Sun, Maller, Guo, et al. Superior temporal gyrus volume change in schizophrenia: A review on region of interest volumetric studies. Brain Res Rev, 2009; 61(1):14–32.

36. Tang, Liu. G2P: A genome-wide-association-study simulation tool for genotype simulation, phenotype simulation and power evaluation. Bioinformatics, 2019; 35(19):3852–3854.

37. Tran M-N, Nguyen T-N, Dao V-H. A practical tutorial on variational Bayes. arXiv preprint arXiv:2103.01327, 2021.

38. Tveit. On the complexity of matrix inversion. Mathematical Note, 2003; 1.

39. Vounou, Nichols, Montana; Alzheimer’s Disease Neuroimaging Initiative. Discovering genetic associations with high-dimensional neuroimaging phenotypes: A sparse reduced-rank regression approach. Neuroimage, 2010; 53(3):1147–1159.

40. Wang, Martins-Bach, Alfaro-Almagro, et al. Phenotypic and genetic associations of quantitative magnetic susceptibility in UK Biobank brain imaging. Nat Neurosci, 2022; 25(6):818–831.

41. Wang, Nie, Huang, et al.; for the Alzheimer’s Disease Neuroimaging Initiative. Identifying quantitative trait loci via group-sparse multitask regression and feature selection: An imaging genetics study of the ADNI cohort. Bioinformatics, 2012; 28(2):229–237.

42. Watza, Lusk, Dyson, et al. COPD-dependent effects of genetic variation in key inflammation pathway genes on lung cancer risk. Int J Cancer, 2020; 147(3):747–756.

43. Wu, Lee, Cai, et al. Rare-variant association testing for sequencing data with the sequence kernel association test. Am J Hum Genet, 2011; 89(1):82–93.

44. Xing Y-Y, Yu J-T, Yan W-J, et al. NEDD9 is genetically associated with Alzheimer’s disease in a Han Chinese population. Brain Res, 2011; 1369:230–234.

45. Xu, Xu, Wang, et al. The genetic variation of SORCS1 is associated with late-onset Alzheimer’s disease in Chinese Han population. PLoS One, 2013; 8(5):e63621.

46. Yuan, Lin. Model selection and estimation in regression with grouped variables. J R Stat Soc Series B Stat Methodol, 2006; 68(1):49–67.

47. Zhu, Khondker, Lu, et al. Bayesian generalized low rank regression models for neuroimaging phenotypes and genetic markers. J Am Stat Assoc, 2014; 109(507):977–990.

48. Zhu X-C, Yu J-T, Jiang, et al. CR1 in Alzheimer’s disease. Mol Neurobiol, 2015; 51(2):753–765.

49. Zhu X-C, Wang H-F, Jiang, et al.; Alzheimer’s Disease Neuroimaging Initiative. Effect of CR1 genetic variants on cerebrospinal fluid and neuroimaging biomarkers in healthy, mild cognitive impairment and Alzheimer’s disease cohorts. Mol Neurobiol, 2017; 54(1):551–562.