Abstract
Resting-state functional magnetic resonance imaging allows one to study brain functional connectivity, partly motivated by evidence that patients with complex disorders, such as Alzheimer's disease, may have altered functional brain connectivity patterns as compared with healthy subjects. A functional connectivity network describes statistical associations of the neural activities among distinct and distant brain regions. Recently, there has been major interest in group-level functional network analysis; however, there is a relative lack of studies on statistical inference, such as significance testing for group comparisons. In particular, it is still debatable which statistic should be used to measure pairwise associations as the connectivity weights. Many functional connectivity studies have used either (full or marginal) correlations or partial correlations for pairwise associations. This article investigates the performance of using either correlations or partial correlations for testing group differences in brain connectivity, and how sparsity levels and topological structures of the connectivity would influence statistical power to detect group differences. Our results suggest that, in general, testing group differences in networks differs from estimating networks. For example, high regularization in both covariance matrices and precision matrices may lead to higher statistical power; in particular, optimally selected regularization (e.g., by cross-validation or even at the true sparsity level) on the precision matrices with small estimation errors may have low power. Most importantly, and perhaps surprisingly, using either correlations or partial correlations may give very different testing results, depending on which of the covariance matrices and the precision matrices are sparse.
Specifically, if the precision matrices are sparse, presumably and arguably a reasonable assumption, then using correlations often yields much higher powered and more stable testing results than using partial correlations; the conclusion is reversed if the covariance matrices, not the precision matrices, are sparse. These results may have useful implications to future studies on testing functional connectivity differences.
Introduction
Resting-state functional magnetic resonance imaging (rs-fMRI) has become a popular methodology for studying brain functional networks (Biswal, 2012). It holds promise for understanding brain functions and revealing disrupted brain connectivity underlying complex disorders, such as Alzheimer's disease (Huang et al., 2010; Wee et al., 2013). Recently, there has been great interest in group-level network analysis with the focus on estimation (Smith et al., 2012); however, in contrast to more established fMRI data analysis (Zhu et al., 2014), there is a relative lack of studies on drawing statistical inference, particularly for group comparisons in brain networks (Varoquaux and Craddock, 2013). For example, based on a study comparing functional networks between patients with Alzheimer's disease and a control group, even though estimated networks for the two groups may suggest some altered subnetworks, are the identified differences genuine? A rigorous statistical test can address this question before one possibly over-interprets the results based on the two point estimates. This is particularly important given often small sample sizes and high noise levels in rs-fMRI data.
A network is defined in graph theory as a set of nodes (or vertices) and the edges with weights between them. In the case of brain functional connectivity, nodes are spatial regions of interest (ROIs), for example, as obtained from brain atlases or from functional localizer tasks (Smith et al., 2011). A weight (either binary or continuous) is assigned to each edge to measure the association between the two nodes, for example, based on their BOLD time-course signals. Group comparison aims at testing whether the edge weights differ across groups. However, there is still a debate on which continuous measure of pairwise association (or edge weight) should be used to characterize functional connectivity, which has important implications for not only estimation but also testing (Varoquaux and Craddock, 2013).
Many functional connectivity studies have used Pearson's (full or marginal) correlation between two nodes' BOLD time-course signals (Azari et al., 1992; Horwitz et al., 1987; Kim et al., 2014; Stam et al., 2007; Supekar et al., 2008), which is easy to calculate based on a sample covariance matrix. However, a drawback of using correlations is that it may not be able to distinguish whether the functional connection between two nodes is direct or not. Namely, a correlation captures the marginal association between two nodes, which may be caused by a third node. This distinction between marginal correlation and true, direct functional connection is very important if one aims at estimating the structure of a network (Huang et al., 2010). It is also relevant to testing because the numbers and magnitudes of nonzero associations may change with the specific measure being used, possibly influencing the final testing result. To overcome this limitation, a number of studies adopted partial correlations (Marrelec et al., 2006; Salvador et al., 2005). The partial correlation quantifies the association between two brain regions, conditioning on the other regions, where a zero partial correlation represents the absence of an edge in the estimated network, indicating conditional independence under the Gaussian assumption. Smith and colleagues (2011) concluded that network estimation using partial correlations outperformed that using correlations (when a suitable regularization was applied). A precision matrix, also called inverse covariance matrix, is useful for estimating partial correlations (Marrelec et al., 2006). Even with a small sample size (and/or high-dimensional data), we can estimate a precision matrix by applying regularization as implemented in the graphical lasso method (Banerjee et al., 2008; Friedman et al., 2008). 
In other words, the graphical lasso allows identifying not only the network structure (i.e., the zero and nonzero entries in the precision matrix) but also the edge weights for a large number of brain regions with even a small sample size.
However, even with some benefits from using partial correlations for network estimation, it is unknown whether using partial correlations as edge weights necessarily gives higher statistical power than using correlations to test group differences in brain connectivity. A key issue is that “partial correlations are intrinsically harder to estimate” (Varoquaux and Craddock, 2013). For example, with a small sample size, some regularization is necessary for estimating a precision matrix, but not for a covariance matrix, and choosing suitable regularization is not trivial in practice. In addition, the power to test group differences may depend on other factors, such as the sparsity levels of the networks to be tested. These issues have not been adequately addressed; the goal of this article is to investigate them.
We considered both correlations and partial correlations as edge weights at various sparsity levels of the estimated brain functional networks to test for differences between fetal alcohol spectrum disorder (FASD) patients and controls. Wozniak and colleagues (2013) used correlations to reveal significantly altered network connectivities in children with FASD based on the network measures of characteristic path and global efficiency. Kim and colleagues (2014) compared several statistical tests and concluded that two tests, network-based statistic (NBS) (Zalesky et al., 2012) and an adaptive sum of powered score (aSPU) test (Pan et al., 2014), were complementary to each other with at least one often showing great power in testing group differences in brain connectivity. NBS is a useful test developed in the neuroimaging community for detecting altered subnetworks while attaching a statistical significance. It takes advantage of the earlier assumption that altered edges would form connected subnetworks, and hence is believed to offer high power when the assumption holds. On the other hand, the aSPU test, built on a class of so-called sum of powered score (SPU) tests, does not impose such an assumption, and was found to be complementary to NBS with higher power under some situations when the goal is to assess overall network differences. Also, comparing some global network measures between two groups is a popular way to demonstrate brain connectivity differences (Wozniak et al., 2013). We adopted the aSPU and SPU tests, NBS, and several global network measures to compare brain connectivity between two groups. Our goal is not to directly compare these tests, but to investigate how they perform with the use of correlations and partial correlations to describe brain networks.
We used both the real FASD data and simulated data mimicking the FASD data. Our numerical study confirmed that suitable regularization on estimating covariance and precision matrices would have implications to the power of a test being applied to the estimated correlations or partial correlations. However, it was generally difficult to choose suitable regularization, especially for testing. For example, although cross-validation (CV) performed well in selecting suitable regularization parameters for network estimation, leading to nearly minimal estimation errors, a test using such estimated networks might be low powered. Most importantly, our study showed that the relative power of testing with either correlations or partial correlations depended on the sparsity levels of the true covariance and precision matrices. For example, if the true precision matrices for the two groups were sparse, using correlations as edge weights often gave higher power than using partial correlations in testing group differences. Note that a sparse precision matrix often induces a corresponding nonsparse covariance matrix; given that a precision matrix, but not a covariance matrix, can distinguish between direct and indirect connections in a network, assuming sparse precision matrices seems to be reasonable. On the other hand, the conclusion was the opposite if the true covariance matrices were indeed sparse. These results may have useful implications to future studies comparing functional connectivity.
This article is organized as follows. After introducing data and notation for brain connectivity, we review estimation methods for covariance and precision matrices for brain connectivity, followed by statistical methods for testing group differences in brain functional connectivity. In the Application to the FASD Data section, we apply the described methods to the FASD data using either correlations or partial correlations with varying sparsity levels, to examine how the test results change for functional connectivity differences between a group of FASD patients and a control group. In the Simulations section, we use simulated data mimicking the FASD data to investigate the effects of edge weights, true or estimated network sparsity levels, and other factors on testing results. We summarize the main conclusions and some related future work in Results section.
Materials and Methods
Data and notation
To test for between-group differences in brain connectivity, we consider a two-group scenario with a binary response/disease indicator and with possible covariates. For the disease status of subject
To compare brain functional connectivity between two groups, one must first estimate a connectivity (adjacency) matrix (or a network) for each subject. The connectivity matrix corresponds to a graph model (Bullmore and Sporns, 2009). Suppose we have N distinct brain ROIs that define the nodes of the networks or graphs, and suppose at each node brain activity is measured as fMRI BOLD time series at M time points. In a Gaussian graphical model, the BOLD signals from N regions across time points t,
The (full) correlation would measure the marginal association of the signals in two ROIs, which can be easily estimated from a sample covariance matrix,
an unbiased estimator of Σ, where
The partial correlations are obtained from the precision (i.e., inverse covariance) matrix Θ=(θpq)=Σ^{−1}. If we denote the partial correlation between nodes p and q by ρpq, it is defined as
$$\rho_{pq} = -\frac{\theta_{pq}}{\sqrt{\theta_{pp}\,\theta_{qq}}}.$$
With either full correlations or partial correlations, once a symmetric N×N connectivity matrix is estimated for each subject, there are k=N×(N−1)/2 unique pairwise associations in it, since each node is potentially connected with every other node. Accordingly, each subject has k association measures for brain connectivity. Often the association measures (i.e., full correlations or partial correlations) are normalized by applying Fisher's z-transformation and we denote the k continuous association measures of subject i's brain connectivity as
In matrix notation, we denote Y (n×1) as a vector of disease indicators, X (n×k) as a matrix of pairwise associations between nodes (with each element a z-transformed correlation or partial correlation), and Z (n×l) as a covariate matrix.
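As a minimal sketch of the steps described above, the following Python code computes one subject's row of X from an M×N time-series array: it forms the sample covariance, converts it to correlations or (via an unregularized matrix inverse) partial correlations, keeps the k=N(N−1)/2 unique edges, and applies Fisher's z-transformation. The function name `edge_weights`, the random data, and the use of a plain inverse (which requires M>N) are assumptions for illustration only, not the implementation used in this study.

```python
import numpy as np

def edge_weights(ts, partial=False):
    """Fisher z-transformed edge weights from an (M, N) array of time series.

    Hypothetical helper: returns the k = N(N-1)/2 upper-triangular entries
    of the correlation (or partial correlation) matrix after arctanh.
    """
    S = np.cov(ts, rowvar=False)            # unbiased sample covariance
    if partial:
        theta = np.linalg.inv(S)            # unregularized precision matrix
        d = np.sqrt(np.diag(theta))
        R = -theta / np.outer(d, d)         # rho_pq = -theta_pq / sqrt(theta_pp * theta_qq)
    else:
        d = np.sqrt(np.diag(S))
        R = S / np.outer(d, d)              # Pearson correlation
    iu = np.triu_indices(R.shape[0], k=1)   # unique node pairs p < q
    return np.arctanh(R[iu])                # Fisher's z-transformation

rng = np.random.default_rng(0)
ts = rng.standard_normal((180, 74))         # toy data: M = 180 time points, N = 74 ROIs
z_full = edge_weights(ts)
z_part = edge_weights(ts, partial=True)
```

With N=74 ROIs, each vector has 74×73/2=2701 entries; stacking these vectors over n subjects gives the n×k matrix X.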
Estimating covariance and precision matrices via graphical lasso
Often one is interested in identifying pairs of ROIs that are unconnected in a network, which are conditionally independent; these correspond to zero entries in Θ with zero partial correlations between nodes. As discussed earlier, partial correlations can be estimated from the precision matrix, and a natural way to estimate the precision matrix Θ is to invert S. However, taking the inverse of S will, in general, yield a
Banerjee and colleagues (2008) and Friedman and colleagues (2008) proposed a regularized estimator for a precision matrix Θ. The resulting estimate
over the positive semidefinite Θi, where tr denotes the trace, Si is the sample covariance based on subject i's BOLD time series,
in which we also obtain a regularized estimate for the covariance matrix
In our study, to test group differences in brain connectivity, we tried several ways to choose the regularization parameter λi. First, we tried to choose λi for each subject i separately. Second, we chose λi at the group-level; that is, we chose a common
The graphical lasso can be also used to estimate brain connectivity with varying sparsity levels, which not only avoids the difficult issue of choosing a suitable
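To make the role of the regularization parameter concrete, the sketch below evaluates an L1-penalized Gaussian log-likelihood of the kind maximized by the graphical lasso, log det Θ − tr(SΘ) − λ‖Θ‖1, and measures the connection density of an estimate. The exact form of the penalty (here the sum of absolute values of all entries, diagonal included), the function names, and the toy data are assumptions; this is not a graphical lasso solver.

```python
import numpy as np

def glasso_objective(theta, S, lam):
    """Assumed objective: log det(theta) - tr(S theta) - lam * sum |theta_pq|."""
    sign, logdet = np.linalg.slogdet(theta)
    if sign <= 0:                            # objective is defined for positive definite theta
        return -np.inf
    return logdet - np.trace(S @ theta) - lam * np.abs(theta).sum()

def connection_density(theta, tol=1e-8):
    """Proportion of off-diagonal entries that are nonzero (above tol)."""
    off = ~np.eye(theta.shape[0], dtype=bool)
    return float(np.mean(np.abs(theta[off]) > tol))

rng = np.random.default_rng(1)
S = np.cov(rng.standard_normal((200, 5)), rowvar=False)
theta_mle = np.linalg.inv(S)                 # maximizer when lam = 0 and S is invertible
perturbed = theta_mle + 0.05 * np.eye(5)     # any other positive definite candidate
```

Without a penalty (λ=0) the inverse of S scores highest and is fully dense; increasing λ trades likelihood for sparsity, which is how a target connection density can be reached.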
Testing group differences in brain functional connectivity
We have discussed how to estimate subject-specific networks based on their covariance or precision matrices, from which subject-specific brain connectivity data
SPU and aSPU
The SPU test is a global test originally proposed for the association analysis of genomic data (Pan et al., 2014), but Kim and colleagues (2014) showed that it maintains great power for brain connectivity data. The SPU tests are a family of association tests such that at least one of them is powerful for a given situation. Each SPU test is based on the score vector from a general regression model. Consider a logistic regression model where k functional connections and l covariates are predictors:
The null hypothesis to be tested is
Denote the score vector for
Note that, unless Xij=0 for all subjects i, the weights of the edge j across the subjects i contribute to the score vector. Given γ≥1, the test statistic of the SPU(γ) test is
where
To draw statistical inference, Pan and colleagues (2014) proposed using permutations: First, we fit the null model to obtain
The power of the SPU(γ) test depends on the choice of γ. Pan and colleagues (2014) proposed an aSPU test that combines the p-values of multiple SPU tests with various values of γ, and its test statistic is defined as
In this article, we considered
To obtain the p-value of the aSPU test, we calculate the SPU test statistics TSPU(γ)^(b) and corresponding p-value
In this article, we focus on the use of SPU tests with γ=1, 2, and ∞, since these three cases correspond to some known tests in the neuroimaging research community: The SPU(1) test is similar to an fMRI network test proposed in Meskaldji and colleagues (2011); as pointed out in Kim and colleagues (2014), SPU(2) is closely related to multivariate distance matrix regression used by Shehzad and colleagues (2014), Reiss and colleagues (2010), and McArdle and Anderson (2001). SPU(∞) can be regarded as mass-univariate testing (Nichols and Holmes, 2001). Weighted versions of the SPU and aSPU tests discussed in Kim and colleagues (2014) were also applied to numerical examples here; since they yielded results similar to those of the SPU and aSPU tests, we skip their discussion.
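The permutation procedure described above can be sketched as follows for the simplest case without covariates, in which the fitted null mean is just the sample mean of Y. The sketch assumes the usual forms of the statistics (the sum of the γ-th powers of the score vector for finite γ, the maximum absolute score for γ=∞, and the minimum SPU p-value for aSPU); the function name `aspu_test`, its arguments, and the toy data are hypothetical, and this is not the authors' R code.

```python
import numpy as np

def aspu_test(Y, X, gammas=(1, 2, np.inf), B=199, seed=0):
    """Sketch of permutation-based SPU(gamma) and aSPU p-values (no covariates)."""
    rng = np.random.default_rng(seed)
    Y = np.asarray(Y, dtype=float)
    X = np.asarray(X, dtype=float)

    def spu(resid):
        U = X.T @ resid                      # score vector: U_j = sum_i (Y_i - mean(Y)) * X_ij
        return np.array([np.max(np.abs(U)) if np.isinf(g) else np.sum(U ** g)
                         for g in gammas])

    resid = Y - Y.mean()                     # residuals under the intercept-only null model
    t_obs = np.abs(spu(resid))
    t_perm = np.abs(np.array([spu(rng.permutation(resid)) for _ in range(B)]))

    # p-value of each SPU(gamma) test from the permutation distribution
    p_spu = (np.sum(t_perm >= t_obs, axis=0) + 1) / (B + 1)

    # p-value of each permuted statistic relative to the other permutations
    p_perm = np.empty_like(t_perm)
    for j in range(t_perm.shape[1]):
        col = t_perm[:, j]
        n_ge = B - np.searchsorted(np.sort(col), col, side="left")  # counts itself
        p_perm[:, j] = n_ge / B

    t_aspu = p_spu.min()                     # aSPU statistic: smallest SPU p-value
    p_aspu = (np.sum(p_perm.min(axis=1) <= t_aspu) + 1) / (B + 1)
    return dict(zip(map(str, gammas), p_spu)), p_aspu

# Toy example: 20 controls, 20 cases, 30 edges, the first 10 shifted in cases
rng = np.random.default_rng(1)
Y = np.repeat([0, 1], 20)
X = rng.standard_normal((40, 30))
X[Y == 1, :10] += 1.5
p_spu, p_aspu = aspu_test(Y, X)
```

Reusing the same set of permutations for both the SPU p-values and the aSPU p-value avoids a nested permutation loop, which is the main design point of the procedure.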
R code for SPU and aSPU tests will be available at
Network-based statistic
NBS aims at detecting disrupted subnetworks across groups (Zalesky et al., 2010). It assumes and takes advantage of the proposition that the edges with altered weights cluster together and form some connected subnetworks; it uses the size of the largest altered subnetwork as its test statistic. In the presence of such clustering with disrupted subnetworks, NBS can potentially yield greater power than edge-based tests that ignore such clustering.
For each edge
where the errors eij are assumed to be independent and identically distributed as N(0, σ²). We formulate a t contrast at each edge separately to test the null hypothesis
where
NBS discovers “supra-threshold edges” by selecting the edges whose test statistics Tj exceed a predetermined threshold, and it identifies the size of the largest such sub-network or cluster that is composed of the connected supra-threshold edges. Denote the size of the largest cluster as s. To draw inference, NBS employs permutations. In each permutation
Such a supra-threshold-cluster test is believed to be more powerful than mass univariate edge-based testing. However, since mass univariate testing is low powered with small sample sizes, even in the presence of clustered edges with changed weights, they may not be detected. Hence, when either true or detected edges with changed weights are isolated from each other, forming no clusters, NBS will lose power (Zalesky et al., 2010). Furthermore, the power of NBS depends on the threshold being used, which is difficult to choose in practice (Kim et al., 2014). For ease of understanding, we follow the notation in Kim and colleagues (2014) where nbs(t) is defined as the NBS test with a predetermined threshold t, representing the tth percentile in absolute values of Tj's. We applied the NBS tests with multiple values of t, showing the power dependence on t; it is noted that, if we want to choose a single t giving the highest power, a multiple-testing adjustment is needed, though we do not pursue it here.
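The NBS procedure can be sketched as follows under simplifying assumptions: no covariates (so the edge-wise test reduces to a two-sample t-test), a fixed threshold on |Tj| rather than a percentile, and cluster size measured as the number of edges in the largest connected component. The function names and the column ordering of X (upper triangle, row by row) are hypothetical choices for illustration; this is not the NBS software.

```python
import numpy as np

def largest_component_edges(supra, n_nodes, edges):
    """Number of edges in the largest connected component of supra-threshold edges."""
    parent = list(range(n_nodes))

    def find(a):                             # union-find with path halving
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    kept = [e for e, keep in zip(edges, supra) if keep]
    for p, q in kept:
        parent[find(p)] = find(q)
    counts = {}
    for p, q in kept:
        root = find(p)
        counts[root] = counts.get(root, 0) + 1
    return max(counts.values(), default=0)

def nbs_test(Y, X, n_nodes, threshold, B=199, seed=0):
    """Sketch of NBS: returns (observed largest cluster size s, permutation p-value)."""
    rng = np.random.default_rng(seed)
    Y = np.asarray(Y).astype(bool)
    X = np.asarray(X, dtype=float)
    edges = [(int(p), int(q)) for p, q in zip(*np.triu_indices(n_nodes, k=1))]

    def size(labels):
        a, b = X[labels], X[~labels]
        na, nb = len(a), len(b)
        sp2 = ((na - 1) * a.var(axis=0, ddof=1)
               + (nb - 1) * b.var(axis=0, ddof=1)) / (na + nb - 2)
        t = (a.mean(axis=0) - b.mean(axis=0)) / np.sqrt(sp2 * (1 / na + 1 / nb))
        return largest_component_edges(np.abs(t) > threshold, n_nodes, edges)

    s_obs = size(Y)
    s_perm = np.array([size(rng.permutation(Y)) for _ in range(B)])
    return s_obs, (np.sum(s_perm >= s_obs) + 1) / (B + 1)
```

Because only the largest cluster size from each permutation enters the null distribution, the resulting p-value controls the family-wise error at the cluster level rather than at the level of individual edges.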
Software for the Network Based Statistic is available at
Global network measures
Brain networks can be characterized by a few neurobiologically meaningful global network measures. Rubinov and Sporns (2010) discussed many global network measures that detect functional integration and segregation, quantify centrality of individual brain regions or pathways, characterize patterns of local anatomical circuitry, and test resilience of networks to insult. Each global network measure is computable with some positive normalized weights wij (i.e., 0≤wij ≤1) for any edge connecting nodes i and j, or with a binary measure denoting the presence or absence of the connection.
Based on partial correlations and correlations, we consider four global network measures; characteristic path length (Charpath), global efficiency (Eglob), local efficiency (Elocal), and mean clustering coefficient (Eclust) to compare the FASD patient group with the controls.
For each subject, all pairwise associations (either correlations or partial correlations) are measured, and a weighted (not binary) network metric such as global efficiency is computed. We test group differences in each network measure based on logistic regression.
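As an illustration of one weighted measure, the sketch below computes global efficiency as the average inverse shortest-path length, taking the length of an edge to be the reciprocal of its weight (a common convention, assumed here). The function name and the requirement that W be symmetric with entries in [0, 1] and a zero diagonal are assumptions; this is not the toolbox implementation.

```python
import numpy as np

def global_efficiency_weighted(W):
    """Weighted global efficiency of a symmetric weight matrix W (zero diagonal)."""
    W = np.asarray(W, dtype=float)
    n = W.shape[0]
    with np.errstate(divide="ignore"):
        D = np.where(W > 0, 1.0 / W, np.inf)   # edge length = 1 / weight; no edge = inf
    np.fill_diagonal(D, 0.0)
    for k in range(n):                          # Floyd-Warshall all-pairs shortest paths
        D = np.minimum(D, D[:, [k]] + D[[k], :])
    off = ~np.eye(n, dtype=bool)
    return float(np.mean(1.0 / D[off]))         # unreachable pairs contribute 1/inf = 0

# Three-node path 0 - 1 - 2 with unit weights: distances 1, 1, and 2
path = np.array([[0, 1, 0],
                 [1, 0, 1],
                 [0, 1, 0]], dtype=float)
eff = global_efficiency_weighted(path)          # (1 + 1 + 1/2) / 3
```

The resulting per-subject scalar can then serve as the single predictor in a logistic regression on group status, as described above.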
Open source Matlab toolbox BCT provides functions to calculate global network measures at
Application to the FASD Data
MRI acquisition and processing
We used the FASD data of Wozniak and colleagues (2013). For the initial MRI data acquisition, a Siemens 3T TIM Trio MRI scanner with a 12-channel parallel array head coil was used. Scans included a structural T1-weighted scan, a resting-state fMRI scan (TR=2000 msec, TE=30 msec, 34 interleaved slices, no skip, voxel size=3.45×3.45×4.0 mm, FOV=220 mm, flip angle=77°, 180 measures), and a field map; additional details are included in Wozniak and colleagues (2013). During the resting-state scan, participants were instructed to close their eyes and remain still.
The fMRI data were processed with modified “1000 Functional Connectome (TFC)” pre-processing scripts (
The 68 Freesurfer cortical parcellations (34 per hemisphere) were registered to the TFC-processed fMRI data using Freesurfer's bbregister (Greve and Fischl, 2009). The parcellations were dilated during registration, but none were allowed to overlap and voxels outside the TFC brain mask were excluded. ROIs that contained fewer than 10 fMRI voxels for any subject were excluded from the final analysis. This resulted in the exclusion of 6 ROIs (bilateral entorhinal, frontal pole, and temporal pole), leaving a total of 62 ROIs (31 per hemisphere). The mean fMRI time series of all voxels within each ROI were then extracted for each subject. In this paper, we added 12 subcortical ROIs to have a total of 74 ROIs.
Data analysis
Kim and colleagues (2014) applied various statistical tests to compare brain functional connectivity in 24 FASD patients, aged 10–17, with 31 matched controls using resting-state fMRI; more details of the original study can be found in Wozniak and colleagues (2013). The resting-state fMRI time-series signals for each region were measured at 180 time points. They considered N=74 cortical and sub-cortical ROIs and applied Fisher's z-transformation to the Pearson correlations between all pairs of N=74 ROIs for k=2701 edges to test the group differences.
In this paper, both partial correlations and correlations are used as the edge weights in subject-specific networks for testing between-group differences in brain connectivity. The regularization parameter λi was chosen in two ways. The first was to use a CV-selected group-level value:
Using CV, we found the optimal
p-Values for Testing Group Differences in Brain Connectivity with the Cross-Validation-Selected
aSPU, adaptive sum of powered score; SPU, sum of powered score.
Figure 1 illustrates how the p-value of each test changes with the sparsity level of the estimated networks. We gradually increased both groups' regularization parameters λ(1) and λ(2) so that they shared a similar and decreasing connection density, then applied the group comparison tests. For simplicity, in Figure 1, we report a mean value of

p-Values as a function of the regularization parameter for testing brain network differences with the fetal alcohol spectrum disorder data. Left panel: p-values for testing group differences using correlations, color coded by type of statistical test. Right panel: p-values using partial correlations. Color images available online at
When using correlations, the p-value of a test tended to increase as λ increased (i.e., as higher regularization was applied to the precision matrix). When using partial correlations, the p-value did not show any consistent pattern: With no or little regularization, the p-values were unstable and fluctuated widely; with λ larger than 40, the p-values decreased and became stable.
Simulations
Using simulated data generated to mimic the FASD data, we compared whether using correlations would perform differently from using partial correlations, in terms of statistical power to detect network differences between two groups. Group differences in networks were generated in two ways: Unstructured differences were scattered across the networks, and structured differences with altered edges formed some clustered subnetworks in brain connectivity. Time-series data were generated from a multivariate Gaussian model, in which the mean vector was all zeros and a true group-level covariance matrix Σ(j) for j=1 or 2 was used. Our goal was to explore the effects of the sparsity levels of the true group-level precision matrices
Simulation set-ups
Set-up 1: unstructured differences in brain networks
The differences between two group-level networks could come from isolated edges forming no topological structures. To create a realistic setting, the true nonsparse covariance matrix for 74 ROIs was estimated for each group of the FASD data as S(1) and S(2). From these group-level nonsparse covariance matrices, we generated precision matrices at different sparsity levels by applying the graphical lasso. For example, to obtain a connection density of T=0.20 (i.e., 20% nonzero entries) for the true precision matrix, we applied a different regularization parameter for each group via the graphical lasso to create the sparse precision matrices,
With 0≤φ≤1, each
In previous studies, brain connections were found to be indeed sparse (Hilgetag, 2002; Oh et al., 2014): around 0.12 for the mouse brain structural network and around 0.27 for the cat cortex. Although we expected the human brain to have sparse functional connections, to be more general, we considered both sparse and nonsparse networks with T=0.20, 0.30, 0.60, and 0.80 as the connection density (i.e., proportions of nonzero entries) of the true precision matrices.
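To illustrate what connection density measures, and why a sparse precision matrix typically corresponds to a nonsparse covariance matrix, the toy sketch below builds a chain-structured precision matrix for N=74 nodes, inverts it, and draws M=180 Gaussian time points from the result. The chain structure, the off-diagonal value 0.4, and the tolerance used to call an entry nonzero are arbitrary assumptions; the simulations in this article instead derive the true matrices from the FASD data.

```python
import numpy as np

def connection_density(A, tol=1e-10):
    """Proportion of the N(N-1)/2 off-diagonal pairs whose entry exceeds tol in magnitude."""
    iu = np.triu_indices(A.shape[0], k=1)
    return float(np.mean(np.abs(A[iu]) > tol))

N = 74
theta = np.eye(N)
idx = np.arange(N - 1)
theta[idx, idx + 1] = theta[idx + 1, idx] = 0.4   # chain graph; diagonally dominant, so positive definite

sigma = np.linalg.inv(theta)
sigma = (sigma + sigma.T) / 2                     # remove round-off asymmetry

rng = np.random.default_rng(0)
ts = rng.multivariate_normal(np.zeros(N), sigma, size=180)  # one subject's simulated time series

density_theta = connection_density(theta)         # 73 of 2701 pairs are connected
density_sigma = connection_density(sigma)         # far denser than theta
```

Here only neighboring nodes are conditionally dependent, yet their marginal covariances decay gradually with distance along the chain, so most entries of the covariance matrix remain clearly nonzero.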
Set-up 2: structured differences in brain network
Now we consider that the differences between two group-level networks could come from edges comprising a connected subnetwork or cluster, reflecting perhaps a reasonable assumption on the brain's functional segregation. To have a realistic simulation set-up, we adopted the subnetworks detected by NBS for the FASD data as the truths. Specifically, suppose the two sample covariance (or precision) matrices for the two groups were
where
Similarly,
Generating simulated data
In set-up 1, the group-level true covariance matrices,
for each subject in group j. We considered the simulation scenarios with M=180 as the default number of time points for each individual, and with n=50 subjects, including 25 controls and 25 cases. To investigate the effect of the number of time points M, we also considered M=90, 180, and 540. For set-up 2, similarly, we could use either the group-level precision matrices
Estimating networks
With simulated time-series data, we would estimate each subject-specific covariance (or precision) matrix Σ i (or Θ i ), from which we obtained the correlations (or partial correlations) of the brain regions for each subject before testing for their group differences.
As in the FASD data analysis, we chose the regularization parameter λi
in two ways: With each simulated time-series data set, we found
Type I error and power for testing
The correlations and partial correlations were obtained from
Estimation errors in estimating networks
Denote
Similarly, for the true Θi and its arbitrary estimate
In each simulation, we obtained a loss from estimating individuals' networks (Σ i or Θ i ) and averaged the loss based on 1000 replicates.
Results
Set-up 1: unstructured differences in brain networks
Varying connection density of networks
Figure 2 presents representative results with the connection density of the true precision matrix (

Type I error and power for testing unstructured network differences with varying connection density levels of the estimated precision matrices for true network connection density T=0.20; color coded by statistical tests (blue: SPU, green: NBS, red: global network measures). Color images available online at
There might be a concern that setting different regularization parameters for the two groups could give rise to spurious between-group differences, though the controlled type I errors did not support this hypothesis. Nevertheless, we also applied the same value of λ to all the subjects in both groups, and obtained similar results as shown in Figure 3.

Type I error and power for testing unstructured network differences when the same regularization was imposed on all subjects: with varying connection density of the estimated precision matrix when the true network connection density is T=0.20. Note that the unit of x-axis is λi in reverse order; color coded by statistical tests (blue: SPU, green: NBS, red: global network measures). Color images available online at
For the cases of higher true connection density levels at T=0.30, 0.60, and 0.80 with varying estimated connection density levels, the power analyses are provided in Supplementary Figures S1–S3; the general conclusions were the same. Detailed results for type I error rates are provided in Supplementary Tables S1 and S2.
Cross-validation-selected connection density of estimated networks
Figure 4 illustrates the performance of the tests with CV-selected connection density at various connection density levels T of the true precision matrices

Cross-validation estimated networks: type I error and power for testing unstructured network differences at different connection density levels of the true precision matrices, T=0.20, 0.30, 0.60, and 0.80; color coded by statistical tests (blue: SPU, green: NBS, red: global network measures). Color images available online at
Performance in estimating networks
To explore whether our conclusions were unduly influenced by errors in estimating networks, we show Figure 5 to illustrate the performance in estimating networks (i.e., either

Performance of estimating networks with varying connection density levels of the estimated precision matrices for the true network connection density at T=0.20.
Number of time points
Figure 6 illustrates the effects of the number of time points M on statistical power. The true connection density was fixed at T=0.20, and we generated BOLD signals for each subject at M=90, 180, or 540 time points; we estimated Σ i and Θ i through 10-fold CV. In Figure 6, type I error rates were close to the nominal level of 0.05. As expected, as the number of time points (M) increased, the power went up with the use of both correlations and partial correlations in SPU and NBS, but not necessarily in testing with summary network measures. Again, using correlations seemed to yield higher power than using partial correlations across most tests. In addition, the power with partial correlations largely depended on the test being applied: Even with a large number of observations, M=540, NBS gave power no higher than 0.2, much lower than that of the SPU(2) and aSPU tests.

Cross-validation estimated networks: type I error and power for testing unstructured network differences with a varying number of time-courses M=90, 180, and 540 for the true network (precision matrix) connection density at T=0.20; color coded by statistical tests (blue: SPU, green: NBS, red: global network measures). Color images available online at
Set-up 2: structured differences in brain networks
Structured differences in sparse precision matrices
Figure 7 depicts empirical type I error rates and power for testing structured network differences in sparse true precision matrices. The patterns of the relative power between using correlations and partial correlations, and across estimated connection density levels, were similar to those observed in Figure 2 with unstructured network differences. In summary, using correlations consistently yielded higher power than using partial correlations; high regularization on estimating precision matrices often improved power; with the use of partial correlations, the SPU(2) and aSPU tests seemed to be more powerful than NBS, while testing with the summary network measures was low powered.

Type I error and power for testing structured network differences with sparse true precision matrices; color coded by statistical tests (blue: SPU, green: NBS, red: global network measures). Color images available online at
Structured differences in sparse covariance matrices
So far, we have almost exclusively considered cases in which the true precision matrices had varying sparsity levels. The assumption of sparse precision matrices seems reasonable, given that a precision matrix can distinguish direct from indirect connections in a network. Nevertheless, since little is known about the true architecture of human brain functional connectivity, we also explored a case with sparse true covariance matrices rather than sparse precision matrices. Figure 8 presents the type I error and power for testing structured network differences present in sparse true group-level covariance matrices. Compared with Figure 7, the pattern of the relative power between using correlations and using partial correlations is reversed: using partial correlations gave much higher power across almost all the tests at any estimated connection density level. Since the true covariance matrices were sparse, high regularization on covariance estimation yielded higher powered testing with correlations; on the other hand, the sparse true covariance matrices induced nonsparse precision matrices, implying that suitable regularization, not necessarily high regularization, on precision matrix estimation led to more powerful testing with partial correlations.

Type I error and power for testing structured network differences with sparse true covariance matrices; color coded by statistical tests (blue: SPU, green: NBS, red: global network measures). Color images available online at
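The statement that a sparse covariance matrix induces a nonsparse precision matrix can be checked on a toy example. The sketch below is not the simulation design of the article: the matrix size `p`, the tridiagonal structure, and the helper `density` are assumptions chosen only for illustration.

```python
import numpy as np

p = 8  # hypothetical number of ROIs

# Sparse (tridiagonal) covariance: unit variances, 0.4 between neighbors.
# Its eigenvalues are 1 + 0.8*cos(k*pi/(p+1)) > 0, so it is positive definite.
sigma = np.eye(p) + 0.4 * (np.eye(p, k=1) + np.eye(p, k=-1))

# The corresponding precision matrix is the inverse of the covariance.
theta = np.linalg.inv(sigma)


def density(mat: np.ndarray, tol: float = 1e-8) -> float:
    """Fraction of off-diagonal entries whose magnitude exceeds tol."""
    off_diag = ~np.eye(mat.shape[0], dtype=bool)
    return float(np.mean(np.abs(mat[off_diag]) > tol))


sigma_density = density(sigma)
theta_density = density(theta)
print(f"covariance density: {sigma_density:.2f}")
print(f"precision density:  {theta_density:.2f}")
```

In this toy case only neighboring nodes are marginally correlated, yet every pair of nodes has a nonzero entry in the precision matrix, so heavy sparsity-inducing regularization on the precision matrix would be mismatched to the truth.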
An explanation
Figure 9a and b show the distributions of the network edge-wise mean differences between the two groups in terms of correlations or partial correlations; the univariate edge-wise t-statistics showed similar distributions (not shown). Clearly, if the true precision matrices were sparse, the differences between the two groups were much larger in terms of edge-wise correlations than in terms of partial correlations (Fig. 9a), suggesting higher power with the use of correlations. On the other hand, if the true covariance matrices were sparse, the opposite conclusion can be drawn (Fig. 9b).

Distributions of edge-wise mean differences in z-transformed correlations or partial correlations.
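The edge-wise quantities compared above can be sketched as follows. This is a minimal illustration under stated assumptions, not a reproduction of Figure 9: the two population precision matrices `theta1` and `theta2`, their size, and the helper names are hypothetical, and population matrices are used in place of estimates from simulated data. Partial correlations are obtained from a precision matrix Θ by the standard identity ρ_ij = -Θ_ij / sqrt(Θ_ii Θ_jj).

```python
import numpy as np


def partial_corr_from_precision(theta: np.ndarray) -> np.ndarray:
    """Partial correlations: rho_ij = -theta_ij / sqrt(theta_ii * theta_jj)."""
    d = np.sqrt(np.diag(theta))
    pcor = -theta / np.outer(d, d)
    np.fill_diagonal(pcor, 1.0)
    return pcor


def corr_from_cov(sigma: np.ndarray) -> np.ndarray:
    """Marginal correlations from a covariance matrix."""
    d = np.sqrt(np.diag(sigma))
    return sigma / np.outer(d, d)


def fisher_z_edges(r: np.ndarray) -> np.ndarray:
    """Fisher z-transform of the upper-triangular (edge) entries."""
    upper = np.triu_indices_from(r, k=1)
    return np.arctanh(r[upper])


p = 6  # hypothetical number of nodes
# Group 1: sparse (tridiagonal) precision matrix, diagonally dominant.
theta1 = np.eye(p) + 0.3 * (np.eye(p, k=1) + np.eye(p, k=-1))
# Group 2: identical except for one edge.
theta2 = theta1.copy()
theta2[0, 1] = theta2[1, 0] = 0.45

diff_pcor = (fisher_z_edges(partial_corr_from_precision(theta2))
             - fisher_z_edges(partial_corr_from_precision(theta1)))
diff_cor = (fisher_z_edges(corr_from_cov(np.linalg.inv(theta2)))
            - fisher_z_edges(corr_from_cov(np.linalg.inv(theta1))))

tol = 1e-10
n_changed_pcor = int(np.sum(np.abs(diff_pcor) > tol))
n_changed_cor = int(np.sum(np.abs(diff_cor) > tol))
print("edges differing in partial correlation:", n_changed_pcor)
print("edges differing in correlation:", n_changed_cor)
```

In this toy setting a difference confined to a single edge of a sparse precision matrix appears in only one partial correlation but spreads to many marginal correlations, which is one way to see why tests based on correlations may accumulate more signal when the precision matrices are sparse.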
Discussion
Our numerical studies have revealed a number of interesting observations. First, if the true precision matrices are sparse across the groups, then using correlations often gives higher power and more stable results in testing group differences. On the other hand, if the true covariance matrices are sparse, then using partial correlations often yields more powerful tests. Since it often seems plausible to assume sparse precision matrices, we would recommend the use of marginal correlations; if this assumption is questionable, as it may be in practice, then to be safe one might want to try both marginal correlations and partial correlations. Second, optimal estimation of networks does not necessarily lead to high power for testing group differences in brain connectivity. As we observed, CV was successful in optimally estimating networks, giving a network estimate with a nearly minimal error, but did not necessarily yield high power for testing group differences in networks. Third, topological network structures, as group differences, do not seem to change our main conclusions.
For estimation of functional connectivity, partial correlation is known to be attractive in that it can distinguish direct connections from indirect ones. In addition, in the high-dimensional setting, estimating a sparse precision matrix (and using partial correlations) naturally identifies pairs of nodes that are unconnected in the graphical model. These features can provide a useful tool for visualizing the relationships among brain ROIs and for generating biological hypotheses. In our study, along with these attractive features, using partial correlations could maintain high power when high regularization was applied and an appropriate statistical test such as aSPU or NBS was chosen. In practice, however, there is no guarantee that one can choose a sparsity level in estimating the precision matrices that yields high power for any given data. In particular, our simulation results suggested that the conventional CV technique did not perform well in terms of statistical power, though it could estimate the networks accurately. Rather than choosing one single best regularization parameter value (or equivalently, sparsity level) for each estimated covariance or precision matrix, it would be interesting to develop an adaptive test that combines the testing results from using multiple values of the regularization parameter in estimating the covariance and precision matrices for testing group differences in brain functional connectivity. Furthermore, we have not discussed the use of other association measures (Varoquaux and Craddock, 2013); in particular, we have not considered the situation with dynamic functional (or directional) networks (Zhang et al., 2014), for which some new estimators of associations have recently been proposed (Lindquist et al., 2014). These are interesting topics for future study.
Acknowledgments
This research was supported by NIH grants R01GM113250, R01HL105397, R01HL116720, and R01GM081535, and by the Minnesota Supercomputing Institute.
Author Disclosure Statement
No competing financial interests exist.
References
