Impact of Independent Component Analysis Dimensionality on the Test–Retest Reliability of Resting-State Functional Connectivity

Abstract

Background:

As resting-state functional connectivity (rsFC) research moves toward the study of individual differences, test–retest reliability is increasingly important to understand. Previous literature supports the test–retest reliability of rsFC derived with independent component analysis (ICA) and dual regression, yet the impact of dimensionality (i.e., the number of components to extract from group-ICA) remains unclear in the current context of large-scale data sets.

Methods:

To provide principled guidelines on this issue, ICA at dimensionalities varying from 25 to 350 was applied to the cortical surface with resting-state functional magnetic resonance imaging data from 1003 participants in the Human Connectome Project. The reliability of two rsFC measures, (within-component) coherence and (between-component) connectivity, was estimated.

Results:

Reliability and its change with dimensionality varied by network: the cognitive (frontoparietal, cingulo-opercular, dorsal attention, and default) networks were measured with the highest reliability, which improved with increased dimensionality until at least 150; the visual and somatomotor networks were measured with lower reliability, which benefited mildly from increased dimensionality; and the temporal pole/orbitofrontal cortex (TP/OFC) network was measured with the lowest reliability. Overall, ICA reliability was optimized at dimensionalities of 150 or above. Compared with two popular binary, nonoverlapping cortical atlases, ICA and dual regression resulted in higher reliability for the cognitive networks, lower reliability for the somatomotor network, and similar reliability for the visual and TP/OFC networks.

Discussion:

These findings highlight analytical decisions that maximize the reliability of rsFC measures and how they depend on one's networks of interest.

Impact statement

Independent component analysis (ICA) and dual regression is a popular approach to resting-state functional connectivity (rsFC) analysis. Yet there is little consensus around the optimal ICA dimensionality, i.e., how many brain components to extract from group-ICA. We propose that rsFC test–retest reliability is an important criterion when choosing dimensionality. The present study compares rsFC reliability across various dimensionalities in the state-of-the-art Human Connectome Project data. We also compare reliability based on ICA versus two popular brain atlases. The findings are of interest both to researchers who study brain parcellation and to those who use rsFC to examine brain-behavior relationships.

Introduction

Resting-state functional connectivity (rsFC) (Biswal et al., 1995) is a powerful tool for understanding brain functional organization. Recent interest in rsFC research has shifted from making between-group comparisons to understanding within-group heterogeneity, such as individual differences in personality (Dubois et al., 2017), cognition (Dubois et al., 2018), and overall life adjustment (Smith et al., 2015). Evidence further indicates that an individual's entire rsFC information (referred to as the “connectome”) serves as a “fingerprint” that uniquely identifies the individual from a group (Finn et al., 2015). These findings support the potential of rsFC to unveil the neural foundation of individual differences and serve as biomarkers for diagnosis, prognosis, and symptom monitoring of neuropsychiatric disorders. One essential prerequisite for these applications is satisfactory neurometric (or measurement) properties of rsFC. The current project addresses the need for analytical guidelines for improving rsFC neurometrics.

Previous work establishes that rsFC reflects patterns of brain functional organization that are reproducible over time, across groups, and between task and resting states (Choe et al., 2015; Meindl et al., 2010; Wisner et al., 2013). To be used in individual differences research, rsFC measures additionally need to have adequate test–retest reliability. Unreliable measures lead to underestimation of effect sizes only when sample sizes are large, while overestimation is equally likely when sample sizes are small (Loken and Gelman, 2017). This can lead to even more bias as many rsFC studies involve a preselection process where only brain connections that bivariately correlate with the outcome of interest are entered into further analysis. Thus, maximizing test–retest reliability is crucial for powerful and reproducible rsFC research. In addition, knowledge about test–retest reliability is necessary for (1) validating the application of rsFC for single-subject inferences in clinical settings; (2) guiding advancement in image acquisition and processing techniques (Marchitelli et al., 2017); and (3) providing insights about state- versus trait-dependent neural mechanisms (Geerligs et al., 2015).

One crucial decision that presumably affects the test–retest reliability of rsFC is the method of dimension reduction, that is, the way of grouping brain voxels into functionally coherent parcels/networks/components for which rsFC is derived. One popular approach for this purpose is independent component analysis (ICA) (Beckmann and Filippini, 2009). In this data-driven approach, group-level spatial ICA groups brain voxels into components that are maximally spatially independent. This is typically followed by a two-step regression procedure (i.e., dual regression) that derives subject-specific time series and spatial maps corresponding to the components. RsFC measures can then be derived: (within-component) coherence represents the mean connectivity strength, or consistency of activity, within the spatial map of a component; (between-component) connectivity reflects the temporal synchrony between the activities of two components. Accumulating evidence supports the validity of the ICA and dual regression approach (Biswal et al., 2010; Laird et al., 2011; Smith et al., 2009). Several studies concluded that the test–retest reliability of rsFC derived with ICA and dual regression was acceptable but varied widely across the brain (Choe et al., 2015; Mejia et al., 2016; Poppe et al., 2013; Shirer et al., 2015; Zuo et al., 2010).

Despite these early reports, recent key advancements in the field call for new investigations into rsFC reliability. First, large-scale neuroimaging data sets [e.g., the Human Connectome Project (HCP) (Van Essen et al., 2013), the International Neuroimaging Data-sharing Initiative (Mennes et al., 2013), and the Enhancing NeuroImaging Genetics through Meta-Analysis Consortium (Thompson et al., 2014)] have increased the size of resting-state functional magnetic resonance imaging (rsfMRI) data sets by at least 10 times through larger sample sizes, longer scan duration, and higher spatial and temporal resolution. These factors will likely improve the performance of ICA as well as other dimension reduction algorithms. Second, it is now considered vital for rsFC research to remove motion and physiological artifacts (Power et al., 2015), and several advanced denoising methods have only recently become available (Pruim et al., 2015; Salimi-Khorshidi et al., 2014).

In this context, recent studies have characterized rsFC reliability in large-scale rsfMRI data sets (Dubois et al., 2017; Mejia et al., 2016; Noble et al., 2017; Zhang et al., 2018). However, to date no study has focused on the impact of ICA dimensionality on reliability. ICA dimensionality is the number of independent components to extract from group-ICA and affects the spatial, functional, and neurometric properties of the resulting components (Abou-Elseoud et al., 2010, 2011; Beckmann and Smith, 2004; Poppe et al., 2013; Ray et al., 2013). Low dimensionalities may result in large heterogeneous components, while high dimensionalities may lead to separation of functionally distinct networks and also an abundance of artifactual components, especially when data are few and of poor quality (Beckmann and Smith, 2004; Särelä and Vigário, 2003). Previous ICA studies have recommended a lower dimensionality of around 20 and a higher dimensionality of around 70 based on functional interpretability of components, whereas dimensionalities above 150 were discouraged (Abou-Elseoud et al., 2010; Ray et al., 2013). However, algorithms other than ICA indicate that more than 300 components may be meaningfully derived (Glasser et al., 2016; Gordon et al., 2014). While Mejia and colleagues (2016) reported rsFC reliability at various dimensionalities in a subsample of the HCP, the interaction between dimensionality and brain region was not taken into account, and the dimensionalities tested were sparse.

To provide guidelines for the field regarding dimensionality decisions, we used a large rsfMRI data set to examine the test–retest reliability of two rsFC measures, coherence and connectivity, when decomposing the cortical surface with ICA. Our aims were twofold: (1) examine the impact of ICA dimensionality (varying from 25 to 350) as well as other analytical decisions on the reliability of rsFC; and (2) compare the reliability derived with ICA and dual regression with two existing binary, nonoverlapping cortical atlases, including a 333-region atlas derived with rsFC-boundary (Gordon et al., 2014) and a 360-region atlas derived with multimodal parcellation (Glasser et al., 2016).

Materials and Methods

Image acquisition and preprocessing

RsfMRI data were from the 1200 Subjects Data Release of the WashU–UMinn HCP. One thousand three participants from 429 families completed rsfMRI scans [age = 28.7 ± 3.7 years, range = 22–37 years, 53% female, mean relative root mean square movement (relative RMS) (Jenkinson et al., 2002) = 0.088 ± 0.037 mm]. This study was approved by the University of Minnesota Institutional Review Board (1605S87421).

Image acquisition procedure and parameters are detailed in Van Essen and colleagues (2013). Briefly, participants completed two scan sessions (REST1 and REST2), typically over two consecutive days. Each session included two 15-min rsfMRI runs (TR = 0.72 sec, voxel size = 2.0 mm isotropic) with opposite phase encoding directions (left-to-right [LR] and right-to-left [RL]). This resulted in four rsfMRI runs for each participant (REST1_LR, REST1_RL, REST2_LR, and REST2_RL).

“Resting State Denoised” data were downloaded from ConnectomeDB. Images were preprocessed with the HCP preprocessing pipeline v3.4.0, which included minimal preprocessing as detailed in Glasser and colleagues (2013), followed by high-pass filtering (FWHM = 2355 sec) and removal of artifacts and 24 motion parameters with FMRIB's ICA-based Xnoiseifier (FIX) (Griffanti et al., 2014) as detailed in Smith and colleagues (2013). The images were registered to the standard CIFTI grayordinate space using an areal-feature-based Multimodal Surface Matching algorithm (“MSMAll”) (Glasser et al., 2016; Robinson et al., 2014) and spatially smoothed at 2 mm FWHM with pipeline v3.13.2.

ICA and component categorization

Similar to previous studies (Schaefer et al., 2018; Yeo et al., 2014), we restricted the parcellation to the cerebral cortex due to low signal-to-noise ratio in the subcortical regions [see the supplementary methods of Glasser and colleagues (2016) for an example of subcortical oversplitting at high dimensionalities when these regions were included in group-ICA].

Participants' CIFTI data in the cerebral cortex were temporally demeaned, variance normalized, concatenated, and reduced to the top 4500 weighted spatial eigenvectors through MELODIC's Incremental Group-PCA (Smith et al., 2014), resulting in a 59,412 grayordinates × 4500 eigenvectors matrix. Group-ICA of the cerebral cortex was performed by feeding the principal component analysis (PCA)-reduced data to the Infomax algorithm in FSL's MELODIC (Beckmann and Smith, 2004). ICA dimensionality was set at 25, 50, 100, 150, 200, 250, 300, or 350.

We categorized the resulting components with the seven canonical resting-state networks in Yeo and colleagues (2011). We first parcellated the brain into nonoverlapping components through which each grayordinate was assigned to the component that had the largest value for that grayordinate among all component maps. We then characterized association indices (Dice, 1945), where the association index of a canonical network with a component was defined as the proportion of grayordinates in the component that overlapped with the canonical network. A component was classified under one of the following canonical networks with which it had the largest association index: (1) frontoparietal, (2) cingulo-opercular, (3) dorsal attention, (4) default, (5) visual, (6) somatomotor, and (7) temporal pole/orbitofrontal cortex (TP/OFC).
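The two steps above (winner-take-all assignment, then association indices) can be sketched in a few lines of NumPy. This is an illustrative sketch only: the function names, array layouts, and toy values are assumptions and are not taken from the study's code.

```python
import numpy as np

def winner_take_all(component_maps):
    """Assign each grayordinate to the component with the largest map value.

    component_maps: (n_components, n_grayordinates) array (assumed layout).
    Returns one integer component label per grayordinate.
    """
    return np.argmax(component_maps, axis=0)

def association_indices(component_labels, network_labels, n_components, n_networks):
    """Proportion of each component's grayordinates that fall in each network."""
    assoc = np.zeros((n_components, n_networks))
    for c in range(n_components):
        in_component = component_labels == c
        size = in_component.sum()
        if size == 0:
            continue  # component won no grayordinates; leave its row at zero
        for k in range(n_networks):
            assoc[c, k] = np.sum(in_component & (network_labels == k)) / size
    # Each component is classified under the network with the largest index.
    return assoc, assoc.argmax(axis=1)

# Toy data: 3 components, 7 grayordinates, 2 hypothetical canonical networks.
maps = np.array([[5.0, 4.0, 0.1, 0.2, 0.0, 0.1, 0.0],
                 [0.3, 0.1, 3.0, 2.5, 2.0, 0.0, 0.1],
                 [0.0, 0.2, 0.1, 0.3, 0.4, 4.0, 3.5]])
networks = np.array([0, 0, 0, 1, 1, 1, 1])

labels = winner_take_all(maps)
assoc, assigned = association_indices(labels, networks, 3, 2)
```

In the toy data the second component covers one grayordinate of network 0 and two of network 1, so its association indices are 1/3 and 2/3 and it is classified under network 1.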

Coherence and connectivity

To estimate subject- and run-specific coherence and connectivity at each ICA dimensionality, we first used dual regression (Beckmann and Filippini, 2009) on individual rsfMRI data in each run to derive time series and spatial maps corresponding to the group-level components. Variance normalized time series were used in stage 2 of dual regression to derive spatial maps with both shape and amplitude information (Nickerson et al., 2017).
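The logic of dual regression can be illustrated with ordinary least squares. The sketch below is not FSL's implementation; the function name, the demeaning choices, and the synthetic data are assumptions made for illustration. It shows why variance-normalizing the stage-1 time series lets the stage-2 maps carry amplitude as well as shape.

```python
import numpy as np

def dual_regression(data, group_maps):
    """Two-stage regression for one run (illustrative sketch).

    data:       (n_timepoints, n_grayordinates) rsfMRI data
    group_maps: (n_components, n_grayordinates) group-level spatial maps
    """
    # Stage 1: spatial regression of the group maps onto every volume.
    design = (group_maps - group_maps.mean(axis=1, keepdims=True)).T  # (G, C)
    volumes = (data - data.mean(axis=1, keepdims=True)).T             # (G, T)
    betas, *_ = np.linalg.lstsq(design, volumes, rcond=None)          # (C, T)
    time_series = betas.T                                             # (T, C)

    # Variance-normalize the time series before stage 2.
    time_series = time_series - time_series.mean(axis=0)
    time_series = time_series / time_series.std(axis=0, ddof=1)

    # Stage 2: temporal regression of the time series onto every grayordinate.
    centered = data - data.mean(axis=0, keepdims=True)
    subject_maps, *_ = np.linalg.lstsq(time_series, centered, rcond=None)  # (C, G)
    return time_series, subject_maps

# Noise-free synthetic run: 3 components with different amplitudes.
rng = np.random.default_rng(0)
true_ts = rng.standard_normal((50, 3)) * np.array([1.0, 2.0, 0.5])
maps = rng.standard_normal((3, 200))
data = true_ts @ maps

ts, subject_maps = dual_regression(data, maps)
amplitudes = true_ts.std(axis=0, ddof=1)
```

In this noise-free case the recovered subject maps equal the group maps scaled by each component's time series standard deviation, which is the amplitude information referred to above.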

Coherence

For each component, we characterized coherence by first creating a binary mask with the group-level component map. Specifically, we normalized the group-level component z statistic map by its maximum z value so that the maximum of the normalized map was 1. We then binarized the normalized map after applying a normalized threshold varying from 0.1 to 0.9, in steps of 0.1. According to prior work, this procedure has similar effects to thresholding according to the connectivity of each grayordinate with the peak grayordinate of the component (Poppe et al., 2013). We then applied this mask to the individual spatial map and computed the mean within the mask as coherence. For each normalized threshold, this resulted in a d components × 1003 subjects × 4 runs array of coherence values (Supplementary Fig. S1a, d: dimensionality). Of note, our coherence metric differs from that in electroencephalography and magnetoencephalography research, where coherence represents the synchronicity of signals in the frequency domain (Thatcher et al., 1986). Previous work supports the reliability of coherence by our definition with links to individual differences (Abram et al., 2015; Blain et al., 2020; Poppe et al., 2013).
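As a minimal sketch of the coherence computation described above, assuming a strict "greater than" comparison at the threshold (the text does not say whether the boundary is inclusive) and using made-up map values:

```python
import numpy as np

def coherence(group_z_map, subject_map, normalized_threshold=0.5):
    """Mean of the subject's dual-regression map inside the thresholded group mask."""
    normalized = group_z_map / group_z_map.max()   # maximum becomes 1
    mask = normalized > normalized_threshold       # assumption: strict inequality
    return float(subject_map[mask].mean())

# Hypothetical group z map and subject spatial map over six grayordinates.
group_z = np.array([10.0, 8.0, 6.0, 4.0, 1.0, -2.0])
subject = np.array([2.0, 1.0, 3.0, 0.5, 0.1, -0.3])

coh_05 = coherence(group_z, subject, 0.5)  # mask keeps the first three grayordinates
coh_07 = coherence(group_z, subject, 0.7)  # mask keeps the first two grayordinates
```

Raising the threshold shrinks the mask toward the component's peak, which is the trade-off between reliability and inclusiveness of the component map.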

Connectivity

For each pair of components, we characterized connectivity as the correlation between their time series. Three correlation methods in the FSLNets toolbox were used, including (1) full (i.e., Pearson's) correlation; (2) partial correlation with L1 regularization; and (3) partial correlation with L2 regularization. Partial correlation is thought to reflect the direct connectivity between two components independent from the contribution of other components and has been shown to successfully detect true connections in simulated and real neural networks (Marrelec et al., 2006; Smith et al., 2011). We set the tuning parameters for L1 (λ = 100) and L2 (ρ = 0.01) regularization to be consistent with the HCP's parcellations+time series+netmats release. For each correlation method, this resulted in a d components × d components × 1003 subjects × 4 runs array of connectivity values (Supplementary Fig. S1b). Connectivity values were then transformed with Fisher's z transformation.
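The connectivity measures can be sketched as follows. This is not the FSLNets code: the ridge function is a generic Tikhonov-regularized partial correlation on standardized time series, and its scaling of ρ may differ from the toolbox. The L1-regularized variant is omitted. All names and toy values are assumptions.

```python
import numpy as np

def full_correlation(ts):
    """Pearson correlation between component time series; ts is (T, C)."""
    r = np.corrcoef(ts, rowvar=False)
    np.fill_diagonal(r, 0.0)  # self-connections are not of interest
    return r

def ridge_partial_correlation(ts, rho=0.01):
    """Generic L2-regularized partial correlation (assumed form, not FSLNets)."""
    z = (ts - ts.mean(axis=0)) / ts.std(axis=0, ddof=1)
    cov = np.cov(z, rowvar=False)
    precision = np.linalg.inv(cov + rho * np.eye(cov.shape[0]))
    d = np.sqrt(np.diag(precision))
    partial = -precision / np.outer(d, d)
    np.fill_diagonal(partial, 0.0)
    return partial

# Two toy time series whose Pearson correlation is exactly 0.8.
ts = np.array([[1.0, 1.0],
               [2.0, 3.0],
               [3.0, 2.0],
               [4.0, 4.0]])

full_z = np.arctanh(full_correlation(ts))                   # Fisher's z transformation
partial_z = np.arctanh(ridge_partial_correlation(ts, 0.01))
```

With only two components there is nothing to partial out, so the ridge estimate differs from the full correlation only through the shrinkage introduced by ρ (0.8 / 1.01 instead of 0.8).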

The Gordon and Glasser atlases

While components derived with ICA are unthresholded and overlap, regions in the Gordon and Glasser atlases are binary and do not overlap. Thus, the procedure of characterizing coherence and connectivity for these two atlases differed from that described above. Coherence for a Gordon or Glasser region was defined as the average (Fisher's z transformed) Pearson's correlation of the time series of grayordinates within the region with its mean time series. Connectivity between two regions was defined as the (full, L1 regularized, or L2 regularized) correlation between their mean time series. Despite the disparity between the definitions of coherence and connectivity for ICA and dual regression versus the Gordon and Glasser parcellations, we believe that each reflects the most common way in which rsFC measures are derived for the respective approach in the literature, which makes their comparison reasonable and meaningful. Reliability when forcing ICA spatial maps to be binary or nonoverlapping is shown in Supplementary Figures S9–S12.
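The atlas-based coherence definition can be sketched as below. The function name and the simulated regions are assumptions for illustration; a region with a single grayordinate would give a correlation of 1 and an infinite Fisher's z, a case this sketch does not handle.

```python
import numpy as np

def region_coherence(region_ts):
    """Average Fisher's z of each grayordinate's correlation with the region mean.

    region_ts: (n_timepoints, n_grayordinates_in_region) array.
    """
    mean_ts = region_ts.mean(axis=1)
    r = np.array([np.corrcoef(region_ts[:, g], mean_ts)[0, 1]
                  for g in range(region_ts.shape[1])])
    return float(np.mean(np.arctanh(r)))

# Simulated regions: one driven by a shared signal, one made of pure noise.
rng = np.random.default_rng(0)
shared = rng.standard_normal((200, 1))
coherent_region = shared + 0.5 * rng.standard_normal((200, 20))
noise_region = rng.standard_normal((200, 20))

coherent_value = region_coherence(coherent_region)
noise_value = region_coherence(noise_region)
```

Because each grayordinate contributes to the region mean, even the pure-noise region has a small positive coherence; the homogeneous region scores much higher.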

Reliability

We evaluated the test–retest reliability of rsFC with intraclass correlation coefficients (ICCs), type “A-1” as defined in McGraw and Wong (1996). For any rsFC measure of interest, the observed rsFC $x_{ij}$ of the $i$th subject in the $j$th run can be specified according to the following two-way random effects model:

$x_{ij} = μ + r_{i} + c_{j} + rc_{ij} + e_{ij}$

where $μ$ is the population mean rsFC; $r_{i}$ is the random effect of the $i$th subject, independently distributed with mean 0 and variance $σ_{r}^{2}$; $c_{j}$ is the random effect of the $j$th run, independently distributed with mean 0 and variance $σ_{c}^{2}$; $rc_{ij}$ is the random interaction effect of the $i$th subject in the $j$th run, independently distributed with mean 0 and variance $σ_{rc}^{2}$; and $e_{ij}$ is the random error of the $i$th subject in the $j$th run, independently distributed with mean 0 and variance $σ_{e}^{2}$. ICC is defined as the proportion of total variance in rsFC accounted for by the subject effect:

$ICC = \frac{σ_{r}^{2}}{σ_{r}^{2} + σ_{c}^{2} + σ_{rc}^{2} + σ_{e}^{2}}$

We calculated ICCs with the MATLAB script ICC.m developed by Salarian (2016). Although ICCs in theory are never negative, ICC estimates can take negative values when $σ_{r}^{2}$ is so small that an ANOVA model estimates it as negative. In the current study, we recorded negative ICC estimates, with an understanding that the true ICCs were likely very small in these cases.
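For readers without MATLAB, the absolute-agreement, single-measure ICC can be estimated from two-way ANOVA mean squares using the McGraw and Wong (1996) expression ICC(A,1) = (MSR − MSE) / (MSR + (k − 1)MSE + (k/n)(MSC − MSE)). The sketch below is not a port of ICC.m; the function name and toy data are assumptions.

```python
import numpy as np

def icc_a1(x):
    """ICC(A,1) for an (n_subjects, k_runs) array of one rsFC measure."""
    x = np.asarray(x, dtype=float)
    n, k = x.shape
    grand = x.mean()
    row_means = x.mean(axis=1)   # subject means
    col_means = x.mean(axis=0)   # run means
    msr = k * np.sum((row_means - grand) ** 2) / (n - 1)          # subjects
    msc = n * np.sum((col_means - grand) ** 2) / (k - 1)          # runs
    resid = x - row_means[:, None] - col_means[None, :] + grand
    mse = np.sum(resid ** 2) / ((n - 1) * (k - 1))                # residual
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

perfect = icc_a1([[1, 1], [2, 2], [3, 3]])       # identical runs
offset = icc_a1([[1, 2], [2, 3], [3, 4]])        # constant shift between runs
negative = icc_a1([[1, 3], [3, 1], [2, 2]])      # no stable subject differences
```

The second example shows why absolute agreement matters: subjects are ranked identically in both runs, but the systematic shift between runs lowers the ICC to 2/3. The third example yields a negative estimate because the subject mean square is smaller than the residual mean square.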

We derived within-session reliability as the mean of ICCs between any pair of 15-min runs within the same session (i.e., REST1_LR vs. REST1_RL, and REST2_LR vs. REST2_RL) and between-session reliability as the mean of ICCs between any pair of 15-min runs between sessions (i.e., REST1_LR vs. REST2_LR, REST1_RL vs. REST2_RL, REST1_RL vs. REST2_LR, and REST1_LR vs. REST2_RL). We interpret ICCs according to the following rule of thumb: ICCs less than 0.4 indicate “poor” reliability; ICCs between 0.4 and 0.6 indicate “fair” reliability; ICCs between 0.6 and 0.8 indicate “good” reliability; and ICCs larger than 0.8 indicate “excellent” reliability (Cicchetti and Sparrow, 1981).
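The run pairings and the interpretive labels can be written out explicitly. In this sketch the helper names are hypothetical, the ICC values are made up, and the assignment of the boundary values 0.4 and 0.6 to the higher category is an assumption, since the rule of thumb above leaves the boundaries open.

```python
from itertools import combinations

RUNS = ["REST1_LR", "REST1_RL", "REST2_LR", "REST2_RL"]

def session(run):
    """Session name is the part before the underscore, e.g. 'REST1'."""
    return run.split("_")[0]

pairs = list(combinations(RUNS, 2))
within_pairs = [p for p in pairs if session(p[0]) == session(p[1])]
between_pairs = [p for p in pairs if session(p[0]) != session(p[1])]

def mean_reliability(icc_by_pair, selected_pairs):
    """Average the pairwise ICCs over the selected run pairs."""
    return sum(icc_by_pair[p] for p in selected_pairs) / len(selected_pairs)

def label(icc):
    """Rule-of-thumb category (boundary handling at 0.4 and 0.6 is assumed)."""
    if icc < 0.4:
        return "poor"
    if icc < 0.6:
        return "fair"
    if icc <= 0.8:
        return "good"
    return "excellent"

# Hypothetical pairwise ICCs for one component.
toy_iccs = {p: 0.70 if p in within_pairs else 0.62 for p in pairs}
within = mean_reliability(toy_iccs, within_pairs)
between = mean_reliability(toy_iccs, between_pairs)
```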

Potential motion confounds

Head motion is known to confound rsFC measures (Power et al., 2015). In the current study, motion denoising strategies include ICA-FIX artifact removal and 24 motion parameter regression. To further rule out motion confounds in reliability estimates, we regressed mean relative RMS in each run out of rsFC measures and recalculated ICCs.
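One way to regress motion out of an rsFC measure is ordinary least squares across subjects within a run. The sketch below removes only the motion slope and keeps the run mean, which is an assumption; the text does not specify how the intercept was handled. Names and simulated values are illustrative.

```python
import numpy as np

def regress_out_motion(values, motion):
    """Remove the linear effect of mean relative RMS from one run's rsFC values.

    values, motion: (n_subjects,) arrays for a single run.
    The run mean is preserved so that run offsets still enter the ICC (assumption).
    """
    motion_c = motion - motion.mean()
    slope = np.dot(motion_c, values - values.mean()) / np.dot(motion_c, motion_c)
    return values - slope * motion_c

# Simulated run: coherence contaminated by motion.
rng = np.random.default_rng(0)
rms = rng.uniform(0.05, 0.20, size=100)
coh = 0.4 + 1.5 * rms + 0.05 * rng.standard_normal(100)

adjusted = regress_out_motion(coh, rms)
```

ICCs would then be recomputed on the adjusted values from each run.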

Results

ICA-based cortical parcellations across dimensionalities are shown in Figure 1 alongside the Gordon and Glasser atlases. Increasing dimensionality resulted in finer parcellation of all canonical networks, particularly the default network.

FIG. 1.

Cortical parcellations. (a) Cortical parcellations with ICA along with the Gordon and Glasser atlases. ICA components are shown as nonoverlapping through which each grayordinate was assigned to the component that had the largest value for that grayordinate. (b) Parcellation by canonical networks. d, ICA dimensionality; ICA, independent component analysis; TP/OFC, temporal pole/orbitofrontal cortex.

Coherence reliability

The distributions of coherence reliability improved with higher normalized thresholds (Supplementary Fig. S2). Below we report coherence reliability when normalized threshold was 0.5, balancing gain in reliability and inclusiveness of component maps. We also report between-session reliability unless otherwise specified as it generalizes to most research designs. Reliability estimates were not systematically biased by image reconstruction version (Supplementary Fig. S3a) or family structure in the sample (Supplementary Fig. S4a).

Coherence reliability within each canonical network is summarized in Figure 2. In the following discussion, we compare parcellations based on descriptive statistics. Formal statistical testing was not performed for two reasons: (1) ICCs derived from parcellations that include the same grayordinates are neither fully independent nor paired, violating the assumptions of most parametric and nonparametric tests; and (2) significant test results do not speak to differences between distributions in spread and shape, nor do they indicate that differences are of practical significance.

FIG. 2.

Coherence reliability within canonical cortical networks. Each point in the sina plots (Sidiropoulos et al., 2018) corresponds to one component or region. Points are jittered across the x-axis to form the shapes of the density distributions. Boxplot outliers are more than 1.5 interquartile range above the 75th or below the 25th percentile. See main text for important differences in coherence definition for ICA and dual regression versus the Gordon and Glasser atlases. Shown here: between-session reliability of 15-min runs, normalized threshold = 0.5. ICC, intraclass correlation coefficient.

Reliability in the frontoparietal, cingulo-opercular, dorsal attention, and default networks ranged from “fair” to “excellent,” with most ICCs falling into the “good” range. In these cognitive networks, reliability improved considerably with increased dimensionality until plateauing at 150 or 200. Overall, ICA and dual regression resulted in higher reliability than the Gordon and Glasser atlases.

Reliability in the visual and somatomotor networks ranged mostly from “fair” to “good” and reached “excellent” in rare cases. Reliability improved mildly with increased dimensionality, although the improvement was nearly negligible after dimensionality reached 250. Overall, ICA and dual regression resulted in slightly higher reliability than the Gordon and Glasser atlases.

Reliability in the TP/OFC was highest in the Gordon atlas, which was mostly “fair” and “good.” ICA reliability showed little change with increased dimensionality (except for higher reliability when dimensionality was 100) and was lower than the Gordon and Glasser atlases overall.

Connectivity reliability

Full correlation resulted in the best overall connectivity reliability among the three correlation methods (Supplementary Fig. S5a). Connectivity derived with partial correlation was highly restricted in range and was only more reliable than full correlation for a small portion of connections with nearly zero “effective connectivity” (Supplementary Fig. S5b). Thus, below we report between-session reliability of full correlation connectivity. Reliability estimates were not systematically biased by image reconstruction version (Supplementary Fig. S3b) or family structure in the sample (Supplementary Fig. S4b).

Connectivity reliability of connections within each canonical network is summarized in Figure 3. Connectivity reliability of connections between two canonical networks is summarized in Supplementary Figure S6. We used the same rationale for interpretation as explained for coherence reliability.

FIG. 3.

Connectivity reliability within canonical cortical networks. Reliability of connections within the same canonical network is shown. For connections between two canonical networks, see Supplementary Figure S6. Each point in the sina plots (Sidiropoulos et al., 2018) corresponds to one connection. Points are jittered across the x-axis to form the shapes of the density distributions. Boxplot outliers are more than 1.5 interquartile range above the 75th or below the 25th percentile. See main text for important differences in connectivity definition for ICA and dual regression versus the Gordon and Glasser atlases. Shown here: between-session reliability of 15-min runs, correlation method = full correlation.

Reliability in the frontoparietal and default networks ranged from “poor” to “excellent,” with most ICCs categorized as “fair” and “good.” Reliability improved considerably as dimensionality increased from 50 to 100 and 100 to 150 and plateaued afterward. ICA and dual regression offered better reliability than the Gordon and Glasser atlases.

Reliability in the cingulo-opercular and dorsal attention networks ranged from “poor” to “good,” with most ICCs falling into the “fair” range. Reliability improved somewhat as dimensionality increased until 200 and plateaued. ICA and dual regression and the Gordon and Glasser atlases offered comparable mean reliability. In particular for the cingulo-opercular network, ICA reliability had a wider spread with more “poor” as well as “good” connections.

Reliability in the visual network ranged from “poor” to “good” and was mostly “fair.” Increases in dimensionality led to noticeable increases in the range of reliability distributions, while changes in central tendency were minimal. Overall, the Glasser atlas offered the best reliability in this network by a narrow margin.

Reliability in the somatomotor network was highest in the Gordon and Glasser atlases, where most ICCs were “fair.” ICA reliability changed minimally with dimensionality and at least half of the ICCs were “poor.”

TP/OFC was the only network with mostly “poor” reliability irrespective of parcellation method.

Reliability and population mean and standard deviation

Based on previous findings that suggest higher reliability in statistically nonzero connections (Shehzad et al., 2009), we examined the relationship between reliability and the population mean and standard deviation of rsFC to test the possibility that reliability can be readily predicted from such descriptive information. We computed the population standard deviation as the standard deviation of average subject rsFC over four runs.
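The population descriptives can be computed as in the short sketch below. The use of the sample standard deviation (ddof = 1) is an assumption, and the function name and toy values are illustrative.

```python
import numpy as np

def population_stats(rsfc):
    """Population mean and standard deviation of one rsFC measure.

    rsfc: (n_subjects, n_runs) array; values are first averaged over runs
    within each subject, then summarized across subjects.
    """
    subject_average = rsfc.mean(axis=1)
    return float(subject_average.mean()), float(subject_average.std(ddof=1))

# Toy example: 3 subjects, 2 runs.
toy = np.array([[1.0, 3.0],
                [2.0, 4.0],
                [6.0, 8.0]])
pop_mean, pop_sd = population_stats(toy)
```

Repeating this for every component (or connection) gives one mean and one standard deviation per measure, which can then be correlated with the corresponding ICCs.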

For coherence, the population mean and standard deviation were weakly and mostly nonsignificantly correlated with reliability, except for a correlation of 0.61 between the population mean and reliability for ICA at a dimensionality of 25 and a correlation of 0.40 at a dimensionality of 50 (Table 1 and Fig. 4a, b).

FIG. 4.

Reliability and population mean and standard deviation. (a) Coherence reliability did not correlate with population mean. (b) Coherence reliability did not correlate with population standard deviation. (c) Connectivity reliability and population mean followed a “funnel”-shaped relationship. Black points represent connections between two canonical networks. Bottom left: Connectivity reliability and absolute population mean. (d) Connectivity reliability highly positively correlated with population standard deviation. (e) Distribution of connections with low connectivity and “poor” reliability in canonical networks. Each cell in the heatmap represents the proportion of connections between two canonical networks that fell into the corresponding type. (f) Distribution of connections with low connectivity and at least “good” reliability in canonical networks. 1. frontoparietal; 2. cingulo-opercular; 3. dorsal attention; 4. default; 5. visual; 6. somatomotor; 7. TP/OFC. Shown here: between-session reliability of 15-min runs, d = 200. For coherence, normalized threshold = 0.5. For connectivity, correlation method = full correlation. Conclusions were similar at other ICA dimensionalities. Patterns in (e, f) were similar when the Fisher's z threshold for weak connections was set at 0.05 or 0.2.

Table 1.

Correlation Between Resting-State Functional Connectivity Reliability and Population Mean and Standard Deviation

Measure	Correlate of reliability	d25	d50	d100	d150	d200	d250	d300	d350	Gordon	Glasser
Coherence	Population mean	0.61^*	0.40^*	0.03	0.02	−0.03	−0.08	−0.06	−0.09	0.29^***	0.10
Coherence	Population standard deviation	0.31	0.30	−0.04	0.02	0.01	−0.01	0.05	0.04	−0.19^**	−0.23^***
Connectivity	Population mean	0.23^***	0.41^***	0.37^***	0.38^***	0.39^***	0.41^***	0.42^***	0.42^***	0.63^***	0.64^***
Connectivity	Population standard deviation	0.70^***	0.82^***	0.81^***	0.85^***	0.85^***	0.86^***	0.87^***	0.87^***	0.81^***	0.78^***

For coherence, correlation was computed across all components/regions. For connectivity, correlation was computed across all connections. Shown here: between-session reliability of 15-min runs, normalized threshold for coherence = 0.5, correlation method for connectivity = full correlation.

^* p < 0.05; ^** p < 0.01; ^*** p < 0.001. All p values were Bonferroni adjusted for 10 comparisons.

d, dimensionality.

For connectivity, reliability was highly positively correlated with population standard deviation for all parcellations (Table 1 and Fig. 4d). The relationship between reliability and population mean for ICA parcellations resembled a “funnel” shape (Fig. 4c): while reliability tended to be higher as connections became stronger (positive or negative), reliability of weak connections ranged from “poor” to almost “excellent.” Connections that were weak on the population level (defined heuristically as |Fisher's z transformed full correlation| < 0.1, or roughly less than 1% shared variance) and measured with “poor” reliability arguably reflected a lack of functional relationship between components (Fig. 4e). These included connections with the TP/OFC, connections of the visual and somatomotor networks with the cognitive networks, and connections between the visual and somatomotor networks. On the contrary, a number of connections within the cognitive networks were weak on the population level yet measured with at least “good” reliability (Fig. 4f). As high reliability indicated high interindividual variability (Fig. 4d), these connections were characterized by considerable heterogeneity within the population, so that they may be positive, negative, or zero for different individuals and reliably measured. This finding calls into question a common practice of only analyzing statistically significant connections. When dimensionality was 200, 40.2% of the connections with “good” reliability and above were not statistically significant at a significance level of 0.05 (uncorrected).
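The heuristic for weak connections and the two groups discussed above can be made concrete as follows. The function name and the toy values are assumptions; the ICC cutoffs follow the rule of thumb given in the Reliability section.

```python
import numpy as np

WEAK_Z = 0.1
# Shared variance at the threshold: r = tanh(z), shared variance = r squared.
shared_variance = np.tanh(WEAK_Z) ** 2

def classify_weak(mean_z, icc, weak_z=WEAK_Z):
    """Flag weak connections measured with 'poor' or with at least 'good' reliability."""
    weak = np.abs(mean_z) < weak_z
    weak_poor = weak & (icc < 0.4)
    weak_good = weak & (icc >= 0.6)
    return weak_poor, weak_good

# Hypothetical population-mean connectivity (Fisher's z) and ICCs for 4 connections.
mean_z = np.array([0.05, -0.08, 0.30, -0.02])
icc = np.array([0.20, 0.70, 0.90, 0.50])
weak_poor, weak_good = classify_weak(mean_z, icc)
```

The threshold of 0.1 corresponds to a correlation of about 0.0997, that is, just under 1% shared variance.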

Reliability loss from within- to between-session

To separate the effect of moving from within- to between-session comparisons from the effect of phase encoding direction, here we estimated between-session reliability only from runs with opposite phase encoding directions (i.e., the mean of REST1_LR vs. REST2_RL and REST1_RL vs. REST2_LR), matching the within-session pairs, which always have opposite phase encoding directions. When ICA dimensionality was 200 (Supplementary Fig. S7), loss in coherence reliability ranged from 0 to 0.16 (mean = 0.07, standard deviation = 0.03); and loss in connectivity reliability ranged from −0.09 to 0.16 (mean = 0.05, standard deviation = 0.03). Patterns were similar at other dimensionalities as well as for the Gordon and Glasser atlases.

Reliability when controlling for motion

As shown in Supplementary Figure S8a and b, some coherence and connectivity estimates remained correlated with mean relative RMS despite motion denoising strategies. This is expected, and likely reflects widespread artifacts in the BOLD signal due to respiration accompanying motion (Power et al., 2018). When controlling for mean relative RMS, coherence reliability changed minimally for most components yet increased by almost 0.2 for a small number of components (Supplementary Fig. S8c, e). Visual inspection indicated that the components with more pronounced increase in coherence reliability fell in the prefrontal, TP, and temporoparietal junction regions across dimensionalities. On the contrary, changes in connectivity reliability were mostly negligible (Supplementary Fig. S8d, f).

Discussion

To establish principled guidelines for the field, we examined the test–retest reliability of two rsFC measures on the cortical surface derived with ICA and dual regression and compared it with two existing binary, nonoverlapping cortical atlases. Canonical networks differed in their level of reliability. ICA dimensionality affected reliability and such effects varied by network. Canonical networks differed in whether they were more reliably measured with ICA and dual regression or the Gordon and Glasser atlases. For connectivity, but not coherence, reliability could be predicted by population standard deviation estimated from a single run. Contrary to hypotheses, neither connection strength nor statistical significance was a good predictor of connectivity reliability. By using a large rsfMRI data set and state-of-the-art image processing methods, this study updates previous knowledge on rsFC reliability derived with ICA and dual regression.

Canonical networks varied in reliability

Consistent with previous studies (Mejia et al., 2016; Mueller et al., 2013, 2015; Shah et al., 2016; Somandepalli et al., 2015; Zuo and Xing, 2014; Zuo et al., 2010), we found varied reliability across canonical networks. The cognitive networks, especially the frontoparietal and default networks, were measured with the highest reliability, averaging at “good” for coherence and “fair” to “good” for connectivity. The visual and somatomotor networks were measured with intermediate reliability, ranging from “fair” to “good” for coherence and mostly “fair” for connectivity. The TP/OFC network tended to be the least reliably measured: coherence reliability was typically “fair” to “good” and connectivity reliability was mostly “poor.” Notably, many components in the TP/OFC did not show meaningful connections within or outside of this network (Fig. 4e).

The neurometric advantage of the cognitive networks was not restricted to a specific rsFC approach (Mueller et al., 2015; Shah et al., 2016; Somandepalli et al., 2015; Zuo and Xing, 2014). High reliability reflects a combination of high interindividual variability and low intraindividual variability. Compared with other networks, connections in the cognitive networks had higher population variance (Fig. 4d) and less loss in reliability from within- to between-session (Supplementary Fig. S7b), which may reflect abundant individual difference information and less day-to-day variability. In contrast, the relatively lower reliability in the visual and somatomotor networks was mainly a result of higher intraindividual variability (Supplementary Fig. S7), consistent with a previous report (Laumann et al., 2015). Lastly, poor connectivity reliability in the TP/OFC likely reflects susceptibility artifacts in this region. Fine parcellation in this network is particularly challenging, and ICA and other algorithms alike yielded components that, although spatially reliable, did not show meaningful engagement in the brain's functional organization.
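The dependence of reliability on the two variance sources can be made explicit with a simplified one-way form of the intraclass correlation. This is an illustrative approximation, and the symbols are introduced here rather than taken from the analysis:

```latex
\mathrm{ICC} \approx
  \frac{\sigma^2_{\text{between}}}
       {\sigma^2_{\text{between}} + \sigma^2_{\text{within}}}
```

Here \sigma^2_{\text{between}} is the interindividual variance and \sigma^2_{\text{within}} is the intraindividual (run-to-run or day-to-day) variance. For a fixed \sigma^2_{\text{within}} = 0.5, a connection with \sigma^2_{\text{between}} = 1 has an ICC of about 0.67, whereas one with \sigma^2_{\text{between}} = 0.25 has an ICC of about 0.33. Under this form, either higher population variance or lower within-person variability raises reliability.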

ICA dimensionality had different effects on reliability across canonical networks

Canonical networks responded differently to increases in ICA dimensionality. The cognitive networks benefited from increased dimensionality until at least 150, while the reliability gain for the visual, somatomotor, and TP/OFC networks was small. An ICA dimensionality of 150 or above appeared to maximize rsFC reliability on the cortical surface.

That a finer parcellation can be measured as reliably as, or more reliably than, its coarser counterpart is encouraging and may reflect the advantage of the current large data set. Indeed, the optimal dimensionality in the current data set surpasses that recommended in previous studies (Abou-Elseoud et al., 2010; Ray et al., 2013). Future studies are needed to verify the validity of these high-dimension parcellations, especially their utility in investigating brain/behavior relationships.

Comparing ICA and dual regression with the Gordon and Glasser atlases

ICA and dual regression and the Gordon and Glasser atlases showed different strengths in measuring canonical networks. ICA and dual regression at a dimensionality of at least 150 typically led to higher reliability in the cognitive networks. The Gordon and Glasser atlases led to higher connectivity reliability in the somatomotor network. Performance was on par for the visual network and equally poor for the TP/OFC.

One apparent source of the differences described above is how coherence and connectivity were calculated for ICA and dual regression versus the binary, nonoverlapping Gordon and Glasser atlases. To examine this further, we performed new analyses in which we forced the ICA spatial maps into (1) binary, overlapping components, with a normalized threshold of 0.5; and (2) binary, nonoverlapping components, by assigning each grayordinate to the component with the largest value for that grayordinate (i.e., Fig. 1a). Coherence and connectivity of these binary components were then computed in the same way as for the Gordon and Glasser atlases, and reliability comparisons are shown in Supplementary Figures S9–S12. Briefly, (1) and (2) had similar reliability patterns that deviated from those of ICA and dual regression. For coherence, ICA reliability neither increased with dimensionality nor surpassed that of the Gordon and Glasser atlases. For connectivity, ICA reliability in the cognitive networks increased with dimensionality with less steep slopes and outperformed the Gordon and Glasser atlases in the frontoparietal and default networks by a smaller margin. Interestingly, ICA reliability in the somatomotor network improved and was on par with the Gordon and Glasser atlases.
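The two binarization schemes can be sketched as follows. This is a minimal illustration and not the original implementation. It assumes that the spatial maps are stored as a components-by-grayordinates array, that each component has a positive peak value, and that "normalized" means each map is scaled by its own peak value, which is a guess at the normalization. The function names are hypothetical.

```python
import numpy as np


def binarize_overlapping(maps: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Scheme (1): threshold each peak-normalized map; components may overlap.

    maps has shape (components, grayordinates).
    """
    peak = maps.max(axis=1, keepdims=True)  # assumed normalization: per-map peak
    return (maps / peak) > threshold


def binarize_winner_take_all(maps: np.ndarray) -> np.ndarray:
    """Scheme (2): assign each grayordinate to the component with the largest value."""
    winners = maps.argmax(axis=0)
    return np.arange(maps.shape[0])[:, None] == winners[None, :]


# Toy example with 2 components and 3 grayordinates.
maps = np.array([
    [1.0, 0.6, 0.1],
    [0.2, 0.8, 0.9],
])
print(binarize_overlapping(maps).astype(int))
print(binarize_winner_take_all(maps).astype(int))
```

In the toy example the middle grayordinate belongs to both components under scheme (1) but only to the second component under scheme (2), which is the only difference between overlapping and nonoverlapping parcels in this sketch.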

Taken together, dual regression on nonbinary, overlapping ICA spatial maps contributed to the superiority of ICA-derived reliability over the Gordon and Glasser atlases. Presumably, using dual regression allowed for individualized mapping of group-level components, resulting in more reliable measures. Consistent with this interpretation, this effect was more obvious at high dimensionalities, where misalignment of components is more consequential. In addition, using nonbinary components likely improved reliability by allowing a weighted sum of rsFC information, and this benefit was most noticeable for coherence. In contrast, the similarity between the reliability of (1) and (2) suggests that reliability was not greatly altered by allowing components to overlap. Notably, Glasser and colleagues (2016) created a classifier that allows customized parcellation of an individual brain according to the Glasser atlas. While such customization may improve reliability, this classifier was not publicly available at the time of this project.

Predicting reliability from population mean and standard deviation

For coherence, population mean and standard deviation had limited utility for predicting reliability. For connectivity, reliability was highly positively correlated with population standard deviation. Thus, the potential of a connection to be reliably measured depended largely on how variable it was across the population. Previous research also suggested that the reliability of connections can be inferred from their strengths, and that those with significantly nonzero connectivity were more reliable (Shehzad et al., 2009). We, however, argue for a more nuanced approach to this issue. Importantly, we call attention to connections that are weak (|full correlation connectivity| < 0.1) on the population level and may fail to reach significance even in the current large sample, yet were measured with “good” reliability or higher owing to considerable interindividual variability. These connections, mostly found in the cognitive networks, may be critical to understanding individual differences, such as modulating mechanisms that are present in some but not other subjects.

Other analytical decisions

In addition to ICA dimensionality, we examined the effects of other analytical decisions in ICA and dual regression on reliability. For coherence, reliability tended to increase with normalized threshold, suggesting that components were better defined as spatial maps were more restricted to their functional cores. For connectivity, the reliability of full correlation was systematically higher than that of partial correlation. This observation is consistent with the speculation by Smith and colleagues (2011) that partial correlation may perform worse than full correlation when the total number of components is large (e.g., >50), as the number of third-party connections to partial out grows quadratically and serves to detract from real connectivity. On a related note, it is possible that our tuning parameters for the regularization algorithms were not optimal. To conclude, although partial correlation is promising for characterizing effective connectivity, more work is necessary to find the proper regularization parameters and schemes.
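The distinction between full and partial correlation can be illustrated with a small sketch. It computes partial correlation from the inverse covariance (precision) matrix and uses a simple ridge term as a stand-in for regularization. The ridge penalty, its value, and the function names are assumptions for illustration and do not represent the regularization algorithms used in this study.

```python
import numpy as np


def full_correlation(ts: np.ndarray) -> np.ndarray:
    """Pearson correlation between component time series (time x components)."""
    return np.corrcoef(ts, rowvar=False)


def partial_correlation(ts: np.ndarray, ridge: float = 0.1) -> np.ndarray:
    """Partial correlation from a ridge-regularized precision matrix.

    ridge is a hypothetical tuning parameter added to the covariance diagonal.
    """
    cov = np.cov(ts, rowvar=False)
    precision = np.linalg.inv(cov + ridge * np.eye(cov.shape[0]))
    scale = np.sqrt(np.diag(precision))
    partial = -precision / np.outer(scale, scale)
    np.fill_diagonal(partial, 1.0)
    return partial


# Chain A -> B -> C: A and C are related only through B.
rng = np.random.default_rng(2)
a = rng.normal(size=5000)
b = a + rng.normal(size=5000)
c = b + rng.normal(size=5000)
ts = np.column_stack([a, b, c])

print(f"full A-C: {full_correlation(ts)[0, 2]:.2f}")
print(f"partial A-C: {partial_correlation(ts, ridge=0.01)[0, 2]:.2f}")
```

In this chain, the full correlation between A and C is sizable, whereas the partial correlation is close to zero once B is accounted for. With many components, every pairwise estimate must account for all remaining components, which makes the estimate noisier and more sensitive to the choice of regularization.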

Limitations

Results of the current study should be interpreted with several limitations in mind. First, findings may depend on important characteristics of the HCP data set and may not generalize to non-HCP data sets. Nonetheless, the HCP and HCP-style data are increasingly used by researchers, supporting the relevance of these findings. Second, while ICA typically results in some artifactual components, we included all components in our analysis, as guidelines for artifact identification are not yet available and pre-exclusion may be arbitrary. Future research may examine the applicability of previously developed classifiers (De Martino et al., 2007) in the CIFTI space and at high ICA dimensionalities. Third, global noise in rsfMRI related to respiration and accompanying head motion is increasingly acknowledged (Power et al., 2018) yet was not removed in this study. At the time of this study, this topic remained heavily debated (Glasser et al., 2018, 2019; Power, 2019) and algorithms such as temporal ICA (Glasser et al., 2018) were unavailable. However, we note that our reliability estimates remained mostly unchanged when controlling for mean relative RMS in rsFC, offering some confidence that the presence of global noise did not greatly impact reliability estimates. Lastly, while an important prerequisite, reliability is not the same as validation or behavioral prediction. Bijsterbosch and colleagues (2018) provide one example of comparing various parcellations in predicting behavior in the HCP (although ICA in that work was performed on the whole brain and differed from the surface-constrained ICA in the present work).

Conclusion

The present study characterized the test–retest reliability of rsFC on the cortical surface derived with ICA and dual regression at various dimensionalities and compared it with two existing binary, nonoverlapping cortical atlases. Findings provide guidance for variable selection and parameter optimization for rsFC analysis, and importantly highlight that these decisions are hypothesis-dependent. In particular, network assignment is an important factor that affects the general level of reliability, optimal ICA dimensionality, and whether ICA or other parcellations are preferred. For general purposes, cortical ICA with a dimensionality of 150 may provide an optimal balance between parcellation fineness, reliability, and burden for multiple comparison correction. Reliability research is critical as neuroimaging studies continue to adopt larger sample sizes and more advanced methodology to study individual differences such as psychopathology.

Footnotes

Acknowledgments

Data were provided by the Human Connectome Project, WU-Minn Consortium (Principal Investigators: David Van Essen and Kamil Ugurbil; 1U54MH091657) funded by the 16 NIH Institutes and Centers that support the NIH Blueprint for Neuroscience Research; and by the McDonnell Center for Systems Neuroscience at Washington University. Y.M. was supported by a University of Minnesota Informatics Institute On the Horizon Grant to A.W.M. The authors acknowledge the Minnesota Supercomputing Institute at the University of Minnesota for providing resources that contributed to the research results reported within this article.

Data Availability

HCP data are freely available from https://db.humanconnectome.org. Data from many figures in this study will be freely available (upon article acceptance) at https://balsa.wustl.edu/study/show/MxKx0.

Author Disclosure Statement

No competing financial interests exist.

Funding Information

This work was supported by an On the Horizon Award from the University of Minnesota Informatics Institute and a Conte Center grant P50MH119569.

Supplementary Material

Supplementary Figure S1

Supplementary Figure S2

Supplementary Figure S3

Supplementary Figure S4

Supplementary Figure S5

Supplementary Figure S6

Supplementary Figure S7

Supplementary Figure S8

Supplementary Figure S9

Supplementary Figure S10

Supplementary Figure S11

Supplementary Figure S12

References

Abou-Elseoud, Starck, Remes, et al. 2010. The effect of model order selection in group PICA. Hum Brain Mapp, 31:1207–1216.

Abou-Elseoud, Littow, Remes, et al. 2011. Group-ICA model order highlights patterns of functional brain connectivity. Front Syst Neurosci, 5:37.

Abram, Wisner, Grazioplene, et al. 2015. Functional coherence of insula networks is associated with externalizing behavior. J Abnorm Psychol, 124:1079–1091.

Beckmann, Smith. 2004. Probabilistic independent component analysis for functional magnetic resonance imaging. IEEE Trans Med Imaging, 23:137–152.

Beckmann, Filippini. 2009. Group comparison of resting-state FMRI data using multi-subject ICA and dual regression. Neuroimage, 47:S148.

Bijsterbosch, Woolrich, Glasser, et al. 2018. The relationship between spatial configuration and functional connectivity of brain regions. Elife, 7:e32992.

Biswal, Mennes, Zuo, et al. 2010. Toward discovery science of human brain function. Proc Natl Acad Sci USA, 107:4734–4739.

Biswal, Zerrin, Haughton, et al. 1995. Functional connectivity in the motor cortex of resting human brain using echo-planar MRI. Magn Reson Med, 34:537–541.

Blain, Grazioplene, Ma, et al. 2020. Toward a neural model of the openness-psychoticism dimension: functional connectivity in the default and frontoparietal control networks. Schizophr Bull, 46:540–551.

Choe, Jones, Joel, et al. 2015. Reproducibility and temporal structure in weekly resting-state fMRI over a period of 3.5 years. PLoS One, 10:e0140134.

Cicchetti, Sparrow. 1981. Developing criteria for establishing interrater reliability of specific items: applications to assessment of adaptive behavior. Am J Ment Defic, 86:127–137.

De Martino, Gentile, Esposito, et al. 2007. Classification of fMRI independent components using IC-fingerprints and support vector machine classifiers. Neuroimage, 34:177–194.

Dice. 1945. Measures of the amount of ecologic association between species. Ecology, 26:297–302.

Dubois, Galdi, Han, et al. 2017. Predicting personality traits from resting-state fMRI. bioRxiv, 2017:215129.

Dubois, Galdi, Paul, et al. 2018. A distributed brain network predicts general intelligence from resting-state human neuroimaging data. bioRxiv, 2018:257865.

Finn, Shen, Scheinost, et al. 2015. Functional connectome fingerprinting: identifying individuals using patterns of brain connectivity. Nat Neurosci, 18:1664–1674.

Geerligs, Rubinov, Henson. 2015. State and trait components of functional connectivity: individual differences vary with mental state. J Neurosci, 35:13949–13961.

Glasser, Coalson, Bijsterbosch, et al. 2018. Using temporal ICA to selectively remove global noise while preserving global signal in functional MRI data. Neuroimage, 181:692–717.

Glasser, Coalson, Bijsterbosch, et al. 2019. Classification of temporal ICA components for separating global noise from fMRI data: reply to power. Neuroimage, 197:–438.

Glasser, Coalson, Robinson, et al. 2016. A multi-modal parcellation of human cerebral cortex. Nature, 536:171–178.

Glasser, Sotiropoulos, Wilson, et al. 2013. The minimal preprocessing pipelines for the Human Connectome Project. Neuroimage, 80:105–124.

Gordon, Laumann, Adeyemo, et al. 2014. Generation and evaluation of a cortical area parcellation from resting-state correlations. Cereb Cortex, 26:288–303.

Griffanti, Salimi-Khorshidi, Beckmann, et al. 2014. ICA-based artefact removal and accelerated fMRI acquisition for improved resting state network imaging. Neuroimage, 95:232–247.

Jenkinson, Bannister, Brady, et al. 2002. Improved optimisation for the robust and accurate linear registration and motion correction of brain images. Neuroimage, 17:825–841.

Laird, Fox, Eickhoff, et al. 2011. Behavioral interpretations of intrinsic connectivity networks. J Cogn Neurosci, 23:4022–4037.

Laumann, Gordon, Adeyemo, et al. 2015. Functional system and areal organization of a highly sampled individual human brain. Neuron, 87:657–670.

Loken, Gelman. 2017. Measurement error and the replication crisis. Science, 355:584–585.

Marchitelli, Collignon, Jovicich. 2017. Test–retest reproducibility of the intrinsic default mode network: influence of functional magnetic resonance imaging slice-order acquisition and head-motion correction methods. Brain Connect, 7:69–83.

Marrelec, Krainik, Duffau, et al. 2006. Partial correlation for functional brain interactivity investigation in functional MRI. Neuroimage, 32:228–237.

McGraw, Wong. 1996. Forming inferences about some intraclass correlation coefficients. Psychol Methods, 1:30–46.

Meindl, Teipel, Elmouden, et al. 2010. Test-retest reproducibility of the default-mode network in healthy individuals. Hum Brain Mapp, 31:237–246.

Mejia, Nebel, Barber, et al. 2016. Effects of scan length and shrinkage on reliability of resting-state functional connectivity in the human connectome project. arXiv Preprint, 2016:06284.

Mennes, Biswal, Castellanos, et al. 2013. Making data sharing work: the FCP/INDI experience. Neuroimage, 82:683–691.

Mueller, Wang, Fox, et al. 2013. Individual variability in functional connectivity architecture of the human brain. Neuron, 77:586–595.

Mueller, Wang, Fox, et al. 2015. Reliability correction for functional connectivity: theory and implementation. Hum Brain Mapp, 36:4664–4680.

Nickerson, Smith, Öngür, et al. 2017. Using dual regression to investigate network shape and amplitude in functional connectivity analyses. Front Neurosci, 11:115.

Noble, Spann, Tokoglu, et al. 2017. Influences on the test-retest reliability of functional connectivity MRI and its relationship with behavioral utility. Cereb Cortex, 27:5415–5429.

Poppe, Wisner, Atluri, et al. 2013. Toward a neurometric foundation for probabilistic independent component analysis of fMRI data. Cogn Affect Behav Neurosci, 13:641–659.

Power JD. 2019. Temporal ICA has not properly separated global fMRI signals: a comment on Glasser et al. (2018). Neuroimage, 197:650–651.

Power, Plitt, Gotts, et al. 2018. Ridding fMRI data of motion-related influences: removal of signals with distinct spatial and physical bases in multiecho data. Proc Natl Acad Sci USA, 115:E2105–E2114.

Power, Schlaggar, Petersen. 2015. Recent progress and outstanding issues in motion correction in resting state fMRI. Neuroimage, 105:536–551.

Pruim RHR, Mennes, van Rooij, et al. 2015. ICA-AROMA: a robust ICA-based strategy for removing motion artifacts from fMRI data. Neuroimage, 112:267–277.

Ray, McKay, Fox, et al. 2013. ICA model order selection of task co-activation networks. Front Neurosci, 7:237.

Robinson, Jbabdi, Glasser, et al. 2014. MSM: a new flexible framework for multimodal surface matching. Neuroimage, 100:414–426.

Salarian. 2016. Intraclass Correlation Coefficient (ICC) [WWW Document]. www.mathworks.com/matlabcentral/fileexchange/22099-intraclass-correlation-coefficient-icc Last accessed August 2, 2019.

Salimi-Khorshidi, Douaud, Beckmann, et al. 2014. Automatic denoising of functional MRI data: combining independent component analysis and hierarchical fusion of classifiers. Neuroimage, 90:449–468.

Särelä, Vigário. 2003. Overlearning in marginal distribution-based ICA: analysis and solutions. J Mach Learn Res, 4:1447–1469.

Schaefer, Kong, Gordon, et al. 2018. Local-global parcellation of the human cerebral cortex from intrinsic functional connectivity MRI. Cereb Cortex, 28:3095–3114.

Shah, Cramer, Ferguson, et al. 2016. Reliability and reproducibility of individual differences in functional connectivity acquired during task and resting state. Brain Behav, 6:1–15.

Shehzad, Kelly AMC, Reiss, et al. 2009. The resting brain: unconstrained yet reliable. Cereb Cortex, 19:2209–2229.

Shirer, Jiang, Price, et al. 2015. Optimization of rs-fMRI pre-processing for enhanced signal-noise separation, test-retest reliability, and group discrimination. Neuroimage, 117:67–79.

Sidiropoulos, Sohi, Pedersen, et al. 2018. SinaPlot: an enhanced chart for simple and truthful representation of single observations over multiple classes. J Comput Graph Stat, 27:673–676.

Smith, Beckmann, Andersson, et al. 2013. Resting-state fMRI in the Human Connectome Project. Neuroimage, 80:144–168.

Smith, Fox, Miller, et al. 2009. Correspondence of the brain's functional architecture during activation and rest. Proc Natl Acad Sci USA, 106:13040–13045.

Smith, Hyvärinen, Varoquaux, et al. 2014. Group-PCA for very large fMRI datasets. Neuroimage, 101:738–749.

Smith, Miller, Salimi-Khorshidi, et al. 2011. Network modelling methods for FMRI. Neuroimage, 54:875–891.

Smith, Nichols, Vidaurre, et al. 2015. A positive-negative mode of population covariation links brain connectivity, demographics and behavior. Nat Neurosci, 18:1565–1567.

Somandepalli, Kelly, Reiss, et al. 2015. Short-term test–retest reliability of resting state fMRI metrics in children with and without attention-deficit/hyperactivity disorder. Dev Cogn Neurosci, 15:83–93.

Thatcher, Krause, Hrybyk. 1986. Cortico-cortical associations and EEG coherence: a two-compartmental model. Electroencephalogr Clin Neurophysiol, 64:123–143.

Thompson, Stein, Medland, et al. 2014. The ENIGMA Consortium: large-scale collaborative analyses of neuroimaging and genetic data. Brain Imaging Behav, 8:153–182.

Van Essen, Smith, Barch, et al. 2013. The WU-Minn Human Connectome Project: an overview. Neuroimage, 80:62–79.

Wisner, Atluri, Lim, et al. 2013. Neurometrics of intrinsic connectivity networks at rest using fMRI: retest reliability and cross-validation using a meta-level method. Neuroimage, 76:236–251.

Yeo, Krienen, Chee MWL, et al. 2014. Estimates of segregation and overlap of functional connectivity networks in the human cerebral cortex. Neuroimage, 88:212–227.

Yeo, Krienen, Sepulcre, et al. 2011. The organization of the human cerebral cortex estimated by intrinsic functional connectivity. J Neurophysiol, 106:1125–1165.

Zhang, Baum, Adduru, et al. 2018. Test-retest reliability of dynamic functional connectivity in resting state fMRI. Neuroimage, 183:907–918.

Zuo, Kelly, Adelstein, et al. 2010. Reliable intrinsic connectivity networks: test-retest evaluation using ICA and dual regression approach. Neuroimage, 49:2163–2177.

Zuo, Xing. 2014. Test-retest reliabilities of resting-state FMRI measurements in human brain functional connectomics: a systems neuroscience perspective. Neurosci Biobehav Rev, 45:100–118.
