Hierarchical Organization of Functional Brain Networks Revealed by Hybrid Spatiotemporal Deep Learning

Abstract

The hierarchical organization of brain function has long been an established concept in neuroscience; however, it has rarely been demonstrated how such hierarchical macroscale functional networks are actually organized in the human brain. To address this question, we propose a novel methodology that provides evidence of the hierarchical organization of functional brain networks. This article introduces hybrid spatiotemporal deep learning (HSDL), which jointly uses deep belief networks (DBNs) and a deep least absolute shrinkage and selection operator (LASSO) to reveal the temporal hierarchical features and spatial hierarchical maps of brain networks based on the Human Connectome Project 900 functional magnetic resonance imaging (fMRI) data sets. Briefly, the key idea of HSDL is to extract the weights between two adjacent layers of DBNs, which are then treated as the hierarchical dictionaries for deep LASSO to identify the corresponding hierarchical spatial maps. Our results demonstrate that both spatial and temporal aspects of dozens of functional networks exhibit multiscale properties that can be well characterized and interpreted based on existing computational tools and neuroscience knowledge. The proposed hybrid deep model offers a first opportunity to reveal the potential hierarchical organization of time series and functional brain networks using task-based fMRI signals of the human brain.

Introduction

Task-based functional magnetic resonance imaging (tfMRI) has been widely used for the identification of functional brain networks (Bartels and Zeki, 2005; Beckmann and Smith, 2005; Biswal et al., 2010; Bullmore and Sporns, 2009; Duncan, 2010; Stam, 2014). Meanwhile, a variety of studies have suggested the hierarchical organization of human brain networks (Bassett et al., 2008; Biswal et al., 2010; Bullmore and Sporns, 2009; Castro et al., 2016; Gurovich et al., 2019; Kim et al., 2016; Sporns et al., 2004). It is widely believed that the architecture of brain networks is organized at different spatiotemporal scales from functional and structural perspectives (Bullmore and Sporns, 2009; Sporns et al., 2004; Stam, 2014). In the literature, a variety of computational methods have been developed to map such brain networks, for example, the general linear model (GLM), graph theories, independent component analysis (ICA), and sparse dictionary learning (Andersen et al., 1999; Calhoun et al., 2001; Lee et al., 2011, 2016; Lv et al., 2015a,b; Mckeown and Sejnowski, 1998; Zhang et al., 2017, 2018, 2019). However, these methods rely on “shallow” models, which probably cannot capture the possibly hierarchical organization and different scales of brain networks in both the temporal and spatial domains (Esteva et al., 2019; Hannun et al., 2019; Hu et al., 2018; Huang et al., 2018; Jang et al., 2017; Topol, 2019; Zhang et al., 2018).

Fortunately, the machine learning and deep learning fields have seen significant improvements in algorithms and methodologies, for example, deep belief networks (DBNs) (Bengio et al., 2012; Li et al., 2019; Plis et al., 2014; Schmidhuber, 2015; Suk et al., 2014, 2016), that provide unparalleled opportunities to quantify the properties of complex functional magnetic resonance imaging (fMRI) data across individuals and cognitive states. Recently, deep learning models such as the DBN have been shown to be an efficient technique for learning and extracting high-level and midlevel meaningful features from low-level raw data, and promising results of using deep learning for fMRI data modeling have been reported in the literature (Bengio et al., 2012; Hu et al., 2018; Huang et al., 2018; Plis et al., 2014; Schmidhuber, 2015; Suk et al., 2014, 2016; Zhang et al., 2018; Zhao et al., 2018). Thus, we are motivated to explore novel spatiotemporal deep learning models to potentially reveal and confirm the possible hierarchical organization of human brain networks.

Recently, it has been shown that the restricted Boltzmann machine (RBM) can be used to model fMRI time series signals and that it can effectively reconstruct functional brain networks with impressive accuracy (Hu et al., 2018; Huang et al., 2018). However, prior RBM models for fMRI time series (Hu et al., 2018; Huang et al., 2018) are still “shallow” and did not incorporate the advantages of deeper neural networks, for example, extracting hierarchical structures from the raw data. The DBN model is known to be highly capable of extracting hierarchical features (Li et al., 2019); however, the very high dimensionality of four-dimensional (4D) fMRI signals across hundreds of subjects, that is, the tens of millions of fMRI time series used in our studies, remains a difficult problem for effective learning of spatiotemporal brain networks and their functional dynamics.

Therefore, in this work, we use the DBN to extract the hierarchical temporal features in the first stage, and we thus achieve a relatively lower dimensionality at the group-wise level. In the second stage, we leverage sparse representation algorithms, which have already been proven effective for extracting spatial brain networks in previous studies (Andersen et al., 1999; Calhoun et al., 2001; Lv et al., 2015a), to map the spatial patterns of brain networks. Put together, we propose a novel framework named hybrid spatiotemporal deep learning (HSDL) to simultaneously infer the hierarchical temporal features and corresponding hierarchical spatial features of brain networks.

Specifically, we use the learned weights between two adjacent layers of DBN models as the hierarchical temporal dictionary for spatial least absolute shrinkage and selection operator (LASSO) regression from fMRI data. Since each LASSO model takes the temporal dictionaries at a different scale to perform the spatial network regression, the regressed spatial maps of brain networks naturally possess the property of hierarchical organization; we name this procedure deep LASSO to reflect the corresponding hierarchical spatial features. HSDL has been applied to the Human Connectome Project (HCP) 900 subjects' fMRI data set, and our extensive experimental results demonstrate that the characterized hierarchical organization of functional networks derived by our HSDL models is meaningful, consistent, and reproducible across HCP brains and across all HCP fMRI tasks we studied in this article.

Materials and Methods

Figure 1 summarizes the proposed computational framework of HSDL for the discovery and characterization of the hierarchical organization of temporal features and spatial patterns of functional brain networks. HSDL consists of two main components that model tfMRI data hierarchically. First, the DBN is used to extract hierarchical temporal features, equivalent to the weights between two adjacent DBN layers; second, the deep LASSO extracts the corresponding hierarchical spatial features based on the hierarchical temporal dictionaries. Here, the spatially aggregated tfMRI data of multiple HCP subjects are used as input of the HSDL model, represented as $S_{0} \in ℛ^{t \times (p \times n)}$, where t is the number of volumes in an fMRI time series, for example, t = 253 in the emotion task, n is the number of volumetric fMRI voxels in the 4 mm standard space developed by the Montreal Neurological Institute (MNI), and p is the number of HCP subjects. In this study, the DBN model is composed of three layers of RBM.
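To make the input layout concrete, the following minimal sketch builds a spatially aggregated matrix from per-subject matrices. It assumes that each subject's data are already arranged as a t × n array; the toy sizes, the variable names, and the use of NumPy are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np

# Toy sizes (hypothetical); the text reports t = 253 for the emotion task.
t, n, p = 6, 4, 3

rng = np.random.default_rng(0)
# One (t x n) matrix per subject: rows are time points, columns are voxels.
subject_matrices = [rng.standard_normal((t, n)) for _ in range(p)]

# Spatial aggregation: place the voxels of all subjects side by side.
s0 = np.hstack(subject_matrices)  # shape (t, p * n)

# Each column of s0 is one voxel time series. A DBN implementation would
# typically consume them as rows, one training sample per time series
# (an assumption about the training layout, not stated in the text).
training_samples = s0.T  # shape (p * n, t)
```

Under this layout, the number of training samples grows with both the number of subjects and the number of voxels, while the input dimension of the first DBN layer stays fixed at t.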

FIG. 1.

Illustration of the proposed computational framework of HSDL. (a) Spatially aggregated fMRI time series of multiple HCP subjects are used as input to train a three-layer DBN model. The DBN model extracts hierarchical temporal features, that is, the weights $ω_{1}$, $ω_{2}$, and $ω_{3}$ between two adjacent layers; (b) the weights between two adjacent layers of the DBN are extracted and treated as hierarchical temporal features, which form time series patterns of functional network activities and are afterward used as the hierarchical dictionaries for deep LASSO to perform spatial pattern regression; (b1–b3) present three examples of hierarchical temporal features/dictionaries (green, yellow, and red, respectively) from layers #1, #2, and #3; meanwhile, S₀, S₁, and S₂ are used as input matrices for deep LASSO to perform the spatial network map regression; (c) using the hierarchical dictionaries, the deep LASSO model performs the spatial regression to identify the corresponding hierarchical coefficient matrices, that is, hierarchical spatial network maps; (c1–c3) describe the single-level deep LASSOs using the dictionaries from different scales; (d) hierarchical coefficient matrices/spatial networks are derived from deep LASSO; (d1–d3) three examples of hierarchical coefficient matrices/spatial networks are visualized from layers #1, #2, and #3, respectively, that is, the green, yellow, and red brain networks rendered on the cortical surfaces. DBN, deep belief network; fMRI, functional magnetic resonance imaging; HCP, Human Connectome Project; HSDL, hybrid spatiotemporal deep learning; LASSO, least absolute shrinkage and selection operator. Color images are available online.

Data acquisition and preprocessing

In this work, we adopt tfMRI data sets from the HCP (Barch et al., 2013; Van Essen et al., 2013) to test the proposed method. Specifically, we selected the emotion and language data sets, which are representative tfMRI data sets from the HCP 900 subjects' data release. These tfMRI data sets have been released publicly and include multiple-modality MRI neuroimaging data (i.e., cortical structure, connectivity, and function); they are considered comprehensive tfMRI data sets for identifying vital functional brain areas covering a large part of the cerebral cortex (Barch et al., 2013; Van Essen et al., 2013). Basic information on the task paradigms can be found in previous reports (Barch et al., 2013; Van Essen et al., 2013).

The detailed acquisition parameters are as follows: 90 × 104 matrix, 220 mm FOV, 72 slices, TR = 0.72 s, TE = 33.1 ms, flip angle = 52°, BW = 2290 Hz/Px, in-plane FOV = 208 × 180 mm, 2.0 mm isotropic voxels. We adopted the released tfMRI data processed with the minimal preprocessing pipeline (Lv et al., 2015a,b). This pipeline includes spatial artifact removal, distortion removal, and cortical surface generation. After that, the different subjects are aligned to the standard MNI space (Glasser et al., 2013; Lv et al., 2015a,b).

DBN for hierarchical temporal feature mapping

The RBM is a probabilistic energy-based model that fits a probability distribution over a set of visible random variables to the observed data (Li et al., 2019). An RBM model consists of two layers, that is, the visible layer bound to the input and the hidden layer representing latent factors. The units in the two layers are connected by the weights (Fig. 1a). There is no within-layer connection. Inputs are modeled by RBMs via latent factors expressed through the interaction between hidden and visible variables (Hu et al., 2018; Huang et al., 2018; Zhang et al., 2019). The energy function used to update the weights is given in Equation (1), where $v_{i}$ and $h_{j}$ are the binary states of the visible and hidden units, $b_{i}$ and $b_{j}$ are the corresponding biases, and $w_{i j}$ is the weight between these two layers. $E (v, h) = - \sum_{i} b_{i} v_{i} - \sum_{j} b_{j} h_{j} - \sum_{i, j} v_{i} h_{j} w_{i j}$ (1)
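As a small worked example of the energy function, the sketch below evaluates the standard binary RBM energy, in which the visible bias term, the hidden bias term, and the pairwise interaction term all enter with a negative sign. The function name and the toy values are hypothetical and chosen only for illustration.

```python
import numpy as np

def rbm_energy(v, h, b_visible, b_hidden, w):
    """Energy of one joint configuration (v, h) of a binary RBM.

    w[i, j] is the weight between visible unit i and hidden unit j.
    """
    v = np.asarray(v, dtype=float)
    h = np.asarray(h, dtype=float)
    return float(-(b_visible @ v) - (b_hidden @ h) - (v @ w @ h))

# Toy configuration with two visible and two hidden units (made-up values).
v = np.array([1, 0])
h = np.array([1, 1])
b_visible = np.array([0.5, -0.5])
b_hidden = np.array([0.1, 0.2])
w = np.array([[1.0, 2.0],
              [3.0, 4.0]])

# Bias terms contribute -0.5 and -0.3; the interaction term contributes -3.0.
energy = rbm_energy(v, h, b_visible, b_hidden, w)  # approximately -3.8
```

Configurations with lower energy receive higher probability under the model, which is why training pushes the energy of observed data down.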

It has been reported in the literature (Schmidhuber, 2015) that the RBM exhibits remarkable performance in representing fMRI data and reconstructing functional brain networks. Experimental comparisons (Hu et al., 2018) also demonstrated the superiority of RBM over ICA in identifying task-related networks. However, that work (Hu et al., 2018) used only one layer of RBM and is therefore not yet a truly deep learning model. That is, the full potential and power of the DBN model have not been explored, which motivated us to investigate the DBN model in this article.

A DBN can be viewed as a composition of unsupervised RBM networks in which each subnetwork's hidden layer serves as the visible layer for the next, as illustrated in Figure 1a. The theoretical background of DBN/RBM is based on the Markov random field (Hinton et al., 2006). The Markov convergence theorem ensures that the update process for each neuron of the DBN/RBM approaches a fixed point even though the initial state is random (Hinton, 2002; Hinton et al., 2006). However, it is still difficult to optimize the weights in nonlinear models that have multiple hidden layers. Exactly updating two fully connected layers is a nondeterministic polynomial time (NP)-hard problem: if the input layer has x neurons and the output layer has y neurons, the total update process has a time complexity of $O (e^{x + y})$. Thus, Gibbs sampling and gradient descent are typically applied to approximate the original distribution; for example, contrastive divergence can be used to implement the update process of DBN/RBM (Fischer and Igel, 2011, 2014; Hinton and Salakhutdinov, 2006; Hinton et al., 2012; Hjelm et al., 2014; LeCun et al., 2015). Since a DBN requires a relatively large number of training samples, we aggregated group-wise tfMRI signals; that is, we used a subset of the language tfMRI data that contains 2,916,000 samples from 32 randomly selected subjects of the HCP 900 subjects' release. A further difficulty is designing a compatible deep structure for HSDL. To choose the key parameters of the DBN, that is, the number of neurons in each layer, we used a technique called rank estimation (Wen et al., 2012; Zhang et al., 2017). The estimated rank serves as the number of neurons in the DBN layers, which is equivalent to the dictionary size in sparse coding since we treat the learned weights as the input dictionary of the LASSO regression model (Lee et al., 2011, 2016; Lv et al., 2015a,b).
In general, a relatively accurate approach is to estimate the rank of each individual fMRI signal matrix: $S^{i} \in ℛ^{t \times n}, i = 1, 2, \dots, 32$. This rank estimator uses a rank-revealing technique based on orthogonal decomposition (QR factorization) (Wen et al., 2012). Let r* denote the initial estimated rank of Sⁱ and r the optimal rank estimate of the input matrix Sⁱ. If r* $\geq$ r holds, the diagonal of the upper triangular matrix R in the QR factorization of Sⁱ can be examined. When the QR factorization of Sⁱ is computed with a permutation matrix E, the diagonal of R is nonincreasing in magnitude (Wen et al., 2012). The QR factorization and rank-revealing step then provide a reasonable solution using a proper threshold, as introduced in Equations (2) and (3) (Wen et al., 2012). From the diagonal of R, we compute two vectors $d \in ℛ^{r}$ and $r \in ℛ^{r - 1}$: $d_{i} = |R_{i i}|, \quad r_{i} = \frac{d_{i}}{d_{i + 1}}$ (2)

and then examine the value: $μ = \frac{(m - 1) r (p)}{\sum_{i \neq p} r_{i}}$ (3)

where r(p) is the maximum element of the vector r (with the largest index p if the maximum value is not unique). In our current implementation, we reset the rank estimate r to p once $μ > 2$, and this adjustment is performed only once (Wen et al., 2012). Using this low-rank estimation technique, we set the number of neurons for all hidden DBN layers to 100, as visualized and justified in Figure 2.
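The rank-revealing procedure in Equations (2) and (3) can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of Wen et al. (2012): the column-pivoted QR is written as a plain modified Gram-Schmidt loop, m in Equation (3) is assumed to be the number of diagonal entries in d, and the function names and toy matrix are hypothetical.

```python
import numpy as np

def pivoted_qr_diagonal(matrix):
    """Return |R_ii| of a column-pivoted QR (modified Gram-Schmidt)."""
    a = np.array(matrix, dtype=float)
    diag = []
    for j in range(min(a.shape)):
        # Pivot: move the remaining column with the largest norm to position j.
        norms = np.linalg.norm(a[:, j:], axis=0)
        pivot = j + int(np.argmax(norms))
        a[:, [j, pivot]] = a[:, [pivot, j]]
        r_jj = float(norms.max())
        diag.append(r_jj)
        if r_jj == 0.0:
            break
        # Remove the direction of column j from all remaining columns.
        q = a[:, j] / r_jj
        a[:, j + 1:] -= np.outer(q, q @ a[:, j + 1:])
    return np.array(diag)

def estimate_rank(matrix, threshold=2.0):
    """Reset the rank estimate to p when mu exceeds the threshold."""
    d = pivoted_qr_diagonal(matrix)
    d = d[d > 0]                      # guard against division by zero
    if d.size < 3:
        return int(d.size)
    r = d[:-1] / d[1:]                # Equation (2)
    # Largest 0-based index that attains the maximum ratio.
    p = r.size - 1 - int(np.argmax(r[::-1]))
    # Equation (3), assuming m = len(d).
    mu = (d.size - 1) * r[p] / (r.sum() - r[p])
    return p + 1 if mu > threshold else int(d.size)

# Toy matrix: rank 3 plus very small noise (synthetic, for illustration only).
rng = np.random.default_rng(0)
low_rank = rng.standard_normal((40, 3)) @ rng.standard_normal((3, 12))
noisy = low_rank + 1e-6 * rng.standard_normal((40, 12))
estimated = estimate_rank(noisy)
```

The sharp drop in the pivoted diagonal after the third entry produces one dominant ratio, so the estimate is reset to that index; on real fMRI data the drop is usually far less pronounced, which is why a threshold is needed.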

FIG. 2.

Estimated rank (vertical axis) for the fMRI data sets of 32 HCP subjects (horizontal axis) in our empirical experiments. For most subjects, the estimated rank is equal or close to 100 (see the dashed line). For simplicity, we set 100 as the number of neurons for all hidden layers of the DBN model, which is also treated as the dictionary size for deep LASSO in the next step of spatial network regression.

After the DBN is trained with millions of fMRI signals aggregated from 32 HCP subjects, weights between two adjacent layers of DBNs are then extracted and treated as hierarchical temporal features. Essentially, these temporal features, for example, those colored curves shown in Figure 1b, form time series patterns of functional brain network activities. The hierarchical properties of these learned time series are visualized, analyzed, and interpreted in the Results and Discussion section.
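The greedy layer-wise training and the extraction of the inter-layer weights can be sketched as below. This is a simplified illustration, assuming Bernoulli units, inputs scaled to [0, 1], full-batch one-step contrastive divergence (CD-1), and toy layer sizes; the text does not specify these details, and all function names and hyperparameters here are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_rbm(data, n_hidden, epochs=5, lr=0.05, seed=0):
    """Train a Bernoulli RBM with CD-1; data has shape (samples, visible)."""
    rng = np.random.default_rng(seed)
    n_samples, n_visible = data.shape
    w = 0.01 * rng.standard_normal((n_visible, n_hidden))
    b_vis = np.zeros(n_visible)
    b_hid = np.zeros(n_hidden)
    for _ in range(epochs):
        # Positive phase: hidden probabilities given the data.
        h_prob = sigmoid(data @ w + b_hid)
        h_state = (rng.random(h_prob.shape) < h_prob).astype(float)
        # Negative phase: one Gibbs step back to the visible layer and up again.
        v_recon = sigmoid(h_state @ w.T + b_vis)
        h_recon = sigmoid(v_recon @ w + b_hid)
        # CD-1 update: data statistics minus reconstruction statistics.
        w += lr * (data.T @ h_prob - v_recon.T @ h_recon) / n_samples
        b_vis += lr * (data - v_recon).mean(axis=0)
        b_hid += lr * (h_prob - h_recon).mean(axis=0)
    return w, b_hid

def train_dbn(data, layer_sizes):
    """Stack RBMs greedily; return the weight matrix of every layer."""
    weights = []
    layer_input = data
    for index, n_hidden in enumerate(layer_sizes):
        w, b_hid = train_rbm(layer_input, n_hidden, seed=index)
        weights.append(w)
        # The hidden activations become the visible data of the next RBM.
        layer_input = sigmoid(layer_input @ w + b_hid)
    return weights

# Toy data: 200 time series with 30 time points each (synthetic values).
rng = np.random.default_rng(42)
toy_signals = rng.random((200, 30))
omega_1, omega_2, omega_3 = train_dbn(toy_signals, [8, 8, 8])
```

In this layout, each column of `omega_1` has the length of a time series, which is what allows the first-layer weights to be read as temporal patterns.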

Moreover, we performed parameter tuning to provide reasonable parameters such as sparse trade-off, number of nodes and layers, based on the experience in previous works (Liu et al., 2010; Wen et al., 2012; Zhang et al., 2017, 2018, 2019).

Deep LASSO for hierarchical spatial feature mapping

As illustrated in Figure 1d, a deep LASSO model is used to map the corresponding hierarchical spatial features based on the temporal features, that is, the hierarchical dictionaries learned in each layer of the DBN in the DBN for Hierarchical Temporal Feature Mapping section. In general, the deep LASSO method can be described as decomposing the input group-wise fMRI signal matrix $S_{k - 1}, k = 1, 2, 3$, into the coefficient matrix $α_{k}^{i}, i = 1, 2, \dots, 32$, and the group-wise normalized $ω_{k}$ (treated as the hierarchical dictionary for each layer k, k = 1, 2, or 3 in this article). If we use the normalized weights from the DBN layers as the dictionary $D_{k}$, the minimization function of the $k^{th}$ layer is formulated as follows: $D_{k} \leftarrow \mathrm{Norm} (ω_{k})$, $f_{k} (D_{k}, α_{k}^{i}) = \sum_{i = 1}^{32} \left( \frac{1}{2} \| S_{k - 1}^{i} - D_{k} α_{k}^{i} \|_{F}^{2} + λ \| α_{k}^{i} \|_{1} \right)$ (4)

where $ω_{k}$ and $α_{k}^{i}$ are the group-wise hierarchical weights/dictionaries and the coefficient matrix (spatial maps), respectively, as shown in Figure 1 (d1–d3); $S_{k - 1}^{i}$ represents the signal matrix of the i-th subject in layer k-1, for example, $S_{2}^{i}$ represents the i-th sample of layer #2, $i = 1, 2, \dots, 32$. Here, $\mathrm{Norm}$ denotes a normalization operator (Lv et al., 2015a,b). Since the multiplication of weight matrices can enlarge the scale of the deeper dictionaries, the normalization procedure keeps the dictionaries on the same scale and thus ensures reasonable performance of deep LASSO. After this deep LASSO decomposition procedure, the coefficient matrix $α_{k}^{i}$ is mapped onto the brain image space for spatial pattern visualization and interpretation, such as the colored regions rendered on cortical surfaces in Figure 1d.
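A single level of this regression can be sketched as follows for one subject: the dictionary is normalized and kept fixed, and only the coefficient matrix is optimized. The sketch assumes that the normalization operator rescales each dictionary column to unit length and uses the iterative shrinkage-thresholding algorithm (ISTA) as the solver; the text does not name a solver, so both choices, as well as all names and toy sizes, are assumptions.

```python
import numpy as np

def normalize_columns(weights):
    """Scale every dictionary column to unit Euclidean norm (assumed Norm)."""
    norms = np.linalg.norm(weights, axis=0, keepdims=True)
    return weights / np.maximum(norms, 1e-12)

def soft_threshold(x, thresh):
    return np.sign(x) * np.maximum(np.abs(x) - thresh, 0.0)

def lasso_ista(signals, dictionary, lam, n_iter=500):
    """Minimize 0.5 * ||S - D a||_F^2 + lam * ||a||_1 over a, with D fixed."""
    # Step size 1 / L, where L is the largest eigenvalue of D^T D.
    step = 1.0 / np.linalg.norm(dictionary, 2) ** 2
    alpha = np.zeros((dictionary.shape[1], signals.shape[1]))
    for _ in range(n_iter):
        grad = dictionary.T @ (dictionary @ alpha - signals)
        alpha = soft_threshold(alpha - step * grad, step * lam)
    return alpha

def objective(signals, dictionary, alpha, lam):
    residual = signals - dictionary @ alpha
    return 0.5 * np.sum(residual ** 2) + lam * np.sum(np.abs(alpha))

# Toy problem: 30 time points, 8 dictionary atoms, 50 voxels (synthetic).
rng = np.random.default_rng(1)
omega = rng.standard_normal((30, 8))      # stands in for DBN weights
signals = rng.standard_normal((30, 50))   # stands in for one subject's S
dictionary = normalize_columns(omega)
alpha = lasso_ista(signals, dictionary, lam=0.5)
```

Each row of `alpha` holds one coefficient per voxel, so reshaping a row back into the brain volume would give the spatial map associated with the corresponding temporal atom.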

Results and Discussion

In general, we visualize, analyze, and interpret the identified hierarchical spatial and temporal patterns of brain networks both qualitatively and quantitatively in this section. Also, we compare these spatiotemporal patterns with temporal task paradigms (i.e., task designs) and spatial GLM-derived brain network maps.

All details, including all 32 subjects' original task designs and the corresponding identified temporal features for the emotion task, can be viewed at the link below:

http://hafni.cs.uga.edu/Combinational_tDBN_DeepSSDL/Deep_Combinational_Learning_Strategy_TimeSeriesEMOTION__presentation.html

Likewise, all 32 subjects' original task designs and the corresponding identified temporal features for the language task, which can be compared with the original task design curves, can be viewed at the link below:

http://hafni.cs.uga.edu/Combinational_tDBN_DeepSSDL/Deep_Combinational_Learning_Strategy_TimeSeriesLANGUAGE__presentation.html

All representative slices and details of the identified spatial networks for the emotion task can be viewed at the link below:

http://hafni.cs.uga.edu/Combinational_tDBN_DeepSSDL/Deep_Combinational_Learning_Strategy_SpatialMapsEMOTION__presentation.html

The corresponding individual spatial maps of the 32 subjects for the language task can be viewed at the link below:

http://hafni.cs.uga.edu/Combinational_tDBN_DeepSSDL/Deep_Combinational_Learning_Strategy_SpatialMapsLANGUAGE__presentation.html

Interpretation of hierarchical temporal and spatial features

In the following paragraphs, we present the hierarchical temporal and spatial features by taking the HCP emotion and language tfMRI data sets as examples. The results from two randomly selected HCP subjects are examined in the following figures. For each task event, we use the Pearson correlation coefficient (PCC) between the task paradigm curve and the learned temporal features as a metric to identify the temporal features related to the paradigm (Jiang et al., 2018; Lv et al., 2015a,b; Zhang et al., 2017). Figure 3e–g shows the identified temporal features (color curves; green, yellow, and red are for DBN layers #1, #2, and #3, respectively) and the task paradigms (black lines) in the emotion task for an exemplar HCP subject. Figure 3a–c shows the corresponding spatial features identified in each layer, and Figure 3d is the GLM-derived brain network map for the same subject. Figure 3h provides a quantitative comparison of the similarities between the identified temporal features and the task paradigm curve (PCC), as well as the similarities between the identified spatial features and GLM-derived networks (spatial overlap).
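The PCC-based selection of task-related temporal features can be sketched as below. The sketch assumes that the temporal features are stored as columns of a dictionary matrix and that the paradigm is a plain block design; the function name and the toy curves are hypothetical.

```python
import numpy as np

def most_task_related_feature(dictionary, paradigm):
    """Return (index, PCC) of the column best correlated with the paradigm."""
    pccs = np.array([
        np.corrcoef(dictionary[:, j], paradigm)[0, 1]
        for j in range(dictionary.shape[1])
    ])
    best = int(np.argmax(pccs))
    return best, float(pccs[best])

# Toy block-design paradigm with 12 time points (synthetic).
paradigm = np.array([0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0], dtype=float)
time = np.arange(paradigm.size)

# Three toy temporal features stored as columns.
dictionary = np.column_stack([
    np.sin(time),            # unrelated oscillation
    2.0 * paradigm + 1.0,    # scaled and shifted copy of the paradigm
    np.cos(0.5 * time),      # another unrelated curve
])

best_index, best_pcc = most_task_related_feature(dictionary, paradigm)
```

Because the PCC is invariant to scaling and shifting, the second column is selected even though its amplitude differs from that of the paradigm.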

FIG. 3.

Identified hierarchical temporal and spatial features in HCP emotion task for one exemplar subject. (a–c) The identified spatial features in different layers. (d) The GLM-derived spatial map. (e–g) The identified temporal features (color lines) in different layers and the task paradigms (black lines). (h) The quantitative comparisons of spatial and temporal similarities of identified features with task paradigms are provided. GLM, general linear model. Color images are available online.

These experimental results demonstrate two major differences between the temporal features in different layers. (1) In deeper layers, the learned temporal features are smoother and better correlated with the task paradigm curves; the highest correlation is observed in DBN layer #3. (2) The frequency content differs distinctly between layers. For example, the time series frequency changes significantly between layers #1 and #3, suggesting that lower DBN layers represent higher frequency patterns, while deeper DBN layers represent lower frequency patterns. This result agrees with the theoretical properties of DBN (Hu et al., 2018; Huang et al., 2018; Li et al., 2018; Zhang et al., 2018). It is also observed that the spatial features identified in deeper layers have a stronger similarity with the traditional GLM-derived network maps, and that the similarity increases gradually with layer depth.

We provide qualitative and quantitative validations for two example subjects in the emotion task in Figures 3, 4, and 7 and the corresponding results for the language task in Figures 5, 6, and 8. Specifically, in Figures 3–6, we compare the identified hierarchical spatiotemporal features with the corresponding task paradigms and task-evoked functional brain networks (i.e., COPE) in a single figure. In Figures 7 and 8, we qualitatively compare the identified time series with the original corresponding task paradigms for the emotion and language tasks. Because the identified spatiotemporal features are presented separately and in detail for the two fMRI task data sets, the quantitative measurements of the temporal and spatial similarities are provided in the bottom right subfigures of Figures 3 and 4. Moreover, these observations are consistent and reproducible in all HCP subjects we studied. Figures 3 and 4 illustrate exemplar subjects from the emotion task, and Figures 5 and 6 illustrate exemplar subjects from the language task.

FIG. 4.

Identified hierarchical temporal and spatial features in HCP emotion task for another exemplar subject. (a–c) The identified spatial features in different layers. (d) The GLM-derived spatial map. (e–g) The identified temporal features (color lines) in different layers and the task paradigms (black lines). (h) The quantitative comparisons of spatial and temporal similarities of identified features with task paradigms are provided. GLM, general linear model. Color images are available online.

FIG. 5.

Identified hierarchical temporal and spatial features in HCP language task for an exemplar subject. (a–c) The identified spatial features in different layers. (d) The GLM-derived spatial map. (e–g) The identified temporal features (color lines) in different layers and the task paradigms (black lines). (h) The quantitative comparisons of spatial and temporal similarities of identified features with task paradigms are provided. GLM, general linear model. Color images are available online.

FIG. 6.

Identified hierarchical temporal and spatial features in HCP language task for another exemplar subject. (a–c) The identified spatial features in different layers. (d) The GLM-derived spatial map. (e–g) The identified temporal features (color lines) in different layers and the task paradigms (black lines). (h) The quantitative comparisons of spatial and temporal similarities of identified features with task paradigms are provided. GLM, general linear model. Color images are available online.

FIG. 7.

Group-wise temporal time series patterns compared with original task paradigms in emotion task.

FIG. 8.

Group-wise temporal time series patterns compared with original task paradigms in language task.

In addition to the results for individual HCP subjects, Figures 7 and 8 show two examples of the group-wise hierarchical temporal features related to different task paradigms in the HCP emotion and language tasks, respectively. Based on these qualitative analyses, it is clear that the identified temporal features in deeper layers are more similar to the original task paradigm curves, which are shown as black curves. In layer #1, all the identified temporal features are noisier and have higher frequencies. In layer #2, the identified temporal features are substantially more similar to the original task paradigms. In the last layer, #3, the identified temporal feature patterns almost match the original task paradigms.

Also, Figure 9 shows the group-wise hierarchical spatial features related to different COPEs in HCP emotion tasks. By visual inspection, the spatial networks in deeper layers are more similar to GLM-derived network maps. In layer #1, all the identified spatial networks are quite noisy, and they only have a small portion overlapped with GLM-derived network maps. In layer #2, the identified spatial networks moderately overlap with GLM-derived spatial maps (e.g., spatial maps #1, #2, #4, and #5 in layer #2 in Figure 7; spatial maps #1, #2, #3, #4, and #5 in layer #2 in Fig. 8). More results of language task can be viewed in our released web pages.

FIG. 9.

Group-wise spatial networks related to the six COPEs in HCP emotion task. Color images are available online.

In layer #3, the identified spatial networks are largely similar to the original GLM-derived brain network maps. These observations and results are consistent and reproducible in all HCP subjects we studied. Thus, our results suggest that the hierarchical organization of spatiotemporal functional brain networks in human brains can be effectively revealed by our HSDL models. Notably, both the revealed spatial and temporal patterns of functional networks in Figures 3–10 demonstrated that lower DBN layers represent higher frequency features, while deeper DBN layers represent lower frequency features. In addition, our results demonstrate the effectiveness of using DBN (Bengio et al., 2012; Zhao et al., 2015) and deep LASSO for HSDL of 4D fMRI data.

FIG. 10.

A quantitative assessment of the identified temporal and spatial features in different layers. (a, c) Show the Pearson correlation coefficients between the identified temporal features and the task paradigms for each task event (x-axis) and each subject (y-axis) in different layers (from left to right). (b, d) Show the Hausdorff spatial similarities (Zhang et al., 2017) between the identified spatial features and GLM-derived brain maps. Color images are available online.

Statistical analysis of hierarchical temporal and spatial features

An overall quantitative assessment of those identified hierarchical temporal and spatial features is shown in Figure 10. Figure 10a shows the PCCs between the task paradigms and the corresponding temporal features for all studied subjects in each layer (from left to right) in the emotion task, where x-axis is the index of task events (e.g., HCP COPEs) and y-axis is the index of studied HCP subjects in each subfigure. Figure 10b shows the spatial similarity measured by the Hausdorff metric (Zhang et al., 2017) in the emotion task. Figure 10c and d shows the results for the language task. In brief, the correlation between the temporal features and the task paradigms increases with deeper layers, and so does the spatial pattern similarity. Again, these quantitative analyses further demonstrate that the identified temporal and spatial features correspond well with known meaningful features, that is, task paradigm or GLM-derived brain maps. Two-sample t-tests show that the deeper layers have significantly higher temporal and spatial similarities with the benchmark patterns, compared with lower layers (please see p-values in Table 1). As discussed before, these analyses demonstrate the consistent reproducibility of identified spatiotemporal features through all individuals involved in this validation.
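The layer-wise comparison can be illustrated with a two-sample t statistic. The sketch below assumes the pooled-variance (Student) form, since the text does not state which variant was used, and the similarity values are made-up toy numbers rather than results from this study. Converting the statistic into a p value additionally requires the cumulative distribution function of the t distribution, which is omitted here.

```python
import numpy as np

def pooled_t_statistic(sample_a, sample_b):
    """Two-sample t statistic with pooled variance and its degrees of freedom."""
    a = np.asarray(sample_a, dtype=float)
    b = np.asarray(sample_b, dtype=float)
    dof = a.size + b.size - 2
    pooled_var = ((a.size - 1) * a.var(ddof=1)
                  + (b.size - 1) * b.var(ddof=1)) / dof
    std_err = np.sqrt(pooled_var * (1.0 / a.size + 1.0 / b.size))
    return float((a.mean() - b.mean()) / std_err), dof

# Hypothetical similarity values for a lower and a deeper layer (toy numbers).
lower_layer = [1.0, 2.0, 3.0, 4.0, 5.0]
deeper_layer = [2.0, 3.0, 4.0, 5.0, 6.0]

t_stat, dof = pooled_t_statistic(lower_layer, deeper_layer)
```

A negative statistic here means that the first sample has the lower mean, which corresponds to the deeper layer having higher similarity in this toy example.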

Table 1.

The p Values in Two-Sample t-Tests That Compare Temporal and Spatial Similarities in Higher and Lower Layers

Comparison	p Value of spatial similarity	p Value of temporal similarity
Layer 1 vs. layer 2	0.04	0.03
Layer 2 vs. layer 3	0.01	0.01

Conclusion

In this article, we proposed a novel computational framework named HSDL that integrates DBN and deep LASSO to extract hierarchical temporal and spatial features in tfMRI data. The key idea of HSDL is to use DBNs to extract hierarchical temporal features, considered the hierarchical dictionaries for the next step of deep LASSO that performs spatial regression.

The contributions of the proposed computational framework are threefold. (1) This method can be considered an early attempt to reveal the potential existence of hierarchical spatiotemporal features in tfMRI signals. (2) The validation of our method demonstrates some robust properties of the hierarchical spatiotemporal features: three layers of features are identified in both the emotion and language tasks, and the frequency and correlation vary similarly across the two task data sets. (3) These hierarchical features were used to explore and interpret the hierarchical structures of human brain networks.

Although the currently identified hierarchical spatiotemporal organization of the human brain probably represents neural activity at different levels, further research is needed. In conclusion, our study not only provides novel evidence for the existence of hierarchical macroscale functional networks but also opens a new avenue for exploring cognitive and clinical human neuroscience problems from the unique perspective of the hierarchical organization of brain functions.

Footnotes

Author Disclosure Statement

No competing financial interests exist.

Funding Information

T.L. was partially supported by the National Institutes of Health (DA033393, AG042599) and the National Science Foundation (IIS-1149260, CBET-1302089, BCS-1439051, and DBI-1564736).
