Abstract
Hierarchical organization of brain function has been an established concept in the neuroscience field for a long time, however, it has been rarely demonstrated how such hierarchical macroscale functional networks are actually organized in the human brain. In this study, to answer this question, we propose a novel methodology to provide an evidence of hierarchical organization of functional brain networks. This article introduces the hybrid spatiotemporal deep learning (HSDL), by jointly using deep belief networks (DBNs) and deep least absolute shrinkage and selection operator (LASSO) to reveal the temporal hierarchical features and spatial hierarchical maps of brain networks based on the Human Connectome Project 900 functional magnetic resonance imaging (fMRI) data sets. Briefly, the key idea of HSDL is to extract the weights between two adjacent layers of DBNs, which are then treated as the hierarchical dictionaries for deep LASSO to identify the corresponding hierarchical spatial maps. Our results demonstrate that both spatial and temporal aspects of dozens of functional networks exhibit multiscale properties that can be well characterized and interpreted based on existing computational tools and neuroscience knowledge. Our proposed novel hybrid deep model is used to provide the first insightful opportunity to reveal the potential hierarchical organization of time series and functional brain networks, using task-based fMRI signals of human brain.
Introduction
Task-based functional magnetic resonance imaging (tfMRI) has been widely used for the identification of functional brain networks (Bartels and Zeki, 2005; Beckmann and Smith, 2005; Biswal et al., 2010; Bullmore and Sporns, 2009; Duncan, 2010; Stam, 2014). Meanwhile, a variety of scientific research studies have suggested the hierarchical organization of human brain networks (Bassett et al., 2008; Biswal et al., 2010; Bullmore and Sporns, 2009; Castro et al., 2016; Gurovich et al., 2019; Kim et al., 2016; Sporns et al., 2004). It is widely believed that the architecture of brain networks is organized at different spatiotemporal scales from functional and structural perspectives (Bullmore and Sporns, 2009; Sporns et al., 2004; Stam, 2014). In the literatures, a variety of computational methods have been developed to map such brain networks, that is, via general linear model (GLM), graph theories, independent component analysis (ICA), and sparse dictionary learning (Andersen et al., 1999; Calhoun et al., 2001; Lee et al., 2011, 2016; Lv et al., 2015a,b; Mckeown and Sejnowski, 1998; Zhang et al., 2017, 2018, 2019). However, these methods are based on “shallow” methodology, which probably cannot satisfy the needs of modeling the possibly hierarchical organization and different scales of brain networks both in temporal and spatial domains (Esteva et al., 2019; Hannun et al., 2019; Hu et al., 2018; Huang et al., 2018; Jang et al., 2017; Topol, 2019; Zhang et al., 2018).
Fortunately, in the machine learning and deep learning fields, there have been significant improvements of algorithms and methodologies, that is, deep belief networks (DBNs) (Bengio et al., 2012; Li et al., 2019; Plis et al., 2014; Schmidhuber., 2015; Suk et al., 2014, 2016), that can provide us unparalleled opportunities to quantify the properties of complex functional magnetic resonance imaging (fMRI) data across individuals and cognitive states. Recently, deep learning such as DBN has been proven to be an efficient technique to learn and extract high-level and midlevel meaningful features from low-level raw data, and promising results of using deep learning for fMRI data modeling have been reported in the literature (Bengio et al., 2012; Hu et al., 2018; Huang et al., 2018; Plis et al., 2014; Schmidhuber, 2015; Suk et al., 2014, 2016; Zhang et al., 2018; Zhao et al., 2018). Thus, we are motivated to explore novel spatiotemporal deep learning models to potentially reveal and confirm the possible hierarchical organization of human brain networks.
Recently, it has been shown that the restricted Boltzmann machine (RBM) can be used to model fMRI time series signals and it can effectively reconstruct functional brain networks with impressive accuracy (Hu et al., 2018; Huang et al., 2018). However, prior RBM models for fMRI time series (Hu et al., 2018; Huang et al., 2018) are still “shallow” and did not incorporate the advantages of deeper neural networks, for example, extracting the hierarchical structures from the raw data. It is already known that the DBN model possesses superb capability of extracting hierarchical features (Li et al., 2019), however, the very high dimensionality of four-dimensional (4D) fMRI signals across hundreds of subjects, that is, dozens of millions of fMRI time series used in our studies, is still a difficult problem for effective learning of spatiotemporal brain networks and their functional dynamics.
Therefore, in this work, we use the DBN to extract the hierarchical temporal features in the first stage, and thus, we achieve a relative lower dimensionality at group-wise level. In the second stage, we leverage sparse representation algorithms that have already been proven as effective techniques to extract spatial brain networks in previous literature studies (Andersen et al., 1999; Calhoun et al., 2001; Lv et al., 2015a) to map spatial patterns of brain networks. Put together, we propose a novel framework named hybrid spatiotemporal deep learning (HSDL) to simultaneously infer the hierarchical temporal features and corresponding hierarchical spatial features of brain networks.
Specifically, we use the learned weights between two adjacent layers of DBN models as the hierarchical temporal dictionary for spatial least absolute shrinkage and selection operator (LASSO) regression from fMRI data. Since each LASSO model takes the temporal dictionaries at different scales to perform the spatial network regression, these regressed spatial maps of brain networks possess the property of hierarchical organization naturally, which is given the name deep LASSO here to reflect the corresponding hierarchical spatial features. The HSDL has been applied on the Human Connectome Project (HCP) 900 subjects' fMRI data set, and our extensive experimental results demonstrate that the characterized hierarchical organization of functional networks derived by our HSDL models is meaningful, consistent, and reproducible across HCP brains and across all HCP fMRI tasks we studied in this article.
Materials and Methods
Figure 1 summarizes the proposed computational framework of HSDL for discovery and characterization of hierarchical organization of temporal features and spatial patterns of functional brain networks. The HSDL consists of two main components to model tfMRI data hierarchically. At first, the DBN is used to extract hierarchical temporal features, equivalent to the weights between two adjacent DBN layers, and the deep LASSO aims to extract the corresponding hierarchical spatial features based on the hierarchical temporal dictionaries. Here, the spatially aggregated tfMRI data of multiple HCP subjects are used as input of the HSDL model, represented as

Illustration of the proposed computational framework of HSDL.
Data acquisition and preprocessing
In this work, we adopt tfMRI data sets from the HCP (Barch et al., 2013; Van Essen et al., 2013) to test the proposed method. Specifically, we selected the emotion and language data sets, which are representative tfMRI data sets from the HCP 900 subjects' data release. These tfMRI data sets have been released publicly and include multiple modality MRI neuroimaging data sets (i.e., cortical structure, connectivity, and function), considered comprehensive tfMRI data sets to identify vital functional brain areas covering a large part of cerebral cortex (Barch et al., 2013; Van Essen et al., 2013). The fundamental information of the task paradigms can be referred by previous research reports (Barch et al., 2013; Van Essen et al., 2013).
The detailed acquisition parameters are shown as follows: 90 × 104 matrix, 220 mm FOV, 72 slices, TR = 0.72 s, TE = 33.1 ms, flip angle = 52°, BW = 2290 Hz/Px, in-plane FOV = 208 × 180 mm, 2.0 mm isotropic voxels. After obtaining the released preprocessed tfMRI data sets, we adopted the tfMRI data with minimal preprocessing pipelines (Lv et al., 2015a,b). This pipeline contains the steps of spatial artifact clearness, distortion removal, and cortical surface generation. After that, different subjects are aligned to the standard MNI space (Glasser et al., 2013; Lv et al., 2015a,b).
DBN for hierarchical temporal feature mapping
The RBM is a probabilistic energy-based model that describes a probability distribution over a set of visible random variables to the observed data (Li et al., 2019). An RBM model consists of two layers, that is, the visible layer binding with input and the hidden layer representing latent factors. The units in the two layers are connected by the weights (Fig. 1a). There is no within-layer connection. Inputs are modeled by RBMs via latent factors expressed through the interaction between hidden and visible variables (Hu et al., 2018; Huang et al., 2018; Zhang et al., 2019). The energy function is used to update the weights as Equation (1), where vi
and hj
are binary states of two layers; bi
and bj
are the bias, and
It has been reported in the literature (Schmidhuber, 2015) that the RBM exhibited remarkable performance in representing fMRI data and reconstructing functional brain networks. Experimental comparison results (Hu et al., 2018) also demonstrated the superiority of RBM over ICA in identifying task-related networks. However, the work (Hu et al., 2018) only used one layer of RBM and it is not really the deep learning model yet. That is, the full potential and powerfulness of DBN model have not been explored, which motivated us to investigate the DBN model in this article.
DBN can be viewed as a composition of unsupervised RBM networks where each subnetwork's hidden layer serves as the visible layer for the next, as already illustrated in Figure 1a. The theoretical background of DBN/RBM is based on the Markov random field (Hinton et al., 2006). The Markov convergence theorem can ensure that the update process for each neuron of DBN/RBM is approximate to a fixed point, although the initial situation is random (Hinton, 2002; Hinton et al., 2006). However, it is still difficult to optimize the weights in nonlinear models that have multiple hidden layers. Updating the fully connected two layers is a nondeterministic polynomial time (NP) hard problem, that is, if input layer has x neurons and output layer has y neurons, the total update process requires the time complexity of
and then examine the value:
where r(p) is the maximum element of the vector r (with the largest index p if the maximum value is not unique). In our current implementation, we reset the rank estimated r top once

This figure shows the estimated rank (vertical axis) for fMRI data sets of 32 HCP subjects (horizontal axis) in our empirical experiments. Based on observation, most subjects' rank is equivalent or approximate to 100 (please see the dashed line). To simplify, we set 100 as the number of neurons for all hidden layers of the DBN model, which is also treated as the dictionary size for deep LASSO in the next step of spatial network regression.
After the DBN is trained with millions of fMRI signals aggregated from 32 HCP subjects, weights between two adjacent layers of DBNs are then extracted and treated as hierarchical temporal features. Essentially, these temporal features, for example, those colored curves shown in Figure 1b, form time series patterns of functional brain network activities. The hierarchical properties of these learned time series are visualized, analyzed, and interpreted in the Results and Discussion section.
Moreover, we performed parameter tuning to provide reasonable parameters such as sparse trade-off, number of nodes and layers, based on the experience in previous works (Liu et al., 2010; Wen et al., 2012; Zhang et al., 2017, 2018, 2019).
Deep LASSO for hierarchical spatial feature mapping
As illustrated in Figure 1d, a deep LASSO model is used to map the corresponding hierarchical spatial features based on the temporal features, for example, hierarchical dictionaries learned in each layer of DBN in the Data Acquisition and Preprocessing section. In general, the deep LASSO method can be described as decomposing input group-wise fMRI signal matrix
where
Results and Discussion
In general, we visualize, analyze, and interpret the identified hierarchical spatial and temporal patterns of brain networks both qualitatively and quantitatively in this section. Also, we compare these spatiotemporal patterns with temporal task paradigms (i.e., task designs) and spatial GLM-derived brain network maps.
All details including all 32 subjects' original task designs and corresponding identified temporal features of emotion task can be viewed by the link below:
Also, we provide a series of examples to compare with the original task design curves, including all 32 subjects' original task designs and the corresponding identified temporal features of language task can be viewed by the link below:
All representative slices and details of identified spatial networks of emotion task can be viewed by the link below:
The similar spatial results of language task of 32 subjects' individual spatial maps can be viewed by the link below:
Interpretation of hierarchical temporal and spatial features
In the following paragraphs, we present the hierarchical temporal and spatial features by taking the HCP emotion and language tfMRI data sets as examples. The results from two randomly selected HCP subjects are examined in the following figures. For each task event, we use the Pearson correlation coefficient (PCC) (Jiang et al., 2018; Lv et al., 2015a,b; Zhang et al., 2017) between the task paradigm curve and the learned temporal features as a metric to identify the temporal features related to the paradigm (Jiang et al., 2018; Lv et al., 2015a,b; Zhang et al., 2017). Figure 3e–g shows the identified temporal features (color curves, green, yellow, and red are for DBN layers #1, #2, and #3, respectively) and the task paradigms (black lines) in the emotion task for an exemplar HCP subject. Figure 3a–c shows the corresponding spatial features identified in each layer and Figure 3d is the GLM-derived brain network map for the same subject. Figure 3h provides a quantitative comparison of the similarities between the identified temporal features and the task paradigm curve (PCC), as well as the similarities between the identified spatial features and GLM-derived networks (spatial overlap).

Identified hierarchical temporal and spatial features in HCP emotion task for one exemplar subject.
These experimental results demonstrate that there are two major differences between temporal features in different layers. (1) In the deeper layer, the learned temporal features are smoother and better correlated to the task paradigm curves. The highest correlation is observed in DBN layer #3. (2) Distinct frequency variances exist between layers. For example, the time series frequency changes significantly between layers #1 and #3, suggesting that lower DBN layers represent higher frequency patterns, while deeper DBN layers represent lower frequency patterns. This result agrees with the theoretic properties of DBN (Hu et al., 2018; Huang et al., 2018; Li et al., 2018; Zhang et al., 2018). It is also observed that the spatial features identified in deeper layers have a stronger similarity with the traditional GLM-derived network maps, and the similarities are gradually increased by layers.
In general, we provide the qualitative and quantitative validations of two examples of emotion task in Figures 3, 4, and 7 and the similar results of language task in Figures 5, 6, and 8, respectively. Specifically, for Figures 3 –6, we compare the identified hierarchical spatiotemporal features with the corresponding task paradigms and task-evoked functional brain networks (i.e., COPE) in a single figure. For Figures 7 and 8, we consider the qualitative comparison of identified time series and the original corresponding task paradigms from two different tasks, including emotion and language. In the following figures, due to the presentation of our identified spatiotemporal features via two different fMRI task data sets, respectively, in detail, the quantitative measurements of those temporal and spatial similarities are provided in the bottom right subfigures in Figures 3 and 4. Moreover, these interesting observations are quite consistent and reproducible in all HCP subjects we studied. Figures 3 and 4 illustrate exemplar subjects from emotion task, and Figures 5 and 6 illustrate exemplar subjects from the language task.

Identified hierarchical temporal and spatial features in HCP emotion task for another exemplar subject.

Identified hierarchical temporal and spatial features in HCP language task for an exemplar subject.

Identified hierarchical temporal and spatial features in HCP language task for another exemplar subject.

Group-wise temporal time series patterns compared with original task paradigms in emotion task.

Group-wise temporal time series patterns compared with original task paradigms in language task.
In addition to the results for individual HCP subject, Figures 7 and 8 show two examples of the group-wise hierarchical temporal features related to different task paradigms in HCP emotion and language tasks, respectively. Based on these qualitative analyses, it is clear that the identified temporal features in deeper layers are more similar to the original task paradigm curves, as shown in the black curves. In layer #1, all the identified temporal features are noisier and with higher frequencies. In layer #2, the identified temporal features are substantially more similar to the original task paradigms. In the last layer #3, the identified temporal feature patterns are almost matched to the original task paradigms.
Also, Figure 9 shows the group-wise hierarchical spatial features related to different COPEs in HCP emotion tasks. By visual inspection, the spatial networks in deeper layers are more similar to GLM-derived network maps. In layer #1, all the identified spatial networks are quite noisy, and they only have a small portion overlapped with GLM-derived network maps. In layer #2, the identified spatial networks moderately overlap with GLM-derived spatial maps (e.g., spatial maps #1, #2, #4, and #5 in layer #2 in Figure 7; spatial maps #1, #2, #3, #4, and #5 in layer #2 in Fig. 8). More results of language task can be viewed in our released web pages.

Group-wise spatial networks related to the six COPEs in HCP emotion task. Color images are available online.
In layer #3, the identified spatial networks are largely similar to the original GLM-derived brain network maps. These observations and results are consistent and reproducible in all HCP subjects we studied. Thus, our results suggest the hierarchical organization of spatiotemporal functional brain networks in human brains is essentially enabled by our effective HSDL models. Notably, both the revealed spatial and temporal patterns of functional networks in Figures 3–10 demonstrated that lower DBN layers represent higher frequency features, while deeper DBN layers represent lower frequency features. In addition, our results demonstrate the effectiveness of using DBN (Bengio et al., 2012; Zhao et al., 2015) and deep LASSO for HSDL of 4D fMRI data.

A quantitative assessment of the identified temporal and spatial features in different layers.
Statistical analysis of hierarchical temporal and spatial features
An overall quantitative assessment of those identified hierarchical temporal and spatial features is shown in Figure 10. Figure 10a shows the PCCs between the task paradigms and the corresponding temporal features for all studied subjects in each layer (from left to right) in the emotion task, where x-axis is the index of task events (e.g., HCP COPEs) and y-axis is the index of studied HCP subjects in each subfigure. Figure 10b shows the spatial similarity measured by the Hausdorff metric (Zhang et al., 2017) in the emotion task. Figure 10c and d shows the results for the language task. In brief, the correlation between the temporal features and the task paradigms increases with deeper layers, and so does the spatial pattern similarity. Again, these quantitative analyses further demonstrate that the identified temporal and spatial features correspond well with known meaningful features, that is, task paradigm or GLM-derived brain maps. Two-sample t-tests show that the deeper layers have significantly higher temporal and spatial similarities with the benchmark patterns, compared with lower layers (please see p-values in Table 1). As discussed before, these analyses demonstrate the consistent reproducibility of identified spatiotemporal features through all individuals involved in this validation.
The p Values in Two-Sample t-Tests That Compare Temporal and Spatial Similarities in Higher and Lower Layers
Conclusion
In this article, we proposed a novel computational framework named HSDL that integrates DBN and deep LASSO to extract hierarchical temporal and spatial features in tfMRI data. The key idea of HSDL is to use DBNs to extract hierarchical temporal features, considered the hierarchical dictionaries for the next step of deep LASSO that performs spatial regression.
The contributions of the proposed computational framework in this article are considered threefold. (1) This method can be considered the early attempt to reveal the potential existence of hierarchical spatiotemporal features via tfMRI signals. (2) The validation of our methods demonstrates that there are some robust properties of hierarchical spatiotemporal features, such as the identified three layers of features both from emotion and language tasks, and the frequency and correlations are similarly varied through two different task data sets. (3) These hierarchical features were used to explore and interpret the hierarchical structures of the human brain networks.
Although current identified hierarchical spatiotemporal organizations of human brain probably represent the neural activity at different levels, more further research works need to be conducted. In conclusion, our study not only provides novel evidence to the existence of hierarchical macroscale functional networks but also opens a new venue for exploring cognitive and clinical human neuroscience problems from a unique perspective of hierarchical organization of brain functions in the future.
Footnotes
Author Disclosure Statement
No competing financial interests exist.
Funding Information
T.L. was partially supported by the National Institutes of Health (DA033393, AG042599) and the National Science Foundation (IIS-1149260, CBET-1302089, BCS-1439051, and DBI-1564736).
