Multi-modal neuroimaging feature fusion via 3D Convolutional Neural Network architecture for schizophrenia diagnosis

Abstract

Early and precise diagnosis of schizophrenia (SZ) plays an essential role in a patient’s quality of life and future treatment. Structural and functional neuroimaging provides robust biomarkers for understanding the anatomical and functional changes associated with SZ. Each neuroimaging technique shows only one perspective on the function or structure of the brain, whereas multi-modal fusion can reveal latent connections in the brain. In this paper, we propose an approach that fuses structural and functional brain data with a deep learning-based model to take advantage of data fusion and increase the accuracy of schizophrenia diagnosis. The proposed method consists of an architecture of 3D convolutional neural networks (CNNs) applied to features extracted from magnetic resonance imaging (MRI), functional magnetic resonance imaging (fMRI), and diffusion tensor imaging (DTI). We use 3D MRI patches, fMRI spatial independent component analysis (ICA) maps, and DTI fractional anisotropy (FA) as model inputs. Our method is validated on the COBRE dataset, where it obtains an average accuracy of 99.35%. The proposed method demonstrates promising classification performance and can be applied to real data.

Keywords

3D-CNN, data fusion, deep learning, multi-modality analysis, schizophrenia disorder

1. Introduction

Schizophrenia (SZ) is a complex mental disorder involving abnormal brain functions, such as cognitive impairment, mental impairment, and aberrant sensory awareness [1]. The conventional approach to diagnosing schizophrenia is based on clinical interviews, and there are no standards that can be used for final validation [2]. Recently, extensive research has been carried out on the search for biomarkers and on the design of automated systems that use machine learning tools and various neuroimaging techniques to diagnose SZ [3, 4, 5, 6, 7]. Neuroimaging, such as structural and functional magnetic resonance imaging (sMRI, fMRI), positron emission tomography (PET), and diffusion tensor imaging (DTI), provides essential information for understanding the anatomical and functional changes involved in schizophrenia. Researchers have used various features extracted from neuroimaging modalities to design systems that diagnose schizophrenia automatically. These features fall into voxel-level and region-level categories. Chyzhyk et al. [8] considered the problem of classifying SZ patients with and without a history of auditory hallucinations (AH) and healthy control subjects by extracting regional homogeneity (ReHo) and fractional amplitude of low-frequency fluctuations (fALFF) as voxel-based features from resting-state functional magnetic resonance imaging (rs-fMRI) data. Voxel-based features are simple and very detailed but generally have very high dimensionality, so they are usually used in conjunction with feature selection techniques. Many researchers use region-based features. For instance, Rashid et al. [9] classified bipolar, schizophrenia, and healthy subjects using static and dynamic functional network connectivity (FNC), which is the temporal correlation between different brain regions extracted from fMRI data. Region-based features have low dimensionality and are less sensitive to noise. However, these features are generally insensitive to minor changes.
Also, disease-related information may be located in only part of a region or spread across several regions, which should be taken into account. Because most mental disorders involve both structural and functional brain impairments, combining neuroimaging data from several sources can provide additional information to improve diagnosis of the disorder [10, 11]. Previous studies on the multi-modal classification of brain disorders such as Alzheimer’s disease [12, 13, 14, 15, 16], autism [17, 18], Parkinson’s disease [19], and schizophrenia [20] have shown promising results for the multi-modal fusion of brain imaging data. Deep learning approaches have recently achieved excellent reliability, especially for extracting useful information in medical image processing and computer vision applications [21]. In medical image analysis, deep learning methods can discover intricate patterns without handcrafted features; thus, non-experts can use them in their research [22]. Suk et al. [23] proposed a stacked auto-encoder (SAE) based method to extract hidden information from FDG-PET, MRI, and clinical scores for the classification of AD/MCI/HC. Zeng et al. [24] proposed a deep discriminant auto-encoder network with a sparsity constraint (DANS) that uses functional connectivity measures calculated from multi-site fMRI data for the automatic diagnosis of schizophrenia. Kim et al. [25] used whole-brain FC maps as input to a deep neural network (DNN) for the classification of SZ/HC. Hosseini-Asl et al. [26] designed a model based on 3D-CNNs that extracts distinguishing features to diagnose AD from structural MRI data. Liu et al. [22] proposed a cascaded model of 3D-CNNs and 2D-CNNs for AD classification. They built 3D-CNNs on various MRI and PET local patches to generate compact high-level features.
Inspired by the success of CNNs in the diagnosis of psychiatric disorders, we present a novel algorithm based on multi-modality and a 3D-CNN architecture to learn and combine different, multi-level input data. The model uses 3D patches extracted from MRI and handcrafted features extracted from fMRI and DTI as input. We preserve the spatial information of the MRI data during training and design a 3D-CNN model that learns 3D patterns in the MRI; the network generates features from them automatically. As handcrafted features, we extract independent component analysis (ICA) maps from fMRI and fractional anisotropy (FA) from DTI. The model uses a 3D-CNN architecture for each of them to obtain discriminative features. Two fully connected layers are then used for feature fusion. Finally, the architecture is completed by a softmax prediction layer that classifies SZ/HC subjects. In summary, the main contributions of this work are as follows:

• We present a framework, based on 3D-CNNs, for fusing data from different sources to classify healthy control (HC) and schizophrenia-strict (SZ) subgroups.
• The model combines different sources and different types of voxel-based features: MRI patches that need no further processing such as segmentation or registration, ICA measures that model fMRI data as a mixture of spatially independent sources, and fractional anisotropy as a measure derived from DTI data.
• Our proposed model maintains the intrinsic relationships within each feature because it uses 3D inputs.
• We found that fractional anisotropy alone, as input to the proposed model, is more accurate than either of the other two features, and that combining any two features increases classification accuracy. Finally, we demonstrate that the three proposed features complement each other when fused and enhance the accuracy of SZ diagnosis.

2. Related work

In the recent decade, there has been increasing interest in employing uni-modal analysis and multi-modal fusion with machine learning methods for the diagnosis of schizophrenia. Some researchers have focused on low-level features extracted from uni-modal data. For example, Chyzhyk et al. [27] and Savio et al. [28] used local activity measures such as voxel-mirrored homotopic connectivity (VMHC) and amplitude of low-frequency fluctuations (ALFF) [29] derived from fMRI to diagnose SZ. Lu et al. [30] employed both voxel-based morphometry (VBM) and region of interest (ROI) analyses extracted from structural MRI for each subject and classified the subjects into SZ/HC groups with a support vector machine and recursive feature elimination (SVM-RFE) approach. Liu et al. [31] constructed individual hierarchical brain networks from structural MRI images for the classification of schizophrenia. Qureshi et al. [32] designed a deep learning classification framework with 3D-CNNs based on ICA features. Kaufmann et al. [33] calculated functional connectivity measures based on group-level ICA as rs-fMRI features and used a regularized linear discriminant classifier for classification. Phang et al. [34] used time and domain connectivity patterns derived from EEG signals and proposed a CNN framework for identifying schizophrenia subjects. De Pierrefeu et al. [35] computed neuroanatomical features from sMRI, such as subcortical volumes, voxel-based morphometry (VBM) maps, and the average thickness of cortical parcels. Caprihan et al. [36] introduced a method based on principal component analysis and applied it to DTI data to discriminate age-matched schizophrenia patients and healthy control subjects. Deng et al. [37] used tractography-based diffusion features to classify first-episode schizophrenia (FES) patients and healthy individuals. Yan et al. [38] proposed a multi-scale recurrent neural network (RNN) framework for this task.
Their model used independent components (ICs) calculated from fMRI time courses. Although diagnosing schizophrenia from uni-modal neuroimaging imposes less time and cost on the patient, it may reduce accuracy because the brain’s functional and structural changes are not considered together. Therefore, some researchers suggest combining the information contained in images obtained by different MRI methods as a way to improve classification accuracy. Cetin et al. [20] combined magnetoencephalography (MEG) and fMRI data through static and dynamic functional network connectivity analyses. They reported that classification accuracy improved relative to the use of a single modality. Qureshi et al. [10] combined 12 weighted features (nine structural and three functional) to diagnose schizophrenic patients. They proposed a concatenation method based on hybrid weighted features. Guo et al. [3] constructed ROI-based multi-index vectors by combining functional, structural, and DTI features. For each ROI, these vectors consist of 89 functional connectivity (FC) coefficients from fMRI, the volumes of grey matter (GM) and white matter (WM) from sMRI, and the fractional anisotropy (FA), radial diffusivity (RD), and mean diffusivity (MD) from DTI. In [39], Liu et al. extracted eight features from sMRI and DTI images using multiple brain atlases and then applied a feature selection method to select the most discriminative features. Sui et al. [40] proposed a fusion model called ‘mCCA + jICA’ (multi-set canonical correlation analysis plus joint independent component analysis). They studied combining the related cognitive biomarkers of schizophrenia, ALFF, grey matter density (GM), and FA measures from fMRI, MRI, and DTI modalities. In general, incompatibility in the nature and size of the data is one of the challenges of multi-modal data fusion. For example, sMRI data are three-dimensional images, while fMRI data are four-dimensional, which makes them incompatible for analyses such as linear regression or correlation. In addition, brain imaging data are very high-dimensional but have few samples. For instance, brain MRI images may have 300,000 or more voxels, while the number of samples is less than 100 [41].

3. Material and method

3.1 Convolutional neural networks (CNNs)

CNN models are an important class of deep architectures that have been used in many machine learning fields. Many variants of CNN models appear in the literature, but their essential components are fundamentally the same. In addition to one input and one output layer, convolutional neural networks have multiple hidden layers stacked on top of each other. Typically, CNN hidden layers consist of convolutional layers, activation functions such as the rectified linear unit (ReLU), pooling layers, and fully connected layers. A convolutional layer has several filters of predefined size that are convolved with the input of the layer to create feature maps for the next layer. The main idea of the convolutional layer is the extraction of visual features such as edges and lines. Convolutional layers preserve spatial patterns and share weights over the entire spatial region, which improves the generalization ability of the model. The next important layer is the pooling layer, which down-samples the feature maps of the preceding layer. By applying a max or average operation, this layer makes the output more robust to small shifts and effectively reduces the size of the feature maps. Commonly, several convolutional and pooling layers are repeated in succession. The classification part of the network is made of several fully connected layers that generate the final results. Conventional CNNs are designed for 2D data, and applying a 2D-CNN to 3D data such as video or MRI is a challenge because information along the third dimension is lost. To tackle this problem, Ji et al. designed a 3D-CNN architecture to analyze video for action recognition [42], in which feature maps are obtained from both the 2D image and time by performing 3D convolutions.
Like video data, MRI and DTI data have three dimensions; thus, in the proposed method, we use 3D convolutions to extract the 3D local patterns of neuroimaging features to help diagnose schizophrenia.
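To make the convolution, ReLU, and pooling steps described above concrete, the following is a minimal NumPy sketch on a toy volume. It is an illustration only, not the implementation used in this work: the function names, the 10 $\times$ 10 $\times$ 10 random volume, and the averaging kernel are all hypothetical, and the convolution is written without padding and as a cross-correlation, which is the usual convention in deep learning libraries.

```python
import numpy as np

def conv3d_valid(volume, kernel):
    """Naive single-channel 3D cross-correlation without padding."""
    kd, kh, kw = kernel.shape
    od = volume.shape[0] - kd + 1
    oh = volume.shape[1] - kh + 1
    ow = volume.shape[2] - kw + 1
    out = np.empty((od, oh, ow), dtype=float)
    for z in range(od):
        for y in range(oh):
            for x in range(ow):
                # The kernel slides along all three axes, so the third
                # dimension contributes to every output value.
                patch = volume[z:z + kd, y:y + kh, x:x + kw]
                out[z, y, x] = np.sum(patch * kernel)
    return out

def relu(x):
    """Rectified linear unit applied element-wise."""
    return np.maximum(x, 0.0)

def max_pool3d(volume, size=2):
    """Non-overlapping max pooling; voxels that do not fill a window are dropped."""
    d, h, w = (s // size for s in volume.shape)
    trimmed = volume[:d * size, :h * size, :w * size]
    blocks = trimmed.reshape(d, size, h, size, w, size)
    return blocks.max(axis=(1, 3, 5))

rng = np.random.default_rng(0)
toy_volume = rng.normal(size=(10, 10, 10))   # hypothetical stand-in for a 3D patch
toy_kernel = np.ones((3, 3, 3)) / 27.0       # hypothetical averaging filter
feature_map = max_pool3d(relu(conv3d_valid(toy_volume, toy_kernel)))
print(feature_map.shape)
```

In a trained network the kernel values are learned rather than fixed, and each layer applies many kernels in parallel; the sketch only shows how one kernel produces one down-sampled 3D feature map.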

3.2 Independent component analysis (ICA)

ICA is a practical method for decomposing multivariate observed data into separate components that are statistically independent. ICA is also used as an efficient tool to analyze the hidden spatial and temporal structure contained in brain imaging data. Figure 1 shows a general ICA model for fMRI data. In the data matrix, each row represents the 3D volume acquired at one time point, and each column contains the data of one voxel at all time points. At each point in time, all voxels are ordered next to each other to create one long row that represents the entire three-dimensional brain. Spatial ICA decomposes the fMRI data into a mixing matrix and a matrix of spatially independent components (spatial maps). The mixing matrix is M-by-N, where M is the number of time points and N is the number of components. The spatially independent component matrix is N-by-V, where V is the number of voxels [43]. The goal of ICA is to calculate the matrix of spatially independent components.

Figure 1.

Spatial ICA for fMRI data. The rows of the data matrix contain vectorized spatial volumes. The activation time courses are in the corresponding columns of the mixing matrix, and the rows of the spatial map matrix are vectorized volumes.

FastICA is one of the most successful approaches for solving ICA problems, especially in biomedical signal processing [44]. FastICA provides an easy way to extract independent components. It does not depend on any user-defined parameters and quickly converges to the most accurate solution allowed by the data. We apply the FastICA implementation from scikit-learn (https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.FastICA.html) in Python to the fMRI scans. In this paper, we use the first component (IC1) of the spatially independent component matrix to reduce complexity.
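As an illustration of the spatial ICA step, the sketch below applies scikit-learn's FastICA to a synthetic (time points $\times$ voxels) matrix and reshapes the first spatial map back into a volume. It is a sketch under stated assumptions rather than the exact pipeline of this work: the helper name `spatial_ica`, the 8 $\times$ 8 $\times$ 8 grid, the number of components, the random seed, and the synthetic data are all hypothetical.

```python
import numpy as np
from sklearn.decomposition import FastICA

def spatial_ica(data, n_components, seed=0):
    """Spatial ICA of a (time points x voxels) matrix.

    Returns the mixing matrix (M x N) and the spatial maps (N x V).
    """
    ica = FastICA(n_components=n_components, random_state=seed, max_iter=1000)
    # Voxels are passed as samples so that the estimated sources are
    # independent over space rather than over time.
    sources = ica.fit_transform(data.T)       # shape (V, N)
    return ica.mixing_, sources.T             # shapes (M, N) and (N, V)

# Synthetic stand-in for one pre-processed fMRI run (not COBRE data).
rng = np.random.default_rng(0)
vol_shape = (8, 8, 8)                         # hypothetical brain grid
n_time, n_comp = 40, 3                        # hypothetical M and N
n_vox = int(np.prod(vol_shape))               # V
true_maps = rng.laplace(size=(n_comp, n_vox))       # non-Gaussian spatial sources
true_mixing = rng.normal(size=(n_time, n_comp))     # time courses
noise = 0.01 * rng.normal(size=(n_time, n_vox))
data = true_mixing @ true_maps + noise              # (M, V) data matrix

mixing, spatial_maps = spatial_ica(data, n_comp)
ic1_volume = spatial_maps[0].reshape(vol_shape)     # first component as a 3D map
print(mixing.shape, spatial_maps.shape, ic1_volume.shape)
```

Note that ICA determines components only up to order, sign, and scale, so which map comes out first depends on the algorithm and its initialization; fixing `random_state` makes the sketch repeatable.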

Figure 2.

Proposed architecture based on 3D-CNNs for the fusion of multi-modal data to classify SZ/HC subjects. The inputs of the model are three types of whole-brain 3D features from different sources. The architecture contains three channels, each comprising three 3D-CNN layers. Finally, two fully connected layers and a softmax layer fuse the concatenated features and classify the data.

3.3 Proposed method

Figure 2 shows our multi-modal 3D-CNN architecture for classifying SZ and HC subjects. The model has three similar channels that take MRI, fMRI, and DTI data as input, separately. After pre-processing all the input data, we extract the desired features. Pre-processing details are described in Section 4.1. To reduce complexity, we consider a cube of 70 $\times$ 70 $\times$ 70 feature values for all extracted features and exclude zeros in the boundary areas. However, the network places no limit on the size of the extracted features. The model uses MRI data after excess parts such as the skull have been removed during pre-processing. 3D-CNNs then automatically construct low-level features from the MRI patches. The second input is the fMRI data. After pre-processing the fMRI data, we find the independent components (ICs) whose combination produces the observed signal. As stated earlier, we use the FastICA algorithm to this end. DTI data are used as the third input. We calculate fractional anisotropy from the pre-processed DTI data as a low-level feature. The model constructs latent feature maps from the ICA data and fractional anisotropy. The proposed network architecture consists of three layers in each channel, which we call C ${}_{1}$ , C ${}_{2}$ , and C ${}_{3}$ . In each layer, we apply convolution filters, the ReLU function, and max-pooling, in that order. As with most deep learning problems, the selection of network parameters is problem-dependent. The network parameters described in this paper are the result of several experiments and yielded the best results for the dataset used. In the C ${}_{1}$ layer, we train eight different 3D kernels with a size of (5, 5, 5). After the ReLU function is applied to the output of the convolution filters, a (2, 2, 2) max-pooling layer reduces the feature map size. The output of the first layer after down-sampling is a feature map with a size of (35, 35, 35).
Similarly, we use 16 different (3, 3, 3) convolutional 3D kernels in the C ${}_{2}$ layer and 32 different convolutional 3D kernels of size (3, 3, 3) in the C ${}_{3}$ layer. The down-sampled feature maps after layers C ${}_{2}$ and C ${}_{3}$ have sizes of (18, 18, 18) and (9, 9, 9), respectively. The output of the C ${}_{3}$ layer in all channels is flattened. Then, two fully connected layers with 11664 and 1024 hidden units, respectively, fuse the concatenated flattened maps. Finally, a softmax layer produces a probabilistic score for each class. We initialize the weights of the 3D convolutional kernels from a Gaussian distribution. We use the cross-entropy loss [45] for updating the network parameters (weights and biases). This loss function takes into account two probability distributions: the label of the $n^{\text{th}}$ sample (the ground truth) and the output of the neural network (the prediction). Cross-entropy measures how far apart these two probability distributions are and is defined as:

$\displaystyle C(W,b)=-\frac{1}{N}\sum_{n=1}^{N}\left[y_{n}\ln f_{W,b}(x_{n})+(1-y_{n})\ln\left(1-f_{W,b}(x_{n})\right)\right]$ (1)

where $N$ is the number of training samples, $y_{n}$ is the label of the input $x_{n}$ , and $f_{W,b}(\cdot)$ represents the output of the neural network. The fully connected layers use the sigmoid function as the activation function. We additionally apply standard dropout [46] in the fully connected layers as a regularization technique. During dropout, the inputs of the second fully connected layer are randomly set to zero with a probability of 0.5. Dropout has been shown to reduce overfitting effectively when training deep neural networks [46]. We train the 3D-CNNs with the adaptive gradient algorithm (Adagrad) [47], using a learning rate of 0.01 and a batch size of 16. The learning procedure of the proposed 3D-CNN architecture is summarized in Algorithm 1.
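The loss in Eq. (1) can be evaluated by hand on a few samples. The following pure-Python sketch does so; the function name, the clipping constant `eps`, the example labels (with 1 arbitrarily standing for SZ and 0 for HC), and the example probabilities are all assumptions made for illustration.

```python
import math

def binary_cross_entropy(labels, probs, eps=1e-12):
    """Mean binary cross-entropy of Eq. (1) for labels in {0, 1}."""
    total = 0.0
    for y, p in zip(labels, probs):
        # Clip the prediction to avoid log(0); eps is an illustrative choice.
        p = min(max(p, eps), 1.0 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(labels)

labels = [1, 0, 1, 0]              # hypothetical labels y_n
probs = [0.9, 0.2, 0.6, 0.4]       # hypothetical network outputs f(x_n)
loss = binary_cross_entropy(labels, probs)
print(round(loss, 4))
```

The loss shrinks towards zero as the predicted probabilities approach the labels and grows without bound as a confident prediction turns out to be wrong, which is what drives the weight updates.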

Algorithm 1. Pseudo-code for training the proposed architecture
Input:
  X1: MRI patches after pre-processing
  X2: ICA values, extracted from pre-processed fMRI data with the FastICA algorithm
  X3: FA measures, calculated from pre-processed DTI data
  y: labels
for i in [1, epoch] do
  $M1\leftarrow\textit{X1}$ : calculate the M1 feature map through the first 3D-CNN layer.
  $M2\leftarrow\textit{M1}$ : calculate the M2 feature map through the second 3D-CNN layer.
  $M3\leftarrow\textit{M2}$ : calculate the M3 feature map through the third 3D-CNN layer.
  $F1\leftarrow\textit{X2}$ : calculate the F1 feature map through the first 3D-CNN layer.
  $F2\leftarrow\textit{F1}$ : calculate the F2 feature map through the second 3D-CNN layer.
  $F3\leftarrow\textit{F2}$ : calculate the F3 feature map through the third 3D-CNN layer.
  $D1\leftarrow\textit{X3}$ : calculate the D1 feature map through the first 3D-CNN layer.
  $D2\leftarrow\textit{D1}$ : calculate the D2 feature map through the second 3D-CNN layer.
  $D3\leftarrow\textit{D2}$ : calculate the D3 feature map through the third 3D-CNN layer.
  $N1\leftarrow\textit{concat(flat(M3, F3, D3))}$ : concatenate the flattened final feature maps.
  $N2\leftarrow\textit{N1}$ : feed N1 into the first fully connected layer.
  $N3\leftarrow\textit{N2}$ : feed N2 into the second fully connected layer.
  $\hat{y}\leftarrow\textit{N3}$ : feed N3 into the softmax layer.
  With the Adagrad update rule, minimize the cross-entropy between y and $\hat{y}$ .
end for
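The feature-map sizes reported for layers C ${}_{1}$ to C ${}_{3}$ (35, 18, and 9 from a 70 $\times$ 70 $\times$ 70 input) can be reproduced with simple arithmetic. The sketch below assumes that each convolution keeps the spatial size ('same' padding) and that the (2, 2, 2) pooling rounds up at odd sizes; neither detail is stated explicitly in the text, and the function names are hypothetical.

```python
import math

def pooled_size(size, pool=2):
    """Side length after a size-preserving convolution and round-up pooling."""
    return math.ceil(size / pool)

def channel_shapes(input_size=70, filters=(8, 16, 32)):
    """Feature-map shape (side, side, side, filters) after each of C1..C3."""
    shapes, side = [], input_size
    for n_filters in filters:
        side = pooled_size(side)
        shapes.append((side, side, side, n_filters))
    return shapes

shapes = channel_shapes()
flat_per_channel = math.prod(shapes[-1])   # flattened C3 output of one channel
fused_length = 3 * flat_per_channel        # MRI, fMRI and DTI channels concatenated
print(shapes, flat_per_channel, fused_length)
```

Under these assumptions, each channel contributes 9 $\times$ 9 $\times$ 9 $\times$ 32 values to the concatenated vector that enters the first fully connected layer.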

4. Experiments and results

4.1 Experiment dataset

In this work, we use the ‘Center for Biomedical Research Excellence’ (COBRE, http://cobre.mrn.org) dataset [48, 49]. This dataset is publicly available on the (www.schizoconnect.com) website. In our experiments, we exclude every subject that lacks any of the MRI, fMRI, or DTI modalities. The resulting dataset includes 81 subjects in the healthy control (HC) subgroup and 64 subjects in the schizophrenia-strict (SZ) subgroup. There are 13 females and 51 males with an average age of 38.92 in the SZ subgroup and 21 females and 60 males with an average age of 37.98 in the HC subgroup. The T1-weighted MRI images in this dataset were acquired on a single 3T SIEMENS MAGNETOM TrioTim syngo B17 MR scanner with the following parameters: TR = 2530 ms, flip angle = ${}^{\circ}$, TE = [1.64, 3.5, 5.36, 7.22, 9.08] ms, TI = 1200 ms, FoV phase = 100%, FoV read = 256 mm, slices per slab = 192, voxel size = 1 mm$^{3}$. We apply the BET tool from the ‘Functional Magnetic Resonance Imaging of the Brain’ Software Library (FSL) [50] to the MRI images to remove non-brain regions. In the COBRE dataset, resting-state fMRI images were acquired while subjects rested passively with their eyes open/closed. The scanning parameters are: TR = 2000 ms, voxel size = 3.8 $\times$ 3.8 $\times$ 3.5 mm, TE = 29 ms, slice thickness = 3.5 mm, number of slices = 33, FoV read = 240 mm, FoV phase = 100%, and flip angle = ${}^{\circ}$. We pre-process the rs-fMRI data using the ‘Data Processing and Analysis for Brain Imaging’ (DPABI, http://www.rfmri.org/dpabi) toolbox. The pre-processing stages are as follows. The initial ten volumes are discarded to ensure that the fMRI signal has reached a steady state. We correct the slice timing to compensate for slice acquisition delays, using the slice acquired at the mid-point of each repetition time (TR) as the reference. After pre-processing, we extract ICA data for all fMRI images with the FastICA algorithm. The DTI imaging parameters are: number of slices = 72, FoV read = 256 mm, FoV phase = 100%, slice thickness = 2 mm, TR = 9000 ms, TE = 84 ms, b-value = 800 s/mm$^{2}$, voxel size = 2 mm$^{3}$. For the DTI images, after eddy current correction with FSL, we extract FA measures with the same software. FA ($0\leqslant\text{FA}\leqslant 1$) represents the degree of anisotropy of diffusing water molecules. Since water molecules diffuse more along the direction of white matter fibers, FA can indicate fiber integrity in the white matter of the brain. Fiber integrity, and thus FA, is a biomarker for some brain disorders such as schizophrenia [51]. Finally, we register all modalities to the ‘Montreal Neurological Institute’ (MNI) 152 space [52].
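For readers unfamiliar with FA, the sketch below evaluates the standard definition of fractional anisotropy from the three eigenvalues of a diffusion tensor. In this work the FA maps are produced by FSL, so the code is only an illustration of the quantity, not of that software; the function name and the example eigenvalues are hypothetical.

```python
import math

def fractional_anisotropy(l1, l2, l3):
    """FA from the three eigenvalues of a diffusion tensor (standard definition)."""
    denom = l1 * l1 + l2 * l2 + l3 * l3
    if denom == 0.0:
        return 0.0   # convention chosen here for empty (background) voxels
    num = (l1 - l2) ** 2 + (l2 - l3) ** 2 + (l3 - l1) ** 2
    return math.sqrt(0.5 * num / denom)

fa_isotropic = fractional_anisotropy(1.0, 1.0, 1.0)   # equal diffusion in all directions
fa_fibre = fractional_anisotropy(1.7, 0.3, 0.3)       # hypothetical fibre-like voxel
fa_line = fractional_anisotropy(1.0, 0.0, 0.0)        # diffusion along a single axis
print(fa_isotropic, round(fa_fibre, 3), fa_line)
```

The two extreme cases show the range of the measure: equal eigenvalues give FA = 0, and diffusion restricted to one axis gives FA = 1.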

4.2 Experiments setup

We use the COBRE dataset to evaluate our multi-modality 3D-CNN architecture. To keep random factors from influencing the results, we apply ten-fold cross-validation, using eight folds for training and one fold each for validation and testing. We use the validation fold to stop the training process once optimized weights for the model have been obtained. We implement the proposed model with the TensorFlow library [53] in Python on Google Colab. To evaluate classification effectiveness and to compare with other studies, we calculate the classification accuracy (ACC), the specificity (SPE), the sensitivity (SEN), and the area under the receiver operating characteristic (ROC) curve (AUC) [10], and we plot the ROC curves.
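The evaluation protocol can be sketched in plain Python as follows. The rotation scheme (the fold after the test fold serves as the validation fold) is an assumption, since the text only states the 8/1/1 proportions; the function names and the toy labels are likewise hypothetical, and the metric helper assumes both classes occur in the test fold.

```python
def rotate_folds(n_folds=10):
    """Yield (train, validation, test) fold indices: 8 / 1 / 1 per rotation."""
    for k in range(n_folds):
        test = k
        val = (k + 1) % n_folds            # assumed choice of validation fold
        train = [f for f in range(n_folds) if f not in (test, val)]
        yield train, val, test

def confusion_metrics(y_true, y_pred, positive=1):
    """Sensitivity, specificity and accuracy for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    sen = tp / (tp + fn)                   # true positive rate
    spe = tn / (tn + fp)                   # true negative rate
    acc = (tp + tn) / len(pairs)
    return sen, spe, acc

splits = list(rotate_folds())
y_true = [1, 1, 1, 0, 0, 0, 0, 1]          # hypothetical test-fold labels
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]          # hypothetical predictions
sen, spe, acc = confusion_metrics(y_true, y_pred)
print(len(splits), sen, spe, acc)
```

Averaging SEN, SPE, and ACC over the ten rotations gives figures of the kind reported in the result tables.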

4.3 Performance of the model with single and multi-modality input

In the first experiment, we evaluate the proposed model on the individual modalities and on fusions of multiple modalities. We use MRI patches, fMRI (ICA components), and DTI (FA biomarker) as the model input in separate experiments. The multi-modality tests are performed on the fusions MRI-fMRI, MRI-DTI, fMRI-DTI, and MRI-fMRI-DTI. In these tests, we use three 3D-CNN layers and set the other network parameters as discussed in the previous section. Table 1 shows the results for the classification of SZ vs. HC subjects. Figures 3 and 4 compare the ROC curves of these experiments.

Table 1
Comparison of single-modality and multi-modality classification performance

	SEN (%)	SPE (%)	ACC (%)	AUC (%)
MRI	82.71	78.81	81.21	88.3
fMRI	88.84	84.80	84.78	90.9
DTI	82.71	78.81	81.21	88.3
MRI-fMRI	91.33	91.39	91.35	96.8
MRI-DTI	91.72	92.62	92.07	97.0
fMRI-DTI	92.55	94.8	93.14	97.0
MRI-fMRI-DTI	99.00	94.95	97.29	98.3

Figure 3.

Comparison of the ROC curves of single and multi-modality classification.

Figure 4.

Comparison of the ROC curves of different multi-modalities classification.

4.4 The effect of the number of 3D-CNN layers on model performance

In this experiment, we vary the number of 3D-CNN layers for the “MRI-fMRI-DTI” fusion and compare model performance. In separate tests, we use 3, 4, 5, and 6 layers of 3D-CNNs for each modality. All other network parameters are the same as discussed in Section 3.3. In these tests, we use 32 different (3, 3, 3) convolutional 3D kernels for layers 3 and 4, and 64 convolutional 3D kernels of size (5, 5, 5) for layers 5 and 6. Table 2 compares their classification performance, and Figure 5 shows the ROC curve of each experiment.

Table 2
Classification performance of our model for different numbers of 3D-CNN layers

Layer number	SEN (%)	SPE (%)	ACC (%)	AUC (%)
3	99.00	94.95	97.29	98.3
4	99.62	94.95	98.64	98.3
5	100	98.28	99.35	98.5
6	97.24	97.17	97.21	98.2

Table 3
Comparison of the proposed method with existing methods

Study	Modality	Sample	SEN (%)	SPE (%)	ACC (%)
[33]	rs-fMRI	196 HC, 71 SZ	50.00	92.8	84.4
[39]	sMRI, DTI	33 HC, 62 SZ	90.85	92.17	91.28
[27]	rs-fMRI	72 HC, 74 SZ	–	–	91.2/100
[32]	rs-fMRI	72 HC, 72 SZ	97.49	98.62	98.09
[34]	EEG	39 HC, 45 SZ	91.11	92.50	91.69
[38]	fMRI	542 HC, 558 SZ	83.1	83.5	83.2
[10]	rs-fMRI, sMRI	72 HC, 72 SZ	100	98.57	99.29
[3]	rs-fMRI, sMRI, DTI	168 HC, 161 SZ	85.98	87.34	86.52
Proposed method	MRI, rs-fMRI, DTI	81 HC, 64 SZ	100	98.28	99.35

Figure 5.

ROC curves of the experiments with different numbers of 3D-CNN layers.

4.5 Comparison with existing methods

Table 3 compares the performance of our 3D multi-modal approach with results reported in the literature. The methods used in these articles are described in Section 2. It should be noted that differences in feature extraction and classification approaches, as well as in the datasets, affect the reported results, which makes direct comparison difficult. Moreover, differences in sample size and in the cross-validation procedure also make a fair comparison challenging [22]. In contrast to these methods, our approach requires fewer pre-processing steps to extract features. It does not require any ROI or voxel-based analysis, segmentation, or rigid registration, which reduces computational cost.

5. Discussion

Unlike previous techniques that rely only on handcrafted features, the proposed method combines automatically extracted and handcrafted features and uses them to find latent features for diagnosing schizophrenia. We used MRI patches with no segmentation, rigid registration, or further pre-processing. 3D-CNNs learn the properties of these patches and provide features that are more robust to individual variations such as translation and rotation. Researchers have proposed a wide variety of ICA approaches, and many of them have been applied successfully to the study of functional network biomarkers in fMRI data [9, 54, 55, 56]. In this study, we used the first component of the ICA model to reduce complexity. From the DTI data, we extracted FA measures. We chose FA because several studies have shown a relationship between FA measures and symptoms of schizophrenia [57, 58, 59, 60]. The results in Table 1 show that all single modalities achieve acceptable results and, as we expected, that accuracy improves in all multi-modality cases. The improvement in classification accuracy shows that each modality carries complementary information and that the combination of MRI, fMRI, and DTI data is well suited to diagnosing SZ. To tune the network, we varied the number of 3D-CNN layers. As Table 2 shows, accuracy increased as layers were added up to five layers and decreased with six. One reason for this is that the number of training samples is low. We report the accuracy of our model with five 3D-CNN layers. The comparison in Table 3 shows that the proposed method, with 99.35% accuracy, is reliable; however, because the datasets are not the same, the comparison is not entirely fair. Nevertheless, our proposed method is capable of delivering promising results with fewer low-level features and less complexity. There are some limitations to our proposed method.
First, random weight initialization may lead to changes in the results; however, a fixed seed for the initial weights can be used to obtain reproducible results. Second, the 3D-CNN parameters, such as the size and number of filters in each layer or the number of layers, may not be optimally set; we selected the settings used in our experiments by cross-validation. Third, in this study, only ICA components and FA are used as low-level features. Other informative features, such as mean diffusivity (MD) and mode of anisotropy (MO) from DTI images and the amplitude of low-frequency fluctuation (ALFF), the fractional amplitude of low-frequency fluctuation (fALFF), regional homogeneity (ReHo), and voxel-mirrored homotopic connectivity (VMHC) from rs-fMRI, may improve performance.

6. Conclusion

In this study, we have presented a multi-modal classification algorithm based on 3D-CNNs to distinguish SZ subjects from healthy control subjects using the fusion of MRI, fMRI, and DTI images. Our proposed method combines raw data and low-level features with a simple architecture to this end. We built the 3D-CNNs on the MRI patches (raw data) and on low-level features extracted from fMRI and DTI images. The network gradually learns latent features from multiple imaging modalities and fuses them for disease classification. Experimental results on the COBRE dataset demonstrate promising performance for SZ diagnosis and show that the fusion of structural and functional data with a 3D-CNN architecture is useful for diagnosing schizophrenia. In future work, we will try to use a large dataset containing multi-site imaging data and to find an effective fusion of features from a single modality with this architecture to reduce time and cost.

Acknowledgments

The authors would like to very kindly thank the anonymous reviewers for their valuable comments, which helped us to improve the work.

References

1. Latha B.M. and Kavitha, Detection of Schizophrenia in brain MR images based on segmented ventricle region and deep belief networks, Neural Computing and Applications 31(9) (2019), 5195–5206. doi: 10.1007/s00521-018-3360-1.

2. and Calhoun V.D., Classification and prediction of brain disorders using functional connectivity: promising but challenging, Frontiers in Neuroscience 12(2) (2018), 525–525. doi: 10.3389/fnins.2018.00525.

3. Guo, Huang C.C., Zhao, Yang A.C., Lin C.P., Nichols et al., Combining multi-modality data for searching biomarkers in schizophrenia, PLoS ONE 13(2) (2018), 1–20. doi: 10.1371/journal.pone.0191202.

4. Huang, Zhu, Hao, Shi, Gao et al., Identifying resting-state multi-frequency biomarkers via tree-guided group sparse learning for schizophrenia classification, IEEE Journal of Biomedical and Health Informatics 31(9) (2018), 1–1. doi: 10.1109/JBHI.2018.2796588.

5. Wang, Zhang, Fan, Zhao et al., Abnormal regional homogeneity as a potential imaging biomarker for adolescent-onset schizophrenia: a resting-state fMRI study and support vector machine analysis, Schizophrenia Research 192 (2018), 179–184. doi: 10.1016/j.schres.2017.05.038.

6. Woo C.-W., Chang L.J., Lindquist M.A. and Wager T.D., Building better biomarkers: brain models in translational neuroimaging, Nature Neuroscience 20 (2017). doi: 10.1038/nn.4478.

7. Dillon, Calhoun and Wang Y.-P., A robust sparse-modeling framework for estimating schizophrenia biomarkers from fMRI, Journal of Neuroscience Methods 276 (2017), 46–55. doi: 10.1016/j.jneumeth.2016.11.005.

8. Chyzhyk, Graña, Öngür and Shinn A.K., Discrimination of schizophrenia auditory hallucinators by machine learning of resting-state functional MRI, International Journal of Neural Systems 25(3) (2015), 1550007–1550007. doi: 10.1142/S0129065715500070.

9. Rashid, Arbabshirani M.R., Damaraju, Cetin M.S., Miller, Pearlson G.D. et al., Classification of schizophrenia and bipolar patients using static and dynamic resting-state fMRI brain connectivity, NeuroImage 134 (2016), 645–657. doi: 10.1016/j.neuroimage.2016.04.051.

10. Qureshi M.N.I., Cho H.J. and Lee, Multimodal discrimination of schizophrenia using hybrid weighted feature concatenation of brain functional connectivity and anatomical features with an extreme learning machine, Frontiers in Neuroinformatics 11 (2017), 1–14. doi: 10.3389/fninf.2017.00059.

11. Haddadpour, Daneshvar and Seyedarabi, PET and MRI image fusion based on combination of 2-D Hilbert transform and IHS method, Biomedical Journal 40(4) (2017), 219–225. doi: 10.1016/j.bj.2017.05.002.

12. Shi, Zheng, Zhang and Ying, Multimodal neuroimaging feature learning with multimodal stacked deep polynomial networks for diagnosis of Alzheimer’s disease, IEEE Journal of Biomedical and Health Informatics 22(1) (2018), 173–183. doi: 10.1109/JBHI.2017.2655720.

13. Schouten T.M., Koini, de Vos, Seiler, van der Grond, Lechner et al., Combining anatomical, diffusion, and resting state functional magnetic resonance imaging for individual classification of mild and moderate Alzheimer’s disease, NeuroImage: Clinical 11 (2016), 46–51. doi: 10.1016/j.nicl.2016.01.002.

14. Chen and Yao, Classification of Alzheimer’s disease, mild cognitive impairment, and cognitively unimpaired individuals using multi-feature kernel discriminant dictionary learning, Frontiers in Computational Neuroscience 11(117) (2018). doi: 10.3389/fncom.2017.00117.

15. Yun H.J., Kwak, Lee J.-M. and Alzheimer’s Disease Neuroimaging Initiative, Multimodal discrimination of Alzheimer’s disease based on regional cortical atrophy and hypometabolism, PLOS ONE 10(6) (2015), e0129250. doi: 10.1371/journal.pone.0129250.

16. Youssofzadeh, McGuinness, Maguire L.P. and Wong-Lin, Multi-kernel learning with DARTEL improves combined MRI-PET classification of Alzheimer’s disease in AIBL data: group and individual analyses, Frontiers in Human Neuroscience 11(380) (2017). doi: 10.3389/fnhum.2017.00380.

17. Libero L.E., DeRamus T.P., Lahti A.C., Deshpande and Kana R.K., Multimodal neuroimaging based classification of autism spectrum disorder using anatomical, neurochemical, and white matter correlates, Cortex 66 (2015), 46–59. doi: 10.1016/j.cortex.2015.02.008.

18. Akhavan Aghdam, Sharifi and Pedram M.M., Combination of rs-fMRI and sMRI data to discriminate autism spectrum disorders in young children using deep belief network, Journal of Digital Imaging 31(6) (2018), 895–903. doi: 10.1007/s10278-018-0093-8.

19. Nemmi, Pavy-Le Traon, Phillips O.R., Galitzky, Meissner W.G., Rascol et al., A totally data-driven whole-brain multimodal pipeline for the discrimination of Parkinson’s disease, multiple system atrophy and healthy control, NeuroImage: Clinical 23 (2019), 101858. doi: 10.1016/j.nicl.2019.101858.

20. Cetin M.S., Houck J.M., Rashid, Agacoglu, Stephen J.M., Sui et al., Multimodal classification of schizophrenia patients with MEG and fMRI data using static and dynamic connectivity measures, Frontiers in Neuroscience 10(466) (2016). doi: 10.3389/fnins.2016.00466.

21. Shen and Suk H.-I., Deep learning in medical image analysis, Annual Review of Biomedical Engineering 19(1) (2017), 221–248. doi: 10.1146/annurev-bioeng-071516-044442.

22. Liu, Cheng, Wang and Wang, Multi-modality cascaded convolutional neural networks for Alzheimer’s disease diagnosis, Neuroinformatics 16(3–4) (2018), 295–308. doi: 10.1007/s12021-018-9370-4.

23. Suk H.-I., Lee S.-W. and Shen, Latent feature representation with stacked auto-encoder for AD/MCI diagnosis, Brain Structure and Function 220(2) (2015), 841–859. doi: 10.1007/s00429-013-0687-3.

24. Zeng L.-L., Wang, Yang, Shen et al., Multi-site diagnostic classification of schizophrenia using discriminant deep learning with functional connectivity MRI, EBioMedicine 30 (2018), 74–85. doi: 10.1016/j.ebiom.2018.03.017.

25. Kim, Calhoun V.D., Shim and Lee J.H., Deep neural network with weight sparsity control and pre-training extracts hierarchical features and enhances classification performance: evidence from whole-brain resting-state functional connectivity patterns of schizophrenia, NeuroImage 124 (2016), 127–146. doi: 10.1016/j.neuroimage.2015.05.018.

26. Hosseini-Asl, Ghazal, Mahmoud, Aslantas, Shalaby A.M., Casanova M.F. et al., Alzheimer’s disease diagnostics by a 3D deeply supervised adaptable convolutional network, Front Biosci (Landmark Ed) 23 (2018), 584–596.

27. Chyzhyk, Savio and Graña, Computer aided diagnosis of schizophrenia on resting state fMRI data by ensembles of ELM, Neural Networks 68 (2015), 23–33. doi: 10.1016/j.neunet.2015.04.002.

28. Savio and Graña, Local activity features for computer aided diagnosis of schizophrenia on resting-state fMRI, Neurocomputing 164 (2015), 154–161. doi: 10.1016/j.neucom.2015.01.079.

29. Yu-Feng, Yong, Chao-Zhe, Qing-Jiu, Man-Qiu, Meng et al., Altered baseline brain activity in children with ADHD revealed by resting-state functional MRI, Brain and Development 29(2) (2007), 83–91. doi: 10.1016/j.braindev.2006.07.002.

30. Yang, Gao, Zhang et al., Discriminative analysis of schizophrenia using support vector machine and recursive feature elimination on structural MRI images, Medicine (United States) 95(30) (2016). doi: 10.1097/MD.0000000000003973.

31. Liu, Pan F.-X., Chen and Wang, Classification of schizophrenia based on individual hierarchical brain networks constructed from structural MRI images, IEEE Transactions on NanoBioscience 16(7) (2017), 600–608. doi: 10.1109/TNB.2017.2751074.

32. Qureshi M.N.I. and Lee, 3D-CNN based discrimination of schizophrenia using resting-state fMRI, Artificial Intelligence in Medicine 98 (2019), 10–17. doi: 10.1016/j.artmed.2019.06.003.

33. Kaufmann, Skatun K.C., Alnaes, Doan N.T., Duff E.P., Tonnesen et al., Disintegration of sensorimotor brain networks in schizophrenia, Schizophr Bull 41(6) (2015), 1326–1335. doi: 10.1093/schbul/sbv060.

34. Phang C.-R., Noman, Hussain, Ting C.-M. and Ombao, A Multi-Domain Connectome Convolutional Neural Network for Identifying Schizophrenia from EEG Connectivity Patterns, IEEE Journal of Biomedical and Health Informatics, 2019.

35. de Pierrefeu, Löfstedt, Laidi, Hadj-Selem, Bourgin, Hajek et al., Identifying a neuroanatomical signature of schizophrenia, reproducible across sites and stages, using machine learning with structured sparsity, Acta Psychiatrica Scandinavica 138(6) (2018), 571–580. doi: 10.1111/acps.12964.

36. Caprihan, Pearlson G.D. and Calhoun V.D., Application of principal component analysis to distinguish patients with schizophrenia from healthy controls based on fractional anisotropy measurements, NeuroImage 42(2) (2008), 675–682. doi: 10.1016/j.neuroimage.2008.04.255.

37. Deng, Hung K.S.Y., Lui S.S.Y., Chui W.W.H., Lee J.C.W., Wang et al., Tractography-based classification in distinguishing patients with first-episode schizophrenia from healthy individuals, Progress in Neuro-Psychopharmacology and Biological Psychiatry 88 (2019), 66–73. doi: 10.1016/J.PNPBP.2018.06.010.

38. Yan, Calhoun, Song, Cui, Yan, Liu et al., Discriminating schizophrenia using recurrent neural network applied on time courses of multi-site FMRI data, EBioMedicine, 2019. doi: 10.1016/j.ebiom.2019.08.023.

39. Liu, Wang, Zhang, Pan, Wang and Wang, MMM: classification of schizophrenia using multi-modality multi-atlas feature representation and multi-kernel learning, Multimedia Tools and Applications 77(22) (2018), 29651–29667. doi: 10.1007/s11042-017-5470-7.

40. Sui, Chen, Rogers, Pearlson G.D. et al., Combination of resting state fMRI, DTI, and sMRI data to discriminate schizophrenia by n-way MCCA+jICA, Frontiers in Human Neuroscience 7 (2013), 1–14. doi: 10.3389/fnhum.2013.00235.

41. Sabuncu M.R., Konukoglu and the Alzheimer’s Disease Neuroimaging Initiative, Clinical prediction from structural brain MRI scans: a large-scale empirical study, Neuroinformatics 13(1) (2015), 31–46. doi: 10.1007/s12021-014-9238-1.

42. Yang and Yu, 3D convolutional neural networks for human action recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence 35(1) (2013), 221–231. doi: 10.1109/TPAMI.2012.59.

43. Bijsterbosch, Smith S.M. and Beckmann C.F., Introduction to Resting State fMRI Functional Connectivity, 2017. Available: http://public.eblib.com/choice/publicfullrecord.aspx?p=4862781.

44. Hyvärinen and Oja, A fast fixed-point algorithm for independent component analysis, Neural Comput 9(7) (1997), 1483–1492. doi: 10.1162/neco.1997.9.7.1483.

45. Nielsen M.A., Neural Networks and Deep Learning, Determination Press, 2015.

46. Srivastava, Hinton, Krizhevsky, Sutskever and Salakhutdinov, Dropout: a simple way to prevent neural networks from overfitting, J. Mach. Learn. Res 15(1) (2014), 1929–1958.

47. Duchi, Hazan and Singer, Adaptive subgradient methods for online learning and stochastic optimization, J. Mach. Learn. Res 12 (2011), 2121–2159.

48. Calhoun V.D., Sui, Kiehl, Turner, Allen and Pearlson, Exploring the psychosis functional connectome: aberrant intrinsic networks in schizophrenia and bipolar disorder, Frontiers in Psychiatry 2 (2012). doi: 10.3389/fpsyt.2011.00075.

49. Mayer A.R., Ruhl, Merideth, Ling, Hanlon F.M., Bustillo et al., Functional imaging of the hemodynamic sensory gating response in schizophrenia, Human Brain Mapping 34(9) (2013), 2302–2312. doi: 10.1002/hbm.22065.

50. Smith S.M., Jenkinson, Woolrich M.W., Beckmann C.F., Behrens T.E.J. and Johansen-Berg, Advances in functional and structural MR image analysis and implementation as FSL, NeuroImage 23 (2004). doi: 10.1016/j.neuroimage.2004.07.051.

51. Towards a Functional Neuroanatomy of Symptoms and Cognitive Deficits of Schizophrenia, eds. Springer Netherlands, Dordrecht, 2009, 55–66.

52. Mandal P.K., Mahajan and Dinov I.D., Structural brain atlases: design, rationale, and applications in normal and pathological cohorts, Journal of Alzheimer’s Disease: JAD 31 Suppl 3(3) (2012), S169–S188. doi: 10.3233/JAD-2012-120412.

53. Abadi, Barham, Chen, Davis, Dean et al., TensorFlow: a system for large-scale machine learning, in: Proceedings of the 12th USENIX Conference on Operating Systems Design and Implementation, Savannah, GA, USA, 2016.

54. Sui, Chen, Liu, Jiang, Silva et al., Parallel group ICA+ICA: joint estimation of linked functional network variability and structural covariation with application to schizophrenia, Human Brain Mapping 40(13) (2019), 3795–3809. doi: 10.1002/hbm.24632.

55. Lottman K.K., White D.M., Kraguljac N.V., Reid M.A., Calhoun V.D., Catao et al., Four-way multimodal fusion of 7 T imaging data using an mCCA+jICA model in first-episode schizophrenia, Human Brain Mapping 39(4) (2018), 1475–1488. doi: 10.1002/hbm.23906.

56. Pearlson G.D., Liu, Sui et al., A group ICA based framework for evaluating resting fMRI markers when disease categories are unclear: application to schizophrenia, bipolar, and schizoaffective disorders, NeuroImage 122 (2015), 272–280. doi: 10.1016/j.neuroimage.2015.07.054.

57. Yang, Cao, Liang and Zhao, Schizophrenia symptomatic associations with diffusion tensor imaging measured fractional anisotropy of brain: a meta-analysis, Neuroradiology 59(7) (2017), 699–708. doi: 10.1007/s00234-017-1844-9.

58. Yao, Lui, Liao M.Y. and Thomas J.A., White matter deficits in first episode schizophrenia: an activation likelihood estimation meta-analysis, Prog Neuro-Psychopharmacol Biol Psychiatry 45 (2013). doi: 10.1016/j.pnpbp.2013.04.019.

59. Voineskos A.N., Lobaugh N.J., Bouix, Rajji T.K., Miranda, Kennedy J.L. et al., Diffusion tensor tractography findings in schizophrenia across the adult lifespan, Brain: A Journal of Neurology 133(Pt 5) (2010), 1494–1504. doi: 10.1093/brain/awq040.

60. Kelly, Jahanshad, Zalesky, Kochunov, Agartz, Alloza et al., Widespread white matter microstructural differences in schizophrenia across 4322 individuals: results from the ENIGMA Schizophrenia DTI Working Group, Molecular Psychiatry 23 (2017), 1261. doi: 10.1038/mp.2017.170.


¹ Multi-set Canonical Correlation Analysis.

⁴ http://cobre.mrn.org.

Table 1. Comparison of single- and multi-modality inputs

Table 2. Classification performance of our model as the number of 3D-CNN layers changes