Abstract
Alzheimer’s disease is a brain disorder that causes neurons to malfunction. It can lead to loss of brain function and dementia, which in turn impair memory, thought processes and behaviour. Despite its severity, it has no cure. Only a handful of classification strategies have been proposed in the literature, and those were trained on small sets of images. Existing methods for detecting Alzheimer’s disease from MRI images use only selected subsets of the data, chosen by age, gender, etc., and often rely on clinical data to aid classification. This paper proposes a Convolutional Neural Network (CNN) model for the detection of Alzheimer’s disease from MRI images, trained on the Open Access Series of Imaging Studies (OASIS) dataset. CNNs are among the most popular deep learning architectures for image-related problems. They are also robust classifiers, which removes the need to exclude subsets of the data and allows the model to rely solely on the image data. The proposed model achieves an accuracy of 80% and can be expected to achieve higher accuracy given a substantial increase in the amount of training data. The CNN was built with the Keras library in a Python environment.
Introduction
Dementia is a category of brain diseases, of which Alzheimer’s disease is the most prominent. Alzheimer’s disease results in a decline in mental faculties, problems with language, and a reduced ability to form logical thoughts. Alzheimer’s has no known cure. Initial tests are run to determine whether the symptoms presented by a subject have a short-term or treatable cause. Once that is ruled out, medical imaging is used to confirm the diagnosis. Brookmeyer et al. [1] conducted a study based on a stochastic, multistate model to determine how many individuals would suffer from Alzheimer’s in the years to come. They derived formulas for age-specific prevalence, incidence rates, disease progression and death rates with the United Nations worldwide population forecasts, along with epidemiological studies on the risks of Alzheimer’s. They predicted that the number of individuals suffering from this disease could grow to as many as 106.8 million by 2050; in other words, 1 in 85 persons would be living with the disease. With numbers this high, it is easy to see why proper detection of Alzheimer’s is crucial [1]. Based on a WHO report, almost 50 million people have recently suffered from dementia (
Convolutional Neural Networks (CNN) are a class of artificial neural networks that are useful in visual imagery [3]. They follow feed-forward techniques. CNNs are multi-layer back propagation networks. Thus, they can learn high-dimensional non-linear mappings from large sets of samples, making them ideal for use in image recognition, segmentation and detection. CNNs handle local distortions of image data by obtaining invariance from replicating weight configurations [3].
This paper makes the following main contributions:
A generalized CNN deep learning model, which does not require any other clinical data (e.g. age, gender, etc.) has been proposed. Other works have utilized subsets of the OASIS dataset, whereas we make use of the entire dataset. The MRI scans go through minimal preprocessing, so as to preserve the integrity of the images. We do not perform any image processing on the 3D images, and only rely on 2D scans. We achieve an accuracy of 80% when our approach is tested on the entire dataset. To the best of our knowledge, there is no work in the literature which uses the entire dataset (which usually results in a drop of the accuracy).
The remainder of the paper is organized as follows. Section 2 describes the related work. Section 3 presents the proposed deep learning model, i.e., the CNN. The experimental setup and experimental results are given in Sections 4 and 5, respectively. Sections 6 and 7 present the comparative study and the conclusion.
Numerous methods have previously been adopted for the identification of Alzheimer’s disease [1, 2]. Most studies have favoured support vector machines (SVMs) for classification. In [4], a three-dimensional discrete wavelet transform (3D DWT) was used to extract features by applying a 1D DWT along each dimension. Volumetric feature extraction was then performed by first extracting the energy, variance and Shannon entropy of each sub-band of each brain image, then feeding the triplet into a principal component analysis (PCA) module for dimensionality reduction. An SVM was then used to perform the final classification. The paper [5] used independent component analysis (ICA) for image decomposition and feature extraction, along with an SVM for classification. They constructed a tree from the shape of the brain, with branches corresponding to desired classification forming the feature vectors. This too used an SVM for the classification. Zhang et al. [6] used inter-class variance (ICV), eigen-brain sets, Welch’s
a. A conventional CNN diagram labeled with all major operations; b. Example of a max pooling operation.
A very different method was adopted in [15], which used a Genetic Algorithm (GA) to explore the various feature combinations, followed by an extraction through voxel selection, using Voxel-Based Morphometry (VBM). This ensemble formed the Extreme Learning Machine (ELM) [16]. Babu and Suresh [17] argued that the sequential learning was the more efficient approach, and followed a projection-based metacognitive learning algorithm with a radial basis function network (PBL-McRBFN). Their process allowed the training samples to be used singularly at the time of their availability, and discarded shortly after. Voxel-based morphometry and deformation-based morphometry were both used to extract features in [18], followed by a lattice computing approach – comprising a meta representation with interval numbers, extraction, dimensionality reduction and k nearest neighbours (k-NN) – for the classification. The authors in [20] made use of Bayesian network analysis to identify structural interactions within the default mode network, which explored areas of general interest in the brain, as opposed to a simple use of the regions of interest (ROI). The authors in [20] applied a series of logistic regression models to the image data and included age as a covariate. It focused on the ROIs identified prior to the training. Lai et al. [21] combined the Uncorrelated Multilinear PCA (UMPCA) and Laplacian Score (LS) methods to perform feature extraction and selection of discriminatory features. They then used an SVM classifier. Authors in [22] developed an automatic unsupervised learning approach. This technique used the symmetric log-domain diffeomorphic demons algorithm, followed by the calculation of the Riemannian distance. Their final step was a quick shift method. In [23], a Linear Regression Classification (LRC) method was introduced, which used pseudo Zernike moments for feature generation. Mahmood et al. 
were some of the few who produced research based on the use of neural networks in this area. They [24] performed PCA on normalized images before processing them with an artificial neural network (ANN). Hence, a considerable amount of work has been carried out on Alzheimer’s disease using advanced neural models [25, 26, 27, 28, 29]. Most of the work on Alzheimer’s disease is based on a small data set. The limited utilization of the data set is the gap that we have tried to narrow in this work [20, 31].
During our research, we concluded that a complex CNN (multitudes of layers make a CNN complex) is not suitable for our dataset. The images are small (176 pixels
The convolutional layer is the main component of the CNN model. It consists of filters, or kernels, which are the learnable units of the layer and are spatially quite small (e.g., 3
i.e., for a 9
A Max Pooling layer “reduces” the size of data for the next layer. This effectively lowers the computation time. For an N
Here this image is 8 by 8 pixels in size, and a max pooling kernel of size 2 by 2 with a stride value of 2 (the filter will move by 2 pixels each time), will reduce the image as shown on the right. Max pooling has been shown to work better than other pooling methods. However, the problem with a max pooling layer or any pooling layer in general is the loss of data. An N
Thus, in this case, Eq. (3) shows the data loss.
Thus, a data loss of 75% is inevitable. In our work, we also tried the approach proposed by Springenberg et al. [25] i.e., replacing the pooling layers with a convolutional layer with larger stride values. This, however, seemed to have little effect on the performance of our model.
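The 75% figure can be checked with a small sketch. Assuming the 8 by 8 input and the 2 by 2 kernel with a stride of 2 described above, a plain-Python max pooling keeps one value out of every four. The helper name `max_pool_2d` and the pixel values are illustrative only and are not taken from the model's code.

```python
def max_pool_2d(image, size=2, stride=2):
    """Max pooling over a 2D list of numbers, without padding."""
    rows, cols = len(image), len(image[0])
    pooled = []
    for r in range(0, rows - size + 1, stride):
        pooled_row = []
        for c in range(0, cols - size + 1, stride):
            # Collect the size x size window whose top-left corner is (r, c).
            window = [image[r + i][c + j]
                      for i in range(size) for j in range(size)]
            pooled_row.append(max(window))
        pooled.append(pooled_row)
    return pooled


# Hypothetical 8 x 8 "image" whose pixel values are simply 0..63.
image = [[r * 8 + c for c in range(8)] for r in range(8)]
pooled = max_pool_2d(image, size=2, stride=2)

values_in = len(image) * len(image[0])       # number of input values
values_out = len(pooled) * len(pooled[0])    # number of values kept
fraction_discarded = 1 - values_out / values_in
```

With these settings the output is 4 by 4, so 16 of the 64 input values survive and `fraction_discarded` is 0.75.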
A dropout layer efficiently combats overfitting in neural networks, and has been proven to be a regularization technique by Srivastava et al. [26]. In this layer, the main hyper-parameter is the rate or the probability parameter ‘p’. A neuron in the neural network, while in the training phase, is in the active state with probability ‘p’.
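As a rough illustration of this behaviour, the sketch below keeps each activation with probability p during training and leaves the input untouched at inference time. It uses the common "inverted dropout" scaling by 1/p, which is an assumption here and not something the text specifies; the function name `dropout` is also illustrative. Note that the Keras `Dropout` layer takes the fraction of units to drop as its `rate` argument, so a retention probability p would correspond to a rate of 1 - p.

```python
import random


def dropout(activations, p, training=True, rng=random):
    """Keep each activation with probability p (inverted dropout).

    Kept activations are scaled by 1/p so the expected value of each
    output matches its input; at inference time the input is returned as is.
    """
    if not training:
        return list(activations)
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    return [a / p if rng.random() < p else 0.0 for a in activations]


# Hypothetical layer output: 10,000 activations, all equal to 1.0.
rng = random.Random(0)  # fixed seed so the sketch is repeatable
activations = [1.0] * 10000
dropped = dropout(activations, p=0.8, rng=rng)

# Fraction of neurons that stayed active; this should be close to p.
kept_fraction = sum(1 for a in dropped if a != 0.0) / len(dropped)
```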
Dense layers are fully connected neural network layers (every neuron in a dense layer is connected to every neuron in the previous layer) that take a single dimensional input. A flattening layer is required before the first dense layer to convert the multidimensional input data into a single dimensional output.
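The relationship between the flattening layer and a dense layer can be sketched in a few lines of plain Python. The feature-map values, weights and biases below are made up for illustration, and the helpers `flatten` and `dense` are hypothetical stand-ins for the corresponding layers, without any activation function.

```python
def flatten(feature_maps):
    """Flatten a nested list of any depth into one flat list."""
    flat = []
    for item in feature_maps:
        if isinstance(item, list):
            flat.extend(flatten(item))
        else:
            flat.append(item)
    return flat


def dense(inputs, weights, biases):
    """Fully connected layer: every output neuron sees every input value."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


# Hypothetical output of the last pooling layer: 2 feature maps of 2 x 2.
feature_maps = [[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]]]
flat = flatten(feature_maps)  # single-dimensional input for the dense layer

# Two output neurons, each with one weight per flattened input value.
weights = [[0.5] * len(flat), [-1.0] * len(flat)]
biases = [0.0, 1.0]
outputs = dense(flat, weights, biases)
```

The number of weights per neuron equals the length of the flattened vector, which is why the size of the last feature maps largely determines the parameter count of the first dense layer.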
Network architecture.
From L-R, a. Coronal view; b. Sagittal view; c. Transverse view; d. Masked transverse view.
Figure 2 depicts the network architecture used in this study. The first three layers of the proposed model are 2-dimensional convolutional layers. For the first layer, which receives the image directly as input, the input dimension is 176
The Open Access Series of Imaging Studies (OASIS) is a dataset of brain images consisting of different views of MRI scans, which was released publicly in 2007 [29]. The dataset was released to facilitate study, research and analysis in the area of basic and clinical neuroscience. It contains both cross-sectional and longitudinal MRI images. The subjects range from 18 to 96 years old. Both genders are included and, to maintain uniformity and medical accuracy, all of the subjects in this dataset are right-handed. A single imaging session was conducted for each subject, and between 3 and 6 T1-weighted MRI scans were taken. Apart from the basic T1-weighted scans, the dataset also provides processed images for each subject. There is an averaged image for each subject, obtained by motion correction of the co-registered average image. A Gain-Field Corrected (GFC) atlas-registered image and a masked version of that atlas-registered image are also present for each subject. Each image occupied roughly 16 MB of space. We implemented this project in Python and, as Keras is one of the popular packages available for deep learning, we adopted the Keras framework. We chose the cross-sectional data as it is reported to be of more relevance and is backed by many authors in this field. More importantly, it contains more than double the data of the longitudinal MRI set, and since our algorithm gives data quantity the highest priority, we were compelled to opt for the cross-sectional data. Keeping in mind the relatively small size of the dataset (we had to pick the least noisy scans in order to avoid further reducing the dataset size) and the size of the images (a 6 MB dataset required an average of 11 minutes per epoch, so 300 epochs would take 55 hours), we opted to use all 4 views of the T88 images present in the dataset: coronal (Fig. 3a), sagittal (Fig. 3b), transverse (Fig. 3c) and masked transverse (Fig. 3d). These GIF images were converted to PNG through lossless compression, and then converted into single-channel greyscale images. This allowed us to reduce the dataset to 6 MB for each view.
Around mid-March 2018, the makers of the dataset restructured their website and the data. They classified the dataset into three categories – OASIS-1 (the cross-sectional dataset published in 2007 containing data on 416 subjects which has been the most widely used so far), OASIS-2 (the longitudinal dataset published in 2010 containing scans of 150 subjects) and OASIS-3 (the newest addition on the OASIS website of longitudinal dataset on 1098 subjects). We used the OASIS-1 dataset.
Experimental results
The experiments were run on a Dell laptop with 16 GB of RAM and an Intel i5-7200U CPU, using Python 2.7 and the Keras package.
Graphics generated
The proposed model was run on all three available views of the OASIS dataset, i.e., coronal, transverse and sagittal. Each model generated a model plot, a confusion matrix for both the training data and the test data, an accuracy graph, a loss graph, and a record of the calculated metrics.
Figure 4 is the confusion matrix for the training data
Confusion matrix for training data.
Confusion matrix for test data.
Accuracy graph.
Loss graph.
Figure 6 is the accuracy graph, which depicts the variations in accuracy of the CNN model with each epoch. As can be observed, with each epoch, the overall accuracy of the model improves. Similarly, Fig. 7 shows the loss graph, which depicts the variations in the loss function.
A record of the metrics is also calculated by the models. These metrics were calculated from the data obtained from the confusion matrices, and include loss, accuracy, misclassification rate, true positive rate, false positive rate, specificity, precision and prevalence for both training data and test data. True positive (
Apart from the metrics calculated from the confusion matrices, recall score, F1 score and Cohen Kappa score were calculated from the built-in functions. All the training data metrics are given in Table 1, while all the test data metrics are given in Table 2. Both tables compare the performances of the top 3 most accurate models built.
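For reference, the metrics listed above can all be derived from the four cells of a binary confusion matrix. The sketch below computes them by hand, including the F1 and Cohen Kappa scores that the paper obtained from built-in functions; it is not the authors' code, the function name `binary_metrics` is illustrative, and the counts are hypothetical rather than taken from the reported confusion matrices.

```python
def binary_metrics(tp, fp, tn, fn):
    """Derive common metrics from the cells of a binary confusion matrix."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    recall = tp / (tp + fn)        # true positive rate (sensitivity)
    fpr = fp / (fp + tn)           # false positive rate
    precision = tp / (tp + fp)
    f1 = 2 * precision * recall / (precision + recall)

    # Cohen's kappa compares observed agreement (accuracy) with the
    # agreement expected by chance from the row and column totals.
    p_yes = ((tp + fp) / total) * ((tp + fn) / total)
    p_no = ((tn + fn) / total) * ((tn + fp) / total)
    expected = p_yes + p_no
    kappa = (accuracy - expected) / (1 - expected)

    return {
        "accuracy": accuracy,
        "misclassification_rate": 1 - accuracy,
        "true_positive_rate": recall,
        "false_positive_rate": fpr,
        "specificity": 1 - fpr,
        "precision": precision,
        "prevalence": (tp + fn) / total,
        "f1": f1,
        "cohen_kappa": kappa,
    }


# Hypothetical counts for a 47-image test split (illustrative only).
metrics = binary_metrics(tp=10, fp=2, tn=28, fn=7)
```

Reporting sensitivity and specificity alongside accuracy matters when the classes are unevenly sized, because accuracy alone can hide a low true positive rate.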
Training data metrics
Test data metrics
Data used by other researchers
Reduced data used in this paper
Summary of other works
The best-performing proposed model achieved an accuracy of 80.85%, a sensitivity of 58.82% and a specificity of 93.34% on the test data. Accuracy has always been a significant parameter for any classifier and the principal measure used to evaluate the performance of a deep CNN. In this work, a generalized CNN-based deep learning model that does not require any other clinical data (e.g. age, gender, etc.) has been proposed. Most authors in the literature have utilized subsets of the OASIS dataset, whereas we have incorporated the entire dataset. The MRI scans have gone through minimal pre-processing to preserve the integrity of the images. We have not performed any image processing on the 3D images and rely only on 2D scans. We have achieved a classification accuracy of 80%. To the best of our knowledge, no previously reported proposal has used the entire OASIS dataset, which is perhaps the reason that the classification accuracy has dropped slightly.
This paper strives to utilize the prowess of CNN and build a robust model to classify any subject, irrespective of age or gender. To this end, the entire dataset (235 controls with confirmed/valid clinical diagnosis) was utilised. Bachman et al. [24] concluded that the circularity of cross-sectional area decreases at an accelerated rate proportional to the increasing severity of the Alzheimer’s. Fotenos et al. [25] concluded that demented individuals exhibit a faster rate of whole-brain atrophy from an early stage in adulthood than individuals without Alzheimer’s. Loss of white-matter in brain was also observed in their research. There is plenty of evidence that there are biomarkers and features that characterize a brain affected by Alzheimer’s – whether visible to the naked eye or not – which a CNN should be able to identify [32]. Table 3 depicts the data used by most other researchers in their studies. However, it was observed that the clinical diagnosis – and the corresponding CDR ratings – on all controls were not present. As their CDR status could not be assumed, the corresponding images could not be used for classification purposes and were excluded from the usable dataset. The reduced feasible dataset is summarized in Table 4. This is the dataset used in this study.
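A minimal sketch of this filtering step is given below, together with a binary labelling in which a CDR of 0 maps to class 0 and any positive CDR maps to class 1. The subject identifiers and CDR values are made up for illustration, and the helper `cdr_to_label` is hypothetical rather than part of the original pipeline.

```python
def cdr_to_label(cdr):
    """Map a CDR rating to a binary class; None means no rating is available."""
    if cdr is None:
        return None
    return 0 if cdr == 0 else 1


# Hypothetical records: (subject id, CDR rating or None when undiagnosed).
records = [
    ("subj_a", 0.0),
    ("subj_b", 0.5),
    ("subj_c", None),   # no clinical diagnosis, so it cannot be labelled
    ("subj_d", 1.0),
    ("subj_e", 2.0),
]

# Keep only subjects with a CDR rating and attach the binary label.
labelled = [(sid, cdr_to_label(cdr)) for sid, cdr in records if cdr is not None]

# Share of positive (Alzheimer's) labels in the usable subset.
positive_share = sum(label for _, label in labelled) / len(labelled)
```

Subjects without a rating are dropped rather than assigned a default class, which mirrors the decision not to assume a CDR status.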
There are 235 clinically confirmed and diagnosed controls. One hundred exhibit Alzheimer’s to varying degrees of severity and 135 do not have Alzheimer’s. As a CDR rating of 0 indicates the absence of AD, a label of “0” was allotted to those subjects diagnosed as free of Alzheimer’s, and “1” to those diagnosed with Alzheimer’s, whatever the severity. It is important to consider a CDR of 0.5 as a form of Alzheimer’s, as supported by Morris [2], because it is vital to diagnose a person with Alzheimer’s in its early stages. When diagnosed early, the progression of Alzheimer’s can be controlled and treated to a large extent. Unfortunately, once neurodegeneration begins, little can be done to counter the disease. It can be observed that the dataset is, therefore, slightly biased towards the non-Alzheimer’s class. The other prominent studies in this field are compiled in Table 5. Almost all of them work with a comparatively smaller subset of the dataset. Some studies include only women, some include only controls aged above 60 years, and some have worked on various divisions rather than a unified, generalized dataset. From Table 5, it can be observed that [20, 7, 15, 11, 18] have interpreted CDR ratings differently and have classified controls into various divisions such as HC or NC (healthy controls or normal controls), MCI (Mild Cognitive Impairment) controls, AD controls, yHC (young healthy controls), mHC (medium-aged healthy controls) and oHC (old healthy controls). They then applied their proposed methods to these individual divisions and reported results through a method of averaging. In [20], the best result was reported on a dataset (of size 81 – CDR
Conclusion
This paper has proposed a deep learning model that follows a convolutional neural network approach. The generalized CNN model does not require any other clinical data (e.g. age, gender, etc.). The results indicate that the model is flexible and can diagnose fairly accurately whether a subject has Alzheimer’s. It is robust, as it does not discriminate between subsets of data based on clinical factors such as age and gender. The accuracy of the model could be improved considerably by increasing the amount of training data. This work can be considered a machine-learning-based, data-driven method that can diagnose Alzheimer’s disease without any domain expertise. Future work will be to hybridize the algorithm to achieve better accuracy on imbalanced and blurred images. It would also be worthwhile to test and verify performance metrics such as precision and recall with high-end GPUs. Reducing the computation time by incorporating optimal weight vectors is another interesting direction for future work.
Acknowledgments
Data were provided [in part] by OASIS: Cross-Sectional: Principal Investigators: D. Marcus, R. Buckner, J. Csernansky, J. Morris; P50 AG05681, P01 AG03991, P01 AG026276, R01 AG021910, P20 MH071616, U24 RR021382.
