Abstract
Background:
Within the past decade, computer scientists have developed many methods using computer vision and machine learning techniques to detect Alzheimer’s disease (AD) in its early stages.
Objective:
However, some of these methods are unable to achieve excellent detection accuracy, and several other methods are unable to locate AD-related regions. Hence, our goal was to develop a novel AD brain detection method.
Methods:
In this study, our method was based on the three-dimensional (3D) displacement-field (DF) estimation between subjects in the healthy elder control group and the AD group. The 3D-DF was treated as the AD-related features. Three feature selection measures were used: the Bhattacharyya distance, Student’s t-test, and Welch’s t-test (WTT). Two non-parallel support vector machines, i.e., the generalized eigenvalue proximal support vector machine and the twin support vector machine (TSVM), were then used for classification. A 50 × 10-fold cross validation was implemented for statistical analysis.
Results:
The results showed that “3D-DF+WTT+TSVM” achieved the best performance, with an accuracy of 93.05 ± 2.18%, a sensitivity of 92.57 ± 3.80%, a specificity of 93.18 ± 3.35%, and a precision of 79.51 ± 2.86%. This method also outperformed 13 state-of-the-art approaches. Additionally, we were able to detect 17 regions related to AD by using a pure computer-vision technique. These regions include sub-gyral, inferior parietal lobule, precuneus, angular gyrus, lingual gyrus, supramarginal gyrus, postcentral gyrus, third ventricle, superior parietal lobule, thalamus, middle temporal gyrus, precentral gyrus, superior temporal gyrus, superior occipital gyrus, cingulate gyrus, culmen, and insula. These regions were also reported in recent publications.
Conclusions:
The 3D-DF is effective for detecting both AD subjects and AD-related regions.
Keywords
ABBREVIATIONS
(kernel) (generalized eigenvalue problem) (non-parallel) (twin) support vector machine
(non-) rigid registration
Alzheimer’s disease
Brodmann area
Bhattacharyya distance
brain region cluster
computer-aided diagnosis
Clinical Dementia Rating
coronal index
cross validation
displacement field
discrete wavelet transform
eigenbrain
geodesic anisotropy
healthy control
information gain
modulated grey matter
magnetic resonance (imaging)
principal component analysis
Pearson’s correlation
positron emission tomography
radial basis function
region-of-interest
single-photon emission computed tomography
Student’s t-test
singular value decomposition
trace of Jacobian matrix
undersampling
voting feature intervals
Welch’s t-test
INTRODUCTION
Alzheimer’s disease (AD) is an abnormal condition associated with aging [1]. AD is a form of senile dementia that affects short-term and long-term memory, thinking, and behavior [2]. Research on AD has attracted scholars from all over the world because of its importance and effect on society [3]. As AD progresses, its symptoms become more severe [4, 5]. Currently there is no cure for AD. Studies showed that in 2006, more than twenty million people in the world suffered from this disease [6]. It is predicted that in 30 years AD will affect 1 in every 85 people worldwide, and more than 40% of prevalent cases will need a high level of care [7].
The number of aging people throughout the world has increased, so AD places heavier burdens on families and society than before [8, 9]. In China, AD accounts for more than half of senile dementia cases; it causes a total economic loss of more than eighty billion Yuan every year and is responsible for nearly sixty billion Yuan in healthcare costs every year [10, 11]. In the United States, the healthcare costs of AD are about 100 billion dollars per year, and they are predicted to reach $1 trillion per year by 2050 [12].
It is essential to develop accurate and early detection methods for AD subjects, because early detection makes it possible to provide treatment that controls the deterioration of AD [13]. A three-dimensional (3D) scan of the whole brain has become more acceptable and affordable due to recent advances in neuroimaging technology [14, 15], especially magnetic resonance imaging (MRI), the most popular imaging technique. With the high resolution of magnetic resonance (MR) images, the diagnostic accuracy of AD has been greatly enhanced. MR images already play a significant role in distinguishing AD subjects from healthy elder controls (HC).
In recent studies, a variety of systems were used for AD detection (more references will be introduced below). Most of these systems contained three stages: (1) feature extraction, to extract efficient features that can distinguish pathological brains from healthy brains; (2) feature selection, to select important features from the original ones, which can be skipped if the number of features is already reasonable; and (3) classification, to construct a classifier using the extracted (and reduced) features.
In this study, we proposed employing the displacement field (DF) to track the morphometry of both healthy brains and AD brains. Our past work used 2D-DF and demonstrated its effectiveness [16]. In the current study, we extended 2D-DF to 3D-DF and expected the latter to achieve better performance. We then employed three different criteria for feature selection: the Bhattacharyya distance (BD), the Student’s t-test (STT), and the Welch’s t-test (WTT). For classification, we did not choose the popular support vector machine (SVM), since its parallel-hyperplane constraint limits its performance. Instead, we used two non-parallel SVMs, which drop the parallelism constraint of the standard SVM in order to further improve the classification performance.
State-of-the-art
In the past, most diagnostic work was carried out by measuring a region-of-interest (ROI) in brain MR images, because several typical AD-related regions and the corresponding shape deformations were known [17, 18], such as the enlarged ventricles, shrinkage of the hippocampus, and shrinkage of the cortex [19]. However, the ROI-based methods suffered from some shortcomings: (i) They needed a priori information and expert knowledge. (ii) The detection accuracy relied on the interpreter’s knowledge [20]. (iii) They were limited to known regions, whereas other regions potentially connected to AD still needed to be explored [21]. (iv) Automatic segmentation of the ROI was not feasible in practice, so examiners needed to segment the brain manually [22].
Recently, a new type of method, the “whole-brain analysis”, has gained popularity since it considers all voxels in the brain as a whole. It does not need to segment the brain beforehand, and it does not need any biomarker for the classification task. The main disadvantage is dimensionality, which can be solved through high-speed computers, which are relatively inexpensive [23]. The whole-brain analysis heavily relies on pure computation, and it can only be done by computer scientists after physicians help label the data as either AD or healthy. Generally, the whole-brain analysis labels the whole brain as a ROI, and it consists of two stages: feature extraction and classification. We reviewed and analyzed more than 10 studies in detail.
Scholars have presented various methods to extract efficient features for AD and other types of pathological brain detection (see footnote 1). In addition, various classification models and methods exist; nevertheless, not all of them are appropriate for processing MR images. Plant et al. [24] employed brain region cluster (BRC) and suggested the use of information gain (IG) to evaluate the interestingness of a voxel. They used Bayes statistics, SVM, and voting feature intervals (VFI) to classify patterns. Park [25] employed manifold learning for classification. The 1st and 2nd distance measures yielded error rates of 18% and 46%, respectively, in classifying AD and normal patients. Chaves et al. [26] utilized a large-margin-based methodology for AD detection in single-photon emission computed tomography (SPECT) and positron emission tomography (PET) images. Their system yielded accuracy, sensitivity, and specificity values of 90.67%, 88%, and 93.33% (for PET) and 92.78%, 91.07%, and 95.12% (for SPECT), respectively. Saritha et al. [27] were the first to use wavelet entropy in pathological brain detection. They used spider-web plots to reduce the size of the feature space, and a probabilistic neural network for classification. Zhang et al. [28] suggested that removing the spider-web plot resulted in the same classification performance. Savio and Grana [29] presented a novel technique that employed the deformation-based morphometry method. They tested five features and found that three performed excellently: trace of Jacobian matrix (TJM), modulated grey matter (MGM), and geodesic anisotropy (GEODAN). Furthermore, they utilized Pearson’s correlation (PEC), WTT, and BD to measure the significance of the voxel sites.
Kalbkhani et al. [30] modeled the detail coefficients of the 2-level discrete wavelet transform (DWT) by the generalized autoregressive conditional heteroscedasticity statistical model, and the parameters of this model were considered as the primary feature vector. They tested the k-nearest neighbors and SVM models. Wang et al. [31] proposed the utilization of the undersampling (US) technique on a three-dimensional image. They used singular value decomposition (SVD) based principal component analysis (PCA) to select features. Finally, they combined the kernel SVM (KSVM) with the decision tree (KSVM-DT). Zhou et al. [32] followed Saritha’s method and again employed wavelet entropy for feature extraction. A naive Bayes classifier was utilized for abnormal brain detection. Harikumar and Kumar [33] analyzed the performance of an artificial neural network in the classification of medical images by using wavelets as a feature extractor. The classification accuracy achieved was 96%. Zhang et al. [34] employed the discrete wavelet packet transform. They utilized Tsallis entropy to acquire wavelet packet entropy features, and they introduced a generalized eigenvalue proximal support vector machine (GEPSVM). Nazir et al. [35] suggested using filters to remove noise, and extracted color moments as mean features. Through this method, they achieved an overall accuracy of 91.8%. Zhang et al. [36] employed the eigenbrain (EB) to extract features, and then used WTT to reduce the features. They proposed the use of SVM with a radial basis function (RBF) kernel. Damodharan and Raghavan [37] combined tissue segmentation and a neural network for brain tumor detection. Zhang et al. [38] proposed a novel classification system that implemented 3D-DWT to extract wavelet coefficients from the volumetric image. Zhang et al. [39] proposed combining the stationary wavelet transform with GEPSVM.
Farzan et al. [40] used the longitudinal percentage of brain volume changes in a two-year follow-up, and its intermediate counterparts at the early 6-month and late 18-month points, as features. Their experimental results showed that SVM with RBF performed the best with an accuracy of 91.7%, which was higher than the accuracy of K-means (83.3%), FCM (83.3%), and the linear SVM (90%). Munteanu et al. [41] employed proton magnetic resonance spectroscopy data, with the goal of detecting mild cognitive impairment and AD. They utilized a single-layer perceptron with only two spectroscopic voxel volumes obtained in the left hippocampus, achieving an AUROC value of 0.866. Savio and Grana [42] utilized local activity features, such as regional homogeneity, in order to develop a computer-aided diagnosis (CAD) of schizophrenia on resting-state functional MRI (fMRI). Zhang et al. [43] combined wavelet entropy with Hu moment invariants. The total number of features was 14. They also used GEPSVM as the classifier.
Based on the latest literature, we found: (1) The DWT-based features were efficient. However, we presented a novel feature, the DF, which was used in MR images for the first time. (2) SVMs were commonly used and compared with the conventional decision tree, the artificial neural network [44, 45], and other classifiers [46]. Hence, we continued to use SVM. Two variants of SVM were introduced in this study, GEPSVM and twin SVM (TSVM), with the aim of further improving the classification performance.
MATERIALS AND METHODS
Dataset
The open dataset was downloaded from the Open Access Series of Imaging Studies [47], which consists of 416 subjects aged 18 to 96. All subjects were right-handed. We selected 126 samples (28 ADs and 98 HCs) from the dataset; subjects with missing records or under the age of 60 were removed. The demographic data are reported in Table 1. Following common convention, the Clinical Dementia Rating was interpreted as the target (label).
Co-registration and brain-masking
All the 3D MR brain images were motion-corrected and co-registered to generate an averaged image, so the signal-to-noise ratio could be increased. Afterwards, the MR images were spatially co-registered to the Talairach space and were then brain-masked. Figure 1 offers an example of the preprocessing of the 3D images with a resolution of 1 mm × 1 mm × 1.25 mm. Its motion-correction procedure registered the images of three scans, and then generated an image in the original spatial coordinates with resampling to 1 mm × 1 mm × 1 mm. The averaged image was registered to the Talairach coordinates with the brain extracted (Fig. 1).
Displacement field
The shape registration contained both rigid registration (RR) and non-rigid registration (NRR). The RR estimated the rigid parameters, which was accomplished above in “Co-registration and brain-masking”. The NRR extracted either 2D- or 3D-DF of the moving image on the basis of a given reference image. Figure 2 shows the flowchart of the DF calculation.
The estimation task usually set a healthy brain as the moving image (I_M) and a brain afflicted with AD as the reference image (I_R). The RR is an essential preprocessing procedure before NRR. It can remove the deformation originating from the movement of subjects, so the subsequent NRR can reflect the realistic deformation stemming from brain diseases.
Several types of methods are available to estimate NRR, such as spline-function-based methods, phase-correlation methods, fluid methods, optical-flow methods, and elastic methods. Among these, the first type is parametric and needs to solve for the optimal parameters of the spline-based functions [48]. The second type needs a large amount of computation, its local search range is difficult to determine, and it cannot guarantee finding the global optimum [49]. The other three types are non-parametric: the DF is found by directly solving a predefined physical model expressed as a partial differential equation [50]. The methods mentioned above are time-consuming and vulnerable to noise.
A novel approach emerged as the level-set method [51–53], which was developed based on the level-set evolution theory. The I_M morphs iteratively along its gradient direction until it is close to the I_R, with the DF written as
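As a hedged illustration, one commonly cited form of the level-set motion update is sketched below. The symbols V (the displacement field), x (a voxel position), and t (the iteration time) are notation assumed here, and the expression is not claimed to be the exact formula used by the authors.

```latex
% Assumed notation: V(x,t) is the DF, I_M the moving image, I_R the reference image.
\frac{\partial \mathbf{V}(\mathbf{x},t)}{\partial t}
  = \bigl[\, I_R(\mathbf{x}) - I_M\bigl(\mathbf{x}+\mathbf{V}(\mathbf{x},t)\bigr) \bigr]\,
    \frac{\nabla I_M\bigl(\mathbf{x}+\mathbf{V}(\mathbf{x},t)\bigr)}
         {\bigl\lVert \nabla I_M\bigl(\mathbf{x}+\mathbf{V}(\mathbf{x},t)\bigr) \bigr\rVert},
\qquad \mathbf{V}(\mathbf{x},0) = \mathbf{0}
```

Under this form, the deformed image I(x, t) = I_M(x + V) changes at the rate (I_R − I)·‖∇I_M‖ by the chain rule, which matches the description of I_M morphing along its gradient direction until it is close to I_R. The update is only defined where the gradient is non-zero.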
2D-displacement field
Figure 3 gives an example of the DF between a glioma brain and a healthy brain. Figure 3a shows I_M and Fig. 3b shows I_R; Fig. 3c and d provide the results of RR and NRR, respectively. The DF and its local region are shown in Fig. 3e and f, respectively. The main shortcoming of the 2D-DF is that it can estimate the in-slice DF but not the out-of-slice DF. To overcome this limitation, the 3D-DF was used in this study.
3D-displacement field
The 3D-DF is illustrated in Fig. 4. The arrows in the figure represent the DF. The length of the arrow represents the magnitude of the DF, and the direction of the arrow represents the heading along where the DF moves. It is difficult to depict a 3D-DF within an image, so the common solution is to depict the slices and the DFs on the corresponding slices in sequence.
The DF features consisted of two parts: (1) the direction of the DF and (2) the magnitude of the DF. We did not use the sign of the displacement vector, which is commonly used in the literature [54, 55], because the direction field already contains the information of the displacement’s sign.
Before 3D-DF estimation, each voxel in the 3D image can be treated as a feature, so there were 176 × 208 × 176 = 6,443,008 features in total. After estimation, the 3D-DF covered the same number of voxel sites. However, most values in the 3D-DF were zero or near-zero, so the 3D-DF was sparse. We needed feature selection approaches to select the most important sites (those having the longest displacement between AD and HC) in the 3D-DF.
Feature selection
The DFs were calculated on each voxel site and they contained too many features; hence, we needed to reduce the number of features. Two categories of methods exist to reduce features: (i) feature selection and (ii) feature extraction. The former selects a subset of the original features without any transformation, while the latter transforms the original features into a lower-dimensional space. Their differences are depicted in Fig. 5. Here O represents the number of original features, and R represents the number of reduced features (R≪O).
In this study, feature selection was used for two reasons: (1) feature extraction destroys the physical meaning of features; (2) we needed to detect the AD-related region (see below: “AD-related region detection”), which can be obtained directly from the results of feature selection.
Schools of feature selection
Currently, there are two main schools of feature selection, the filter methods and the wrapper methods (see Fig. 6). Their differences stem from how the selection algorithm and the model building are combined.
Wrappers use a predictive model to score the feature subsets. Each subset is used to train the model, and the score is based on the error rate of the trained model. Wrappers need to re-train the model every time a new subset is introduced, so they are computationally intensive, although they usually yield the best-performing feature subset.
Filters use a measure (instead of the error rate or its variants) to score the feature subsets. Such a measure can be computed quickly, regardless of the model. Common measures include the inter/intra-class distance, Pearson’s correlation (PEC), mutual information, and other information-theoretic measures. Filters are the fastest, but the feature subset they produce is not tuned to the classification model. Note that filters usually provide a ranking of features rather than an explicit optimal feature subset, so the cut-off point is obtained by the cross validation (CV) technique. Filters are sometimes used as a preprocessing step for wrappers.
In this study, the image of each subject was 208 × 176 × 176, and there were three components of the DF at each voxel. In total, over 19 million features were formed. Wrapper methods could not feasibly handle such a large feature set; therefore, we used filter methods to process them.
Bhattacharyya distance
BD measures the relative closeness of two discrete (or continuous) probability distributions. Compared to the Mahalanobis distance, the BD is more reliable, since the Mahalanobis distance is a particular case of the BD in which the standard deviations of the two classes are equal. Supposing the data of the two classes are normally distributed, the BD is formed as
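For reference, the standard closed form of the BD between two univariate normal distributions is sketched below. The symbols μ₁, μ₂ (class means) and σ₁, σ₂ (class standard deviations) are notation assumed here, and the authors' own symbols may differ.

```latex
% Assumed notation: class i has mean \mu_i and standard deviation \sigma_i.
D_B(1,2) = \frac{1}{4}\,\frac{(\mu_1-\mu_2)^2}{\sigma_1^2+\sigma_2^2}
         + \frac{1}{2}\,\ln\!\left(\frac{\sigma_1^2+\sigma_2^2}{2\,\sigma_1\sigma_2}\right)
```

When σ₁ = σ₂ = σ, the logarithmic term vanishes and the expression reduces to (μ₁ − μ₂)² / (8σ²), a scaled squared Mahalanobis distance, which is consistent with the remark that the Mahalanobis distance is a particular case of the BD.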
Student’s t-test
The STT was regarded as one of the filters in this work, since it measures the degree of difference of a feature between two classes. The STT is the most popular such test; it tests the hypothesis of “equal means” under the assumption of “equal variances” of the two data sets [56]. Because the sample sizes were unequal, the STT is computed by
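The textbook form of the Student's t statistic for two samples of unequal size and a pooled variance is sketched below. The symbols n₁, n₂ (sample sizes), μ₁, μ₂ (sample means), σ₁², σ₂² (sample variances), and s_p (pooled standard deviation) are notation assumed here.

```latex
% Assumed notation: n_i, \mu_i, \sigma_i^2 are the size, mean, and sample variance of class i.
t = \frac{\mu_1-\mu_2}{s_p\,\sqrt{\dfrac{1}{n_1}+\dfrac{1}{n_2}}},
\qquad
s_p = \sqrt{\frac{(n_1-1)\,\sigma_1^2+(n_2-1)\,\sigma_2^2}{n_1+n_2-2}}
```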
Welch’s t-test
The “equal variances” assumption was not justified and was discarded, while the “equal means” hypothesis was necessary. We therefore used the WTT, which is an adaptation of the STT. The WTT only checks whether the two populations have equal means [36]. The WTT is computed by
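The textbook form of the Welch's t statistic is sketched below, using the same assumed notation as for the STT (nᵢ, μᵢ, and σᵢ² are the size, mean, and sample variance of class i).

```latex
% Assumed notation: n_i, \mu_i, \sigma_i^2 are the size, mean, and sample variance of class i.
t = \frac{\mu_1-\mu_2}{\sqrt{\dfrac{\sigma_1^2}{n_1}+\dfrac{\sigma_2^2}{n_2}}}
```

Unlike the STT, no pooled variance appears, so the statistic remains meaningful when the two classes have different variances. For equal sample sizes the two statistics coincide.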
Cut-off point
The cut-off threshold was set to 0.0002% by trial and error, which meant we selected only 0.0002% of the total original features. In this task, that corresponded to 39 features for each subject (0.0002% of roughly 19.3 million). If the cut-off threshold was too small, then the selected features were insufficient to provide distinguishing information. On the other hand, if the cut-off threshold was set too large, then the redundancy would impair the following classification procedure. The implementation of the feature selection is depicted in Table 2.
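The filter-based selection described above can be sketched in a few lines of standard-library Python. This is a minimal illustration under stated assumptions, not the authors' Matlab implementation: the function names (`bhattacharyya`, `student_t`, `welch_t`, `select_top_features`), the use of sample variances, the ranking by absolute score, and the toy data are all hypothetical choices made here.

```python
import math
from statistics import mean, variance


def bhattacharyya(a, b):
    """BD between two samples, each modelled as a univariate normal."""
    m1, m2, v1, v2 = mean(a), mean(b), variance(a), variance(b)
    return (0.25 * (m1 - m2) ** 2 / (v1 + v2)
            + 0.5 * math.log((v1 + v2) / (2.0 * math.sqrt(v1 * v2))))


def student_t(a, b):
    """Student's t statistic with a pooled variance (unequal sample sizes allowed)."""
    n1, n2 = len(a), len(b)
    pooled = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled * (1.0 / n1 + 1.0 / n2))


def welch_t(a, b):
    """Welch's t statistic (no equal-variance assumption)."""
    return (mean(a) - mean(b)) / math.sqrt(variance(a) / len(a) + variance(b) / len(b))


def select_top_features(ad_rows, hc_rows, k, score=welch_t):
    """Rank features by |score| between the two classes and keep the k best indices."""
    n_features = len(ad_rows[0])
    scores = []
    for j in range(n_features):
        a = [row[j] for row in ad_rows]   # feature j over the AD subjects
        b = [row[j] for row in hc_rows]   # feature j over the HC subjects
        scores.append(abs(score(a, b)))
    order = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return order[:k]


# Feature count and cut-off taken from the text: 3 DF components per voxel,
# a 208 x 176 x 176 volume, and a cut-off of 0.0002%.
n_total = 3 * 208 * 176 * 176
k_paper = round(n_total * 0.0002 / 100)

# Toy data (hypothetical): 3 AD and 4 HC subjects with 4 features each;
# only feature 2 separates the classes.
toy_ad = [[0.1, 1.0, 5.0, 0.3], [0.2, 1.2, 5.5, 0.1], [0.0, 0.9, 6.1, 0.2]]
toy_hc = [[0.1, 1.1, 0.2, 0.2], [0.3, 1.0, 0.4, 0.3],
          [0.2, 1.3, 0.1, 0.1], [0.0, 0.8, 0.3, 0.4]]
print(n_total, k_paper, select_top_features(toy_ad, toy_hc, k=1))
```

With the volume size and cut-off stated in the text, `n_total` is 19,329,024 and the cut-off keeps 39 features, in agreement with the number reported above. Any of the three scores can be passed as the `score` argument, which mirrors the comparison of BD, STT, and WTT as interchangeable filters.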
Non-parallel support vector machine
The family of support vector machines (SVM) has gained popularity for small-sample-size problems [57]. To improve classification performance, scholars tend to discard the parallelism constraint on the hyperplanes and propose the non-parallel support vector machine (NPSVM). Figure 7a shows that parallel hyperplanes work best for two classes whose points have the same margin distances. However, for most cases, such as Fig. 7b, NPSVM will perform better than SVM.
In this work, two NPSVMs were introduced. The first NPSVM was GEPSVM, proposed by Mangasarian and Wild [58]. GEPSVM drops the parallelism constraint on the two planes, which is required in the original SVM, and it requires each hyperplane to be as close as possible to one of the data sets and as far as possible from the other.
The second NPSVM was the twin support vector machine (TSVM), which was proposed by Jayadeva et al. [59]. The TSVM is similar to GEPSVM in that both obtain non-parallel hyperplanes. Their difference lies in that GEPSVM and TSVM are formulated entirely differently: each of the two quadratic programming problems in the TSVM pair is formulated like a typical SVM. Reports showed that TSVM was better than both SVM and GEPSVM [60–62].
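For orientation, the two formulations can be sketched as follows, paraphrasing the standard linear forms from the cited works [58, 59]. The notation is assumed here rather than taken from the text: the rows of A and B hold the samples of the two classes, e₁ and e₂ are vectors of ones, δ is a regularization constant, c₁ is a penalty parameter, and ξ is a slack vector. Only the first plane is shown in each case; the second is obtained by swapping the roles of A and B.

```latex
% GEPSVM, first plane: close to class A, far from class B (a generalized eigenvalue problem).
\min_{(w_1,b_1)\neq 0}\;
\frac{\lVert A w_1 + e_1 b_1 \rVert^2 + \delta\,\lVert [\,w_1;\, b_1\,] \rVert^2}
     {\lVert B w_1 + e_2 b_1 \rVert^2}

% TSVM, first plane: a quadratic program with SVM-like constraints.
\min_{w_1,\,b_1,\,\xi}\; \tfrac{1}{2}\,\lVert A w_1 + e_1 b_1 \rVert^2 + c_1\, e_2^{\top}\xi
\quad \text{s.t.}\quad -(B w_1 + e_2 b_1) + \xi \ge e_2,\;\; \xi \ge 0

% Decision rule shared by both: assign x to the class whose plane is nearer.
\operatorname{class}(x) = \arg\min_{i\in\{1,2\}} \frac{\lvert x^{\top} w_i + b_i \rvert}{\lVert w_i \rVert}
```

The contrast matches the description above: GEPSVM reduces each plane to an eigenvalue problem, whereas each TSVM problem keeps the margin constraints and slack variables of a typical SVM.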
AD-related region detection
We proposed a visual interpretation approach based on the magnitude of the 3D-DF in order to detect the AD-related brain regions R: a voxel was assigned to R when the magnitude of its displacement exceeded a threshold T.
A smaller threshold T may introduce more noise from the estimated DF, whereas a larger threshold T will drop realistic deformations with a short magnitude. Hence, we assumed that a deformation with a magnitude larger than 5 represents realistic deformity within the brain.
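The thresholding step can be sketched as follows. This is a minimal illustration with several assumptions that are not stated in the text: the DF is stored as a dictionary from voxel coordinates to displacement vectors, clusters are formed with 6-connectivity, and clusters below a hypothetical `min_size` are ignored (the Results only say that very small regions were ignored). The function name `detect_regions` is also hypothetical.

```python
import math
from collections import deque


def detect_regions(df, threshold=5.0, min_size=2):
    """Return connected clusters of voxels whose DF magnitude exceeds the threshold.

    df maps a voxel (x, y, z) to its displacement vector (dx, dy, dz).
    Clusters use 6-connectivity; clusters smaller than min_size are dropped.
    """
    selected = {v for v, (dx, dy, dz) in df.items()
                if math.sqrt(dx * dx + dy * dy + dz * dz) > threshold}
    neighbours = [(1, 0, 0), (-1, 0, 0), (0, 1, 0),
                  (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    regions, seen = [], set()
    for start in sorted(selected):
        if start in seen:
            continue
        seen.add(start)
        queue, cluster = deque([start]), []
        while queue:                      # breadth-first flood fill
            x, y, z = queue.popleft()
            cluster.append((x, y, z))
            for ox, oy, oz in neighbours:
                nb = (x + ox, y + oy, z + oz)
                if nb in selected and nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        if len(cluster) >= min_size:
            regions.append(sorted(cluster))
    return regions


# Toy DF (hypothetical values): two adjacent voxels with long displacements,
# one isolated long displacement, one short one, and one exactly at the threshold.
toy_df = {
    (0, 0, 0): (6.0, 0.0, 0.0),
    (1, 0, 0): (0.0, 4.0, 4.0),
    (2, 0, 0): (3.0, 4.0, 0.0),   # magnitude exactly 5: not "larger than 5"
    (5, 5, 5): (0.0, 0.0, 7.0),   # isolated voxel, dropped by min_size
    (9, 9, 9): (1.0, 1.0, 1.0),
}
print(detect_regions(toy_df))
```

In practice the resulting clusters would then be mapped to anatomical labels. The strict inequality follows the wording "larger than 5".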
Statistical analysis
Generalization error was obtained by K-fold CV, and K was set to 10 for two reasons: (1) to create a balance between computational cost and reliable estimates, and (2) to provide a fair comparison, since the common convention is to set K to 10 [63].
For a 10-fold CV, the MR image dataset was divided randomly into ten mutually exclusive folds of nearly equal size and nearly the same class distribution. In each run, 9 folds were used for training, and the remaining fold was used for validation (see Fig. 8). The procedure was repeated for 10 runs, so that every fold was used for validation once. The 10 results over the validation sets were combined along the diagonal blocks in Fig. 8, with the aim of producing an individual out-of-sample evaluation. The 10-fold CV was repeated 50 times, viz., a 50 × 10-fold CV was implemented.
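The splitting scheme above can be sketched as follows. This is a hedged illustration only: the function names (`stratified_folds`, `repeated_cv_splits`), the round-robin way of balancing the classes across folds, and the fixed random seed are assumptions made here, not details given in the text.

```python
import random


def stratified_folds(labels, k=10, rng=None):
    """Split sample indices into k folds with nearly equal size and class mix."""
    rng = rng or random.Random()
    by_class = {}
    for idx, lab in enumerate(labels):
        by_class.setdefault(lab, []).append(idx)
    folds = [[] for _ in range(k)]
    offset = 0
    for lab in sorted(by_class):
        members = by_class[lab]
        rng.shuffle(members)
        # Deal the shuffled members round-robin, continuing where the
        # previous class stopped so that fold sizes stay balanced.
        for i, idx in enumerate(members):
            folds[(i + offset) % k].append(idx)
        offset += len(members)
    return folds


def repeated_cv_splits(labels, repeats=50, k=10, seed=0):
    """Yield (repeat, fold, train_indices, validation_indices) for a repeats x k-fold CV."""
    rng = random.Random(seed)
    for r in range(repeats):
        folds = stratified_folds(labels, k, rng)
        for f, val_idx in enumerate(folds):
            train_idx = [i for g, fold in enumerate(folds) if g != f for i in fold]
            yield r, f, train_idx, val_idx


# Class sizes from the Dataset section: 28 AD and 98 HC subjects.
labels = ["AD"] * 28 + ["HC"] * 98
splits = list(repeated_cv_splits(labels))
print(len(splits))  # one split per (repeat, fold) pair
```

Each of the 50 repeats reshuffles the subjects, so every subject is validated exactly once per repeat, and the per-repeat results can then be summarized as mean ± standard deviation.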
Evaluation
We used four indicators to measure which algorithm performed the best. These four indicators consisted of sensitivity (recall), specificity, accuracy, and precision (Table 4). In this work, a correctly detected AD brain was treated as true positive. After the 50 × 10-fold CV, the final evaluation results were written in the form of “mean ± standard deviation”.
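The four indicators can be computed from the confusion counts as sketched below, with AD treated as the positive class as stated above. Table 4 is not reproduced in the text, so the standard definitions are assumed here, and the function name `evaluate` is hypothetical. The second part is only a rough plausibility check that plugs the reported mean sensitivity and specificity into the class sizes from the Dataset section.

```python
def evaluate(tp, fn, tn, fp):
    """Standard definitions; AD is the positive class."""
    return {
        "sensitivity": tp / (tp + fn),              # recall on AD subjects
        "specificity": tn / (tn + fp),              # recall on HC subjects
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        "precision": tp / (tp + fp),                # fraction of predicted AD that is AD
    }


# Rough consistency check (an illustration, not the authors' computation):
# expected confusion counts for 28 AD and 98 HC subjects at the reported
# mean sensitivity (92.57%) and specificity (93.18%).
n_ad, n_hc = 28, 98
tp = 0.9257 * n_ad
tn = 0.9318 * n_hc
expected = evaluate(tp=tp, fn=n_ad - tp, tn=tn, fp=n_hc - tn)
print({name: round(100 * value, 2) for name, value in expected.items()})
```

The expected accuracy and precision obtained this way are about 93.0% and 79.5%, close to the reported 93.05% and 79.51%. Small differences are expected because the reported values are means over repeated runs, and a mean of ratios need not equal the ratio of means. The comparatively low precision follows from the class imbalance: even a small false-positive rate on 98 HC subjects is noticeable next to 28 AD subjects.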
Implementation of proposed system
Our system had two goals. First, we needed to develop a CAD system for AD brain detection and report its performance. Second, we needed to locate the AD-related voxels. The pseudocode is listed in Table 5.
RESULTS
We developed the programs in Matlab 2015a and ran them on an IBM laptop with a 3 GHz Intel i3 dual-processor and 8 GB of random access memory.
3D-DF result
Figure 9 shows the 3D-DF estimation of an AD brain. In this figure, the green arrows represent a backward DF, while the red arrows represent a forward DF. The slices were rendered with 50% transparency to show the DF arrows.
Feature selection and classifier comparison
In the second experiment, we compared the three feature selection approaches (BD, STT, and WTT) and the two classifiers (GEPSVM and TSVM). The results are shown in Table 6.
Comparison to state-of-the-art
The best proposed classifier of “3D-DF + WTT +TSVM” was compared with 13 state-of-the-art approaches. The results from the comparison are shown in Table 7.
Region detection and labeling
We implemented the AD-related region detection (see Materials and Methods). Figure 10 shows the related regions, which are labeled by green points. In 2D-DF, some areas were slightly outside of the brain, because the algorithm also captured distortion of the background. In 3D-DF, this background distortion problem was fixed.
Talairach Daemon software was downloaded from http://www.talairach.org/, and was utilized to provide accurate anatomical labels. Table 8 shows the 17 related regions detected by our method. In the table, the voxel numbers of the related regions are calculated by counting all connected components within the same brain regions while ignoring very small regions. Note that some areas were not included (such as Culmen of vermis) because they moved due to the expansion and shrinkage of neighboring areas. We only utilized the areas that showed a change in volume.
Time analysis
In the final experiment, we measured the computation time of each step. The results are shown in Table 9. The data import took 0.11 s and the co-registration took 16.47 s. The 3D-DF estimation took the longest, at 114.02 s (nearly 2 min). The WTT took 26.76 s. The training and test of the TSVM took 12.63 s and 0.23 s, respectively. Detecting AD-related regions took 2.88 s.
DISCUSSION
Figure 9 suggests that the 3D-DF was able to accurately reveal the atrophy of AD. We can see clearly how a healthy brain shrinks into an AD brain. Our method can be treated as one of the deformation-based morphometry approaches.
The comparison results in Table 6 show that the proposed “3D-DF+WTT+TSVM” performed the best among all six proposed approaches in terms of accuracy, sensitivity, and precision. The proposed “3D-DF+BD+GEPSVM” performed the best in terms of specificity. Considering that sensitivity is more important than specificity (detecting diseased brains is more important than detecting healthy brains), we concluded that 3D-DF+WTT+TSVM is the best of all proposed approaches, with an accuracy of 93.05 ± 2.18, a sensitivity of 92.57 ± 3.80, a specificity of 93.18 ± 3.35, and a precision of 79.51 ± 2.86. Next we compared it with recent approaches.
There are two reasons that explain why TSVM performed better. First, the TSVM provided more flexible and complicated hyperplanes than those of the standard SVM. Second, the TSVM formulated each of its two quadratic programming problems like a standard SVM, which made it superior to the GEPSVM.
The results in Table 7 compare the proposed “3D-DF+WTT+TSVM” with 13 other methods. Results in Savio et al. (Table 5 in [29]) and Dong et al. (Table 9 in [36]) gave the means with standard deviation. Results of Plant et al. (Task 1 in Table 3 in [24]) presented the means together with 95% confidence intervals. Results in Wang et al. (Table 7 in [31]) were obtained through a single K-fold CV analysis. Results in Zhang et al. (Table 5 in [16]) were yielded by 50 × 10-fold CV.
In terms of average accuracy, the proposed “3D-DF + WTT + TSVM” result was as large as 93.05% ± 2.18%, which was better than 13 approaches of AD prediction: MGM + PEC + SVM of 92.07% [29], GEODAN+BD+SVM of 92.09% [29], TJM + WTT + SVM of 92.83% [29], BRC + IG + SVM of 90.00% [24], BRC + IG + Bayes of 92.00% [24], BRC + IG + VFI of 78.00% [24], US + SVD-PCA + SVM-DT of 90% [31], EB + WTT + SVM of 91.47% [36], EB + WTT + RBF-KSVM of 86.71% [36], EB + WTT + POL-KSVM of 92.36% [36], DF + PCA + SVM of 88.27% [16], DF + PCA + GEPSVM of 91.52% [16], and DF + PCA + TSVM of 92.75% [16].
There were many other methods [64–68] proposed for detecting AD from HC; however, they dealt with images produced by other types of modalities: PET, SPECT, diffusion tensor imaging, etc. Hence, it was inappropriate to compare the proposed methods with them. In the future, we will test our methods on SPECT and PET images.
Table 8 shows that the DF finds the discriminant associated with the following regions reported in the latest references: sub-gyral [69], inferior parietal lobule [70], precuneus [71], angular gyrus [72], lingual gyrus [73], supramarginal gyrus [74], postcentral gyrus [75], third ventricle [76], superior parietal lobule [77], thalamus [78], middle temporal gyrus [79], precentral gyrus [80], superior temporal gyrus [81], superior occipital gyrus [82], cingulate gyrus [83], culmen [84], and insula [85].
Notwithstanding, some regions reported to be associated with AD were not detected by 3D-DF. Those areas included anterior cingulate [86], caudate nucleus [87], claustrum [88], cuneus [89], fusiform gyrus [90], inferior frontal gyrus [91], inferior occipital gyrus [92], inferior semi-lunar lobule [84], inferior temporal gyrus [93], lateral ventricle [94], middle frontal gyrus [95], middle occipital gyrus [96], lentiform nucleus [36], medial frontal gyrus [97], middle occipital gyrus [98], paracentral lobule [99], parahippocampal gyrus [100], posterior cingulate [101], subcallosal gyrus [36], subthalamic nucleus [102], and uncus [103]. The reasons were two-fold: (1) We only preserved the 3D-DF with a magnitude longer than the threshold T of 5. Reducing T may cover more AD-related regions, but noise will then introduce artifacts. (2) In this work, we only used structural images. Other references used different imaging modalities, such as diffusion tensor imaging, magnetic resonance spectroscopic imaging, and fMRI for metabolism detection. In future studies, we will try to include those advanced modalities.
CONCLUSIONS
The contributions of this study are: (i) We proposed the use of 3D-DF in detecting AD and demonstrated its effectiveness. (ii) Two non-parallel SVMs were tested, and we showed that TSVM performed better than the standard SVM and GEPSVM. (iii) The proposed CAD achieved higher accuracy than 13 state-of-the-art approaches. (iv) Our proposed CAD was able to locate 17 AD-related regions, which were reported in other recent literature.
Future research will focus on several aspects. First, the spectroscopy method will be used to extract the quantity of docosahexaenoic acid [104], which will possibly help increase classification accuracy. Second, the proposed approach may be helpful in identifying brain mechanisms underlying an endogenously defensive mechanism to neuroinjury and neurodegeneration [105]. Third, swarm-intelligence algorithms will be introduced to help enhance the algorithm performance [106]. Fourth, other advanced classifiers, such as fuzzy SVM [107], will be tested.
Footnotes
ACKNOWLEDGMENTS
This paper was supported by NSFC (61273243, 51407095), Natural Science Foundation of Jiangsu Province (BK20150982, BK20150983), Priority Academic Program Development of Jiangsu Higher Education Institutions (PAPD), Jiangsu Key Laboratory of 3D Printing Equipment and Manufacturing (BM2013006), Key Supporting Science and Technology Program (Industry) of Jiangsu Province (BE2012201, BE2013012-2, BE2014009-3), Program of Natural Science Research of Jiangsu Higher Education Institutions (13KJB460011, 14KJB520021), Special Funds for Scientific and Technological Achievement Transformation Project in Jiangsu Province (BA2013058), Nanjing Normal University Research Foundation for Talented Scholars (2013119XGQ0061, 2014119XGQ0080), Education Reform Project in NJNU (18122000090615), and Open Fund of Guangxi Key Laboratory of Manufacturing System & Advanced Manufacturing Technology (15-140-30-008K).
The authors acknowledge their gratitude to the Open Access Series of Imaging Studies dataset that came from NIH grants P50AG05681, P01 AG03991, R01 AG021910, P50 MH071616, U24 RR021382, and R01 MH56584.
The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
1
Some abbreviations are modified to avoid conflict within this paper.
