Multivariate Approach for Alzheimer’s Disease Detection Using Stationary Wavelet Entropy and Predator-Prey Particle Swarm Optimization

Abstract

Background:

The number of patients with Alzheimer’s disease is increasing rapidly every year. Scholars often use computer vision and machine learning methods to develop an automatic diagnosis system.

Objective:

In this study, we developed a novel machine learning system that can make diagnoses automatically from brain magnetic resonance images.

Methods:

First, the brain images were preprocessed, including skull stripping and spatial normalization. Second, one axial slice was selected from the volumetric image, and stationary wavelet entropy (SWE) was used to extract the texture features. Third, a single-hidden-layer neural network was used as the classifier. Finally, a predator-prey particle swarm optimization was proposed to train the weights and biases of the classifier.

Results:

Our method used a 4-level decomposition and yielded 13 SWE features. The classification yielded an overall accuracy of 92.73±1.03%, a sensitivity of 92.69±1.29%, and a specificity of 92.78±1.51%. The area under the curve was 0.95±0.02. Additionally, the method took only 0.88 s to identify a subject in the online stage, once the volumetric image had been preprocessed.

Conclusion:

In terms of classification performance, our method outperforms 10 state-of-the-art approaches as well as human observers. Therefore, the proposed method is effective in the detection of Alzheimer’s disease.

Keywords

Alzheimer’s disease, detection, particle swarm optimization, predator-prey model, single-hidden-layer neural network, stationary wavelet entropy

INTRODUCTION

Alzheimer’s disease (AD) is an irreversible and progressive brain disorder [1–3]. It is currently ranked as the 6th leading cause of death in the United States. The disease is named after Dr. Alois Alzheimer, who noticed the plaques, the tangles, and the loss of neural connections in the brain of a deceased woman. In the preclinical stage, patients are symptom-free, but abnormal protein deposits form tau tangles [4] and amyloid plaques [5] throughout the brain. As a result, healthy neurons eventually stop functioning, and the brain begins to shrink. In the final stage of the disease, the whole brain has shrunk dramatically and is dysfunctional.

Various research teams have proposed detection methods based on magnetic resonance imaging (MRI). Plant et al. [6] employed brain region cluster (BRC). They tested voting feature interval (VFI) and support vector machine (SVM). Zhang et al. [7] combined kernel support vector machine (KSVM) and decision tree (DT). Zhang et al. [8] proposed eigenbrain (EB). They used a radial basis function support vector machine (RBF-SVM). Phillips [9] expanded the EB to three dimensions (3D-EB) and utilized the RBF-SVM method. Savio and Grana [10] used geodesic anisotropy (GDA) and Bhattacharyya distance (BD) as features. They employed SVM as the classifier. Wang [11] used a multilayer perceptron (MLP) and introduced a novel algorithm, biogeography-based optimization (BBO), to train it. Zhang [12] proposed a new displacement field (DIF) feature and tested both SVM and generalized eigenvalue proximal SVM (GEPSVM) methods. Gray et al. [13] utilized voxel-based morphometry (VBM) and random forest (RF). Du [14] presented a method that combined pseudo Zernike moment (PZM) and linear regression classifier (LRC).

The methods described above achieved satisfactory detection results. However, they also have two problems: 1) Their features did not capture the texture information of brain tissues. 2) Their classifiers are not stable, i.e., the classification performance may fluctuate among different runs.

To solve the first problem, we considered the discrete wavelet transform. The wavelet is a wave-like mathematical tool that has been widely used in signal processing fields, such as fingerprint recognition [15], facial recognition [16], dendrite spine detection [17], emotion recognition [18], fruit classification [19], and tea classification [20]. Wavelets have also been used in detecting diseases such as Parkinson’s disease [21], sensorineural hearing loss [22, 23], and neuromuscular disease [24].

Features from the discrete wavelet transform are translation-variant: the extracted features may change even if the brain image is shifted by only one or two pixels. Hence, we used the stationary wavelet transform (SWT) in this study. Additionally, entropy was computed over the SWT coefficients in order to reduce the size of the feature set.

To solve the second problem, we introduced a bioinspired algorithm, particle swarm optimization (PSO) [25]. PSO iteratively improves candidate solutions by mimicking the behavior of bird flocks and fish schools. The stability [26] and convergence [27] of PSO have been reported to be superior to those of traditional gradient-descent algorithms. PSO is currently a hot topic in the field of computational mathematics and has been applied in crop classification [28], circuit design [29], path planning [30, 31], job scheduling [32], spam detection [33], protein-ligand docking [34], and structuring element decomposition [35].

The predator-prey (PP) model mimics the behavior of sardines and killer whales. Adding the PP model to PSO can increase its optimization performance. Hence, PP-PSO is expected to give better stability than PSO alone.

Our contribution in this study involves three points: 1) We proposed the stationary wavelet entropy to extract the texture information of brain images. 2) We proposed the use of PP-PSO to increase algorithm stability. 3) In terms of accuracy, our proposed system is better than 10 state-of-the-art approaches.

MATERIALS AND METHODS

Materials and subjects

The brain imaging data are from two sources:

One source is downloaded from “Open Access Series of Imaging Studies (OASIS)” [36]. We selected 126 subjects, removing those with missing records. The 126 subjects contain 28 AD patients and 98 healthy control (HC) subjects. Their demographic status is provided in reference [37].

The other source is from local hospitals (Affiliated Nanjing Brain Hospital of Nanjing Medical University, Children’s Hospital of Nanjing Medical University, and Zhong-Da Hospital of Southeast University). We enrolled 70 AD subjects through community advertisements. The exclusion criteria for all participants were known neurological or psychiatric diseases, brain lesions such as tumors or strokes, use of psychotropic medications, and contraindications to MR imaging. This study was approved by the Ethics Committees of those hospitals. A signed informed consent form was obtained from every subject prior to entering this study. Scanning was performed on a Siemens Verio Tim 3.0T MR scanner (Siemens Medical Solutions, Erlangen, Germany). All subjects were asked to lie as still as possible with their eyes closed and not to fall asleep. In total, 176 sagittal slices covering the whole brain were acquired using an MP-RAGE sequence. The imaging parameters were: TE = 2.48 ms, TR = 1900 ms, TI = 900 ms, FA = 9°, FOV = 256 mm×256 mm, matrix = 256×256, slice thickness = 1 mm.

The images from the online OASIS dataset and the local hospitals were combined; their demographics are given in Table 1. After combining both data sets, there were 98 AD subjects and 98 HCs. In total, a 196-image dataset was created.

Table 1

Demographics of two sources

Characteristic	OASIS		Local Hospitals
	AD	HC	AD
Subject #	28	98	70
Age	77.75±6.99	75.91±8.98	76.34±7.81
Gender (M/F)	9/19	26/72	24/46
Education	2.57±1.31	3.26±1.31	2.63±1.42
Socioeconomic status	2.87±1.29	2.51±1.09	2.89±1.16
MMSE	21.67±3.75	28.95±1.20	21.12±4.62
CDR	1	0	1

AD, Alzheimer’s disease; HC, healthy controls; M, male; F, female; MMSE, Mini-Mental State Exam; CDR, Clinical Dementia Rating.

All the images were preprocessed via FMRIB Software Library (FSL) v5.0. The brain extraction tool was utilized to extract brain areas. FLIRT and FNIRT were used for spatial normalization. Smoothing was implemented by a Gaussian kernel.

Our proposed system consisted of four main steps: 1) Preprocessing, which included brain extraction, spatial normalization, smoothing, slice selection, and histogram stretching; 2) Stationary wavelet entropy as the feature extraction; 3) Single-hidden-layer neural network as the classifier; 4) The predator-prey particle swarm optimization as the training algorithm. The pipeline of our method is illustrated in Fig. 1.

Fig. 1

Pipeline of our method.

Slice selection

Only the most distinguishing slice along the axial direction of each 3D brain was selected. The selection criterion was that the slice include the hippocampus, which is believed to shrink in AD patients. The selected axial slice was at Z = –22 mm in MNI space. Figure 2 shows three slices: one from an HC subject from OASIS, one from an AD subject from OASIS, and one from an AD subject from local hospitals. Two points were observed: 1) The AD subjects had a smaller hippocampus than the HC subject; 2) The images obtained in local hospitals were slightly darker than those from the OASIS dataset.

Fig. 2

Illustration of samples. HC, healthy control; AD, Alzheimer’s disease.

Histogram stretching

Since the images came from two different sources, histogram stretching (HS) was used to fulfill two objectives: 1) To increase the dynamic range of all brain images; and 2) To remove the effect of having different image sources. HS transformed the original image c to a new image d as:
$d (i, j) = \frac{c (i, j) - c_{min}}{c_{max} - c_{min}}$ (1)
where c_min and c_max represent the minimum (0%) and maximum (100%) intensity values of the original image, respectively. In practice, the 5% and 95% intensity values were used instead of 0% and 100%, respectively.
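The stretching in Eq. (1) with the 5% and 95% cut-offs can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the percentiles are taken over all pixels of the slice (the text does not say whether background pixels are excluded), and values falling outside the cut-offs are clipped to [0, 1], which is also an assumption.

```python
def percentile(sorted_vals, q):
    """Linearly interpolated percentile q (0-100) of an already sorted list."""
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac


def histogram_stretch(image, low_q=5.0, high_q=95.0):
    """Apply Eq. (1) with c_min and c_max taken at the given percentiles.

    `image` is a list of rows of gray levels. Clipping the result to
    [0, 1] is an assumption; the paper only gives the linear mapping.
    """
    flat = sorted(v for row in image for v in row)
    c_min = percentile(flat, low_q)
    c_max = percentile(flat, high_q)
    if c_max == c_min:
        # A flat image has no dynamic range to stretch.
        return [[0.0 for _ in row] for row in image]
    scale = c_max - c_min
    return [[min(1.0, max(0.0, (v - c_min) / scale)) for v in row]
            for row in image]
```

Calling `histogram_stretch(image, 0, 100)` reproduces the plain min-max form of Eq. (1); the default arguments give the 5%/95% variant described in the text.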

Stationary wavelet entropy (SWE)

SWE is a novel feature extraction method. It combines the stationary wavelet transform and information entropy [38]. In recent years, SWE has been reported to give excellent results in gene expression [39], multiple sclerosis detection [40], hearing loss detection [41], etc.

SWE was chosen to extract the features from the brain slices in this study on the basis of three reasonable assumptions. First, the textures of brain tissues are similar to those of a fingerprint image, and wavelet-based fingerprint identification has achieved remarkable success in both academic and commercial fields [42]. Second, the structure of the brain corresponds to gray-level fluctuation in brain images, and the wavelet transform is known to capture rapid signal changes (high-frequency changes) efficiently. Third, hippocampal shrinkage rearranges the gray levels of the brain image, which eventually changes the degree of order/disorder, as measured by entropy [43].

The input image I was decomposed by a one-level SWT via a low-pass filter (l) and a high-pass filter (h), which generated four subbands: LL₁, LH₁, HL₁, and HH₁.
$LL_{1} = I \ast_{r} l \ast_{c} l$ (2)
$LH_{1} = I \ast_{r} l \ast_{c} h$ (3)
$HL_{1} = I \ast_{r} h \ast_{c} l$ (4)
$HH_{1} = I \ast_{r} h \ast_{c} h$ (5)

Here $\ast_{r}$ represents row-wise (i.e., horizontal) filtering and $\ast_{c}$ represents column-wise (i.e., vertical) filtering. The LL₁ subband was then passed to another one-level SWT decomposition, which generated four new subbands: LL₂, LH₂, HL₂, and HH₂.
$LL_{2} = LL_{1} \ast_{r} l \ast_{c} l$ (6)
$LH_{2} = LL_{1} \ast_{r} l \ast_{c} h$ (7)
$HL_{2} = LL_{1} \ast_{r} h \ast_{c} l$ (8)
$HH_{2} = LL_{1} \ast_{r} h \ast_{c} h$ (9)

The decomposition was iterated until the desired level was reached, as shown in Fig. 3.

Fig. 3

Two-dimensional stationary wavelet transform (j is an arbitrary integer).
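The iterated decomposition of Eqs. (2)-(9) can be sketched in plain Python. The sketch below makes several assumptions that the text does not state: it uses the Haar filter pair, circular (periodic) boundary handling, and the usual "a trous" dilation in which the filter taps are spaced 2^(j-1) pixels apart at level j. All function names are hypothetical, and library implementations may differ in filter alignment and boundary treatment.

```python
import math

S = 1.0 / math.sqrt(2.0)
LOW = (S, S)    # Haar low-pass filter l (assumed wavelet)
HIGH = (S, -S)  # Haar high-pass filter h (sign convention is a choice)


def filter_rows(img, taps, step):
    """Row-wise circular filtering; `step` is the tap spacing at this level."""
    n = len(img[0])
    return [[taps[0] * row[x] + taps[1] * row[(x + step) % n]
             for x in range(n)] for row in img]


def filter_cols(img, taps, step):
    """Column-wise circular filtering with the same tap spacing."""
    m = len(img)
    return [[taps[0] * img[y][x] + taps[1] * img[(y + step) % m][x]
             for x in range(len(img[0]))] for y in range(m)]


def swt2_level(approx, level):
    """One decomposition level: rows first, then columns, no downsampling."""
    step = 2 ** (level - 1)
    row_l = filter_rows(approx, LOW, step)
    row_h = filter_rows(approx, HIGH, step)
    return {
        "LL": filter_cols(row_l, LOW, step),
        "LH": filter_cols(row_l, HIGH, step),
        "HL": filter_cols(row_h, LOW, step),
        "HH": filter_cols(row_h, HIGH, step),
    }


def swt2(img, levels):
    """Return the 3*levels + 1 subbands as a list of (name, array) pairs."""
    bands = []
    approx = img
    for j in range(1, levels + 1):
        out = swt2_level(approx, j)
        bands += [(f"LH{j}", out["LH"]), (f"HL{j}", out["HL"]),
                  (f"HH{j}", out["HH"])]
        approx = out["LL"]  # only the approximation is decomposed further
    bands.append((f"LL{levels}", approx))
    return bands
```

Because no downsampling takes place, every subband has the same size as the input, and a circular shift of the input produces the same circular shift in every subband. With `levels=4` the function returns 13 subbands.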

For a k-level decomposition, the SWT generated (3k+1) subbands. Each subband X was regarded as a random variable with possible values {X₁, X₂, …, X_n}. The probability mass function, P_i, was estimated from the image histogram. Finally, the information entropy E(X) was computed as
$E (X) = - \sum_{i = 1}^{n} P_{i} \log P_{i}$ (10)

Thus, SWE outputs a set of (3k+1) features. Compared to traditional wavelet entropy (WE) [44, 45] and biorthogonal wavelet entropy [18], SWE has the obvious advantage that the extracted features are “translation invariant”. This means that the SWE remains unchanged even if the brain image is shifted by motion during MRI scanning, or if the spatially normalized brain is not strictly registered to the atlas.
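A histogram-based estimate of Eq. (10) might look like the sketch below. The number of bins (256) and the base-2 logarithm are assumptions, since the text specifies neither, and `subband_entropy` is a hypothetical name.

```python
import math


def subband_entropy(band, bins=256):
    """Shannon entropy (Eq. 10) of a 2D subband, P_i estimated by a histogram.

    `band` is a list of rows of coefficients. The bin count and the
    log base are illustrative choices, not values given in the paper.
    """
    values = [v for row in band for v in row]
    lo, hi = min(values), max(values)
    if hi == lo:
        return 0.0  # a constant subband carries no information
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # The maximum value falls on the upper edge, so clamp its bin index.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
```

For example, `subband_entropy([[0, 1], [1, 0]])` gives 1.0 bit because the two values are equally frequent. Since the histogram ignores pixel positions, any rearrangement of the same coefficients (including a circular shift) gives the same entropy.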

Single-hidden-layer neural network

The single-hidden-layer neural network (SNN) is described in detail in many textbooks and in the literature. The SNN is a particular feedforward neural network, and it can approximate any continuous function to arbitrary accuracy, as guaranteed by the universal approximation theorem [46]. Structurally, the SNN contains three fully connected layers: the input layer, the hidden layer, and the output layer.

Figure 4 shows the structure of the SNN. The features extracted from the brain images were sent to the input layer; hence, the number of input neurons equals the number of features. The number of hidden neurons was determined by grid search. The output neuron corresponds to the class category, and the argmax function chooses the class with the highest output. Conventional training methods follow the back-propagation style. First, the weights are initialized randomly and the error is evaluated. Then, the error information is sent backwards through the neural network in order to update the weights. This procedure is repeated iteratively while the error over the training set decreases, and it terminates when the error over the validation set begins to increase.

Fig. 4

Structure of single-hidden-layer neural network.
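To make the link between the network and a swarm-based trainer concrete, the sketch below evaluates an SNN whose weights and biases are stored in one flat vector, which is the form a particle position would take. Several details are assumptions rather than facts from the text: the sigmoid activations, the single sigmoid output (one output per class with argmax is an equally plausible reading), and the mean-squared-error fitness. All names are hypothetical.

```python
import math


def n_params(n_in, n_hidden):
    """Number of weights and biases for a n_in - n_hidden - 1 network."""
    return n_in * n_hidden + n_hidden + n_hidden + 1


def unpack(theta, n_in, n_hidden):
    """Split a flat vector into hidden weights/biases and output weights/bias."""
    w1 = [theta[h * n_in:(h + 1) * n_in] for h in range(n_hidden)]
    off = n_hidden * n_in
    b1 = theta[off:off + n_hidden]
    off += n_hidden
    w2 = theta[off:off + n_hidden]
    b2 = theta[off + n_hidden]
    return w1, b1, w2, b2


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(theta, x, n_hidden):
    """Network output in (0, 1) for one feature vector x."""
    w1, b1, w2, b2 = unpack(theta, len(x), n_hidden)
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w1, b1)]
    return sigmoid(sum(w * h for w, h in zip(w2, hidden)) + b2)


def mse_fitness(theta, samples, labels, n_hidden):
    """Mean squared error over a training set (assumed fitness function)."""
    errs = [(forward(theta, x, n_hidden) - y) ** 2
            for x, y in zip(samples, labels)]
    return sum(errs) / len(errs)
```

Under these assumptions, a network with 13 inputs and 3 hidden neurons has `n_params(13, 3)` = 46 trainable values, so each particle would move in a 46-dimensional search space.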

Predator-prey particle swarm optimization

The traditional gradient-descent method cannot get stable results, i.e., the converged result depends on the initial values. Hence, scholars have proposed to use bioinspired algorithms to train the SNN. Allahkarami et al. [47] utilized genetic algorithm (GA). Buyukada [48] used PSO. Dugenci et al. [49] suggested the use of bee algorithm (BA). Yang [50] proposed the use of BBO algorithm.

The above-mentioned algorithms can alleviate the instability to some degree. In order to further improve the training algorithm, we used the PSO algorithm as the basis and proposed a PP-PSO. PSO mimics the behavior of bird flocks and fish swarms. It treats each bird or fish as a particle. Each particle represents a candidate solution; hence, the best solution can be found by moving the particles toward the global optimal point.

Particle swarm optimization

In the algorithm, each particle was assigned two characteristics: position P and the velocity V. The fitness function f is evaluated at each iteration over the whole particle swarm. In each iteration, two categories of best particles were updated.

One category was the previous best position ($B_p$) that a particle had traversed so far:
$B_{p} (i, t) = \underset{k = 1, \dots, t}{\arg\min} [f (P_{i} (k))]$ (11)
where f denotes the fitness function, t the current iteration number, i the particle index, k the iteration index, and P the position.

The other category was the global best position ($B_g$) that all particles had traversed so far:
$B_{g} (t) = \underset{i = 1, \dots, N}{\arg\min} [f (B_{p} (i, t))]$ (12)
where N denotes the total number of particles.

Based on B_p and B_g, the whole swarm was updated by

$V_{i} (t + 1) = w \times V_{i} (t) + x_{p} \times q_{p} \times (B_{p} (i, t) - P_{i} (t)) + x_{g} \times q_{g} \times (B_{g} (t) - P_{i} (t))$ (13)
$P_{i} (t + 1) = P_{i} (t) + V_{i} (t + 1)$ (14)

Here w is the inertia weight, which can balance local exploitation and global exploration. Two positive constant parameters q_p and q_g are acceleration coefficients with the aim of modifying the distance towards B_p and B_g, respectively. x_p and x_g are random variables within range [0, 1].

The velocity clamping technique [48] was used to prevent particles from flying out of the search space:
$V_{i} (t + 1) \leftarrow \min (V_{max}, V_{i} (t + 1))$ (15)
where $V_{max}$ is the upper bound of the particle velocity.
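A compact sketch of the update rules in Eqs. (11)-(15) is given below. It is a generic PSO, not the authors' code: the parameter values (w = 0.7, q_p = q_g = 1.5, 20 particles), the initialization range, the per-dimension random draws, and the symmetric clamp to [-V_max, V_max] (Eq. 15 only states the upper bound) are all assumptions chosen for illustration.

```python
import random


def pso_step(positions, velocities, best_p, best_g, w, q_p, q_g, v_max, rng):
    """One in-place update of Eqs. (13)-(15) for every particle."""
    for i, (pos, vel) in enumerate(zip(positions, velocities)):
        for d in range(len(pos)):
            x_p, x_g = rng.random(), rng.random()  # random factors in [0, 1)
            v = (w * vel[d]
                 + x_p * q_p * (best_p[i][d] - pos[d])
                 + x_g * q_g * (best_g[d] - pos[d]))
            vel[d] = max(-v_max, min(v_max, v))  # symmetric clamp (assumption)
            pos[d] += vel[d]


def minimize(f, dim, n_particles=20, iters=200, w=0.7, q_p=1.5, q_g=1.5,
             v_max=1.0, seed=0):
    """Minimize f over R^dim; returns (best position, best fitness)."""
    rng = random.Random(seed)
    positions = [[rng.uniform(-5, 5) for _ in range(dim)]
                 for _ in range(n_particles)]
    velocities = [[0.0] * dim for _ in range(n_particles)]
    best_p = [p[:] for p in positions]           # Eq. (11), per particle
    best_p_val = [f(p) for p in positions]
    g = min(range(n_particles), key=lambda i: best_p_val[i])
    best_g, best_g_val = best_p[g][:], best_p_val[g]  # Eq. (12)
    for _ in range(iters):
        pso_step(positions, velocities, best_p, best_g,
                 w, q_p, q_g, v_max, rng)
        for i, p in enumerate(positions):
            val = f(p)
            if val < best_p_val[i]:
                best_p[i], best_p_val[i] = p[:], val
                if val < best_g_val:
                    best_g, best_g_val = p[:], val
    return best_g, best_g_val
```

For instance, `minimize(lambda p: sum(x * x for x in p), dim=2)` searches for the minimum of a simple quadratic bowl. To train a classifier, `f` would be the training error as a function of the flattened weights and biases.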

Proposed training algorithm

PP-PSO mimics the behavior of sardines and killer whales. Inspired by this behavior, Wang and Lv [51] proposed a predator-prey model in which the predators chase the center of the prey swarm, and the prey escape from the predators using different behaviors. The swarm in PP-PSO is divided into two types: the prey swarm (marked as y) and the predator swarm (marked as r). The core idea is that the predators chase the prey, while the prey try to escape from the predators. Using this idea, Eq. (13) was updated as follows:

$V_{i}^{r} (t + 1) = w_{r} (t) \times V_{i}^{r} (t) + x_{p} \times q_{p} \times (B_{p}^{r} (i, t) - P_{i}^{r} (t)) + x_{g} \times q_{g} \times (B_{g} (t) - P_{i}^{r} (t))$ (16)

$V_{i}^{y} (t + 1) = w_{y} (t) \times V_{i}^{y} (t) + x_{p} \times q_{p} \times (B_{p}^{y} (i, t) - P_{i}^{y} (t)) + x_{g} \times q_{g} \times (B_{g} (t) - P_{i}^{y} (t))$ (17)

The weights $w_r(t)$ and $w_y(t)$ are the inertia weights for the predator and prey swarms, respectively. They are defined as:
$w_{r} (t) = 0.4 + 0.1 \times \exp (- \frac{10 \times t}{t_{max}})$ (18)
$w_{y} (t) = w_{max} - t \times \frac{w_{max} - w_{min}}{t_{max}}$ (19)
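The two inertia schedules in Eqs. (18) and (19) are easy to tabulate. In the sketch below the function names are hypothetical, and the default bounds w_max = 0.9 and w_min = 0.4 are placeholder values, since the text does not report them.

```python
import math


def predator_inertia(t, t_max):
    """Eq. (18): exponential decay from 0.5 at t = 0 toward 0.4."""
    return 0.4 + 0.1 * math.exp(-10.0 * t / t_max)


def prey_inertia(t, t_max, w_max=0.9, w_min=0.4):
    """Eq. (19): linear decrease from w_max to w_min.

    The default bounds are assumed values, not taken from the paper.
    """
    return w_max - t * (w_max - w_min) / t_max
```

With `t_max = 1000`, the predator weight starts at 0.5 and is already within about 0.001 of 0.4 by iteration 500, so the predators settle into local search early. The prey weight decreases linearly over the whole run, which keeps the prey exploring for longer.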

The whole algorithm

The whole algorithm is summarized in Table 2. A 10-fold cross validation was used to obtain rigorous statistical results, and the complete 10-fold cross validation was repeated 50 times independently. For comparison, Hasan et al. [52] used 5 complete runs, Mu and colleagues [53] used only one run, and Sanz et al. [54] used 10 runs. Compared with these statistical experiments, the 50 complete runs in this study better reflect the distribution of the classifier performance.

Table 2

Pseudocode of our algorithm

Step 1	Input the dataset
Step 2	Select the most distinguishing slice along axial direction at Z = 61
Step 3	Employ k-th level SWE to extract the brain features
Step 4	Submit the SWE features to SNN trained by PP-PSO
Step 5	Report the evaluation result by 10-fold cross validation

SWE, stationary wavelet entropy; SNN, single-hidden-layer neural network; PP-PSO, predator-prey particle swarm optimization.

Running the 10-fold cross validation 50 times is a valid and rigorous way to guard against overfitting and to check that the result is generalizable. Under this setting, one classifier was created in each trial; thus, 10 classifiers were created for one 10-fold cross validation, and 500 different classifiers were created over the 50 independent runs. It is not practical to draw the receiver operating characteristic (ROC) curves of all 500 classifiers in this paper. Instead, this paper reports the mean and standard deviation of sensitivity, specificity, accuracy, and area under the curve (AUC).
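The evaluation protocol can be sketched as below. This is an illustration under stated assumptions: the folds are stratified by class (the text does not say so), the per-run metrics are pooled over the ten held-out folds, and `fit_predict` is a hypothetical callback that trains on the training split and returns 0/1 predictions for the test split. The sample standard deviation (n - 1 in the denominator) is also an assumption.

```python
import random


def stratified_folds(labels, n_folds, rng):
    """Deal the shuffled indices of each class round-robin into n_folds folds."""
    folds = [[] for _ in range(n_folds)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for k, i in enumerate(idx):
            folds[k % n_folds].append(i)
    return folds


def metrics(y_true, y_pred):
    """Sensitivity, specificity, accuracy in percent (1 = AD, 0 = HC).

    Assumes both classes are present in y_true.
    """
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    n_pos = sum(t == 1 for t in y_true)
    n_neg = len(y_true) - n_pos
    return (100.0 * tp / n_pos, 100.0 * tn / n_neg,
            100.0 * (tp + tn) / len(y_true))


def repeated_cv(features, labels, fit_predict, n_runs=50, n_folds=10, seed=0):
    """Return one (sensitivity, specificity, accuracy) tuple per run."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_runs):
        y_pred = [None] * len(labels)
        for test_idx in stratified_folds(labels, n_folds, rng):
            held = set(test_idx)
            train_idx = [i for i in range(len(labels)) if i not in held]
            preds = fit_predict([features[i] for i in train_idx],
                                [labels[i] for i in train_idx],
                                [features[i] for i in test_idx])
            for i, p in zip(test_idx, preds):
                y_pred[i] = p  # every subject is predicted exactly once per run
        results.append(metrics(labels, y_pred))
    return results


def mean_std(values):
    """Mean and sample standard deviation across runs (needs >= 2 values)."""
    m = sum(values) / len(values)
    var = sum((v - m) ** 2 for v in values) / (len(values) - 1)
    return m, var ** 0.5
```

Pooling the held-out predictions within a run means each run's sensitivity is a count out of all 98 AD subjects, which is consistent with per-run values in Table 8 such as 93.88% (92 of 98).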

RESULTS

Histogram stretching

The histogram stretching method was used to make the AD images from local hospitals brighter than the originals. The original images and their histograms are plotted in Fig. 5. Gray levels of 0 were not counted, since the background contains too many pixels with a gray level of 0. Additionally, the histogram of Fig. 5h is smoothed and shown in Fig. 5i, since many density values drop to zero as a result of histogram stretching.

Fig. 5

Illustration of histogram stretching.

Statistical result

The decomposition level was set to 4, and the number of hidden-layer neurons was set to 3. These parameters were obtained by grid search, as described in the following subsections. The sensitivities, specificities, and accuracies of the 50 runs are presented in Table 8. Averaged over the 50 runs, the proposed method achieved a sensitivity of 92.69±1.29%, a specificity of 92.78±1.51%, an accuracy of 92.73±1.03%, and an AUC of 0.95±0.02.

Table 3

Algorithm Comparison (Unit: %)

Approach	Sensitivity	Specificity	Accuracy
BRC+VFI [6]	65.63	100.00	78.00
BRC+SVM [6]	96.88	77.78	90.00
KSVM+DT [7]	94.00	71.00	90.00
EB+RBF-SVM [8]	85.71±1.91	86.99±2.30	86.71±1.93
3D-EB+RBF-SVM [9]	88.36±3.07	88.59±3.10	88.54±2.37
GDA-BD+SVM [10]	80.00±4.00	N/A	92.09±2.60
MLP+BBO [11]	92.14	92.47	92.40
DIF+SVM [12]	84.93±1.21	89.21±1.63	88.27±1.89
DIF+GEPSVM [12]	88.93±1.80	92.27±1.79	91.52±1.63
VBM+RF [13]	87.9±1.2	90.0±1.1	89.0±0.7
SWE+SNN+PP-PSO (Proposed)	92.69±1.29	92.78±1.51	92.73±1.03

Bold means the best. BRC, brain region cluster; VFI, voting feature interval; SVM, support vector machine; KSVM, kernel support vector machine; DT, decision tree; EB, eigenbrain; RBF, radial basis function; 3D, three-dimensional; GDA, geodesic anisotropy; BD, Bhattacharyya distance; MLP, multilayer perceptron; BBO, biogeography-based optimization; DIF, displacement field; GEPSVM, generalized eigenvalue proximal support vector machine; VBM, voxel-based morphometry; RF, random forest; SWE, stationary wavelet entropy; SNN, single-hidden-layer neural network; PP-PSO, predator-prey particle swarm optimization.

Table 4

Effect of slice selection

	Features	Sensitivity	Specificity	Accuracy
Slice Selection	13	92.69±1.29	92.78±1.51	92.73±1.03
Whole Brain	1560	92.71±1.46	92.61±1.41	92.66±1.08

Table 5

Comparison with manual interpretation

	Sensitivity	Specificity	Accuracy
Observer 1	73.91	79.49	77.42
Observer 2	69.57	74.36	72.58
Observer 3	78.26	71.79	74.19
Proposed Algorithm	91.30	92.31	91.94

Table 6

Computational time in offline training over the dataset

Step	Setting	Time
Preprocessing	198-image	40.65 h
Slice selection	198-image	4.71 s
SWE	198-image	156.42 s
Training	50-runs	182.80 min

h, hour; min, minute; s, second.

Table 7

Computational time in online identification over a single volumetric image

Step	Time
Preprocessing	14.12 min
Slice selection	0.02 s
SWE	0.85 s
Prediction	0.01 s

min, minute; s, second.

Table 8

Statistical result on our balanced dataset

R	Sen	Spc	Acc	R	Sen	Spc	Acc	R	Sen	Spc	Acc
1	93.88	89.80	91.84	18	92.86	91.84	92.35	35	94.90	94.90	94.90
2	93.88	92.86	93.37	19	93.88	94.90	94.39	36	91.84	94.90	93.37
3	91.84	90.82	91.33	20	91.84	93.88	92.86	37	93.88	94.90	94.39
4	92.86	91.84	92.35	21	94.90	91.84	93.37	38	93.88	92.86	93.37
5	92.86	94.90	93.88	22	89.80	93.88	91.84	39	90.82	89.80	90.31
6	92.86	94.90	93.88	23	92.86	92.86	92.86	40	91.84	90.82	91.33
7	92.86	92.86	92.86	24	93.88	91.84	92.86	41	93.88	91.84	92.86
8	90.82	93.88	92.35	25	92.86	93.88	93.37	42	91.84	92.86	92.35
9	92.86	92.86	92.86	26	92.86	94.90	93.88	43	92.86	92.86	92.86
10	93.88	93.88	93.88	27	94.90	93.88	94.39	44	90.82	92.86	91.84
11	94.90	93.88	94.39	28	92.86	92.86	92.86	45	92.86	92.86	92.86
12	92.86	90.82	91.84	29	92.86	91.84	92.35	46	89.80	93.88	91.84
13	92.86	93.88	93.37	30	93.88	89.80	91.84	47	91.84	91.84	91.84
14	93.88	90.82	92.35	31	91.84	93.88	92.86	48	92.86	91.84	92.35
15	91.84	93.88	92.86	32	93.88	91.84	92.86	49	89.80	90.82	90.31
16	92.86	90.82	91.84	33	90.82	90.82	90.82	50	92.86	94.90	93.88
17	91.84	93.88	92.86	34	91.84	92.86	92.35	Avr	92.69±1.29	92.78±1.51	92.73±1.03

R, Run; Sen, Sensitivity; Spc, Specificity; Acc, Accuracy; Avr, Average.

Optimal decomposition level

Grid search was used to find the optimal decomposition level. Following common practice, the decomposition level k was varied from 1 to 8 in increments of 1. The results are shown in Fig. 6.

Fig. 6

Optimal decomposition level.

Optimal neuron number at hidden layer

In this experiment, the number of hidden-layer neurons was varied from 2 to 10 in increments of 1. The results are presented in Fig. 7.

Fig. 7

Optimal neuron number at hidden layer.

Training algorithm comparison

The feature dimension and neural network structure remained unchanged in this experiment. The proposed PP-PSO algorithm was compared with four global optimization algorithms: GA [47], PSO [48], BA [49], and BBO [50]. The maximum iteration number was set to 1,000. The Matlab command “boxplot” was used to show the median, quartiles, whiskers, and outliers of each algorithm over 50 runs. The results are shown in Fig. 8.

Fig. 8

Our training algorithm compared to global optimization based training algorithm. GA, genetic algorithm; PSO, particle swarm optimization; BA, bee algorithm; BBO, biogeography-based optimization; PP-PSO, predator-prey particle swarm optimization.

Next, the proposed method was compared with traditional gradient-based backpropagation methods. The competing algorithms include backpropagation, momentum backpropagation, and adaptive backpropagation. Each algorithm was run 50 times, and the iteration number was set to 1000. The results comparing the different algorithms are shown in Fig. 9.

Fig. 9

Our training algorithm compared to gradient-based training algorithms.

Comparison with state-of-the-art approaches

The SWE+SNN+PP-PSO method was compared with 10 state-of-the-art approaches: BRC+VFI [6], BRC+SVM [6], KSVM+DT [7], EB+RBF-SVM [8], 3D-EB+RBF-SVM [9], GDA-BD+SVM [10], MLP+BBO [11], DIF+SVM [12], DIF+GEPSVM [12], and VBM+RF [13]. All the simulation settings were the same as in the previous experiments. The results are listed in Table 3. The performance results of the competing approaches were taken from their respective publications. Some methods were run only once; hence, their standard deviations could not be reported.

Effect of slice selection

This section describes how the effect of slice selection was validated. Without slice selection, SWE features would be extracted from all slices of the brain. Each slice generates 13 features, and 120 slices were used for the whole brain (slices above and below the brain were not used); thus, there were 13×120 = 1560 features. Those 1560 features were submitted to the SNN+PP-PSO. The comparison results are listed in Table 4.

Comparison with manual interpretation

We compared our algorithm with three experienced observers (O1, O2, O3), each with more than 10 years of clinical experience in neuroradiology. Sixty-two subjects (23 ADs and 39 age-matched and sex-matched HCs) were enrolled and scanned. The observers were blinded to the age and sex of the subjects, and they assessed only the selected slice. The comparison results are given in Table 5.
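As a consistency check, the percentages in Table 5 can be mapped back to whole-number counts for 23 AD and 39 HC subjects. The counts produced below are inferred from the reported percentages and are not stated in the source; the function names are hypothetical.

```python
N_AD, N_HC = 23, 39

# (sensitivity, specificity, accuracy) in percent, copied from Table 5.
TABLE_5 = {
    "Observer 1": (73.91, 79.49, 77.42),
    "Observer 2": (69.57, 74.36, 72.58),
    "Observer 3": (78.26, 71.79, 74.19),
    "Proposed Algorithm": (91.30, 92.31, 91.94),
}

TOL = 0.005 + 1e-9  # half of the last reported decimal place


def implied_count(percent, total):
    """Integer count whose rate rounds to `percent` (two decimals), else None."""
    for k in range(total + 1):
        if abs(100.0 * k / total - percent) <= TOL:
            return k
    return None


def check_row(sens, spec, acc):
    """Return (correct ADs, correct HCs, whether the accuracy agrees)."""
    tp = implied_count(sens, N_AD)
    tn = implied_count(spec, N_HC)
    if tp is None or tn is None:
        return tp, tn, False
    implied_acc = 100.0 * (tp + tn) / (N_AD + N_HC)
    return tp, tn, abs(implied_acc - acc) <= TOL


if __name__ == "__main__":
    for name, row in TABLE_5.items():
        print(name, check_row(*row))
```

Under this reading, the proposed algorithm's row corresponds to 21 of 23 AD subjects and 36 of 39 HC subjects classified correctly, that is, 57 of 62 overall, which matches the reported accuracy of 91.94%.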

Computational complexity

The computational cost of the proposed method was measured for both offline training and online identification. The computation time of the offline training stage is listed in Table 6. The computation time needed to process one image in the online identification stage is listed in Table 7.

DISCUSSION

Figure 5a-d shows that HC and AD subjects from the same OASIS dataset have similar histogram envelopes. From Fig. 5c-f, it can be seen that AD subjects from local hospitals are slightly darker than AD subjects from OASIS. Figure 5f confirms that the histogram of AD subjects from local hospitals is a low-contrast histogram. Figure 5g shows the HS result of Fig. 5e, with the new histogram and its smoothed version given in Fig. 5h and 5i. A comparison of Fig. 5c and 5i makes clear that the histograms from the two sources (OASIS dataset and local hospitals) have similar gray-level distribution envelopes.

There are other excellent intensity normalization algorithms, such as whole body intensity standardization [55], Whitestripe [56], contrast-limited adaptive histogram equalization [57], etc. However, the method proposed in this paper is simple and works only on one slice. Hence, it is faster than other methods. In the future, performance of other algorithms will be tested.

As Fig. 6 shows, the curves reached their highest point at k = 4; hence, the decomposition level was set to 4. The reason is two-fold: 1) As k increases from a small value, additional levels give a higher-resolution decomposition; 2) If k is too large, calculation errors (such as rounding errors) accumulate and worsen the classification performance.

From Fig. 7, it can be observed that changing the number of hidden-layer neurons has less effect than changing the decomposition level (Fig. 6). As the size of the hidden layer increases, the performance decreases slightly. The optimal number of hidden-layer neurons was 3, which suggests that the best SNN structure in this study was 13-3-1 in terms of numerical results. Nevertheless, the classification performance with 2 or 4 hidden neurons was very close to that with 3 hidden neurons, which indicates that this problem may have multiple solutions.

Figure 8 shows that the proposed PP-PSO was the most stable algorithm and yielded the highest performance among all algorithms. The BBO [50] ranked second, and the PSO [48] ranked third, with performance slightly inferior to that of the BBO [50]. The BA [49] ranked fourth, and the GA [47] gave the worst performance.

The comparison results in Fig. 9 show that the PP-PSO had a better mean value than the other three algorithms in terms of sensitivity, specificity, and accuracy. The PP-PSO also had a much smaller variance than the other three algorithms. All the measures showed the robustness of PP-PSO as a training algorithm.

The reason is that all three competing methods are based on gradient descent, which starts from a randomly initialized solution. This initialization profoundly influences the convergence performance. That is, if the initialization is near a local minimum, the algorithm may become stuck in that local minimum.

The proposed PP-PSO belongs to the family of global optimization algorithms, which can solve this problem. The swarm contains several candidate solutions. If one candidate is trapped in a local minimum, other candidates with better results within the swarm will pull it out of the local region. If traditional backpropagation and its variants were used, either a good or a bad result could be obtained because of the large variance. This indicates the necessity of using PP-PSO. There are many other interesting global optimization algorithms, such as artificial bee colony [58], bat algorithm [59], harmony search [60], gray wolf algorithm [61], etc. There are also many other variants of PSO, such as quantum-behaved PSO [62], PSO with time-varying acceleration coefficient [63], bare-bone PSO [64], etc. In the future, these algorithms should be tested objectively to further improve the performance of the training algorithm.

Table 3 shows that the proposed method achieved the highest accuracy, 92.73±1.03%, among all methods. The MLP+BBO [11] obtained the second highest accuracy of 92.40%, but the authors did not report the standard deviation. The GDA-BD+SVM [10] obtained the third highest accuracy of 92.09±2.60%. However, its sensitivity was low and had a large standard deviation (80.00±4.00%).

In terms of specificity, the BRC+VFI [6] obtained a perfect specificity, but its sensitivity was too low at 65.63%. In terms of sensitivity, the BRC+SVM [6] obtained the highest sensitivity of 96.88%, but its specificity was only 77.78%. Considering all three measures, the proposed method is the best among all 11 algorithms.

Table 4 shows that using the whole brain did not increase the detection performance. On the contrary, detection over the whole brain gave a slightly lower accuracy, with a sensitivity of 92.71±1.46%, a specificity of 92.61±1.41%, and an accuracy of 92.66±1.08%. The reason may be two-fold: 1) There were too many features (1560) when using the whole brain. The excessive features could have made the classifier training difficult, so that the classifier did not converge to its optimal condition. 2) Many slices are unrelated to AD. Therefore, including these slices makes the input data more complicated than simply using the slice selection method.

It can be seen from Table 5 that the three human observers reached accuracies in the range of 72% to 78%, while the developed algorithm achieved an accuracy of 91.94%. This again shows the power of machine learning and computer vision.

On the other hand, the power of deep learning in automated diagnostic systems is already well known: Esteva et al. [65] used a deep convolutional neural network to create a dermatologist-level classifier. Suk et al. [66] used deep sparse multi-task learning in diagnosing AD. Morabito et al. [67] employed deep learning representation to detect early-stage Creutzfeldt-Jakob disease. In this study, deep learning was not used due to the small size of the dataset. Nevertheless, we believe deep learning can help improve the proposed system in future research.

From Table 6, one can observe that the preprocessing in offline training cost 40.65 h, slice selection cost 4.71 s, SWE cost 156.42 s, and the classifier training cost 182.80 min. Note that 198 images had to be processed, and the training had to be run 50 times to obtain the cross-validation result. Hence, for one image, the calculation time was reasonable. The training time for 1 run was about 3.66 min.

From Table 7, it can be seen that the preprocessing in online identification cost 14.12 min, slice selection cost 0.02 s, SWE cost 0.85 s, and prediction cost only 0.01 s. This means that after the image is preprocessed, it takes only 0.88 s to identify whether a subject has AD or not. This is quite rapid, which suggests that the proposed method meets the real-time requirement.
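The timing figures quoted from Tables 6 and 7 can be checked with a short calculation. The sketch below uses only the numbers stated in the text. The variable and function names are hypothetical, and the per-image offline averages are derived values, not figures reported in the tables.

```python
# Figures quoted in the text (Tables 6 and 7).
N_IMAGES = 198   # images processed in offline training
N_RUNS = 50      # training runs used for cross-validation

OFFLINE_PREPROCESSING_MIN = 40.65 * 60   # 40.65 h expressed in minutes
OFFLINE_SWE_S = 156.42                   # seconds, all images
OFFLINE_TRAINING_MIN = 182.80            # minutes, all runs

# Online stages that follow preprocessing, in seconds.
ONLINE_STAGES_S = {"slice_selection": 0.02, "swe": 0.85, "prediction": 0.01}


def per_unit(total, count):
    """Average cost per image or per run."""
    return total / count


training_min_per_run = per_unit(OFFLINE_TRAINING_MIN, N_RUNS)
preprocessing_min_per_image = per_unit(OFFLINE_PREPROCESSING_MIN, N_IMAGES)
swe_s_per_image = per_unit(OFFLINE_SWE_S, N_IMAGES)
online_after_preprocessing_s = sum(ONLINE_STAGES_S.values())

print(f"training per run: {training_min_per_run:.2f} min")
print(f"offline preprocessing per image: {preprocessing_min_per_image:.2f} min")
print(f"offline SWE per image: {swe_s_per_image:.2f} s")
print(f"online time after preprocessing: {online_after_preprocessing_s:.2f} s")
```

The per-run training time and the online total reproduce the 3.66 min and 0.88 s stated in the text.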

Table 8 shows the sensitivities, specificities, and accuracies over the balanced dataset based on 50 runs. On average, the algorithm of this study achieved a sensitivity of 92.69±1.29%, a specificity of 92.78±1.51%, and an accuracy of 92.73±1.03%. If only the data from the online OASIS dataset had been used, the unbalanced dataset might have biased the classifier towards the majority class (i.e., the healthy subjects).
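The risk of an unbalanced dataset can be illustrated with a small sketch. The cohort below (70 healthy subjects and 30 AD subjects) is hypothetical and is not the OASIS class distribution. The function name is also an assumption. A degenerate classifier that always predicts the majority class still reaches a seemingly acceptable accuracy while detecting no AD subject at all.

```python
def confusion_metrics(y_true, y_pred, positive=1):
    """Return (sensitivity, specificity, accuracy) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    accuracy = (tp + tn) / len(pairs)
    return sensitivity, specificity, accuracy


# Hypothetical unbalanced cohort: 0 = healthy (majority), 1 = AD (minority).
y_true = [0] * 70 + [1] * 30
# Degenerate classifier that always outputs the majority class.
y_majority = [0] * len(y_true)

sens, spec, acc = confusion_metrics(y_true, y_majority)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}, accuracy={acc:.2f}")
```

This is why sensitivity and specificity are reported alongside accuracy, and why a balanced dataset makes the three measures easier to interpret.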

A shortcoming of the proposed method is that the system is oriented toward machines rather than humans. Hence, it can be regarded as a “black box”, from which human experts cannot extract effective rules. This is why the proposed method is categorized as “machine learning”. In the future, we shall try to use rule-based systems to translate the machine-oriented rules into human-oriented rules.

Another shortcoming is that the employed feature, SWE, is applicable only to two-dimensional data. This is why a slice had to be selected before performing the SWE operation. In the future, we shall try to extend SWE to the three-dimensional case, so that the volumetric image can be processed directly with a three-dimensional SWE feature.

Conclusion

Our team developed a novel system based on computer vision and machine learning. We proposed a novel predator-prey particle swarm optimization to help train the classifier. The proposed system outperformed 10 state-of-the-art approaches on the combined dataset drawn from the OASIS dataset and local hospitals. Additionally, the proposed system performed better than human observers in analyzing realistic brain imaging data. In the future, we shall try to detect mild cognitive impairment. We shall also run tentative tests with advanced classifiers, such as convolutional neural networks or autoencoders.

ACKNOWLEDGMENTS

This study was supported by Natural Science Foundation of China (61602250), Program of Natural Science Research of Jiangsu Higher Education Institutions (16KJB520025, 15KJB470010), Open fund for Jiangsu Key Laboratory of Advanced Manufacturing Technology (HGAMTL1601), Open Program of Jiangsu Key Laboratory of 3D Printing Equipment and Manufacturing (3DL201602), Open fund of Key Laboratory of Guangxi High Schools Complex System and Computational Intelligence (2016CSCI01), and Natural Science Foundation of Jiangsu Province (BK20150983).

The authors are grateful for the OASIS dataset, which was supported by NIH grants P50AG05681, P01 AG03991, R01 AG021910, P50 MH071616, U24 RR021382, and R01 MH56584.

Authors’ disclosures available online ().

References

1. Carneiro, Loureiro, Delerue-Matos, Morais, Pereira (2017) Alzheimer’s disease: Development of a sensitive label-free electrochemical immunosensor for detection of amyloid beta peptide. Sens Actuator B-Chem 239, 157–165.

2. Dronse, Fliessbach, Bischof, von Reutern, Faber, Hammes, Kuhnert, Neumaier, Onur, Kukolja, van Eimeren, Jessen, Fink, Klockgether, Drzezga (2017) In vivo Patterns of Tau pathology, amyloid-β burden, and neuronal dysfunction in clinical variants of Alzheimer’s disease. J Alzheimers Dis 55, 465–471.

3. Lista, Hampel (2017) Synaptic degeneration and neurogranin in the pathophysiology of Alzheimer’s disease. Expert Rev Neurother 17, 47–57.

4. Shamirian, Nalbandian, Khare, Castellani, Kim, Kimonis (2015) Early-onset Alzheimers and cortical vision impairment in a woman with valosin-containing protein disease associated with 2 APOE ɛ4/APOE ɛ4 genotype. Alzheimer Dis Assoc Dis 29, 90–93.

5. Kepp (2017) Ten challenges of the amyloid hypothesis of Alzheimer’s disease. J Alzheimers Dis 55, 447–457.

6. Plant, Teipel, Oswald, Böhm, Meindl, Mourao-Miranda, Bokde, Hampel, Ewers (2010) Automated detection of brain atrophy patterns based on MRI for the prediction of Alzheimer’s disease. Neuroimage 50, 162–174.

7. Zhang, Wang, Dong (2014) Classification of Alzheimer disease based on structural magnetic resonance imaging by kernel support vector machine decision tree. Prog Electromagn Res 144, 171–184.

8. Zhang, Dong, Phillips, Wang, Ji, Yang, Yuan (2015) Detection of subjects and brain regions related to Alzheimer’s disease using 3D MRI scans based on eigenbrain and machine learning. Front Comput Neurosci 9, 66.

9. Zhang, Wang, Phillips, Yang, Yuan (2016) Three-dimensional eigenbrain for the detection of subjects and brain regions related with Alzheimer’s disease. J Alzheimers Dis 50, 1163–1179.

10. Savio, Graña (2013) Deformation based feature selection for Computer Aided Diagnosis of Alzheimer’s disease. Expert Syst Appl 40, 1619–1628.

11. Wang, Zhang, Li, Jia, Liu, Yang, Zhang YD (2016) Single slice based detection for Alzheimer’s disease via wavelet entropy and multilayer perceptron trained by biogeography-based optimization. Multimed Tools Appl. doi: 10.1007/s11042-016-4222-4

12. Zhang, Wang (2015) Detection of Alzheimer’s disease by displacement field and machine learning. PeerJ 3, e1251.

13. Gray, Aljabar, Heckemann, Hammers, Rueckert, Alzheimer’s Disease Neuroimaging Initiative (2013) Random forest-based similarity measures for multi-modal classification of Alzheimer’s disease. Neuroimage 65, 167–175.

14. Wang, Du, Zhang, Phillips, Wu, Chen, Zhang (2017) Alzheimer’s disease detection by pseudo Zernike moment and linear regression classification. CNS Neurol Disord Drug Targets 16, 11–15.

15. Fang, Chang, Tsao, Shih, Wang (2016) Channel state reconstruction using multilevel discrete wavelet transform for improved fingerprinting-based indoor localization. IEEE Sens J 16, 7784–7791.

16. Seal, Bhattacharjee, Nasipuri (2016) Human face recognition using random forest based fusion of à-trous wavelet transform coefficients from thermal and visible images. AEU Int J Electron Commun 70, 1041–1049.

17. Wang, Chen, Li, Zhang, Han, Wu, Du (2015) Detection of dendritic spines using wavelet-based conditional symmetric analysis and regularized morphological shared-weight neural networks. Comput Math Method Med 2015, 454076.

18. Zhang, Yang, Lu, Zhou, Phillips, Liu, Wang (2016) Facial emotion recognition based on biorthogonal wavelet entropy, fuzzy support vector machine, and stratified cross validation. IEEE Access 4, 8375–8385.

19. Wang, Zhang, Ji, Yang, Wu, Wei (2015) Fruit classification by wavelet-entropy and feedforward neural network trained by fitness-scaled chaotic ABC and biogeography-based optimization. Entropy 17, 5711–5728.

20. Wang, Yang, Zhang, Phillips, Yang, Yuan (2015) Identification of green, Oolong and black teas in China via wavelet packet entropy and fuzzy support vector machine. Entropy 17, 6663–6682.

21. Avci, Dogantekin (2016) An expert diagnosis system for Parkinson disease based on genetic algorithm-wavelet kernel-extreme learning machine. Parkinsons Dis 2016, 5264743.

22. Chen (2016) Computer-aided detection of left and right sensorineural hearing loss by wavelet packet decomposition and least-square support vector machine. J Am Geriatr Soc 64, S350.

23. Wang, Yang, Du, Yang, Liu, Gorriz, Ramírez, Yuan, Zhang (2016) Wavelet entropy and directed acyclic graph support vector machine for detection of patients with unilateral hearing loss in MRI scanning. Front Comput Neurosci 10, 106.

24. Doulah, Fattah, Zhu, Ahmad (2014) Wavelet domain feature extraction scheme based on dominant motor unit action potential of EMG signal for neuromuscular disease classification. IEEE Trans Biomed Circuits Syst 8, 155–164.

25. Zhang, Wang, Ji (2015) A comprehensive survey on particle swarm optimization algorithm and its applications. Math Probl Eng 2015, 931256.

26. Bonyadi, Michalewicz (2016) Stability analysis of the particle swarm optimization without stagnation assumption. IEEE Trans Evol Comput 20, 814–819.

27. Cleghorn, Engelbrecht (2015) Particle swarm variants: Standardized convergence analysis. Swarm Intell 9, 177–203.

28. Zhang, Wu (2011) Crop classification by forward neural network with adaptive chaotic particle swarm optimization. Sensors (Basel) 11, 4721–4743.

29. Mallick, Kar, Ghoshal, Mandal (2016) Optimal sizing and design of CMOS analogue amplifier circuits using craziness-based particle swarm optimization. Int J Numer Modell Electron Network Device Fields 29, 943–966.

30. Zhang, Wu, Wang (2013) UCAV path planning by fitness-scaling adaptive chaotic particle swarm optimization. Math Probl Eng 2013, 705238.

31. Zhang, Jun, Wei, Wu (2010) Find multi-objective paths in stochastic networks via chaotic immune PSO. Expert Syst Appl 37, 1911–1919.

32. Singh, Singh, Mahapatra, Jagadev (2016) Particle swarm optimization algorithm embedded with maximum deviation theory for solving multi-objective flexible job shop scheduling problem. Int J Adv Manuf Technol 85, 2353–2366.

33. Zhang, Wang, Phillips, Ji (2014) Binary PSO with mutation operator for feature selection using decision tree applied to spam detection. Knowledge-Based Syst 64, 22–31.

34. […], Fong, Siu (2015) PSOVina: The hybrid particle swarm optimization algorithm for protein-ligand docking. J Bioinform Comput Biol 13, 1541007.

35. Zhang, Wang, Sun, Ji, Phillips, Dong (2014) Binary structuring elements decomposition based on an improved recursive dilation-union model and RSAPSO method. Math Probl Eng 2014, 272496.

36. Ardekani, Figarsky, Sidtis (2013) Sexual dimorphism in the human corpus callosum: An MRI study using the OASIS brain database. Cereb Cortex 23, 2514–2520.

37. Wang, Zhang, Liu, Phillips, Yuan (2016) Detection of Alzheimer’s disease by three-dimensional displacement field estimation in structural magnetic resonance imaging. J Alzheimers Dis 50, 233–248.

38. Kamrani, Rezaei, Amiri, Saberinasr (2016) Investigating the efficiency of information entropy and fuzzy theories to classification of groundwater samples for drinking purposes: Lenjanat Plain, Central Iran. Environ Earth Sci 75(13), 1370.

39. Nguyen, Vo, Choi, Won (2015) A stationary wavelet entropy-based clustering approach accurately predicts gene expression. J Comput Biol 22, 236–249.

40. Zhang, Lu, Zhou, Yang, Wu, Liu, Phillips, Wang (2016) Comparison of machine learning methods for stationary wavelet entropy-based multiple sclerosis detection: Decision tree, k-nearest neighbors, and support vector machine. Simulation 92, 861–871.

41. Zhang, Nayak, Yang, Shao, Liu, Wang (2017) Detection of unilateral hearing loss by Stationary Wavelet Entropy. CNS Neurol Disord Drug Targets 16, 122–128.

42. Birajadar, Patidar, Shirvalkar, Gupta, Gadre (2016) In International Conference on Signal Processing and Communications (SPCOM) IEEE, Bangalore, India, pp. 5–11.

43. Zhou, Zhang, Ji, Yang, Dong, Wang, Zhang, Phillips (2016) Detection of abnormal MR brains based on wavelet entropy and feature selection. IEEJ Trans Electr Electron Eng 11, 364–373.

44. Zhang, Wang, Dong, Phillips, Dong, Ji, Yang (2015) Pathological brain detection in magnetic resonance imaging scanning by wavelet entropy and hybridization of biogeography-based optimization and particle swarm optimization. Prog Electromagn Res 152, 41–58.

45. Zhang, Wang, Sun, Phillips (2015) Pathological brain detection based on wavelet entropy and Hu moment invariants. Biomed Mater Eng 26, S1283–S1290.

46. Nguyen, Lloyd-Jones, McLachlan (2016) A universal approximation theorem for mixture-of-experts models. Neural Comput 28, 2585–2593.

47. Allahkarami, Nuri, Abdollahzadeh, Rezai, Maghsoudi (2016) Improving estimation accuracy of metallurgical performance of industrial flotation process by using hybrid genetic algorithm - Artificial neural network (GA-ANN). Physicochem Probl Mineral Pro 53, 366–378.

48. Buyukada (2016) Co-combustion of peanut hull and coal blends: Artificial neural networks modeling, particle swarm optimization and Monte Carlo simulation. Bioresour Technol 216, 280–286.

49. Düğenci, Aydemir, Esen, Aydin (2015) Creep modelling of polypropylenes using artificial neural networks trained with Bee algorithms. Eng Appl Artif Intell 45, 71–79.

50. Yang, Zhang, Yang, Ji, Dong, Wang, Feng, Wang (2016) Automated classification of brain images using wavelet-energy and biogeography-based optimization. Multimed Tools Appl 75, 15601–15617.

51. Wang, Lv, Chen, Li, Zhang, Liu (2016) Smart pathological brain detection system by predator-prey particle swarm optimization and single-hidden layer neural-network. Multimed Tools Appl. doi: 10.1007/s11042-016-4242-0

52. Hasan, Li, Ahmad, Molla (2017) predCar-site: Carbonylation sites prediction in proteins using support vector machine with resolving data imbalanced issue. Anal Biochem 525, 107–113.

53. […], Hu, Yin (2017) Driving fatigue detecting based on EEG signals of forehead area. Int J Patt Recogn Artif Intell 31, 1750011.

54. Sanz, Paternain, Galar, Fernandez, Reyero, Belzunegui (2017) A new survival status prediction system for severe trauma patients based on a multiple classifier system. Comput Methods Programs Biomed 142, 1–8.

55. Jäger, Nyùl, Frericks, Wacker, Hornegger (2007) Whole Body MRI Intensity Standardization. In Bildverarbeitung für die Medizin: Algorithmen – Systeme – Anwendungen, Proceedings des Workshops, Horsch, Deserno, Handels, Meinzer H-P, Tolxdorff, eds. Springer Berlin Heidelberg, Berlin Heidelberg, pp. 459–463.

56. Shinohara, Muschelli, Whitestripe: White matter normalization for magnetic resonance images using whitestripe, https://cran.r-project.org/web/packages/WhiteStripe/index.html, Accessed May 2, 2017.

57. Zhang, Wu, Lu, Wang, Phillips, Wang (2016) Smart detection on abnormal breasts in digital mammography based on contrast-limited adaptive histogram equalization and chaotic adaptive real-coded biogeography-based optimization. Simulation 92, 873–885.

58. Zhang, Wu (2011) Optimal multi-level thresholding based on maximum Tsallis entropy via an Artificial Bee Colony Approach. Entropy 13, 841–859.

59. […], Qiu, Shi, Li, Lu, Chen, Yang, Liu, Jia, Zhang (2017) A pathological brain detection system based on extreme learning machine optimized by bat algorithm. CNS Neurol Disord Drug Targets 16, 23–29.

60. Giran, Temur, Bekdaş (2017) Resource constrained project scheduling by harmony search algorithm. KSCE J Civ Eng 21, 479–487.

61. Kumar, Chhabra, Kumar (2017) Grey wolf algorithm-based clustering technique. J Intell Syst 26, 153–168.

62. Zhang, Ji, Yang, Wang, Dong, Phillips, Sun (2016) Preliminary research on abnormal brain detection by wavelet-energy and quantum-behaved PSO. Technol Health Care 24, S641–S649.

63. Zhang, Wang, Phillips, Dong, Ji, Yang (2015) Detection of Alzheimer’s disease and mild cognitive impairment based on structural volumetric MR images using 3D-DWT and WTA-KSVM trained by PSOTVAC. Biomed Signal Process Control 21, 58–73.

64. Campos, Krohling (2016) Entropy-based bare bones particle swarm for dynamic constrained optimization. Knowledge-Based Syst 97, 203–223.

65. Esteva, Kuprel, Novoa, Ko, Swetter, Blau, Thrun (2017) Dermatologist-level classification of skin cancer with deep neural networks. Nature 542, 115–118.

66. Suk, Lee, Shen; Alzheimer’s Disease Neuroimaging Initiative (2016) Deep sparse multi-task learning for feature selection in Alzheimer’s disease diagnosis. Brain Struct Funct 221, 2569–2587.

67. Morabito, Campolo, Mammone, Versaci, Franceschetti, Tagliavini, Sofia, Fatuzzo, Gambardella, Labate, Mumoli, Tripodi, Gasparini, Cianci, Sueri, Ferlazzo, Aguglia (2017) Deep learning representation from electroencephalography of early-stage Creutzfeldt-Jakob disease and features for differentiation from rapidly progressive dementia. Int J Neural Syst 27(15), 1650039.