Abstract
Alzheimer’s disease is a degenerative disorder that attacks neurons, resulting in memory loss, impaired thinking and language skills, and behavioral changes. Computer-aided detection methods can uncover crucial information recorded by electroencephalograms. A systematic literature search shows that the wavelet transform is a frequently used technique in Alzheimer’s detection. However, it requires a predefined basis function, which is considered a significant drawback. In this work, empirical mode decomposition is introduced as an alternative for processing Alzheimer signals. The performance of empirical mode decomposition relies heavily on a parameter called the threshold. In our previous works, we found that the existing thresholding techniques were not able to highlight relevant information. Here, the use of Tsallis entropy as a thresholder is evaluated through the combination of empirical mode decomposition and neural networks. The proposed approach outperforms the state of the art in terms of peak signal-to-noise ratio and root mean square error, and the better features it extracts boost the classification accuracy. Hence, our methodology is more likely to succeed than methods based on other thresholding rules such as Bayes, Normal and Visu shrink. We finally report an accuracy rate of 80%, while the aforementioned techniques only yield 65%, 60% and 40%, respectively.
Introduction
Alzheimer’s disease (AD) is the most common cause of dementia, which includes a set of symptoms such as memory loss and difficulties with thinking, orientation, problem-solving or language. Over the course of the illness, proteins build up in the brain to form structures called plaques and tangles. This leads to the loss of connections between nerve cells, and eventually to their death and the loss of brain tissue. Worldwide, nearly 44 million people have Alzheimer’s or a related dementia. Only 1 out of 4 have been diagnosed with AD. Moreover, 2 out of 3 people with AD are women [1–3]. Even though an efficient cure for AD has not been identified yet, it is possible to reduce its impact through early detection.
The analysis of AD is nowadays performed through an electroencephalogram (EEG) assessment. The main challenge is the similarity between the EEG patterns of healthy brains and those of AD-affected brains at an early stage [4–6]. Therefore, computer-aided detection (CAD) systems play a crucial role in EEG processing [7–16]. The CAD literature suggests that only a few works have been devoted to the early detection of AD, with rather limited success. The techniques involved include adaptive [17], Fourier [16] and wavelet [7–11] methods. The latter produces remarkable results owing to its capability to handle non-linear and non-stationary signals. Moreover, empirical mode decomposition (EMD) is a growing technique that is widely considered a good alternative due to its low computational complexity. Indeed, it has been reported to give better results than wavelet algorithms [18–23]. EMD decomposes a signal into a finite number of Intrinsic Mode Functions (IMFs). The state of the art reveals that EMD has many applications in different fields, such as filtering [21], medical signal analysis [23], speech processing [24], information detection [22] and others [25,26]. The high performance of EMD relies on the calculation of the optimal threshold value [18,19,22–24]. Some of the traditional thresholding methods are Visu [27], detrended fluctuation analysis (DFA) [28], Sure [29] and arousal level [30]. However, the main limitations of these techniques are well documented in [31,32]. Recently, we analyzed the use of Shannon entropy (SE) as an optimal thresholding approach. Although SE proved more capable than the previous traditional thresholders, Tsallis entropy (TE) outperforms SE when the signal is fractal, due to its non-extensive property [32].
Taking into account the aforementioned issues, the objectives of our work are: (i) to analyze raw EEG signals in the EMD domain based on Tsallis thresholding, (ii) to assess the impact of Tsallis thresholding on EMD processing, (iii) to compare TE-based thresholding with other existing thresholds, including our previously proposed SE, (iv) to extract features from the EMD-processed EEG, and (v) to develop an automatic NN-based classifier for AD diagnosis. Note that the type of artificial NN considered in this study is the well-known feed-forward neural network, as it has demonstrated a powerful ability to discriminate complex nonlinear signals. The rest of the paper is organized as follows: the empirical mode decomposition method, the analytic signal representation of IMFs, the threshold selection using TE and its additional parameter q, and the reconstruction process performed by thresholding the corrupted data are presented in Sections 2 and 3. The feature extraction and supervised classification phases are explained in Section 4. Then, the experimental results and discussion for the classification of AD-affected and healthy brains through EEG signals are given in Section 5. Finally, Section 6 concludes the paper and outlines some future lines of research and improvement.
Foundation of EMD theory to analyze EEG
EMD extracts the dominant oscillations of a sequence, the so-called IMFs. An input EEG signal x(t) is decomposed by EMD into a finite number of intrinsic mode functions (Imf_1, Imf_2, …, Imf_n) and a residue signal re(t) [18,19]. The signal x(t) can be defined as:
\[ x(t) = \sum_{i=1}^{n} \mathrm{Imf}_i(t) + re(t) \]
An IMF is defined as a function that satisfies the following requirements:
(i) the number of extrema and the number of zero-crossings must either be equal or differ at most by one; and (ii) at any point, the mean value of the envelopes defined by the local maxima and the local minima is zero.
Then, the EEG is decomposed through a sifting process (see Algorithm 1). For a given input signal x(t), the obtained IMF and the residue are as follows: re(t) = x(t), if n = 0.
As described in the algorithm above, the local extrema are calculated first. Afterwards, the lower and upper envelopes are formed. Note that the average of the two envelopes is obtained in Step 4. Subtracting this average value from the input signal x(t) highlights the fast oscillating patterns. Because a single pass does not guarantee a valid result, the algorithm iterates until the IMF conditions are fulfilled. The relevant information in AD-affected EEG is observed in the first four IMFs. Therefore, the features will be extracted from IMF1 to IMF4 of the EEG signals.
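The sifting steps described above can be sketched as follows. This is a minimal Python/NumPy illustration and not the Algorithm 1 of this work: the envelopes are built with linear interpolation (`np.interp`) as a simple stand-in for the cubic splines usually employed in EMD, and the stopping tolerance `tol` and the iteration cap `max_iter` are hypothetical values chosen for illustration. Only the limit of four IMFs is taken from the text.

```python
import numpy as np

def local_extrema(x):
    """Return the indices of the strict local maxima and minima of x."""
    d = np.diff(x)
    maxima = np.where((d[:-1] > 0) & (d[1:] < 0))[0] + 1
    minima = np.where((d[:-1] < 0) & (d[1:] > 0))[0] + 1
    return maxima, minima

def envelope_mean(x):
    """Mean of the upper and lower envelopes, or None if x has too few extrema."""
    maxima, minima = local_extrema(x)
    if len(maxima) < 2 or len(minima) < 2:
        return None
    t = np.arange(len(x))
    upper = np.interp(t, maxima, x[maxima])  # assumption: linear, not spline
    lower = np.interp(t, minima, x[minima])
    return (upper + lower) / 2.0

def sift_imf(x, max_iter=50, tol=0.05):
    """Extract one IMF candidate by repeatedly subtracting the envelope mean."""
    h = x.copy()
    for _ in range(max_iter):
        m = envelope_mean(h)
        if m is None:
            break
        h_new = h - m
        # Relative change between two consecutive sifting passes.
        change = np.sum((h - h_new) ** 2) / (np.sum(h ** 2) + 1e-12)
        h = h_new
        if change < tol:
            break
    return h

def emd(x, n_imfs=4):
    """Decompose x into at most n_imfs IMFs plus a residue."""
    residue = np.asarray(x, dtype=float).copy()
    imfs = []
    for _ in range(n_imfs):
        if envelope_mean(residue) is None:
            break  # residue is (nearly) monotonic: nothing left to sift
        imf = sift_imf(residue)
        imfs.append(imf)
        residue = residue - imf
    return np.array(imfs), residue
```

By construction, the sum of the returned IMFs and the residue reproduces the input signal, which mirrors the decomposition of x(t) into IMFs and re(t).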
To this end, the decomposed EMD coefficients E_n(s) are separated into two subclasses, namely E_n1(u) and E_n2(v), where:
in which (u + v) = s and u, v = 1, 2, …, s. Note that s represents the entire length of the EMD coefficients. Accordingly, the following two subclasses can be defined as a function of the parameter T:
The optimal value of T can be attained by finding out the position of maximum mutual information, which is given by
Note that Eq. (5) highlights the importance of calculating the best threshold. According to the state of the art, several approaches have been designed to find the optimal threshold. The most popular are based on Bayes, Sure and Visu shrink. These techniques are not ideal, as shown in [27,29] through an extensive validation process. Recently, we demonstrated that SE [31,32] can be an interesting candidate to replace the previous traditional methods. However, our latest experiments have also identified some important drawbacks in the SE-based EMD approach. In particular, it fails to detect the main spike and the bursts in EEG. Moreover, TE has outperformed SE in other domains: for instance, in a remote sensing application, TE achieved an accuracy of 96%, whereas SE achieved 93% [33]. Hence the idea of using TE [34] to accurately detect Alzheimer's disease.
This section describes the pre-processing stage of our proposed methodology (see Fig. 1). It aims to eliminate the noise present in the EEG signals. Different kinds of noise can be introduced during the acquisition process, mainly by hardware devices (i.e., electronic noise). This denoising stage is crucial, as it increases the capability of detecting Alzheimer's disease at an early stage, thereby significantly helping doctors. Our pre-processing phase uses TE to determine the optimal threshold value, which is the key of our proposal. Note that TE is a generalization of SE.
The generalized form of SE is presented as follows:
where,
Thus, the non-extensive behavior of TE is given by:
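For reference, the standard textbook forms of these quantities are given below. They are assumed, not confirmed, to correspond to the expressions referred to in the text. Here p_i denotes the probability of the i-th state, q is the entropic index, and A and B are two statistically independent subsystems (these symbols are introduced for illustration only).

\[ S = -\sum_{i} p_i \ln p_i \qquad \text{(Shannon entropy)} \]
\[ S_q = \frac{1 - \sum_{i} p_i^{\,q}}{q - 1}, \qquad \lim_{q \to 1} S_q = S \qquad \text{(Tsallis entropy)} \]
\[ S_q(A + B) = S_q(A) + S_q(B) + (1 - q)\, S_q(A)\, S_q(B) \qquad \text{(non-extensive, pseudo-additive rule)} \]

For q = 1 the cross term vanishes and the usual additivity of SE is recovered, which is why TE is regarded as a generalization of SE.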
Although higher PSNR values are obtained through the application of the SE thresholding method, it should not be employed on signals with long-range interactions. TE is a more recent entropy approach that solves the main problems found in SE. The optimal TE value is represented as a unique parameter
Then, the definition of the mutual information MI_n is taken into account as follows:
As a final remark, the denoised signal is obtained by summing all denoised IMFs and the last residue k4. Relevant information is then captured from this recovered signal.
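A possible realization of the threshold selection and reconstruction is sketched below in Python/NumPy. It is an illustration under stated assumptions rather than the exact procedure of this work: the threshold is chosen by maximizing the pseudo-additive Tsallis entropy of the two subclasses over a histogram of coefficient magnitudes (a criterion borrowed from Tsallis-based image thresholding), coefficients below the threshold are set to zero (hard thresholding), and the entropic index `q=0.8` and the number of histogram bins are hypothetical values.

```python
import numpy as np

def tsallis_entropy(p, q):
    """Tsallis entropy of a probability vector p (Shannon limit when q = 1)."""
    p = p[p > 0]
    if abs(q - 1.0) < 1e-12:
        return float(-np.sum(p * np.log(p)))
    return float((1.0 - np.sum(p ** q)) / (q - 1.0))

def tsallis_threshold(coeffs, q=0.8, bins=64):
    """Pick the magnitude T that maximizes the pseudo-additive Tsallis entropy
    of the two subclasses (below and above T). Assumed criterion."""
    mags = np.abs(coeffs)
    hist, edges = np.histogram(mags, bins=bins)
    p = hist / hist.sum()
    best_t, best_score = edges[1], -np.inf
    for k in range(1, bins):
        pa, pb = p[:k].sum(), p[k:].sum()
        if pa == 0 or pb == 0:
            continue  # one subclass is empty: not a valid split
        sa = tsallis_entropy(p[:k] / pa, q)
        sb = tsallis_entropy(p[k:] / pb, q)
        score = sa + sb + (1.0 - q) * sa * sb
        if score > best_score:
            best_score, best_t = score, edges[k]
    return best_t

def denoise(imfs, residue, q=0.8):
    """Hard-threshold every IMF and sum the result with the residue."""
    cleaned = []
    for imf in imfs:
        t = tsallis_threshold(imf, q)
        cleaned.append(np.where(np.abs(imf) > t, imf, 0.0))
    return np.sum(cleaned, axis=0) + residue
```

Soft thresholding, or a different value of q, could be substituted without changing the overall structure: one threshold per IMF, followed by aggregation with the residue.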
A neural network (NN) is an information processing paradigm inspired by the way biological nervous systems process information. Its key element is the architecture of the data processing scheme: an NN consists of a large number of interconnected processing elements (neurons) working in unison to solve specific problems. This section presents the post-processing phase. First, the denoised EEG signal is prepared for feature extraction. An NN is then trained using the resulting feature vectors. Finally, the classifier output defines the class (AD-affected or normal brain) of the provided inputs. The better the denoising stage performs, the better the features are and, consequently, the higher the classification rate. Hence, the impact of our proposed EMD denoising algorithm is verified through the application of an NN.
Feature extraction
Five features are computed over each IMF obtained after decomposing an EEG signal with the EMD algorithm. These statistical parameters, used as a single input vector to the classifier, constitute the feature extraction stage of the AD detection methodology. The statistics employed in this research are the mean, the standard deviation, the entropy, the sample entropy and the power spectral density (PSD). Our experiments show that these features contain significant information about the recovered data. Thus, the extracted features are fed to a feed-forward neural network [37].
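The five statistics can be computed per IMF as in the following Python/NumPy sketch. Several details are assumptions made for illustration, since the text does not specify them: the entropy is estimated from a 32-bin histogram, the sample entropy uses the common defaults m = 2 and r = 0.2 times the standard deviation, and the PSD is reduced to a single number by averaging a periodogram. The sampling frequency of 128 Hz is the one reported for the synthetic data.

```python
import numpy as np

def shannon_entropy(x, bins=32):
    """Histogram-based Shannon entropy in bits (bin count is an assumption)."""
    hist, _ = np.histogram(x, bins=bins)
    p = hist[hist > 0] / hist.sum()
    return float(-np.sum(p * np.log2(p)))

def sample_entropy(x, m=2, r_factor=0.2):
    """Sample entropy with Chebyshev distance and tolerance r = r_factor * std."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    r = r_factor * np.std(x)

    def count_matches(length):
        # The same n - m template start points are used for both lengths.
        templates = np.array([x[i:i + length] for i in range(n - m)])
        count = 0
        for i in range(len(templates) - 1):
            dist = np.max(np.abs(templates[i + 1:] - templates[i]), axis=1)
            count += int(np.sum(dist <= r))
        return count

    b = count_matches(m)
    a = count_matches(m + 1)
    if a == 0 or b == 0:
        return float("inf")
    return float(-np.log(a / b))

def mean_psd(x, fs=128.0):
    """Average of a simple periodogram estimate (scalar summary is an assumption)."""
    x = np.asarray(x, dtype=float)
    spectrum = np.abs(np.fft.rfft(x)) ** 2
    return float(np.mean(spectrum) / (fs * len(x)))

def imf_features(imf, fs=128.0):
    """Feature vector: mean, standard deviation, entropy, sample entropy, PSD."""
    return np.array([
        np.mean(imf),
        np.std(imf),
        shannon_entropy(imf),
        sample_entropy(imf),
        mean_psd(imf, fs),
    ])
```

Calling `imf_features` on each of IMF1 to IMF4 yields one five-element vector per IMF, matching the five input neurons of the classifier.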
Supervised classification
The classifier consists of an input layer with 5 neurons, a hidden layer with 2 neurons, and an output layer with a single neuron, with each connection carrying its own weight. It is trained with the previous features and therefore uses a supervised learning approach. The training stage has been performed using gradient descent back-propagation with an adaptive learning rate. A log-sigmoid transfer function has been used as the activation function of this network. The network is trained for up to 1000 epochs, and the iteration stops earlier if the error falls to the maximum error allowed (0.0001). Finally, the output phase consists of two classes: normal and AD-affected EEG. Here, the target vector is set to [1 0], where ‘1’ represents the normal EEG and ‘0’ the AD class. Based on the training stage, each input is assigned to one of the two classes.
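The network described above can be sketched in plain Python/NumPy as follows. The 5-2-1 layout, the log-sigmoid activation, the limit of 1000 epochs and the error goal of 0.0001 come from the text. The weight initialization, the initial learning rate and the adaptation factors (increase by 1.05, decrease by 0.7, reject a step if the error grows by more than 4%) are illustrative assumptions and are not claimed to be the settings used in the experiments.

```python
import numpy as np

def logsig(z):
    """Log-sigmoid transfer function."""
    return 1.0 / (1.0 + np.exp(-z))

def init_params(rng, n_in=5, n_hidden=2):
    return {
        "W1": rng.normal(scale=0.5, size=(n_in, n_hidden)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(scale=0.5, size=(n_hidden, 1)),
        "b2": np.zeros(1),
    }

def forward(params, X):
    hidden = logsig(X @ params["W1"] + params["b1"])
    output = logsig(hidden @ params["W2"] + params["b2"])
    return hidden, output.ravel()

def mse(params, X, y):
    return float(np.mean((forward(params, X)[1] - y) ** 2))

def gradients(params, X, y):
    """Back-propagation of the mean squared error."""
    hidden, out = forward(params, X)
    delta_out = ((out - y) * out * (1.0 - out) * (2.0 / len(y)))[:, None]
    delta_hid = (delta_out @ params["W2"].T) * hidden * (1.0 - hidden)
    return {
        "W1": X.T @ delta_hid, "b1": delta_hid.sum(axis=0),
        "W2": hidden.T @ delta_out, "b2": delta_out.sum(axis=0),
    }

def train(X, y, epochs=1000, goal=1e-4, lr=0.1, seed=0):
    """Gradient descent with an adaptive learning rate (assumed factors)."""
    params = init_params(np.random.default_rng(seed), X.shape[1])
    error = mse(params, X, y)
    for _ in range(epochs):
        if error <= goal:
            break  # maximum allowed error reached
        grads = gradients(params, X, y)
        trial = {k: params[k] - lr * grads[k] for k in params}
        trial_error = mse(trial, X, y)
        if trial_error > 1.04 * error:
            lr *= 0.7  # reject the step and shrink the learning rate
        else:
            if trial_error < error:
                lr *= 1.05  # reward an improving step
            params, error = trial, trial_error
    return params, error

def predict(params, X):
    """Return 1 (normal EEG) or 0 (AD class) for each feature vector."""
    return (forward(params, X)[1] >= 0.5).astype(int)
```

In use, `X` would hold one five-element feature vector per training signal and `y` the corresponding targets, with 1 for normal EEG and 0 for the AD class.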
Results and discussion
In this part, the database utilized to test our proposed methodology is presented. Furthermore, an extensive comparison with other traditional methods such as Bayes, Sure and Normal shrink is performed. Peak signal to noise ratio (PSNR) and root mean square error (RMSE) are the performance measures evaluated.
Database and programming environment
The dataset utilized in this research is not publicly available, since it was collected through a Webinar on Grid EEG conducted by the DECIDE (Diagnostic Enhancement of Confidence by an International Distributed Environment) Science Gateway in 2012. DECIDE was an FP7-funded project aimed at implementing e-Infrastructure and e-Services for the extraction of Alzheimer’s markers from MRI and PET/SPECT images, and EEG. EEG recordings from AD-affected and normal brains of four patients at different time scales are taken as the experimental data. The data are stored in standard Neuroscan ASCII format, i.e., a .txt file of numbers, where the columns indicate electrodes and the rows stand for consecutive time samples. Electrodes are sorted in the ASCII file as F3, F4, T3, T4, C3, C4, P3, P4, FZ, CZ, PZ, Fp1, Fp2, F7, F8, T5, T6, O1, O2. The database contains two classes of data. The first class consists of 24 Alzheimer’s EEG signals: the first 12 are Alzheimer’s EEG signals from the dataset and the next 12 are synthetic Alzheimer’s EEG signals. Synthetic data are generated at a sampling frequency of 128 Hz from a white Gaussian signal with a peak value taken from the original model (dataset). Likewise, the second class consists of 12 normal EEG signals from the dataset and 12 synthetic EEG signals created from the normal EEG models.
MATLAB (version R2012a, 7.14.0.739) was used to implement the aforementioned methodology. The experiments were run on a standard-performance system with an Intel Core i3-2328M CPU at 2.20 GHz, 4.0 GB of RAM and the Windows 7 Ultimate (64-bit) operating system.
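Since the dataset itself is not public, the following Python/NumPy sketch only illustrates how a file in the described layout could be read. It assumes whitespace-separated numeric columns with no header line; the function name `load_eeg` is hypothetical.

```python
import io
import numpy as np

# Electrode order as listed for the ASCII files.
ELECTRODES = ["F3", "F4", "T3", "T4", "C3", "C4", "P3", "P4", "FZ", "CZ",
              "PZ", "Fp1", "Fp2", "F7", "F8", "T5", "T6", "O1", "O2"]

def load_eeg(source):
    """Read a text file (path or file-like object) whose columns are
    electrodes and whose rows are consecutive time samples."""
    data = np.loadtxt(source, ndmin=2)
    if data.shape[1] != len(ELECTRODES):
        raise ValueError(
            f"expected {len(ELECTRODES)} columns, found {data.shape[1]}")
    return {name: data[:, i] for i, name in enumerate(ELECTRODES)}

# Usage with an in-memory stand-in for a .txt file (three time samples).
demo = "\n".join(
    " ".join(str(row * 19 + col) for col in range(19)) for row in range(3))
channels = load_eeg(io.StringIO(demo))
```

Each entry of `channels` is then a one-dimensional signal that can be passed to the decomposition stage.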
Experimentation
This section compares our technique with other thresholding schemes such as Bayes, Sure and Normal shrink. The whole evaluation is carried out by means of the PSNR and RMSE measures. The former is the ratio between the maximum possible power of a signal and the power of the corrupting noise that affects the fidelity of its representation. The latter describes the differences between the values predicted by a model or an estimator (the denoised EEG signal) and the values actually observed (the original EEG signal). The performance of our methodology is compared with SE and other existing methods (see the PSNR and RMSE plots in Figs 2–9). Higher PSNR and lower RMSE values indicate that EMD with TE thresholding provides substantially better performance than other state-of-the-art algorithms. For AD-affected brains, a higher PSNR of 27.6837 and a lower RMSE of 0.2045 are obtained. Furthermore, for normal brains, a PSNR of 16.5435 and an RMSE of 0.6382 are calculated. By comparison, the other existing methods reach a highest PSNR of only 6.231.
The comparison between the Shannon and Tsallis entropies for normal and AD-affected patients is shown in Figs 6–9. Regarding the AD-affected cases, TE performs clearly better than SE, which yields a lower PSNR of 9.4205 and a higher RMSE of 1.15. For normal brains, SE gives a PSNR of 8.623 and an RMSE of 1.72. Notice that SE gives slightly lower rates than TE in terms of the calculated measures. The extracted features (mean, standard deviation, entropy, sample entropy and PSD) are also compared with some conventional methods for both AD-affected and normal brains. The obtained results are presented through two bar diagrams (see Figs 10 and 11).
AD-affected EEG signals show significantly lower feature values than normal EEG due to their lower complexity. The mean and standard deviation of AD-affected brains are typically lower than those of normal brains. Likewise, the PSD of AD-affected signals shows smaller values than that of normal EEGs because of this reduced complexity.
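The two measures can be computed as in the Python/NumPy sketch below. The RMSE follows its usual definition. For the PSNR, the peak is taken as the maximum absolute amplitude of the original signal and the result is expressed in decibels; both choices are assumptions, since the text does not state the exact convention used.

```python
import numpy as np

def rmse(original, denoised):
    """Root mean square error between the original and denoised signals."""
    original = np.asarray(original, dtype=float)
    denoised = np.asarray(denoised, dtype=float)
    return float(np.sqrt(np.mean((original - denoised) ** 2)))

def psnr(original, denoised):
    """Peak signal-to-noise ratio in dB (peak = max |original|, assumed)."""
    err = rmse(original, denoised)
    if err == 0:
        return float("inf")  # identical signals
    peak = np.max(np.abs(np.asarray(original, dtype=float)))
    return float(20.0 * np.log10(peak / err))

# Usage: a unit-amplitude signal with a constant error of 0.1.
reference = np.array([0.0, 1.0, 0.0, -1.0])
estimate = reference + 0.1
```

A better denoiser drives the RMSE toward zero and the PSNR upward, which is the direction of improvement reported for TE thresholding.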
The classification of both AD-affected and normal EEG signals was performed by a traditional neural network. In the learning stage, the feature vectors containing the mean, the standard deviation, the entropy, the sample entropy and the PSD of 24 signals (half of the normal type and half of the AD type) were the input of the first layer. Subsequently, the NN was trained. During the prediction stage, another 20 signals were chosen and the efficiency of the whole classification was evaluated.
The classification accuracy was assessed through the confusion matrix (see Fig. 12), which provides the true and false positive rates for all methods (i.e., our approach and the Bayes shrink, Sure shrink, Normal shrink and Shannon entropy methods). Specifically, the first column of the first row gives the correctly classified class AD (‘0’) and the second column shows the misclassified class normal (‘1’). In the same way, the first column of the second row shows the misclassified class AD-affected and the second column provides the correctly classified normal EEG. With our method, 6 of the signals corresponding to normal brains were correctly classified and none was misclassified as AD-affected. Similarly, 10 AD signals were correctly classified, giving an accuracy rate of 80%, as shown in Fig. 12. Compared with the Bayes shrink methodology, which achieves 65.0%, our approach greatly improves the final classification. Normal shrink, in turn, provides a classification rate of about 60.0%, owing to its ability to correctly detect normal brains; however, a total of 3 AD EEG signals were misclassified. Moreover, there is a large drop in accuracy for the Visu shrink approach, which achieved only 40.0% because of its low prediction rate on AD-affected EEG signals. Finally, the worst performance corresponds to the signals before denoising, with a classification rate of about 35.0%. In that case, only 3 out of 20 EEG signals were correctly classified as AD, while 6 were incorrectly predicted. Even worse, only 4 normal signals were correctly classified.
Regarding the sensitivity and specificity measures (see Table 1), it is important to highlight that the former is not meaningful for Visu shrink or when no denoising methodology is applied, as the results in both cases are close to 0. In contrast, Bayes shrink, Shannon entropy and Tsallis entropy show the highest sensitivities, 0.714, 0.75 and 0.714, respectively. The specificity reaches its best value with our method (1.0), while the other approaches give lower rates.
Our proposed methodology yields the highest accuracy among the compared methods according to Table 1. This research shows several advantages. Firstly, an analysis of the EEG in the EMD domain has been performed, showing an improvement in the diagnosis rate of AD. Secondly, a traditional, low-complexity NN classification process has been used to boost the final classification rate without increasing the computational time. Finally, noise reduction has been achieved via optimal Tsallis entropy thresholding.
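The reported figures for our method can be cross-checked from the confusion matrix counts, as in the short Python sketch below. AD is treated as the positive class. The count of 4 misclassified AD signals is not stated explicitly; it is inferred here from the 20 test signals, of which 10 AD and 6 normal signals were correctly classified and no normal signal was misclassified.

```python
def classification_metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity and specificity from confusion matrix counts."""
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
    }

# Counts for the TE-based method: 10 AD correct, 4 AD missed (inferred),
# 6 normal correct, 0 normal misclassified as AD.
metrics = classification_metrics(tp=10, fn=4, tn=6, fp=0)
```

These counts give an accuracy of 16/20 = 0.80, a sensitivity of 10/14 ≈ 0.714 and a specificity of 6/6 = 1.0, in agreement with the values quoted for Tsallis entropy.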
Conclusion
Alzheimer’s disease is the most common form of dementia, a general term for memory loss and the decline of other intellectual skills serious enough to impede daily life tasks. Alzheimer’s currently accounts for 60 to 80 percent of dementia cases. In this work, we propose a denoising scheme based on empirical mode decomposition and Tsallis entropy to be used for the early diagnosis of Alzheimer's disease. Our research emphasizes the importance of multiscale analysis of complex electroencephalograms by means of empirical mode decomposition. Furthermore, we propose Tsallis entropy to determine the optimal threshold, which clearly overcomes the issues of Shannon entropy. The final results computed through our proposed approach are promising. A higher peak signal-to-noise ratio of 27.6837 and a lower root mean square error of 0.2045 are obtained compared to other state-of-the-art thresholding methods. The classification rate is further improved thanks to Tsallis entropy based thresholding. In particular, the performance is enhanced from 70% to 80% through this method. Therefore, our methodology has the potential to support earlier diagnosis and thereby help reduce the death rate of the population suffering from Alzheimer’s disease. Future work can be directed towards the use of learning-based features computed through a deep architecture. In addition, a much larger database needs to be collected in order to evaluate the robustness of our method more widely.
