Abstract
BACKGROUND:
A neurological disorder is one of the significant problems of the nervous system that affects the essential functions of the human brain and spinal cord. Monitoring brain activity through electroencephalography (EEG) has become an important tool in the diagnosis of brain disorders. The robust automatic classification of EEG signals is an important step towards detecting a brain disorder in its earlier stages before status deterioration.
OBJECTIVE:
Motivated by the computation capabilities of natural evolution strategies (NES), this paper introduces an effective automatic classification approach denoted as natural evolution optimization-based deep learning (NEODL). The proposed classifier is an ingredient in a signal processing chain that comprises other state-of-the-art techniques in a consistent framework for the purpose of automatic EEG classification.
METHODS:
The proposed framework consists of four steps. First, the L1-principal component analysis technique is used to enhance the raw EEG signal against any expected artifacts or noise. Second, the purified EEG signal is decomposed into a number of sub-bands by applying the wavelet transform technique where a number of spectral and statistical features are extracted. Third, the extracted features are examined using the artificial bee colony approach in order to optimally select the best features. Lastly, the selected features are treated using the proposed NEODL classifier, where the input signal is classified according to the problem at hand.
RESULTS:
The proposed approach is evaluated using two benchmark datasets and addresses two neurological disorder applications: epilepsy disease and motor imagery. Several experiments are conducted where the proposed classifier outperforms other deep learning techniques as well as other existing approaches.
CONCLUSION:
The proposed framework, including the proposed classifier (NEODL), has a promising performance in the classification of EEG signals, including epilepsy disease and motor imagery. Based on the given results, it is expected that this approach will also be useful for the identification of the epileptogenic areas in the human brain. Accordingly, it may find application in the neuro-intensive care units, epilepsy monitoring units, and practical brain-computer interface systems in clinics.
Keywords
Introduction
Neurological disorders are a common set of ailments that represent a major public health problem and affect the functional and electrical activities of not only the human brain but also the spinal cord. According to reports, there are over 1000 diseases of the nervous system, such as epilepsy, dementia, and Parkinson’s disease, with symptoms such as seizures, confusion, and paralysis [1]. A report published by the World Health Organization (WHO) points out that 50 million people have epilepsy worldwide, a number that increases by 2.4 million every year [2]. Additionally, the report projected that the number of people affected by dementia will double every 20 years. As per another report published in the US in 2011, approximately 100 million Americans were afflicted by at least one of these neurological diseases [3].
These brain disorders are analyzed using various techniques, one of which is the electroencephalogram (EEG). EEGs are time-series recordings of the electrical activity of the human brain with high temporal resolution [4]. EEG monitoring is typically simple: electrodes are placed along the human scalp to identify abnormalities in brain activity [5]. An important characteristic of EEGs is their convenient acquisition and high temporal resolution on a millisecond scale, which makes it possible to detect rapid changes in brain activity [5]. Furthermore, the EEG recording process is non-invasive and relatively cheap in comparison to other medical imaging modalities [4,6].
Traditionally, the interpretation of EEG recordings is performed by a clinician through visual inspection, which is tedious and time-consuming, especially in the case of long-term recordings. As such, the visual inspection of these massive data records by clinicians is an onerous task. In this regard, machine learning based brain-computer interface (BCI) systems premised on EEG signals are in high demand. A great number of traditional machine learning approaches have been applied in developing BCI systems and medical imaging applications, including artificial neural networks (ANNs) [7,8]. The capabilities of ANNs led to their early adoption in the analysis of medical imaging; however, this analysis usually entails huge datasets that cannot be easily interpreted by traditional ANN methods, which paves the way for deep neural networks.
In the existing literature on deep learning, the dominant method for training neural networks is the back-propagation (BP) algorithm [9], which, when combined with the stochastic gradient descent (SGD) approach, modifies each neural network weight so as to iteratively decrease the loss function [10,11]. For many years, this methodology has proven to be remarkably effective for all kinds of learning tasks in various applications, albeit with major shortcomings such as slow convergence and possible trapping in a non-optimal local minimum [12]. The situation worsens as the depth of the neural network increases, i.e. as more hidden layers are added, which gives rise to the vanishing gradient problem, as is the case in the convolutional neural network (CNN) [13] or recurrent neural network (RNN) [14].
Natural evolution strategies (NES) refer to a family of black-box optimization algorithms that can train traditional ANNs; they are evolutionary algorithms that utilize the natural gradient to update a parameterized search distribution in the direction of higher expected fitness [15,16]. This capability is based on a notion established two decades ago, namely that natural gradients work better than their ordinary counterparts in training ANNs through the BP algorithm [17,18]. Unlike ordinary gradient descent-based optimizers, NES does not impose any major restrictions on the underlying cost function (e.g. differentiability). This is attributed to the fact that the natural gradient takes into consideration the geometry of the manifold in which the weights of neurons evolve.
Motivated by the success of NES in training traditional ANNs, we developed a deep learning algorithm that comprises several layers: input, convolution, rectified linear unit (ReLU), pooling, fully connected, and output layers [19]. All of these layers play the same role as in the standard CNN [20], with the exception of the ReLU layer, in which we realize the ReLU activation function using natural evolution gradients. We therefore refer to the proposed approach as natural evolution optimization-based deep learning (NEODL) [21]. In this paper, NEODL forms part of a complete pattern recognition approach, in which it performs the classification step for EEG signals. The proposed approach comprises four steps: preprocessing, feature extraction, feature selection, and feature classification, using L1-norm principal component analysis, wavelet transform, artificial bee colony (ABC), and NEODL, respectively.
The proposed approach achieves promising performance on two public EEG datasets, namely the Bern-Barcelona dataset [22] and the BNCI Horizon 2020 dataset [23]. In this regard, several experiments were conducted using different deep learning methods. In addition, we conducted an experiment using the standard CNN with an ordinary ReLU and compared it with our model, which uses the natural gradient ReLU, in order to observe the impact of using the natural evolution gradient in the deep architecture. On both datasets, the NEODL classifier was found to outperform the other deep classifiers on five different performance measures under unified experimental conditions. Moreover, the overall proposed approach compares favorably with existing approaches that use both datasets and have been reported in the literature. The main contributions of this paper can be summarized as follows:
First, we developed a natural evolution optimization-based deep learning classifier. Second, in simulations against counterpart classifiers, the proposed classifier attained a maximum neurological disorder classification accuracy of 98.9%. Third, the hybrid approach combining ABC and NEODL outperforms the reported approaches, which suggests that it could be employed in hospitals for the automatic detection of neurological disorders.
The rest of the paper is organized as follows. Section 2 presents an overview of the recent achievements of EEG signals classification. The steps of the proposed approach and the employed EEG datasets are outlined in Section 3. Section 4 details the experimental settings and results, and the discussion and analysis are given in Section 5. Finally, Section 6 summarizes the work.
Related work
In this section, we review the recent achievements in epilepsy detection and motor imagery classification using both datasets employed in this paper. With regard to epilepsy detection using the Bern-Barcelona dataset [22], Das et al. in 2016 [24] proposed a hybrid method that combines empirical mode decomposition and discrete wavelet transform for EEG feature extraction and classification, respectively, with an 89.4% classification accuracy. In 2017, Chatterjee et al. [25] presented a non-linear analysis of focal and non-focal EEG signals using multifractal detrended fluctuation analysis (MFDFA). Their study extracted a set of four features from the MFDFA, which were then classified using both SVM and k-NN classifiers, with accuracies of 92.2% and 91.7%, respectively. Similarly, Sharma et al. [26] proposed an automated system that decomposed the EEG signal using the tunable-Q wavelet transform (TQWT) procedure. This system attained a 95% classification accuracy using the least squares-SVM (LS-SVM) classifier. Furthermore, Arunkumar et al. [27] proposed a system that extracted three entropy-based features and fed them into six different classifiers. As per their findings, the non-nested generalized exemplars classifier attained the highest accuracy of 98%, with a sensitivity of 100%, which is the highest sensitivity achieved in the literature so far.
Recently, in 2018, Bhattacharyya et al. [28] presented an automatic detection approach that could determine the area linked to focal epilepsy. The area parameters were classified using LS-SVM to identify both focal and non-focal classes. They achieved classification accuracies of 90% and 82.5% using 50 and 750 pairs of focal and non-focal EEG signals, respectively. Meanwhile, Raghu et al. [29] introduced a computerized system that extracts a set of 28 features from the time, frequency, and statistical domains, wherein the best features were selected using neighborhood component analysis. Employing only seven features, the algorithm was assessed using an SVM, which attained the highest accuracy of 96.1%. In 2019, Prasanna et al. [30] proposed a detection system that implemented the local binary pattern (LBP) method for the feature extraction step and an ANN for the classification step, with an accuracy of 93.2%. Fasil et al. [31] extracted the exponential energy feature, which attained a classification accuracy of 89% using SVM.
We next review the related work on motor imagery (MI) classification using the BNCI Horizon 2020 dataset [23], which was utilized in our experiments. Recently, this dataset has attracted the attention of researchers. For example, in 2017, Liu et al. [32] presented a method that combines learning automata and the firefly algorithm for the selection of the best features, in addition to spectral regression discriminant analysis for feature classification. In 2018, Arnin et al. [33] proposed a study concerned with the real-time processing of EEG signals for BCI systems, wherein a set of feature extraction methods was evaluated.
In the deep learning literature, it has been demonstrated that CNNs and RNNs outperform other deep models [34]. Several CNN and RNN architectures have been developed that differ, for example, in the number of convolution and output layers, the kind of layer connectivity, the type of activation functions, and the input formulation. Although some research has shown high accuracies, we noticed that these deep approaches were assessed on old datasets in which the number of EEG instances is small, thus reducing their practicality as clinical applications. For example, the Bonn dataset [35] includes only 500 instances, which is not enough to train and test a deep learning model [36,37]. Moreover, all the deep approaches employed in previous works use activation functions based on ordinary gradients [34]. Our proposed approach therefore combines the powerful computation of natural evolution strategies (NES) with deep neural networks to develop a novel pattern recognition approach for EEG signal classification. It is worth noting that we conducted a comparison between the proposed approach and the aforementioned related works.
Materials and methods
This section describes the main steps of the proposed approach, as depicted graphically in Fig. 1, followed by a description of the two datasets employed in our experiments.

Fig. 1. The overall architecture of the proposed approach for EEG classification.
EEG signals are recorded through various continuous examination processes, as a result of which they may suffer from poor spatial resolution, a low signal-to-noise ratio, and various artifacts [38]. In our experiments, we enhanced the raw EEG signals by using the L1-norm principal component analysis (L1-PCA) method. L1-PCA is a projection approach for multivariate data analysis. Arguably, it is the most straightforward modification of the original PCA method [39]. Both the original PCA and its L1 variant seek a number of orthogonal axes; the original PCA measures the data representation by the aggregate L2-norm of the projections of the data samples onto the new space, whereas L1-PCA uses the aggregate L1-norm of those projections [39].
Both approaches have been used extensively in the existing literature for dimensionality reduction and data compression. In terms of EEG signals, many research papers have used the original PCA method to remove noise and artifacts; see for example [40]. However, the original PCA is known to be easily affected by outliers, because its L2-norm formulation places squared emphasis on the magnitude of each data point coordinate and therefore over-emphasizes marginal data points such as outliers. In contrast, L1-PCA places linear emphasis on each data point coordinate, thus effectively restraining the outliers [39]. Accordingly, this paper employs L1-PCA to remove the noise and artifacts from EEG signals as a preprocessing step.
In a mathematical representation, X is the matrix of an EEG signal, given as follows:
It is clear from Eqs (2) and (3) that the L1-norm retains the sum of the absolute data entries of its argument, whereas the L2-norm retains the sum of the squared data entries of its argument. Accordingly, the L1-PCA of each EEG signal is computed, which involves a nuclear-norm maximization problem that must be solved before the outlier information is removed [39]. The problem is then similar to the original PCA and can be solved using the matrix Q containing the K dominant singular vectors of X. The outlier data points are removed using the singular value decomposition (SVD) approach [41]. Subsequently, the maximization problem of Eq. (3) can be defined as follows:
It is demonstrated that if BNM denotes the actual solution to the binary nuclear-norm maximization problem of original PCA [39], then the actual solution to the L1-PCA problem described in Eq. (3) is:
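For a single component (K = 1), a well-known result in the L1-PCA literature reduces the problem to a search over sign vectors: the optimal direction is q = X b / ||X b||_2, where b in {+1, -1}^N maximizes ||X b||_2. The following Python sketch illustrates this with an exhaustive search, which is feasible only for very small N. The function names and the toy data are illustrative assumptions and do not reproduce the implementation used in the experiments.

```python
import itertools
import math


def l1_pca_first_component(samples):
    """Exact first L1 principal component by exhaustive search (K = 1).

    samples: list of N data points, each a list of D floats (the columns of X).
    Returns a unit-norm vector q that maximizes sum_n |x_n . q|.
    """
    n = len(samples)
    dim = len(samples[0])
    best_norm, best_vec = -1.0, None
    # Fix the first sign to +1: b and -b give the same axis up to sign.
    for tail in itertools.product((1.0, -1.0), repeat=n - 1):
        signs = (1.0,) + tail
        # X b = sum_n b_n * x_n
        xb = [sum(s * x[d] for s, x in zip(signs, samples)) for d in range(dim)]
        norm = math.sqrt(sum(v * v for v in xb))
        if norm > best_norm:
            best_norm, best_vec = norm, xb
    return [v / best_norm for v in best_vec]


def l1_objective(samples, q):
    """Sum of the absolute projections of the samples onto q."""
    return sum(abs(sum(xd * qd for xd, qd in zip(x, q))) for x in samples)


# Hypothetical 2-D toy data lying near the first axis, plus one outlier.
toy = [[2.0, 0.1], [-1.5, -0.2], [1.0, 0.0], [-2.5, 0.1], [0.2, 6.0]]
q_l1 = l1_pca_first_component(toy)
```

The exhaustive search grows as 2^(N-1), so practical L1-PCA solvers rely on more efficient exact or approximate algorithms; the sketch is only meant to make the sign-vector formulation concrete.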
As mentioned before, the EEG signals are non-stationary and subject to frequency alteration with time [4]. The second step, feature extraction, requires decomposing the EEG signal into a number of frequency bands. This is achieved using the wavelet transform, which is a powerful tool for the time-frequency representation of time series signals, such as medical signals [42]. It has the ability to capture appropriate frequency information at low frequencies as well as appropriate time information at high frequencies, a capability that is very useful for the analysis of EEG signals [42]. Furthermore, it can be used to localize the transient variations of the EEG signal, which are mostly present during seizure onsets [43].
In our experiments, the wavelet transform is carried out using the maximal overlap discrete wavelet transform (MODWT) [44], where the fourth order Daubechies (Db4) wavelet filter was used [45]. Daubechies wavelets are essentially wavelets with a compact support, which provides the approximation properties of wavelet expansions. They lack explicit expressions of their own and are set by the coefficients of the filter itself. For this reason, they can represent high-frequency and low-frequency coefficients of the filter. The MODWT decomposes the EEG signal into five frequency bands, namely delta (0.5–4 Hz), theta (4–8 Hz), alpha (8–13 Hz), beta (13–30 Hz), and gamma (30–60 Hz). Since each sub-band is involved in a specific brain function, we are able to track any deviation present in the brain functions and activities. Once the EEG signal is decomposed into sub-band frequencies, the four statistical features included in Table 1 are computed and extracted from each sub-band of each subject’s signal in the employed dataset. Thereafter, the extracted features are fed into the feature selection step in order to select the optimized features for classification.
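To relate wavelet decomposition levels to the sub-bands listed above, it helps to recall that the detail coefficients at level j of a dyadic wavelet transform nominally cover the octave from fs / 2^(j+1) to fs / 2^j, where fs is the sampling frequency. The short Python sketch below tabulates these nominal bands, assuming fs = 512 Hz (the sampling frequency of dataset I). The function name is hypothetical, and the mapping of levels to the clinical bands is only approximate; the exact mapping used in the experiments is not specified here.

```python
def dyadic_detail_bands(sampling_rate_hz, levels):
    """Nominal pass-band (low, high) in Hz of the detail coefficients at each level."""
    bands = {}
    for level in range(1, levels + 1):
        high = sampling_rate_hz / 2 ** level
        low = sampling_rate_hz / 2 ** (level + 1)
        bands[level] = (low, high)
    return bands


# Assumed sampling frequency of 512 Hz and eight decomposition levels.
bands = dyadic_detail_bands(512, 8)
for level, (low, high) in bands.items():
    print(f"level {level}: {low:g}-{high:g} Hz")
```

Under this assumption, levels 3 to 6 cover 32–64 Hz, 16–32 Hz, 8–16 Hz, and 4–8 Hz, which roughly correspond to the gamma, beta, alpha, and theta bands, while the lower octaves together approximate the delta band.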
Feature selection
If we extract four features from each of the five sub-bands decomposed from each EEG signal, over all the EEG signals of each patient, the total number of features is large, thus posing a challenge for any classifier. Therefore, the main objective of the third step is to select, from this large number of features, a subset that is most discriminative for classification. In the proposed approach, the best features are optimally selected using the artificial bee colony (ABC) algorithm, which simulates the foraging behavior inside a bee colony [46].
Table 1. Statistical features that are extracted from each sub-band
The bee colony comprises three groups of bees: employed bees, onlookers, and scouts. The groups update the information on food sources over time. The objective of every bee is to identify the locations of food sources rich in nectar until the source with the highest nectar is found. The ABC optimization algorithm simulates this behavior by combining local search approaches, carried out by onlooker and employed bees, with global search approaches conducted by onlookers and scouts, in an attempt to balance the exploitation and exploration processes [46]. In such a simulation, the location of a food source represents a solution of the optimization problem, while the amount of nectar of a food source denotes the quality (or fitness) of that solution. The mission of the bees to find the best food sources is thus analogous to seeking an optimal solution. During the search process, the number of onlookers and the number of employed bees are equal. This means that each food source corresponds to one possible solution (a set of optimized features).
In a mathematical representation, the feature space is represented as follows:
After generating the position value of neighboring food information, the fitness value is computed to select the best feature by applying a greedy property. If the fitness value of V_i is found to be higher than that of the parent X_i, then update the parent with the newly generated solution V_i; otherwise keep the parent X_i unchanged. The fitness value is used in a probability based selection via a roulette wheel selection process that can be represented as follows:
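In the standard ABC formulation, a neighboring solution is generated as v_ij = x_ij + phi * (x_ij - x_kj), with phi drawn uniformly from [-1, 1] and k a randomly chosen partner different from i, and an onlooker selects source i with probability p_i = fit_i / sum_n fit_n. The Python sketch below illustrates the neighbor generation, the greedy update, and the roulette-wheel probabilities on a continuous toy problem. The fitness function, the population size, and all function names are assumptions made for illustration; the feature-selection encoding used in the experiments is not reproduced.

```python
import random


def neighbor(solutions, i, rng):
    """Candidate v_i: perturb one coordinate of x_i relative to a random partner x_k."""
    k = rng.choice([idx for idx in range(len(solutions)) if idx != i])
    j = rng.randrange(len(solutions[i]))
    phi = rng.uniform(-1.0, 1.0)
    candidate = list(solutions[i])
    candidate[j] = solutions[i][j] + phi * (solutions[i][j] - solutions[k][j])
    return candidate


def greedy_update(solutions, i, fitness, rng):
    """Replace the parent only if the candidate has a higher fitness."""
    candidate = neighbor(solutions, i, rng)
    if fitness(candidate) > fitness(solutions[i]):
        solutions[i] = candidate


def roulette_probabilities(solutions, fitness):
    """Selection probability of each food source, proportional to its fitness."""
    values = [fitness(s) for s in solutions]
    total = sum(values)
    return [v / total for v in values]


def toy_fitness(solution):
    """Hypothetical positive fitness that peaks at the origin."""
    return 1.0 / (1.0 + sum(v * v for v in solution))


rng = random.Random(0)
sources = [[rng.uniform(-5, 5) for _ in range(3)] for _ in range(6)]
before = [toy_fitness(s) for s in sources]
for _ in range(50):                       # employed-bee phase, repeated
    for i in range(len(sources)):
        greedy_update(sources, i, toy_fitness, rng)
probs = roulette_probabilities(sources, toy_fitness)
# An onlooker bee picks a source according to these probabilities.
chosen = rng.choices(range(len(sources)), weights=probs, k=1)[0]
```

Because of the greedy rule, the fitness of each food source can only stay the same or improve from one iteration to the next; the scout phase, which replaces exhausted sources with random ones, is omitted from this sketch.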
The last step of the proposed approach is the classification of the neurological disorder using the proposed natural evolution deep learning (NEODL) model. In a similar fashion to the standard CNN [20], the proposed deep learning architecture comprises the following layers: input, convolution, activation function (ReLU), pooling, fully connected, and output layers, as illustrated in Fig. 2. The convolutional layer, which may be repeated, is the representative structure of the deep model. It consists of many filters whose weights are learned through training, whereas some kernel settings must be decided before any input feature is processed. The kernel depth is equivalent to the number of features in the feature space. The outputs of the convolution layer are termed feature maps, and their number is determined by the number of employed filters. Thus, the hyper-parameters of the convolution layer include the number of filters and the stride, i.e. the step that separates successive applications of the same filter along the input signal, and they need to be tuned to find optimum values.
The feature maps produced by convolving the trained filters with the input signal are usually fed into a nonlinear gating function referred to as the activation function, such as ReLU. In the standard CNN, the ReLU activation function not only protects the network from the vanishing gradient and saturation problems, which are common impediments when working with deep networks, but also improves the overall classification performance [47]. However, ReLU is still subject to two major challenges: bias shift and zero gradient. The zero gradient for negative arguments hinders, or even stops, information propagation, thus causing errors in the backpropagation computation and resulting in the death of some neurons. The bias shift is a phenomenon that occurs when the average of a function’s output values is always positive. Since ReLU retains the positive arguments and sets the negative ones to zero, the average of the output values will always be positive, thus limiting the convergence speed and accuracy [48].
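Both effects can be observed numerically. The following Python sketch, given purely as an illustration, draws zero-mean Gaussian pre-activations (an assumption chosen for the demonstration), passes them through ReLU, and measures the mean of the outputs (bias shift) and the fraction of inputs whose gradient is zero.

```python
import random


def relu(x):
    """Standard rectified linear unit."""
    return x if x > 0.0 else 0.0


def relu_grad(x):
    """Derivative of ReLU (taken as 0 at x = 0)."""
    return 1.0 if x > 0.0 else 0.0


rng = random.Random(1)
inputs = [rng.gauss(0.0, 1.0) for _ in range(10000)]   # zero-mean pre-activations
outputs = [relu(x) for x in inputs]

mean_in = sum(inputs) / len(inputs)
mean_out = sum(outputs) / len(outputs)                  # bias shift: positive mean
dead_fraction = sum(relu_grad(x) == 0.0 for x in inputs) / len(inputs)
```

For standard normal inputs the expected output mean is 1/sqrt(2*pi), roughly 0.4, even though the inputs average to zero, and about half of the inputs receive no gradient at all.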
Motivated by the capabilities of natural evolution strategies (NES), we replaced the standard ReLU activation function with the natural evolution activation function [10]. The underlying idea of NES is to employ the search gradients in order to update the distribution parameters [49]. The search gradient is simply defined as the sampled gradient of the expected fitness. Therefore, in our treatment, the natural gradient activation function is applied to the ReLU layer because it works effectively on the natural evolution optimization algorithm. Depending on the fitness value of each search sample, this natural evolution optimized algorithm is used to train the selected features. During the training phase, the best fitness value is selected as the effective activation function.

Fig. 2. The deep network based classification of the EEG signal.
In a mathematical representation, the expected fitness of the search distribution can be represented as follows:
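In the standard NES formulation, the expected fitness under a search distribution pi(z | theta) is J(theta) = E_theta[f(z)], and its search gradient is estimated from samples as the average of f(z_k) multiplied by the gradient of log pi(z_k | theta). The Python sketch below applies this to the mean of a one-dimensional Gaussian search distribution with fixed standard deviation. The toy fitness function, the population size, the learning rate, and the baseline subtraction are all illustrative assumptions; this is not the NEODL activation layer itself.

```python
import random


def fitness(z):
    """Toy fitness with its maximum at z = 3 (hypothetical stand-in)."""
    return -(z - 3.0) ** 2


def nes_step(mu, sigma, rng, population=100, learning_rate=0.1):
    """One update of the mean of a Gaussian search distribution N(mu, sigma^2)."""
    samples = [rng.gauss(mu, sigma) for _ in range(population)]
    scores = [fitness(z) for z in samples]
    baseline = sum(scores) / population            # variance-reduction baseline
    # Search gradient w.r.t. mu: average of f(z) * d/dmu log pi(z | mu, sigma),
    # where d/dmu log pi = (z - mu) / sigma^2.
    grad = sum((f - baseline) * (z - mu) / sigma ** 2
               for f, z in zip(scores, samples)) / population
    # For a Gaussian mean the Fisher information is 1 / sigma^2,
    # so the natural gradient is sigma^2 * grad.
    natural_grad = sigma ** 2 * grad
    return mu + learning_rate * natural_grad


rng = random.Random(0)
mu = 0.0
for _ in range(200):
    mu = nes_step(mu, sigma=1.0, rng=rng)
```

Note that the update only evaluates the fitness at sampled points and never differentiates it, which is why this family of methods can be applied to non-differentiable objectives.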
The next layer is the pooling layer, which plays the same role as in the standard CNN: it reduces dimensionality and controls the number of parameters in order to reduce the computational burden. It is followed by a fully connected layer, which is similar to any classical neural network. Thereafter, the network output is computed based on the output of the fully connected layer. In the proposed framework, the previous layer activation function is chosen and compared with the present function of the output layer. The computed value is compared with the trained feature in order to classify the output value. This process is repeated continuously to identify the status of the incoming brain features.
Two benchmark datasets are employed in this study. The first contains epilepsy samples and the second includes motor imagery samples that are used for neurological rehabilitation. We selected these datasets for several reasons, including their public availability and their extensive use by researchers, which enabled us to compare our approach with the existing work. In addition, both datasets have considerable volume, which is beneficial in establishing the statistical significance of the EEG features extracted by all methods. Furthermore, the samples of both datasets facilitate our experiments since they are available in MATLAB formats.
Description of dataset I
The first dataset is the Bern-Barcelona focal and non-focal EEG dataset [22]. The samples of this dataset pertain to patients who suffered from drug-resistant temporal lobe epilepsy [31]. The dataset contains intracranial EEG recordings from five pharmacoresistant epilepsy patients who were admitted to the hospital for surgical operations. It comprises two classes of EEG signals: focal and non-focal. Each class contains 3750 pairs of signals, recorded simultaneously via two channels, x and y, and selected in a random fashion. Each of these 3750 EEG records has a duration of 20 seconds and a total of 10,240 samples, since the sampling frequency was 512 Hz. Although frequencies above 60 Hz can be utilized to identify the seizure origin [50,51], in our experiments we only considered frequencies in the range of 0–60 Hz [52], in order to unify our experimental conditions with those of other reference models. Figures 3 and 4 illustrate a sample of EEG focal and non-focal signals, respectively, where x and y are simultaneously recorded from adjacent channels.

Fig. 3. A pair of focal EEG signals. (a) “x” signal and (b) “y” signal.

Fig. 4. A pair of non-focal EEG signals. (a) “x” signal and (b) “y” signal.
The second dataset contains samples of motor imagery (MI) selected from an EEG datasets archive known as brain neural computer interaction (BNCI) Horizon 2020 [23]. It is important to note that BNCI Horizon 2020 is a coordination project funded by the European Commission’s Framework Program 7, with the aim of enhancing the collaboration among the BCI developers. MI is an essential tool for stroke patients, as it can facilitate the recovery of damaged nerves and plays an important role in the training of patients’ rehabilitation. The classification accuracy of MI has a great impact on the operation of BCI systems [53].
This dataset consists of three bipolar recordings where the electrodes were placed at C3, C4, and Cz in accordance with the international 10–20 system, as depicted in part (a) of Fig. 5. Using feedback, the subjects learn to control their brain activities through MI signal acquisition sessions. During each acquisition session, the subject was asked to imagine two movements, left and right, according to a directional visual cue presented on a screen. The imagination lasts four seconds from the onset of the cue, which is announced by a short beep sound. Part (b) of Fig. 5 depicts the overall scenario of the MI task used to record this dataset. Each subject participated in an experiment of two sessions that was conducted over two days. Each session contained six runs, with each run comprising ten trials per class. Thus, a total of 240 trials per subject were gathered over the two sessions, with 120 repetitions for each of the right and left MI classes. The EEG signals were sampled at a frequency of 128 Hz [54].

Fig. 5. BNCI Horizon 2020 dataset. (a) Places of electrodes C3, Cz, and C4 on the head and (b) time scheme scenario during an acquisition session. This diagram is adapted from the original publication [54].
Experimental setting
For the proposed deep learning model, the convolution layer includes 36 filters, with a 1 × 5 kernel size, stride = 1, and same padding. We tried adding another convolution layer with a pooling layer in between, but the performance improved only slightly relative to the added computation cost. The number of units in the fully connected layer depends on the size of the tensor after the convolutional layer. In order to avoid overfitting during the training phase, a dropout layer is placed after both the convolutional and the fully connected layers. The output layer has a single unit, reflecting the two classes in the data at hand. For the sake of simplicity, the output layer uses a sigmoid function as its activation function. The network was trained for datasets I and II with a maximum of 500 and 600 epochs, respectively.
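To make the effect of these settings concrete, the following Python sketch applies a single 1 × 5 filter with stride 1 and zero "same" padding to a short signal, followed by non-overlapping max pooling. With same padding and stride 1, each filter produces a feature map of the same length as its input, so 36 filters would yield 36 such maps. The filter weights, the pooling size, and the function names are hypothetical choices for illustration.

```python
def conv1d_same(signal, kernel):
    """1-D cross-correlation with stride 1 and zero 'same' padding."""
    pad_total = len(kernel) - 1
    left = pad_total // 2
    padded = [0.0] * left + list(signal) + [0.0] * (pad_total - left)
    return [
        sum(w * padded[start + t] for t, w in enumerate(kernel))
        for start in range(len(signal))
    ]


def max_pool1d(values, size=2):
    """Non-overlapping max pooling; a trailing partial window is dropped."""
    return [max(values[i:i + size]) for i in range(0, len(values) - size + 1, size)]


signal = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
moving_average = [0.2] * 5                 # one hypothetical 1 x 5 filter
feature_map = conv1d_same(signal, moving_average)
pooled = max_pool1d(feature_map)
```

Here the feature map has eight entries, matching the input length, and pooling with a window of two halves it to four.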
All experiments were conducted using a MATLAB toolbox running on a 64-bit Windows 10 operating system, with an Intel Core i7-6700 CPU @ 2.6 GHz and 16 GB of DDR3 memory. In order to create a unified experimental setting with the articles in the literature that used both datasets, a tenfold cross-validation strategy was adopted during our experiments, after which the average maximum accuracy was retained. For dataset I, the EEG signals belonging to the focal and non-focal classes for both channels “x” and “y” are divided into ten smaller subsets. In each of the ten folds, nine subsets are used to train the classifier, whereas the remaining subset is used to assess the classifier. For dataset II, the same experimental setting as in the original publication [54] was adopted.
The performance of all classifiers adopted in this paper is evaluated in terms of the following five performance measures:
Equations (14)–(18) show the five performance measures used in our experiments, namely accuracy (ACC), sensitivity (SEN), specificity (SPE), precision (PRE), and F1-Score (F1S). It is widely accepted that the most important measures are the first three, which we relied on during comparison. ACC measures the percentage of correctly classified instances out of the total number of instances in the whole dataset. SEN measures the proportion of actual positive instances that are correctly identified as positive. SPE measures the proportion of actual negative instances that are correctly identified as negative [52].
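These five measures have standard definitions in terms of the confusion-matrix counts of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN). The Python sketch below computes them under those standard definitions; the function name is hypothetical and the counts in the example are invented for illustration only and are not results from the experiments.

```python
def classification_metrics(tp, tn, fp, fn):
    """Standard binary-classification measures from confusion-matrix counts."""
    acc = (tp + tn) / (tp + tn + fp + fn)   # accuracy
    sen = tp / (tp + fn)                    # sensitivity (recall)
    spe = tn / (tn + fp)                    # specificity
    pre = tp / (tp + fp)                    # precision
    f1s = 2 * pre * sen / (pre + sen)       # harmonic mean of precision and sensitivity
    return {"ACC": acc, "SEN": sen, "SPE": spe, "PRE": pre, "F1S": f1s}


# Hypothetical counts, for illustration only (not results from the paper).
metrics = classification_metrics(tp=90, tn=85, fp=15, fn=10)
```

With these invented counts, the accuracy is 0.875, the sensitivity is 0.9, and the specificity is 0.85.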
In this section, the results of two major elements of the proposed framework, namely ABC-based feature selection and NEODL-based feature classification, are given sequentially.
ABC-based feature selection
The performance of ABC-based feature selection is evaluated and subsequently compared with similar optimization methods, namely the genetic algorithm (GA), particle swarm optimization (PSO), and iterated local search (ILS) [55]. Table 2 reports the performance of the ABC and the reference methods on the feature selection task in our approach, where all methods were implemented impartially under unified experimental conditions. It is evident that the number of features selected by ABC in both datasets, and by PSO in the second dataset, is smaller than the number selected by the other methods. At the same time, the accuracy of selecting the optimum features by ABC was found to exceed that of the other methods. As mentioned in the previous section, the ABC method optimally selects the features in the particular work space by employing the probability fitness function before eventually selecting the most optimized features as the input for the classifier [46].
Table 2. Results of the number of selected features (SF) and selection accuracy ACC (%) of the ABC and other optimization methods for both datasets
The features selected in the previous step are used to train and test the classifiers in the current step. The successful optimized selection of brain features undoubtedly enhances the overall process of neurological disorder classification. The proposed deep learning model then identifies the deviation of the computed output values from the expected ones. In our experiments, the deviation is measured using the five performance measures defined in Eqs (14)–(18) for both datasets. Tables 3 and 4 report the performance results of the proposed NEODL approach in comparison to other deep learning approaches for both datasets. The reference approaches are multilayer perceptron (MLP) [56], radial basis function (RBF) [57], deep belief networks (DBN) [58] and convolutional neural network (CNN) [59]. The parameter selection of each deep model is performed automatically and the highest accuracy of the corresponding model is retained.
It is worth mentioning that the CNN results in both tables use the standard ReLU, whereas our deep algorithm, which is otherwise similar to the standard CNN, uses the natural evolution gradient of ReLU. From both tables, it is clear that the NEODL approach attains the best performance values in comparison to the other classifiers. In other words, the standard CNN with the natural evolution ReLU outperforms the standard CNN with the standard ReLU used in the existing literature [60].
Comparison of feature classification results of NEODL and other deep learning algorithms using five performance metrics for dataset I
Comparison of feature classification results of NEODL and other deep learning algorithms using five performance metrics for dataset II
For the subject-wise experiments, Tables 5–9 report the detailed performance of all classifiers on each of the five performance measures, separately for each subject in dataset II. The methods and their results are sorted in ascending order. The performance of the proposed NEODL classifier is better than that of the other deep models, including the standard CNN that uses the ordinary ReLU.
Subject-wise classification accuracy for all deep models for dataset II
Several methods in the existing literature have analyzed the two datasets used in this paper. These methods are listed in Tables 10 and 11 for datasets I and II, respectively. For dataset I, some of these studies used only the first 50 EEG signals for their experiments, whereas our results are based on the entire dataset. In contrast, very few studies have used dataset II because of its large size. In this section, we focus on the feature selection and feature classification steps, since these are the major contributions of this paper.
Subject-wise classification sensitivity for all deep models for dataset II
Subject-wise classification specificity for all deep models for dataset II
Subject-wise classification precision for all deep models for dataset II
Subject-wise classification F1-score for all deep models for dataset II

Feature classification of deep models. (a) Dataset I and (b) dataset II.
As shown in Table 2, ABC outperformed the other algorithms in the feature selection step. All the reference algorithms have been used frequently in the existing literature for feature selection because of their computational properties. The major advantage of GA lies in its ability to search the data globally through its group search strategy and the settings of its evolutionary operators. However, GA cannot take full advantage of the local information of the search space. Moreover, it takes a long time to converge to the optimal solution [32]. Similarly, PSO is an efficient algorithm that adaptively selects features based on their weights and ensures that the selected particles combine a fine global search ability with a rapid convergence rate. Nevertheless, PSO easily falls into locally optimal solutions [32].
In contrast, in an attempt to consider both exploitation and exploration, ABC combines the properties of local search methodologies, carried out by employed and onlooker bees, with the properties of global search methodologies, carried out by onlookers and scouts. Additionally, it has the ability to increase the classification accuracy by selecting the appropriate sub-features to reduce the information redundancy. Table 2 shows that the ABC method selects fewer features to converge in comparison to other methods while at the same time attaining the highest accuracy. Accordingly, the feature selection step using ABC is a prerequisite for the feature classification step.
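The interplay of employed bees, onlookers and scouts described above can be sketched as a simplified binary ABC loop over feature masks. This is a minimal sketch under stated assumptions, not the implementation used in our experiments: the function `abc_select`, its parameters, and the toy fitness function are hypothetical, and in practice the fitness would be derived from classifier performance on the selected features rather than from a known set of informative indices.

```python
import random


def abc_select(fitness, n_features, n_sources=10, n_iters=50, limit=5, seed=0):
    """Simplified binary ABC: each food source is a 0/1 mask over the features."""
    rng = random.Random(seed)

    def random_mask():
        return [rng.randint(0, 1) for _ in range(n_features)]

    sources = [random_mask() for _ in range(n_sources)]
    scores = [fitness(s) for s in sources]
    trials = [0] * n_sources  # unsuccessful attempts per source
    best_i = max(range(n_sources), key=scores.__getitem__)
    best, best_score = sources[best_i][:], scores[best_i]

    def remember(mask, score):
        nonlocal best, best_score
        if score > best_score:
            best, best_score = mask[:], score

    def try_neighbour(i):
        # Local search: flip one feature and keep the change only if it helps.
        cand = sources[i][:]
        j = rng.randrange(n_features)
        cand[j] = 1 - cand[j]
        score = fitness(cand)
        if score > scores[i]:
            sources[i], scores[i], trials[i] = cand, score, 0
            remember(cand, score)
        else:
            trials[i] += 1

    for _ in range(n_iters):
        # Employed bees: one local move per food source.
        for i in range(n_sources):
            try_neighbour(i)
        # Onlookers: pick sources with probability proportional to shifted fitness.
        low = min(scores)
        weights = [s - low + 1e-9 for s in scores]
        for _ in range(n_sources):
            try_neighbour(rng.choices(range(n_sources), weights=weights)[0])
        # Scouts: abandon exhausted sources and restart them at random.
        for i in range(n_sources):
            if trials[i] > limit:
                sources[i] = random_mask()
                scores[i] = fitness(sources[i])
                trials[i] = 0
                remember(sources[i], scores[i])
    return best, best_score


# Toy fitness (hypothetical): reward three "informative" features, penalize mask size.
INFORMATIVE = {0, 3, 5}


def toy_fitness(mask):
    return sum(mask[i] for i in INFORMATIVE) - 0.1 * sum(mask)


mask, score = abc_select(toy_fitness, n_features=10)
print(mask, round(score, 2))
```

The penalty term in the toy fitness plays the role of the redundancy reduction mentioned above: a mask that adds uninformative features scores lower than a smaller mask with the same informative features.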
For the feature classification task, Figure 6 shows a graphical evaluation of the improvement achieved by the NEODL-based classification as compared to other deep learning models on the five performance measures for both datasets. The reference models have been used extensively in the literature for EEG classification [56–59], and we employed them in their best standard settings. As mentioned before, the activation function used in the CNN experiments is the standard ReLU. Therefore, the superiority of the NEODL results is attributed to the impact of the proposed ReLU based on the natural evolution gradient. This performance does not appear to be random or incidental, since NEODL outperformed its counterparts on all five performance measures. In fact, the higher accuracy rates are an indication of superior discriminatory performance during the assessment of mental tasks.
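For readers unfamiliar with natural evolution strategies, the basic search-gradient update can be illustrated on a toy problem. The sketch below is not the NEODL activation layer; it only shows, under simplifying assumptions, how NES moves the mean of a search distribution along an estimate of the natural gradient of the expected fitness. An isotropic Gaussian with a fixed standard deviation `sigma` is assumed, for which the natural gradient with respect to the mean reduces to `sigma` times the average of fitness-weighted noise samples. The function `nes_maximize`, its parameters and the toy objective are hypothetical.

```python
import random
import statistics


def nes_maximize(fitness, mu, sigma=0.5, lr=0.5, pop=50, iters=200, seed=0):
    """Move the mean of N(mu, sigma^2 I) along an estimated natural gradient."""
    rng = random.Random(seed)
    mu = list(mu)
    dim = len(mu)
    for _ in range(iters):
        # Sample a population around the current mean.
        noise = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(pop)]
        scores = [fitness([m + sigma * e for m, e in zip(mu, eps)]) for eps in noise]
        # Standardize fitness values so the step size does not depend on their scale.
        mean, std = statistics.mean(scores), statistics.pstdev(scores)
        if std == 0.0:
            break
        utils = [(s - mean) / std for s in scores]
        # Natural-gradient estimate for the mean of a fixed-sigma isotropic Gaussian.
        for d in range(dim):
            grad = sum(u * eps[d] for u, eps in zip(utils, noise)) / pop
            mu[d] += lr * sigma * grad
    return mu


# Toy objective (hypothetical): maximize the negative squared distance to a target.
TARGET = [1.0, -2.0, 3.0]


def toy_objective(x):
    return -sum((xi - ti) ** 2 for xi, ti in zip(x, TARGET))


found = nes_maximize(toy_objective, mu=[0.0, 0.0, 0.0])
print([round(v, 2) for v in found])
```

Because the update uses only fitness evaluations of sampled points, it does not require the objective to be differentiable, which is one reason NES is attractive for components such as activation functions.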
Comparison with existing approaches for dataset I. Results are listed in an ascending order
NoF: Number of features, KoF: Kind of features, Stat: Statistical, Fre: Frequency, SM: Spectral moment.
Comparison with existing approaches for dataset II. Results are sorted in an ascending order
More evidence of stable performance is provided by the comparison that we conducted with recent existing models in the literature. Tables 10 and 11 summarize these comparisons for both datasets. In order to provide a fair comparison, both tables show, when applicable, all relevant information such as the number and kinds of features, as well as the methodology used in each step. Several observations can be drawn from Table 10, which shows the comparison with 15 existing methods. First, the proposed approach using NEODL improved on the existing approaches in two of the three measures, with an accuracy of 98.9% and a specificity of 99.5%.
For the sensitivity measure, NEODL attained 95.6%, which is less than the methods used by Arunkumar [27], Raghu [29], and Sharma [26], which attained sensitivities of 100%, 97.6%, and 96.4%, respectively. The lower sensitivity implies that a number of abnormal samples were missed, i.e. classified as normal, more often than in these approaches. Nevertheless, it is worth noting that the authors in [27] pointed out that they implemented their method only on a limited amount of data. They also stated that the method and their entropy computations need to be checked with other large EEG databases in order to yield consistent results. More data brings more challenges to the classifiers, which may explain the growing interest in deep learning for biomedical applications [68].
The second observation from Table 10 is that the specificity of NEODL is close to its accuracy. This implies that NEODL classifies normal samples correctly more often than the other approaches. According to the aforementioned observations from Table 10, it seems that SVM is capable of detecting abnormal features better than NEODL, whereas NEODL classifies normal features in EEG signals better than SVM. The third observation is that the proposed approach with CNN as the classifier attained an accuracy of 95.7%, which is higher than that of several existing works listed in the upper half of Table 10.
For dataset II, the proposed approach could be compared with the existing approaches on accuracy only, since the reference methods did not report any of the other performance measures. The proposed approach attained about 80%, whereas the other methods attained results between 57% and 73%. This superiority relates to the combined impact of ABC and NEODL in the proposed hybrid approach. Overall, in both tables, the performance of NEODL compares favorably with common classifiers, such as SVM and its variant LS-SVM, which have been widely adopted in previous works. In addition, most of these approaches relied on handcrafted features, whereas the deep learning approaches achieved better classification results based on the automatic selection of optimized features via ABC.
A final observation is that the overall performance on dataset II is lower than on dataset I. In our opinion, this is attributed to the characteristics of each dataset. The Bern-Barcelona EEG dataset [22] was recorded using intracranial sensors from only five patients, and showed a consistently higher energy for focal signals. In contrast, the BNCI dataset [23] includes recordings from nine patients using surface electrodes, which introduce a higher level of noise and a different frequency pattern as compared to intracranial sensors.
In this paper, we presented a signal processing chain for the classification of epilepsy disease and motor imagery in the human brain using EEG signals. The following conclusions can be drawn:
The proposed framework includes four elements: preprocessing, feature extraction, feature selection, and feature classification. For the preprocessing step, we used L1-principal component analysis, and for the feature extraction step, we used the wavelet transform. The artificial bee colony is employed for optimally selecting the best features.
A natural evolution optimization-based deep learning (NEODL) approach is developed in this chain for the purpose of feature classification. The NEODL approach is inspired by the computation capabilities of natural evolution strategies (NES), which offer a principled method for real-valued evolutionary optimization through the calculation of the natural gradient of the expected fitness with respect to the parameters of the search distribution. We reformulate the activation function layer of the NEODL architecture using NES.
The experimental results validated the proposed framework, where we obtained classification accuracies of 98.9% and 79.9% for two benchmark datasets, namely the Bern-Barcelona dataset and the BNCI Horizon 2010 dataset. Our classification rates on both datasets were found to outperform other deep learning-based classifiers as well as many works in the existing literature. In addition, on the Bern-Barcelona dataset the proposed approach was found to outperform existing approaches in specificity, although a few methods attained a higher sensitivity.
The proposed framework is expected to be useful for the identification of the epileptogenic areas. For this reason, it may find application in the neuro-intensive care unit, epilepsy monitoring unit, and practical brain-computer interface systems in clinics. Another expected extension of the proposed framework is to investigate its performance on larger datasets with more patients, including the dataset of the EPILEPSIAE project (http://epilepsy.uni-freiburg.de/epilepsiae-project) that was recently introduced by the European Epilepsy Project. Moreover, it can also be employed to identify other specific neural diseases, such as alcoholism, autism, and dementia.
