Abstract
Classification accuracy remains a significant challenge in sensory motor imagery (SMI) electroencephalogram (EEG) based brain-computer interface (BCI) systems. This paper compares an ensemble classification framework with individual classifiers. The main objective is to reduce the influence of non-stationary and transient information and to improve the classification decision in the BCI system. The framework comprises three phases: (1) the EEG signal is first decomposed into triadic frequency bands (low-pass, band-pass, and high-pass) to localize
Keywords
Introduction
Advances in wireless recording, signal processing, computer algorithms, and brain science have made it possible to convert brain signals into control signals for computers and other electronic devices. In particular, the rehabilitation of physically impaired patients and people with brain damage or disease (e.g., patients with amyotrophic lateral sclerosis or stroke) has benefited on a large scale [1, 2, 3]. Electroencephalography (EEG) records brain signals non-invasively from the scalp [4, 5, 6]. Despite its low signal-to-noise ratio, the method is considered convenient because it requires no surgical procedure. Use of a BCI system comprises two phases: a training phase, in which the system is calibrated, and an online phase, in which the system translates recognized brain activity patterns into control commands for a computer [7]. An online EEG-based BCI system is implemented as a closed loop: acquisition of specific EEG patterns from the user (e.g., motor imagery or visual stimuli), preprocessing of the acquired signals, feature extraction and feature selection, classification, and user feedback. Every step of this protocol involves algorithms that can be optimized to improve the performance of the BCI system [8, 9].
EEG signals are non-stationary and transient, so extracting patterns from them efficiently and accurately is challenging. Data shift arises in datasets when the class definitions change over time. Non-stationarity in EEG is caused by inter-session and inter-subject variability, such as neurophysiological conditions, noise and artifacts, and psychological parameters [10]. According to the authors of [10], the two main contributing factors are (i) differences between the samples of the training session and those of the online session, and (ii) changes in the brain activity of the subject. Machine learning algorithms for non-stationary systems adopt one of two approaches, passive or active learning [10]. The passive approach assumes that the input data distribution changes continuously over time, so the system must keep learning the change, either as each new sample arrives or in batches. The active approach, in contrast, first detects the data shift and then adapts accordingly. Transfer learning and domain adaptation handle this problem by exploiting knowledge of one task while learning another, related task; Pan et al. [11] present a detailed survey of transfer learning. EEG-based BCI systems also suffer from the problems associated with the non-stationary characteristics of EEG signals. The quasi-stationary segments of EEG last approximately 0.25 s [12]. Inter-session and inter-trial variations are a significant cause of data shift in EEG-based BCI. Ensemble learning combines a set of classifiers to solve a classification problem, with each classifier taking part in the decision. The classifiers in an ensemble can be combined using weighted or unweighted voting, bagging, or boosting [13].
Ensemble classification builds combinations of classifiers that learn from the training data and are validated on the test data. The final output is computed by combining the class decisions predicted by these classifiers. A wide range of literature is available on combining strategies; this paper considers Weighted Majority (WM) [55] and Dynamic Weighted Majority (DWM) [56]. An ensemble of classifiers draws on diverse information and gives better results than an individual classifier. The main objective of the paper is therefore to use WM and DWM to increase the accuracy of EEG classification based on the frequency bands found by non-dyadic wavelet decomposition in the signal preprocessing phase. The framework follows these steps: (1) the EEG signal is first decomposed into triadic frequency bands (low-pass, band-pass, and high-pass) to localize
The rest of the paper is organized as follows. Section 2 covers background on non-dyadic wavelet decomposition and ensemble learning methods in BCI systems. Section 3 explains the proposed methodology: a Dynamic Weighted Majority (DWM) [56] ensemble classification framework that classifies features extracted using CSP, trains a variety of classifiers, and then selects a subset of classifiers from the ensemble library, based on the global prediction of the system, to improve classification accuracy. Section 4 presents the empirical results and discusses performance. Section 5 concludes the paper.
Background
Non-dyadic wavelet decomposition of EEG signals
Wavelet series expansion decomposes finite-energy functions for analysis. The basis functions must therefore be regular, well localized, and of finite energy [60]. It is convenient to take special values for s and
with
Time-frequency localized basis functions are popular among researchers for applications such as analysis of acquired signals [15], image coding [16, 17], and feature extraction [18, 19, 20]. Orhan et al. [21] and Ubeyli et al. [22] implemented two-band wavelet filter banks, extracted features, and classified them into predefined classes. The authors of [23] took two frequency bands each of
where
The
where the factored matrices
Ensemble-based learning uses a set of classifiers to classify new data points by taking the votes of these classifiers, under either a weighted or an unweighted voting scheme. The original concept was Bayesian averaging, but over time many more ensemble methods have been developed, such as bagging, boosting, and their variants. Dietterich [13] suggested that ensemble methods help with statistical problems, for example when the hypothesis space is too large or when finding the best hypothesis is computationally infeasible. Ensemble learning can be either independent or dependent. In the independent ensemble method, the decisions of the classifiers are independent of each other and a voting technique produces the final decision. In dependent ensemble learning, the output of one classifier is used to construct another classifier. The ensemble framework depends on diversity among the base learners, i.e., base learners with high individual classification error can still yield high ensemble accuracy when their errors differ. Bagging and boosting are examples of the independent and dependent approaches, respectively. Liu et al. [49] used a stacked ensemble technique, with LDA and ANN, to identify motor imagery movement (left and right hand) from ERD/ERS (event-related desynchronization/event-related synchronization) features. Liyanage et al. [50] performed multi-class classification (left hand, right hand, both feet, and tongue) of EEG signals using dynamic weighted ensemble classification, with different variants of support vector machines as base classifiers. Rahimi et al. [51] addressed the same multi-class classification task using stacking and bagging ensemble techniques; they proposed an ensemble combination of decision tree, SVM, and LDA to classify extracted filter-bank CSP features. Mohammadpour et al. [52] used bagging and boosting ensembles to classify different features (common spatial pattern (CSP), power spectral density (PSD), adaptive autoregressive coefficients, and discrete wavelet coefficients) and compared the results; the base classifiers included decision stump and k-nearest neighbor classifiers. Ramos et al. [53] used eleven base classifiers with stacking to classify motor imagery movements. The eleven classifiers were SVM and its variants, k-nearest neighbor, LDA and its variants, wavelet packet decomposition, and radial basis function. Datta et al. [54] proposed three different types of ensemble arrangement. They used band power, wavelet-based energy-entropy, and adaptive autoregressive features and compared the results obtained with the three ensemble techniques; their base classifiers were LogitBoost, adaptive boosting, and SVM, used to predict the left-hand and right-hand classes. Littlestone and Warmuth proposed Weighted Majority (WM) [55], a prediction scheme based on an ensemble of weighted experts. They replaced poorly performing classifiers with new classifiers after putting a threshold on overall accuracy. The authors of [56] proposed Dynamic Weighted Majority (DWM), which dynamically changes the classifiers in accordance with the global performance and deletes any expert whose weight is less than or equal to a given threshold value. The global prediction of the system is the weighted majority vote of its classifiers' predictions.
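The weighted vote and expert pruning described for DWM can be illustrated with a short sketch. It is a simplified illustration under stated assumptions, not the algorithm of [56] in full: the function names, the penalty factor beta, and the threshold theta are hypothetical choices, and the step that adds a new expert after a wrong global prediction is omitted.

```python
from collections import defaultdict


def weighted_vote(predictions, weights):
    """Return the label that collects the largest total expert weight."""
    totals = defaultdict(float)
    for label, weight in zip(predictions, weights):
        totals[label] += weight
    return max(totals, key=totals.get)


def dwm_step(predictions, weights, true_label, beta=0.5, theta=0.01):
    """One simplified DWM-style update (beta and theta are assumed values).

    Returns the global prediction, the updated weights, and the indices of
    the experts that survive pruning.
    """
    global_prediction = weighted_vote(predictions, weights)
    # Penalize every expert whose own prediction was wrong.
    updated = [
        w * beta if p != true_label else w
        for p, w in zip(predictions, weights)
    ]
    # Rescale so that the largest weight is 1.
    top = max(updated)
    updated = [w / top for w in updated]
    # Drop experts whose weight is less than or equal to the threshold.
    survivors = [i for i, w in enumerate(updated) if w > theta]
    return global_prediction, updated, survivors


prediction, new_weights, kept = dwm_step(["L", "R", "L"], [1.0, 1.0, 1.0], "R")
```

In this toy call the two experts voting "L" outvote the one voting "R", so the global prediction is wrong, and both of them have their weight halved for the next round.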
Proposed methodology
Problem formulation
Given a set of data instances
The degree of support of each class is defined as the label output generated by each classifier in the ensemble and can be represented as
Dataset description
The algorithm is tested on the publicly available datasets from BCI competition 2008 (dataset IIa), provided by Graz University, and BCI competition 2003 (dataset IVa) [57]. The EEG data were recorded non-invasively using twenty-two Ag/AgCl electrodes, along with 3 channels recording EOG data. The electrode placements for EEG recording are shown in Fig. 1. The EEG and EOG data were recorded at a sampling frequency of 250 Hz. The provided signal is band-pass filtered between 0.5 and 100 Hz. To suppress power line noise, an additional 50 Hz notch filter is also applied in this dataset. The dataset consists of four motor imagery signals: movement of the left hand, right hand, both feet, and tongue. Data were recorded for nine subjects in two parts, a training set and an evaluation set. Each set consists of 288 trials, organized as six runs with twelve randomly ordered imaginations of each movement. The timing diagram of the paradigm is given in Fig. 2.
Electrodes used in database collection.
Timing diagram of paradigm used for each trial.
The proposed method is also tested on BCI competition III dataset IVa. This dataset contains EEG signals for three motor imagery tasks: left hand (L), right hand (R), and right foot (F). The signals were recorded on 118 EEG channels, and markers indicate the time points of 280 cues for each of the 5 subjects. The recording was made using BrainAmp amplifiers and a 128-channel Ag/AgCl electrode cap from ECI; the 118 EEG channels were measured at positions of the extended international 10/20 system. The signals were band-pass filtered between 0.05 and 200 Hz and then digitized at 1000 Hz with 16-bit (0.1 µV) accuracy.
In this work, the EEG data are decomposed using three-band (triadic) filter banks, as discussed in Section 2.1. This reduces the computational complexity of obtaining the frequency band required for feature extraction. For example, if a band of 0–3.5 Hz is needed from an input signal with a sampling frequency of 10 Hz, a single triadic filter bank delivers it in one decomposition level. Dyadic wavelet filters need three levels for the same problem (0–5 and 5–10, 0–2.5 and 2.5–5, 2.5–3.75 and 3.75–5). The dyadic route also introduces more band edges. A low number of band edges reduces information loss at the edges and helps to approximate the required brain rhythms better. The choice of non-dyadic wavelet transform is, however, made on the basis of the EEG frequency band required for feature extraction. The proposed triadic decomposition divides the frequency bandwidth of the input signal in the frequency range from 0 to 0.33
The selected frequency bands for both datasets are 6.17–9.25 Hz, 9.25–18.5 Hz, 18.5–27.7 Hz and 74–83.3 Hz (highlighted in Fig. 3).
Three-level triadic wavelet decomposition of the EEG signal for input data sampled at 250 Hz.
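The selected band edges can be checked with simple bookkeeping. The sketch below only computes band edges; it designs no filters. It assumes that each band is split into three equal parts at every level and that the starting range is 0–250, which is an inference from the quoted edges rather than something the text states. The function names are hypothetical.

```python
def split_band(low, high, parts=3):
    """Split [low, high] into equal-width sub-bands (three for triadic)."""
    width = (high - low) / parts
    return [(low + i * width, low + (i + 1) * width) for i in range(parts)]


def triadic_tree(low, high, levels):
    """Return the list of bands at every level, splitting all bands each time."""
    tree = [[(low, high)]]
    for _ in range(levels):
        tree.append([sub for band in tree[-1] for sub in split_band(*band)])
    return tree


tree = triadic_tree(0.0, 250.0, 4)
for low, high in (tree[4][2], tree[3][1], tree[3][2], tree[3][8]):
    print(f"{low:.2f}-{high:.2f} Hz")
```

Under these assumptions, three of the four quoted bands appear at the third level and the 6.17–9.25 Hz band appears one level deeper, as a sub-band of 0–9.25 Hz.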
Common spatial pattern (CSP)
Event-related desynchronization (ERD) and event-related synchronization (ERS) are two measures of the changes in the EEG signal during imagination of a movement task. ERS and ERD are calculated as the percentage increase and decrease, respectively, of EEG power in a particular frequency band. The change in power of a specific frequency band in channels over the opposite cortex of the brain is discussed in [27]. Filter bank CSP is an advanced algorithm that extracts features for different frequency sub-bands; classification can then be performed on the basis of the difference in variance of the features extracted at different channels. The selection of the frequency band for CSP extraction is, however, an important task, which is discussed later.
The CSPs are obtained from the covariance matrix of the signals, computed as in Eq. (8). The variance of the signals differs between classes because of ERS and ERD.
where
The eigenvectors matrix
The whitening transformation is applied so that the variables of the transformed matrix are uncorrelated. This helps to maximize the difference between the two classes. The CSP projection matrix therefore uses the eigenvectors of the whitened covariance matrix of each class. The whitened covariance matrix for each class is obtained by multiplying the whitening matrix [61] with the class covariance matrix, as shown in Eq. (9).
and the extraction of the eigenvector matrix from
The projection matrix will be
The projection matrix is applied as a spatial filter by multiplying it with the EEG data. The filtered signal is then used to extract the change in variance for each class of signal.
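The chain of covariance estimation, whitening, eigendecomposition, and projection can be sketched with NumPy. The sketch follows the standard textbook two-class CSP formulation, which may differ in detail from Eqs. (8) and (9) of this paper. The function names, the trace normalization, and the log-variance feature are assumptions made for illustration.

```python
import numpy as np


def normalized_cov(trial):
    """Trace-normalized spatial covariance of one trial (channels x samples)."""
    c = trial @ trial.T
    return c / np.trace(c)


def csp_projection(trials_a, trials_b):
    """Return the full CSP projection matrix W, one spatial filter per row."""
    cov_a = np.mean([normalized_cov(t) for t in trials_a], axis=0)
    cov_b = np.mean([normalized_cov(t) for t in trials_b], axis=0)
    # Eigendecomposition of the composite covariance gives the whitening matrix.
    evals, evecs = np.linalg.eigh(cov_a + cov_b)
    whitening = np.diag(1.0 / np.sqrt(evals)) @ evecs.T
    # Eigenvectors of the whitened class-A covariance are shared by class B.
    d, b = np.linalg.eigh(whitening @ cov_a @ whitening.T)
    order = np.argsort(d)[::-1]  # largest class-A variance first
    return b[:, order].T @ whitening


def log_variance_features(w, trial, n_pairs=1):
    """Log of normalized variance for the first and last n_pairs filters."""
    rows = np.r_[0:n_pairs, w.shape[0] - n_pairs:w.shape[0]]
    var = np.var(w[rows] @ trial, axis=1)
    return np.log(var / var.sum())
```

The first rows of W emphasize directions in which the first class has high variance and the last rows do the same for the second class, which is why only the outermost filters are kept as features.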
The total training set is further divided in a 7:3 ratio: 70% for training and 30% for validation. The feature space is reduced by support vector machine recursive feature elimination (SVM-RFE), an efficient feature selection algorithm that uses the support vectors to assign weights to features [59].
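The split and the feature selection step can be sketched with scikit-learn. This is an illustration under assumptions rather than the configuration used in the paper: the linear kernel, the stratified split, the number of retained features, the random seed, and the synthetic data are all hypothetical choices.

```python
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC


def split_and_select(features, labels, n_keep, seed=0):
    """70/30 split followed by SVM-RFE fitted on the training part only."""
    x_train, x_val, y_train, y_val = train_test_split(
        features, labels, test_size=0.3, stratify=labels, random_state=seed
    )
    # RFE repeatedly drops the feature with the smallest squared SVM weight.
    selector = RFE(SVC(kernel="linear"), n_features_to_select=n_keep, step=1)
    selector.fit(x_train, y_train)
    return selector, selector.transform(x_train), selector.transform(x_val)


# Synthetic stand-in for CSP features: only columns 0 and 1 carry class information.
rng = np.random.default_rng(0)
labels = np.repeat([0, 1], 100)
features = rng.normal(size=(200, 10))
features[:, 0] += 4.0 * labels
features[:, 1] -= 4.0 * labels

selector, x_train, x_val = split_and_select(features, labels, n_keep=2)
print(selector.support_)
```

Fitting the selector on the training portion alone keeps the validation data out of the feature ranking.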
Framework of the signal processing and ensemble classification pipeline proposed in the paper. A 22-channel dataset is preprocessed via triadic multi-resolution wavelet decomposition on the training and testing datasets, followed by the CSP algorithm for feature extraction and the SVM-RFE feature selection algorithm for dimensionality reduction. Finally, an ensemble E of K classifiers is fed with the selected features as input, and the final output classification labels are identified using the DWM algorithm.
This paper focuses on the classification of extracted and selected CSP features using a majority vote based classifier. This model overcomes the statistical and representational restrictions of conventional techniques by using a collection of heterogeneous classifiers. In our approach, a DWM-based classifier, which is a novel arrangement of classifiers, performs the classification. J48, Naïve Bayes, SVM, and k-nearest neighbor (kNN) algorithms are used as base learners. The ensemble classification model provides accurate classification and prediction by improving recall and precision compared with individual classifiers. To validate the proposed approach, it is compared with the single-classifier approaches kNN, SVM, Naïve Bayes, and J48. Figure 4 shows the framework of the signal processing and ensemble classification pipeline proposed in the paper.
-values for features extracted of BCI competition IV dataset IIa using non-dyadic filter banks
Experimental results
The proposed framework was applied to both datasets for feature extraction and classification.
-values for features extracted of BCI competition III dataset IVa using non-dyadic filter banks
Comparative analysis of accuracy (in %) of DWM decision approach with number of individual classifiers for all subjects from BCI competition 2008 (dataset IIa)
Comparative analysis of accuracy (in %) of DWM decision approach with number of individual classifiers for all subjects from BCI competition 2003 (dataset IVa)
Subject-wise classification accuracy obtained for BCI competition IV dataset IIa using the DWM ensemble classification framework based on the non-dyadic filter bank
(Signal processing): Temporal filtering using triadic wavelet decomposition. (Feature extraction): Spatial filtering using common spatial pattern (CSP) algorithm. (Feature selection): Support Vector Machine- Recursive Feature Elimination Algorithm. (Ensemble classification): SVM, Naïve Bayes, J48, k-NN. (Ensemble decision approach): Dynamic Weighted Majority.
Table 3 and Fig. 5 present, in tabular and graphical form respectively, the comparative analysis of the DWM decision approach against the individual classifiers for all subjects from BCI competition 2008 (dataset IIa). Table 4 and Fig. 6 present the same comparison for all subjects from BCI competition 2003 (dataset IVa).
Graphical representation of comparative analysis of DWM decision approach with number of individual classifiers on BCI competition 2008 (dataset IIa).
Graphical representation of comparative analysis of DWM decision approach with number of individual classifiers on BCI competition 2003 (dataset IVa).
According to Table 3, the classification accuracy (in %) of the DWM ensemble framework on BCI competition 2008 (dataset IIa) is slightly higher than that of the individual classifiers for all subjects except A02 (a decrease from 70 to 68.2415) and A08 (a decrease from 95 to 94.7469). The overall average is nevertheless higher than that of the individual classifiers, rising from 85.6 to 86.615444.
Similarly, according to Table 4, the classification accuracy (in %) of the DWM ensemble framework on BCI competition 2003 (dataset IVa) is higher than that of the individual classifiers for all subjects except al (a decrease from 99.032285 to 98.94511). The overall average again increases, from 83.8 to 84.9 (rounded values).
Table 5 reports the DWM results in terms of event-specific imagery movement and overall accuracy, together with the kappa values obtained during the experiment. The kappa value compares the observed accuracy with the accuracy expected by chance. The kappa values are considerably high for all subjects except A04 and A06 using the DWM ensemble classification framework.
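The comparison of observed and expected accuracy can be made concrete with Cohen's kappa, kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed accuracy and p_e is the agreement expected by chance from the row and column totals of the confusion matrix. The sketch below assumes this standard definition; the function name and the confusion matrix are hypothetical and are not results from this paper.

```python
def cohen_kappa(confusion):
    """Cohen's kappa from a square confusion matrix (rows: true, cols: predicted)."""
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    # Observed accuracy: fraction of trials on the diagonal.
    observed = sum(confusion[i][i] for i in range(n_classes)) / total
    # Chance agreement: product of matching row and column totals.
    expected = sum(
        sum(confusion[i]) * sum(row[i] for row in confusion)
        for i in range(n_classes)
    ) / total**2
    return (observed - expected) / (1 - expected)


# Hypothetical two-class example: 80% accuracy on balanced classes.
example = [[40, 10], [10, 40]]
print(round(cohen_kappa(example), 3))  # prints 0.6
```

For this balanced example the chance agreement is 0.5, so an accuracy of 0.8 corresponds to a kappa of 0.6.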
This paper proposes a classification approach for left-hand and right-hand motor imagery EEG signals. The triadic wavelet transform is explored together with common spatial patterns to extract features from multivariate EEG signals, and Dynamic Weighted Majority ensemble classification is then used for the final class label decision. Four statistical features were computed over the CSP coefficients of the wavelet-decomposed signals. These features were classified using four well-known classification techniques (kNN, SVM, Naïve Bayes, and J48), and the results were compared with the DWM ensemble classification framework. All four classifiers found the proposed wavelet-based features effective for discriminating between different EEG activities, but the ensemble-based learning method gave the highest accuracy of all. DWM classification achieved an average sensitivity, specificity, and accuracy of 85.4%, 86.5%, and 85.6%, with a kappa value of 0.59. The proposed non-dyadic filter bank based DWM ensemble classification approach thus outperformed the single-classifier techniques.
Conclusion and future work
The statistical properties of data change over time, which affects the accuracy of any machine learning model. The hidden relationships between variables differ between training and testing data, i.e., the data exhibit concept drift. An ensemble of classifiers uses diverse information to handle this concept drift during classification. This paper focuses on the classification of extracted and selected CSP features using a Dynamic Weighted Majority (DWM) based classifier. The DWM arrangement is used to detect concept drift in the dataset and so overcome the loss of information in the EEG signal. The proposed model overcomes the statistical and representational restrictions of conventional techniques by using a collection of heterogeneous classifiers to capture different statistical properties of a variable in the training data. This leads to accurate prediction of the statistical properties of the variable in the training data, and the empirical results support this. The proposed work may be extended to multi-class classification, since the selected data contain a four-class problem. Better selection of classifiers and further study of unsupervised algorithms may be combined with the proposed work to achieve useful results on the four-class problem. Our future work will address early dynamic change of the weights for the set of ensemble classifiers, as an extension of the DWM ensemble classification framework.
