Motor imagery task classification using intelligent algorithm with prominent trial selection

Abstract

Brain Computer Interface (BCI) enables us to record and process the information generated by the brain and process them. Due to high variability of the Electroencephalogram (EEG) data, multiple trails are recorded for a particular task. The present work aims to improve the accuracy for motor imagery task classification by selecting the most prominent trail from the multiple trails recorded during motor imagery. In this paper, we propose a novel weight optimization algorithm for common spatial filtering (CSP) using evolutionary algorithms (i.e. cuckoo search algorithm (CSA), firefly algorithm (FA) and gravitational search algorithm (GSA)) to select the most prominent trial from the multiple trails recorded for feature extraction. The features extracted from the selected trials were thus used for motor imagery task classification. The performance was evaluated on the extracted features from the selected trials using two classifiers namely linear discriminant analysis (LDA) and support vector machines (SVM). It is observed that FA with band power as a feature gives the best performance in comparison to the earlier reported methods i.e. average, error based and alternating direction method of multipliers (ADMM).

Keywords

Brain computer interface common spatial pattern cuckoo search algorithm electroencephalography firefly algorithm gravitational search algorithm linear discriminant analysis support vector machine

1 Introduction

Brain computer interface (BCI) is one of the recently developed paradigm attempting to merge the fields of machine learning and neuroscience. BCI acts as an interface to facilitate communication between the brain and the external world. A non-invasive method of recording the brain activity is through the EEG (Electroencephalogram) but the recordings have high variability, susceptibility to noise, low spatial resolution. Assessing the quality of these trials is a challenge, which is overcome by weighting the trails according to their quality. For example real and imagery changes of hand and feet movement invoke different signals in different regions of the brain. So it is important to capture the changes of EEG signal according to the different motor tasks such as hand movement, leg movement, tongue movement, arm movement etc. which will lead to better classification [1]. EEG signals are non-stationary and does not present any discriminative information. CSP assigns weights to the channels in a manner that it maximizes the variances and produces the maximum distinction between any two classes of EEG data. Physically, the goal of CSP is to reduce the volume conduction effect and localize the EEG to specific channels. The basic framework of CSP is to create a covariance matrix which generates the maximum variance. The generation of the covariance/weight matrix can be viewed as an optimization problem, diagonalization and Eigen value decomposition methods have been used for the purpose, but they fail to produce discriminative information for multi-class scenario. For multiple trials, such matrices for each trails are averaged to from an average covariance matrix to get a generalized spatial filter, which would be able to give better classification results. However, poor quality trails or the trails contaminated with artifacts leads to a poor estimation of the covariance. The problem of finding optimal weights for different trials is highly non-linear. Hence the present work aims only to generate an optimal weight matrix (CSP filter) for spatially filtering motor-imagery data for multi trail scenario using meta-heuristic algorithms (cuckoo search algorithm (CS), firefly algorithm (FA) and gravitational search algorithm (GSA)). The weight matrix thus generated was used to extract features followed by classification of the EEG data using linear discriminant analysis (LDA) and support vector machine (SVM) and finally comparing them with the existing methods.

2 Literature survey

The proposed work focuses on the development of an optimally weighted common spatial filter. Hence the literature survey presents the various types of CSP’s available. The common spatial pattern (CSP) technique is an unsupervised spatial filtering approach which is well known in the field of neuro-science. It filters out the unwanted channels and assign weights to only those channels that are important. The CSP Algorithm tries to maximize the covariance between the channels and filters out rest of the channels and gives them in order [9]. CSP does not consider the temporal filtering assuming it to be performed separately. To rectify this further work was carried out where spatial and temporal filtering were performed together to achieve more optimality. In common spatio-spectral patterns (CSSPs) [9] algorithm an augmented data matrix is constructed for each trial that contains two copies of the signal. In common sparse spectral spatial pattern (CSSSPs) [10] multiple time copies of the signal are considered for a channel and a single temporal filter is constructed for all the channels, thus giving better flexibility over the CSSP but a closed form solution is not available. Discriminative Filter Bank Common Spatial Pattern (DFBCSP) [11] algorithm maximizes the variance between the classes by designing multiple spatio-temporal filter pairs. The algorithm cannot optimize higher-dimensional spatio-temporal filters. Several algorithms were designed to optimize the spatial and temporal filter at different stages. These algorithms can be divided into two categories; one is iterative and the other is sequential [11]. It is also important to point out that FBCSP filters perform distinct frequency filtering on the channels but CSP filters different frequency sub bands. In case of iterative algorithms SPEC-CSP and ISSPL [12] the frequency and spatial filters are changed iteratively with fisher ratio for optimizing spectral filters by creating a maximum margin hyper plane. Up till now all algorithms only tried to optimize the weights only in a single trial not considering the case of multi-trial where each trials are different. Assigning weights to the trials according to their quality was then proposed. The first approach was introduced where the weights were assigned to the trial based on the error after joint diagonalization [9]. It is not enough to simply assign weights according to the quality of the trials, removal of the low quality trials to reduce the impact of the low quality trial is important, another method was proposed to assign weights to the trials by joint diagonalization [15]. The ADMM weight optimization technique was used, where it was proposed to assign weights to different trials based on their quality [14]. Riemannian geometry was used for averaging covariance matrices which improved the performance for small dimensional problems, but decreased with the increase in dimensionality [20]. A sparse filter band CSP (SFBCSP) was described by Zhang et al. where CSP features were estimated from multiple signals filtered at a set of overlapping bands. The filter bands that produced the most discriminative features were selected in a supervised way by using sparse regression. However the authors only presented errors in classification accuracies without the actual value of classification accuracy [21]. Alessio et al. used generic learning to reduce the covariance matrices of two classes into an identity matrix and a generic covariance matrix and obtained a good classification accuracy of 60–70% but on different dataset [22]. Shiu et al. proposed an improved DFBCSP method using mutual information for selecting the bands that gave the most optimal results and then features were calculated onthem [23].

3 Methodology

The steps followed in the proposed approach are given in the Fig. 1 for the motor imagery classification. First the signals are preprocessed, followed by extraction of features from the signals and finally classification on the extracted features.

Fig.1

Methodology for classification using trial selection.

4 Preprocessing

In the preprocessing phase the signal is prepared for processing, where artifacts are removed from the signal, weights are assigned to the trials based on their quality and finally spatially filtered for classifying the signals.

4.1 Artifact removing

Artifact can be described as noise which corrupts the EEG signals. They originate from various sources, which include the power-line interference, body and eye movements made by the subject himself or from the blinking of eyes by the subject. There are three main approaches to handle eye artifacts: avoidance (avoid the artifact and let them pass for further processing), rejection (reject the trials which contain artifact) and removal (process the signal to remove the artifacts). In the removal approach, the simplest method is to perform band pass filtering because eye artifact lies in 0–4 Hz range. Other methods of artifact removal include linear regression, blind source separation, and wavelet transform.

4.2 Trial selection

The EEG signals are highly variable and have a low spatial resolution in nature, hence to ensure the reliability of the EEG trials for motor imagery task multiple trials are recorded.

The various ways to get the prominent trials are as given below.

4.2.1 Average

Each trial plays equal part in average selection. $S_{avg} = \frac{1}{s} S_{1} + \frac{1}{k} S_{2} \dots + \frac{1}{k} S_{k - 1} + \frac{1}{k} = \frac{Σ_{k \in d_{i}}^{k} s_{k}}{K}$ (1)

4.2.2 Weighted average

Weights are assigned to each trial so that better quality trials contribute more and bad quality trials contribute less. $S_{avg} = w_{1} S_{1} + w_{2} S_{2} + \dots + w_{k} S_{k} = Σ_{k \in d_{i}}^{k} w_{k} S_{k}$ (2)

Where $\sum_{i = 1}^{k} w_{i} = 1$ . From the above equation it can be concluded that the low quality trials contribute equally in the average method. In weighted approach weights are assigned according to the quality of the trials. In order to assign weights it is required to find the quality of the trials. For this Approximate Joint diagonalization approach is used. The Approximate Joint Diagonalization of a set of square matrices is to find orthonormal changes in the basis which make the matrices diagonal. When the matrices commute, it can be diagonalized exactly, but if they donot commute, a joint diagonality criterion is determined [10 –14]. Two most popular methods for Approximate Joint Diagonalization are:

Fast Frobenius AJD (FFDIAG)(uses frobenius norm of matrices to find optimized Eigen vectors).

Jacobi AJD (uses angle to find optimized Eigen vectors) [15].

In the proposed work, Fast Frobenius AJD method is used because it is faster than any other AJD method.

4.2.3 Weight optimization

Given two classes, d₁ and d₂, the weights of k_d trial matrices S_k (k ∈ d_i) is designed. The matrix should be diagonalizable if the observed EEG is stationary, but this assumption is not true. Hence, it cannot be diagonalized exactly but an approximate joint diagonality criteria can be considered given as: $S_{k} = U Σ^{k} U^{T} + E^{k} (k \in d_{1} \cup d_{2})$ (3)

Here U is a common factor, E^k and Σ^k are the error matrix and the diagonal matrix at the kth trial respectively. The trials, for which the Frobenius norm of E^k (denoted by ||E^k||_F) is large are regarded as low-quality trials, and hence smaller weights are assigned to those trials. Thus, we simply choose $q_{k} = | | E^{k} | |_{F}$ (4) for a single class. For optimizing the weight vector the fitness function is formed using l1 & l2-norms of matrix [15]. Our fitness function is as follow: ${}_{ω}^{\min}α F_{l 1} + f_{l 2}$ (5)

Where $F_{l 1} = \frac{1}{\sum_{k \in d_{1}} w_{i}} \sum_{k \in d_{1}}^{k} w_{k} E^{k}$ (6)

And $F_{l 2} = \frac{1}{2 \sum_{k \in d_{1}} | | S^{k} | |_{F}^{2}} | | \sum_{k \in d_{1}}^{k} (\frac{1}{k_{d}} - w_{k}) S^{k} | |_{F}^{2}$ (7)

Where S^k is the trial matrix. The value of α is taken as 1, 0.1 and 0.01. But it is observed that 0.01 gives better performance than the rest. Brief description of the various meta-heuristic algorithms used in this work is given in the next section.

5 Common spatial filtering

Common spatial pattern (CSP) is a widespread method used in the domain of signal processing for disassociating a multivariate signal into additive sub-components having the maximum differences in variance between two windows. A CSP filter can be represented as: $S = W^{T} E$ (8)

Where, W = R^p×ch is the spatial filter matrix and S ∈ R^p×times represents the matrix of filtered signal. The CSP can be expressed as: $maximize trace (W^{T} Σ_{1} W)$ (9)

Subject to, W^T (Σ₁ + Σ₂) W = I, Where $\sum_{1} =_{E_{n} \in d_{1}}^{E x p} \frac{E_{n} E_{n}^{T}}{t r (E_{n} E_{n}^{T})}$ (10) $\sum_{2} =_{E_{n} \in d_{2}}^{E x p} \frac{E_{n} E_{n}^{T}}{t r (E_{n} E_{n}^{T})}$ (11)

The above equation can be solved by generalized Eigen value decomposition, i.e. by diagonalizing ∑₁ & ∑₂ and getting the channels that maximizes the covariance between the two signals and satisfies the subjected conditions. Where d1 and d2 represents the classes to be classified. By this method only the desired channels are retained and the unwanted channels are filtered out. CSP is useful only in two class case. For multi-class CSP there are twomethods:

One class vs the rest- In this approach the class for which CSP is being calculated in kept on one side and the rest of the classes are kept on the other side and then CSP algorithm is applied and the optimal channels are found out for further steps.

Combination of any two classes-In this approach combination of any two classes is taken and spatial filtering is performed. By applying this method the unwanted channels are removed from the signals.

6 Meta-heuristic algorithm

A meta-heuristic algorithm is an iterative search process which incorporates intelligent features for enhancing exploration and exploitation in the search for near global optimal solutions. Meta-heuristic algorithms vary in range from simple local search procedures to complex learning processes.

Examples of meta-heuristic algorithms include GSA, Bat Algorithm, Firefly Algorithm, Ant colony optimization etc. In this paper CSA, FA and GSA are used for optimizing the weights of the trials, using the Equation (5) as the objective function and their comparative performance are investigated.

6.1 Gravitational search algorithm

GSA was described by Rashedi et al. in 2009 for solving optimization problems. This is a search strategy which uses the Newton’s law of motion and the law of gravitation. In this algorithm the gravity of each agent with respect to others are compared. Agents are the objects and their fitness’s are taken as Mass. The movement of objects towards each other due to gravitational force results in global movement (exploration). The lighter objects moves fast but heavier objects move slowly. The slow movement of heavier objects results in local search (exploitation). The force between two object is given as: $F = \frac{{GM}_{1} M_{2}}{r^{2}}$ (12)

Where F represents the gravitational force between the objects, G is the universal gravitational constant and M₁ and M₂ are the masses of the two objects respectively and r represents the distance between them. For details about the algorithm [19] may be referred.

6.2 Cuckoo search algorithm

Cuckoo search algorithm mimics the behavior of the cuckoos in obligate brood parasitism, where the cuckoos lay their eggs in a host bird’s nest. From those eggs, the eggs having maximum resemblance to that of the host bird’s eggs have the probability to grow up to a mature cuckoo. Rest of the eggs are detected and destroyed by the host bird. The eggs that survive in a particular area represent the favorability of the nests in that particular area. Based on the probability of discovery by the host bird and the location of the nest, the algorithm optimizes the breeding behavior of the cuckoos and can be used for various optimization problems. For details about the algorithm [16] may be referred.

6.3 Firefly algorithm

Firefly is a meta-heuristic algorithm for optimization based on the behavior of firefly insects. Firefly uses their flashing behavior for sending signal to the opposite sex, to attract other fireflies etc. But mathematically all fireflies are unisex, so any firefly can be attracted to any other firefly. Attractiveness of a firefly is enumerated by its brightness, the brighter the firefly, the more attractive is the firefly and the lesser bright firefly eventually fly towards them. But the intensity of brightness decreases with the increase in distance. If there are no brighter fireflies then they fly randomly. For details about the algorithm [17] may be referred.

7 Feature extraction

We have considered two methods for feature extraction in this work, namely band power and time domain parameter.

7.1 Band power

In this method the Butterworth infinite impulse response filter is used. So the signal x[t] contains only the required frequency components. The average power for a window size w is: $p [n] = \frac{1}{w} Σ_{k = 0}^{w} p [n - k]$ (13)

This means the power is averaged over window w. The final feature value is equal to $ln (\bar{p} [n])$ .

7.2 Time domain parameter

The variance of first k derivative of the signal is called time domain parameter (TDP). $p_{i} (t) = \frac{d^{i} x (t)}{{dt}^{i}}, i = 0, 1, 2, \dots k$ (14)

The obtained value is later smoothed using exponential moving avg. window filter. $y [n] = u . p_{i} [n] - (1 - u) . y [n - 1]$ (15)

Where, p_i is the input signal and y is the filtered signal. The u value is used as a parameter to calculate the time domain parameter. So the final feature is equal to ln(y[n]).

8 Classifier

Classification is the two-step process. In first step the classifier is trained with the help of the data set whose class is already known formally known as training data. And in second step we apply trained classifier on the data set whose class is not known formally called test data. In the proposed work, linear discriminant analysis (LDA) and support vector machine (SVM) have been used for classifying EEG data.

8.1 Linear discriminant analysis

The Fisher’s Linear discriminate analysis (LDA) classifies data based on mean and covariance matrices of patterns for individual classes. A d-dimensional vector x is transformed to a scalar z as: $z = w^{T} x$ (16)

The LDA gives an optimal projection w so that the distribution of z is easy to discriminate. The creation of LDA is given by $max J (w) = \frac{{(m_{1} - m_{2})}^{2}}{s_{1} + s_{2}}$ (17)

Where, m₁ and m₂ denote averages for Z_n ∈ class₁ and Z_n∈ class₂, respectively. S₁ and S₂ denote the scatters for Z_n∈ class₁ and Z_n∈ class₂, respectively.

8.2 Support vector machine

Support vector machine classifier is a separating hyper-plane. This hyper-plane is chosen such that it maximizes the margin between two classes. So the most important training points are the support vectors. The hyper-plane is defined as $y = w^{T} x + b$ (18)

The goal of this hyper-plane is to divide the classes in such a way that it maximizes the margin between the classes. In this work soft margin method is used which tolerates some error to maximize the margin. The soft margin tool introduces a C parameter which is used to control the importance of proper classification of the classifier. If the value of the C parameter is large then the algorithm does not allow miss-classification. SVM can be transformed into the nonlinear method of classification if nonlinear kernel function is used. In this work Gaussian kernel function is used as follow. $K (x, y) = e^{- \frac{{| | x - y | |}^{2}}{2 σ^{2}}}$ (19)

The value of the function is in between 0 and 1 depending on the similarity of x and y (1 if the two values are equal). If the features are calculated using the Gaussian kernel, non-linear class boundaries can be learned by the SVM classifier. The σ parameter (standard deviation) controls the width of the kernel function (the larger the σ value, the wider the function).

9 Experiments

9.1 Data description

Data is taken from BCI Competition IV-2a dataset provided by Institute for Knowledge Discovery, Graz University of Technology Austria. The data describes four different motor imagery tasks imagining left hand movement, right hand movement, both feet movement and tongue movement collected from 9 subjects employing a cue based paradigm. Two sessions were recorded and each session consisted of 6 runs. Again each run consisted of 48 trial (12 trial for each class). Only session 1 is used for processing and training, session 2 data is kept for testingpurpose.

9.2 Evaluation method

In BCI Competition IV for result submission a kappa Coefficient was defined such that κ ∈[0, 1]. The value of the kappa coefficient can be calculated using the following equation $k = \frac{p_{0} - p_{e}}{1 - p_{e}}$ (20) where, p₀ is the classification accuracy and p_e is the hypothetical accuracy of a random classifier on the same data. In our case, p_e = 0.25, we calculated the κ value for each subject and then averaged over 9-subjects and found the mean κ value. 8-fold cross-validation is used to avoid over fitting.

9.3 Experimental set-up

The coding and simulation is done in Octave 4.0.0 environment. In addition, the biosig toolbox is used to load the EEG data (.gdf) in octave. The whole experiment was carried out on standard computer configuration with intel core i7 CPU processor and 4 GB RAM.

9.4 Results

Implementation starts from preprocessing the EEG data, the implementation contains both the cases, one is average weighted and the other is optimally weighted using the meta-heuristic algorithms as discussed before for trial selection over the trials of motor imagery task. Two feature extraction techniques are used, one is band power and the other is time domain parameter. Two classifiers (LDA, SVM) are applied to further classify the result. The steps are described below.

9.5 Pre-processing

The EOG channels are removed. These channels are ignored from further processing as they contain eye activity, only the channels relating to the brain activity are considered. Not only do the signals contain the artifacts relating to the eye, but also other kinds of artifacts like the higher frequency components (More than 30 Hz) are also present. To remove them, the Butterworth band pass filter that only allows 7–30 Hz frequency components (smoothing window size is 1 sec) is used. Rest frequency components are filtered out.

9.6 Trimming

As mentioned in the dataset, each trial was recorded in 8 second paradigm. In which initially black screen was shown to the subjects. And then short acoustic warning tone was presented to make them alert. Then after 2 seconds the motor imagery task to be performed was shown, 1 second for the subject to think about the motor imagery task and then to imagine it in his mind. After 6 second again the subject was allowed to rest. It was assumed that the imagination of the motor task began after 3 seconds and ended at 6th second. The signals other than that are trimmed off so that better accuracy in classification is achieved.

9.7 Multi-trail selection

Trimmed signals are provided by the previous preprocessing step. In context of the data set there are 72 trials for each class for a particular subject. Because trials are recorded in two sessions for 9 subjects one is for training purpose and the other for testing purpose. So in training session for each subject 6 runs are recorded and in each run 48 trials are recorded (12 trials for four class). So there are 288 trials for 4 classes and for one class there are 72 trials for each subject. The trimmed signal matrix for each trial is of size 1500×23 where 1500 is the total number of frames (recording frequency is 250 Hz and the signal is trimmed for 3 second (3–6 sec) i.e. 250×3 = 750 samples for 2 sessions for a total of 25 channels in which 3 EOG channels are removed leaving only 23 channels. Now the quality of the trial is found by FFDIAG algorithm and then the weight vector is optimized with the help of meta-heuristic algorithms as discussed previously. The population size in meta heuristic algorithm is varied from 100–250 and the accuracy for all the features are given in Table 1 (In Table 1 each result is run 5 to 7 times and then the average is taken to overcome the randomness effect). So from the table it is concluded that after an initial population size of 200 the results converge and does not change too much. So initial population is set to 200 and all meta-heuristic algorithms are allowed to optimize the fitness function for 1000 iterations and the convergence curve is shown in theFig. 2.

Fig.2

Convergence curves for Subject 2.

Table 1

Mean k value for different initial population with fixed iteration size (200)

	Initial Population Size
Features		100	150	200	250
Band Power	Train	0.671466	0.68347	0.68447	0.68724
Band Power	Test	0.5294	0.5396	0.53376	0.54042
Log Power	Train	0.597223	0.598426	0.59991	0.59414
Log Power	Test	0.422393	0.43168	0.43377	0.43818
Log Amplitude	Train	0.418726	0.41855	0.41924	0.4177
Log Amplitude	Test	0.1652	0.172593	0.17396	0.17464
Log Power & Log Amplitude	Train	0.62004	0.62379	0.62397	0.62402
Log Power & Log Amplitude	Test	0.334723	0.33574	0.36385	0.36395

From the plot in Fig. 2 and results it is observed that gravitational search algorithm in all classes for each subject does not optimize the weights that efficiently but the firefly algorithm optimizes the function much better and also converges quickly. The performance of the cuckoo algorithm is somewhere in between both the algorithms. It should be noted here that GSA and firefly converge in between 300 to 400 iterations but cuckoo algorithm converges much faster.

9.8 Classification with the features

The optimized weights of the trials for different subjects for each class are obtained. So after applying these weights on the trials, features are extracted and the classifiers are applied. Obtained classification accuracy in term of κ(kappa) and percentage for both training and testing data are given in Tables 3 and 4 respectively. Besides the best classification accuracies for individual subjects are presented in Table 2 along with the feature and the classifier for which the best results were obtained. From the Table 2 it is observed that Firefly Algorithm for weight selection, Band power feature and SVM as classifier yields the best accuracy. Figure 3 gives the plot for the average kappa value for the subjects. The accuracies for the training data have been refined with the help of 8-Fold cross validation and are presented in Table 3 and for testing data are presented in Table 4. It is observed that our proposed method gives better accuracy in comparison to the other reported methods such as average based, ADMM based and error based for selection of trail for motor imagery taskclassification.

Fig.3

Average Classification Accuracy in kappa with Band power as feature and SVM as classifier for all subjects.

Table 2

Best Classification Accuracies for the subjects with Firefly Algorithm for weight Optimization, Band Power as Feature and SVM as Classifier

Subject Number	Best Accuracy	Best Accuracy
Number	in kappa (k)	in percentage (%)
Subject 1	0.98833	99.1%
Subject 2	0.80093	85%
Subject 3	0.98685	99%
Subject 4	0.82926	87.1%
Subject 5	0.71759	78.8%
Subject 6	0.77315	82.9%
Subject 7	0.93056	94.7%
Subject 8	0.97685	98.2%
Subject 9	0.96296	97.2%

Table 3

Average classification accuracies for various features using LDA and SVM as classifier in terms of kappa (k) and percentage (%) using 8 fold cross validation for training data

Classifier	Feature	Classification Accuracy: in kappa (k)
Classifier	Feature	Average [9]	Error [15]	ADMM [14]	GSA (Proposed)	Cuckoo (Proposed)	Firefly (Proposed)
LDA	Band Power	0.61265	0.62448	0.67335	0.65175	0.66358	0.67489
	Log Power	0.52881	0.62448	0.58076	0.65175	0.66358	0.67489
	Log Amplitude	0.393	0.39866	0.40277	0.41307	0.43512	0.45666
	Log Power &Log Amplitude	0.56173	0.57201	0.61625	0.60031	0.60597	0.62191
SVM	Band Power	0.62968	0.83024	0.68724	0.85493	0.86111	0.87294
	Log Power	0.65381	0.63631	0.69598	0.69341	0.67695	0.71165
	Log Amplitude	0.39249	0.38323	0.40535	0.39814	0.41152	0.42924
	Log Power &Log Amplitude	0.71656	0.70524	0.74897	0.74434	0.7506	0.77772
		Classification Accuracy: in percentage (%)
		Average [9]	Error [15]	ADMM [14]	GSA	Cuckoo	Firefly
LDA	Band Power	70.9%	71.8%	75.5%	73.8%	74.7%	75.6%
	Log Power	64.6%	71.8%	68.5%	73.8%	74.7%	75.6%
	Log Amplitude	54.4%	54.8%	55.2%	55.9%	57.6%	59.2%
	Log Power &Log Amplitude	67.1%	67.9%	71.2%	70%	70.4%	71.6%
SVM	Band Power	72.2%	87.2%	76.5%	89.1%	89.5%	90.4%
	Log Power	74%	72.7%	77.1%	77%	75.7%	78.3%
	Log Amplitude	54.4%	53.7%	55.4%	54.8%	55.8%	57.1%
	Log Power &Log Amplitude	78.7%	77.8%	81.1%	80.8%	81.2%	83.3%

Table 4

Average classification accuracies for various features using LDA and SVM as classifier in terms of kappa (k) and percentage (%) for testing data

Classifier	Feature	Classification Accuracy: in kappa (k)
Classifier	Feature	Average [9]	Error [15]	ADMM [14]	GSA	Cuckoo	Firefly
LDA	Band Power	0.45906	0.45838	0.50211	0.51824	0.51687	0.54828
	Log Power	0.37881	0.45838	0.42714	0.51824	0.51687	0.53828
	Log Amplitude	0.15764	0.15581	0.18936	0.16953	0.20106	0.22844
	Log Power &Log Amplitude	0.29893	0.30833	0.33875	0.37321	0.38787	0.44148
SVM	Band Power	0.46116	0.38364	0.49717	0.4844	0.48818	0.49891
	Log Power	0.33958	0.36260	0.39892	0.38938	0.39735	0.41318
	Log Amplitude	0.14762	0.13107	0.17472	0.13932	0.18902	0.19657
	Log Power &Log Amplitude	0.27435	0.28784	0.32173	0.29924	0.32925	0.34912
		Classification Accuracy: in percentage (%)
	Band Power	Average [9]	Error [15]	ADMM [14]	GSA	Cuckoo	Firefly
LDA	Log Power	59.4%	59.3%	62.6%	63.8%	63.7%	66.1%
	Log Amplitude	53.4%	59.3%	57%	63.8%	63.7%	65.3%
	Log Power &Log Amplitude	36.8%	36.6%	39%	37.7%	40%	42.1%
SVM	Band Power	47.4%	48.1%	50%	53%	54.0%	58.1%
	Log Power	59.5%	53.7%	62.2%	61.3%	61.6%	62.4%
	Log Amplitude	50.4%	52.1%	54.9%	54.2%	54.8%	56%
	Log Power &Log Amplitude	36%	34.8%	38.1%	35.4%	39.1%	39.7%
	Band Power	45.5%	46.5%	49.1%	47.4%	49.6%	51.1%

10 Conclusion and future work

This paper investigates the performance of intelligent search algorithms in selecting the prominent trails to be used for motor imagery task classification and further two types of classification methods have been applied. It has been observed that all three meta-heuristic algorithms have performed better than all other methods reported earlier. However, amongst the three algorithms firefly algorithm gives the best performance in terms of faster convergence, and better quality of solutions. These optimized weight vectors provide the most prominent trials for each subject to be used in feature extraction via CSP, so that the most discriminative features from multiple trials of a subject is extracted, followed by classification by two methods, LDA and SVM. The classification accuracy is tested for different features and different classifiers. From the results it can be concluded that among all weight optimization methods for trail selection, firefly algorithm always gives the best results with band power as feature and SVM as the classifier. Further, this work discussed only about the weight selection methods of the trials and thus this idea can be extended to individual frames in a single trial so that only those frames which contain the most discriminative information can be selected and not the other frames of the trials.

Footnotes

Acknowledgments

The above work was supported by the Ministry of Communications & Information Technology No. 13(13)/2014-CC& BT, Government of India.

References

Graimann

, Allison

and Pfurtscheller

, brain– computer interfaces: A gentle introduction, In Brain-Computer Interfaces (2010), 1–27. Springer, Chapter 1.

Sanei

, Ferdowsi

, Nazarpour

and Cichocki

, Advances in electroencephalography signal processing [Life Sciences], IEEE Signal Process Mag30(1) (2013), 170–176.

Blankertz

, Tomioka

, Kawanabe

L.M.

and Müller

K.R.

, Optimizing spatial filters for robust EEG single-trial analysis, IEEE Signal Process. Mag25(1) (2008), 41–56.

Ramoser

, Müller-Gerking

and Pfurtscheller

, Optimal spatial filtering of single trial EEG during imagined hand movement, IEEE Trans. Neural Syst.Rehabil. Eng8(4) (2000), 441–446.

, Chen

, Gao

and Brown

E.N.

, A hierarchical Bayesian approach for learning sparse spatio-temporal decompositions of multichannel EEG, NeuroImage56(4) (2015), 1929–1945.

, Chen

, Gao

, Li

, Brown

E.N.

and Gao

, Probabilistic common spatial patterns for multichannel EEG analysis, IEEE Trans. Pattern Anal. Mach.Intell37(3) (2015), 639–653.

and Guan

, An extended EM algorithm for joint feature extraction and classification in brain-computer interfaces, Neural Comput18(11) (2011), 2730–2761.

Bashashati

, Fatourechi

, Ward

R.K.

and Birch

G.E.

, A survey of signal processing algorithms in brain– computer interfaces based on electrical brain signals, J. Neural Eng4(2) (2005), R32–R57.

Lemm

, Blankertz

, Curio

and Müller

K.R.

, Spatio-spectral filters for improving the classification of single trial EEG, IEEE Trans Biomed Eng52(9) (2005), 1541–1548.

10.

Dornhege

, Blankertz

, Krauledat

, Losch

, Curio

and Muller

K.R.

, Combined optimization of spatial and temporal filters for improving brain-computer interfacing, IEEE Trans Biomed Eng53(11) (2006), 2274–2281.

11.

Higashi

and Tanaka

, Simultaneous design of FIR filter banks and spatial patterns for EEG signal classification, IEEE Trans Biomed Eng60(4) (2013), 1100–1110.

12.

, Gao

, Hong

and Gao

, Classifying single-trial EEG during motor imagery by iterative spatio-spectral patterns learning (ISSPL), IEEE Trans Biomed Eng55(6) (2008), 1733–1743.

13.

Tomida

, Higashi

and Tanaka

, A joint tensor diagonalization approach to active data selection for EEG classification, Department of Electrical and Electronic Engineering Tokyo University of Agriculture and Technology (2013), 983–987.

14.

Tomida

, Tanaka

, Ono

, Yamagishi

and Higashi

, Active data selection for motor imagery EEG classification, IEEE Trans Biomed Eng62(2) (2015), 458–67.

15.

Tomida

, Higasi

and Tanaga

, A joint tensordiagonalization approach to active data selection for EEG classification, IEEE Int Conf Acoust, Speech, Signal Processing (2013), 983–987.

16.

Osman,

I.H.

and Laporte,

, Metaheuristics: A bibliography, Ann Oper Res63 (1996), 513–623.

17.

Yang

X.S.

and Deb

, Cuckoo search via levy flights, world congress on nature & biologically inspired computing(NaBIC), IEEE Publications (2009), 210–214.

18.

Fister

, Fister

Jr , Yang

X.S.

and Brest

, A comprehensive review of firefly algorithms, Journal of Swarm and Evolutionary Computation13(1) (2013), 34–46.

19.

Yang

X.S

, Natural-Inspired Metahuristic Algorithm, Luniver Press, UK, (2008).

20.

Rashedi

, Nezamabadi-pour

and Saryazdi

, GSA: A gravitational search algorithm, Information Sciences179(13) (2009), 2232–2248.

21.

Yger

, Lotte

and Sugiyama

, Averaging covariance for EEG signal classification based on the CSP: An empirical study, 23rd European Signal Processing Conference (EUSIPCO) (2015), 2771–2775.

22.

Zhang

, Zhou

, Jin

, Wang

and Cichocki

, Optimizing spatial patterns with sparse filter bands for motor-imagery based brain– computer interface, Journal of Neuroscience Methods255 (2015), 85–91.

23.

Buccino

A.P.

, Keles

H.O.

and Omurtag

, Hybrid EEG-fNIRS asynchronous brain computer interface for multiple motor tasks, PLOS ONE, doi:10.1371/journal.pone.0146610January 5, (2016).

24.

Kumar

, Sharma

and Tsunoda

, An improved discriminative filter bank selection approach for motor imagery EEG signal classification using mutual information, kumar et al. BMC Bioinformatics18(Suppl 16) (2017), 545. doi:10.1186/s12859-017-1964-6, pp. 125–137.