Abstract
In order to develop an efficient brain-computer interface (BCI) system, the brain activity measured by electroencephalography (EEG) needs to be decoded accurately. In this paper, a motor imagery (MI) classification approach is proposed that combines virtual electrodes on the cortex layer with a convolutional neural network (CNN) and effectively improves the decoding performance of the BCI system. A three-layer (cortex, skull, and scalp) head volume conduction model was established with the symmetric boundary element method to map the scalp signals to the cortex. Nine pairs of virtual electrodes were created on the cortex layer, and time and frequency features were extracted from the virtual electrode signals by time-frequency analysis. Finally, the CNN was used to classify the MI tasks. The results show that the proposed approach converges in both training and testing. On the Physionet MI database, the averaged accuracy reaches 98.32% for individual subjects, while the group-wise averaged accuracy, Kappa, precision, recall, and F1-score are 96.23%, 94.83%, 96.21%, 96.13%, and 96.14%, respectively. On the High Gamma database, the averaged accuracy reaches 96.37% and 91.21% at the subject and group levels, respectively. Moreover, the approach outperforms other studies on the same databases, which suggests robustness and adaptability to individual variability.
Introduction
Currently, clinical evaluation of electroencephalography (EEG) is essential for monitoring the electrical activity of the brain. As an important diagnostic tool, EEG is used to investigate patients with unexplained changes in mental state and provides valuable information for the neurophysiological evaluation of patients with epilepsy and brain injuries [1, 2]. A brain-computer interface (BCI) is a channel through which the brain communicates with the outside world. Through this interface, electrical signals are read directly from the human brain, their meanings are analyzed, and they are converted into control signals for external devices. BCIs can assist patients with movement disorders, such as amyotrophic lateral sclerosis (ALS) and spinal cord injury (SCI), as well as patients with epilepsy and brain injuries, in rehabilitation training [3–8].
The key is to identify motor imagery (MI) from EEG and translate it into specific motor intentions. EEG is the overall manifestation, at the scalp surface, of the electric potentials generated by the electrophysiological activity of nerve cells in the brain. It is measured non-invasively with electrodes fixed to the head as sensors. Signals from deep brain regions are attenuated as they pass through multiple layers of brain tissue to the scalp, which results in a low signal-to-noise ratio (SNR), susceptibility to interference, difficult feature extraction, and low classification accuracy. Therefore, how to extract motor intention information from MI EEG signals quickly and efficiently has become a focus of current research. In this study, a deep learning framework is developed to meet this challenge.
Literature survey
Recently, deep learning (DL) has attracted extensive attention due to its excellent performance in multiple fields. Many studies have shown that DL plays a key role in EEG-based motor intention detection and has achieved satisfying results [5, 9]. Kim et al. proposed the SUTCCSP algorithm, which achieved an accuracy of 77.77% [10]. Kumar et al. proposed the CSP-DNN framework, which obtained an accuracy of 90% [11]. Sakhavi et al. used transfer learning, and their FBCSP algorithm achieved an accuracy of 70% in classifying four types of EEG [12]. Pinheiro et al. used the RNA algorithm, which obtained an accuracy of 74.965% [13]. Corley et al. proposed a novel deep EEG super-resolution (SR) approach based on generative adversarial networks (GANs), which achieved an accuracy of 82% [14].
As a promising EEG recognition method, the convolutional neural network (CNN) has also been applied to decode EEG signals in BCIs [15]. In 2019, Hou et al. created 10 MI-EEG-sensitive areas on the cortex, performed time-frequency analysis on them, and proposed an ESI and CNN method that achieved a global accuracy of 94.5% for four-class classification on the Physionet database [16]. Also in 2019, Li et al. divided the MI period and the frequency band covered by the mu and beta rhythms into ten time windows and three sub-bands, respectively, applied the fast Fourier transform to the signal of each electrode, and took the resulting average power as the time-frequency features of MI EEG; the global accuracy of their VGG network reached 92.28% [17]. Kwon et al. (2019) proposed a subject-independent framework based on deep CNNs that formulates the discriminative feature representation as a combination of spectral-spatial inputs embedding the diversity of EEG signals [18]. Amin et al. (2019) proposed the MCNN method, which fuses different features and CNN architectures and achieved accuracies of 75.7% and 95.4% on the BCI Competition IV-2a dataset and the High Gamma dataset, respectively [19]. In 2020, they also proposed a multiple-CNN feature fusion architecture with variable filter sizes and split convolutions that extracts spatial and temporal information from raw EEG data and shows potential for four-class MI research [20]. In 2020, Lun et al. selected the raw MI EEG signals of nine pairs of electrodes over the motor cortex and used a deep CNN to classify the subjects' four MI tasks on the Physionet database, achieving an average accuracy of 97.28% [21]. In the same year, Song et al. proposed an MI EEG recognition method based on S-transform time-frequency images combined with a CNN and an extreme learning machine (ELM), and successfully applied it to a BCI competition dataset [22], and Roy et al. used a CNN framework to continuously decode subjects' MI EEG signals, reaching an average continuous decoding accuracy of 71.49% across subjects on the BCI Competition IV-2b dataset [23].
Although many researchers have made remarkable progress in this field, the classification accuracy of BCI systems still has room for improvement and falls short of the standard required for practical application. The large amount of multidimensional data increases the processing and computational burden, which also hinders the transfer of research results into practice. In this paper, an innovative MI recognition method based on virtual electrode pairs is presented. First, the EEG signals are inverted from the scalp layer to the cortical regions of the brain, creating nine pairs of virtual electrodes with equal coverage, each pair carrying information with a higher SNR. Then, time-frequency analysis is performed to extract time and frequency features from the virtual electrodes on the cortex layer. Finally, a new CNN model is proposed that classifies the MI tasks using only one pair of virtual electrodes as the data source, which yields excellent MI EEG classification performance.
Contribution of this paper
The main contributions of this paper are summarized as follows.
(1) A novel CNN structure is introduced to detect four-class MI intentions based on the virtual EEG electrode pairs.
(2) The subject-specific and group-wise performance on two public MI databases is superior to that of existing studies, which verifies that this method can decode the relevant EEG features.
(3) The framework requires only one pair of virtual electrodes, which reduces the sample dimension and the processing difficulty. As a result, the MI classification task is easy to transfer and implement, which offers a new way to simplify the design of BCI systems.
Organization of this paper
This paper is organized as follows: In section 2, the dataset, preprocessing and CNN are introduced. In section 3, the results of classification are analyzed and discussed in detail. In section 4, a summary is made.
Materials and methods
The framework
The proposed framework is shown schematically in Figure 1.

The framework of the proposed approach.
(1) The original EEG data of 64 channels were obtained from the Physionet MI EEG database, including four MI tasks: left fist, right fist, both fists and both feet.
(2) EEG signals are mainly contaminated by external non-biological noise and internal biological noise. External non-biological noise mainly refers to power-line interference. The Physionet MI database was recorded with the BCI2000 system at a power-line frequency of 60 Hz [24, 25], so a 60 Hz notch filter was used to remove power-line interference. Internal biological noise is generated by other biological activities of the subject. Among the sensorimotor rhythms (SMRs) contained in EEG signals, the mu and beta rhythms carry the most relevant brain activity during imagined limb movements; they correspond to the alpha (8-13 Hz) and beta (13-30 Hz) frequency bands [26]. Therefore, an 8-30 Hz band-pass filter was used to remove low-frequency respiratory and eye movement noise (<1 Hz) and high-frequency muscle noise (40-140 Hz).
(3) Because the subjects' individual brain anatomy (MRI) was not available, the Colin template [27] was used, and the MRI and EEG were co-registered at three fiducial points: the nasion and the left and right preauricular points [28]. A realistic head model was then established, and the scalp EEG was inverted to the cortex layer by source estimation, which involves solving the forward and inverse problems.
(4) Nine pairs of virtual electrodes were created on the cortex layer, and their signals were subjected to the Morlet wavelet time-frequency analysis.
(5) A CNN classification model was proposed that uses four convolutional layers to learn EEG features, four max pooling layers to reduce dimensions, and a fully connected (FC) layer to classify the four MI tasks.
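The filtering in step (2) can be sketched with SciPy as follows. This is a minimal sketch under assumptions: the text specifies only the 60 Hz notch and the 8-30 Hz pass band, so the fourth-order Butterworth design, the notch quality factor Q = 30, and zero-phase (forward-backward) application are illustrative choices, and `preprocess` is a hypothetical helper name.

```python
import numpy as np
from scipy.signal import butter, filtfilt, iirnotch, sosfiltfilt

FS = 160.0  # Physionet MI sampling rate in Hz


def preprocess(eeg, fs=FS, notch_hz=60.0, band=(8.0, 30.0), order=4, q=30.0):
    """Remove power-line interference, then keep the mu/beta band.

    eeg: array of shape (n_channels, n_samples); filtering runs along the
    last axis. `order` and `q` are assumed values, not taken from the paper.
    """
    # IIR notch centred on the mains frequency, applied forward and backward
    # so that the filtered signal has no phase shift.
    b, a = iirnotch(notch_hz, q, fs=fs)
    x = filtfilt(b, a, eeg, axis=-1)
    # Butterworth band-pass in second-order sections for numerical stability.
    sos = butter(order, band, btype="bandpass", fs=fs, output="sos")
    return sosfiltfilt(sos, x, axis=-1)


if __name__ == "__main__":
    t = np.arange(640) / FS                    # one 4 s trial
    raw = (np.sin(2 * np.pi * 10 * t)          # mu-band component
           + np.sin(2 * np.pi * 60 * t)        # power-line interference
           + 2 * np.sin(2 * np.pi * 0.5 * t))  # slow drift
    clean = preprocess(raw[np.newaxis, :])
    print(clean.shape)
```

Because the 8-30 Hz band-pass already attenuates 60 Hz strongly at this sampling rate, the notch mainly matters if a wider pass band is chosen.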
In this work, the Physionet database, which contains 109 subjects, was used [25, 29]. Each subject performed four MI tasks, i.e., left fist, right fist, both fists, and both feet, which are referred to as T1, T2, T3, and T4, respectively. Twenty-one trials were conducted for each MI task. Each trial lasts 4 s and is sampled at 160 Hz, giving 640 samples per trial.
The symmetric boundary element method (BEM) was used to solve the forward and inverse problems of the EEG, and the scalp EEG signals were inverted to the cortex layer by source estimation. Following the international 10-10 system, the nine left-hemisphere scalp electrodes (FC5, FC3, FC1, C5, C3, C1, CP5, CP3, CP1) were manually projected onto the cortex layer to form virtual electrodes LP1-LP9, and the nine right-hemisphere electrodes (FC6, FC4, FC2, C6, C4, C2, CP6, CP4, CP2) were projected to form virtual electrodes RP1-RP9. Each virtual electrode LPi and its counterpart RPi form the electrode pair Pi (i = 1, ..., 9). The placement of the virtual electrodes is shown in Figure 2.
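The pairing rule can be written down as a small lookup table. This is an illustrative sketch: the names `LEFT`, `RIGHT`, and `ELECTRODE_PAIRS` are hypothetical, while the electrode lists and the LPi/RPi ordering come from the text above.

```python
# Scalp electrodes (international 10-10 system) projected onto the cortex.
LEFT = ["FC5", "FC3", "FC1", "C5", "C3", "C1", "CP5", "CP3", "CP1"]   # -> LP1..LP9
RIGHT = ["FC6", "FC4", "FC2", "C6", "C4", "C2", "CP6", "CP4", "CP2"]  # -> RP1..RP9

# Pair Pi joins virtual electrode LPi with its mirror-image counterpart RPi.
ELECTRODE_PAIRS = {
    f"P{i}": {"LP": (f"LP{i}", left), "RP": (f"RP{i}", right)}
    for i, (left, right) in enumerate(zip(LEFT, RIGHT), start=1)
}

if __name__ == "__main__":
    for name, pair in ELECTRODE_PAIRS.items():
        print(name, pair["LP"], pair["RP"])
```

Each pair mirrors a left-hemisphere site (odd 10-10 index) with the homologous right-hemisphere site (even index).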

The placement of the virtual electrodes. The positions of the virtual electrodes on the cortex layer are shown in the center and on the left and right.
The Morlet wavelet time-frequency map of each virtual electrode pair contains both the time and the frequency information of the MI signal. After band-pass filtering, the EEG data contain all the information in the 8-30 Hz range. For each of the nine virtual electrode pairs, time-frequency analysis was performed from 8 Hz to 30 Hz in 1 Hz steps, generating 23 frequency bands. Each band has 640 samples, that is, 640 features. Since each pair consists of two electrodes with 23 bands each, the time-frequency matrix of a virtual electrode pair has a size of 640 × 46.
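The size bookkeeping above can be checked with a short NumPy sketch. It assumes that each virtual electrode's time-frequency map is stored as a (time, frequency) array of shape (640, 23) and that the two maps of a pair are concatenated along the frequency axis; the concatenation order (LP first, then RP) and the helper name `pair_feature_matrix` are assumptions.

```python
import numpy as np

FS = 160                      # sampling rate (Hz)
N_SAMPLES = 4 * FS            # 4 s trial -> 640 time points
FREQS = np.arange(8, 31)      # 8, 9, ..., 30 Hz -> 23 frequency bands


def pair_feature_matrix(tf_left, tf_right):
    """Stack the maps of LPi and RPi side by side into one (640, 46) input."""
    expected = (N_SAMPLES, FREQS.size)
    if tf_left.shape != expected or tf_right.shape != expected:
        raise ValueError(f"each map must have shape {expected}")
    return np.concatenate([tf_left, tf_right], axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    tf_lp = rng.random((N_SAMPLES, FREQS.size))  # placeholder power values
    tf_rp = rng.random((N_SAMPLES, FREQS.size))
    print(pair_feature_matrix(tf_lp, tf_rp).shape)  # (640, 46)
```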
Ten subjects (S1-S10) were selected to train the model and evaluate its classification performance. After time-frequency analysis of each task of each subject, each subject's dataset was divided into seven parts; in turn, six parts were used as the training set and the remaining part as the test set. This cross-validation was carried out for all experiments.
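The seven-fold rotation can be sketched as follows. Assumptions: the split is stratified so that each fold holds the same number of trials per task (with 21 trials per task, 3 per task per fold), and trials are shuffled with a fixed seed; the text states only that the data were divided into seven parts. The function name `seven_fold_splits` is hypothetical.

```python
import numpy as np


def seven_fold_splits(labels, n_folds=7, seed=0):
    """Yield (train_idx, test_idx) pairs; each class is split evenly across folds."""
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    folds = [[] for _ in range(n_folds)]
    for cls in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == cls))
        for k, part in enumerate(np.array_split(idx, n_folds)):
            folds[k].extend(part.tolist())
    all_idx = np.arange(labels.size)
    for k in range(n_folds):
        test_idx = np.sort(np.array(folds[k]))
        train_idx = np.setdiff1d(all_idx, test_idx)  # the other six parts
        yield train_idx, test_idx


if __name__ == "__main__":
    y = np.repeat([0, 1, 2, 3], 21)  # T1..T4, 21 trials each
    for train_idx, test_idx in seven_fold_splits(y):
        print(len(train_idx), len(test_idx), np.bincount(y[test_idx]))
```

The accuracy on each of the seven held-out parts is then averaged to give the reported figure.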
Preprocessing
Nerve excitation can be triggered by internal stimuli of the human body, such as the voluntary movement of limbs [30]. It is generally believed that a region of the brain makes the decision to move, and a command signal is then sent to the motor cortex to drive the skeletal muscles to perform the corresponding action [31, 32]. MI signals, which are generated inside the brain and attenuated on their way to the scalp through the cortex and skull layers, can be detected by scalp electrodes. Therefore, effective preprocessing is required to improve the SNR of the signals [15].
First, the raw EEG signals are filtered, and then the EEG signals on the cortex are extracted. Obtaining the cortical EEG signals involves three parts: head model construction (the EEG forward problem), source model construction (the EEG inverse problem), and virtual electrode construction.
The relationship between the neural sources and the measurements of the external sensors can be established by solving the forward problem of the EEG [33, 34]. Poisson's equation describes the relationship between a dipole source and the electric potential for a given geometry. Dipoles represent the neurons that generate EEG signals in the brain, and the resulting electrode potentials are detected by the sensors, so Poisson's equation connects neurons and sensors and lays the foundation for solving the EEG forward problem [35]. It can be written as follows:

$$\nabla \cdot \left( \sigma \nabla \phi \right) = \nabla \cdot \mathbf{I}_i \qquad (1)$$

where σ is the conductivity, $\mathbf{I}_i$ is the volume current density generated by the current sources in the brain, and ϕ is the electric potential, which is recorded as the scalp data.
The forward problem of the EEG is to obtain the scalp EEG from the distribution of the cortical sources and the structure of the brain tissue. It is expressed as:

$$\phi = A X + N \qquad (2)$$

where $\phi = [\phi_1, \phi_2, \ldots, \phi_n]^T$ is the scalp data recorded by the n sensors, A is the n × m lead field matrix, N is the n-dimensional measurement noise, and $X = [x_1, x_2, \ldots, x_m]^T$ is the activity of the m sources [36, 37].
The inverse problem of the EEG is to estimate the distribution of the intracranial cortical sources from the scalp EEG and the structure of the brain tissue. That is, given the scalp potentials, the dipole parameters are estimated by inverting the forward model [38, 39].
Noise modeling is the foundation of head model construction; it helps prevent noise isomorphism and false sources. Because noise levels differ between subjects, a separate noise covariance matrix was computed for each subject from resting-state EEG data.
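The text does not state which inverse solver was used, so the sketch below substitutes a standard regularized minimum-norm estimate to illustrate how the forward model ϕ = AX + N and a resting-state noise covariance enter the inverse problem. The lead field here is random, and the function name `minimum_norm_operator` and the regularization parameter `lam` are hypothetical; in practice A comes from the BEM head model.

```python
import numpy as np


def minimum_norm_operator(A, noise_cov, lam=0.1):
    """Return W such that X_hat = W @ phi (regularized minimum-norm estimate).

    Solves min_X ||C^(-1/2) (phi - A X)||^2 + reg * ||X||^2, whose closed form
    is W = A^T (A A^T + reg * C)^(-1). `reg` is `lam` rescaled by a trace ratio
    so that `lam` does not depend on the units of A and C.
    """
    gram = A @ A.T
    reg = lam * np.trace(gram) / np.trace(noise_cov)
    # (M^-1 A)^T equals A^T M^-1 because M = gram + reg * C is symmetric.
    return np.linalg.solve(gram + reg * noise_cov, A).T


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_sensors, n_sources = 8, 50
    A = rng.normal(size=(n_sensors, n_sources))              # stand-in lead field
    resting = rng.normal(scale=0.1, size=(n_sensors, 2000))  # resting-state EEG
    C = np.cov(resting)                                      # per-subject noise covariance
    X = np.zeros(n_sources)
    X[[5, 17]] = [1.0, -0.5]                                 # two active dipoles
    phi = A @ X + rng.multivariate_normal(np.zeros(n_sensors), C)  # phi = A X + N
    X_hat = minimum_norm_operator(A, C) @ phi
    print(X_hat.shape)
```

Larger `lam` trusts the data less and shrinks the estimated source amplitudes; the noise covariance weights each sensor's residual by its measured noise level.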
In this paper, a three-layer head volume conductivity model (cortex, skull, and scalp) is established by the symmetric boundary element method (BEM) to fit the known EEG [40, 41].
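The influence of the head volume conduction model can be described by secondary sources located on the boundaries between regions of different conductivity:

$$\sigma(\mathbf{r})\,\phi(\mathbf{r}) = \sigma_s\,\phi_\infty(\mathbf{r}) - \frac{1}{4\pi}\sum_{n}\left(\sigma_n^{-} - \sigma_n^{+}\right)\int_{S_n}\phi(\mathbf{r}')\,\frac{\mathbf{r}-\mathbf{r}'}{\lVert \mathbf{r}-\mathbf{r}' \rVert^{3}}\cdot d\mathbf{S}_n(\mathbf{r}') \qquad (3)$$

where ϕ(r) is the electric potential at point r, σ(r) is the conductivity at point r, and σ_s is the conductivity at the source. ϕ∞(r) is the potential generated by the source at point r in an infinite homogeneous medium of conductivity σ_s. S_n is the boundary of the nth region, σ_n^- and σ_n^+ are the conductivities inside and outside S_n, and dS_n is the outward-oriented surface element. The magnetic field is obtained analogously:

$$\mathbf{B}(\mathbf{r}) = \mathbf{B}_\infty(\mathbf{r}) + \frac{\mu_0}{4\pi}\sum_{n}\left(\sigma_n^{-} - \sigma_n^{+}\right)\int_{S_n}\phi(\mathbf{r}')\,\frac{\mathbf{r}-\mathbf{r}'}{\lVert \mathbf{r}-\mathbf{r}' \rVert^{3}}\times d\mathbf{S}_n(\mathbf{r}') \qquad (4)$$

where B(r) is the magnetic induction at point r, B∞(r) is the contribution of the primary source, and μ0 is the magnetic permeability of free space.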
Both the cortical and subcortical nerve cells can produce bioelectrical signals as part of the measured EEG signal. However, the cortex layer contributes more to the EEG for two reasons: it has more nerve cells than the subcortical layer, and it is much closer to the electrodes on the scalp layer than the subcortical layer.
After the BEM computation, the estimated source activity is displayed as a color map on the cortex, in which the color encodes the degree of dipole activity obtained from the matrix analysis: an active dipole appears in a more pronounced color, whereas an inactive dipole is less visible [37, 42]. Setting appropriate thresholds makes the active regions more evident.
The Morlet wavelet time-frequency analysis was performed on nine pairs of virtual electrodes on the cortex layer, and the theory was as follows:
The central frequency of the wavelet is f_c, and its temporal resolution is defined by the full width at half maximum (FWHM) of its Gaussian kernel, as shown in formula (5) [43, 44]. At a central frequency of 1 Hz, the FWHM of the Gaussian kernel is set to 3 s [28]. The wavelet lengths at all other frequencies f are then scaled by formula (6) to better capture the components of oscillatory activity. According to formula (7), the wavelet coefficients are calculated at an interval of 0.5 Hz by convolving the signal with the selected wavelet.
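The time-frequency step can be sketched with a standard complex Morlet wavelet: a Gaussian-windowed complex exponential whose time FWHM is 3 s at 1 Hz and shrinks as 1/f at higher frequencies, convolved with the signal to give power at each frequency. The normalization (σ_t√π)^(-1/2), the ±3.5σ_t wavelet support, and the name `morlet_power` are assumptions; the frequency grid is a parameter, and the usage below takes the 1 Hz grid of the 23 bands.

```python
import numpy as np

FS = 160.0


def morlet_power(x, freqs, fs=FS, fwhm_1hz=3.0):
    """Time-frequency power of a 1-D signal; returns shape (len(x), len(freqs)).

    The time FWHM of the Gaussian is `fwhm_1hz` seconds at 1 Hz and scales as
    1/f, so every wavelet spans the same number of cycles.
    """
    x = np.asarray(x, dtype=float)
    power = np.empty((x.size, len(freqs)))
    for k, f in enumerate(freqs):
        fwhm = fwhm_1hz / f                                  # FWHM at frequency f (s)
        sigma_t = fwhm / (2.0 * np.sqrt(2.0 * np.log(2.0)))  # Gaussian std (s)
        half = int(np.ceil(3.5 * sigma_t * fs))
        t = np.arange(-half, half + 1) / fs
        wavelet = ((sigma_t * np.sqrt(np.pi)) ** -0.5
                   * np.exp(-t ** 2 / (2.0 * sigma_t ** 2))
                   * np.exp(2j * np.pi * f * t))
        coef = np.convolve(x, wavelet, mode="same")          # wavelet coefficients
        power[:, k] = np.abs(coef) ** 2
    return power


if __name__ == "__main__":
    t = np.arange(640) / FS                        # one 4 s trial
    tf = morlet_power(np.sin(2 * np.pi * 10 * t), np.arange(8, 31))
    print(tf.shape)                                # (640, 23)
    print(8 + tf[200:440].mean(axis=0).argmax())   # 10
```

Applying this to LPi and RPi and concatenating the two (640, 23) maps gives the 640 × 46 input of one electrode pair.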
The convolutional layer, the pooling layer, and the FC layer are the basic components of a CNN. The convolutional layer mainly performs feature extraction, relying on local connections and weight sharing. With local connections, each neuron connects only to a region of the previous layer, known as its receptive field, to extract features of appropriate size. With weight sharing, all neurons in the same feature map share the same weights and bias. Both greatly reduce the number of CNN parameters and accelerate training [45–47].
The convolution kernel slides step by step over the previous layer. Each parameter in the kernel is equivalent to a weight in a traditional neural network and is connected to the corresponding local pixel. The products of the kernel parameters and the corresponding local pixels are summed, and a bias is added to obtain the output of the convolutional layer [48, 49]. The formula is as follows [50]:

$$y_{i,j} = \sum_{m}\sum_{n} w_{m,n}\, x_{i+m,\,j+n} + b \qquad (8)$$

where x is the input of the layer, w is the convolution kernel, b is the bias, and y_{i,j} is the output at position (i, j).
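The sliding-window computation described above can be sketched in NumPy as a "valid" 2-D operation with stride 1 and a single channel. As in most deep learning libraries, the kernel is not flipped (strictly a cross-correlation); the padding, stride, and channel count of the actual model are not given here, so they are simplified assumptions, and `conv2d_valid` is a hypothetical name.

```python
import numpy as np


def conv2d_valid(x, w, b=0.0):
    """Single-channel 2-D convolution layer output, stride 1, no padding.

    Each output element is the sum of the kernel weights multiplied by the
    local input patch under the kernel, plus the bias.
    """
    kh, kw = w.shape
    out_h, out_w = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    y = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            y[i, j] = np.sum(w * x[i:i + kh, j:j + kw]) + b
    return y


if __name__ == "__main__":
    x = np.arange(9.0).reshape(3, 3)
    print(conv2d_valid(x, np.ones((2, 2)), b=1.0))  # -> [[9, 13], [21, 25]]
```

The same kernel `w` and bias `b` are reused at every position, which is the weight sharing described earlier.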
The pooling layer is usually used to reduce the size of the model, speed up computation, and improve the robustness of feature extraction. Three common pooling methods are mean pooling, max pooling, and stochastic pooling.
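Max and mean pooling can be sketched with a reshape trick. This assumes non-overlapping square windows with stride equal to the window size; incomplete edge windows are dropped (floor), as with the default settings of common frameworks. The name `pool2d` is hypothetical, and stochastic pooling is not shown.

```python
import numpy as np


def pool2d(x, size=2, mode="max"):
    """Non-overlapping pooling with a size x size window and stride = size."""
    h, w = x.shape[0] // size, x.shape[1] // size
    # Crop to whole windows, then expose each window as axes 1 and 3.
    blocks = x[: h * size, : w * size].reshape(h, size, w, size)
    if mode == "max":
        return blocks.max(axis=(1, 3))
    if mode == "mean":
        return blocks.mean(axis=(1, 3))
    raise ValueError("mode must be 'max' or 'mean'")


if __name__ == "__main__":
    x = np.arange(16.0).reshape(4, 4)
    print(pool2d(x, mode="max"))
    print(pool2d(x, mode="mean"))
```

With 2 × 2 windows, a 640 × 46 map shrinks to 320 × 23, so each pooling layer halves both dimensions (rounding down).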
After feature extraction, the FC layer is used to perceive global information. Each neuron in this layer is connected not to the other neurons in the same layer but to all neurons in the previous layer. Therefore, the local features learned can be gathered together to form the global features.
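In this paper, the output of the FC layer is passed through the softmax function [16]:

$$y_i = \frac{e^{z_i}}{\sum_{j=1}^{4} e^{z_j}}, \quad i = 1, 2, 3, 4 \qquad (9)$$

where z_i is the ith output of the FC layer. The output is classified into four categories, [y1, y2, y3, y4], expressed as probabilities: the softmax maps the outputs of the neurons to the interval (0, 1) so that they sum to 1, and the final prediction is the class with the largest probability.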
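A numerically stable softmax for the four-class output can be sketched as follows; subtracting the maximum before exponentiating does not change the result but avoids overflow. The function name is an illustrative choice.

```python
import numpy as np


def softmax(z):
    """Map FC-layer outputs z (..., 4) to class probabilities in (0, 1)."""
    z = np.asarray(z, dtype=float)
    e = np.exp(z - z.max(axis=-1, keepdims=True))  # subtract max to avoid overflow
    return e / e.sum(axis=-1, keepdims=True)


if __name__ == "__main__":
    p = softmax([1.0, 2.0, 3.0, 4.0])
    print(p.round(4), p.argmax())  # class index 3 (T4) is predicted
```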
The CNN structure is usually determined experimentally by trying as many network configurations as possible on small datasets. We selected one subject's data as the data source and performed a series of experiments to determine the number of CNN layers and their parameters. In each experiment, the network ran 2,000 iterations. The results of several typical network structures are shown in Table 1.
Results of several typical network structures
All of these network structures converge: their loss values decrease as the number of iterations increases, reach a minimum, and then remain basically stable. As can be seen from Table 1, the 4CNN+4Pool model has the highest accuracy and the lowest loss, giving the best classification result. We also compared max and mean pooling; max pooling performed better because it can effectively suppress the shift of the estimated mean caused by errors in the network parameters.
Finally, we determined the model architecture of 4-layer CNN and 4-layer max pooling. Table 2 summarizes the design decisions and various parameters of the CNN model. The flow chart of the CNN classification model is shown in Figure 3.
Proposed CNN model architecture

The flow chart of the CNN classification model.
The first layer is the input layer; the second, third, fourth, and fifth are all CNN+Pool layers; the sixth, seventh, and eighth layers are Flatten, FC, and Softmax, respectively. The final layer is the output layer.
The four CNN+Pool layers share the same structure, consisting of convolution, batch normalization (BN), dropout, and pooling. To mitigate the vanishing gradient problem, the leaky ReLU function was used as the activation function [15, 16]. In addition, to reduce the risk of overfitting and enhance the generalization ability of the model, BN [51] and 50% dropout [52] were adopted. Training a neural network essentially means learning the distribution of the data. EEG signals have a low SNR and differ greatly among subjects, so their distributions are inconsistent; BN places additional constraints on the data distribution, which enhances the generalization ability of the model. Dropout temporarily drops neuron nodes with a probability of 50% during training, so each neuron is optimized together with a randomly changing set of other neurons. This not only speeds up training but also weakens the co-adaptation between neurons, reducing the risk of overfitting.
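The per-block operations can be sketched in NumPy for a batch of feature vectors. This is an illustrative forward pass only: the leaky ReLU slope of 0.01, the BN epsilon, the omission of BN's learnable scale and shift, and the order BN, activation, dropout are assumptions, since the text does not specify them. The dropout shown is standard element-wise inverted dropout; spatial dropout, mentioned in the Discussion, drops whole feature maps instead.

```python
import numpy as np


def batch_norm(x, eps=1e-5):
    """Normalize each feature over the batch axis (training-mode statistics)."""
    mean = x.mean(axis=0, keepdims=True)
    var = x.var(axis=0, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def leaky_relu(x, alpha=0.01):
    """Pass positive values; scale negative values by alpha instead of zeroing them."""
    return np.where(x > 0, x, alpha * x)


def dropout(x, p=0.5, training=True, rng=None):
    """Inverted dropout: zero each unit with probability p and rescale the rest."""
    if not training or p == 0.0:
        return x
    rng = np.random.default_rng() if rng is None else rng
    keep = rng.random(x.shape) >= p
    return x * keep / (1.0 - p)


def block_forward(x, training=True, rng=None):
    """BN -> leaky ReLU -> 50% dropout, as in one CNN+Pool layer (pooling omitted)."""
    return dropout(leaky_relu(batch_norm(x)), p=0.5, training=training, rng=rng)
```

Because the kept units are rescaled by 1/(1 - p) during training, the layer needs no rescaling at test time, where dropout is simply switched off.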
The output of the last CNN+Pool layer is then flattened into a one-dimensional vector and fed into an FC layer, and the softmax layer produces the probabilities of the four classes, which are compared with the one-hot encoded labels.
The loss and classification accuracy are calculated at the final output layer. The Adam algorithm was adopted as the optimizer; it minimizes the loss function by updating the weights and biases with gradients computed through backpropagation [53]. The learning rate of the optimizer is 1 × 10⁻⁵. Because each sample has a one-hot label, the classification accuracy is computed by comparing the four-class prediction with the label.
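One Adam update and the one-hot accuracy computation can be sketched as follows. The learning rate 1e-5 is from the text; β1 = 0.9, β2 = 0.999, and ε = 1e-8 are the usual defaults and are assumed here, and the helper names `adam_step` and `onehot_accuracy` are hypothetical.

```python
import numpy as np


def adam_step(param, grad, state, lr=1e-5, beta1=0.9, beta2=0.999, eps=1e-8):
    """Return the parameter after one Adam step; `state` (m, v, t) is updated in place."""
    state["t"] += 1
    state["m"] = beta1 * state["m"] + (1 - beta1) * grad        # 1st moment
    state["v"] = beta2 * state["v"] + (1 - beta2) * grad ** 2   # 2nd moment
    m_hat = state["m"] / (1 - beta1 ** state["t"])              # bias correction
    v_hat = state["v"] / (1 - beta2 ** state["t"])
    return param - lr * m_hat / (np.sqrt(v_hat) + eps)


def onehot_accuracy(probs, onehot_labels):
    """Fraction of samples whose most probable class matches the one-hot label."""
    return float(np.mean(np.argmax(probs, axis=1) == np.argmax(onehot_labels, axis=1)))


if __name__ == "__main__":
    w = np.zeros(3)
    state = {"m": np.zeros(3), "v": np.zeros(3), "t": 0}
    grad = 2 * (w - 3.0)            # gradient of sum((w - 3)^2) at w = 0
    w = adam_step(w, grad, state)
    print(w)                        # each entry moves by about lr toward 3
```

Thanks to the bias correction, the first Adam step has a magnitude of roughly the learning rate regardless of the gradient scale.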
To evaluate the performances of the MI classification, multiple metrics were adopted, including accuracy, single class accuracy on each task, precision, recall, F1-score, Kappa, and Receiver Operating Characteristic (ROC) curve [15, 21].
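Among these, accuracy, precision, recall, and F1-score are calculated from the numbers of true positives, true negatives, false positives, and false negatives. The formulas are as follows:

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \qquad (10)$$

$$\mathrm{Precision} = \frac{TP}{TP + FP} \qquad (11)$$

$$\mathrm{Recall} = \frac{TP}{TP + FN} \qquad (12)$$

$$\mathrm{F1} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \qquad (13)$$

where TP, TN, FP, and FN are the numbers of true positives, true negatives, false positives, and false negatives, respectively.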
The Kappa coefficient is used for consistency tests and also to measure classification accuracy, which is calculated based on a confusion matrix.
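These metrics can all be computed from a confusion matrix. The sketch below assumes that rows are true classes and columns are predictions, takes multi-class accuracy as the trace divided by the total, and reports macro-averaged precision, recall, and F1 (the averaging scheme is an assumption). Cohen's kappa is computed as (p_o - p_e)/(1 - p_e), where p_o is the observed agreement and p_e the agreement expected by chance from the row and column totals. The function name is hypothetical.

```python
import numpy as np


def metrics_from_confusion(cm):
    """Accuracy, macro precision/recall/F1, and Cohen's kappa from a confusion matrix.

    cm[i, j] counts samples of true class i predicted as class j.
    """
    cm = np.asarray(cm, dtype=float)
    n = cm.sum()
    tp = np.diag(cm)
    fp = cm.sum(axis=0) - tp        # predicted as the class but belonging elsewhere
    fn = cm.sum(axis=1) - tp        # belonging to the class but predicted elsewhere
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    p_o = tp.sum() / n                                        # observed agreement
    p_e = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / n ** 2    # chance agreement
    return {
        "accuracy": p_o,
        "precision": precision.mean(),
        "recall": recall.mean(),
        "f1": f1.mean(),
        "kappa": (p_o - p_e) / (1 - p_e),
    }


if __name__ == "__main__":
    print(metrics_from_confusion([[8, 2], [1, 9]]))
```

For balanced four-class data the chance agreement p_e is 0.25, so kappa is somewhat lower than accuracy.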
AUC denotes the area under the ROC curve; for a classifier that performs better than chance, it lies between 0.5 (chance level) and 1. The closer the AUC is to 1.0, the better the method discriminates between classes and the better the classification performance.
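The AUC can be computed without tracing the curve explicitly, using its equivalence to the Mann-Whitney statistic: it is the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one. For the four-class problem, a one-vs-rest AUC per class is assumed here, since the text does not say how the multi-class ROC curves were formed; the helper names are hypothetical.

```python
import numpy as np


def auc_binary(scores, labels):
    """AUC as the probability that a random positive outranks a random negative.

    Ties count as half a win (equivalent to the Mann-Whitney U statistic).
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels).astype(bool)
    pos, neg = scores[labels], scores[~labels]
    wins = (pos[:, None] > neg[None, :]).sum() + 0.5 * (pos[:, None] == neg[None, :]).sum()
    return wins / (pos.size * neg.size)


def auc_one_vs_rest(probs, y, cls):
    """AUC of one class against the other three, using that class's softmax output."""
    return auc_binary(np.asarray(probs)[:, cls], np.asarray(y) == cls)


if __name__ == "__main__":
    print(auc_binary([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # 0.75
```

A classifier that scores at random gives an AUC near 0.5, which is why 0.5 marks chance level.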
We used the Physionet database to analyze the features obtained from the virtual electrode pairs and conducted subject-specific and group-wise experiments on the dataset of ten subjects. In addition, the public High Gamma database was selected to verify the generalization performance of the method. In all experiments, 7-fold cross-validation was used: each experiment was carried out seven times, and the average value was taken as the global average accuracy, which reduces the randomness introduced by a single data partition and makes the performance estimate more stable.
The time-frequency analysis results
In this paper, the Morlet wavelet time-frequency analysis method was used to obtain the feature information of the time and frequency sequence from the virtual electrode pair.
A set of time-frequency maps of virtual electrode pair P6 for Subject S5 in task T1 (left fist) is shown in Figure 4. The horizontal axis is time from 0 s to 4 s, corresponding to 640 sampling points at a sampling frequency of 160 Hz. The vertical axis is frequency, which increases linearly from 8 Hz to 30 Hz in 1 Hz steps.

A set of time-frequency maps of virtual electrode pair P6 for Subject S5 in task T1 (left fist). (A) Time-frequency maps of LP6. (B) Time-frequency maps of RP6.
The time-frequency maps show the intensity of the MI signal at different times and frequencies. The legend in the figure shows the relationship between the signal strength and the color of the time-frequency maps. The brighter the time-frequency maps, the stronger the signal strength.
On the dataset of ten subjects from the Physionet MI database, we conducted ten groups of subject-specific experiments to obtain the average accuracy and the accuracy of each of the four MI tasks. The subject-specific results are shown in Figure 5.

Subject-specific accuracy. (A) Line chart of the accuracy. (B) Column chart of four MI tasks’ accuracy: left fist (T1), right fist (T2), both fists (T3), and both feet (T4).
From Figure 5A, the highest accuracy is 99.13%, achieved by Subjects S4 and S10, and the lowest is 97.50%, achieved by S2. The average accuracy over the ten subjects is 98.32%. The accuracies of the four MI tasks are shown in Figure 5B. On T1, the highest accuracy is 99.89% (S10) and the lowest is 98.81% (S9). On T2, the highest is 99.33% (S9) and the lowest is 96.90% (S3). On T3, the highest is 99.89% (S10) and the lowest is 96.10% (S2). On T4, the highest is 98.70% (S9) and the lowest is 95.82% (S2). These results indicate that the proposed method succeeds in classifying the four MI tasks.
The average accuracies of four MI tasks on ten subjects are 99.46% (T1), 98.23% (T2), 98.09% (T3), and 97.53% (T4), respectively. The accuracy of T1 is the highest, indicating that the classification effect on the left fist is the best, while that of T4 is the lowest, indicating that the classification effect on both feet is the worst.
For each subject, we also conducted nine groups of experiments to obtain the accuracy of each electrode pair. As listed in Table 3, the highest accuracy is 100% and the lowest is 94.73%. On average, the highest accuracy is 99.21% on P1, and the lowest is 97.59% on P6. The proposed method thus achieves competitive results on the MI classification task.
The accuracy of electrode pairs in the subject-specific experiments
We also conducted group-wise experiments to evaluate the classification performance over the ten subjects. The group-wise classification performance is measured by common evaluation metrics, namely accuracy, precision, recall, and F1-score, as shown in Figure 6A; larger values indicate better classification performance. The median values of these metrics are 96.23%, 96.21%, 96.13%, and 96.14%, respectively. Kappa measures the agreement between the predictions and the true labels beyond chance; its maximum is 1, and 0 corresponds to chance-level agreement. The Kappa value in this paper reaches 94.83%, indicating that our method achieves high consistency and a low error cost.

Classification performance of ten subjects. (A) Box plot of ten subjects. (B) Accuracy confusion matrix of ten subjects.
Figure 6B illustrates the confusion matrix of the ten subjects for each task. The values on the diagonal represent correct classifications, and the other values represent misclassifications. The correct classification rates for the four MI tasks are 99.12% (T1), 95.45% (T2), 94.99% (T3), and 96.08% (T4), respectively.
For each electrode pair, the ROC curves and AUCs of the ten subjects are shown in Figure 7A. The AUC is highest at 0.999 on P6 and P7, followed by 0.998 on P1, P2, and P3 and 0.997 on P7, P8, and P9, with the lowest value of 0.994 on P5. These results demonstrate that our method maintains high discrimination with low error costs.

Performance comparison of ten subjects on each pair of electrodes. (A) Comparison of ROC curve and AUC. (B) Comparison of the global average accuracy.
The global average accuracy curves of the ten subjects are shown in Figure 7B. For every electrode pair, the curves of the proposed method reach a stable state after 500 iterations. Throughout the iteration process, the accuracy curves are smooth with few spikes, indicating stable performance.
As can be seen from the above results, our proposed method has achieved excellent performance in group-wise prediction and is effective and efficient in the classification of human motion intents.
In this paper, we selected the public High Gamma database to verify the generalization performance of the proposed method. This database contains four EEG tasks: left hand movement, right hand movement, both feet movement, and rest. It consists of 14 subjects (eight males and six females), whose EEG signals were recorded from 128 electrodes at a sampling rate of 500 Hz. For each subject, 880 trials are used for training and 160 for testing. In this paper, only the 18 electrodes most highly correlated with the MI task were used. We selected the data in the time window [0 s, 4 s] of each trial and resampled the signals to 160 Hz. More details about this database can be found in [54, 55].
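The resampling from 500 Hz to 160 Hz reduces to a rational factor of 8/25, which SciPy's polyphase resampler handles directly. A minimal sketch, assuming the 18 selected channels are already in an array of shape (18, n_samples) and that the window starts at sample 0 of the trial; the function name `window_and_resample` is hypothetical.

```python
import numpy as np
from scipy.signal import resample_poly

FS_IN, FS_OUT = 500, 160   # High Gamma rate -> rate used for Physionet
UP, DOWN = 8, 25           # 160 / 500 reduced to lowest terms


def window_and_resample(trial, t_start=0.0, t_end=4.0):
    """Cut the [t_start, t_end) window at 500 Hz and resample it to 160 Hz."""
    start, stop = int(round(t_start * FS_IN)), int(round(t_end * FS_IN))
    segment = trial[:, start:stop]                    # (channels, 2000) for 4 s
    return resample_poly(segment, UP, DOWN, axis=-1)  # (channels, 640)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    trial = rng.normal(size=(18, 2250))   # placeholder: 18 channels, 4.5 s
    print(window_and_resample(trial).shape)  # (18, 640)
```

The resampled trials then have the same 640-sample length as the Physionet trials, so the same time-frequency pipeline and CNN input size apply.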
On the dataset of 14 subjects, all experiments used 7-fold cross-validation, and the global average accuracy over the folds was taken as the final model performance, which reduces the dependence of the result on a particular data split. In the subject-specific classification, the accuracies of subjects 1 to 14 are 95.74%, 96.03%, 96.33%, 97.20%, 95.96%, 96.51%, 96.59%, 95.86%, 95.90%, 97.54%, 96.31%, 95.08%, 97.48%, and 96.64%, respectively. The global average accuracy is 96.37%. In the group-wise classification, accuracy and Kappa reach 91.21% and 90.17%, respectively. These results indicate that the proposed method handles individual variation well and generalizes across different datasets.
Discussion
In BCI systems, the EEG can change within minutes because of differences in the physiological and psychological state of each subject at each time. Studies on subject-independent BCIs based on DL are scarce. Conventional methods work well in the subject-dependent case but perform poorly in the subject-independent case because brain signals are highly variable across subjects. In this paper, a new MI classification method that combines cortical virtual electrodes with a CNN is proposed to effectively improve BCI performance.
In this work, the training set was used to train the CNN. The loss value decreased as the number of iterations increased and then remained basically stable, at which point the optimal training model was obtained. The test set was then fed into the optimal model to evaluate the classification performance. Throughout this process, spatial dropout and BN were used to reduce the risk of overfitting.
We conducted anti-overfitting experiments in the subject-specific and group-wise settings to obtain the training and testing curves. Figure 8 compares the loss curves with and without anti-overfitting measures. Without spatial dropout and BN, the loss curves show sharp spikes and unstable behavior throughout the iteration process, whereas with these measures the loss curves are smooth with only small spikes.

The loss function curve of different subjects on the training set and the test set.
The loss curves of one subject under anti-overfitting measures show that the training loss almost reaches 0 at about 200 iterations, when the optimal training model is obtained. When the model is tested, the loss drops to about 0.008 at about 300 iterations. For the ten subjects as a group, under the same measures, the training loss curve reaches equilibrium after about 600 iterations with a loss of about 0, and the test loss gradually decreases to about 0.019 as the number of iterations increases.
In conclusion, our method converges in both training and testing, effectively reduces the risk of overfitting, and improves generalization and classification performance.
Based on the Physionet database, we compared the performance of our work with Kim et al. [10], Dose et al. [15], Ma et al. [56], and Hou et al. [16], as seen in Table 4. We also compared our work with Schirrmeister et al. [54], Amin et al. [19], and Tang et al. [55] in the High Gamma database, as shown in Table 5. It can be observed that this work has achieved superior performance in group-level and subject-specific prediction on different public databases.
Results comparison on the Physionet database
Results comparison on the High Gamma database
In general, the results show that the proposed method improves the generalization ability and classification performance of the CNN model in the presence of inter-subject variability and is effective and efficient in classifying human movement intention. Our future work will focus on applying this method to the design of BCI systems, building a real-time EEG acquisition system, and using a self-built database to verify the effectiveness and robustness of the proposed method.
In this paper, a novel MI classification approach that combines virtual electrodes on the cortex layer with a CNN is proposed. The method accurately classifies four MI tasks with only a single pair of virtual electrodes and converges for both subject-specific and group-wise prediction. It achieves averaged accuracies of 98.32% and 96.23% on the Physionet database and 96.37% and 91.21% on the High Gamma database at the subject and group levels, respectively. The results show that the proposed method can construct a generalized representation of both individual and group variation. In addition, repeated experiments with 7-fold cross-validation verify the stability of the proposed method. Overall, this paper provides a new idea for developing effective and efficient BCI systems.
Acknowledgments
The authors would like to thank PhysioNet for providing the EEG Motor Movement/Imagery Dataset.
