Abstract
Although electroencephalography (EEG) based brain-computer interfaces (BCIs) have been quite successful, multi-command control remains a key obstacle for external applications. Multimodal BCI is a promising way to address this problem. In our study, five healthy subjects took part in the experiment. EEG and electromyography (EMG) were recorded synchronously. For EEG, the C3 and C4 channels were selected after Laplacian filtering. The EEG was then decomposed to the third level by wavelet packet transform (WPT), and the average, sub-band energy and mean square deviation were calculated at particular nodes. Finally, these features were fed into a support vector machine (SVM), either singly or in combination, to obtain the EEG classification accuracy. For EMG, the mean absolute value (MAV) and root mean square (RMS) were calculated, and a probabilistic neural network (PNN) was employed to obtain the EMG classification accuracy. Different mental and gesture tasks were combined to form multiple classes, and the resulting commands were ranked by performance. The results showed that the subjects could produce multiple classes with satisfactory performance using the multimodal BCI. The proposed interface could support multi-command control for external applications.
Introduction
A brain-computer interface (BCI) is a unique channel for exchanging information between a person and the outside world. Unlike the normal neural and muscular pathways, a BCI lets users control or manipulate external applications through brain activity alone, measured by, for example, electroencephalography (EEG), near-infrared spectroscopy (NIRS), magnetoencephalography (MEG) or functional magnetic resonance imaging (fMRI) [1–5]. Because EEG is noninvasive, has high temporal resolution, is low in cost and is easy to acquire, it has been widely used in current studies. According to the brain pattern involved, there are three common kinds of EEG BCI: event-related desynchronization/synchronization (ERD/ERS), P300 and steady-state visual evoked potential (SSVEP). Each type has its own requirements and characteristics. Despite the rapid progress of EEG BCI over the past two decades, multi-command control remains one of the major problems and keeps it from moving out of the lab and into practical use. For example, six control commands (up, down, left, right, confirm and cancel) are required for normal operation of a computer mouse. More complex operations or actions usually require more instructions. The most straightforward way to increase the number of control instructions is to use more classes, such as five or seven [6, 7]. However, as the number of classes increases, the mental tasks become more complex and the classification accuracy gradually decreases. In this case, multimodal BCI offers a solution to the multi-class problem.
A simple BCI uses only one brain signal or brain pattern, and each type has its own characteristics and shortcomings. To exploit the advantages of the various types, different brain signals or brain patterns are combined in what is known as a hybrid BCI or multimodal BCI [8, 9]. In other words, a multimodal BCI may combine not only different BCI systems but also different signals, such as EEG and NIRS [10]. In the following examples, multimodal BCI was applied to the multi-command control problem for external devices or applications. Combining different brain patterns, Li et al. employed P300 and mu/beta rhythm in parallel to control a two-dimensional computer cursor [11]. Yu et al. extended this system to browser operations, and Long et al. further used it to control the direction and speed of a wheelchair [12, 13]. Likewise, Allison et al. controlled the vertical and horizontal position of a two-dimensional cursor through ERD and SSVEP signals, respectively [14]. Combining different physiological signals, Ma et al. adopted electrooculography (EOG) and EEG to manipulate humanoid and mobile robots [15]. In the vast majority of multimodal BCI systems, each independent instruction is generated from a single brain signal or brain pattern. Among EEG BCIs, SSVEP and P300 require flickering stimulation, which can easily lead to fatigue. ERD relies on motor imagery (MI) or motor execution, which subjects can control autonomously without additional stimulation, so it is better suited to practical applications. Among the various physiological signals, electromyography (EMG) has been widely used in human-computer interaction because of its high accuracy and simple, convenient measurement, so it is treated as the second modality. Mental tasks and limb movements are closely related to EEG and EMG activity.
In the proposed scheme, the parallel combination of EEG and EMG is explored to generate multi-class commands. The classification accuracy is obtained for a variety of combinations, so that the more appropriate commands can be selected for external devices or applications. The remainder of this paper is arranged as follows. Section 2 describes the data acquisition and methods. Section 3 presents the experimental results. Section 4 discusses the results, and Section 5 summarizes the paper.
Methods
In our experimental design, different mental and gesture tasks were combined to form multiple classes. The implementation of the system is illustrated in Fig. 1. The participants performed left hand (LH) or right hand (RH) MI and four hand gesture tasks simultaneously. The four hand gestures were wrist flexion (FLWR), wrist extension (EXWR), palm extension (EXPM) and index finger extension (EXIF). The highly accurate EMG signal was integrated into a two-class BCI to obtain multiple classes with satisfactory performance.

Fig. 1. Multimodal BCI: locations of the 32-channel EEG and 4-channel EMG electrodes, together with the signal processing and combination principle.
Five healthy college subjects, aged between 25 and 30 years, took part in the experiment. The work was in accordance with the Declaration of Helsinki for experiments involving humans. None of the subjects had prior BCI experience. Before data collection, they were brought together and briefed on the experimental procedures. They were fully aware of their tasks and agreed to participate.
The experiment was carried out in a quiet, closed, dark lab. The subjects leaned back naturally in an armchair, and a computer monitor about one meter in front of them indicated the task. Each trial was divided into preparation, execution and rest phases. During the preparation phase (0–3 s), a black fixation cross was shown in the center of the screen. At 2–2.2 s, a voice prompt alerted the subjects. The execution phase (3–7 s) was the critical step, in which the subjects followed the instructions to complete the corresponding tasks. Throughout 3–7 s, an arrow pointing left or right was displayed in the upper part of the screen, indicating the LH or RH mental task. Meanwhile, at 3.1–3.9 s, 4.1–4.9 s, 5.1–5.9 s and 6.1–6.9 s, four gesture hints were shown in random order in the lower part of the screen, each appearing only once, and the subjects performed the corresponding gesture. During the rest phase (7–9 s), the screen was white and the subjects could relax briefly. The next trial then began.
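The trial timing above can be written down as a small schedule generator. The following is only a sketch: the phase and cue windows come from the description, while the function name `make_session`, the balanced shuffling of LH/RH trials and the random seed are illustrative assumptions.

```python
import random

# Phase boundaries in seconds, taken from the trial description.
PHASES = {"preparation": (0.0, 3.0), "execution": (3.0, 7.0), "rest": (7.0, 9.0)}
# Gesture cue windows inside the execution phase.
CUE_WINDOWS = [(3.1, 3.9), (4.1, 4.9), (5.1, 5.9), (6.1, 6.9)]
GESTURES = ["FLWR", "EXWR", "EXPM", "EXIF"]


def make_session(rng, n_trials=100):
    """Build one session: balanced LH/RH trials, gestures shuffled per trial.

    Shuffling a balanced list of sides is an assumption; the text only
    states that each session holds 50 LH and 50 RH trials.
    """
    sides = ["LH", "RH"] * (n_trials // 2)
    rng.shuffle(sides)
    trials = []
    for side in sides:
        order = GESTURES[:]
        rng.shuffle(order)  # each gesture appears exactly once per trial
        trials.append({"mi": side, "cues": list(zip(CUE_WINDOWS, order))})
    return trials


session = make_session(random.Random(0))  # arbitrary seed for repeatability
session_seconds = len(session) * PHASES["rest"][1]  # 9 s per trial
```

With 100 trials of 9 s each, a session lasts 900 s, which matches five runs of three minutes.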
Data acquisition
Data were acquired with the actiCHamp amplifier manufactured by Brain Products GmbH. The raw EEG and EMG were captured through the 32-channel actiCAP and four BIP2AUX adapters. For EEG, the electrode positions followed the international 10–20 system. For EMG, four pairs of electrodes were placed over the flexor digitorum superficialis, palmaris longus, brachioradialis and extensors. The common ground electrode was attached over the flexor digitorum superficialis. These muscle groups were closely related to the chosen hand gestures. The distribution of all electrodes is shown in Fig. 1. Prior to the measurement, the impedance of every EEG electrode was below 25 kΩ. The EEG sampling rate and filter range were set to 100 Hz and 0.5–200 Hz, and a 50 Hz notch filter was employed to suppress power line interference. EMG was acquired at a 1000 Hz sampling rate with a 10–500 Hz band-pass filter [16].
For each subject, the data were collected in three sessions. Each session consisted of five runs of three minutes each and contained 100 trials. In other words, one session contained 50 trials each of LH and RH and 100 repetitions of each hand gesture. Between sessions, the subjects could rest in order to maintain concentration. Each subject needed some preparation before the measurement, and the data acquisition itself lasted approximately one hour. Different subjects completed the data acquisition on different days. EEG and EMG were recorded synchronously, and all data were stored on another computer for analysis.
EEG identification
EEG identification consisted of three steps: preprocessing, feature extraction and classification. Preprocessing is generally used to remove artifacts and enhance the signal-to-noise ratio (SNR). Feature extraction produces feature vectors that differ distinctly between classes. Classification then performs EEG pattern recognition on these features.
For BCI studies, it has previously been demonstrated that Laplacian filtering works better than other references, such as the common average reference (CAR). The EEG, resampled to 100 Hz, was enhanced by Laplacian filtering. ERD appeared in the 8–12 Hz mu and 18–25 Hz beta rhythms when the subjects performed or imagined LH and RH movements, and the phenomenon was most pronounced in the C3 and C4 channels. Following previous work, the C3 and C4 channels were selected for feature extraction [17–19].
EEG is a typical non-stationary signal, and wavelet packet transform (WPT) is a common way to handle such signals. The EEG was decomposed by WPT, and the average, sub-band energy and mean square deviation were then calculated at particular nodes. According to the Shannon sampling theorem, the Nyquist frequency was 50 Hz, half of the 100 Hz sampling frequency. The time period that gives the best performance differs between subjects, so the optimal period within the execution phase (3–7 s) was determined for each subject, and the EEG from that period was decomposed to the third level by WPT. The frequency range corresponding to each node is shown in Fig. 2.
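As an illustration of the Laplacian step, a small-Laplacian reference subtracts the mean of the surrounding electrodes from the channel of interest. The sketch below assumes this common form; the neighbour set used for C3 (FC1, FC5, CP1, CP5) and the function name `laplacian` are hypothetical choices for the example, since the text does not list the neighbours actually used.

```python
def laplacian(samples, center, neighbours):
    """Subtract the mean of the neighbouring channels from the centre channel.

    samples: dict mapping channel name -> list of samples (equal lengths).
    """
    n = len(neighbours)
    return [
        samples[center][t] - sum(samples[ch][t] for ch in neighbours) / n
        for t in range(len(samples[center]))
    ]


# Toy data: a common component of 1.0 on every channel plus 2.0 local to C3.
toy = {
    "C3": [3.0, 3.0],
    "FC1": [1.0, 1.0],
    "FC5": [1.0, 1.0],
    "CP1": [1.0, 1.0],
    "CP5": [1.0, 1.0],
}
c3_filtered = laplacian(toy, "C3", ["FC1", "FC5", "CP1", "CP5"])
```

In the toy data the activity shared by all electrodes cancels and only the component local to C3 remains, which is the intended effect of the spatial filter.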

Fig. 2. Frequency range corresponding to each node of the three-level WPT.
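The nominal band of each third-level node follows directly from the Nyquist frequency: 50 Hz divided into 2^3 = 8 equal bands of 6.25 Hz. The sketch below assumes the nodes are indexed in order of increasing frequency; many toolboxes use a natural (Paley) ordering by default, in which the index-to-band mapping differs, so the indexing here is an assumption rather than a statement about the original analysis.

```python
def wpt_bands(fs, level):
    """Nominal band edges (Hz) of the 2**level nodes, in frequency order."""
    nyquist = fs / 2.0
    width = nyquist / 2 ** level
    return [(k * width, (k + 1) * width) for k in range(2 ** level)]


bands = wpt_bands(fs=100, level=3)
mu_band = bands[1]    # (6.25, 12.5) Hz, contains the 8-12 Hz mu rhythm
beta_band = bands[3]  # (18.75, 25.0) Hz, contains most of 18-25 Hz beta
```

Under this indexing, nodes 1 and 3 line up with the mu and beta ranges named in the text.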
The ERD associated with LH and RH movement was mainly concentrated in the mu and beta rhythms, and the two nodes S(3,1) and S(3,3) completely covered this range. Using the ‘db3’ wavelet as the basis function, the three quantities were calculated at these nodes for both channels. Two channels, two nodes and three quantities gave 12-dimensional feature vectors.
Support vector machines (SVMs) have been applied to pattern recognition in synchronous BCI with remarkable results. Here, the SVM was implemented with the LIBSVM toolbox to perform the EEG recognition task [20, 21].
EMG recognition was relatively simple and consisted of feature extraction and classification. The mean absolute value (MAV) and root mean square (RMS) of EMG have been widely used as typical time domain features because they are simple to calculate and perform well. MAV is the most intuitive, and RMS indicates muscle contraction strength. Four time periods, 3.15–3.9 s, 4.15–4.9 s, 5.15–5.9 s and 6.15–6.9 s, were defined for each trial. Within each period, features were extracted from five 250 ms segments with 125 ms overlap, and both features were computed for each segment. All four channels were used, giving 40-dimensional feature vectors [22].
The probabilistic neural network (PNN) has become a popular solution to multi-class problems. It has advantages such as fast calculation and a structure that is not very complicated. Therefore, it was chosen as the tool for EMG classification [23].
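The assembly of the 12-dimensional vector can be sketched as follows, given the WPT coefficients of the selected nodes. The wavelet decomposition itself is left out; the coefficient lists are placeholders, and reading "mean square deviation" as the population standard deviation is an assumption, as are the names `node_features` and `feature_vector`.

```python
import math


def node_features(coeffs):
    """Average, sub-band energy and mean square deviation of one node."""
    mean = sum(coeffs) / len(coeffs)
    energy = sum(c * c for c in coeffs)
    # Assumption: "mean square deviation" is taken as the standard deviation.
    msd = math.sqrt(sum((c - mean) ** 2 for c in coeffs) / len(coeffs))
    return [mean, energy, msd]


def feature_vector(node_coeffs):
    """node_coeffs: dict {(channel, node): list of WPT coefficients}."""
    vec = []
    for key in sorted(node_coeffs):  # fixed order keeps features aligned
        vec.extend(node_features(node_coeffs[key]))
    return vec


# Placeholder coefficients for 2 channels x 2 nodes.
toy_coeffs = {
    ("C3", "S(3,1)"): [1.0, 3.0],
    ("C3", "S(3,3)"): [0.5, 0.5],
    ("C4", "S(3,1)"): [2.0, 2.0],
    ("C4", "S(3,3)"): [0.0, 1.0],
}
features = feature_vector(toy_coeffs)
```

Two channels times two nodes times three quantities yields the 12 entries stated in the text.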
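The segmentation arithmetic can be checked with a short sketch: a 750 ms period sampled at 1000 Hz, cut into 250 ms windows that advance by 125 ms, gives exactly five segments, and two features per segment over four channels gives 40 values. The function names and the constant toy signal below are illustrative assumptions.

```python
import math

FS = 1000   # EMG sampling rate in Hz, from the acquisition settings
WIN = 250   # segment length in samples (250 ms at 1000 Hz)
STEP = 125  # hop in samples (250 ms window with 125 ms overlap)


def mav(x):
    """Mean absolute value of a segment."""
    return sum(abs(v) for v in x) / len(x)


def rms(x):
    """Root mean square of a segment."""
    return math.sqrt(sum(v * v for v in x) / len(x))


def segment_features(x):
    """MAV and RMS for every overlapping segment of one channel."""
    feats = []
    for start in range(0, len(x) - WIN + 1, STEP):
        seg = x[start:start + WIN]
        feats += [mav(seg), rms(seg)]
    return feats


def emg_feature_vector(channels):
    """Concatenate segment features over all channels of one gesture period."""
    vec = []
    for x in channels:
        vec += segment_features(x)
    return vec


# Toy input: four channels, one 750 ms gesture period each.
period = [[-2.0] * 750 for _ in range(4)]
emg_features = emg_feature_vector(period)
```

For the constant toy signal every MAV and RMS equals 2.0, which makes the output easy to verify by hand.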
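The idea behind a PNN can be sketched in a few lines: every training pattern contributes a Gaussian kernel, the kernels are averaged per class, and the class with the largest average wins. This is a minimal illustration, not the toolbox implementation used in the study; the kernel parameterization exp(-d²/(2·spread²)), the function name `pnn_predict` and the toy patterns are assumptions.

```python
import math


def pnn_predict(x, train, spread):
    """Classify x with a basic PNN.

    train: dict mapping class label -> list of training feature vectors.
    spread: width of the Gaussian kernel (the tuning parameter).
    """
    scores = {}
    for label, patterns in train.items():
        total = 0.0
        for p in patterns:
            d2 = sum((a - b) ** 2 for a, b in zip(x, p))
            total += math.exp(-d2 / (2.0 * spread ** 2))
        scores[label] = total / len(patterns)  # class-conditional density
    return max(scores, key=scores.get)


# Toy two-feature patterns for two gestures.
toy_train = {
    "FLWR": [[0.0, 0.0], [0.1, 0.0]],
    "EXWR": [[1.0, 1.0], [0.9, 1.0]],
}
label_a = pnn_predict([0.05, 0.1], toy_train, spread=0.3)
label_b = pnn_predict([1.0, 0.9], toy_train, spread=0.3)
```

Training amounts to storing the patterns, which is why only the spread needs to be tuned per subject.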
Multi-class design
The purpose of this work was to explore a solution to the BCI multi-class problem. To expand the number of control commands, different mental tasks and hand gestures were combined. First, the single-mode EEG and EMG classification accuracies were calculated independently. Then, according to the combination scheme, the accuracy of the extended control commands was derived from the above results.
As mentioned earlier, the limited number of control instructions has become an obstacle to the progress of BCI and cannot meet the requirements of practical applications. The number of control commands should be determined by the actual situation. For example, three control commands (forward, reverse and stop) are sufficient for simple operation of a motor, whereas other cases require more. Once the number of commands was determined, the instructions were selected in order from high to low performance. Our multimodal BCI provides two to eight commands for external devices or applications, which gives the flexibility to choose the number of control commands. The accuracy of each class was mathematically calculated as follows:
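One form consistent with the accuracies reported in the results is a product rule, which holds if the EEG and EMG decisions are treated as independent. The symbols below are introduced here for illustration and are an assumption about the notation:

```latex
P_{\mathrm{MI}+\mathrm{G}} = P_{\mathrm{MI}} \times P_{\mathrm{G}},
\qquad
\mathrm{MI} \in \{\mathrm{LH}, \mathrm{RH}\},\quad
\mathrm{G} \in \{\mathrm{FLWR}, \mathrm{EXWR}, \mathrm{EXPM}, \mathrm{EXIF}\}
```

Here P_MI is the single-mode EEG accuracy of the mental task and P_G the single-mode EMG accuracy of the gesture.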
Single-mode
For each subject, we randomly selected two sessions as training samples and the remaining session as testing samples. Each MI task therefore had 100 training trials and 50 testing trials, and each hand gesture had 200 training trials and 100 testing trials.
The identification process involved building the model and testing it. For the EEG training data, the average, sub-band energy and mean square deviation were calculated independently and in combination, normalized to [0,1] and fed into the SVM classifier, whose penalty parameter and kernel function parameter had been optimized. Different subjects had different optimized parameters. Training established the required classification models. Next, the testing data, processed in the same way, were fed into the corresponding classification models, with LH and RH as the two classes, to obtain the classification results. The confusion matrix for the combined features is shown in Table 1.
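The trial counts follow from the session layout: each session has 100 trials, split evenly between LH and RH, and every trial contains each gesture once. A minimal sketch of the split, with hypothetical names and an arbitrary seed:

```python
import random

SESSIONS = [1, 2, 3]
TRIALS_PER_SESSION = 100


def split_sessions(rng):
    """Pick two sessions at random for training; the third is for testing."""
    train = sorted(rng.sample(SESSIONS, 2))
    test = [s for s in SESSIONS if s not in train]
    return train, test


def trial_counts(n_sessions):
    """Trials available per MI class and per gesture for n_sessions."""
    return {
        "per_mi_class": n_sessions * TRIALS_PER_SESSION // 2,  # LH/RH balanced
        "per_gesture": n_sessions * TRIALS_PER_SESSION,  # once per trial
    }


train_sessions, test_sessions = split_sessions(random.Random(0))
train_counts = trial_counts(len(train_sessions))
test_counts = trial_counts(len(test_sessions))
```

Splitting by whole sessions keeps trials recorded together on the same side of the train/test boundary.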
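The normalization to [0,1] can be illustrated with min-max scaling. The sketch assumes that the scaling limits are estimated on the training features and then reused for the testing features, which is common practice but is not stated in the text; the function names are hypothetical.

```python
def fit_minmax(train):
    """Column-wise minimum and maximum of the training feature matrix."""
    cols = list(zip(*train))
    return [min(c) for c in cols], [max(c) for c in cols]


def apply_minmax(rows, lo, hi):
    """Scale each feature with the training limits; constant columns map to 0."""
    out = []
    for row in rows:
        out.append([
            (v - l) / (h - l) if h > l else 0.0
            for v, l, h in zip(row, lo, hi)
        ])
    return out


toy_train = [[1.0, 10.0], [3.0, 30.0]]
toy_test = [[2.0, 20.0]]
lo, hi = fit_minmax(toy_train)
train_scaled = apply_minmax(toy_train, lo, hi)
test_scaled = apply_minmax(toy_test, lo, hi)
```

With limits taken from the training set, testing features can fall slightly outside [0,1], which is harmless for the classifier.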
Table 1. Confusion matrix for SVM using the combined features
The EEG classification results for the individual features and their combination are listed in Table 2. When the average, sub-band energy and mean square deviation were used as individual features, the accuracies were 0.808±0.043, 0.802±0.019 and 0.72±0.027, respectively. The average and sub-band energy outperformed the mean square deviation by approximately 0.08. When the three features were combined, the performance rose to 0.88±0.016. The combination of features represented more complete information and was more effective than any individual feature, which is useful for controlling external applications [24].
Table 2. Results for different EEG features and each kind of task
Similarly, using the EMG training data, the PNN model was built with a radial basis function whose spread value had been optimized. Different spread values were tried beforehand in the PNN classifier, and the optimized value differed between subjects. The EMG testing data were then fed to the PNN, and the results are given in Table 2. The table also lists the LH and RH classification results obtained with all three features; the performance of each MI task was above 0.86. The average classification accuracies of the hand gestures FLWR, EXWR, EXPM and EXIF were 0.862±0.013, 0.854±0.021, 0.902±0.022 and 0.814±0.029, respectively.
In addition, EXPM was easier to recognize than the other three hand gestures and would be the preferred choice among the four. All four hand gestures achieved the desired performance.
Through the combination scheme, the performance of each command was estimated indirectly. In our design, the number of control commands was extended to eight. The results are illustrated in Fig. 3. With eight control commands, the performance of the classes LH+FLWR, LH+EXWR, LH+EXPM, LH+EXIF, RH+FLWR, RH+EXWR, RH+EXPM and RH+EXIF was 0.745±0.025, 0.738±0.019, 0.780±0.032, 0.704±0.037, 0.772±0.023, 0.765±0.017, 0.808±0.033 and 0.730±0.039, respectively. These eight commands were ranked by performance. The best-performing command was RH+EXPM, and the worst-performing command was LH+EXIF. The average performance of each of these commands was greater than 0.7.

Fig. 3. Average performance of the eight combinations.
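The reported numbers can be used to check how the combined accuracies relate to the single-mode ones. The sketch below assumes a product rule (combined accuracy = MI accuracy × gesture accuracy) and divides each combined accuracy by the corresponding gesture accuracy; if the assumption holds, the four quotients for each hand should agree. All values are the means reported above; the variable names are illustrative.

```python
# Mean gesture accuracies (EMG, single mode).
GESTURE = {"FLWR": 0.862, "EXWR": 0.854, "EXPM": 0.902, "EXIF": 0.814}

# Mean accuracies of the eight combined commands.
COMBINED = {
    ("LH", "FLWR"): 0.745, ("LH", "EXWR"): 0.738,
    ("LH", "EXPM"): 0.780, ("LH", "EXIF"): 0.704,
    ("RH", "FLWR"): 0.772, ("RH", "EXWR"): 0.765,
    ("RH", "EXPM"): 0.808, ("RH", "EXIF"): 0.730,
}

# MI accuracy implied by each combination under the product assumption.
implied = {"LH": [], "RH": []}
for (mi, gesture), acc in COMBINED.items():
    implied[mi].append(acc / GESTURE[gesture])

# How far the implied values for one hand differ from each other.
implied_range = {mi: max(v) - min(v) for mi, v in implied.items()}

# Commands ordered from best to worst.
ranking = sorted(COMBINED, key=COMBINED.get, reverse=True)
```

The quotients cluster tightly for each hand, at roughly 0.864 for LH and 0.896 for RH, both above the 0.86 stated for the MI tasks, so the reported values are consistent with a product of the two single-mode accuracies.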
In addition, some of these commands could be sacrificed to further improve performance. In Table 3, the eight control commands are numbered, and the commands selected for each command count are listed.
Table 3. Numbering of the eight commands and the selections corresponding to two to eight commands
When the number of control commands ranged from two to eight, the corresponding performance was 0.794±0.020, 0.787±0.019, 0.781±0.019, 0.774±0.023, 0.768±0.025, 0.763±0.027 and 0.755±0.032, respectively. As the number of commands increased, the performance decreased, but only slightly. With all eight commands, the performance reached its lowest value of 0.755±0.032. Therefore, this scheme could support multi-command output and improve multi-class performance. It offers a way around the limited variety of MI tasks.
Feature extraction finds useful information hidden in the signal and is critical to the subsequent classification. As a time-frequency analysis method, WPT has been widely used in feature extraction from non-stationary signals, especially MI EEG signals. Laplacian filtering improved the SNR of the EEG signal and aided classification. The combination of features provided a more comprehensive measurement and achieved an average performance of 0.88±0.016, whereas the individual features (the average, sub-band energy and mean square deviation) achieved 0.808±0.043, 0.802±0.019 and 0.72±0.027, respectively. The combination was thus better than the average and the sub-band energy by about 0.08 and better than the mean square deviation by 0.16. These two factors helped to produce satisfactory performance. When using WPT, there are usually four types of features: partial decomposition coefficients, sub-band energy, coefficient statistics and coefficient transformation [25]. The average in the time domain and the sub-band energy in the frequency domain are typical features, and both were superior to the mean square deviation. For the EMG signal, MAV and RMS are mature features that have achieved satisfactory performance in gesture recognition.
This work explored increasing the number of control commands through a multimodal BCI. The system combined EEG and EMG signals induced by mental and gesture tasks. The results confirmed that this approach is feasible for increasing the number of BCI commands while meeting the accuracy requirements. The proposed interface could flexibly support two to eight commands with an accuracy greater than 0.75. For four to eight classes, the average performance was 0.781±0.019, 0.774±0.023, 0.768±0.025, 0.763±0.027 and 0.755±0.032, respectively. This is meaningful for the development of BCI and provides a way to further improve performance by sacrificing low-performance control commands when the number of classes is large. Many multi-class BCI studies have been conducted using MI tasks, and the present work compares well with them. For the four mental tasks of BCI Competition IV dataset 2a, Nicolas-Alonso et al. used a novel algorithm, sequential updating semi-supervised spectral regression kernel discriminant analysis (SUSS-SRKDA), to obtain an average performance of 0.77 [26], whereas this work obtained 0.781 for four classes. Long et al. employed P300 and MI to generate five commands with a performance of 0.754 [13], below the 0.774 of this work. For six classes, Gandhi et al. employed an intelligent adaptive user interface (iAUI) to control a robot with multiple degrees of freedom [27], and Jiang et al. adopted a Morse code-inspired method based on permutation theory to obtain a performance of 0.894; despite its superior performance, the command generation process took more time [28]. Yi et al. classified seven kinds of mental tasks and obtained an average performance of 0.704 [7], whereas this work produced seven classes with a better average performance of 0.763.
Because the number of mental tasks is limited, more than eight classes have rarely been explored in current BCI studies. However, practical applications usually require more classes. If enough classes can be produced, the overall performance can be improved by sacrificing low-performance commands. Given the high performance of EMG-based hand gesture recognition, a next step would be to combine more hand gestures with MI tasks to produce more classes. In this way, enough commands could be generated to meet the requirements, addressing the problem of multi-command control for external applications.
Through experimentation, we also found which combinations achieved better performance and were therefore preferred. This interface supported multiple control commands for external devices or applications, with the specific number of commands determined by the actual situation. Since the EMG electrodes were placed on the right arm, this may have had some impact on the EEG classification, resulting in the relatively higher performance of RH.
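The performance for each command count can be reproduced from the eight per-command means by keeping the N best commands and averaging them. The sketch below does this for N = 2 to 8; treating the selection as "top N by mean accuracy" is an assumption that the reported figures bear out.

```python
# Mean accuracies of the eight combined commands, as reported.
ACCURACY = {
    "LH+FLWR": 0.745, "LH+EXWR": 0.738, "LH+EXPM": 0.780, "LH+EXIF": 0.704,
    "RH+FLWR": 0.772, "RH+EXWR": 0.765, "RH+EXPM": 0.808, "RH+EXIF": 0.730,
}

# Commands from best to worst.
ranked = sorted(ACCURACY, key=ACCURACY.get, reverse=True)


def top_n_mean(n):
    """Average accuracy when only the n best commands are kept."""
    return sum(ACCURACY[name] for name in ranked[:n]) / n


performance = {n: round(top_n_mean(n), 3) for n in range(2, 9)}
```

The resulting values, 0.794, 0.787, 0.781, 0.774, 0.768, 0.763 and 0.755, match the reported means for two to eight commands.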
Conclusions
In this study, we designed a multimodal BCI that combined EEG and EMG to produce multiple classes. In particular, different mental and gesture tasks were combined to produce eight control commands, namely LH+FLWR, LH+EXWR, LH+EXPM, LH+EXIF, RH+FLWR, RH+EXWR, RH+EXPM and RH+EXIF. The entire signal processing analysis was carried out in Matlab. The commands were ranked by average performance, and the ranking determined which commands were selected, so that the interface supports two to eight output control commands for external applications. When all eight classes were used, the average accuracy reached its lowest value, 0.755.
In summary, in the design of the multimodal BCI, the combination of synchronous EEG and EMG produced multiple classes with reliable performance, pointing to a way of increasing the number of BCI commands. Since the algorithms are relatively simple and fast, online control of BCI applications may be possible in the near future.
In addition, future work on multimodal BCI should consider brain mechanisms, experimental design, neural feedback and clinical application. The underlying brain mechanisms, the design of novel experimental paradigms, effective means of neural feedback and practical clinical applications are all worth exploring and matter for the further development of multimodal BCI.
Acknowledgments
The author thanks the friendly associate editor and the anonymous reviewers for giving constructive suggestions that led to an improved version of the paper. The author would also like to thank all of the participants for their cooperation.
