Training set extension for SVM ensemble in P300-speller with familiar face paradigm

Abstract

BACKGROUND:

P300-spellers are brain-computer interface (BCI)-based character input systems. Support vector machine (SVM) ensembles are used as classifiers in these systems and are trained with large-scale training sets. However, collecting such large training sets requires a long recording time for each subject, so the data collected toward the end of the recording period are contaminated by the subject’s fatigue.

OBJECTIVE:

This study aimed to develop a method for obtaining more training data from a small collected training set.

METHODS:

A new method was developed in which the corresponding EEG trials of two sequences in the training data are superimposed and averaged to extend the training set. The proposed method was tested offline on a P300-speller with the familiar face paradigm.

RESULTS:

The SVM ensemble with the extended training set achieved 85% classification accuracy when results were averaged over four sequences, and 100% with 11 sequences. In contrast, the conventional SVM ensemble with the non-extended training set achieved only 65% accuracy with four sequences and 92% with 11 sequences.

CONCLUSION:

The SVM ensemble with the extended training set achieves higher classification accuracy than the conventional SVM ensemble, which indicates that the proposed method effectively improves the classification performance of BCI P300-spellers and thus enhances their practicality.

Keywords

Brain-computer interface; familiar face paradigm; P300 speller; SVM ensemble; training set extension

1. Introduction

Brain-computer interfaces (BCIs) establish a new means of communication between the human brain and computers [1, 2, 3]. This communication method does not depend on the muscles; instead, it uses changes in the electrical potentials of the cerebral cortex to help patients with amyotrophic lateral sclerosis (ALS) or other severe motor disabilities interact with the outside world in real time [4, 5].

The P300-speller is a widely used BCI system that allows users to communicate characters by focused attention. It is so named because it relies heavily on the P300 potential, a large positive deflection typically elicited 300 ms to 500 ms after the subject is exposed to a rare target stimulus within a series of frequent standard stimuli [6, 7, 8, 9]. A P300-speller with the row-column paradigm (RCP) was first presented by Farwell and Donchin [10]. In this speller, a 6 $\times$ 6 matrix of letters and numbers is presented on a computer monitor. The user fixates on the character he/she wishes to spell and silently counts the number of times it flashes. The six rows and six columns then flash consecutively in pseudo-random order; one such round of 12 flashes is defined as a sequence. The flashing row or column that contains the target character elicits a P300 potential, because only one of the six rows (and one of the six columns) contains the target, so a target flash is a rare event with probability 1/6 (oddball stimulus). The target character can therefore be determined by detecting the row and column that elicited the P300 potential. Owing to the low signal-to-noise ratio (SNR) of electroencephalography (EEG), a single sequence is not enough to detect the P300 potential reliably. In most P300-speller studies, the sequence is therefore repeated more than two times, and the P300 detection results of the sequences are averaged to improve detection accuracy. Building on the traditional RCP, Kaufmann et al. proposed the familiar face paradigm, in which translucent pictures of familiar faces are overlaid on a flashing row or column [11]. In addition to the P300 potential, this paradigm also elicits the N170 and N400f potentials associated with the recognition of familiar faces [12]. Face-based paradigms of this kind have been shown to yield a higher SNR and significantly improved classification accuracy in P300-spellers [13]. However, their performance still does not satisfy practical requirements, and further optimization is needed.
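Why averaging across sequences helps can be illustrated with a toy simulation. This is a minimal sketch under a simplifying assumption that is not from the study: the background EEG in each sequence is modeled as independent, zero-mean white Gaussian noise added to a fixed ERP. Under that model, averaging keeps the ERP unchanged while the noise standard deviation shrinks by roughly the square root of the number of averaged sequences. Real EEG noise is neither white nor fully independent across sequences, so the gain in practice is smaller. The function name `averaged_noise_std` is hypothetical.

```python
import numpy as np


def averaged_noise_std(n_sequences, n_samples=10_000, noise_std=1.0, seed=0):
    """Std of background noise after averaging n_sequences epochs.

    Toy model (assumption): each epoch = fixed ERP + independent Gaussian noise,
    so only the noise part is simulated here; the ERP survives averaging intact.
    """
    rng = np.random.default_rng(seed)
    noise = rng.normal(0.0, noise_std, size=(n_sequences, n_samples))
    return float(noise.mean(axis=0).std())


for q in (1, 4, 16):
    # Expected to be close to 1 / sqrt(q), i.e. about 1.0, 0.5 and 0.25
    print(q, round(averaged_noise_std(q), 3))
```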

In addition to the stimulus paradigm, researchers have also optimized signal processing and classification algorithms to improve the performance of the BCI P300-speller [14, 15, 16, 17]. Simple linear discriminant analysis (LDA) with regularization can detect P300 potentials when trained on large training sets [18]. Researchers have also introduced zero-training frameworks based on transfer learning [19, 20]. However, transfer learning also requires a substantial amount of training data, even if these data do not come from the current user. Support vector machines (SVMs) have been successfully applied to classify the P300 potential [21]. An SVM ensemble extends the SVM by combining the outputs of several individual SVM sub-classifiers into a final result. The diversity among the sub-classifiers of an SVM ensemble reduces the interference of EEG noise and the impact of individual sub-classifier errors [22], and SVM ensembles have achieved better classification performance than a single SVM [23]. In general, the larger the EEG training set, the better the classification performance [24]. However, subjects gradually become fatigued during EEG data collection, so the EEG data recorded at the end of a long collection period differ noticeably from, and are inconsistent with, those recorded at the beginning. When sub-classifiers are trained on EEG data from different periods, their outputs deviate from one another, which degrades classification performance.

In this study, based on a P300-speller with the familiar face paradigm, we shortened the collection time for training data and developed a training set extension method for the SVM ensemble. The method extends a small training set to a larger one by superimposing and averaging the corresponding training data of two sequences. Figure 1 shows the flow diagram of a P300-speller system with training set extension.

Figure 1.

Flow diagram for the P300 speller system with training set extension and parameter $C$ optimization.

Figure 2.

The familiar face paradigm (when a row or column flashed, its characters were overlaid with translucent pictures of a famous face, David Beckham). (The photographs of David Beckham are replaced here by a photograph of one of the subjects because a print license was not available.)

2. Materials and methods

2.1 Familiar face paradigm

In this study, visual stimulation was achieved via the familiar face paradigm [11, 25]. Thirty-six spelling characters were presented in a 6 $\times$ 6 matrix arranged on a grid of $13.4^{\circ}\times 19.4^{\circ}$ (Fig. 2). The rows and columns of the matrix flashed continuously in pseudo-random order. When a row or column flashed, its characters were overlaid with translucent pictures of the famous face (David Beckham).

2.2 Participants

Seventeen right-handed university students (six females; aged 21–26 years, mean age 24.6 years) participated in this experiment. All participants were native Chinese speakers who were very familiar with the English alphabet. No participant used drugs or alcohol or had any history of neurological disease. All had normal or corrected-to-normal vision and normal hearing. Before the experiment, the subjects were required to get adequate sleep and be in good mental condition. After receiving a complete description of the purpose and risks of the experiment, the subjects signed a written informed consent form and received 100 RMB as compensation. The study was approved by the ethics committee of Changchun University of Science and Technology (CUST).

2.3 Data acquisition

Each subject was seated in a comfortable chair about 70 cm in front of the computer monitor in a laboratory room shielded from sound and electromagnetic interference. Stimuli were presented on a 19-inch screen with a resolution of 1280 $\times$ 1024 px and a refresh rate of 60 Hz [26]. Subjects were instructed to focus on the target character, avoid eye movements during the task, and mentally count the number of times the target character flashed. Six different five-character words were chosen as the spelling task, and the spelling of each word was treated as one separate session. Each session consisted of five runs, and one target character was communicated in each run. To analyze offline how classification performance changes with the number of sequences, the sequence was repeated 15 times in each run. The inter-stimulus interval (ISI) was set to 250 ms to minimize overlap between responses: each character changed to the famous face picture for 200 ms and then returned to the gray character for 50 ms [25]. Stimulus presentation was controlled by a computer running the application software Presentation 0.71 (Neurobehavioral Systems Inc.) (Fig. 3).
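The stimulation parameters above determine how long each unit of the experiment lasts, which matters for the fatigue argument in the Introduction. The short calculation below is a sketch that counts only flash time; it assumes no gaps between sequences or runs and ignores the two- to five-minute breaks between sessions, since the text does not specify inter-run pauses.

```python
# Stimulus timing taken from the text: 12 flashes per sequence (6 rows + 6 columns),
# ISI = 250 ms (200 ms face + 50 ms gray), 15 sequences per run, 5 runs per session.
FLASHES_PER_SEQUENCE = 6 + 6
ISI_MS = 200 + 50
SEQUENCES_PER_RUN = 15
RUNS_PER_SESSION = 5

sequence_s = FLASHES_PER_SEQUENCE * ISI_MS / 1000  # 3.0 s of flashing per sequence
run_s = SEQUENCES_PER_RUN * sequence_s             # 45.0 s per target character
session_s = RUNS_PER_SESSION * run_s               # 225.0 s per five-character word
print(sequence_s, run_s, session_s)
```

Under these assumptions, the six sessions amount to 6 × 225 s = 22.5 min of pure stimulation, before breaks are added.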

Figure 3.

Experimental procedure. There were six separate sessions, and one five-character word was spelled in each session. A two- to five-minute break was given between sessions. Each session consisted of five runs, and one target character was communicated in each run. One row or column flash was one trial. Six rows and six columns flashing consecutively in pseudo-random order was defined as a sequence. The sequence was repeated 15 times in each run. The inter-stimulus interval (ISI) was 250 ms, in which each character changed to the famous face picture for 200 ms and then returned to the gray character for 50 ms.

Figure 4.

Placement of 14 electrodes on the scalp (Fz, F3, F4, FC1, FC2, Cz, C3, C4, Pz, P3, P4, Oz, O1, and O2).

Before the formal experiment, subjects completed a 20-second practice run to familiarize themselves with the paradigm and the familiar face used as the stimulus. A two- to five-minute break was given between sessions so that the subjects could maintain optimal focus.

EEG data were recorded from 14 electrodes (Fz, F3, F4, FC1, FC2, Cz, C3, C4, Pz, P3, P4, Oz, O1, and O2). Figure 4 depicts the placement of the electrodes on the scalp [25]. The left mastoid was used as the ground and the right mastoid as the reference. A pair of horizontal electrooculogram (HEOG) electrodes was placed at the outer canthi of the left and right eyes to detect horizontal eye movements, and a pair of vertical electrooculogram (VEOG) electrodes was placed above and below the left eye to measure vertical eye movements. All impedances were kept below 5 k$\Omega$. The passband of the hardware filter was 0–100 Hz. EEG data were amplified by a NeuroScan amplifier (SynAmps 2, Neuroscan Inc., Abbotsford, Australia) and recorded at a sampling rate of 256 Hz using the application software Scan4.5 (Neuroscan Inc.).

2.4 Preprocessing and feature extraction

EEG data were digitally band-pass filtered between 0.01 Hz (high-pass cutoff) and 30 Hz (low-pass cutoff). Ocular artifacts were corrected using the HEOG and VEOG electrodes with a regression analysis algorithm [27]. The EEG data were then divided into epochs from $-100$ ms to 500 ms relative to the onset of each row or column flash; we call these epochs EEG trials. All EEG trials elicited by the same sequence constituted an EEG sequence (six row EEG trials and six column EEG trials). Baseline correction was applied to each EEG trial using the pre-stimulus interval from $-100$ ms to 0 ms.
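The epoching and baseline-correction step can be sketched in NumPy as follows. This is a minimal sketch, not the authors' code: it assumes the continuous data are already filtered and ocular-corrected and stored as a `(channels, samples)` array, and that flash onsets are given as sample indices. Because 100 ms at 256 Hz is 25.6 samples, the baseline window is rounded to 26 samples; the function returns only the 0–500 ms segment (128 samples) that is used for feature extraction. The name `extract_trials` is hypothetical.

```python
import numpy as np

FS = 256       # sampling rate (Hz), from the text
PRE_MS = 100   # baseline interval before flash onset
POST_MS = 500  # post-stimulus interval kept as the EEG trial


def extract_trials(eeg, onsets, fs=FS, pre_ms=PRE_MS, post_ms=POST_MS):
    """Cut baseline-corrected EEG trials around flash onsets.

    eeg:    (n_channels, n_samples) array, already filtered and ocular-corrected.
    onsets: sample indices at which a row or column flash starts.
    Returns (n_trials, n_channels, n_post): the 0-500 ms segment of each trial
    minus the mean of its -100-0 ms pre-stimulus interval.
    """
    n_pre = int(round(pre_ms * fs / 1000))    # 26 samples (25.6 rounded)
    n_post = int(round(post_ms * fs / 1000))  # 128 samples
    trials = []
    for t0 in onsets:
        if t0 - n_pre < 0 or t0 + n_post > eeg.shape[1]:
            raise ValueError(f"onset {t0} is too close to the recording edge")
        baseline = eeg[:, t0 - n_pre:t0].mean(axis=1, keepdims=True)
        trials.append(eeg[:, t0:t0 + n_post] - baseline)
    return np.stack(trials)
```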

Long feature vectors require longer processing time. To compress the EEG data, down-sampling was used for feature extraction: the sampling rate was reduced from 256 Hz to 64 Hz by keeping one of every four samples. Because the data had already been low-pass filtered at 30 Hz, below the new Nyquist frequency of 32 Hz, this decimation limits aliasing. The original 500-ms post-stimulus segment of an EEG trial consisted of 128 samples per channel; after down-sampling, this decreased to 32. The down-sampled trials of the 14 channels were concatenated in the order F3, Fz, F4, FC1, FC2, C3, Cz, C4, P3, Pz, P4, O1, Oz, O2 to form the feature vector. Thus, the length of each feature vector was $32\times 14=448$.
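The feature construction can be sketched as reordering the channels, keeping every fourth sample, and concatenating channel by channel. This sketch assumes each trial is a `(14, 128)` array whose rows follow a recording order supplied by the caller (for example, the electrode list in Section 2.3); `FEATURE_ORDER` is the concatenation order stated above, and `trial_to_feature` is a hypothetical name.

```python
import numpy as np

# Concatenation order stated in the text
FEATURE_ORDER = ["F3", "Fz", "F4", "FC1", "FC2", "C3", "Cz", "C4",
                 "P3", "Pz", "P4", "O1", "Oz", "O2"]


def trial_to_feature(trial, channel_names, factor=4):
    """Turn one (14, 128) EEG trial into a 448-element feature vector.

    trial:         (n_channels, 128) array whose rows follow channel_names.
    channel_names: recording order of the rows of `trial` (assumed input).
    """
    rows = [channel_names.index(ch) for ch in FEATURE_ORDER]
    reordered = trial[rows]               # rows now follow FEATURE_ORDER
    downsampled = reordered[:, ::factor]  # keep every 4th sample: 256 Hz -> 64 Hz
    return downsampled.reshape(-1)        # channel by channel: 14 * 32 = 448
```

Because reordering, decimation, and concatenation are all linear operations, averaging two feature vectors gives the same result as averaging the underlying EEG trials first, which is relevant for the training set extension in Section 3.1.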

3. Training set extension and classification

3.1 Training set extension

We adopted an extension strategy that enlarges a small training set by superimposing and averaging the corresponding EEG trials of two different EEG sequences. Assume that a matrix of $M\times N$ characters is presented in a P300-speller, and that $Q$ ($Q\geqslant 2$) sequences are used to communicate a target character. Each sequence consists of $M+N$ trials: $M$ column trials and $N$ row trials. $\textit{EEG}_{im}$ and $\textit{EEG}_{jm}$ represent the column EEG trials elicited by the $m$th column trial of the $i$th and $j$th sequences, respectively. The average column EEG trial is given by Eq. (1):

$\displaystyle\textit{AvgEEG}_{m}=\frac{\textit{EEG}_{im}+\textit{EEG}_{jm}}{2},\quad 1\leqslant i<j\leqslant Q,\ 1\leqslant m\leqslant M$ (1)

Similarly, $\textit{EEG}_{in}$ and $\textit{EEG}_{jn}$ represent the row EEG trials elicited by the $n$ th row trial of the $i$ th sequence and the $j$ th sequence, respectively. The average row EEG trial is given by Eq. (2):

$\displaystyle\textit{AvgEEG}_{n}=\frac{\textit{EEG}_{in}+\textit{EEG}_{jn}}{2},\quad 1\leqslant i<j\leqslant Q,\ 1\leqslant n\leqslant N$ (2)

Thus, the training set was extended by the following steps:

1) Select two EEG sequences from the $Q$ EEG sequences.

2) In accordance with Eqs (1) and (2), average the corresponding column EEG trials ($M$ columns) and row EEG trials ($N$ rows) of the two EEG sequences.

3) The $M$ averaged column trials $\textit{AvgEEG}_{m}$ ($m=1,2,\ldots,M$) and the $N$ averaged row trials $\textit{AvgEEG}_{n}$ ($n=1,2,\ldots,N$) constitute a new EEG sequence.

4) Select another pair of EEG sequences that has not been selected before.

5) Repeat Steps 2) to 4) until all $C_{Q}^{2}$ combinations have been obtained. This extends the training set from $(M+N)\times Q$ to $(M+N)\times C_{Q}^{2}$ EEG trials.

Figure 5 presents the flow diagram of the training set extension process, in which the training set is used to communicate one target character with $Q$ sequences.

In our case, $M=6$, $N=6$, $Q=15$, and $C_{Q}^{2}=C_{15}^{2}=15\times 14/2=105$, and one session (five runs, i.e., five target characters) was chosen as the training set. Therefore, the number of EEG trials in the training set was extended from $(6+6)\times 15\times 5=900$ to $(6+6)\times 105\times 5=6300$.
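Steps 1) to 5) amount to averaging every unordered pair of sequences within a run. The sketch below assumes each run is stored as a `(Q, M + N, D)` array of feature vectors in a fixed column/row order (not the pseudo-random flash order), so that index $m$ refers to the same column in every sequence; working on feature vectors is equivalent to Eqs (1) and (2) because feature extraction is linear. Since the target row and column are fixed within a run, the labels of trial $m$ are unchanged by averaging and can be reused for every new sequence. The name `extend_run` and the random stand-in data are hypothetical.

```python
from itertools import combinations

import numpy as np


def extend_run(run):
    """Training set extension for one run, following Eqs (1) and (2).

    run: (Q, M + N, D) feature vectors of the M column trials and N row trials
         of each of the Q sequences, stored in a fixed column/row order.
    Returns (C(Q, 2), M + N, D): one averaged sequence per pair (i, j), i < j.
    """
    q = run.shape[0]
    if q < 2:
        raise ValueError("at least two sequences are needed for extension")
    pairs = np.array(list(combinations(range(q), 2)))  # all (i, j) with i < j
    return (run[pairs[:, 0]] + run[pairs[:, 1]]) / 2.0


# Usage with random stand-in data: one session = 5 runs x 15 sequences x 12 trials.
rng = np.random.default_rng(0)
session = [rng.standard_normal((15, 12, 448)) for _ in range(5)]
extended = np.concatenate([extend_run(r) for r in session])
print(extended.shape)                         # (525, 12, 448)
print(extended.shape[0] * extended.shape[1])  # 6300 EEG trials
```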

Figure 5.

Flow diagram of training set extension process, in which the training set is used to communicate one target character with $Q$ sequences.

3.2 SVM and regularization parameter C

SVM is a binary classifier that constructs a hyperplane dividing the data space into two half-spaces [21]. The hyperplane is described by a weight vector $w$ and a bias $b$; the label of a data vector $x$ is predicted from the sign of the decision function

$\displaystyle f(x)=w\cdot x+b$ (3)

Among all separating hyperplanes, the one with the largest separation margin $\gamma$ is called the optimal hyperplane, and the training vectors lying on the margin boundaries are called support vectors. With labels $y_{i}\in\{-1,+1\}$ and the hyperplane scaled so that every training vector $x_{i}$ satisfies Eq. (4), the margin is $\gamma=2/\|w\|$:

$\displaystyle y_{i}(w\cdot x_{i}+b)\geqslant 1$ (4)

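To allow some training vectors to violate the margin when the data are not linearly separable, we introduced slack variables $\xi_{i}\geqslant 0$ ($i=1,2,\ldots,n$), which relax Eq. (4) to $y_{i}(w\cdot x_{i}+b)\geqslant 1-\xi_{i}$. Training the SVM then amounts to minimizing the following objective, where $C$ controls the trade-off between a wide margin and the total slack: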

$\displaystyle\frac{1}{2}\|w\|^{2}+C\sum\limits_{i=1}^{n}{\xi_{i}}$ (5)

Introducing non-negative Lagrange multipliers $\alpha_{i}$, Eq. (5) is solved in its dual form subject to

$\displaystyle\begin{cases}0\leqslant\alpha_{i}\leqslant C\\ \sum\limits_{i=1}^{n}{\alpha_{i}y_{i}}=0\\ w=\sum\limits_{i=1}^{n}{y_{i}\alpha_{i}x_{i}}\end{cases}$ (6)

Combining Eqs (3) and (6) and replacing the inner product $(x\cdot x_{i})$ by a kernel function $K(x,x_{i})$ gives the decision function

$\displaystyle S(x)=\sum\limits_{i=1}^{n}{y_{i}\alpha_{i}K(x,x_{i})}+b$ (7)

and the predicted label is the sign of $S(x)$. In this study, a Gaussian kernel was used for all SVM sub-classifiers:

$\displaystyle K(x,x_{i})=\exp\left(-\frac{\|x-x_{i}\|^{2}}{2\sigma^{2}}\right)$ (8)
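Eqs (7) and (8) can be evaluated directly once the products $y_{i}\alpha_{i}$, the support vectors, and $b$ are known from training. The NumPy sketch below implements only the decision function, not the training (the dual optimization subject to Eq. (6) would normally be left to an SVM library). The kernel width $\sigma$ is not reported in the text and is treated as a free parameter; the function names are hypothetical, and mapping a score of exactly zero to $+1$ is an arbitrary tie-breaking choice.

```python
import numpy as np


def gaussian_kernel(X, Z, sigma):
    """Eq. (8) for all row pairs: K(x, z) = exp(-||x - z||^2 / (2 sigma^2))."""
    sq_dists = (np.sum(X**2, axis=1)[:, None]
                + np.sum(Z**2, axis=1)[None, :]
                - 2.0 * X @ Z.T)
    sq_dists = np.maximum(sq_dists, 0.0)  # guard against tiny negative round-off
    return np.exp(-sq_dists / (2.0 * sigma**2))


def svm_score(X, support_vectors, y_alpha, b, sigma):
    """Eq. (7): S(x) = sum_i y_i alpha_i K(x, x_i) + b for each row of X.

    y_alpha holds the products y_i * alpha_i of the support vectors.
    """
    return gaussian_kernel(X, support_vectors, sigma) @ y_alpha + b


def svm_label(X, support_vectors, y_alpha, b, sigma):
    """Predicted label: +1 (P300) if S(x) >= 0, otherwise -1 (no P300)."""
    return np.where(svm_score(X, support_vectors, y_alpha, b, sigma) >= 0, 1, -1)
```

If scikit-learn's `SVC(kernel="rbf")` were used for training, its kernel is $\exp(-\gamma\|x-x'\|^{2})$, so $\gamma=1/(2\sigma^{2})$ would correspond to Eq. (8); its `dual_coef_` and `intercept_` attributes hold $y_{i}\alpha_{i}$ and $b$.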

The performance of SVM classifiers depends on the setting of the hyper-parameter $C$, which cannot be derived analytically [28]. In this study, we performed a grid search using all six sessions of EEG trials to optimize $C$. The performance of the SVM ensemble classifier was estimated by six-fold cross-validation for $C$ values of 0.01, 0.05, 0.1, 0.5, and 1.0 [29, 30, 31].

All EEG trials in each session were treated as one group, giving $T$ ($T=6$) groups. Let $g_{t}$ ($t=1,2,\ldots,6$) denote the $t$th group. Five groups were used as the training set and the remaining group as the test set. This process was repeated $T$ times (the folds), with each of the $T$ groups used exactly once as the test set, and the mean classification accuracy was computed for each test group. Let $A_{p}(g_{t})$ denote the mean accuracy of subject $p$ when the $t$th group is the test set. The average classification accuracy over the $P$ ($P=17$) subjects is then

$\displaystyle\textit{acc}=\frac{1}{P}\frac{1}{T}\sum\limits_{p=1}^{P}{\sum\limits_{t=1}^{T}{A_{p}(g_{t})}}$ (9)

where acc is the average character input accuracy.

The average accuracies for the different values of $C$ were calculated, and the best value was used in the SVM classifiers for the subsequent classification.
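The selection of $C$ with Eq. (9) can be sketched as a loop over the grid with leave-one-session-out folds. In this sketch, the hypothetical callback `fold_accuracy(p, t, C)` stands in for training the ensemble on the other five sessions of subject $p$ and returning $A_{p}(g_{t})$; it is not part of the original method description. The sketch selects $C$ by the single average of Eq. (9), whereas the study also compares accuracies per number of sequences (Tables 1 and 2).

```python
import numpy as np

C_GRID = (0.01, 0.05, 0.1, 0.5, 1.0)


def mean_accuracy(acc_table):
    """Eq. (9): acc_table[p, t] = A_p(g_t); returns (1/P)(1/T) sum_p sum_t."""
    return float(np.mean(acc_table))


def select_C(fold_accuracy, n_subjects=17, n_groups=6, c_grid=C_GRID):
    """Leave-one-session-out grid search over the regularization parameter C.

    fold_accuracy(p, t, C) is a hypothetical callback returning A_p(g_t):
    the accuracy for subject p when session t is the test fold and the
    remaining sessions train the ensemble with parameter C.
    """
    scores = {}
    for C in c_grid:
        table = np.array([[fold_accuracy(p, t, C) for t in range(n_groups)]
                          for p in range(n_subjects)])
        scores[C] = mean_accuracy(table)
    best_C = max(scores, key=scores.get)  # first value wins on ties
    return best_C, scores
```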

3.3 Classification

$K$ ($K=7$) SVM sub-classifiers were built, each with its own training subset. To eliminate time dependence and ensure reliable estimates, six-fold cross-validation was applied: in each fold, one session after extension was used as the training set for the sub-classifiers and the remaining five sessions were used as the test set. After extension, one training session comprised $12\times 105\times 5=6300$ EEG trials in total. The bagging algorithm was used to draw seven training subsets from these training EEG trials. Bagging is based on the bootstrap [32]: 900 EEG trials are drawn at random from the training set to form a training subset and are then returned to the training set. This procedure was repeated seven times to obtain seven training subsets, one for each sub-classifier, and each sub-classifier was trained on its own subset. To integrate the final classification results, each sub-classifier of the SVM ensemble was assigned a weight $w_{k}$ ($k=1,2,\ldots,7$) using the equal prediction error weighting method [33, 34]. With $\sigma_{k}$ denoting the standard deviation of the $k$th sub-classifier's accuracy error, the coupling weights of the sub-classifiers are determined by

$\displaystyle w_{k}=\frac{1}{\sigma_{k}\sum\limits_{i=1}^{K}{\frac{1}{\sigma_{i}}}}$ (10)
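The bagging step and Eq. (10) can be sketched as follows. This assumes standard bootstrap sampling, i.e., drawing with replacement within each subset; the text does not detail how $\sigma_{k}$ is estimated, so the weights function simply takes the $\sigma_{k}$ values as input. Written as $w_{k}=(1/\sigma_{k})/\sum_{i}(1/\sigma_{i})$, Eq. (10) gives weights that sum to 1 and favor sub-classifiers with smaller error spread. Both function names are hypothetical.

```python
import numpy as np


def bootstrap_subsets(n_trials, n_subsets=7, subset_size=900, seed=0):
    """Bagging: draw n_subsets index sets of size subset_size with replacement."""
    rng = np.random.default_rng(seed)
    return rng.integers(0, n_trials, size=(n_subsets, subset_size))


def inverse_sigma_weights(sigmas):
    """Eq. (10): w_k = 1 / (sigma_k * sum_i 1 / sigma_i); the weights sum to 1."""
    inv = 1.0 / np.asarray(sigmas, dtype=float)
    return inv / inv.sum()


# Seven subsets of 900 indices into the 6300 extended training trials
subsets = bootstrap_subsets(6300)
print(subsets.shape)  # (7, 900)
```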

The five sessions other than the training session formed the test set. Each sequence consists of $M+N=6+6=12$ trials. Let $\textit{EEG}_{im}$ ($i=1,2,\ldots,15$; $m=1,2,\ldots,6$) be the $m$th column EEG trial of the $i$th EEG sequence, and $\textit{EEG}_{in}$ ($i=1,2,\ldots,15$; $n=1,2,\ldots,6$) the $n$th row EEG trial of the $i$th EEG sequence, with feature vectors $\textit{VEC}_{im}$ and $\textit{VEC}_{in}$, respectively. When $\textit{VEC}_{im}$ and $\textit{VEC}_{in}$ were classified by all $K$ ($K=7$) sub-classifiers of the SVM ensemble, they were labeled $\textit{LAB}_{\textit{imk}}$ and $\textit{LAB}_{\textit{ink}}$ ($k=1,2,\ldots,7$), respectively. The label was set to 1 when sub-classifier $k$ identified the EEG trial as containing a P300 and to $-1$ otherwise.

After SVM classification, weighted voting was applied to integrate the results. The 12 feature vectors of each sequence were classified by the seven sub-classifiers, so each trial had seven results, each paired with the corresponding sub-classifier weight. The seven results of each EEG trial were multiplied by their weights and summed to obtain a final score.

Table 1

Average classification accuracy (%) for 17 subjects with different values of $C$

Number of sequences	$C=0.01$	$C=0.05$	$C=0.1$	$C=0.5$	$C=1$
1	75.3	72.9	73.5	74.1	74.1
2	94.7	89.4	91.2	90.0	90.0
3	97.6	97.1	97.1	97.1	97.1
4	99.4	97.6	97.1	97.1	97.1
5	99.4	98.8	98.2	97.6	97.6
6	99.4	98.8	98.2	98.8	98.8
7	100	99.4	99.4	99.4	99.4
8	100	99.4	100	100	100
9	100	99.4	100	100	100
10	100	99.4	100	100	100
11	100	100	100	100	100
12	100	100	100	100	100
13	100	100	100	100	100
14	100	100	100	100	100
15	100	100	100	100	100

The SVM ensemble used $K$ ($K=7$) sub-classifiers, and $w_{k}$ is the weight of the $k$th sub-classifier. With $Q$ ($Q=1,2,\ldots,15$) denoting the number of sequences used for classification, the score of the $m$th column over $Q$ sequences is

$\displaystyle S_{Q}(m)=\sum\limits_{i=1}^{Q}{\sum\limits_{k=1}^{K}{(w_{k}\times\textit{LAB}_{\textit{imk}})}}$ (11)

Similarly, the score of the $n$th row over $Q$ sequences is

$\displaystyle S_{Q}(n)=\sum\limits_{i=1}^{Q}{\sum\limits_{k=1}^{K}{(w_{k}\times\textit{LAB}_{\textit{ink}})}}$ (12)

To derive a single decision from the seven SVM sub-classifiers, the row and column with the highest scores are taken as the row and column expected to contain the target character. The most probable column over $Q$ ($Q=1,2,\ldots,15$) sequences is given by Eq. (13):

$\displaystyle\textit{column}_{Q}=\arg\max\limits_{1\leqslant m\leqslant 6}(S_{Q}(m))$ (13)

The most probable row over $Q$ ($Q=1,2,\ldots,15$) sequences is

$\displaystyle\textit{row}_{Q}=\arg\max\limits_{1\leqslant n\leqslant 6}(S_{Q}(n))$ (14)

The target character is the one located at the intersection of $\textit{column}_{Q}$ and $\textit{row}_{Q}$.
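Eqs (11) to (14) combine into a short decoding routine. This sketch assumes the sub-classifier outputs are collected into `(Q, 6, K)` arrays of $\pm 1$ labels for columns and rows, and that the speller matrix is given as six strings indexed `[row][column]`; the example layout in the test is hypothetical, since the text does not list the matrix contents. Indices are 0-based, and `np.argmax` returns the first maximum on ties. Passing `labels[:Q]` gives the decision after the first $Q$ sequences.

```python
import numpy as np


def decode_character(col_labels, row_labels, weights, matrix):
    """Weighted-vote decoding over Q sequences, Eqs (11)-(14).

    col_labels: (Q, 6, K) sub-classifier outputs LAB_imk in {-1, +1}
    row_labels: (Q, 6, K) sub-classifier outputs LAB_ink in {-1, +1}
    weights:    (K,) sub-classifier weights w_k from Eq. (10)
    matrix:     six strings of six characters, indexed [row][column]
    """
    s_col = np.einsum("qmk,k->m", col_labels, weights)  # S_Q(m), Eq. (11)
    s_row = np.einsum("qnk,k->n", row_labels, weights)  # S_Q(n), Eq. (12)
    column = int(np.argmax(s_col))                      # Eq. (13)
    row = int(np.argmax(s_row))                         # Eq. (14)
    return matrix[row][column]
```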

Table 2

Average number of sequences needed to reach 100% accuracy with different values of $C$

Subjects	$C=0.01$	$C=0.05$	$C=0.1$	$C=0.5$	$C=1$
Subject 1	1.5	1.5	1.5	1.5	1.5
Subject 2	2	6	4.5	4.5	4.5
Subject 3	3	4	4	4	4
Subject 4	2.5	2.5	2.5	2.5	2.5
Subject 5	3.5	3.5	3.5	3.5	3.5
Subject 6	2	2	2	2	2
Subject 7	1.5	1.5	1.5	1.5	1.5
Subject 8	1	1	1	1	1
Subject 9	1.5	2	2	2	2
Subject 10	2	1.5	1.5	1.5	1.5
Subject 11	2	3	3	3	3
Subject 12	2	2.5	1.5	1.5	1.5
Subject 13	2	2.5	2	2.5	2.5
Subject 14	2	2	2	2	2
Subject 15	2	3.5	3.5	4.5	4.5
Subject 16	2	1.5	1.5	1.5	1.5
Subject 17	1.5	2	2	2	2
Average $\pm$ SD	2 $\pm$ 0.6	2.5 $\pm$ 1.2	2.3 $\pm$ 1	2.4 $\pm$ 1.1	2.4 $\pm$ 1.1

Note: SD refers to standard deviation among the 17 subjects.

Table 3

Average classification accuracy (%) for the 17 subjects as a function of the number of sequences

Number of sequences	Non-extended training set	Extended training set
1	27.5 $\pm$ 17.0	–
2	45.9 $\pm$ 19.7	58.7 $\pm$ 14.7
3	59.7 $\pm$ 17.0	68.7 $\pm$ 13.1
4	65.2 $\pm$ 19.0	85.5 $\pm$ 9.9
5	72.3 $\pm$ 15.7	92.2 $\pm$ 9.0
6	79.1 $\pm$ 14.8	95.8 $\pm$ 7.8
7	84.3 $\pm$ 11.4	97.6 $\pm$ 5.6
8	86.4 $\pm$ 9.4	99.1 $\pm$ 3.1
9	89.7 $\pm$ 9.4	99.7 $\pm$ 1.3
10	91.2 $\pm$ 8.3	99.9 $\pm$ 0.2
11	92.7 $\pm$ 6.4	100
12	93.5 $\pm$ 5.8	100
13	94.1 $\pm$ 5.3	100
14	94.8 $\pm$ 5.1	100
15	95.4 $\pm$ 5.3	100

Note: The format of results is Average $\pm$ SD, where Average is the average accuracy of the 17 subjects, and SD is the standard deviation among the 17 subjects.

4. Results

4.1 Results for regularizing parameter C

To compare the effects of different values of the parameter $C$ on the SVM ensemble, we calculated the average accuracy of the 17 subjects for each value of $C$. The results are listed in Table 1. Because using fewer sequences for classification yields a higher character transfer rate (CTR), we also computed the average number of sequences the 17 subjects needed to reach 100% accuracy for each value of $C$ (Table 2).
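The link between the number of sequences and spelling speed can be made concrete with a rough estimate. The text does not give its exact CTR formula, so the function below is only an assumed approximation: one character takes `n_sequences × 12 flashes × 0.25 s` of stimulation plus an optional, unspecified pause between characters. `characters_per_minute` is a hypothetical name.

```python
def characters_per_minute(n_sequences, flashes_per_sequence=12, isi_s=0.25, pause_s=0.0):
    """Approximate spelling speed for a given number of sequences per character.

    Assumption: stimulation time per character = n_sequences * 12 flashes * 0.25 s,
    plus an optional pause_s between characters (not specified in the text).
    """
    seconds_per_char = n_sequences * flashes_per_sequence * isi_s + pause_s
    return 60.0 / seconds_per_char


for q in (2, 7, 11):
    print(q, round(characters_per_minute(q), 2))
```

Under this approximation, two sequences allow up to 10 characters per minute, whereas 11 sequences allow fewer than 2, which is why reaching 100% accuracy with fewer sequences matters.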

4.2 Results for training set extension

To verify the effect of training set extension with a small training set, one session was used to train the sub-classifiers and the other five sessions were used for testing. We calculated the individual and average accuracies of the 17 subjects using the non-extended and the extended training sets. The average classification accuracies as a function of the number of sequences are listed in Table 3, and the individual and average classification accuracies of the 17 subjects are shown in Fig. 6.

Figure 6.

Individual and average classification accuracy for the 17 subjects.

5. Discussion

5.1 Optimization of regularizing parameter C

When $C$ is 0.01, the accuracy reaches approximately 95% with only two sequences, and seven sequences are needed to achieve 100% accuracy. When $C$ is 0.05, two sequences yield almost 90% accuracy, and 11 sequences are needed for all subjects to achieve 100% accuracy. When $C$ is 0.1, 0.5, or 1, the performance is nearly identical, reaching about 90% accuracy with two sequences and 100% accuracy with eight sequences (Table 1). In addition, when $C$ is 0.01, 100% accuracy is reached with 2 $\pm$ 0.6 (mean $\pm$ standard deviation) sequences, whereas 2.5 $\pm$ 1.2, 2.3 $\pm$ 1, 2.4 $\pm$ 1.1, and 2.4 $\pm$ 1.1 sequences are needed when $C$ is 0.05, 0.1, 0.5, and 1, respectively (Table 2). Taken together, these results show that $C=0.01$ yields the highest classification accuracy and the smallest number of sequences needed to reach 100% accuracy.

5.2 Training set extension

The average accuracies of the 17 subjects increased with the number of sequences (Table 3). A repeated-measures ANOVA on classification accuracy, with the within-subjects factors number of sequences (2–15 sequences, 14 levels) and training set (extended and non-extended), showed that the classification accuracy obtained with the extended training set was significantly higher than that obtained with the non-extended training set [F (1, 16) $=$ 35.47, $p<$ 0.001]. With the extended training set, only four sequences are needed to reach 85.5% accuracy, and 11 sequences to reach 100%. With the non-extended training set, the accuracy is 65.2% with four sequences and 92.7% with 11 sequences. Pairwise comparisons showed that the accuracies obtained with the non-extended training set with four sequences ($p<$ 0.001) and with 11 sequences ($p<$ 0.001) were both significantly lower than those obtained with the extended training set.

The accuracy of each subject also improved as the number of sequences increased (Fig. 6). When trained with the extended training set, all subjects achieved 100% accuracy after 15 sequences, whereas only five subjects (subjects 4, 6, 7, 9, and 16) achieved 100% after 15 sequences with the non-extended training set. For 16 of the 17 subjects, training with the extended training set resulted in higher classification accuracy than training with the non-extended training set. Furthermore, the advantage of the extended training set is greater when fewer sequences are used for classification. In addition, as the number of sequences increases, accuracy rises faster with the extended training set than with the non-extended training set.

These results therefore show that training with the extended training set produces higher performance than training with the non-extended training set, and that training set extension is particularly advantageous when only a small training set is available.

6. Conclusion

This study proposed a method for extending a small training set, which reduces the training data collection time for an SVM ensemble in a BCI P300-speller with the familiar face paradigm. The offline results verify that the extended training set leads to better BCI P300-speller performance than the traditional non-extended training set, thereby enhancing its practicality. In future work, this training set extension method will be applied to an online P300-speller and to practical BCI applications.


Acknowledgments

The study was financially supported by the National Natural Science Foundation of China (61773076) and the Science and Technology Development Program of Jilin Provincial Science and Technology Department in China (20180519012JH). The authors would like to thank all the subjects who participated in the experiments and the anonymous reviewers for their valuable comments.

Conflict of interest

The authors declare that they have no conflict of interest.

References

1. Birbaumer, Ghanayim, Hinterberger, Iversen, Kotchoubey, Kubler, et al. A spelling device for the paralysed. Nature 1999; 398(6725): 297-8.

2. Wolpaw, Birbaumer, McFarland, Pfurtscheller, Vaughan. Brain-computer interfaces for communication and control. Clinical Neurophysiology: Official Journal of the International Federation of Clinical Neurophysiology 2002; 113(6): 767-91.

3. Allison, Wolpaw. Brain-computer interface systems: Progress and prospects. Expert Review of Medical Devices 2007; 4(4): 463-74.

4. Kubler, Kotchoubey, Kaiser, Wolpaw, Birbaumer. Brain-computer communication: Unlocking the locked in. Psychological Bulletin 2001; 127(3): 358-75.

5. Kubler, Neumann. Brain-computer interfaces – the key for the conscious brain locked into a paralyzed body. Progress in Brain Research 2005; 150: 513-25.

6. Nijboer, Sellers, Mellinger, Jordan, Matuz, Furdea, et al. A P300-based brain-computer interface for people with amyotrophic lateral sclerosis. Clinical Neurophysiology: Official Journal of the International Federation of Clinical Neurophysiology 2008; 119(8): 1909-16.

7. Allison, McFarland, Schalk, Zheng, Jackson, Wolpaw. Towards an independent brain-computer interface using steady state visual evoked potentials. Clinical Neurophysiology: Official Journal of the International Federation of Clinical Neurophysiology 2008; 119(2): 399-408.

8. Bernat, Shevrin, Snodgrass. Subliminal visual oddball stimuli evoke a P300 component. Clinical Neurophysiology: Official Journal of the International Federation of Clinical Neurophysiology 2001; 112(1): 159-71.

9. Sellers, Krusienski, McFarland, Vaughan, Wolpaw. A P300 event-related potential brain-computer interface (BCI): The effects of matrix size and inter stimulus interval on performance. Biological Psychology 2006; 73(3): 242-52.

10. Farwell, Donchin. Talking off the top of your head: Toward a mental prosthesis utilizing event-related brain potentials. Electroencephalography and Clinical Neurophysiology 1988; 70(6): 510-23.

11. Kaufmann, Schulz, Grunzinger, Kubler. Flashing characters with famous faces improves ERP-based brain-computer interface performance. Journal of Neural Engineering 2011; 8(5): 056016.

12. Eimer. Event-related brain potentials distinguish processing stages involved in face perception and recognition. Clinical Neurophysiology: Official Journal of the International Federation of Clinical Neurophysiology 2000; 111(4): 694-705.

13. Zhang, Zhao, Jin, Wang, Cichocki. A novel BCI based on ERP components sensitive to configural processing of human faces. Journal of Neural Engineering 2012; 9(2): 026018.

14. Krusienski, Sellers, Cabestaing, Bayoudh, McFarland, Vaughan, et al. A comparison of classification techniques for the P300 Speller. Journal of Neural Engineering 2006; 3(4): 299-305.

15. Krusienski, Sellers, McFarland, Vaughan, Wolpaw. Toward enhanced P300 speller performance. Journal of Neuroscience Methods 2008; 167(1): 15-21.

16. Fazel-Rezai. Brain-computer interface systems – recent progress and future prospects. Preface. InTech 2013; vii.

17. Farquhar, Hill. Interactions between pre-processing and classification methods for event-related-potential classification: Best-practice guidelines for brain-computer interfacing. Neuroinformatics 2013; 11(2): 175-92.

18. Blankertz, Lemm, Treder, Haufe, Muller. Single-trial analysis and classification of ERP components – a tutorial. NeuroImage 2011; 56(2): 814-25.

19. Kindermans, Tangermann, Muller, Schrauwen. Integrating dynamic stopping, transfer learning and language models in an adaptive zero-training ERP speller. Journal of Neural Engineering 2014; 11(3): 035005.

20. Kindermans, Schreuder, Schrauwen, Muller, Tangermann. True zero-training brain-computer interfacing – an online study. PLoS One 2014; 9(7): e102504.

21. Kaper, Meinicke, Grossekathoefer, Lingner, Ritter. BCI competition 2003 – data set IIb: Support vector machines for the P300 speller paradigm. IEEE Transactions on Bio-Medical Engineering 2004; 51(6): 1073-6.

22. Kuncheva, Whitaker. Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine Learning 2003; 51(2): 181-207.

23. Rakotomamonjy, Guigue. BCI competition III: dataset II – ensemble of SVMs for BCI P300 speller. IEEE Transactions on Bio-Medical Engineering 2008; 55(3): 1147-54.

24. Salvaris, Sepulveda. Visual modifications on the P300 speller BCI paradigm. Journal of Neural Engineering 2009; 6(4): 046011.

25. Liu, Bai. Use of a green familiar faces paradigm improves P300-Speller brain-computer interface performance. PLoS One 2015; 10(6): e0130325.

26. Treder, Blankertz. Covert attention and visual speller design in an ERP-based brain-computer interface. Behavioral and Brain Functions: BBF 2010; 6: 28.

27. Semlitsch, Anderer, Schuster, Presslich. A solution for reliable and valid reduction of ocular artifacts, applied to the P300 ERP. Psychophysiology 1986; 23(6): 695-703.

28. Cherkassky. Practical selection of SVM parameters and noise estimation for SVM regression. Neural Networks: The Official Journal of the International Neural Network Society 2004; 17(1): 113-26.

29. Grandvalet. Bagging equalizes influence. Machine Learning 2004; 55(3): 251-70.

30. Breiman. Bagging predictors. Machine Learning 1996; 24(2): 123-40.

31. Boiarskaia, Boscolo, Zhu, Mahar. Cross-validation of an equating method linking aerobic FITNESSGRAM(R) field tests. American Journal of Preventive Medicine 2011; 41(4 Suppl 2): S124-30.

32. Schapire. A brief introduction to boosting. IJCAI-99: Proceedings of the Sixteenth International Joint Conference on Artificial Intelligence 1999; 1(2): 1401-6.

33. Kim. Forecasting time series with genetic fuzzy predictor ensemble. IEEE Transactions on Fuzzy Systems 1997; 5(4): 523-35.

34. Kim, Pang, Kim, Bang. Constructing support vector machine ensemble. Pattern Recognition 2003; 36(12): 2757-67.