A neural decoding strategy based on convolutional neural network

Abstract

Neural decoding is a technology for analyzing the intentions conveyed by neural activity, with important applications in military, medical, entertainment and other fields. A typical application is decoding electromyogram (EMG) signals into the corresponding gestures. To improve the accuracy of EMG signal recognition, researchers often extract effective features from EMG signals and classify gestures with a suitably constructed classifier. However, because of the stochasticity of the signals, this approach is not robust enough. This paper proposes a convolutional neural network (CNN) based on feature fusion, which automatically learns and classifies features from the time domain (TD) and the frequency domain (FD). To make full use of the information, two fusion methods are used and compared. Experiments show that the proposed fusion methods outperform the traditional algorithm for both intact subjects and amputees, and perform better than a CNN that uses only one kind of information.

Keywords

Convolutional neural network (CNN), gesture recognition, neural decoding, surface electromyogram (sEMG)

1 Introduction

In recent years, human-computer interaction (HCI) has played an increasingly important role in human life. As a key part of HCI, neural decoding is a technology that interprets signals recorded from neural populations as specific instructions [1, 2]. It is of great value in many aspects of life and helps researchers understand neural activity.

Although electromyogram (EMG) signals are strongly stochastic, they can be decoded by extracting effective features and constructing an appropriate classifier. Time-domain (TD), frequency-domain (FD) and time-frequency features are commonly used for feature extraction [3–8]. Jiang et al. [9] analyzed 8 different hand gestures at 2 different force levels using zero crossings, slope sign changes, waveform length and mean absolute value in TD. However, these features are not robust enough for non-stationary signals. Hence, researchers have turned to FD and time-frequency analysis methods. Abdelouahad et al. [10] used median frequency, mean power frequency and some TD features to investigate the relationship between surface electromyogram (sEMG) signals and muscle contraction. Duan et al. [11] extracted time-frequency information with the discrete wavelet transform and used a wavelet neural network (WNN) to improve sEMG signal recognition. They later extended this work with a WNN ensemble to analyze signals that change over time [12]. Khezri et al. [13] used an adaptive neuro-fuzzy inference system to analyze zero crossings at each level of the wavelet, and classified six gestures.

Besides feature extraction, many researchers focus on finding suitable classifiers. Subasi et al. [14] optimized an SVM with the particle swarm optimization (PSO) algorithm to classify EMG signals as normal, neurogenic or myopathic. Al-timemy et al. [15] focused on reducing training time according to the classification score, using linear discriminant analysis for short latency.

Although these traditional methods based on TD and FD features are effective, they are usually applied where the number of gestures is small. Because handcrafted features become difficult to distinguish, performance drops as the number of gestures increases. This raises the question of whether a feature learning method can be used for neural decoding.

With the progress of deep learning, convolutional neural networks (CNNs) have been widely used in various fields [16–19]. Instead of applying a feature projection before classification, a CNN can learn a more suitable feature set and obtain higher recognition accuracy through end-to-end learning [20–22]. Given these advantages, researchers have applied CNNs to EMG analysis. Geng et al. [23] analyzed patterns in the instantaneous values of high-density sEMG with a CNN. Du et al. [24] extended this work by employing a Siamese architecture for sEMG-based gesture recognition, and improved the recognition accuracy by training two auxiliary tasks. Wei et al. [25, 26] explored several multi-stream frameworks that take different sEMG electrode or feature set combinations as input and fuse the streams with fully connected (FC) layers. Hu et al. [27] investigated a method to extract inter-channel information from sEMG signals, and then proposed a structure combining a CNN with an attention mechanism and a recurrent neural network to boost performance, which reached state-of-the-art results on many datasets. Zhai et al. [28] applied principal component analysis (PCA) to the spectrogram to form a feature matrix, and then demonstrated a CNN on datasets from both intact subjects and amputees.

The sEMG signals from multiple electrodes represent how channel information changes through space and time. To solve similar problems in action and expression classification, Refs. [29, 30] used CNNs to extract and classify temporal and spatial information separately, and fused them in the softmax layer. Inspired by this scheme, this paper first proposes two methods to extract relevant information in TD and FD. Unlike Ref. [23], which uses a low-pass filter with a cut-off frequency of 1 Hz for data preprocessing, the proposed strategy also treats the high-frequency band as supplementary information. The two types of information are then fused in the feature layer and in the softmax layer, respectively, to improve accuracy.

This paper is organized as follows. Section 2 briefly describes materials information and the CNN model. Section 3 introduces the preprocessing methods and classifiers on different data information. Section 4 shows the effectiveness of two proposed fusion methods. Section 5 discusses the experimental phenomena. Finally, conclusions and future work are given in Section 6.

2 Materials information and CNN model

2.1 Materials information

This work uses datasets from the Ninapro project. The project has published seven publicly accessible datasets, which many researchers use as benchmark databases [31–33]. As stated in Ref. [28], the FD content of EMG signals lies mainly in 0–700 Hz. According to the sampling theorem, the sampling frequency must be higher than twice the maximum frequency of the EMG signals to retain complete FD information, i.e. above 1400 Hz here. Therefore two datasets, DB2 and DB3, are selected for the experiments. Both are sampled at 2000 Hz, and they represent intact subjects and amputees, respectively [34, 35].

In the DB2 dataset, a total of 12 electrodes are used for each of the 40 intact subjects. Eight electrodes are equally spaced around the forearm, two are placed on the main activity spots of the flexor digitorum superficialis and extensor digitorum superficialis, and two on the main activity spots of the biceps brachii and triceps brachii (see Fig. 1). During acquisition, subjects sat at a desk with the arm resting comfortably. For visual guidance, a laptop showing the gestures to be performed was placed in front of the subjects. In total, 50 groups of gestures were recorded in the dataset; each gesture was repeated for 6 trials, each lasting 5 seconds. Between two consecutive trials, a 3-second rest was arranged to avoid muscle fatigue. Specific information on DB2 and DB3 is given in Table 1.

Fig. 1

The placement of electrodes [34].

Table 1

Subjects summary in DB2 and DB3

Dataset	DB2	DB3
Intact subjects	40	0
Trans-radial amputee subjects	0	11
Number of sEMG electrodes	12	12
Number of gestures	50	50
Sampling frequency (Hz)	2000	2000

The DB3 dataset is similar to DB2 except that the signals were collected from 11 amputees, which demands robust classifiers. Specific information about the amputees is shown in Table 2. Because subjects 1 and 3 have missing performed movements and subjects 6 and 7 have missing electrode information, only the data from the remaining subjects are used in the experiments.

Table 2

Subjects information in DB3

Subject	Handedness	Amputated hand(s)	Performed movements	Remaining forearm (%)	Electrodes used
1	Right	Right	39	50	12
2	Right	Left	50	70	12
3	Right	Right	49	30	12
4	Right	Left&Right	50	40	12
5	Left	Left	50	90	12
6	Right	Left	50	40	10
7	Right	Right	50	0	10
8	Right	Right	50	50	12
9	Right	Right	50	90	12
10	Right	Right	43	50	12
11	Right	Right	50	90	12

As in most works, a 49-class intra-subject classification is performed in all experiments (the rest gesture is excluded). Following the protocol in Refs. [26–28, 33], the second and fifth trials of each gesture are used for testing, while the remaining trials are used for training. The accuracy of subject i is calculated as follows:

${Acc}_{i} = \frac{1}{N} \sum_{j = 1}^{N} {\left[\frac{\text{number of correct segments}}{\text{total number of segments}}\right]}_{j}$ (1) where j denotes the index of gestures, and N is the total number of gestures.
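As an illustration of Equation (1), the sketch below computes the accuracy of one subject as the mean of the per-gesture segment accuracies. The function name `subject_accuracy` and the toy labels are hypothetical and are not taken from the experiments.

```python
from collections import defaultdict

def subject_accuracy(true_labels, predicted_labels):
    """Mean over gestures of (correct segments / total segments), as in Eq. (1)."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p in zip(true_labels, predicted_labels):
        total[t] += 1
        if t == p:
            correct[t] += 1
    per_gesture = [correct[g] / total[g] for g in total]
    return sum(per_gesture) / len(per_gesture)

# Toy data: gesture 0 has 4 segments (3 correct), gesture 1 has 2 segments (1 correct).
y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 1, 1, 0]
acc = subject_accuracy(y_true, y_pred)
```

Because each gesture contributes equally, gestures with fewer segments are not under-weighted; this differs from pooling all segments and taking one overall ratio.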

2.2 CNN model

The common CNN architecture consists of convolutional layers, max pooling layers and FC layers.

The convolutional layers extract features with kernels k: $x_{j}^{l} = f (x_{j}^{l - 1} * k_{cj}^{l} + b_{cj}^{l})$ (2) where * denotes the convolution operation, and $x_{j}^{l}$ and $x_{j}^{l - 1}$ are the jth neuron output of the lth layer and the connected neuron output in the previous layer, respectively. $k_{cj}^{l}$ and $b_{cj}^{l}$ are the convolutional kernel (filter) and bias in the lth layer for the jth output map. f is the non-linear activation function, here ReLU.

By taking the maximum over local regions of the features, the pooling function reduces model size, increases computation speed and improves the translation invariance of the features.
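A minimal NumPy sketch of Equation (2) followed by max pooling is given below for a single 1-D feature map. The kernel values, the input and the pooling size are hypothetical, and, as in most CNN libraries, the kernel is not flipped (the operation is a cross-correlation), which is an assumption about how * is implemented.

```python
import numpy as np

def conv1d_relu(x, kernel, bias):
    """Valid 1-D sliding-window product followed by ReLU, in the spirit of Eq. (2)."""
    n = len(x) - len(kernel) + 1
    out = np.array([np.dot(x[i:i + len(kernel)], kernel) for i in range(n)]) + bias
    return np.maximum(out, 0.0)

def max_pool1d(x, size):
    """Non-overlapping max pooling; trailing samples that do not fill a window are dropped."""
    n = len(x) // size
    return x[:n * size].reshape(n, size).max(axis=1)

x = np.array([1.0, -2.0, 3.0, 0.0, -1.0, 2.0])   # hypothetical input feature map
k = np.array([1.0, 0.0, -1.0])                   # hypothetical 1*3 kernel
features = conv1d_relu(x, k, bias=0.0)           # [0, 0, 4, 0]
pooled = max_pool1d(features, size=2)            # [0, 4]
```

The pooled output keeps the strongest response in each pair of positions, so a small shift of the peak within a pooling window leaves the result unchanged.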

In the FC layers, classification is performed by weight matrices and bias vectors: $y^{l} = W^{l} y^{l - 1} + b^{l}$ (3) where $y^{l}$ and $y^{l - 1}$ are the outputs of the lth and (l - 1)th layers, and $W^{l}$ and $b^{l}$ are the weight matrix and bias vector of the lth layer, respectively.

Then the probability distribution over the classes is calculated by feeding the final output score $y_{o}$ into the softmax function: $\mathrm{softmax} (y_{om}) = p (\hat{y} = m | x) = \frac{\exp (y_{om})}{\sum_{i} \exp (y_{oi})}$ (4) where m and x are the predicted class index and the input, respectively, and the final predicted label is: $\hat{y} = \underset{m}{\arg\max} \, \mathrm{softmax} (y_{om})$ (5)
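Equations (3) to (5) can be sketched in a few lines of NumPy. The layer sizes and random weights below are hypothetical placeholders; subtracting the maximum score before exponentiation is a common numerical safeguard and does not change the result of Equation (4).

```python
import numpy as np

def fc_layer(y_prev, W, b):
    """Eq. (3): affine map of the previous layer's output."""
    return W @ y_prev + b

def softmax(scores):
    """Eq. (4), with the maximum subtracted first for numerical stability."""
    shifted = scores - np.max(scores)
    e = np.exp(shifted)
    return e / e.sum()

rng = np.random.default_rng(0)
y_prev = rng.normal(size=8)        # hypothetical 8-dimensional feature vector
W = rng.normal(size=(3, 8))        # 3 hypothetical gesture classes
b = np.zeros(3)

y_o = fc_layer(y_prev, W, b)       # final output scores
probs = softmax(y_o)               # class probabilities
predicted = int(np.argmax(probs))  # Eq. (5)
```

Since softmax is monotonic, the predicted label is the same as the index of the largest raw score.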

3 Preprocessing and classification

Since DB2 contains sEMG signals from intact subjects, it shows more obvious patterns than DB3, which comes from amputees. To limit the computational cost, the first 10 subjects of DB2 are chosen as a subset to determine the parameters and structure of the CNN. Although this cannot yield a globally optimal network structure, the results guide network design while balancing computational cost and performance. According to Refs. [36–38], a delay of less than 300 ms is imperceptible to subjects in real life. The EMG signal is therefore divided into segments of 200 ms (400 data points at 2000 Hz). To obtain a sufficient amount of training data, the data is doubled by using an overlap of 100 ms between segments. For gesture classification based on sEMG signals, different muscle parts contract to different degrees when a gesture is executed. As discussed in Ref. [10], there is a positive correlation between muscle contraction and the activity intensity of the sEMG signals detected at various electrode placements. In this paper, sEMG signals are preprocessed to evaluate the activity intensity in TD and FD, respectively.
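The segmentation described above can be sketched as a sliding window over a multi-channel recording. The function name `segment`, the noise-filled trial and the array layout (channels first) are assumptions made for illustration.

```python
import numpy as np

def segment(signal, fs=2000, window_ms=200, step_ms=100):
    """Split a (channels, samples) recording into overlapping windows.

    Returns an array of shape (n_windows, channels, window_len).
    """
    win = int(fs * window_ms / 1000)     # 400 samples at 2000 Hz
    step = int(fs * step_ms / 1000)      # 200 samples, i.e. 100 ms overlap
    n = signal.shape[1]
    starts = range(0, n - win + 1, step)
    return np.stack([signal[:, s:s + win] for s in starts])

# Hypothetical 5 s trial with 12 electrodes, filled with noise.
trial = np.random.default_rng(0).normal(size=(12, 10000))
windows = segment(trial)                       # 100 ms overlap
no_overlap = segment(trial, step_ms=200)       # non-overlapping baseline
```

For a 5 s trial, the 100 ms overlap yields 49 windows against 25 without overlap, which is the approximate doubling referred to in the text.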

3.1 TD method

For TD information, the amplitude of the sEMG signal directly reflects the signal intensity. However, because the signal is stochastic, a good amplitude estimate is usually obtained only after suitable preprocessing.

In previous works, a combination of preprocessing methods in TD, including root mean square (RMS), low-pass filtering, full-wave rectification and down-sampling, has been used to evaluate the amplitude of sEMG and eliminate noise [39]. Ref. [40] showed that full-wave rectification after low-pass filtering is a standard amplitude estimation method. Based on this experience, four methods are compared to find an appropriate preprocessing method.

Method1: low-pass filter, RMS.

Method2: RMS, low-pass filter.

Method3: full-wave rectification, low-pass filter, down-sampling.

Method4: full-wave rectification, down-sampling, low-pass filter.

In each method, the raw EMG signals are processed successively by the listed steps before segmentation. Down-sampling reduces the sample rate to 100 Hz. RMS is computed with a window size of 20 data points, which is equal to the overlap between windows. The low-pass filter is a first-order Butterworth filter with a cut-off frequency of 1 Hz. Note that Method4 is similar to the preprocessing described in Ref. [27], and Method1 is the procedure used for the Ninapro DB1 dataset.

In Ref. [23], Geng et al. demonstrated an effective preprocessing method by applying low-pass filtering to the DB1 dataset, which had already been processed by the acquisition device as stated above. Inspired by this, an additional low-pass filter is appended after each of the four methods for comparison. In this situation, parts of Methods 2 and 4 could be replaced by a higher-order filter, but this is not the focus of this work. In addition, because DB2 contains raw signals, the magnitudes are so small that the CNN parameters must be adjusted carefully during experiments. To address this, the DB2 data is digitally amplified by a factor of 1000 before the four methods are applied, for comparison.

In summary, four preprocessing methods are compared. For each method, whether to use digital amplification and whether to append a low-pass filter are examined. The dimension of the preprocessed data is 12 * 20 * 1 (channel number * height * width).
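The steps of Method2 with amplification (amplify, RMS, low-pass filter) can be sketched as follows for one channel. This is a sketch under stated assumptions: the RMS windows are taken with a step of 20 samples so that a 400-sample segment yields 20 values, the filter is therefore run at the resulting 100 Hz envelope rate, and the first-order Butterworth filter is written out via the bilinear transform instead of calling a signal-processing library. The function names are hypothetical.

```python
import math
import numpy as np

def rms_envelope(x, window=20, step=20):
    """RMS over consecutive windows (step=20 is an assumption, see lead-in)."""
    n = (len(x) - window) // step + 1
    return np.array([math.sqrt(np.mean(x[i * step:i * step + window] ** 2))
                     for i in range(n)])

def butter1_lowpass(x, cutoff_hz, fs):
    """First-order Butterworth low-pass (bilinear transform), zero initial state."""
    k = math.tan(math.pi * cutoff_hz / fs)
    b = k / (1.0 + k)                  # b0 = b1
    a1 = (k - 1.0) / (k + 1.0)
    y = np.zeros(len(x))
    prev_x = prev_y = 0.0
    for n, xn in enumerate(x):
        y[n] = b * (xn + prev_x) - a1 * prev_y
        prev_x, prev_y = xn, y[n]
    return y

def method2(raw, fs=2000, gain=1000.0, step=20):
    """Amplify, take the RMS envelope, then low-pass filter at 1 Hz."""
    env = rms_envelope(gain * raw, window=20, step=step)
    return butter1_lowpass(env, cutoff_hz=1.0, fs=fs / step)

raw = 1e-3 * np.ones(20000)            # hypothetical constant-amplitude channel (10 s)
out = method2(raw)
```

For the constant test input, the envelope is 1 after amplification and the filter output rises from near 0 towards 1, which shows the unit DC gain and the slow response of a 1 Hz cut-off.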

Referring to Ref. [25], a multi-stream CNN structure consisting of 1D convolutional layers in TD is shown in Fig. 2. The preprocessed information from each electrode channel is used as the input of one stream. Each stream consists of three kinds of layers: batch normalization (BN) layers (yellow), convolutional layers (blue) and locally connected layers (green). ReLU is chosen as the activation function. Each of the first three layers has 64 kernels, while the last layer has only 32. All kernels share the same size of 1 * 3. The TD feature network formed by the 12 streams converges into three FC layers (red), whose numbers of nodes are 1024, 512 and the number of gestures, respectively. The dropout rate in the first two FC layers is 0.5.

Fig. 2

TD network structure.

Table 3 gives the recognition results of the different preprocessing methods. Method2 gives the best result in every configuration, and Method2 with both amplification and filtering is the best overall, so it is chosen as the preprocessing method in the subsequent experiments. Compared with preprocessing the raw signal, amplification effectively increases the sensitivity of the neural network (about a 3% increase in accuracy). In addition, appending the filter better suppresses noise (84.36% versus 83.71% for Method2). Finally, because of its additional smoothing effect, RMS performs better in signal smoothing and amplitude evaluation than the combination of rectification and down-sampling.

Table 3

Recognition accuracy in different preprocessing methods (%)

Amplification	Filter	Method1	Method2	Method3	Method4
No	No	60.38	80.28	73.32	74.20
Yes	No	60.39	83.71	76.70	77.03
No	Yes	65.44	81.90	80.24	75.87
Yes	Yes	73.94	84.36	83.35	78.20

Suppose the preprocessed dataset with N samples in TD is $T_{t} = \{s_{ti}, y_{i}\}_{i = 1}^{N}$. With Equation (2), the TD feature network can be expressed as: $x_{t} = f_{t} (s_{t}; k_{tc}, b_{tc}, γ_{t}, β_{t})$ (6) where $γ_{t}$ and $β_{t}$ are the sets of scales and shifts in the BN layers, and the index t denotes the TD method. Let the parameters be expressed as $θ_{t} = \{k_{tc}, b_{tc}, γ_{t}, β_{t}\}$, and let the sets of weight matrices and bias vectors in the FC layers be $W_{t}$ and $b_{t}$. The output score in TD is: $y_{to} = f (x_{t}, W_{t}, b_{t})$ (7) where f is a cascade of Equation (3). The loss function in TD is: $L_{t} (y_{to}, y) = - \log p ({\hat{y}}_{t} = y \mid x_{t}, W_{t}, b_{t})$ (8) where ${\hat{y}}_{t}$ is the predicted label calculated by Equations (4) and (5).
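The loss in Equation (8) is the negative log-probability that the softmax of Equation (4) assigns to the true label. A minimal sketch, with hypothetical output scores for three classes, is:

```python
import numpy as np

def nll_loss(scores, label):
    """-log softmax(scores)[label], computed with the log-sum-exp trick."""
    shifted = scores - np.max(scores)
    log_probs = shifted - np.log(np.sum(np.exp(shifted)))
    return -log_probs[label]

scores = np.array([2.0, 0.0, 0.0])   # hypothetical output scores y_to for 3 classes
loss_correct = nll_loss(scores, 0)   # true label is the highest-scoring class
loss_wrong = nll_loss(scores, 1)     # true label is a low-scoring class
```

The loss is small when the true class has the largest score and grows by exactly the score gap (here 2) when the true class is one of the lower-scoring ones.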

3.2 FD method

For FD information, the spectrogram directly reflects the signal intensity. However, the conversion produces redundant information, so it is necessary to filter out the useless information in the spectra.

Following Ref. [28], the spectrogram is calculated by the short-time Fourier transform (STFT) with a Hamming window of 256 data points and an overlap of 184 data points. L1 normalization and PCA are then applied for dimension reduction. To eliminate redundant information, the first 25 principal components are selected to form a 5 * 5 feature matrix, so the dimension of the input is 12 * 5 * 5 (channel number * height * width).
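The FD preprocessing for one electrode channel can be sketched with NumPy alone. Several details are assumptions made only for illustration: the spectrogram is the unpadded magnitude STFT, each spectrogram is flattened and L1-normalized per segment, and the PCA is fitted with an SVD on random placeholder data. The exact vectorization and normalization used in Ref. [28] may differ.

```python
import numpy as np

def spectrogram(x, n_fft=256, overlap=184):
    """Magnitude STFT with a Hamming window; returns (n_frames, n_fft // 2 + 1)."""
    hop = n_fft - overlap                        # 72 samples
    window = np.hamming(n_fft)
    n_frames = (len(x) - n_fft) // hop + 1       # 3 frames for a 400-sample segment
    frames = np.stack([x[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))

def pca_fit(X, n_components=25):
    """Return the mean and the leading principal axes of the rows of X."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]

rng = np.random.default_rng(0)
segments = rng.normal(size=(40, 400))            # 40 hypothetical segments of one channel
spec = np.stack([spectrogram(s).ravel() for s in segments])
spec = spec / np.abs(spec).sum(axis=1, keepdims=True)    # L1 normalization per segment
mean, axes = pca_fit(spec)
features = ((spec - mean) @ axes.T).reshape(-1, 5, 5)    # one 5 * 5 matrix per segment
```

Stacking the 5 * 5 matrices of the 12 channels would give the 12 * 5 * 5 input described above. In practice the PCA axes would be fitted on the training segments only and then reused for the test segments.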

The FD network is shown in Fig. 3. A slow fusion architecture, which has shown strong performance in video recognition [41], is employed. Unlike in video classification, each sub-stream here represents an electrode channel rather than a video frame, so streams cannot be merged across a distance of a few frames as in Ref. [41]. The choice of fusion method for EMG signals is therefore still worth examining. As shown in Fig. 3, the information of the electrode channels is fused by slow fusion, early fusion and late fusion, respectively.

Fig. 3

FD network structure.

All fusion methods include only convolutional layers (blue) and FC layers (red). Each convolutional layer is followed by a ReLU activation function. For the sake of comparison, the convolutional layers in all fusion methods have 200, 200 and 512 kernels, and the numbers of nodes in the FC layers are 800, 1024 and the number of gestures, respectively. All convolutional layers share the same filter size of 3 * 3, and the dropout rate is 0.5. The early fusion method combines information across all the electrodes. The late fusion method first extracts features from each electrode channel and then fuses them in the FC layers. The slow fusion method feeds groups of three channels into the network and then fuses them progressively to obtain valuable information.

In Ref. [42], the authors applied the idea of slow fusion to the classification of EMG signals. In contrast, the network in this paper does not contain any pooling layers because the preprocessed data samples are small. In addition, BN layers are not used because they do not help the result (about a 1% drop).

Similarly, denoting the dataset with N samples in FD as $T_{f} = \{s_{fi}, y_{i}\}_{i = 1}^{N}$, let the parameters be $θ_{f} = \{k_{fc}, b_{fc}\}$ and the sets of weight matrices and bias vectors be $W_{f}$ and $b_{f}$, where the index f denotes the FD counterparts.

The extracted features and the final output score in FD are separately: $x_{f} = f_{f} (s_{f}; k_{fc}, b_{fc})$ (9) $y_{fo} = f (x_{f}, W_{f}, b_{f})$ (10)

The loss function is calculated by: $L_{f} (y_{fo}, y) = - \log p ({\hat{y}}_{f} = y \mid x_{f}, W_{f}, b_{f})$ (11) where ${\hat{y}}_{f}$ is the predicted label in FD.

For a classifier, accuracy, precision, recall and F-score are common performance metrics [43]. Precision and recall are computed as follows: $Precision = \frac{1}{N} \sum_{i = 1}^{N} \frac{{TP}_{i}}{{TP}_{i} + {FP}_{i}}$ (12) $Recall = \frac{1}{N} \sum_{i = 1}^{N} \frac{{TP}_{i}}{{TP}_{i} + {FN}_{i}}$ (13) where N is the number of gestures, and $TP_{i}$, $FP_{i}$ and $FN_{i}$ are the numbers of true positives, false positives and false negatives of gesture i, respectively. The F-score, denoted as $F_{β}$, is calculated by: $F_{β} = \frac{(1 + β^{2}) \times P \times R}{(β^{2} \times P) + R}$ (14) where β is equal to 1, and P and R represent precision and recall, respectively.
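Equations (12) to (14) can be evaluated directly from a confusion matrix, as sketched below. The 2-gesture matrix is hypothetical, and the convention that rows are true classes and columns are predicted classes is an assumption of this sketch.

```python
import numpy as np

def macro_metrics(cm, beta=1.0):
    """Macro precision, recall and F-score; cm[i, j] counts true class i predicted as j."""
    tp = np.diag(cm).astype(float)
    fp = cm.sum(axis=0) - tp      # predicted as the class but belonging to another
    fn = cm.sum(axis=1) - tp      # belonging to the class but predicted as another
    precision = np.mean(tp / (tp + fp))                                            # Eq. (12)
    recall = np.mean(tp / (tp + fn))                                               # Eq. (13)
    f_beta = (1 + beta**2) * precision * recall / (beta**2 * precision + recall)   # Eq. (14)
    return precision, recall, f_beta

cm = np.array([[8, 2],
               [1, 9]])           # hypothetical 2-gesture confusion matrix
p, r, f1 = macro_metrics(cm)
```

Note that the F-score here is computed from the averaged precision and recall, as Equation (14) states, and not as the mean of per-gesture F-scores. A gesture that is never predicted would give a zero denominator in the precision term and needs separate handling.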

Results of the different fusion methods are given in Table 4. Because the slow fusion method combines the electrode information more effectively, it achieves the best average accuracy and F-score (0.90% and 0.54% higher in accuracy, and 0.92% and 0.56% higher in F-score, than early and late fusion, respectively). The features from the slow fusion method are therefore used for fusion in the following sections.

Table 4

Results in different fusion methods (%)

Subject	Early Fusion				Late Fusion				Slow Fusion
	Acc	P	R	F_score	Acc	P	R	F_score	Acc	P	R	F_score
1	86.30	86.73	86.23	86.04	86.82	87.27	86.63	86.72	88.37	88.91	88.28	88.28
2	79.64	81.64	80.28	80.40	79.22	81.34	79.61	79.93	78.99	81.30	79.60	79.60
3	82.44	84.54	83.92	83.92	83.04	84.98	84.37	84.38	83.94	85.40	84.81	84.84
4	69.35	70.76	70.93	70.35	69.09	70.87	70.16	70.06	70.97	72.95	72.30	72.07
5	90.50	91.01	90.64	90.60	90.83	91.50	90.87	91.00	90.50	91.12	90.53	90.63
6	79.97	79.88	80.14	79.40	80.48	80.89	80.27	79.97	81.30	81.64	81.09	80.94
7	81.81	82.34	82.04	81.94	82.17	82.79	82.34	82.15	83.30	83.87	83.66	83.55
8	89.42	90.68	90.18	89.97	90.68	92.20	91.36	91.36	90.44	92.07	91.13	91.10
9	82.54	84.73	83.96	83.98	81.86	83.94	82.91	83.13	82.63	84.76	83.77	83.83
10	76.35	78.56	78.59	77.89	77.70	80.13	79.86	79.40	76.90	79.55	79.14	78.81
AVG	81.83	83.09	82.69	82.45	82.19	83.59	82.84	82.81	82.73	84.16	83.43	83.37

3.3 Data augmentation

Apart from determining the models and preprocessing methods, data augmentation is also examined. As described above, this paper mainly uses the ‘sliding window’ method for data augmentation. To find a proper amount of data, the overlap between windows is adjusted so that the data is doubled, tripled, quadrupled and quintupled, respectively.

Accuracy results are shown in Fig. 4. When the data is tripled, the TD method reaches its best performance and the FD method shows its largest gain. The data is therefore tripled in the subsequent experiments. In addition, the accuracy of both methods first increases and then drops, which reflects a trade-off. As discussed in Ref. [20], too little data cannot reasonably characterize the data space, while too much data may lead to high similarity among data samples, resulting in over-fitting.

Fig. 4

Accuracy comparison between different augmentations.

The results of all subjects obtained with the TD and FD methods are shown in Fig. 5. The black bars show the accuracy of the TD method and the yellow bars that of the FD method. The two methods carry complementary information: for some subjects the TD method has a higher accuracy, while for others the FD method prevails.

Fig. 5

Accuracy comparison between methods and subjects.

4 Feature fusion methods

The TD and FD methods discussed above either obtain effective information via low-pass filters or obtain frequency features with the STFT. However, both methods lose some information in the sEMG signals. This raises the question of whether they can be combined in a suitable way to improve performance.

First, to validate the complementarity of the two methods, a representative subject, subject1, for whom both methods achieve high accuracy, is selected and the confusion matrices of both methods are calculated. In Fig. 6, the X-axis represents the true label index of the gestures and the Y-axis represents the predicted label index. To facilitate comparison, the difference of the confusion matrices (DCM) is computed as follows (see Fig. 7).

Fig. 6

Confusion matrix of subject1 (a) TD confusion matrix (b) FD confusion matrix.

Fig. 7

Difference of the confusion matrices (DCM) of subject1.

$DCM (i, j) = {TD}_{CM} (i, j) - {FD}_{CM} (i, j)$ (15) where TD_CM and FD_CM are the confusion matrices in TD and FD, respectively, and i and j are the indexes of the true and predicted gestures.

The confusion matrices show how each classifier behaves. For example, TD_CM (i, j) is the number of TD data samples whose true gesture index is i and which are classified as index j. The larger the sum of the diagonal values (where the predicted label index equals the true label index), the more accurate the classifier. The difference of the confusion matrices shows that, for a specific subject, the two methods make different decisions, which means they are complementary to some extent.
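Equation (15) amounts to an element-wise subtraction of two confusion matrices. The sketch below uses the convention stated with Equation (15) (first index true gesture, second index predicted gesture); the labels and predictions are hypothetical toy data.

```python
import numpy as np

def confusion_matrix(true_labels, predicted_labels, n_classes):
    """cm[i, j] = number of samples with true index i predicted as index j."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(true_labels, predicted_labels):
        cm[t, p] += 1
    return cm

y_true  = [0, 0, 1, 1, 2, 2]
td_pred = [0, 1, 1, 1, 2, 0]      # hypothetical TD predictions
fd_pred = [0, 0, 1, 2, 2, 2]      # hypothetical FD predictions

td_cm = confusion_matrix(y_true, td_pred, 3)
fd_cm = confusion_matrix(y_true, fd_pred, 3)
dcm = td_cm - fd_cm               # Eq. (15)
```

A positive diagonal entry of `dcm` marks a gesture that the TD method recognizes more often than the FD method, and a negative one marks the opposite; since both matrices count the same samples, the entries of `dcm` sum to zero.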

Inspired by Refs. [29, 30], two methods are used to fuse TD and FD information. The average fusion algorithm takes the softmax scores of the TD and FD methods as input and averages them as the final prediction score (see Equation (16)). It is commonly used as a baseline for fusion methods. $y_{{ave}_{o}} = \frac{1}{2} (\mathrm{softmax} (y_{to}) + \mathrm{softmax} (y_{fo}))$ (16) For the feature fusion method, a pre-training strategy is adopted. This method takes the preprocessed data of both methods as input, $T = \{s_{fi}, s_{ti}, y_{i}\}_{i = 1}^{N}$, and uses the trained parameters $θ_{t}$ and $θ_{f}$ to extract features in both domains. Finally, three FC layers with the sets of weight matrices W and bias vectors b are initialized and trained further to recognize gestures.

Suppose the final output score is denoted as: $y_{{fea}_{o}} = f (T, θ_{t}, θ_{f}, W, b)$ (17)

The loss function for back propagation is: $L (y, y_{{fea}_{o}}) = - \log p ({\hat{y}}_{{fea}_{o}} = y \mid T, θ_{t}, θ_{f}, W, b)$ (18) where ${\hat{y}}_{{fea}_{o}}$ is the predicted label of feature fusion. To preserve the features extracted in TD and FD, the learning rate for $θ_{t}$ and $θ_{f}$ is 10% of that of the FC layers. A schematic diagram of both methods is shown in Fig. 8. The numbers of units in the FC layers are 1200, 800 and the number of classes, respectively. The dropout probability applied in the first two layers is 0.7.
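The average fusion baseline of Equation (16) can be sketched as follows. The two score vectors are hypothetical outputs of the TD and FD networks for three gestures and were chosen so that the two domains disagree.

```python
import numpy as np

def softmax(scores):
    """Softmax along the last axis, shifted by the maximum for stability."""
    e = np.exp(scores - np.max(scores, axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def average_fusion(td_scores, fd_scores):
    """Eq. (16): mean of the TD and FD softmax outputs."""
    return 0.5 * (softmax(td_scores) + softmax(fd_scores))

td_scores = np.array([2.0, 1.0, 0.0])   # hypothetical y_to, favours gesture 0
fd_scores = np.array([0.0, 3.0, 0.0])   # hypothetical y_fo, favours gesture 1
fused = average_fusion(td_scores, fd_scores)
label = int(np.argmax(fused))
```

In this toy case the FD network is more confident than the TD network, so the fused prediction follows FD (gesture 1). Average fusion has no trainable parameters, which is why it serves as a baseline against the trained feature fusion layers.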

Fig. 8

Analytical procedure of fusion methods.

Table 5 compares the results with state-of-the-art studies, which also explore the match between data preprocessing and classifier structure. In both TD and FD, our methods achieve the best results, which supports the data preprocessing and model design. Compared with using TD or FD information alone, the fusion methods achieve a higher accuracy, showing that the proposed method integrates the information efficiently. The trained models were tested on a workstation with one NVIDIA 1080Ti GPU and one Intel W-2123 CPU.

Table 5

Comparison of recognition accuracy (%)

Methods	Descriptor	Accuracy
Hu et al. [27] et al. [32]	TD CNN method	73.4
Our method	TD CNN method	80.84 ± 0.05
Zhai et al. [28]	FD CNN method	78.71
Our method	FD CNN method	78.92 ± 0.05
Atzori et al. [33]	Traditional fusion method	75.27
Hu et al. [27]	CNN fusion method	82.2
Wei et al. [26]	CNN fusion method	83.7
Our method	CNN fusion method (average)	83.50 ± 0.03
Our method	CNN fusion method (feature)	84.09 ± 0.11

Table 6 shows the runtime of the different trained models. Because the feature fusion method has fewer parameters than the average fusion method, its runtime is shorter. The maximum runtime is 21.41 ms, which means the proposed methods are acceptable for practical applications such as myoelectric prosthesis control and rehabilitation.

Table 6

Runtime of methods (ms)

Methods	Time
TD method	17.64
FD method	3.76
Fusion (average)	21.41
Fusion (feature)	18.94

Considering the practical application of rehabilitation, Zhai et al. [28] adopted 10 common gestures as a subset of the DB3 dataset to verify the effectiveness of their algorithm, namely abduction of all fingers, wrist supination, wrist pronation, wrist flexion, wrist extension, small diameter grasp, index finger extension grasp, tip pinch grasp, quadpod grasp and lateral grasp. Similarly, the feature fusion method is applied to these ten gestures. The accuracy of 76.74% confirms the usefulness of the feature fusion method; it is 2.41% higher than the FD method and 4.87% higher than the TD method. However, it is lower than the accuracy of the all-gesture classification on DB2, which agrees with the conclusion of Zhai’s work.

5 Discussion

In this paper, two methods based on TD and FD information are proposed for gesture classification. The results in Fig. 5 indicate that the two methods have advantages on different subjects. The main reason is thought to be that the absolute position of the acquisition device on the arm deviates among subjects. In addition, the strength and diameter of the arm may differ between subjects even though they are asked to hold the same gesture during acquisition. All these factors influence the intensity of the signal fluctuations. Moreover, the duration of each gesture also differs slightly among subjects according to the relabeled data (for example, some gestures have only 1935 sets of data, while others have 2400 sets of data). As a result, the signals of some subjects are more sensitive to TD information, while those of others are more sensitive to FD information.

In terms of fusion, both fusion methods perform better than the TD and FD methods alone, because the two domains complement each other. Furthermore, the best result is achieved by fusing features in the feature layers, which gives an increase of about 0.8% compared with the average fusion method. The reason is that feature fusion leads to a more reasonable feature combination: the learned features can be selected according to their sensitivity to different subjects.

In addition, Table 5 shows that CNN methods are superior to traditional methods. This supports the assumption that feature learning, which generates a proper feature set for classification, is more adaptable than feature projection. Specifically, the feature learning method learns features according to the characteristics of each subject, which shows the robustness of the proposed method.

For the 10-class gesture subset in DB3, the accuracy is lower than for the all-gesture classification in DB2. This is probably because the signals are less stable owing to physiological nerve injury. Notably, this might become a meaningful technique for judging the degree of injury in the future.

6 Conclusions and future work

In this paper, suitable matches between data preprocessing and CNN structures are explored in TD and FD, respectively. Data augmentation is also examined to improve performance. Experiments show that the proposed preprocessing methods effectively complement each other. In addition, two fusion methods are used to combine the information and achieve better performance. Experiments on the DB3 dataset are also significant for neural decoding in subjects with nerve injury. In the future, the transition between the rest gesture and other gestures will be studied to further improve recognition.

Acknowledgment

This work was supported by the JiangSu Provincial Key Research and Development Program of China (Grant No. BE2016757) and The Open Funding Project of National Key Laboratory of Human Factors Engineering (Grant No. 6142222190310).

References

1. Kim Y.H., Thakor N.V., Schieber M.H. and Kim H.N., Neuron selection based on deflection coefficient maximization for the neural decoding of dexterous finger movements, IEEE Transactions on Neural Systems and Rehabilitation Engineering 23(3) (2014), 374–384.

2. Choi, You K.J., Thakor N.V., Schieber M.H. and Shin H.C., Single-Finger Neural Basis Information-Based Neural Decoder for Multi-Finger Movements, IEEE Transactions on Neural Systems and Rehabilitation Engineering 26(12) (2018), 2240–2248.

3. Phinyomark, Phukpattaranont and Limsakul, Feature reduction and selection for EMG signal classification, Expert Systems with Applications 39(8) (2012), 7420–7431.

4. Jahromi M.G., Parsaei, Zamani and Stashuk D.W., Cross Comparison of Motor Unit Potential Features Used in EMG Signal Decomposition, IEEE Transactions on Neural Systems and Rehabilitation Engineering 26(5) (2018), 1017–1025.

5. Jiang, Gao, Liu and Shull P.B., A novel, co-located EMG-FMG-sensing wearable armband for hand gesture recognition, Sensors and Actuators A: Physical 301 (2020), 111738.

6. Côté-Allard, Campbell, Phinyomark, Laviolette, Gosselin and Scheme, Interpreting deep learning features for myoelectric control: A comparison with handcrafted features, (2019), arXiv preprint arXiv:1912.00283.

7. Shull P.B., Jiang, Zhu and Zhu, Hand Gesture Recognition and Finger Angle Estimation via Wrist-Worn Modified Barometric Pressure Sensing, IEEE Transactions on Neural Systems and Rehabilitation Engineering 27(4) (2019), 724–732.

8. Jiang, Li, Xu, Gu and Shull P.B., Stretchable E-Skin Patch for Gesture Recognition on the Back of the Hand, IEEE Transactions on Industrial Electronics 67(1) (2019), 647–657.

9. Jiang, Lv, Guo, Zhang, Wang, Sheng and Shull P.B., Feasibility of wrist-worn, real-time hand, and surface gesture recognition via sEMG and IMU Sensing, IEEE Transactions on Industrial Informatics 14(8) (2017), 3376–3385.

10. Abdelouahad, Belkhou, Jbari and Bellarbi, Time and frequency parameters of sEMG signal - Force relationship, Proceedings of the 4th International Conference on Optimization and Applications (2018), 1–5.

11. Duan, Dai, Chang, Chen, Zhu and Li, sEMG-based identification of hand motion commands using wavelet neural network combined with discrete wavelet transform, IEEE Transactions on Industrial Electronics 63(3) (2015), 1923–1934.

12. Duan and Dai, Recognizing the gradual changes in sEMG characteristics based on incremental learning of wavelet neural network ensemble, IEEE Transactions on Industrial Electronics 64(5) (2016), 4276–4286.

13. Khezri and Jahed, A neuro–fuzzy inference system for sEMG-based identification of hand motion commands, IEEE Transactions on Industrial Electronics 58(5) (2010), 1952–1960.

14. Subasi, Classification of EMG signals using PSO optimized SVM for diagnosis of neuromuscular disorders, Computers in Biology and Medicine 43(5) (2013), 576–586.

, Classification of EMG signals using PSO optimized SVM for diagnosis of neuromuscular disorders, Computers in Biology and Medicine 43(5) (2013), 576–586.

15.

Al-Timemy

, Bugmann

and Escudero

, Adaptive windowing framework for surface electromyogram-based pattern recognition system for transradial amputees, Sensors 18(8) (2018), 2402.

16.

Rubio

J.D.J.

, Ricardo Cruz

, Elias

, Ochoa

, Balcazarand

and Aguilar

, ANFIS system for classification of brain signals, Journal of Intelligent & Fuzzy Systems, Preprint: 1–9.

17.

Rubio

J.D.J.

, SOFMLS: online self-organizing fuzzy modified least-squares network, IEEE Transactions on Fuzzy Systems 17(6) (2009), 1296–1309.

18.

Giap

C.N.

, Son

L.H.

and Chiclana

, Dynamic structural neural network[J], Journal of Intelligent & Fuzzy Systems 34(4) (2018), 2479–2490.

19.

Rubio

J.D.J.

, Garcia

, Ochoa

, Elias

, Cruz

D.R.

, Balcazar

, Lopez

and Novoa

J.F.

, Unscented Kalman filter for learning of a solar dryer and a greenhouse[J], Journal of Intelligent & Fuzzy Systems, (Preprint) (2019), 1–12.

20.

Krizhevsky

, Sutskever

and Hinton

G.E.

, Imagenet classification with deep convolutional neural networks, Proceedings of the Advances in neural information processing systems (2012), 1097–1105.

21.

Simonyan

and Zisserman

, Very deep convolutional networks for large-scale image recognition, (2014), arXiv preprint arXiv:1409.1556.

22.

Szegedy

, Liu

, Jia

, Sermanet

, Scott

, Dragomir

, Dumiyru

, Vincent

and Andrew

, Going deeper with convolutions, Proceedings of the IEEE conference on computer vision and pattern recognition (2015), 1–9.

23.

Geng

, Du

, Jin

, Wei

, Hu

and Li

, Gesture recognition by instantaneous surface EMG images, Scientific Reports 6 (2016), 36571.

24.

, Wong

, Jin

, Wei

, Hu

, Kankanhalli

M.S.

and Geng

, Semi-Supervised Learning for Surface EMG-based Gesture Recognition, Proceedings of IJCAI (2017), 1624–1630.

25.

Wei

, Wong

, Du

, Hu

, Kankanhalli

and Geng

, A multi-stream convolutional neural network for sEMG-based gesture recognition in muscle-computer interface, Pattern Recognition Letters, 2017.

26.

Wei

, Dai

, Wong

, Hu

, Kankanhalli

and Geng

, Surface Electromyography-based Gesture Recognition by Multi-view Deep Learning, IEEE Transactions on Biomedical Engineering 66(10) (2019), 2964–2973.

27.

, Wong

, Wei

, Du

, Kankanhalli

and Geng

, A novel attention-based hybrid CNN-RNN architecture for sEMG-based gesture recognition, PloS One 13(10) (2018), e0206049.

28.

Zhai

, Jelfs

, Chan

R.H.

and Tin

, Self-recalibrating surface EMG pattern recognition for neuroprosthesis control based on convolutional neural network, Frontiers in Neuroscience 11 (2017), 379.

29.

Simonyan

and Zisserman

, Two-stream convolutional networks for action recognition in videos, Proceedings of the Advances in neural information processing systems (2014), 568–576.

30.

Zhang

, Huang

, Du

and Wang

, Facial expression recognition based on deep evolutional spatial-temporal networks, IEEE Transactions on Image Processing 26(9) (2017), 4193–4203.

31.

Atzori

, Gijsberts

, Heynen

, Hager

A.G.M.

, Deriaz

, Van Der Smagt

, Castellini

, Caputo

and Müller

, Building the Ninapro database: A resource for the biorobotics community, Proceedings of the 4th IEEE RAS & EMBS International Conference on Biomedical Robotics and Biomechatronics (2012), 1258–1265.

32.

Atzori

, Gijsberts

, Kuzborskij

, Elsig

, Hager

A.G.M.

, Deriaz

, Castellini

, Müller

and Caputo

, Characterization of a benchmark database for myoelectric movement classification, M. Atzori, A. Gijsberts, C. Castellini, IEEE Transactions on Neural Systems and Rehabilitation Engineering 23(1) (2014), 73–83.

33.

Atzori

, Gijsberts

, Castellini

, Caputo

, Hager

A.G.M.

, Elsig

, Giatsidis

, Bassetto

and Müller

, Electromyography data for non-invasive naturally-controlled robotic hand prostheses, Scientific Data 1 (2014), 140053.

34.

Gijsberts

, Atzori

, Castellini

, Müller

and Caputo

, Movement error rate for evaluation of machine learning methods for sEMG-based hand movement classification, IEEE Transactions on Neural Systems and Rehabilitation Engineering 22(4) (2014), 735–744.

35.

Atzori

, Gijsberts

, Castellini

, Caputo

, Hager

A.G.M.

, Elsig

, Giatsidis

, Bassetto

and Müller

, Effect of clinical parameters on the control of myoelectric robotic prosthetic hands, Journal of Rehabilitation Research & Development 53(3) (2016), 345–358.

36.

Hudgins

, Parker

and Scott

R.N.

, A new strategy for multifunction myoelectric control, IEEE Transactions on Biomedical Engineering 40(1) (1993), 82–94.

37.

Englehart

and Hudgins

, A robust, real-time control scheme for multifunction myoelectric control, IEEE Transactions on Biomedical Engineering 50(7) (2003), 848–854.

38.

Farrell

T.R.

and Weir

R.F.

, The optimal controller delay for myoelectric prostheses, IEEE Transactions on Neural Systems and Rehabilitation Engineering 15(1) (2007), 111–118.

39.

Castellini

and Koiva

, Using surface electromyography to predict single finger forces. In 2012 4th IEEE RAS & EMBS International Conference on Biomedical Robotics and Biomechatronics (2012), 1266–1272.

40.

Scott

R.N.

and Parker

P.A.

, Myoelectric prostheses: state of the art, Journal of Medical Engineering & Technology 12(4) (1988), 143–151.

41.

Karpathy

, Toderici

, Shetty

, Leung

, Sukthankar

and Fei-Fei

, Large-scale video classification with convolutional neural networks, Proceedings of the IEEE conference on Computer Vision and Pattern Recognition (2014), 1725–1732.

42.

Côté-Allard

, Fall

C.L.

, Drouin

, Campeau-Lecours

, Gosselin

, Glette

and Gosselin

, Deep learning for electromyographic hand gesture signal classification using transfer learning, IEEE Transactions on Neural Systems and Rehabilitation Engineering 27(4) (2019), 760–771.

43.

Yavuz

and Eyupoglu

, A cepstrum analysis-based classification method for hand movement surface EMG signals, Medical & Biological Engineering & Computing (2019), 1–23.