Abstract
This paper proposes a global template dynamic time warping (GTDTW) algorithm for gesture recognition with wearable gloves. The method is applied to both isolated and continuous gesture recognition, and a gesture segmentation system based on GTDTW is also proposed for continuous gesture recognition. The global template is obtained with statistical methods. For each defined gesture, the states that account for a large proportion of the data are selected as important states, and together they form the global template of that gesture. The global template expresses the characteristics of the defined gesture more fully, which improves the recognition rate. It is also much shorter than the normal template of dynamic time warping (DTW), so GTDTW has low time consumption and the recognition system has better real-time performance. Experimental evaluations on both isolated and continuous gesture recognition show the effectiveness of the proposed method. For isolated gesture recognition, the time consumption is clearly reduced and the recognition rate reaches up to 98.8%. For continuous gesture recognition, the proposed method achieves a high segmentation rate and a recognition rate of up to 95.6%.
Introduction
Gesture recognition is an active topic in human-computer interaction. It is mainly based either on image recognition or on wearable devices [1], and it is divided into isolated gesture recognition and continuous gesture recognition. Common methods for isolated gesture recognition include dynamic time warping (DTW), hidden Markov models (HMM), and neural networks [2, 3]. DTW is widely used in gesture recognition to handle temporal data sequences because it can measure the similarity between temporal sequences of different lengths. In DTW-based gesture recognition, a template is first selected for each defined gesture. The similarity between the input gesture sequence and each template is then calculated, and the template with the highest similarity gives the recognition result. DTW is easy to implement, but its performance is usually unsatisfactory because of high time consumption and poor templates [4, 5]. The usual way to build a DTW template is to select a single sample as the template. This is problematic because gesture sequences made by different people vary spatially and temporally and contain noise, so such a template cannot fully express the characteristics of the gesture and the recognition rate is low. Moreover, the length of a general template usually equals the length of the gesture, which is large, so the time complexity of DTW is high. Some researchers have improved the way the optimal path is found [6], while others have focused on improving template selection [7]. They average multiple samples into a single template or use multiple samples as multiple templates to improve recognition performance [8]. The recognition rate improves, but the time consumption increases because more templates are used [9]. HMM is now popular in gesture recognition because it is suitable for processing temporal data sequences and makes it easy to add and modify gesture libraries [10, 11].
However, HMM needs many samples to train the model, and its time consumption is higher than that of DTW [12]. Recently, with the development of deep learning, neural network methods for gesture recognition have become increasingly common [13–15]. For processing temporal data sequences, recurrent neural networks (RNN) have been proposed and have proved very effective [16]. However, the complexity of neural network methods cannot be ignored [17]. A neural network model requires more data to ensure recognition performance, and in general more data increases the complexity of training the model. The number of hidden layers is also difficult to determine. With too many hidden layers, gradient explosion, vanishing gradients, and over-fitting can occur, and recognition performance suffers. Some researchers combine DTW with HMM, or neural networks with HMM, so that the methods complement each other [18–21], but the weaknesses still cannot be completely eliminated.
For continuous gesture recognition, gesture segmentation is the main concern. The task of gesture segmentation is to extract meaningful gestures from a continuous gesture sequence [22–25]. In other words, it must find the start point and the end point of each gesture pattern. This is considered difficult because gestures have segmentation ambiguities and spatial-temporal variability. Most existing methods first perform gesture segmentation and then perform recognition [26], so there is an unavoidable time delay between the two steps, which is not appropriate for continuous gesture recognition. To solve this problem, some researchers have used HMM because it can model the spatial and temporal characteristics of gestures [27]. Kim et al. use a forward spotting accumulative HMM for gesture segmentation, but the method needs a model of no-gestures, that is, movements that are not defined gestures. Because there are many kinds of no-gestures, training models for them is complicated [28]. Deng and Tsui use a multi-size-window HMM to alleviate the problem of training the no-gesture model [29]. Other researchers have studied gesture segmentation based on DTW [30]. Li and Greenspan analyze all candidate paths obtained by DTW and choose the best gesture candidate [31]. Other methods segment gestures by using sudden change points as the start and end points [32, 33], for example sudden changes in energy or speed. Most of these methods assume that the start points are similar to the end points or need appropriate thresholds to define sudden change points [34–36]. When the thresholds are inappropriate or the assumptions do not hold, segmentation and recognition performance may degrade.
In this paper, GTDTW is proposed to improve the performance of DTW-based gesture recognition, and it is applied to both isolated and continuous gesture recognition. The global template expresses the characteristics of a gesture more comprehensively, which improves the recognition rate. GTDTW also reduces the size of the input sequence fed into DTW, so the time consumption of the proposed method is reduced. A gesture segmentation system based on GTDTW is also proposed for continuous gesture recognition. The system performs gesture segmentation and recognition simultaneously and has good real-time performance. Combined with a novel gesture length threshold, the system can alleviate the influence of no-gestures.
The rest of this paper is organized as follows. Section 2 describes the proposed method, including GTDTW and the gesture segmentation system. Section 3 describes the experiments and results. Section 4 concludes the paper.
Gesture recognition and GTDTW
In this paper, the GTDTW algorithm is proposed for gesture recognition, covering both isolated and continuous gesture recognition. The global templates contain the most likely states of each defined gesture. They therefore fully express the characteristics of the defined gestures and improve recognition performance. DTW aligns two sequences by filling a dynamic-programming table whose size is the product of their lengths. For a template of length N matched against a sequence of comparable length, its time complexity is therefore $O(N^2)$. Because the global templates are much shorter than the normal DTW templates of other methods, GTDTW reduces the time consumption and ensures the real-time performance of gesture recognition. In addition, the paper proposes a novel gesture segmentation system based on GTDTW combined with the length characteristics of the defined gestures. The system observes how the similarity between the current gesture sequence and each template changes, and uses these changes to find the start point and end point of each gesture. The segmentation runs in real time and outputs the recognition result directly after segmentation.
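To make the cost argument concrete, the sketch below implements textbook DTW in Python. It illustrates the cost only and is not the paper's MATLAB implementation. The function name `dtw_distance` and the absolute-difference local cost are assumptions. The sketch fills one table cell per pair of frames, so the work equals the product of the two sequence lengths.

```python
from typing import Callable, Sequence


def dtw_distance(x: Sequence[float], y: Sequence[float],
                 dist: Callable[[float, float], float] = lambda a, b: abs(a - b)) -> float:
    """Classic DTW: cumulative cost of the best monotone alignment of x and y.

    The (len(x)+1) x (len(y)+1) table makes time and memory O(len(x) * len(y)).
    """
    n, m = len(x), len(y)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(x[i - 1], y[j - 1])
            # extend the cheapest of: diagonal match, step in x only, step in y only
            cost[i][j] = d + min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])
    return cost[n][m]
```

For a 100-frame input, a 100-frame template fills 10,000 cells, while a template of fewer than 5 states fills fewer than 500. This reduction is the saving that a short global template is meant to exploit.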
The global template
The global template is obtained from the training samples of each defined gesture. First, the gesture data are discretized with the K-means clustering algorithm. The number of clusters is denoted M. After discretization, the gesture data at each moment are represented by an integer that denotes the cluster the data belong to. Fig. 1 illustrates a discretized gesture sequence. The number 1 indicates that the sensor data at those moments belong to the first cluster, the number 2 indicates that they belong to the second cluster, and so on. Across all samples, the possible states of the discretized gesture data are therefore the integers from 1 to M, which form the observation state set S, as in (1):

$$S = \{1, 2, \dots, M\} \qquad (1)$$
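A minimal sketch of this discretization step follows. It assumes Euclidean distance, centers initialized from randomly chosen frames, and a fixed number of Lloyd iterations; the text specifies none of these details. The helper names `kmeans` and `discretize` are hypothetical. Labels are returned 1-based so that every state falls in S.

```python
import random
from typing import List, Sequence

Vector = Sequence[float]


def _sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance (an assumed choice of metric)."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def _nearest(frame: Vector, centres: List[List[float]]) -> int:
    """0-based index of the closest cluster center."""
    return min(range(len(centres)), key=lambda c: _sq_dist(frame, centres[c]))


def kmeans(frames: List[Vector], m: int, iters: int = 50, seed: int = 0) -> List[List[float]]:
    """Lloyd's k-means with centers initialized from m randomly chosen frames."""
    rng = random.Random(seed)
    centres = [list(f) for f in rng.sample(frames, m)]
    for _ in range(iters):
        groups: List[List[Vector]] = [[] for _ in range(m)]
        for f in frames:
            groups[_nearest(f, centres)].append(f)
        for k, members in enumerate(groups):
            if members:  # an empty cluster keeps its previous center
                centres[k] = [sum(col) / len(members) for col in zip(*members)]
    return centres


def discretize(sequence: List[Vector], centres: List[List[float]]) -> List[int]:
    """Replace each frame by its state in S = {1, ..., M} (1-based cluster label)."""
    return [_nearest(f, centres) + 1 for f in sequence]
```

In the experiments, each frame would be a 33-dimensional glove feature vector. Two-dimensional frames are enough to exercise the sketch.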

Fig. 1. An illustration of the discretized gesture sequence.

Fig. 2. An illustration of the limited change range of the index finger.

Fig. 3. An illustration of different numbers of state types: (a) 4, (b) 3, and (c) 2.
For a defined gesture g, each state $s_i$ can be classified as a key frame or a sub-key frame according to its proportion $\theta_i$ among all samples of g. Since the total number of states is M, the average occurrence probability of each state is 1/M. If the proportion of $s_i$ is larger than 1/M, $s_i$ expresses the characteristics of gesture g better than the other states, and $s_i$ is defined as a key frame of gesture g. Based on experimental statistics, 0.1/M is chosen as the threshold for sub-key frames. In other words, the key frame threshold $\theta_{key}$ and the sub-key frame threshold $\theta_{sub\_key}$ are given by (2) and (3):

$$\theta_{key} = \frac{1}{M} \qquad (2)$$

$$\theta_{sub\_key} = \frac{0.1}{M} \qquad (3)$$

A state with $\theta_i > \theta_{key}$ is a key frame, and a state with $\theta_{sub\_key} < \theta_i \le \theta_{key}$ is a sub-key frame.
For a defined gesture g, the global template, denoted $Y_g$, consists of the key frames and sub-key frames of g. The global template contains only a single-digit number of states, so the input to GTDTW is short and the time consumption of DTW is reduced. The sub-key frames ensure that gestures can still be distinguished when some key frames are shared by different defined gestures.
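The key frame threshold 1/M and the sub-key frame threshold 0.1/M can be applied as in the sketch below. It assumes that $\theta_i$ is the fraction of all training frames of gesture g that fall in state $s_i$. It also returns both groups sorted by state index, because the text does not say how the states of $Y_g$ are ordered. The function name `global_template` is hypothetical.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def global_template(samples: Sequence[Sequence[int]], m: int) -> Tuple[List[int], List[int]]:
    """Split the states 1..m of one gesture into key frames and sub-key frames."""
    counts = Counter(state for sample in samples for state in sample)
    total = sum(counts.values())
    theta_key = 1.0 / m        # key frame threshold
    theta_sub_key = 0.1 / m    # sub-key frame threshold
    key: List[int] = []
    sub_key: List[int] = []
    for state in range(1, m + 1):
        theta = counts[state] / total   # assumed meaning of theta_i
        if theta > theta_key:
            key.append(state)
        elif theta > theta_sub_key:
            sub_key.append(state)
        # states at or below theta_sub_key are left out of Y_g
    return key, sub_key
```

With M = 5, for example, states that cover more than 20% of the training frames become key frames, and states that cover more than 2% become sub-key frames.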
Isolated gesture recognition recognizes individual gesture samples. Continuous gesture recognition first segments a continuous gesture sequence and then recognizes each segmented gesture.
The similarity between a gesture sequence and the global template $Y_g$ is denoted $D_g$ and is calculated with GTDTW. The weight of a key frame is denoted $W_{key}$ and the weight of a sub-key frame is denoted $W_{sub\_key}$. They are given by Eqs. (7) and (8).
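The full GTDTW equations for $D_g$ are not reproduced in this section. The sketch below therefore follows only the simplified accumulation described for Fig. 8 in the experiments. Each frame whose state is a key frame adds $W_{key}$, each frame whose state is a sub-key frame adds $W_{sub\_key}$, and other frames add nothing. Isolated recognition then picks the gesture with the largest similarity. The `GlobalTemplate` class and the function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Sequence


@dataclass(frozen=True)
class GlobalTemplate:
    """Hypothetical container for Y_g and its weights."""
    key: FrozenSet[int]       # key frame states
    sub_key: FrozenSet[int]   # sub-key frame states
    w_key: float              # W_key
    w_sub_key: float          # W_sub_key


def similarity(seq: Sequence[int], tpl: GlobalTemplate) -> float:
    """Accumulate the weight of every frame whose state belongs to the template."""
    d = 0.0
    for state in seq:
        if state in tpl.key:
            d += tpl.w_key
        elif state in tpl.sub_key:
            d += tpl.w_sub_key
        # states outside Y_g contribute nothing
    return d


def recognize(seq: Sequence[int], templates: Dict[str, GlobalTemplate]) -> str:
    """Isolated recognition: return the gesture whose template is most similar."""
    return max(templates, key=lambda g: similarity(seq, templates[g]))
```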
Continuous gesture recognition requires real-time gesture segmentation. This paper proposes a gesture segmentation system based on GTDTW combined with the length characteristics of the defined gestures.
Length characteristics of gesture
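For a defined gesture g, its length characteristic is the length threshold $L_g$, the average length of the samples of gesture g, as in Eq. (11):

$$L_g = \frac{1}{N_g} \sum_{n=1}^{N_g} l_{g,n} \qquad (11)$$

where $N_g$ is the number of training samples of gesture g and $l_{g,n}$ is the length (number of frames) of the n-th sample.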
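The length threshold is a plain average. A short sketch makes it explicit, assuming the training samples are stored as discretized state sequences keyed by gesture name. The helper name `length_thresholds` is hypothetical.

```python
from statistics import mean
from typing import Dict, List, Sequence


def length_thresholds(training: Dict[str, List[Sequence[int]]]) -> Dict[str, float]:
    """L_g for every gesture: the mean number of frames in its training samples."""
    return {g: float(mean(len(s) for s in samples)) for g, samples in training.items()}
```

At the 120 Hz sampling rate used in the experiments, an $L_g$ of 120 frames would correspond to one second of motion.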
For an input gesture sequence, the sequence at moment t is defined as $O_t = \{o_1, o_2, \dots, o_t\}$, where $o_i \in S$ denotes the discretized gesture data at moment i. The similarity between $O_t$ and the global template $Y_g$ of each defined gesture is calculated with GTDTW. It is denoted $D_{gt}$ and given by Eq. (12).
Find the gesture start point. The increment of $D_{gt}$ from moment t − 1 to moment t is denoted $\Delta_{gt}$, as in Eq. (13):

$$\Delta_{gt} = D_{gt} - D_{g(t-1)} \qquad (13)$$
Find the gesture end point. After the start point of gesture g has been found, the gesture end point $g_{end}$ is the first subsequent moment t′ at which $\Delta_{gt'} = 0$, that is, $g_{end} = t'$. After the end point has been found, the search for the next start point continues until the whole gesture sequence has been segmented.
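The start and end rules can be sketched as a single forward pass over the stream. The sketch makes two assumptions. First, it reuses the Fig. 8 style accumulation for $D_{gt}$, because Eq. (12) is not reproduced here. Second, it needs a start criterion, which is also not reproduced here: a gesture is assumed to start at the first moment when some template has $\Delta_{gt} > 0$, and the largest increment wins if several templates qualify. The end rule $\Delta_{gt'} = 0$ follows the text. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class GlobalTemplate:
    """Hypothetical container for Y_g and its weights."""
    key: FrozenSet[int]
    sub_key: FrozenSet[int]
    w_key: float
    w_sub_key: float

    def weight(self, state: int) -> float:
        if state in self.key:
            return self.w_key
        if state in self.sub_key:
            return self.w_sub_key
        return 0.0


def segment(stream: Sequence[int],
            templates: Dict[str, GlobalTemplate]) -> List[Tuple[str, int, int]]:
    """Return (gesture, start, end) triples.

    end is the first moment t' after the start with Delta = 0, so frames
    start..end-1 form the segment.
    """
    d = {g: 0.0 for g in templates}              # running D_gt for every gesture
    segments: List[Tuple[str, int, int]] = []
    active: Optional[str] = None
    start = 0
    for t, state in enumerate(stream):
        delta: Dict[str, float] = {}
        for g, tpl in templates.items():
            new_d = d[g] + tpl.weight(state)
            delta[g] = new_d - d[g]              # increment of D_gt, as in Eq. (13)
            d[g] = new_d
        if active is not None and delta[active] == 0:
            segments.append((active, start, t))  # end point g_end = t
            active = None
        if active is None:
            best = max(delta, key=lambda g: delta[g])
            if delta[best] > 0:                  # assumed start criterion
                active, start = best, t
    if active is not None:                       # close a gesture still open at the end
        segments.append((active, start, len(stream)))
    return segments
```

Because each moment only updates one running value per template, the pass outputs a segment as soon as its end point is reached, which matches the forward-spotting behavior described for the system.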
The segmented gesture sequence is obtained from the gesture start and end points, and recognition is combined with the length characteristic. If the length of the segmented sequence satisfies Eq. (15), the segment is recognized as gesture g. Otherwise, it is recognized as a no-gesture.
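Eq. (15) is not reproduced in this section, so the sketch below uses a hypothetical acceptance rule in its place. A segment is kept as gesture g only if its length lies within a relative tolerance `tol` of $L_g$. Both the rule and the default `tol = 0.5` are assumptions for illustration.

```python
from typing import Dict


def label_segment(gesture: str, start: int, end: int,
                  thresholds: Dict[str, float], tol: float = 0.5) -> str:
    """Accept the segment as `gesture` if its length is close to L_g, else reject it."""
    length = end - start                # frames start..end-1
    l_g = thresholds[gesture]
    if abs(length - l_g) <= tol * l_g:  # hypothetical stand-in for Eq. (15)
        return gesture
    return "no-gesture"
```

A brief spurious match, such as a single frame that happens to hit a key frame during a no-gesture, is then rejected because it is far shorter than any $L_g$.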
The proposed gesture segmentation system is a forward-spotting continuous gesture segmentation system. It outputs the recognition result directly after segmentation, which ensures real-time performance. Combined with the gesture length threshold, it alleviates the impact of no-gestures. Figure 4 illustrates how gesture recognition works with GTDTW, and Fig. 5 illustrates the gesture segmentation system.

Fig. 4. How gesture recognition works based on GTDTW.

Fig. 5. The gesture segmentation system.
Experiment setup
The experiments were implemented in MATLAB 2014. Gesture data were captured with a wearable glove produced by Beijing Noitom Company, shown in Fig. 6. One glove had nine sensor nodes on the fingers, one on the back of the hand, and one at the wrist. Every sensor node was a 6-axis inertial sensor STM32-6050. The Axis Neuron Pro software provided by Noitom automatically obtained the acceleration, angular velocity, and quaternion data of each sensor node. These data describe the bending of the fingers and the orientation of the palm, so the data of the hand sensor nodes were used as features to describe the hand posture. The sampling frequency of the sensors was 120 Hz. Each of the eleven sensor nodes on a glove provided three-dimensional features at each moment, giving thirty-three-dimensional features per moment. Nine right-hand gestures were defined based on the Chinese Sign Language Manual: greeting, thanking, eighty, staying up, walking, pasting, flight, you, and catching, as shown in Fig. 7.

Fig. 6. The wearable glove.

Fig. 7. Description of the defined gestures.
Samples were collected both to train the global templates of the defined gestures and to evaluate the proposed method. For each defined gesture, 120 samples were collected, of which 80 were used for training and 40 for testing. Each sample can be denoted as in Eq. (16).
Training the global templates. For each defined gesture, the key frames and sub-key frames were found with the method of Section 2.1 to form the global template. The global template of each defined gesture is listed in Table 1, and the weights of the key frames and sub-key frames are listed in Table 2. Training the length thresholds. For each defined gesture, the average length of its training samples was calculated as the length threshold $L_g$, which represents its length characteristic. With 80 training samples per defined gesture, $L_g$ was calculated as in Eq. (17):

$$L_g = \frac{1}{80} \sum_{n=1}^{80} l_{g,n} \qquad (17)$$

where $l_{g,n}$ is the length of the n-th training sample of gesture g.
Table 1. The global template and length threshold of each defined gesture
Table 2. The weights of key frames and sub-key frames
For isolated gesture recognition, the calculation of the similarity $D_g$ between a gesture sequence X and the global template $Y_g$ is illustrated in Fig. 8. In Fig. 8, states 1 and 2 are key frames and state 3 is a sub-key frame. Whenever state 1 or 2 appears in X, the similarity increases by the key frame weight 0.0667. It increases by 0.0067 when state 3 appears and does not increase when any other state appears. The test samples were used to evaluate the proposed method. To ensure robustness, the experiments were repeated five times and the average was taken as the result. In the experiments, the DTW templates were usually 50 to 100 frames long, whereas the GTDTW templates had fewer than 5 states, so the time consumption was reduced and real-time performance was ensured. The isolated gesture recognition rate of the proposed method is shown in Table 3. With 40 test samples, Table 4 compares the time consumption and recognition rate of isolated gesture recognition for different methods. Here, time consumption is the time taken to recognize 20 gestures. The experimental results show that the proposed method outperforms the other methods.
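As a worked instance of the Fig. 8 rule, take a hypothetical sequence X = (1, 1, 2, 3, 5, 2) with the weights given above. Each of the four key frame occurrences (states 1 and 2) adds 0.0667, the single occurrence of state 3 adds 0.0067, and state 5 adds nothing:

```latex
D_g = 4 \times 0.0667 + 1 \times 0.0067 + 1 \times 0 = 0.2668 + 0.0067 = 0.2735
```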

Fig. 8. An illustration of the process of calculating similarity.
Table 3. The isolated gesture recognition rate based on GTDTW
Table 4. Comparison of the time consumption and recognition rate of isolated gesture recognition for different methods
For continuous gesture recognition, gesture segmentation was used to find the gesture start and end points. In the experiments, 180 gestures were performed in random order in one continuous gesture sequence, with no-gestures between them. The sequence contained 20 valid instances of each defined gesture. Part of the segmentation process is shown in Fig. 9, where different colors correspond to different gestures. To evaluate the temporal segmentation, the segmentation (recall) rate and the recognition rate were used as quality criteria:
Segmentation rate. The segmentation rate (recall) was the number of correctly segmented gestures divided by the total number of gestures. Recognition rate. The recognition rate was the number of correctly recognized gestures divided by the total number of gestures.
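Written as formulas, with N = 180 gestures in the continuous sequence, the two criteria are as follows. Here $N_{seg}$ (correctly segmented gestures) and $N_{rec}$ (correctly recognized gestures) are symbol names introduced for illustration:

```latex
R_{seg} = \frac{N_{seg}}{N}, \qquad R_{rec} = \frac{N_{rec}}{N}, \qquad N = 180
```

Because both rates share the denominator N, a gesture that is segmented correctly but given the wrong label lowers $R_{rec}$ without affecting $R_{seg}$.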
The proposed method ran in real time. For vector discretization by K-means clustering, the cluster center vectors were recorded during training. When a test gesture sequence was input, only the distance between the data vector at each moment and each cluster center had to be calculated. This costs $O(M)$ distance computations per moment, where M is the number of clusters, which is constant with respect to the length of the sequence. For gesture segmentation, the discretized state at each moment was compared with each global template and the similarities were updated, which is also a constant cost per moment. Overall, the processing cost per moment was constant, so the total cost grows linearly with the length of the input sequence.
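The constant-per-moment claim can be spelled out as a cost count. The count assumes d-dimensional feature vectors (d = 33 for the glove used here), G defined gestures, global templates of length $K_g$, and a stream of T moments:

```latex
C_{\text{frame}} = \underbrace{O(Md)}_{\text{nearest cluster center}}
  + \underbrace{O\Big(\sum_{g=1}^{G} K_g\Big)}_{\text{similarity updates}},
\qquad
C_{\text{total}} = O\Big(T\big(Md + \textstyle\sum_{g=1}^{G} K_g\big)\Big)
```

With G = 9 and $K_g < 5$, the similarity term stays below 45 template-state updates per moment, and the total cost is linear in T.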

An illustration of the proposed method.
The experiments were repeated five times, and the average recall rate and recognition rate were taken as the results. The results of continuous gesture recognition based on GTDTW are shown in Table 5. Both criteria demonstrate the effective performance of the gesture segmentation system. The length threshold also proved useful for ensuring the recognition performance of continuous gesture recognition, as shown in Fig. 10. Table 6 compares the continuous gesture recognition rate of different methods. The time consumption of gesture segmentation is shown in Table 7. Segmenting 180 gestures took only 0.407 s, which is nearly real-time processing for hand-robot interaction applications.

An illustration of the proposed method.
Table 5. The results of continuous gesture recognition based on GTDTW
Table 6. Continuous gesture recognition rate of different methods
Table 7. The time consumption of gesture segmentation processing
In this paper, the GTDTW algorithm is proposed for gesture recognition, and a gesture segmentation system based on it is proposed for continuous gesture recognition. Wearable gloves are used to collect the gesture data. Theoretical analysis and experiments indicate that GTDTW with global templates speeds up continuous gesture recognition. For isolated gesture recognition, the proposed method improves both the recognition rate and the recognition speed.
There are several directions for future work. The experimental data set can be enlarged, and more data sets can be created to further verify the robustness of the proposed method. In gesture segmentation, errors may accumulate, which increases the error rate in the second half of the segmentation process. This problem requires a better solution.
Acknowledgments
This research was supported by the National Natural Science Foundation of China (61372142, U1401252) and the Fundamental Research Funds for the Central Universities, SCUT (2017MS062). In addition, we thank Baiyi Zhou for providing the pictures of hand gestures.
