Abstract
The difficulty of sports gesture recognition lies in the effective cooperation of hardware and software. Moreover, few studies have applied machine learning to capturing the details of athletes’ gestures. Therefore, based on machine learning technology, this study combines sensors with a gesture recognition algorithm to capture the motion of athletes in detail. Through comparative analysis, inertial sensor technology is selected as the gesture recognition hardware. In addition, by analyzing the actual needs of athletes’ gesture recognition, the Kalman filter algorithm is used to solve the athlete’s posture, a virtual human body model is constructed, and the body is processed by sub-region to facilitate the effective identification of different limbs. Finally, to verify the validity of the algorithm model, basketball is taken as an example for experimental analysis. The results show that the recognition accuracy of the method exceeds 95% for each basketball action, with an average accuracy of 98.85%.
Introduction
With the development of society and the improvement of people’s living standards, people pay more attention to their own health and physical activity. This also places higher requirements and challenges on researchers in the field of human body gesture recognition [1]. Human body gesture recognition generally refers to the classification and recognition of various movement modes and states of the human body, such as walking, jumping, running, going upstairs and downstairs, sitting, and lying, by analyzing relevant human motion data [2]. According to the data source, human body motion recognition can be divided into two categories: 1. Depth cameras (such as Kinect) or image data are used to capture the motion poses of the human body directly, and a recognition algorithm identifies the posture from these data [3]. 2. Sensor data of the human body motion posture are acquired by smart devices (in-line sensors) worn on the body and are used to recognize its various motion postures. The two typical approaches have their own advantages and disadvantages. The method based on depth cameras or images has the advantages of an early start, relatively intuitive data, and a high recognition rate. Its disadvantages are that collecting data through a camera does not allow real-time monitoring and that it easily invades the personal privacy of the subject. As miniaturized and intelligent sensors become widely used in daily life, researchers are gradually focusing on human body motion recognition based on wearable devices.
In the wearable approach, sensor data of the human body motion are acquired from smart devices such as smart phones and smart watches that people wear on a daily basis, and a suitable algorithm is designed to recognize the human body motion posture. The approach can also be applied to artificial intelligence: a robot can learn from human motion data, which is of great significance for reducing the difference between intelligent robot motion and human motion [4]. The method based on depth cameras or images generally has strict requirements on the location of use. It is mainly used to identify the movement posture of the human body in public places, and installing video acquisition instruments such as cameras in private places must be avoided. Wearable human body gesture recognition is mainly applied to fields such as personal sports and health. This type of method is relatively convenient and can be carried anywhere at any time to identify certain physical movements of the individual or various sports data such as the number of steps, the energy consumed, or the complete human posture [5].
In a basketball game, each player’s skill level has a significant impact on the entire team. If a player’s basketball level is lacking, the team’s weaknesses are exposed and its defensive and offensive levels are greatly reduced, which harms the team’s performance, so scientific and reasonable basketball training for the athletes is necessary [6]. In the traditional way of basketball training, coaches develop training plans based on the athlete’s training and competition. This method relies on the training theory and experience of the coaches and is therefore subjective to some degree [7]. In addition, wrong movements are difficult to detect by observation alone during training; they may damage the athlete’s muscles, soft tissues and bones, affect normal training, and even shorten the athlete’s sporting career. From the perspective of training quality evaluation, the evaluation work is performed manually, and the coaches need to calculate the training results of each athlete with reference to different test standards, so this method also has drawbacks [8]. It is therefore very important to recognize the posture of basketball players. Based on this, this study investigates the posture recognition of basketball players based on machine learning.
Related work
At present, human body gesture recognition based on three-dimensional models is widely used in human-computer interaction, intelligent security, medicine, sports and many other fields [9]. Moreover, human gesture recognition has become a challenging cross-disciplinary research subject spanning computer graphics, artificial intelligence, and pattern recognition. Motion gesture recognition based on 3D models started relatively late in China, but it has developed rapidly and has achieved substantial research results in related fields [10]. The earliest successful case of human body gesture recognition in China came from the National Laboratory of Pattern Recognition of the Institute of Automation, Chinese Academy of Sciences. Starting from the static form of the human body, the laboratory comprehensively explored and analyzed posture behavior under intelligent video surveillance, which laid a good foundation for later researchers. For example, Ren Haibing and others at Tsinghua University used the Dynamic Bayesian Network (DBN) to classify and recognize simple daily behaviors of people [11]. Chen Fengsheng et al. developed a Taiwanese sign language system that identifies 15 different pose features using the 3D neural network method [12]. After the birth of the Microsoft Kinect sensor in 2010, Tan Jiapu et al. used depth image processing technology to detect human fingertips and recognize gestures [13]. Most of the research projects currently carried out in China, such as those under the “985” plan, the “863 Program”, and the National Natural Science Foundation, are funded by the government. Moreover, most colleges and universities have begun to plan self-funded research projects, introducing a large number of 3D technologies and performing human body gesture recognition based on them [14].
In contrast, a large number of systematic human body gesture recognition methods based on three-dimensional models emerged abroad in earlier years. Although most of these methods originate from traditional 2D image feature extraction, the principles and computer algorithms involved differ. For example, James W. Davis [15] used the density distribution characteristics of local interest points and combined the MHI model with the MEI model to characterize human actions. Lu Xia [16] and others used the hidden Markov model combined with 3D skeletal point feature information to establish an action model through the feature histogram distribution and to categorize gestures. Philiposem [17] integrated the depth features of the human body model with the skeleton point features and then realized human motion recognition by discriminating the motion model. Weinland [13] developed the human body model into a three-dimensional structure and placed each frame of the obtained image in a 3D model in chronological order, which further improved the recognition accuracy. Ellis [18] discussed the trade-off between the accuracy of behavior recognition and time, selected key frame instances from the motion data sequence, and derived the key frame pose sequence within the framework of the behavior recognition method to realize human body gesture recognition. The Japanese scholar Ma [19] applied the DTW algorithm to isolated speech recognition and achieved good experimental results, but the head and tail of the two action sequences had to be aligned before the experiment.
Theoretical basis of the algorithm
Gesture recognition based on extended Kalman filter
For a rigid body rotating in three-dimensional space, if an accurate angular velocity describing the rotation can be obtained, it is theoretically possible to calculate the precise posture of the rigid body in space. In practice, however, the gyroscope output of the inertial sensor drifts, so the output contains an error that accumulates over time, and the computed rotation angle of the rigid body deviates from the true value. Therefore, the extended Kalman filter algorithm is used to solve the attitude quaternion, which reduces the angular error of the sensor. This paper mainly uses the strapdown inertial navigation algorithm to calculate the carrier pose. The sensor node includes a gyroscope, an accelerometer, and an electronic compass, which measure the angular velocity of the node rotation, the acceleration of the linear motion, and the magnetic field strength at the node, respectively. The angular velocity measured by the gyroscope is used to predict the attitude quaternion of the next cycle, and the measured values of the accelerometer and the electronic compass are used to calculate the Kalman gain in the extended Kalman filter. The predicted pose quaternion is then corrected, which reduces the error generated during the iteration and finally yields a more accurate pose quaternion [20].
The first step in the extended Kalman filter algorithm is to calculate and predict the next system state, which is based on the current state and is obtained by the f (·) function. In the attitude calculation, the predicted value of the quaternion at the next moment can be obtained from the measured angular velocity value of the gyroscope and the current attitude quaternion by the quaternion kinematics equation. The quaternion represents the state of the system [21].
We assume that the pose quaternion at the kth moment is the unit quaternion $Q_k = (a_k, b_k, c_k, d_k)$, and it is known that the angular velocity measured by the gyroscope is $(\omega_x, \omega_y, \omega_z)$; then the state transition equation of the system is [22]:
After the formula (1) is sorted, the formula (2) is obtained.
The expression of each item in the function f (·) can be obtained by the above method, as shown in the formula (3), which represents the relationship of various items in the system state transition process.
Equation (3) is the state transition equation of the rotation of the attitude node, and the Jacobian matrix $F_k$ can be obtained according to the equation of state as shown in Equation (4).
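The prediction step described above can be sketched in code. This is a minimal sketch, not the authors' implementation: it assumes body-frame angular rates in rad/s, a first-order discretization with a sampling period `dt` (a name introduced here), and the standard quaternion kinematics $\dot{Q} = \tfrac{1}{2} Q \otimes (0, \omega)$. Under those assumptions the transition is linear in the quaternion, so the Jacobian is simply $I + \tfrac{dt}{2}\Omega$.

```python
import math

def omega_matrix(wx, wy, wz):
    """4x4 matrix so that dQ/dt = 0.5 * Omega * Q (body-frame rates, assumed)."""
    return [
        [0.0, -wx, -wy, -wz],
        [wx, 0.0, wz, -wy],
        [wy, -wz, 0.0, wx],
        [wz, wy, -wx, 0.0],
    ]

def jacobian_f(gyro, dt):
    """Jacobian of the first-order transition: F = I + (dt / 2) * Omega."""
    om = omega_matrix(*gyro)
    return [
        [(1.0 if i == j else 0.0) + 0.5 * dt * om[i][j] for j in range(4)]
        for i in range(4)
    ]

def predict_quaternion(q, gyro, dt):
    """Predict the next attitude quaternion from the current one and the gyro reading."""
    f = jacobian_f(gyro, dt)
    q_next = [sum(f[i][j] * q[j] for j in range(4)) for i in range(4)]
    # Re-normalize so the state stays a unit quaternion.
    norm = math.sqrt(sum(v * v for v in q_next))
    return [v / norm for v in q_next]
```

For example, starting from (1, 0, 0, 0) and calling `predict_quaternion` 1000 times with a rate of π/2 rad/s about the z axis and `dt = 0.001` approximates a 90 degree rotation about z.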
Equation (2) is known as the state transition equation of the system, in which the angular velocity is obtained by gyroscope acquisition. Assuming that the estimated value of the system state at time k is
It is assumed that the a posteriori error covariance of the pose quaternion at the kth moment is $p_k$. Moreover, $p_k$ is a diagonal matrix and will converge automatically over time. Therefore, the initial value of $p_k$ can be set arbitrarily according to experience, and its form is as shown in Equation (6). As the Kalman filter iterates, the system state also converges toward the true value, so the initial quaternion is generally set to the unit quaternion (1, 0, 0, 0) according to experience.
In the case where the a priori error covariance at the k-th moment is known, the a priori error covariance
In the extended Kalman filter process, the gyroscope is used to collect the angular velocity of each axis of the node rotation, and the current system state is calculated from the system state quaternion at the previous time. The collected angular velocity contains an error, so angle drift occurs after a period of time. To solve this problem, the system introduces accelerometers and electronic compasses to collect the acceleration and the magnetic field strength around the nodes. This paper assumes that the magnetic field and the gravity field do not change during the rotation of the node, so the acceleration and the magnetic field strength are used as references to correct the attitude quaternion. Therefore, in the extended Kalman filter algorithm, the attitude quaternion is taken as the state quantity of the system, and the acceleration value and the magnetic field strength value are taken as the observations of the system. The state quantity is mapped to the observation space by the observation function h(·). To correct the attitude quaternion using acceleration and magnetic field strength, the acceleration and magnetic field strength in the initial state are first recorded. Then, the estimated quaternion is applied to the acceleration and magnetic field strength in the initial state to obtain estimates of the acceleration and the magnetic field strength in the current state, and the residual is obtained by comparing these estimates with the currently measured values. Thereafter, the pose quaternion is corrected by this residual, which first requires the Kalman gain $K_k$. The following uses acceleration as an example [24].
When the acceleration in the initial state is known, it can be converted to the acceleration in the current coordinate system by the direction cosine matrix. The direction cosine matrix can be obtained from the quaternion, as shown in formula (7), which gives the direction cosine matrix L represented by the quaternion, where the quaternion is assumed to be Q = (a, b, c, d).
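For reference, the standard rotation matrix of a unit quaternion $Q = (a, b, c, d)$ is sketched below. Whether Equation (7) uses this matrix or its transpose depends on the frame convention, which is an assumption here: $L$ is written as the matrix that maps a vector from the initial frame into the current sensor frame.

```latex
L =
\begin{bmatrix}
a^2 + b^2 - c^2 - d^2 & 2(bc + ad) & 2(bd - ac) \\
2(bc - ad) & a^2 - b^2 + c^2 - d^2 & 2(cd + ab) \\
2(bd + ac) & 2(cd - ab) & a^2 - b^2 - c^2 + d^2
\end{bmatrix}
```

For the unit quaternion (1, 0, 0, 0) this reduces to the identity matrix, so the initial acceleration is left unchanged.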
We assume that the acceleration value measured by the system in the initial state is
From this, the expressions in the function h (·) can be obtained as shown in Equation (9).
The items of the quaternion are solved separately, and the Jacobian matrix H of the function h (·) can be obtained as shown in Equation (10).
For convenience, x, y, z in Equation (10) represent the acceleration values on the three axes, respectively. After H is found, it can be substituted in to find the Kalman gain. Similarly, the magnetic field strength can be used for filtering by substituting its values for the acceleration values, which further improves the gesture recognition accuracy.
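The observation function and its Jacobian can be sketched as follows. This is a minimal sketch under stated assumptions, not the paper's Equations (9) and (10) verbatim: the quaternion is Q = (a, b, c, d), the initial acceleration is (x, y, z), and the direction cosine matrix maps the initial frame into the current sensor frame. The function names are introduced here for illustration.

```python
def h_accel(q, g0):
    """Predicted acceleration in the current frame from the initial acceleration g0."""
    a, b, c, d = q
    x, y, z = g0
    return [
        (a*a + b*b - c*c - d*d) * x + 2 * (b*c + a*d) * y + 2 * (b*d - a*c) * z,
        2 * (b*c - a*d) * x + (a*a - b*b + c*c - d*d) * y + 2 * (c*d + a*b) * z,
        2 * (b*d + a*c) * x + 2 * (c*d - a*b) * y + (a*a - b*b - c*c + d*d) * z,
    ]

def jacobian_h(q, g0):
    """3x4 Jacobian of h_accel with respect to the quaternion components."""
    a, b, c, d = q
    x, y, z = g0
    return [
        [2 * (a*x + d*y - c*z), 2 * (b*x + c*y + d*z),
         2 * (-c*x + b*y - a*z), 2 * (-d*x + a*y + b*z)],
        [2 * (-d*x + a*y + b*z), 2 * (c*x - b*y + a*z),
         2 * (b*x + c*y + d*z), 2 * (-a*x - d*y + c*z)],
        [2 * (c*x - b*y + a*z), 2 * (d*x - a*y - b*z),
         2 * (a*x + d*y - c*z), 2 * (b*x + c*y + d*z)],
    ]

def residual(q, g0, measured):
    """Difference between the measured acceleration and the predicted one."""
    predicted = h_accel(q, g0)
    return [m - p for m, p in zip(measured, predicted)]
```

With H available, the gain follows the usual extended Kalman filter form $K = P H^T (H P H^T + R)^{-1}$, where the measurement noise covariance $R$ would have to be chosen for the actual sensor.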
The unit action data is obtained by data division, which consists of acceleration and angular velocity.
The triaxial acceleration, the triaxial angular velocity, the combined acceleration and the combined angular velocity are grouped into an 8-dimensional vector, and N is used to represent the number of sampling points in each unit motion, so each dimension in the vector has N number of sampling data.
Each unit action is treated as a sample, so each sample is an N × 8 matrix. The features of each dimension of each sample are calculated, and the signal features extracted in the experiment include time domain features and frequency domain features. The time domain features include the mean and the variance; $\mu_a$ and $\delta^2$ respectively represent the mean and variance of a component of the acceleration in the unit motion, which are obtained by Equations (13) and (14), where a is a certain component of the acceleration.
The frequency domain features include the peaks of the discrete Fourier transform and their corresponding frequencies. The discrete Fourier transform is used to convert the signal from the time domain to the frequency domain. $S_{DFT}(n)$ represents the Fourier transform result of the nth sample point, j is the imaginary unit, and the calculation method is as shown in Equation (15).
The peak value $S_{DFT}(K)$ is obtained from the Fourier transform result, where K is the sampling point corresponding to the peak of the Fourier transform. The frequency f corresponding to the peak is obtained by formula (16), where $f_s$ represents the sampling frequency of the sensor.
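The time domain and frequency domain features described above can be sketched in code. This is a minimal sketch under assumptions the text does not settle: the variance is the population variance (divided by N), the DC term is skipped when searching for the DFT peak, and the peak frequency is taken as $f = K f_s / N$. With four features for each of the 8 channels, the result has 8 × 4 = 32 entries.

```python
import cmath
import math

def dft_magnitudes(signal):
    """Magnitude of the discrete Fourier transform at every sample index."""
    n_samples = len(signal)
    return [
        abs(sum(signal[i] * cmath.exp(-2j * cmath.pi * n * i / n_samples)
                for i in range(n_samples)))
        for n in range(n_samples)
    ]

def channel_features(signal, fs):
    """Return [mean, variance, DFT peak value, peak frequency] for one channel."""
    n = len(signal)
    mean = math.fsum(signal) / n
    variance = math.fsum((v - mean) ** 2 for v in signal) / n
    mags = dft_magnitudes(signal)
    # Assumption: ignore the DC bin and search only the non-redundant half.
    half = mags[1:n // 2 + 1]
    k = 1 + max(range(len(half)), key=half.__getitem__)
    return [mean, variance, mags[k], k * fs / n]

def unit_action_features(sample, fs):
    """sample is an N x 8 matrix (rows are sampling points); returns 32 features."""
    features = []
    for column in zip(*sample):
        features.extend(channel_features(list(column), fs))
    return features
```

The direct DFT above is quadratic in N, which is acceptable for short unit actions; a fast Fourier transform would be the natural replacement for long windows.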
Through feature calculation, the characteristics of each dimension data in the time domain and frequency domain can be obtained, and a 32-dimensional feature vector is constructed, as shown in Tables 1 and 2.
Data time domain characteristics
The attributes contained in the feature vector are complex. To eliminate the irrelevant and redundant attribute values, feature selection is applied to the feature vector. For attribute screening, this paper adopts the best-first search algorithm and the principal component analysis method. Feature selection reduces the dimensionality of the feature vector, which reduces the complexity of the classification calculation and improves the working efficiency of the system. In this experiment, the sensor nodes are fixed on the lower leg and the lower arm of the subject, respectively. According to the placement positions of the nodes, this paper divides the data set of each action into an upper limb motion data set and a lower limb motion data set, and a classifier is constructed separately for each sample set. The movements of the upper and lower limbs are classified separately, and the combination of the upper and lower limb results gives the basketball posture of the current subject. To construct the classification model, based on previous research, this paper uses four commonly used classification algorithms: the C4.5 decision tree, the support vector machine, the Bayesian network and the back-propagation (BP) artificial neural network. The outputs of these four classification methods are compared and analyzed to obtain the classification method best suited to the experimental environment of this paper. The motion pose recognition model constructed in this study is shown in Fig. 1.

Motion gesture recognition model.
To track the human body attitude, the host computer software runs in a continuous loop and uses real-time data to display the posture until the system stops. To better encapsulate the data, a class can be used to store and manipulate it, so that everything from receiving data to displaying the human body is done in one class. The member variables of the class include the received quaternion, the processed data, and so on, and the member functions include data reception, data processing, and data display. Finally, calling the member functions of this class from the main function completes the posture display on the host computer.
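The class structure described above can be sketched as follows. This is a minimal sketch, not the authors' host software: the class name `PostureTracker`, its attributes, and the use of an iterable as the data source are all hypothetical, and the display step only records frames where a real program would render the model.

```python
import math

class PostureTracker:
    """Hypothetical host-side class: receive, process, and display quaternions."""

    def __init__(self, source):
        # `source` stands in for the serial or wireless link (assumption).
        self.source = iter(source)
        self.raw = None          # last received quaternion
        self.processed = None    # unit quaternion after processing
        self.frames = []         # placeholder for what would be rendered

    def receive(self):
        """Fetch the next quaternion; return False when the stream ends."""
        self.raw = next(self.source, None)
        return self.raw is not None

    def process(self):
        """Normalize the received quaternion to unit length."""
        norm = math.sqrt(sum(v * v for v in self.raw))
        self.processed = tuple(v / norm for v in self.raw)

    def display(self):
        """Stand-in for rendering: store the frame that would be drawn."""
        self.frames.append(self.processed)

    def run(self):
        """Main loop: keep going until no more data arrives."""
        while self.receive():
            self.process()
            self.display()
```

A main function would then only construct the object and call `run()`, for example `PostureTracker([(2, 0, 0, 0)]).run()`.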
Then, we use the 3ds Max tool to draw a specific mannequin and bind the bones. The 3ds Max tool is also used to convert model files from max format to obj format. The obj file is a plain-text file, and each line of text begins with a keyword that indicates the type of data in the line. By parsing the obj file, we can get the geometric vertex coordinates, texture coordinates, and vertex indices of the entire model.
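Parsing such a file can be sketched as below. This is a minimal sketch that handles only the `v`, `vt`, and `f` keywords and positive 1-based face indices; the function name `parse_obj` is introduced here, and other keywords (normals, groups, materials) are simply skipped.

```python
def parse_obj(text):
    """Return (vertices, texcoords, faces) parsed from OBJ-formatted text."""
    vertices, texcoords, faces = [], [], []
    for line in text.splitlines():
        parts = line.split()
        if not parts or parts[0].startswith("#"):
            continue  # skip blank lines and comments
        keyword, values = parts[0], parts[1:]
        if keyword == "v":
            vertices.append(tuple(float(v) for v in values[:3]))
        elif keyword == "vt":
            texcoords.append(tuple(float(v) for v in values[:2]))
        elif keyword == "f":
            # Each corner is "v", "v/vt", "v//vn" or "v/vt/vn"; keep the vertex
            # index and convert it from 1-based to 0-based.
            faces.append([int(corner.split("/")[0]) - 1 for corner in values])
    return vertices, texcoords, faces
```

Each returned face is a list of 0-based indices into `vertices`, which is the form most rendering code expects.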
To achieve animation effects, this article uses the skeletal animation method, which binds geometric vertices to bones and generates animation by controlling the translation and rotation of these bones. Specifically, each bone has a weight factor for every vertex, ranging from 0 to 1, where 0 means that the bone does not affect the vertex. A vertex may be driven by multiple bones, but the weights of all bones driving that vertex should sum to 1. Because the bone information is not included in the obj file, we need to use 3ds Max to export the weight data as text.
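The weighting rule can be illustrated with linear blend skinning, a common way to implement skeletal animation; the text does not state which blending scheme the authors used, so this is a sketch under that assumption. Bone names and the (rotation, translation) representation are hypothetical.

```python
def normalize_weights(weights):
    """Scale bone weights for one vertex so that they sum to 1."""
    total = sum(weights.values())
    if total <= 0.0:
        raise ValueError("a vertex needs at least one bone with positive weight")
    return {bone: w / total for bone, w in weights.items()}

def transform(rotation, translation, point):
    """Apply a 3x3 rotation and a translation to a 3D point."""
    return [
        sum(rotation[i][j] * point[j] for j in range(3)) + translation[i]
        for i in range(3)
    ]

def skin_vertex(vertex, weights, bone_transforms):
    """Blend the positions produced by each bone according to its weight."""
    blended = [0.0, 0.0, 0.0]
    for bone, w in normalize_weights(weights).items():
        rotation, translation = bone_transforms[bone]
        moved = transform(rotation, translation, vertex)
        for i in range(3):
            blended[i] += w * moved[i]
    return blended
```

A bone with weight 0 contributes nothing to the blended position, which matches the statement that such a bone is irrelevant to the vertex.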
The unit quaternion is transformed into a rotation matrix supported by OpenGL, which is applied to the bone of the corresponding part of the model to rotate that bone. The bones are interconnected, and the bone at a parent node drives the bones at its child nodes. That is, the rotation of the parent node displaces the child node, thereby completing the display of the human body posture.
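The conversion and the parent-child relationship can be sketched as follows. This is a minimal sketch with hypothetical function names: it uses the standard rotation matrix of a unit quaternion (a, b, c, d), lays it out as a 4 × 4 column-major array (the layout expected by legacy OpenGL calls such as `glMultMatrixf`), and moves a child joint by rotating its offset from the parent.

```python
import math

def quat_to_matrix(q):
    """3x3 rotation matrix of quaternion q = (a, b, c, d), normalized first."""
    norm = math.sqrt(sum(v * v for v in q))
    a, b, c, d = (v / norm for v in q)
    return [
        [a*a + b*b - c*c - d*d, 2 * (b*c - a*d), 2 * (b*d + a*c)],
        [2 * (b*c + a*d), a*a - b*b + c*c - d*d, 2 * (c*d - a*b)],
        [2 * (b*d - a*c), 2 * (c*d + a*b), a*a - b*b - c*c + d*d],
    ]

def to_column_major_4x4(r):
    """Embed a 3x3 rotation in a 4x4 matrix stored column by column."""
    m = [0.0] * 16
    for row in range(3):
        for col in range(3):
            m[col * 4 + row] = r[row][col]
    m[15] = 1.0
    return m

def child_position(parent_pos, parent_q, offset):
    """Position of a child joint whose rest offset from the parent is `offset`."""
    r = quat_to_matrix(parent_q)
    return [
        parent_pos[i] + sum(r[i][j] * offset[j] for j in range(3))
        for i in range(3)
    ]
```

Rotating a parent by 90 degrees about the z axis, for instance, moves a child at offset (1, 0, 0) to approximately (0, 1, 0) relative to the parent.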
In this paper, the Kalman filter algorithm is used to solve the human body quaternion, in order to filter out the systematic error caused by the gyroscope’s angular drift. Therefore, the sensor’s attitude calculation accuracy needs to be determined. In the case of determining the rotation angle, the accuracy of the attitude calculation is determined by comparing the rotation angle of the sensor node obtained by Kalman filter with the actual angle.
The experiment starts from 0 degrees and rotates the sensor node continuously until it reaches 1080 degrees, sampling data every 90 degrees. The experiment is divided into two groups. To compare the compensation effect of the Kalman filter on the angle calculation, the first group does not adopt any compensation method, and the rotation angle of the sensor is obtained directly by integrating the angular velocity. The second group uses the extended Kalman filter method to compensate the angular output of the sensor node.
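The uncompensated case can be illustrated with a small simulation of why direct integration drifts. This is only a sketch: the constant gyroscope bias, the rotation rate, and the time step are hypothetical values chosen for illustration and are not the paper's measurements.

```python
def integrate_gyro(true_rate, bias, dt, total_angle, checkpoint=90.0):
    """Integrate a biased rate reading; return (true_angle, error) per checkpoint.

    All angles are in degrees and rates in degrees per second.
    """
    steps_per_checkpoint = round(checkpoint / (true_rate * dt))
    n_checkpoints = round(total_angle / checkpoint)
    estimate, records = 0.0, []
    for k in range(1, n_checkpoints + 1):
        for _ in range(steps_per_checkpoint):
            # The measured rate is the true rate plus a constant bias (assumed).
            estimate += (true_rate + bias) * dt
        true_angle = k * checkpoint
        records.append((true_angle, estimate - true_angle))
    return records
```

With a constant bias the error grows linearly with rotation time, which is the accumulation that the correction step of the filter is meant to remove.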
The error data before and after compensation are compared in Table 3 and Fig. 3. After rotating 1080 degrees, the angle calculation error without any compensation method has reached 10.5 degrees, while the angle calculation error compensated by the Kalman filter is only 0.37 degrees. It can be seen that the extended Kalman filter algorithm effectively compensates the error caused by integrating the angular velocity and improves the accuracy of the human body posture calculation.
Error table before and after three-axis output angle compensation

Human body recognition skeleton model.

Error diagram before and after three-axis output angle compensation.
In the data acquisition process, the upper limb and lower limb movement data are collected according to the placement positions of the sensor nodes on the body and are discussed and identified separately. Therefore, a classifier is constructed separately for the upper limb movements and for the lower limb movements, and the combination of the two results is used to determine the movement performed by the athlete. Moreover, this paper analyzes the classification characteristics of different classifiers and compares their performance on basketball gesture recognition. According to the motion data of the different limbs, the corresponding classification algorithm is trained, and the recognition effect is analyzed in terms of accuracy and recall rate, as shown in Tables 4 and 5. The entire experiment was carried out on the Weka platform, and ten-fold cross-validation was used.
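The evaluation procedure can be sketched without any machine learning library. This is a minimal sketch under assumptions: the "accuracy" reported per action is treated here as per-class precision, the fold assignment is a simple interleaved split rather than the stratified split a platform may use, and the function names and labels are hypothetical.

```python
def precision_recall(y_true, y_pred):
    """Per-class (precision, recall) from true and predicted labels."""
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        scores[label] = (precision, recall)
    return scores

def k_fold_indices(n_samples, k=10):
    """Yield (train_indices, test_indices) for k-fold cross-validation."""
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for i in range(k):
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, folds[i]
```

Each sample appears in exactly one test fold, so the predictions collected over all ten folds can be passed to `precision_recall` together.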
Classification table of different classification algorithms on upper limb movements
Classification table of different classification algorithms on lower limb movements
It can be seen from Tables 4 and 5 and Figs. 4 and 5 that the BP artificial neural network achieves the best recognition effect for the motion classification of both limbs. The average accuracy of upper limb movements reached 93.2% with an average recall rate of 93.2%, and the average accuracy of lower limb movements reached 99.2% with an average recall rate of 99.2%. Across the four recognition algorithms, the average accuracy of lower limb movements ranged from 97% to 99.2%, while the average accuracy of upper limb movements ranged from 84.9% to 93.2%. The recognition accuracy of upper limb movements is relatively low because the upper limb movements of the in-situ dribble, the walking dribble and the running dribble are all dribbling states whose characteristics are similar and difficult to distinguish. As shown in Table 6, when the upper limb movements of the in-situ dribble, the walking dribble and the running dribble are regarded as a single state of motion, the average recognition rate reaches 99% and the average recall rate reaches 99%.

Classification image of different classification algorithms on upper limb movements.

Classification image of different classification algorithms on lower limb movements.

Classification image of different classification algorithms on the combined upper limb movements.
Classification table of different classification algorithms on the combined upper limb movements
Experiments show that the BP artificial neural network has the best recognition effect on upper and lower limb movements. The BP artificial neural network is therefore used to construct the classifiers of upper limb movements and lower limb movements respectively, and the recognition of the basketball posture is completed, as shown in Fig. 6.3. In the figure, the abscissa indicates the type of action and the ordinate indicates the accuracy. The recognition accuracy of the method exceeds 95% for each basketball action, and the average accuracy reaches 98.85%.
This paper introduces the basic recognition method of human body movement posture and attaches sensors to key parts of the human body to detect body movement information. In basketball gesture recognition, the body and arm movement information is mainly collected, so the sensors are attached to the arm and the calf of the subject respectively. When acquiring limb signal data, the sensor signal needs to be filtered because of the drift of the gyro sensor. The principle of the Kalman filter algorithm and its application in strapdown inertial navigation are introduced in detail. The experimental results show that Kalman filtering can complete the data fusion of the sensors, reduce the interference of noise in the attitude calculation, and improve the accuracy of the attitude calculation. The basketball actions identified in this article include walking, running, jumping, and standing, dribbling, walking dribbling, shooting, passing, and catching. Moreover, this paper analyzes the characteristics of basketball actions, proposes a two-stage method of dividing basketball action data according to these characteristics, and finally extracts the unit actions to analyze each basketball action. In feature extraction, 32 features are extracted from each unit action. Finally, this paper constructs classifiers for the upper and lower limb motion samples and obtains the most suitable classifier by evaluating the recognition performance of four commonly used classification algorithms. The research results show that the basketball gesture recognition method used in this paper is quite satisfactory.
Conclusion
This study uses machine learning to investigate the gesture recognition of basketball players. Through feature calculation, the characteristics of each data dimension in the time domain and frequency domain are obtained, and a 32-dimensional feature vector is constructed. The attributes contained in the feature vector are complex, so feature selection is needed to eliminate the irrelevant and redundant attribute values. For attribute screening, this paper adopts the best-first search algorithm and the principal component analysis method. Feature selection reduces the dimensionality of the feature vector, which reduces the complexity of the classification calculation and improves the working efficiency of the system. In this experiment, the sensor nodes are fixed on the calf and the arm of the subject to detect the behavior information of different limbs. According to the placement positions of the nodes, this paper divides the data set of each action into an upper limb motion data set and a lower limb motion data set. A classifier is constructed separately for each sample set, the unit movements of the upper and lower limbs are classified separately, and the combination of the two results gives the basketball movement posture of the current subject. Furthermore, to track the human body attitude, the host computer software runs in a continuous loop and uses real-time data to display the posture until the system stops. Finally, this paper uses the Kalman filter algorithm to solve the human body quaternion in order to filter out the systematic error caused by the angular drift of the gyroscope.
The experimental results show that the model proposed in this study performs well.
