Abstract
Because athletes' motions are difficult to recognize, few studies have addressed the recognition of athletes' specific motions. This study therefore uses an acceleration sensor as the carrier and applies human-computer interaction to transform an athlete's actions into machine-identifiable action units. Based on the actual characteristics of human body motion, a human body motion model is constructed together with the corresponding computer hardware and software platform. A classification algorithm that recognizes athletes' movements is designed, and an SVM model based on machine learning is built for classification and recognition. The effectiveness of the algorithm is examined through comparative experiments, and the simulation results are analyzed statistically. The results show that the proposed algorithm can classify and recognize the collected motion data and contributes to the theoretical analysis of athletes' motion recognition. The algorithm can also perform motion quality analysis and provides a theoretical reference for subsequent research.
Introduction
With the rapid development of computer, image processing, communication and digital video technologies, video surveillance is moving towards digitalization, intelligence, networking, high definition and integration. Target detection and tracking in intelligent video surveillance systems provide the data on which further intelligent analysis is based. About 80% of the information humans receive comes through the eyes; working together, the eyes and the brain allow us to locate, track and analyze the behavior of moving targets in complex scenes. Computer vision replaces the human visual organ with a camera as the input interface and uses the computing power of the computer to detect, track and analyze targets intelligently [1].
Intelligent video surveillance adds computer vision, pattern recognition and artificial intelligence to traditional video surveillance; it is an emerging application area and a topic of great interest in computer vision. Intelligent video surveillance analysis refers to the automatic analysis and processing, without human intervention, of video sequences captured by cameras, using computer vision, image processing, video analysis and pattern recognition. It includes detecting, extracting, labeling and tracking targets of interest in the surveillance scene, and finally analyzing and interpreting the behavior of those targets [2].
Human-computer interaction technology has advanced greatly, shifting from people adapting to computers to computers adapting to people. In this paradigm, sensors capture and recognize the motion signals of the human body in order to understand the user's behavior and intentions, which is an important feature of artificial intelligence. Pattern recognition is the key technology: without any traditional input device such as a mouse or keyboard, the system can recognize the intention behind human motion. This is not only a trend in computer development but also changes people's lifestyles [3].
Recognizing athletes' motions can not only help athletes raise their training level but also support the recovery of injured tissue, which is of great significance to their professional development. This study therefore analyzes athletes' motion recognition and uses machine learning to improve recognition performance.
Related work
Driven by counter-terrorism and security needs, most countries and regions have carried out in-depth research on intelligent video surveillance systems. Between 1996 and 1999, several well-known research institutions, including the David Sarnoff Research Center and Carnegie Mellon University, developed the video surveillance and monitoring system VSAM [4] under the sponsorship of the US Defense Advanced Research Projects Agency (DARPA). VSAM is mainly intended for urban public places and battlefields, where it addresses the high cost of human monitoring and the danger of manual surveillance on the battlefield. In 2000, DARPA sponsored HID (Human Identification at a Distance), a major project on remote human identification, which targets remote monitoring and the detection, classification and identification of people in order to strengthen national defense capabilities [5]. From 1998 to 2002, the Information Society Technologies (IST) programme of the EU Fifth Framework Programme funded INRIA (the French National Institute for Research in Computer Science and Control), the University of Reading in the UK and other research institutions to develop ADVISOR (Annotated Digital Video for Surveillance and Optimised Retrieval), a major video surveillance and retrieval project [6]. It is mainly used to manage public transportation systems such as subways and buses, relieving urban traffic pressure. Companies such as IBM and Microsoft have also begun to apply vision-based gesture recognition commercially. The real-time visual surveillance system of the University of Maryland not only detects people and locates them in the video but also segments the parts of the human body; by building appearance models, it can also track multiple people and monitor simple interactions between them [7].
In 2009, Mehran et al. proposed a method for detecting abnormal crowd behavior based on the social force model. The method represents crowd motion by particle advection, placing a grid of particles over each video frame. The social force model describes the interaction between each particle and its surroundings, and the magnitude of this force characterizes pedestrian behavior in the frame; local spatio-temporal cuboids are then extracted and a bag-of-words model is built to detect abnormal behavior in the crowd [8]. Kratz et al. described scene motion with gradient-based spatio-temporal models and used an HMM to capture the relationships between these models, enabling detection of local anomalies in video [9]. Zhao et al. applied the classical social force model to abnormal group behavior detection, describing the intensity of movement within the group by the magnitude of the social forces [10]. Yang Cong et al. detected anomalous behavior in video with multi-scale histograms of optical flow, which summarize the distribution of optical flow directions within each image block [11]. Navneet Dalal et al. used histograms of oriented gradients to detect changes in crowd movement in video [12]. For abnormal group behavior, Xiong et al. first extracted the foreground of moving targets from the video sequence, projected it onto the horizontal and vertical coordinate axes, computed the probability distribution of the foreground, and finally used the resulting crowd entropy to describe the distribution of the crowd [13]. Mahadevan et al. modeled normal crowd behavior with dynamic textures and detected abnormal behavior against the learned model [14]. Wu et al. developed a group monitoring system that handles only a small number of people and whose real-time performance needs further improvement [15]. In 2008, Amit Adam et al. proposed a fast method for detecting abnormal crowd behavior. The method places observation points on each video frame and uses optical flow to obtain features such as the direction and speed of motion at each point; a frame is then judged abnormal according to the anomalies of its observation points. The method achieves a relatively high recognition rate for behavior that shows some regularity in group motion [16]. Zhao et al. used a robust tracking method to follow individuals in a monitored environment, and their experiments show that the algorithm can track about 20 people in a group setting. The method targets scenes with low- to medium-density crowds in which there is no severe occlusion between individuals [17].
Signal preprocessing based on algorithm theory
The signal collected by a sensor usually contains considerable noise. Besides the acceleration produced by the action itself, it contains components such as gravitational acceleration and various kinds of noise. To ensure accurate recognition, the extracted motion acceleration signal must therefore be preprocessed.
Filtering
In the collected motion acceleration data, the experimental environment and the tester's own jitter introduce interference. The superimposed noise falls into two categories: periodic and irregular. To eliminate or reduce this interference, improve the validity of the data and smooth the curves, the data must be denoised and smoothed. Repeated experiments show that human motion acceleration is concentrated in the 0–20 Hz band, which contains both the motion acceleration information and the gravitational acceleration component. As shown in Fig. 1, the content above 20 Hz is mainly noise, and its amplitude is almost negligible compared with the signal in the 0–20 Hz band. Therefore, the 0–20 Hz band, which carries most of the acceleration information, should be retained while the high-frequency portion is filtered out. Commonly used filtering methods include median filtering and five-point cubic smoothing; this system uses five-point cubic smoothing to reduce the impact of glitches. The acceleration signals before and after filtering are shown in Fig. 2 and Fig. 3 [18].
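The claim that motion content lies in the 0–20 Hz band can be checked numerically on any recorded axis. The sketch below, assuming NumPy and a uniformly sampled single-axis signal, computes the share of spectral energy at or below a cutoff. The function name `band_energy_fraction` and the synthetic test signal are illustrative assumptions, not part of the described system; the 200 Hz rate is the sampling frequency given for the hardware in the endpoint-detection section.

```python
import numpy as np

def band_energy_fraction(signal, fs, f_max=20.0):
    """Share of spectral energy at frequencies <= f_max (Hz) for one axis."""
    x = np.asarray(signal, dtype=float)
    spectrum = np.fft.rfft(x)                    # one-sided DFT
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)  # bin centre frequencies in Hz
    power = np.abs(spectrum) ** 2
    return power[freqs <= f_max].sum() / power.sum()

# Hypothetical signal: 3 Hz "motion" plus weak 60 Hz noise, 400 samples at 200 Hz.
fs = 200.0
t = np.arange(400) / fs
signal = np.sin(2 * np.pi * 3 * t) + 0.1 * np.sin(2 * np.pi * 60 * t)
print(round(band_energy_fraction(signal, fs), 3))  # prints 0.99
```

The DC bin, which holds the gravity component, is counted in the retained band, matching the statement that the 0–20 Hz range includes gravitational acceleration.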

Spectrogram of the acceleration signal.

Acceleration curve before filtering.

Acceleration curve after filtering.
Five-point cubic smoothing fits a cubic polynomial to consecutive sampling points by the least-squares method and replaces each sample with its fitted value.
From the discrete data sequence X(n), 2m + 1 equally spaced points are taken:
The corresponding sampled values are:
If Z is the sampling interval, the 2m + 1 equally spaced points are:
Let the fitting polynomial of degree n be:
With suitable coefficients $a_j$ ($j = 0, 1, 2, \dots, n$), substituting the points (T, Y) into equation (1) gives a system of 2m + 1 equations, that is:
By the least-squares principle, the coefficients are chosen so that the sum of squared errors R is minimized. Assume that
Then
Setting m = 2 and n = 3 gives $a_0, a_1, a_2, a_3$.
With T = 0, ±1, ±2, the five-point cubic smoothing formula [19] is:
In a discrete sequence, the first two lines of the formula are applied to the first two points, the last two lines to the last two points, and the third line to all interior points. Since the sequence runs from point 1 to point N, substitution gives:
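The smoothing formulas did not survive extraction. The sketch below uses the standard five-point cubic least-squares coefficients, obtained by fitting a cubic to the points T = -2, ..., 2 and evaluating the fit at each point; it applies them exactly as described above (first two rows for the first two points, the centred row for interior points, the last two rows for the last two points). NumPy and the function name `five_point_cubic_smooth` are assumptions for illustration.

```python
import numpy as np

def five_point_cubic_smooth(x):
    """Five-point cubic least-squares smoothing of a 1-D sequence (length >= 5)."""
    y = np.asarray(x, dtype=float)
    n = len(y)
    if n < 5:
        raise ValueError("five-point smoothing needs at least 5 samples")
    out = np.empty(n)
    # First two points: rows 1 and 2 of the formula, applied to y[0..4].
    out[0] = (69 * y[0] + 4 * y[1] - 6 * y[2] + 4 * y[3] - y[4]) / 70
    out[1] = (2 * y[0] + 27 * y[1] + 12 * y[2] - 8 * y[3] + 2 * y[4]) / 35
    # Interior points: centred row 3, vectorised over every window y[k-2..k+2].
    out[2:n - 2] = (-3 * y[:n - 4] + 12 * y[1:n - 3] + 17 * y[2:n - 2]
                    + 12 * y[3:n - 1] - 3 * y[4:]) / 35
    # Last two points: rows 4 and 5, applied to y[n-5..n-1].
    out[n - 2] = (2 * y[n - 5] - 8 * y[n - 4] + 12 * y[n - 3]
                  + 27 * y[n - 2] + 2 * y[n - 1]) / 35
    out[n - 1] = (-y[n - 5] + 4 * y[n - 4] - 6 * y[n - 3]
                  + 4 * y[n - 2] + 69 * y[n - 1]) / 70
    return out
```

A quick sanity check is that any cubic sequence passes through unchanged, because the least-squares cubic fit to exact cubic data is exact; an isolated spike, by contrast, is reduced to 17/35 of its height at its own position.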
How the start and end points of an action are determined affects the extraction of the valid action signal, and the accuracy of this decision also affects the recognition rate. This paper collects the acceleration signal of one action at a time, uses start- and end-point detection to exclude invalid data, and finally extracts the valid action signal.
The energy threshold method and the residual method are used to accurately determine the start and end points of the action, that is, the energy threshold method is used to find the segment containing the start and end points of the action, and then the residual method is used to determine the start and end points of the action [20].
(1) The energy threshold method determines the segment containing the start and end points of the action
The flow chart of the energy threshold method is shown in Fig. 4. Analysis of multiple camera recordings shows that the effective duration of an upper-limb action is 0.3 s to 0.7 s on average; at a sampling frequency of 200 Hz, each action therefore spans about 60 to 140 samples (0.3 s × 200 Hz = 60, 0.7 s × 200 Hz = 140). Every 10 consecutive samples form a segment, and the energy of the segment is used to decide whether it contains the start or end point of the action. Let the segment contain the 10 points X(i), i = 1, 2, ⋯, 10, with corresponding sample values Y(i), i = 1, 2, ⋯, 10; then [21]:

Flow chart of the energy threshold method.
The collected signals are 6-axis upper-limb acceleration signals: the 3-axis acceleration of the forearm (forward-backward, left-right and up-down) and the 3-axis acceleration of the upper arm in the same three directions. To locate the start and end points accurately, the energy threshold is set to t. Suppose that in segment s at least 3 of the 6 axes have energy reaching the threshold. If at least 3 axes also reach the threshold in each of the three following segments s + 1, s + 2 and s + 3, segment s is judged to contain the starting sample of the action. After the start has been found, suppose that in a later segment e at least 3 axes have energy reaching the threshold. If at least 3 axes reach the threshold in each of the three following segments e + 1, e + 2 and e + 3, segment e is judged to contain the end sample of the action.
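The segment-level rule can be sketched as follows, assuming NumPy and a 6-axis recording of shape (n_samples, 6). Because the energy formula is missing from the text, segment energy is assumed here to be the sum of squared samples. The text states the end condition in the same terms as the start condition; this sketch instead assumes the end is marked by at least 3 axes falling below t for four consecutive segments, which is our reading rather than a confirmed detail. The names `segment_energy` and `find_action_segments` are hypothetical.

```python
import numpy as np

SEG_LEN = 10   # samples per segment (from the text)
MIN_AXES = 3   # at least 3 of the 6 axes must satisfy the condition
RUN = 4        # segment s plus the three following segments s+1, s+2, s+3

def segment_energy(signal):
    """Per-axis energy of consecutive non-overlapping 10-sample segments.

    signal: array of shape (n_samples, 6). Energy is ASSUMED to be the sum of
    squared sample values, since the paper's energy formula is not shown.
    Returns an array of shape (n_segments, 6); trailing samples are dropped.
    """
    x = np.asarray(signal, dtype=float)
    n_seg = x.shape[0] // SEG_LEN
    segs = x[: n_seg * SEG_LEN].reshape(n_seg, SEG_LEN, x.shape[1])
    return (segs ** 2).sum(axis=1)

def find_action_segments(signal, t):
    """Return (start_segment, end_segment), or None if no start is found.

    end_segment is the first of RUN consecutive "quiet" segments after the
    start (ASSUMED end rule: >= MIN_AXES axes below t); it is None if the
    recording ends before such a run appears.
    """
    energy = segment_energy(signal)
    active = (energy >= t).sum(axis=1) >= MIN_AXES
    quiet = (energy < t).sum(axis=1) >= MIN_AXES
    n = len(active)
    start = next((s for s in range(n - RUN + 1) if active[s:s + RUN].all()), None)
    if start is None:
        return None
    end = next((e for e in range(start + RUN, n - RUN + 1)
                if quiet[e:e + RUN].all()), None)
    return start, end

# Hypothetical recording: 50 quiet samples, 100 active samples, 50 quiet samples.
demo = np.zeros((200, 6))
demo[50:150] = 2.0
print(find_action_segments(demo, t=1.0))  # prints (5, 15)
```

Multiplying a segment index by 10 gives the first sample of that segment, so the example locates the action between samples 50 and 150; the exact endpoints inside these segments are left to the regression residual method.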
(2) Regression residual method to determine the beginning and end of the action
The energy threshold method can locate the segments that contain the start and end points of the valid signal, but it cannot pinpoint those points exactly. It is therefore combined with the regression residual method to extract the motion signal precisely. The regression residual method detects points in a discrete sequence where the amplitude changes abruptly, so it can find the start and end points of an action in the motion signal; studies abroad have already used it to segment actions with good results. After the energy threshold method has found the segment containing the start point, the regression residual method determines the exact start and end points within that segment. First, the motion acceleration signal is windowed, that is, divided into overlapping windows of equal length, as shown in Fig. 5 [22].

Windowing of the acceleration signal.
In this work, the motion acceleration signal is windowed with a sliding window with 50% overlap. Sliding-window detection ensures that every point is examined, shortens the acceleration signal, and normalizes the lengths of the motion acceleration signals produced by different users, which strongly influences subsequent feature extraction and recognition. In this experiment, two consecutive windows of 10 points each are used, and each window is one segment. As the window slides, a linear regression equation is fitted to two consecutive points and the regression residual of the third point is analyzed. When the residual exceeds a threshold, set here to 25, the second point is taken as the start point of the action. To fit a regression equation and predict the residual for every point, consecutive windows overlap by five points, which ensures the accuracy of the experiment. For the discrete data sequence X(n) [23]:
The sampled values are:
The first two points X (i), X (i + 1) are taken to establish a linear regression equation:
This gives:
Then, the regression residual of the third point is:
When the residual z exceeds the threshold T, point X(i + 1) is determined to be the start point of the action on that axis; the end point of the action on that axis is found in the same way. For the 6-axis motion acceleration signal, a point is taken as the start point of the action if the regression residuals of 3 or more axes exceed the threshold at that point, and the end point of the action is obtained in the same way [24].
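For two points the regression line is exact: the line through (i, X(i)) and (i + 1, X(i + 1)) predicts 2X(i + 1) - X(i) at i + 2, so the residual of the third point is the second difference X(i + 2) - 2X(i + 1) + X(i). The sketch below applies this rule with the threshold T = 25 from the text and the 3-axis vote, assuming NumPy and a segment of shape (n, 6). Finding the end point by scanning the reversed segment is our interpretation of "in the same way", and the function names are hypothetical.

```python
import numpy as np

T = 25.0       # residual threshold from the text
MIN_AXES = 3   # a point counts if 3 or more axes exceed T

def residuals(x):
    """|X(i+2) - (2 X(i+1) - X(i))| for every i, computed along axis 0."""
    x = np.asarray(x, dtype=float)
    predicted = 2 * x[1:-1] - x[:-2]   # line through X(i), X(i+1), extrapolated to i+2
    return np.abs(x[2:] - predicted)

def first_change_point(x):
    """Index i+1 of the first point where >= MIN_AXES axes have residual > T."""
    hits = (residuals(x) > T).sum(axis=1) >= MIN_AXES
    idx = np.flatnonzero(hits)
    return int(idx[0]) + 1 if idx.size else None

def action_endpoints(segment):
    """Start and end sample indices inside a 6-axis segment of shape (n, 6)."""
    seg = np.asarray(segment, dtype=float)
    start = first_change_point(seg)
    rev = first_change_point(seg[::-1])          # scan backwards for the end point
    end = len(seg) - 1 - rev if rev is not None else None
    return start, end

# Hypothetical segment: an abrupt plateau of height 100 on samples 15..29.
seg = np.zeros((40, 6))
seg[15:30] = 100.0
print(action_endpoints(seg))  # prints (14, 30)
```

Following the text's convention, the reported start is the last sample before the abrupt change (X(i + 1) when the jump occurs at X(i + 2)), and the mirrored scan reports the first sample after the change as the end.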
After the 6-axis sample segment containing the action has been determined, experimental errors and uncertainties may leave the sample sequences with different lengths. For subsequent feature extraction and recognition, the sequence lengths of all signals must be normalized. This chapter uses linear interpolation for this purpose, resampling each axis of the 6-axis motion acceleration signal to a sequence length of 130. Let the sample values of the original sequence of length n be $X_n$; the goal is to obtain the values $Y_m$ of m resampled points. First, the length ratio is computed [25–27]:
Assuming that:
For any other point $Y_i$ of the Y sequence, the corresponding position in X is taken:
Taking its left integer neighbor $W_a$ and right integer neighbor $W_b$, the new ratio is
The corresponding new value is:
For the last point:
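Since the interpolation formulas are missing, the sketch below reconstructs the described procedure under stated assumptions: the length ratio is taken as (n - 1)/(m - 1), each output point Y_i maps to a fractional position in X, the value is interpolated between the left and right integer neighbours W_a and W_b, and the last point is copied directly. NumPy and the name `resample_linear` are assumptions; the default m = 130 is the length stated in this section.

```python
import numpy as np

def resample_linear(x, m=130):
    """Resample a 1-D sequence of length n >= 2 to length m by linear interpolation."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    ratio = (n - 1) / (m - 1)        # ASSUMED length ratio between the index grids
    y = np.empty(m)
    for i in range(m - 1):
        pos = i * ratio              # corresponding fractional position in X
        a = int(np.floor(pos))       # left integer neighbour W_a
        b = a + 1                    # right integer neighbour W_b (always <= n - 1 here)
        frac = pos - a               # new ratio between the two neighbours
        y[i] = x[a] + frac * (x[b] - x[a])
    y[-1] = x[-1]                    # the last point maps onto the last original sample
    return y
```

For a 6-axis segment `seg` of shape (n, 6), each axis would be resampled separately, for example `np.column_stack([resample_linear(axis) for axis in seg.T])`. The loop mirrors the step-by-step description; in practice `np.interp` produces the same result in one call.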
Algorithm of principal component analysis
Principal component analysis (PCA) is an important statistical method for converting a problem with many indicators into one with a few composite indicators. It transforms a problem in a high-dimensional space into one in a low-dimensional space, making it simpler and more intuitive; the composite indicators are mutually uncorrelated and retain most of the information in the original indicators. PCA not only reduces the dimensionality of a multivariate data system but also simplifies the statistical characteristics of its variables. While simplifying the system, it also reveals important information about it, such as the position of the centroid of the data points (the average level), the direction of maximum variation, and the spread of the points. As one of the most important multivariate statistical methods, PCA is used in fields such as social economics, enterprise management, geology and biochemistry, for tasks including comprehensive evaluation, process control and diagnosis, data compression, signal processing and pattern recognition.
The steps of PCA are as follows:
Step (1): n samples $x_i = (x_{i1}, x_{i2}, \dots, x_{ip})^T$ are collected from the p-dimensional random vector $X = (x_1, x_2, \dots, x_p)^T$,
where $i = 1, 2, \dots, n$ and $n > p$. The resulting sample array is:
Step (2): The element in X is converted
Get:
Step (3): The element in Y is converted
Among them:
Then, a standardized matrix can be obtained:
Step (4): The sample correlation coefficient matrix of the normalized matrix Z is obtained:
Among them:
Step (5): The characteristic equation of the sample correlation matrix R is solved,
which yields its p eigenvalues:
Step (6): From
Step (7): From $z_i = (z_{i1}, z_{i2}, \dots, z_{ip})^T$, the m principal components are obtained, $i = 1, 2, \dots, n$.
Then, the principal component decision matrix is
$u_i$ is the principal component vector of the i-th sample, $i = 1, 2, \dots, n$. The j-th component $u_{ij}$ of $u_i$ is the projection of $z_i$ on the j-th unit eigenvector of R.
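The PCA steps above can be sketched with NumPy as follows: standardize the sample array, form the sample correlation matrix R, take its eigenvalues in descending order with unit eigenvectors, and project each standardized sample onto the leading eigenvectors. Step (6) is cut off in the text, so choosing m from an 85% cumulative contribution rate is a common convention assumed here, not the paper's stated rule; the function names are hypothetical.

```python
import numpy as np

def pca_correlation(X, m):
    """PCA on the sample correlation matrix, keeping m components.

    X: (n, p) sample array. Returns (U, eigvals, B): U is the (n, m) principal
    component matrix, eigvals the p eigenvalues of R in descending order, and
    B the (p, m) matrix of unit eigenvectors used for the projection.
    """
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # standardized matrix Z
    R = (Z.T @ Z) / (X.shape[0] - 1)                   # sample correlation matrix
    eigvals, eigvecs = np.linalg.eigh(R)               # R is symmetric
    order = np.argsort(eigvals)[::-1]                  # descending eigenvalues
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    B = eigvecs[:, :m]
    U = Z @ B                                          # u_ij = projection of z_i on b_j
    return U, eigvals, B

def n_components(eigvals, threshold=0.85):
    """Smallest m whose cumulative contribution rate reaches the (ASSUMED) threshold."""
    cum = np.cumsum(eigvals) / eigvals.sum()
    return int(np.searchsorted(cum, threshold) + 1)
```

Because the eigenvectors of R are orthonormal, the resulting principal components are mutually uncorrelated and the variance of the j-th component equals the j-th eigenvalue, which is the "uncorrelated composite indicators" property described earlier.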
Relief is a family of algorithms first proposed by Kira. Relief-family algorithms evaluate each feature by how well it distinguishes between samples that are close to each other. The underlying idea is that a good feature should keep samples of the same class close together and push samples of different classes apart. Relief-family algorithms run efficiently, place no restrictions on data types, and are insensitive to relationships between features. ReliefF is one of the most effective filter-based feature selection methods.
The steps of the Relief algorithm are as follows:
Input: sample set D, number of iterations m, feature weight threshold, number of nearest-neighbor samples k.
Output: the feature subset B consisting of the features whose weight is greater than the threshold.
(1) Each feature weight of all samples is set to 0.
(2) for i = 1 to m do
1) Randomly select a sample X from D;
2) Find the k nearest neighbors $R_j$ (j = 1, 2, ⋯, k) of X that belong to the same class as X, and, for each class c different from that of X, the k nearest neighbors $M_j(c)$ of X in class c, where c = 1, 2, ⋯, C and $M_j(c)$ denotes the j-th nearest neighbor sample in class c.
3) for A = 1 to N do
(3) for A = 1 to N do
If the weight of feature A is greater than the threshold, feature A is added to B.
Here, p(c) denotes the prior probability of class c, i.e.:
diff (A, I1, I2) represents the distance between sample I1 and sample I2 on feature A. Its calculation formula is defined as follows:
ReliefF is highly efficient, places no restrictions on data types and gives satisfactory results, so it has been widely adopted.
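The ReliefF weight update described above can be sketched as follows, assuming NumPy, numeric features and the usual ReliefF update: for each of m randomly drawn samples, the mean diff to its k nearest hits is subtracted, and the mean diff to its k nearest misses in every other class c is added with weight p(c)/(1 - p(class of X)), all divided by m. Range normalization makes diff(A, I1, I2) equal to |I1[A] - I2[A]| / (max(A) - min(A)); the Manhattan neighbor distance and the name `relieff` are assumptions of this sketch.

```python
import numpy as np

def relieff(X, y, n_iter=100, k=5, seed=0):
    """ReliefF feature weights for numeric features X (n, a) and labels y (n,)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    rng = np.random.default_rng(seed)
    span = X.max(axis=0) - X.min(axis=0)
    span[span == 0] = 1.0                         # constant features never differ
    Xn = (X - X.min(axis=0)) / span               # diff() becomes |difference|
    classes, counts = np.unique(y, return_counts=True)
    prior = dict(zip(classes, counts / len(y)))   # p(c): class frequency
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        i = rng.integers(len(y))                  # randomly selected sample X
        xi, ci = Xn[i], y[i]
        dist = np.abs(Xn - xi).sum(axis=1)        # Manhattan distance to all samples
        dist[i] = np.inf                          # never pick X as its own neighbour
        for c in classes:
            idx = np.flatnonzero(y == c)
            nearest = idx[np.argsort(dist[idx])[:k]]        # k nearest in class c
            diffs = np.abs(Xn[nearest] - xi).mean(axis=0)   # (1/k) * sum_j diff
            if c == ci:
                w -= diffs / n_iter                          # near hits R_j
            else:
                w += prior[c] / (1 - prior[ci]) * diffs / n_iter  # near misses M_j(c)
    return w
```

Selecting the output subset B then amounts to keeping the indices with `w > threshold`, where the threshold is a user-chosen value as in the Input line above.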
Feature extraction method of PCA-ReliefF
PCA effectively reduces redundant feature information, and the ReliefF algorithm effectively evaluates the quality of features. This experiment combines the two into a PCA-ReliefF feature extraction algorithm, whose steps are as follows:
Step 1: The sample set D, the number of iterations m and the number of nearest-neighbor samples k are input, and PCA is applied. The number of retained features s is set, where s is less than the feature dimension n, yielding the sample set D′ after PCA.
Step 2: The weight of each feature of all samples in D′ is set to 0, that is, $w_i = 0$, $i = 1, 2, \dots, s$.
Step 3: A sample X is randomly selected from D′.
Step 4: The ReliefF algorithm is used to calculate new weights: the k nearest neighbors $R_j$ (j = 1, 2, ⋯, k) of X in D′ that belong to the same class as X, and the k nearest neighbors $M_j(c)$ of X in D′ that do not belong to the same class as X, are found,
where c = 1, 2, ⋯, C, c ≠ c(X) and j = 1, 2, ⋯, k; C is the number of classes and c(X) is the class of sample X.
Step 5: The weight of each feature A is updated.
Step 6: Steps 3 to 5 are repeated m times, and the feature weights w are output. The weights w of all features are then sorted in descending order.
The d features with the largest weights form the feature subset, where d is the number of features retained after ReliefF processing; m and k are set according to the number of samples and the dimensionality.
The PCA-ReliefF method first reduces the dimensionality of high-dimensional data with PCA and then updates the weight of each feature with the ReliefF algorithm. The higher a feature's weight, the more useful it is for classification, so the d features with the largest weights are selected to form the feature subset.
Model test analysis
In this paper, the inertial sensor hardware platform is used to collect forehand shots in table tennis, a simple and common human motion. The experimental arrangement is as follows: action signal samples from 20 subjects were collected to train the model. Then, 3 of the 20 subjects were randomly selected, and each performed each of the 3 actions 10 times, giving a total of 90 actions for classification verification. The experiments were carried out indoors. After collection, all sample data were saved on a TF (microSD) card for subsequent processing and analysis to verify the effectiveness of the algorithm.
In the test, 6-dimensional acceleration information is collected by the inertial sensor attached to the arm; the collected signals are shown in Fig. 6.

Acquisition signal diagram.
The samples are smoothed with five-point cubic smoothing. Because the hardware already applies a simple preliminary median filter, the effect of this smoothing is not very pronounced.
Start- and end-point discrimination makes it possible to extract, from continuous motion, a valid segment containing one complete action. The continuous motion of each subject was processed separately, and the start- and end-point discrimination rates of the first 10 subjects are as follows:
Subjects No. 6 and No. 10 have lower discrimination rates, but the overall discrimination rate is high. A possible reason is that some of these subjects' actions were too fast or too slow to be discriminated correctly. To make feature extraction effective, each data segment containing a single complete action, as extracted by start- and end-point discrimination, is uniformly resampled to 120 dimensions by linear interpolation; the result is shown below:
The basic function of feature extraction is to extract the feature vectors that are most effective for classification and identification in the original data, describe and characterize the recognition objects, and thus provide accurate and efficient support for classification identification. In this experiment, the wavelet transform is selected for feature extraction. Figure 9 shows the results of the forehand shot after wavelet transformation.

Signal diagram of the forehand shot after filtering.

Effect diagram of a single forehand shot.

Results of the forehand shot signal after wavelet transform.
As the figure above shows, most of the information in the action lies in the low-frequency range and part of the intermediate-frequency range. To retain as much of the original motion information as possible while reducing computational complexity, the first 100 wavelet coefficients are kept for each axis of each action, so the wavelet feature vector of a single action has 6 axes × 100 = 600 dimensions.
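The text does not name the wavelet or the decomposition depth, so the sketch below assumes an orthonormal Haar wavelet with 3 levels, implemented directly in NumPy (a library such as PyWavelets would normally be used instead). Coefficients are ordered from low to high frequency, [cA3, cD3, cD2, cD1], so truncating to the first 100 per axis keeps the low- and mid-frequency content, giving 6 × 100 = 600 features. The function names and the odd-length padding rule are assumptions.

```python
import numpy as np

def haar_dwt(x):
    """One level of the orthonormal Haar DWT; odd lengths repeat the last sample."""
    x = np.asarray(x, dtype=float)
    if len(x) % 2:
        x = np.append(x, x[-1])
    approx = (x[0::2] + x[1::2]) / np.sqrt(2)   # low-pass (approximation) band
    detail = (x[0::2] - x[1::2]) / np.sqrt(2)   # high-pass (detail) band
    return approx, detail

def wavelet_features(signal, levels=3, keep=100):
    """Haar coefficients per axis, low frequency first, truncated to `keep`.

    signal: (n_samples, n_axes). Returns a flat vector of n_axes * keep values
    (fewer if an axis yields fewer than `keep` coefficients).
    """
    feats = []
    for axis in np.asarray(signal, dtype=float).T:
        approx, details = axis, []
        for _ in range(levels):
            approx, d = haar_dwt(approx)
            details.insert(0, d)                    # coarsest detail band first
        coeffs = np.concatenate([approx] + details)   # [cA_L, cD_L, ..., cD_1]
        feats.append(coeffs[:keep])
    return np.concatenate(feats)
```

For a 120-sample segment the three levels produce 15 + 15 + 30 + 60 = 120 coefficients per axis, so keeping 100 drops only the upper part of the finest detail band, which is where the text indicates little information remains.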
Feature selection reduces computational complexity while retaining as much of the original information as possible. In this experiment, the PCA-ReliefF algorithm is used for feature selection. The feature selection results for the forehand shot are as follows:
Here V1, V2 and V3 are the principal components of the forehand-shot wavelet coefficients obtained by PCA and then selected by the ReliefF algorithm. The proportion of the original variance they explain is as follows:
In the classification verification stage, 3 of the 20 subjects were randomly selected and each performed each of the 3 actions 10 times, giving a total of 90 actions for classification verification. The results are as follows:
As Table 4 shows, the recognition rate is highest when C = 100 and γ = 0.00033, reaching 93.78%, which indicates that the design of the recognition algorithm is reasonable and feasible.
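The classifier stage can be sketched with scikit-learn's `SVC`, assuming an RBF kernel (the text reports values for C and γ but does not name the kernel) and using the best-performing parameters C = 100 and γ = 0.00033. The three-class data below are synthetic stand-ins for the selected features V1, V2 and V3; the real feature values are not given, so the printed rate says nothing about the paper's 93.78%.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical 3-feature data for 3 actions (stand-ins for V1, V2, V3).
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0, 0.0], [50.0, 0.0, 0.0], [0.0, 50.0, 0.0]])

def make_split(n_per_class):
    """Draw n_per_class noisy samples around each class centre."""
    X = np.vstack([c + 3.0 * rng.standard_normal((n_per_class, 3)) for c in centers])
    y = np.repeat(np.arange(len(centers)), n_per_class)
    return X, y

X_train, y_train = make_split(60)
X_test, y_test = make_split(30)   # 90 test actions, mirroring 3 actions x 30 trials

# RBF-kernel SVM with the parameters reported as best in the text.
clf = SVC(kernel="rbf", C=100, gamma=0.00033)
clf.fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)
print(f"recognition rate: {accuracy:.2%}")
```

A small γ such as 0.00033 gives a very wide RBF kernel, so it only works well when features have large spreads like the ones above; with standardized features, the same search over C and γ would likely settle on different values.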
Discriminant rate statistics
Feature selection results
Total variance explained
Classification results
This paper presents an improved motion recognition scheme for athletes based on acceleration sensors. Human motion pattern recognition based on acceleration sensors is still at a relatively basic stage: it struggles with diverse real-world environments and complex human movements, and many problems remain to be solved. Overall, however, recognition of simple daily motions has steadily improved, and new techniques for classifying complex human motion patterns have emerged; these techniques are ultimately used to perceive people and their actions. They are expected to become one of the key research directions in human motion pattern recognition and merit study in both theory and practice.
This paper mainly presented the algorithm design of the motion recognition system. First, the upper-limb motion signal was smoothed; then the start and end points of each action were discriminated and the valid action signal was extracted. The valid signal was normalized in length, features of the acceleration signal were extracted with the wavelet transform, and the PCA-ReliefF algorithm was applied to the wavelet coefficients to select feature vectors and remove redundant features. Finally, an SVM was trained on the selected feature vectors, and the experimental results showed that the SVM model achieves a high recognition rate.
In addition, this paper used the inertial sensor hardware platform to collect human motion signals during forehand shots in table tennis.
Data acquisition is less constrained. When acceleration sensors are used to collect human motion data, acquisition is not restricted by external conditions such as lighting, viewing angle or obstacles, and no external devices such as cameras or touch screens are required. Moreover, users can move naturally according to their usual habits, which makes the data easier to obtain.
Motion information is obtained more conveniently. Because acceleration sensors are inexpensive, small and highly sensitive, they are easily integrated into mobile phones, game controllers, watches, waist-worn devices and so on; they can therefore be placed almost anywhere on the body, such as in a pocket or on the waist or wrist, and users can carry them without discomfort.
Motion information is acquired in a more user-friendly way. While carrying the acceleration sensor, users can move according to their daily living and exercise habits. Moreover, the sensor acquires data on its own, without requiring any active participation from the user. Therefore, whether the user is walking, going upstairs or downstairs, or going to the bathroom, the acceleration signal can be collected whenever the user moves, and motion data can be obtained without invading the user's privacy.
Classification and recognition were performed offline only, and the efficiency and runtime performance of the algorithm were not analyzed; the next step is therefore to achieve online action recognition. In addition, this paper classifies only the three simplest common strokes in a single sport, table tennis. Future work needs to extend the approach to more complex movements and to other sports such as badminton and golf.
Conclusion
To eliminate or reduce noise interference, improve the validity of the data and smooth the curves, the data must be denoised and smoothed; the system therefore uses five-point cubic smoothing to reduce the impact of glitches. This paper collects the acceleration signal of one action at a time, uses start- and end-point detection to eliminate invalid data, and finally extracts the valid action signal. An inertial sensor hardware platform is used to collect forehand shots in table tennis, a simple and common human motion. In the experiment, motion signals from 20 subjects are collected to train the model, and 6-dimensional acceleration information is acquired by an inertial sensor attached to the arm. Feature selection reduces computational complexity while retaining as much of the original information as possible, so the PCA-ReliefF algorithm is used for feature selection. Finally, the results indicate that the design of the recognition algorithm is reasonable and feasible.
