Automatic posture capture of sports movement based on sensor information fusion

Abstract

To address the low accuracy and long processing time of existing methods for collecting motion posture information, a new automatic posture capture method based on sensor information fusion is proposed. The method first builds a motion posture information acquisition architecture from five functional modules (data acquisition and processing, laser sensor, data storage, voltage conversion and serial communication) to obtain motion posture data. Then, the collected sensor image information is processed by median filtering, and the target image is obtained by fusing the binary image obtained by optical flow segmentation with the original image. Finally, the Mean-Shift algorithm and a particle filter are used to track the moving human target, and the self-similarity matrix is used to automatically capture the motion posture. Experiments show that the method achieves high recall and precision in collecting motion posture information, that the accuracy of automatic motion posture capture reaches 95.5%, and that the maximum capture time is 79 ms, so the method is fast and performs well in practical application.

Keywords

Sensor information fusion, sports posture, automatic capture, optical flow segmentation, Mean-Shift algorithm, particle filter

1. Introduction

Body posture can be divided into two types: motion posture and static posture. The study of motion posture refers to the use of various methods to detect and track the motion of the human body in order to obtain motion data, so as to promote the development of research on human medicine in our country. With the gradual deepening of research on sports posture, a new discipline, human motion biomechanics, has been formed [1]. It draws on a wide range of disciplines and technologies, including metrology, human anatomy, surgery, kinematics, robotic mechanisms, testing techniques and computers [2]. Based on the basic laws of human anatomy, it establishes a model of the whole or part of the human body, uses test methods to obtain the speed, angle, acceleration and EMG signals of human movement, analyzes them further, and finally obtains the instantaneous posture and the change law of human movement. Tracking and detecting the movement state of the human body is very valuable [3]. It can be applied to the daily sports training of athletes, to military training and to the rehabilitation of children with cerebral palsy, so as to provide more scientific rehabilitation training and treatment plans [4].

At present, some progress has been made in research on sports posture. For example, Han [5] proposed a key posture detection method for sports video based on a BP neural network. To help athletes and coaches train more efficiently, the method takes weightlifting videos as its object, uses a hierarchical block background estimation method to detect the human target, extracts the target according to the edge features of the person, and uses an improved BP neural network to recognize four common body postures in weightlifting, thereby obtaining the posture recognition results. Although this method can recognize the relevant postures, its recognition of overall motion is inadequate and it cannot be applied on a large scale. Zheng [6] proposed an intelligent capture method for sports training postures based on deep learning. Based on an Arduino embedded development board equipped with multiple IMU sensors, the method established a system that uses stepper motors to collect accurate human movement data such as speed and acceleration. To identify human motion models accurately, a deep learning model based on symmetric coding, time scale coding and structure coding was built to capture the motion posture. This method captures moving objects with poor accuracy, and its detection time is long. Chen [7] proposed a design method for a real-time motion posture detection system based on inertial sensors. The system consists of hardware and software. The hardware part implements the circuit design around the inertial sensor and uses nodes to represent joint positions in order to detect the skeletal posture of the human body during sports. The software part is composed of three modules: the equipment communication module, the data processing module and the posture discrimination module. The LAN communication mode enhances the efficiency and accuracy of information and data transmission and allows continuous real-time detection of sports posture. Although this method can detect human motion posture in real time, its computational complexity is high and its recognition accuracy is low. Yang et al. [8] proposed a human posture recognition method based on Android sensors. The method collects the acceleration data produced during human motion by the acceleration sensor embedded in Android devices, extracts different motion features from the data, and recognizes four basic motions: walking, running, going upstairs and going downstairs. It then uses a continuity criterion and a subsequent-posture criterion to identify the transitions between postures, so as to recognize sports postures accurately. This method is prone to data loss when capturing moving objects, so its detection efficiency is low. Wu and Qin [9] proposed an automatic motion posture recognition method based on computer video processing technology. The recognition criteria for motion postures are set, and the motion feature vectors of the different postures are defined. The video data of the motion posture are captured by installed camera equipment. Using computer video processing technology, the initial video data are preprocessed through motion video image extraction, video transformation and motion residual compensation. On this basis, the moving target is detected, the motion posture features are extracted from the contour, the motion period and other aspects, and the fused feature vector is obtained. The final automatic recognition result is obtained by comparing this vector with the set recognition criteria. Although this method improves the accuracy of motion posture recognition to a certain extent, its recognition still takes a long time.

Based on this, this paper proposes an automatic motion posture capture method based on sensor information fusion. It first builds a motion posture information collection architecture and collects motion posture data. Then, the collected sensor image information is processed, and the target image is obtained by fusing the binary image obtained by optical flow segmentation with the original image. Finally, the Mean-Shift algorithm and a particle filter are used to track the moving human target, and the self-similarity matrix is used to automatically capture the motion posture. It is hoped that this research can improve the accuracy of motion posture information collection and shorten the recognition time.

2. Design of automatic capture method for sports posture

2.1 Sports posture information collection architecture

The sports posture information acquisition architecture is mainly composed of five functional modules: data acquisition and processing, sensors, data storage, voltage conversion and serial communication. The data acquisition and processing module consists mainly of the main controller and its peripheral circuit. Its main task is to collect the digital measurement signals output by the sensors in real time and to fuse the original posture data to calculate the posture angle information of the human body. The sensors mainly include a MEMS sensor, a laser sensor and a thermal imager, whose main function is to collect sports posture information in real time [10, 11]. The data storage module stores the constant parameters of the data fusion algorithm and the initial parameters of each sensor chip, so that the initialization settings of each module can be completed quickly after the system is powered on. The voltage conversion module supplies the voltages required by the system. Finally, the serial communication module sends the posture information to the host computer for real-time display.

2.1.1 Data acquisition and processing module

In this paper, the STM32F405RG microcontroller is used as the main controller of the data acquisition and processing module in the posture measurement system. It is a digital signal controller launched by ST, based on the 32-bit Cortex-M4 core, with a core frequency as high as 168 MHz. The schematic diagram of the main controller circuit is shown in Fig. 1.

Figure 1.

Schematic diagram of STM32F405RG circuit.

As a digital signal controller, the STM32F405RG combines the peripheral control capability of a microcontroller (MCU) with the fast computing power of digital signal processing (DSP) technology. It has a clock frequency of 168 MHz and 32-bit operation precision, and provides rich peripheral interfaces and a floating-point unit, so it meets the requirements of the data acquisition and processing module for synchronous data collection from multiple sensors and for the speed of the data fusion algorithm [12].

2.1.2 Sensor module

The laser sensor, which works on the radar principle, usually uses invisible light (such as infrared light) with the advantages of short wavelength and narrow beam, so it has very high angular, distance and speed resolution. It can quickly obtain the three-dimensional information of a given section of an object in space and, with the necessary rotation and translation device, the three-dimensional information of the whole object, so as to capture sports posture quickly. The laser sensor circuit is shown in Fig. 2.

Figure 2.

Laser sensor circuit.

In order to capture more sports posture data, a thermal imager is introduced. The circuit of the thermal imager is shown in Fig. 3.

Figure 3.

Thermal imager circuit.

The MEMS sensor consists of a 6-DOF MEMS inertial measurement unit (IMU), the MPU600, and a digital compass (HMC5883L); a three-axis gyroscope and a three-axis accelerometer are integrated in the MPU600. Since the posture sensor provides the original motion information, the accuracy of its output data directly affects the accuracy of the posture measurement results. Therefore, the working principles of these sensors must be understood when designing the software and hardware of the module, and sensor calibration must be carried out before the original data are fused.

2.1.3 Data storage module

This module is mainly used to store the initialization parameters of each sensor and some constants needed in the posture calculation process, to ensure that each module can work normally after the system is powered on. In this paper, Macronix's 32 Mbit serial flash memory MX25L3206EM2I-12G is selected as the data memory. It provides a standard communication interface for read and write operations, and its clock frequency is 86 MHz. Figure 4 shows the circuit principle of the data storage module.

Figure 4.

Circuit principle of data storage module.

2.1.4 Voltage conversion module

Since the data acquisition and processing module, MEMS sensor module and data storage module in the system all use a +3.3 V power supply, the external power supply (generally +5 V) must be converted to the supply voltage required by the system. In this paper, the LP2985 voltage regulator chip is used to design the voltage conversion module. The chip offers strong stability, small size, low noise, high output voltage precision and a wide input voltage range. In the schematic diagram of the voltage conversion module, +5 V is provided by the external power supply and +5V_USB is the voltage of the USB communication interface. The circuit principle of the LP2985 is shown in Fig. 5.

Figure 5.

Circuit principle of LP2985.

2.1.5 Serial communication module

In this paper, a highly integrated, half-duplex, micro-power wireless data transmission module is used for communication between the posture measurement system and the host computer. By adopting efficient cyclic interleaved error correction and detection coding, the serial communication module offers improved sensitivity and resistance to burst interference and can correct continuous burst errors. The module is therefore especially suitable for industrial sites and other harsh environments with strong interference.

Figure 6.

Serial communication module.

2.2 Sensor data fusion

Median filtering suppresses pulse interference and image scanning noise. In addition to removing noise, it preserves image edge details well. The basic idea of median filtering is to sort the gray values of the pixels in the neighborhood of a pixel and to select the median value as the new value of that pixel. For a median filter with a window of size $N\times M$, the mathematical expression is:

$\displaystyle h\left({i,j}\right)=\mathop{E}\limits_{N\times M}\left\{{f\left({i,j}\right)}\right\}$ (1)

where $f\left({i,j}\right)$ represents the pixel value at row $i$ and column $j$ of the image, $h\left({i,j}\right)$ is the filtered value, and $E$ denotes taking the median of the pixel values inside the $N\times M$ window centered on $\left({i,j}\right)$.
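As an illustration of Eq. (1), a minimal NumPy sketch of an $N\times M$ median filter might look as follows. The function name `median_filter`, the edge-replication border handling and the small test image are assumptions made for this example; the paper does not specify them, and Python is used only for illustration.

```python
import numpy as np

def median_filter(image, n=3, m=3):
    """Replace each pixel by the median of its n x m neighbourhood (n, m odd)."""
    pad_r, pad_c = n // 2, m // 2
    # Assumption: borders are handled by replicating the edge pixels.
    padded = np.pad(image, ((pad_r, pad_r), (pad_c, pad_c)), mode="edge")
    out = np.empty_like(image)
    rows, cols = image.shape
    for i in range(rows):
        for j in range(cols):
            window = padded[i:i + n, j:j + m]   # n x m window centred on (i, j)
            out[i, j] = np.median(window)
    return out

# A flat image with one impulse-noise pixel
noisy = np.full((5, 5), 10, dtype=np.uint8)
noisy[2, 2] = 255
clean = median_filter(noisy)
```

In this toy case the isolated bright pixel is replaced by the value of its neighbours, while a genuine edge spanning more than half of the window would be left in place.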

Optical flow technology, proposed by Gibson in 1950, is a key subject in the research of three-dimensional motion analysis and computer vision. The study of optical flow is to determine the “motion” of pixel position by the correlation and temporal variation of pixel intensity data.

It is assumed that the gray value of point $\left({x,y}\right)$ in the image at moment $t$ is $I\left({x,y,t}\right)$ , and at moment $t+dt$ , the point moves to $\left({x+dx,y+dy}\right)$ , where $d x$ is the distance of point $\left({x,y}\right)$ moving in the direction of $x$ , and $d y$ is the distance of point $\left({x,y}\right)$ moving in the direction of $y$ . According to the basic assumption that image gray level is constant, then:

$\displaystyle I\left({x,y,t}\right)=I\left({x+dx,y+dy,t+dt}\right)$ (2)

Expanding the right-hand side of Eq. (2) in a Taylor series gives:

$\displaystyle I\left({x+dx,y+dy,t+dt}\right)=I\left({x,y,t}\right)+\left({\frac{dx}{dt}\frac{\partial I}{\partial x}+\frac{dy}{dt}\frac{\partial I}{\partial y}+\frac{\partial I}{\partial t}}\right)dt+O\left({dx,dy,dt}\right)$ (3)

Substituting Eq. (3) into Eq. (2), ignoring the higher-order terms $O\left({dx,dy,dt}\right)$ and dividing by $dt$, we get:

$\displaystyle\frac{dx}{dt}\frac{\partial I}{\partial x}+\frac{dy}{dt}\frac{\partial I}{\partial y}+\frac{\partial I}{\partial t}=0$ (4)

In the formula, $u=\frac{dx}{dt}$ , $v=\frac{dy}{dt}$ are velocity components in horizontal and vertical directions respectively. The basic equation of optical flow field can also be expressed as:

$\displaystyle\nabla I\cdot U+I_{t}=0$ (5)

In the formula, $\nabla I=\left({I_{x},I_{y}}\right)$ represents the spatial gradient of the image gray level, $I_{t}$ is its temporal derivative, and $U=\left({u,v}\right)^{T}$ denotes the optical flow [13].

Horn and Schunck proposed a smoothness constraint on the optical flow, under which the optical flow is obtained by minimizing:

$\displaystyle\iint\left\{{\alpha^{2}\left[{\left({\frac{\partial u}{\partial x}}\right)^{2}+\left({\frac{\partial u}{\partial y}}\right)^{2}+\left({\frac{\partial v}{\partial x}}\right)^{2}+\left({\frac{\partial v}{\partial y}}\right)^{2}}\right]+\left({I_{x}u+I_{y}v+I_{t}}\right)^{2}}\right\}dxdy=\min$ (6)

Solving the corresponding Euler equations yields the iterative scheme:

$\displaystyle u^{\left({n+1}\right)}=\bar{u}^{\left(n\right)}-\frac{I_{x}\bar{u}^{\left(n\right)}+I_{y}\bar{v}^{\left(n\right)}+I_{t}}{\alpha^{2}+I_{x}^{2}+I_{y}^{2}}\cdot I_{x}$ (7)

$\displaystyle v^{\left({n+1}\right)}=\bar{v}^{\left(n\right)}-\frac{I_{x}\bar{u}^{\left(n\right)}+I_{y}\bar{v}^{\left(n\right)}+I_{t}}{\alpha^{2}+I_{x}^{2}+I_{y}^{2}}\cdot I_{y}$ (8)

In the formula, $n$ represents the number of iterations, $\bar{u}$ and $\bar{v}$ are the local neighborhood averages of $u$ and $v$, and $\alpha$ represents the weight coefficient, i.e., the relative weight between the smoothness error and the optical flow constraint error.
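The iteration of Eqs. (7) and (8) can be sketched in a few lines of NumPy. This is only an illustrative sketch: the function names, the 4-neighbour average used for $\bar{u}$ and $\bar{v}$, the finite-difference derivatives, the parameter values and the synthetic test frames are all assumptions of this example rather than details given in the paper.

```python
import numpy as np

def local_average(f):
    """4-neighbour average, used here as the local mean (u-bar, v-bar)."""
    p = np.pad(f, 1, mode="edge")
    return (p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:]) / 4.0

def horn_schunck(frame1, frame2, alpha=1.0, n_iter=100):
    f1 = frame1.astype(float)
    f2 = frame2.astype(float)
    # Assumption: simple finite differences for I_x, I_y and I_t.
    Iy, Ix = np.gradient((f1 + f2) / 2.0)   # axis 0 is y (rows), axis 1 is x
    It = f2 - f1
    u = np.zeros_like(f1)
    v = np.zeros_like(f1)
    denom = alpha ** 2 + Ix ** 2 + Iy ** 2
    for _ in range(n_iter):
        u_bar = local_average(u)
        v_bar = local_average(v)
        common = (Ix * u_bar + Iy * v_bar + It) / denom
        u = u_bar - common * Ix             # Eq. (7)
        v = v_bar - common * Iy             # Eq. (8)
    return u, v

# Synthetic test: a Gaussian blob that moves one pixel to the right
yy, xx = np.mgrid[0:32, 0:32]

def blob(cx):
    return np.exp(-((xx - cx) ** 2 + (yy - 16) ** 2) / (2 * 4.0 ** 2))

u, v = horn_schunck(blob(15.0), blob(16.0), alpha=0.1, n_iter=200)
speed = np.hypot(u, v)
moving = speed > 0.5 * speed.max()          # region with large optical flow
```

Thresholding the flow magnitude, as in the last line, is one simple way to obtain a binary map of the moving region of the kind used in the segmentation step described below.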

In the study and application of images, people are often interested only in certain areas of the image; these parts are called the target or foreground, and the rest is the background. It is usually necessary to extract the area of interest from the image before the image can be further processed and used [14]. Image segmentation and object extraction transform the image into a more abstract and compact form, which is more conducive to image analysis and understanding. Image segmentation is a basic technology of computer vision and a key step for further image analysis. Although the target moves against a static background, segmenting the image directly can be complicated in scenes with complex texture. The optical flow method can extract the moving object from the image, but on its own its performance is unstable and its extraction accuracy is low. In this study, the optical flow method is therefore used only to extract the moving region of the image, exploiting the fact that optical flow values are larger there; the moving region is then further processed to extract the moving target.

2.2.1 Threshold segmentation

Thermal images are formed from the thermal radiation of objects, so different gray values in the image represent different temperatures. The target to be detected (such as a person or animal) is usually warmer than its surroundings, which holds in most environments, so the target appears brighter than the environment in the image and differs markedly from the background. Threshold segmentation is therefore a very effective way to extract targets from the image [15, 16].

Let $p_{i}=\frac{N_{i}}{N}$ denote the probability of occurrence of gray level $i$ $\left({i\in\left\{{0,1,\ldots,L-1}\right\}}\right)$, where $N_{i}$ is the number of pixels with gray level $i$ and $N$ is the total number of pixels. Threshold $t$ divides the image into foreground and background areas, namely $R_{1}=\left\{{0,1,2,\ldots,t}\right\}$ and $R_{2}=\left\{{t+1,t+2,\ldots,L-1}\right\}$. The probability distributions on areas $R_{1}$ and $R_{2}$ are $\left\{{\frac{p_{0}}{P_{t}},\frac{p_{1}}{P_{t}},\ldots,\frac{p_{t}}{P_{t}}}\right\}$ and $\left\{{\frac{p_{t+1}}{1-P_{t}},\frac{p_{t+2}}{1-P_{t}},\ldots,\frac{p_{L-1}}{1-P_{t}}}\right\}$, where $P_{t}=\sum_{i=0}^{t}{p_{i}}$. The two entropies associated with areas $R_{1}$ and $R_{2}$ are defined as:

$\displaystyle E_{1}\left(t\right)=-\sum\limits_{i=0}^{t}{\frac{p_{i}}{P_{t}}}\log_{2}\frac{p_{i}}{P_{t}}$ (9)

$\displaystyle E_{2}\left(t\right)=-\sum\limits_{i=t+1}^{L-1}{\frac{p_{i}}{1-P_{t}}}\log_{2}\frac{p_{i}}{1-P_{t}}$ (10)

The entropy of the histogram is:

$\displaystyle E=E_{1}\left(t\right)+E_{2}\left(t\right)$ (11)

When entropy $E$ reaches the maximum, the gray distribution in the target area has the maximum uniformity, and the optimal threshold $t$ is obtained.
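The threshold search described by Eqs. (9) to (11) can be written as a direct loop over candidate thresholds. The sketch below is illustrative only: the function name, the 256-level assumption and the synthetic two-population image are choices made for this example, and ties between equally good thresholds are resolved by keeping the first one found.

```python
import numpy as np

def max_entropy_threshold(image, levels=256):
    """Return the threshold t that maximises E1(t) + E2(t)."""
    hist = np.bincount(image.ravel(), minlength=levels).astype(float)
    p = hist / hist.sum()                    # p_i = N_i / N
    cum = np.cumsum(p)                       # P_t = sum of p_i for i <= t
    best_t, best_e = 0, -np.inf
    for t in range(levels - 1):
        P_t = cum[t]
        if P_t <= 0.0 or P_t >= 1.0:
            continue                         # one class is empty
        q1 = p[: t + 1] / P_t
        q2 = p[t + 1:] / (1.0 - P_t)
        q1, q2 = q1[q1 > 0], q2[q2 > 0]      # 0 * log(0) is taken as 0
        e1 = -np.sum(q1 * np.log2(q1))       # Eq. (9)
        e2 = -np.sum(q2 * np.log2(q2))       # Eq. (10)
        if e1 + e2 > best_e:                 # Eq. (11)
            best_t, best_e = t, e1 + e2
    return best_t

# Synthetic image: dark background (levels 20..59), bright target (180..219)
values = np.r_[20:60, 180:220]
image = np.tile(values, 10).reshape(20, 40).astype(np.uint8)
t = max_entropy_threshold(image)
target_mask = image > t
```

For this synthetic image any threshold between the two gray-level populations separates them equally well, so the returned value falls in the gap between levels 59 and 180.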

2.2.2 Digital morphological processing

After the above threshold extraction, the result is only a collection of scattered points, and the target object cannot be completely extracted. Because of subtle changes in the image background, some background points have values close to those of target pixels, and depending on the chosen threshold, threshold segmentation may not yield a complete target area, so the image needs further processing. Mathematical morphology measures and extracts the corresponding shapes in an image by using structuring elements, and thereby supports image recognition. It can simplify image data, maintain the basic shape of the image and remove irrelevant structures [17]. Mathematical morphology consists of four basic operations: dilation (expansion), erosion (corrosion), opening and closing.

Dilation: $S\left[a\right]$ is obtained by translating structuring element $S$ by $a$. If $S\left[a\right]$ hits target image $X$, that is, if $S\left[a\right]$ and $X$ share at least one pixel, point $a$ is recorded. The set of all points $a$ that meet this condition is the result of dilating $X$ by $S$, which can be expressed as:

$\displaystyle X\oplus S=\left\{{a\left|{S\left[a\right]\uparrow X}\right.}\right\}$ (12)

where $\oplus$ represents the dilation operator and $\uparrow$ denotes "hits".

The dilation operation merges the background pixels bordering the target into the target, so if multiple target objects are close to each other, they are usually connected together.

Erosion: $S\left[a\right]$ is obtained by translating structuring element $S$ by $a$. If $S\left[a\right]$ is contained in target image $X$, point $a$ is recorded. The set of all points $a$ that meet this condition is the result of eroding $X$ by $S$, which can be expressed as:

$\displaystyle X\Theta S=\left\{{a\left|{S\left[a\right]\subseteq X}\right.}\right\}$ (13)

where $\Theta$ represents the erosion operator.

The erosion operation removes target edge pixels and objects smaller than the structuring element, and it can also remove small connections between objects.

Opening and closing: dilation and erosion can be compounded with each other and with set operations (union, intersection and complement) to form combined operations, the two most important of which are opening and closing. The opening operation uses structuring element $S$ to erode and then dilate $X$, which can be expressed as:

$\displaystyle X\circ S=\left({X\Theta S}\right)\oplus S$ (14)

The closing operation uses structuring element $S$ to dilate and then erode $X$, which can be expressed as:

$\displaystyle X\cdot S=\left({X\oplus S}\right)\Theta S$ (15)

Both opening and closing smooth the outline of an object and can be used to filter out noise or fill holes. Opening tends to break narrow connections and eliminate small protrusions, whereas closing is commonly used to fill narrow gaps and bridge adjacent regions.
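The four operations of Eqs. (12) to (15) can be illustrated on small binary images with plain NumPy. In this sketch the function names, the zero padding outside the image, the origin of $S$ at its center and the 3 × 3 structuring element are assumptions of the example, not choices documented in the paper.

```python
import numpy as np

def _shifted_views(X, S):
    """Yield X shifted by every offset at which S is set (zeros outside X)."""
    r, c = S.shape[0] // 2, S.shape[1] // 2
    P = np.pad(X.astype(bool), ((r, r), (c, c)), mode="constant")
    H, W = X.shape
    for di in range(S.shape[0]):
        for dj in range(S.shape[1]):
            if S[di, dj]:
                yield P[di:di + H, dj:dj + W]

def dilate(X, S):
    # a is kept if S[a] shares at least one pixel with X, Eq. (12)
    out = np.zeros(X.shape, dtype=bool)
    for view in _shifted_views(X, S):
        out |= view
    return out

def erode(X, S):
    # a is kept if S[a] lies entirely inside X, Eq. (13)
    out = np.ones(X.shape, dtype=bool)
    for view in _shifted_views(X, S):
        out &= view
    return out

def opening(X, S):      # Eq. (14): erode, then dilate
    return dilate(erode(X, S), S)

def closing(X, S):      # Eq. (15): dilate, then erode
    return erode(dilate(X, S), S)

S = np.ones((3, 3), dtype=bool)

X = np.zeros((7, 7), dtype=bool)
X[2:5, 2:5] = True
X[0, 6] = True          # isolated noise pixel
opened = opening(X, S)  # noise pixel removed, square preserved

Y = np.zeros((7, 7), dtype=bool)
Y[2:5, 2:5] = True
Y[3, 3] = False         # one-pixel hole inside the target
closed = closing(Y, S)  # hole filled
```

The two toy images show the complementary roles described above: opening discards the isolated pixel, while closing fills the hole inside the target.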

The binary image obtained by optical flow segmentation is fused with the original image to obtain the target image:

$\displaystyle P\left({i,j}\right)=\left\{{\begin{array}{ll}I_{0}\left({i,j}\right),&I_{0}\left({i,j}\right)*\textit{BW}\left({i,j}\right)\neq 0\\ 0,&I_{0}\left({i,j}\right)*\textit{BW}\left({i,j}\right)=0\\ \end{array}}\right.$ (16)

where $I_{0}$ is the original image, $\textit{BW}$ is the binary image and $P$ is the resulting target image.
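Eq. (16) amounts to masking the original image with the binary image. A minimal sketch, with an assumed function name `fuse` and made-up pixel values, is:

```python
import numpy as np

def fuse(original, binary_mask):
    """Keep the original gray value where I0 * BW != 0, otherwise 0 (Eq. 16)."""
    product = original.astype(np.int64) * binary_mask.astype(np.int64)
    return np.where(product != 0, original, 0).astype(original.dtype)

I0 = np.array([[12, 200, 35],
               [0, 180, 90]], dtype=np.uint8)   # original image
BW = np.array([[0, 1, 0],
               [1, 1, 0]], dtype=np.uint8)      # binary image from segmentation
P = fuse(I0, BW)
```

Only the pixels selected by the binary image keep their original gray values; everything else is set to zero.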

2.3 Automatic capture of sports posture

The automatic capture process of sports posture based on human body posture self-similarity matrix is as follows.

2.3.1 Human target detection

The human body shape templates were established on the basis of a human body image set. For each shape template, a point set was obtained by sampling, and the shape context descriptor of each point in the point set was then calculated and used as the human body shape feature. For each image to be detected, an image pyramid is built according to the image size, and the boundary of each sub-image is obtained by the Canny edge detection algorithm. A set $S_{s}$ of $r$ points is obtained by random sampling from the image boundary. For any point $p_{i}$ of set $S_{s}$, the matching degree with any point $q_{j}$ of template shape $T$ is calculated, and the distance is defined as:

$\displaystyle C_{ij}^{t}=\frac{1}{2}\sum\limits_{k}{\frac{\left[{h_{i}\left(k\right)-h_{j}^{t}\left(k\right)}\right]^{2}}{h_{i}\left(k\right)+h_{j}^{t}\left(k\right)+\varepsilon}}$ (17)

In the formula, $h_{i}$ and $h_{j}^{t}$ are the shape context descriptors of point $p_{i}$ on the image to be detected and point $q_{j}$ on the template image, respectively, and $h_{i}\left(k\right)$ and $h_{j}^{t}\left(k\right)$ are their $k$-th components. To prevent the denominator on the right-hand side of the formula from being zero, a very small positive number $\varepsilon$ is added to it.

In order to reduce the impact of noise, the matching cost function of Eq. (17) is modified. In Eq. (17), if $h_{j}^{t}\left(k\right)$ is equal to zero and $h_{i}\left(k\right)$ is not, the difference in this component is discarded, and the matching distance is redefined as:

$\displaystyle C_{ij}^{t}=\frac{1}{2}\sum\limits_{k}{\frac{\left[{h_{i}^{\prime}\left(k\right)-h_{j}^{t}\left(k\right)}\right]^{2}}{h_{i}^{\prime}\left(k\right)+h_{j}^{t}\left(k\right)+\varepsilon}}$ (18)

In the formula, $h_{i}^{\prime}\left(k\right)$ is the modified shape context descriptor, which is defined as follows:

$\displaystyle h_{i}^{\prime}\left(k\right)=\left\{{\begin{array}{ll}0,&\text{if }h_{j}^{t}\left(k\right)=0\\ h_{i}\left(k\right),&\text{else}\\ \end{array}}\right.$ (19)
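The effect of the modification can be seen on a small numerical example. The sketch below computes the matching cost with and without the modified descriptor; the function name, the value of $\varepsilon$ and the four-bin histograms are assumptions chosen purely for illustration.

```python
import numpy as np

def matching_cost(h_i, h_j_t, eps=1e-10, modified=True):
    """Chi-square style cost between an image descriptor and a template descriptor."""
    h_i = np.asarray(h_i, dtype=float)
    h_j_t = np.asarray(h_j_t, dtype=float)
    if modified:
        # Eq. (19): ignore bins that are empty in the template descriptor
        h_i = np.where(h_j_t == 0, 0.0, h_i)
    return 0.5 * np.sum((h_i - h_j_t) ** 2 / (h_i + h_j_t + eps))

template = [0.5, 0.5, 0.0, 0.0]       # hypothetical template histogram
observed = [0.4, 0.4, 0.2, 0.0]       # same shape plus noise in an empty bin

cost_plain = matching_cost(observed, template, modified=False)   # Eq. (17)
cost_mod = matching_cost(observed, template, modified=True)      # Eq. (18)
```

Here the third bin is occupied only in the observed descriptor, so it dominates the cost of Eq. (17) (about 0.111) but is discarded by Eq. (18), which leaves a much smaller cost (about 0.011).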

Each pair of points in set $S_{s}$ that meets the set conditions is considered to be two points on the shape of a candidate target; that is, a candidate target can be determined through these two points. Next, interfering targets are removed. The basic principle is as follows: if there are two points on the shape of the object under test whose shape descriptors are each very close to the corresponding shape descriptors of the template, and the geometric distance between these two points is approximately equal to the geometric distance between the two corresponding template points, then the shape under test is considered similar to the template shape, which completes human target detection. The flow chart is shown in Fig. 7.

Figure 7.

Human target detection flow chart.

2.3.2 Moving target tracking

In the human motion video, the horizontal and vertical coordinates of the center point $P$ of the moving target are represented as $P_{x}$ and $P_{y}$, the horizontal and vertical velocities of the moving target are represented as $V_{x}$ and $V_{y}$ respectively, and the state of the moving target is represented as $Z=\left[{P_{x},P_{y},V_{x},V_{y}}\right]^{T}$.

The maneuver of the target is treated as a random disturbance whose magnitude is expressed by the covariance of the process noise $W_{t-1}$. Because the speed of the target changes little between adjacent video frames, a first-order constant velocity model is employed to describe the motion of the target. With $dt$ denoting the time interval between adjacent frames, the motion model can be expressed as follows:

$\displaystyle Z_{t}=\left[{{\begin{array}[]{cccc}1&0&{dt}&0\\ 0&1&0&{dt}\\ 0&0&1&0\\ 0&0&0&1\\ \end{array}}}\right]Z_{t-1}+W_{t-1}$ (20)
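As a small worked example of Eq. (20), the sketch below propagates one state vector through the constant velocity model. The function name `predict`, the optional Gaussian process noise and the numerical values are assumptions made for illustration.

```python
import numpy as np

def predict(state, dt, noise_std=0.0, rng=None):
    """One step of the constant velocity model of Eq. (20)."""
    F = np.array([[1, 0, dt, 0],
                  [0, 1, 0, dt],
                  [0, 0, 1, 0],
                  [0, 0, 0, 1]], dtype=float)
    rng = np.random.default_rng() if rng is None else rng
    # Assumption: W is zero-mean Gaussian noise with the same std on all components.
    w = rng.normal(0.0, noise_std, size=4) if noise_std > 0 else np.zeros(4)
    return F @ np.asarray(state, dtype=float) + w

z_prev = [100.0, 50.0, 2.0, -1.0]      # [P_x, P_y, V_x, V_y]
z_pred = predict(z_prev, dt=1.0)       # noise-free prediction
```

Without noise, the position simply advances by the velocity times $dt$, giving the state [102, 49, 2, -1], while the velocity is unchanged.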

After the state of the system is predicted through the motion model, the estimated value $Z_{t}^{\prime}$ of the state at the current moment $t$ is obtained, which can then be corrected by using the observation obtained at the current moment. Correcting the system state through observation essentially measures the similarity between a possible state of the target and its real state. The observation likelihood function is defined as follows:

$\displaystyle p\left({X_{t}\left|{Z_{t}^{\prime}}\right.}\right)=\frac{1}{\sqrt{2\pi}\sigma}e^{-\frac{D_{i}^{2}}{2\sigma^{2}}}$ (21)

In the formula, $X_{t}$ is the observation at moment $t$, $D_{i}$ denotes the distance between the estimated target center $y^{\prime}$ of the $i$-th particle and the actual position $y$ (Mean-Shift converges to $y$ starting from $y^{\prime}$), and $\sigma$ is the standard deviation of the Gaussian. The larger the value of Eq. (21) is, the smaller the distance between $y^{\prime}$ and $y$ is, and the more reliable the result of the Mean-Shift algorithm is. In other words, the smaller $D_{i}$ is, the more likely the candidate target is to be the real target.

The state transition probability density function $p\left({Z_{t}\left|{Z_{t-1}^{i}}\right.}\right)$ is selected as the importance density function, so the weight update of the particle can be expressed as:

$\displaystyle w_{t}^{i}=w_{t-1}^{i}p\left({X_{t}\left|{Z_{t}^{i}}\right.}\right)$ (22)

After the motion model and observation model of particle filter tracking have been described, the algorithm that combines Mean-Shift and the particle filter to track a moving human target is presented. The specific process of the algorithm is as follows:

Step 1: According to the position and speed of the tracking target selected in the initial frame, $N$ particles $Z_{0}^{i}\left({i=1,2,\ldots,N}\right)$ and particle weights $w_{0}^{i}=\frac{1}{N}$ are initialized based on the joint Gaussian hypothesis.

Step 2: According to the motion model of Eq. (20), sample $Z_{t}^{i}$ is obtained from importance function $p\left({Z_{t}\left|{Z_{t-1}^{i}}\right.}\right)$. Taking position $\left({x_{t}^{i0},y_{t}^{i0}}\right)$ of the target state of particle $Z_{t}^{i}$ as the starting point, the Mean-Shift algorithm converges to $\left({x_{t}^{i},y_{t}^{i}}\right)$.

Step 3: Calculate the Euclidean distance $D_{i}$ between $\left({x_{t}^{i0},y_{t}^{i0}}\right)$ and $\left({x_{t}^{i},y_{t}^{i}}\right)$ and substitute it into Eq. (21) to obtain $p\left({X_{t}\left|{Z_{t}^{i}}\right.}\right)$.

Step 4: Update the importance weight $w_{t}^{i}$ by Eq. (22) and calculate the total weight by the following formula:

$\displaystyle w_{\mathrm{sum}}=\sum\limits_{i=1}^{N}{w_{t}^{i}}$ (23)

Step 5: For each particle, normalize its weight:

$\displaystyle w_{t}^{i}=\frac{w_{t}^{i}}{w_{\mathrm{sum}}}$ (24)

Step 6: Calculate the effective sample size $N_{E}$. If $N_{E}<N_{T}$, where $N_{T}$ is a preset threshold, apply simple random resampling to counter particle degeneracy.

Step 7: The obtained particles represent the posterior distribution of the target state at time $t$.

Step 8: The target state is estimated from the particles and their weights $w_{t}^{i}$ to track the human target.
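The tracking procedure above can be sketched as a single update function. This is a simplified illustration under several assumptions that are not stated in the paper: the observation is a 2-D weight map (for example a back-projected target histogram), Mean-Shift uses a flat square window, each particle is moved to its converged position, the effective sample size is computed as $1/\sum_i (w_t^i)^2$, and the resampling threshold is $N/2$. All function and parameter names are hypothetical.

```python
import numpy as np

def mean_shift(weight_map, start_xy, half=5, n_iter=20):
    """Flat-kernel mean shift on a 2-D weight map; returns (x, y, ok)."""
    H, W = weight_map.shape
    x, y = float(start_xy[0]), float(start_xy[1])
    for _ in range(n_iter):
        cx, cy = int(round(x)), int(round(y))
        x0, x1 = max(cx - half, 0), min(cx + half + 1, W)
        y0, y1 = max(cy - half, 0), min(cy + half + 1, H)
        if x0 >= x1 or y0 >= y1:
            return x, y, False                   # window lies outside the map
        win = weight_map[y0:y1, x0:x1]
        total = win.sum()
        if total <= 0:
            return x, y, False                   # no support inside the window
        ys, xs = np.mgrid[y0:y1, x0:x1]
        new_x = (xs * win).sum() / total         # weighted centroid of the window
        new_y = (ys * win).sum() / total
        done = abs(new_x - x) < 1e-3 and abs(new_y - y) < 1e-3
        x, y = new_x, new_y
        if done:
            break
    return x, y, True

def track_step(particles, weights, weight_map, rng, dt=1.0, sigma=3.0, q_std=1.0):
    N = len(particles)
    F = np.array([[1, 0, dt, 0], [0, 1, 0, dt], [0, 0, 1, 0], [0, 0, 0, 1]], float)
    # Sample every particle from the motion model of Eq. (20)
    particles = particles @ F.T + rng.normal(0.0, q_std, particles.shape)
    D = np.full(N, np.inf)                       # inf gives zero likelihood
    for i in range(N):
        x0, y0 = particles[i, 0], particles[i, 1]
        x1, y1, ok = mean_shift(weight_map, (x0, y0))
        if ok:
            D[i] = np.hypot(x1 - x0, y1 - y0)    # distance start -> converged point
            particles[i, :2] = (x1, y1)          # assumption: move the particle
    lik = np.exp(-D ** 2 / (2 * sigma ** 2)) / (np.sqrt(2 * np.pi) * sigma)  # Eq. (21)
    weights = weights * lik                      # Eq. (22)
    total = weights.sum()                        # Eq. (23)
    weights = weights / total if total > 0 else np.full(N, 1.0 / N)          # Eq. (24)
    n_eff = 1.0 / np.sum(weights ** 2)           # effective sample size
    if n_eff < N / 2:                            # assumed threshold N_T = N / 2
        idx = rng.choice(N, size=N, p=weights)   # simple random resampling
        particles, weights = particles[idx], np.full(N, 1.0 / N)
    estimate = weights @ particles               # weighted mean state
    return particles, weights, estimate

rng = np.random.default_rng(0)
yy, xx = np.mgrid[0:40, 0:60]
weight_map = np.exp(-((xx - 30) ** 2 + (yy - 20) ** 2) / (2 * 4.0 ** 2))  # target at (30, 20)
N = 200
particles = np.column_stack([rng.normal(26, 2, N), rng.normal(18, 2, N),
                             rng.normal(0, 0.5, N), rng.normal(0, 0.5, N)])
weights = np.full(N, 1.0 / N)
particles, weights, estimate = track_step(particles, weights, weight_map, rng)
```

In this toy setting the particles start a few pixels away from the synthetic target, and after one update the estimated position lies close to the peak of the weight map. The Mean-Shift refinement lets a modest number of particles concentrate near the mode instead of relying on random diffusion alone.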

2.3.3 Automatic capture of sports posture

The region cost reflects the penalty for assigning each pixel to a segment: $R_{i}\left(1\right)$ is the cost of marking the $i$-th pixel as foreground and $R_{i}\left(0\right)$ is the cost of marking the $i$-th pixel as background. $R_{i}\left(\cdot\right)$ is defined through the grayscale histograms $P_{r}\left({I\left|O\right.}\right)$ and $P_{r}\left({I\left|B\right.}\right)$ of the foreground and background:

$\displaystyle\left\{{\begin{array}{l}R_{i}\left(0\right)=-\ln P_{r}\left({I_{i}\left|B\right.}\right)\\ R_{i}\left(1\right)=-\ln P_{r}\left({I_{i}\left|O\right.}\right)\\ \end{array}}\right.,\quad 0\leqslant P_{r}\leqslant 1$ (25)

The relationship cost $B\left(A\right)$ reflects the relationship between pixels. If the $i$-th pixel and the $j$-th pixel are neighbors and the elements of marker vector $A$ satisfy $A_{i}\neq A_{j}$, then a segmentation boundary lies between $i$ and $j$. $B_{ij}$ is used to judge how likely a boundary is between $i$ and $j$. By definition, $B_{ij}$ varies inversely with the gray-level difference between adjacent pixels: the more similar the gray levels of two pixels are, the greater $B_{ij}$ is, and the higher the cost of placing a segmentation boundary between the $i$-th and $j$-th pixels. This property can therefore be used to separate the foreground image from the background image and complete foreground image extraction.
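To make the two cost terms concrete, the sketch below evaluates the region costs of Eq. (25) from two gray-level histograms and uses an assumed Gaussian form for $B_{ij}$. The paper does not give an explicit formula for $B_{ij}$, so `boundary_cost`, its parameter `sigma`, the four-level histograms and the small constant `eps` are all illustrative assumptions. The full minimization of the combined cost (for example by a graph cut) is not shown; the last line only compares the two region costs pixel by pixel.

```python
import numpy as np

def region_costs(image, fg_hist, bg_hist, eps=1e-12):
    """R_i(1) and R_i(0) of Eq. (25) from normalised gray-level histograms."""
    r_fg = -np.log(fg_hist[image] + eps)   # cost of marking a pixel as foreground
    r_bg = -np.log(bg_hist[image] + eps)   # cost of marking a pixel as background
    return r_fg, r_bg

def boundary_cost(g_i, g_j, sigma=10.0):
    # Assumed form: close to 1 for similar gray levels, small across strong edges.
    return np.exp(-(float(g_i) - float(g_j)) ** 2 / (2.0 * sigma ** 2))

# Hypothetical 4-level histograms: the foreground is bright, the background dark
fg_hist = np.array([0.0, 0.1, 0.3, 0.6])
bg_hist = np.array([0.6, 0.3, 0.1, 0.0])
image = np.array([[0, 3],
                  [1, 2]])

r_fg, r_bg = region_costs(image, fg_hist, bg_hist)
labels = r_fg < r_bg                       # region term only, no boundary term
```

With these histograms the two brighter pixels are cheaper to mark as foreground, and `boundary_cost` returns a larger value for nearly equal gray levels than for a strong edge, which matches the inverse relationship described above.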

A self-similar matrix is a graph that can reflect the cyclic properties of a system. State cycling exists in many dynamic systems. To express the cyclic characteristics of such a dynamic system in the form of graphs, Eckmann proposed a Recurrence Plot of signals that could be used to display the recursion of states in the phase space. The Recurrence Plot was defined as follows:

$\displaystyle R\left({i,j}\right)=H\left({\varepsilon-\left\|{\vec{x}\left(i\right)-\vec{x}\left(j\right)}\right\|_{2}}\right),\quad\vec{x}\left(i\right)\in\Re^{m},\quad i,j=1,\ldots,N$ (26)

where $N$ refers to the number of states $\vec{x}\left(i\right)$ to be considered, $\left\|\cdot\right\|_{2}$ is the Euclidean norm of the vector, $H$ is the step function, and $\varepsilon$ is the distance threshold. Once the threshold $\varepsilon$ is determined, the recurrence plot of the signal is a binary image containing rich information about the dynamic system.

The self-similar matrix reflects the correlation of sequential images. Suppose there is a video image sequence, $I=\left\{{I_{1},I_{2},\ldots,I_{n}}\right\}$ , whose self-similarity matrix is defined as follows:

$\displaystyle D=\left[\begin{array}{cccc}0&d_{12}&\cdots&d_{1n}\\ d_{21}&0&\cdots&d_{2n}\\ \vdots&\vdots&\ddots&\vdots\\ d_{n1}&d_{n2}&\cdots&0\end{array}\right]$ (27)

Element $d_{ij}$ of the matrix is the distance between the underlying features of images $I_{i}$ and $I_{j}$. Each diagonal element $d_{kk}$ is the distance between image $I_{k}$ and itself, so all diagonal elements are zero.
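A minimal sketch of building such a matrix is given below. It assumes that each frame has already been reduced to a feature vector and that the distance is Euclidean; the feature values and the function name `self_similarity_matrix` are hypothetical, since the specific underlying features and distance measure are not fixed here.

```python
import math

def self_similarity_matrix(features):
    """Return the n x n matrix of pairwise distances between frame features.

    Euclidean distance is an assumed choice; the diagonal stays at zero.
    """
    n = len(features)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dist = math.dist(features[i], features[j])
            d[i][j] = dist
            d[j][i] = dist  # the matrix is symmetric
    return d

# Hypothetical features of six frames from a motion that repeats every three frames.
frames = [(0.0, 1.0), (1.0, 2.0), (2.0, 0.5),
          (0.0, 1.0), (1.0, 2.0), (2.0, 0.5)]
D = self_similarity_matrix(frames)
```

For a periodic motion, zero or near-zero entries appear off the diagonal wherever two frames show the same posture, which is why the matrix is useful for capturing repeated postures.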

3. Simulation experiment

Simulation is carried out to validate the feasibility of the proposed automatic posture capture method.

A total of 1,500 healthy volunteers (900 men and 600 women) aged 18 to 35 were recruited for the exercise experiment. During the experiment, the subjects completed a number of sports tasks, and the data from the sensor terminal were transmitted wirelessly to the monitoring software on the host computer for real-time display and storage. The data for each task were then selected from each subject's experimental data, and the motion state under the corresponding posture was determined at each time point to verify the effectiveness of the different methods.

Table 1
Experimental environment

Parameter	Description
CPU	10-core Intel Xeon E5-2640
Memory	64 GB
Hard disk	HDD 10 TB, SSD 480 GB
Network card	Broadcom NetXtreme Gigabit Ethernet
Operating system	Windows XP
Simulation software	MATLAB 7.2

The proposed method, the method in Reference [5] and the method in Reference [6] are compared in terms of the recall and precision of sports posture information collection.
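For reference, the sketch below uses the standard definitions of the two metrics: recall is the share of relevant items that were collected, and precision is the share of collected items that are relevant. The counts in the example are hypothetical and are not taken from the experiments.

```python
def recall(true_positives, false_negatives):
    """Fraction of relevant posture records that were actually collected."""
    total = true_positives + false_negatives
    return true_positives / total if total else 0.0

def precision(true_positives, false_positives):
    """Fraction of collected posture records that are actually relevant."""
    total = true_positives + false_positives
    return true_positives / total if total else 0.0

# Hypothetical counts: 97 relevant records collected, 3 missed, 4 collected in error.
example_recall = recall(97, 3)
example_precision = precision(97, 4)
```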

The recall comparison results of sports posture information collection are shown in Fig. 8.

Figure 8.

Comparison of recall.

The recall rate of the method in Reference [5] reaches its maximum of 85% at 70 experiments, and that of the method in Reference [6] reaches its maximum of 86% at 13 experiments. By comparison, the recall rate of the proposed method reaches its maximum of 97% at 47 experiments, which shows that the proposed method achieves a higher recall rate for sports posture information collection than the other two methods and that its collection results are more comprehensive.

Table 2

Comparison of accuracy rate (unit: %)

Number of experiments	Reference [5] method	Reference [6] method	Method of this paper
10	87.4	78.6	95.6
20	85.2	74.5	94.7
30	84.7	75.8	96.3
40	85.3	72.3	95.2
50	83.6	78.6	93.8
60	81.4	84.2	94.7
70	86.4	84.6	95.4
80	87.1	79.5	98.1
Average value	85.1	78.5	95.5

Figure 9.

Comparison of precision.

Figure 10.

Comparison of time-consuming for automatic capture of sports posture.

The comparison results for the precision of sports posture information collection are shown in Fig. 9.

The precision rate of the method in Reference [5] reaches its maximum of 81% at 70 experiments, and that of the method in Reference [6] reaches its maximum of 80% at 13 experiments. In contrast, the proposed method reaches its maximum of 96% at 60 experiments, indicating a higher precision rate for sports posture information collection.

The accuracy of automatic sports posture capture of the proposed method, the method in Reference [5] and the method in Reference [6] is compared in Table 2.

The average accuracy rate of automatic sports posture capture is 85.1% for the method in Reference [5] and 78.5% for the method in Reference [6]. In contrast, the average accuracy rate of the proposed method is 95.5%, the highest of the three.
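The averages quoted above can be reproduced from the rows of Table 2 with a short check. The values are copied from the table; the dictionary keys and variable names are illustrative only.

```python
from statistics import mean

# Accuracy rates (%) from Table 2 for 10, 20, ..., 80 experiments.
accuracy = {
    "Reference [5]": [87.4, 85.2, 84.7, 85.3, 83.6, 81.4, 86.4, 87.1],
    "Reference [6]": [78.6, 74.5, 75.8, 72.3, 78.6, 84.2, 84.6, 79.5],
    "Proposed": [95.6, 94.7, 96.3, 95.2, 93.8, 94.7, 95.4, 98.1],
}

# Arithmetic mean of each column.
averages = {name: mean(values) for name, values in accuracy.items()}

for name, value in averages.items():
    print(f"{name}: {value:.2f}%")
```

To one decimal place, the computed means agree with the "Average value" row of the table.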

The time consumed by the three methods in automatic sports posture capture is compared in Fig. 10.

The maximum and minimum times consumed for automatic sports posture capture are 271 ms and 169 ms, respectively, for the method in Reference [5], and 286 ms and 182 ms, respectively, for the method in Reference [6]. In contrast, the maximum time consumed by the proposed method is 79 ms and the minimum is 62 ms, indicating that the proposed method takes less time for automatic sports posture capture and is more efficient.

4. Conclusion

With the rapid development of the Internet, more and more people are under pressure. Many are busy with work and study and have no time or energy to exercise. Sudden deaths occur in many industries, so people have begun to pay attention to their physical condition and hope to keep fit through exercise. People want to understand their exercise quantitatively, for example in terms of exercise volume and exercise intensity, and turn to sports aids to do so. Motion posture capture technology is an important means of obtaining human body posture information and has been widely used in sports training, gait recognition, action recognition, medical rehabilitation and other fields. Therefore, this paper proposes an automatic motion posture capture method based on sensor information fusion. Experiments show that, with the proposed method, the recall rate of motion posture information collection reaches 97%, the capture accuracy rate is 95.5%, and the maximum time consumed to capture a motion posture is 79 ms. The method therefore has good application performance and lays a solid theoretical basis for the further development of motion posture capture theory. Although the proposed method has certain application effects, it does not consider interference from the surrounding environment during operation, which is a direction for future research.

References

1. Komori, Terakawa, Matsutani, Yasuda. Posture operating method by foot posture change and characteristics of foot motion. IEEE Access. 2019; 7(1): 176266-176277.

2. Zhou, Yun, Liu, Zhang. Analysis of the sagittal motion posture of the acromioclavicular joint using image registration and axial angle representation. Int J Gen Med. 2021; 14(1): 1975-1981.

3. Nishimura, Itoi, Tsurumaki, Kurushima, Tokunaga. Nursing students’ motion posture evaluation using human pose estimation. Int J Learn Teach. 2020; 23(2): 43-46.

4. Gao, Zhang, Qiao, Wang. Wearable human motion posture capture and medical health monitoring based on wireless sensor networks. Meas. 2020; 166(4): 108252-108263.

5. Han. Key posture detection of sports video based on BP neural network. J Shangluo Univ. 2019; 33(6): 14-17.

6. Zheng. Research on intelligent attitude analysis method of sports training based on deep learning. Electr Des Eng. 2021; 29(10): 167-171.

7. Chen. Design of motion attitude real-time detection system based on inertial sensor. Auto Instrum. 2019; 21(10): 38-42.

8. Yang, Sun, Wang. Human posture recognition method based on Android sensor. J Nanchang Univ. 2019; 43(6): 616-620.

9. Qin. Research on automatic recognition of motion attitude based on computer video processing technology. Mod Electr Technol. 2021; 44(5): 89-93.

10. Zeng. Sensor data security fusion method based on node reputation. Comput Simul. 2021; 38(7): 290-293.

11. Fang. Intelligent recognition of motion posture based on FPGA and neural network. Microprocess Microsyst. 2020; 4(1): 103374-103386.

12. Bulbul, Islam, Ali. 3D human action analysis and recognition through GLAC descriptor on 2D motion and static posture images. Multimedia Tools Appl. 2019; 78(15): 21085-21111.

13. Mou. Research on aerobics training posture motion capture based on mathematical similarity matching statistical analysis. Dyn Syst Appl. 2020; 29(3): 1-12.

14. Yuan, Zhang, Chen. Adaptive recognition of motion posture in sports video based on evolution equation. Adv Math Phys. 2021; 21(5): 1-12.

15. Rebecca, Allwin. Detection of DR from retinal fundus images using prediction ANN classifier and RG based threshold segmentation for diabetes. J Ambient Intell Humanized Comput. 2021; 21(4): 15-26.

16. Liu, Zhao. Counting of pine wood nematode disease trees based on threshold segmentation. J Phys Conf Ser. 2021; 1961(1): 012033-012045.

17. Prabhu, Parvathavarthini, Alaguraja. Integration of deep convolutional neural networks and mathematical morphology-based postclassification framework for urban slum mapping. J Appl Remote Sens. 2021; 15(1): 1-12.
