Abstract
Ball sports are highly variable, and intelligent modeling of how the ball moves can effectively improve athletes’ training. However, research that applies artificial intelligence to ball trajectory prediction remains scarce. This study therefore uses deep learning: the main experimental data are obtained through network data collection, and an accurately annotated image data set of table tennis ball spatial positions under various environments is built. A convolutional neural network is used as the location recognition algorithm, and a trajectory prediction algorithm for table tennis is proposed based on a recurrent neural network. In addition, comparative experiments are designed to analyze the effectiveness of the model, and its real-time recognition, location and trajectory prediction capabilities are evaluated quantitatively. The results show that the algorithm has practical value and can provide a theoretical reference for subsequent related research.
Introduction
The detection and tracking of moving targets is an important research topic in stereo vision. In essence, it analyzes the sequences of video frames captured by two or more cameras, extracts a series of imaging coordinate pairs of the moving target from the image sequences according to the target’s characteristics, analyzes these coordinate pairs to obtain the three-dimensional position sequence of the target, and thereby realizes motion tracking [1]. Motion detection and tracking now incorporates advanced techniques from many other fields, such as image processing, artificial intelligence, pattern recognition and automatic control. It has a wide range of applications in robot navigation, public scene monitoring and military vision guidance [2]. In earlier vision research, monocular vision was applied more widely and is better developed. However, because of the limitations of monocular vision, it is very difficult for a moving monocular camera to track moving targets. A binocular stereo vision system simulates the human binocular mechanism: it acquires the target image pairs synchronously, extracts the parallax information of the target from the stereo image pair in real time, and realizes the tracking of the moving target. Therefore, research on the detection and tracking of moving targets with binocular stereo vision is of great significance in the fields of target measurement, target tracking and three-dimensional information recovery [3].
In table tennis, the ball is small and flies fast. The table tennis robot therefore needs a continuous rapid response capability: it must complete the trajectory prediction of the high-speed ball in a short time and make an accurate hit. This places great demands on the real-time performance and accuracy of the robot’s control. Consequently, the development and wide application of table tennis robots must first ensure effective trajectory prediction. Only when the characteristics of the ball are sufficiently analyzed can the robot’s return action be timely and accurate [4].
Traditional methods lack accuracy in the recognition, location and trajectory prediction of a spinning table tennis ball. To address this problem, this paper uses neural network methods, which adapt and generalize better to complex environments, to study the real-time recognition, location and trajectory prediction of a spinning ball in flight on the platform of a humanoid table tennis robot system.
Related work
Professor John Billingsley of the United Kingdom first proposed the ping-pong robot competition in 1983. In 1985 and 1988, he held four ping-pong robot competitions. In the meantime, in order to encourage accurate hitting by table tennis robots, the rules for table tennis robot competition were also developed [5]. The rules stipulate that the table is 2m×0.5m, which is smaller than the conventional table size, and that the space in which the ball travels is bounded by three wire frames installed at the two ends of the table and at the net. These rules lowered the difficulty of hitting the ball, so that in an era when vision systems and motion control were not yet well developed, the capability of the table tennis robots could be fully used and a high hitting rate was guaranteed.
In 1987, Russell L. Andersson of AT&T Bell Laboratories in the United States developed a relatively complete table tennis robot system [6]. The table tennis robot has three major components: a real-time vision system, a control system, and a mechanical system. The PUMA260 robot arm developed by Unimation of the United States was hung from a mechanical support, a link and the racket were fixed to the front of the arm, and the ball position information was obtained through four cameras with 756×242 pixels and a 60 Hz sampling rate.
In the mid-1980s, Toshiba Corporation of Japan developed a robotic system that hits a ball against a wall [7]. It uses a seven-degree-of-freedom humanoid manipulator and a binocular vision system with an image acquisition rate of 100 frames per second, which improves the accuracy of the hit. However, the task of the system is highly repetitive, and the ball trajectory is basically the same each time.
In the late 1980s, the Swiss Federal Institute of Technology in Zurich designed a six-degree-of-freedom table tennis robot to participate in the table tennis robot competition [8]. The robot consists of a three-degree-of-freedom robotic arm and a three-degree-of-freedom mechanical wrist. It uses a binocular vision system with a pixel resolution of 579×422 and a sampling frequency of 50 Hz. Like the American Andersson, the robot uses a distributed computer network based on the MC6802 and MC68000 processors developed by Motorola and completes table tennis trajectory prediction and robot control. Moreover, the robot won the World Robot Table Tennis Championship twice.
In 1996, Jon Price, an engineer at Salt Lake City, USA, designed the first humanoid robot, SARCOS [9], which was installed on a wheeled moving mechanism to perform a batting action according to the operator’s instructions. The robot detects the opponent’s batting action with a sensor and controls the arm to return the ball through the computer. However, the robot is only a humanoid robot that can execute motion commands.
In 2002, Professor Miyazaki of Osaka University of Japan proposed a method of controlling table tennis robots. It can return the ball to the desired position in the specified time [10]. The robot is based on a binocular vision system and uses a high-speed digital camera with a resolution of 640×416 pixels. Moreover, the system has two translation joints and two rotation joints, a total of four degrees of freedom. In addition, it mainly adopts control strategies such as trajectory clustering analysis and local weighted regression analysis (LWR), and its learning control ability can accurately control the ping-pong drop point.
L. Acosta et al. of Spain used a robotic arm with a double racket in 2003 and successfully developed a low-cost table tennis robot [11]. Each racket of the robot has three degrees of freedom. The robot uses only a monocular camera and determines the spatial position of the ball from the geometric relationship between the ball and its projection on the table. The robot’s return strength is small and its hitting distance is limited, so the table is also much smaller than a standard table. Moreover, in low-speed human-robot rally experiments, the success rate of returning the ball is only about 80%. Using only one camera avoids visual time synchronization problems and saves cost, but the approach places strict requirements on the scene and the lighting conditions.
In December 2007, the new humanoid table tennis robot developed by TOSY of Vietnam [12] was unveiled at the International Robot Exhibition held in Tokyo, Japan. The robot has 39 “joints” for flexible rotation and uses a binocular active vision system. It carries two 200 fps cameras that change posture dynamically as the head moves and respond quickly by detecting the trajectory of the ball. The robot is also expressive: if it fails to return the ball it shakes its head, and when it succeeds it nods. In addition, its “brain” is an artificial intelligence system that selects the best return route by analyzing the ball trajectory.
In 1997, the research team of Professor Zhao Qing of Shanghai Jiaotong University cooperated with Professor Miyazaki of Japan and, through a theoretical study of table tennis trajectory prediction and simulation algorithms, proposed a trajectory prediction method based on locally weighted regression (LWR) [13]. In 2001, Yuan Jianchang and others from Northwest Textile Institute of Technology carried out in-depth theoretical research on the wrist structure of table tennis robots and analyzed the feasibility of implementing the robotic arm [14]. In addition, Zhang Peiyan and others from Shanghai Jiaotong University have developed volleyball robots based on industrial robots and real-time vision. The research team of Zhejiang University began researching table tennis robots in 2004 and has so far developed three generations of table tennis robots [15]. The first generation, introduced in 2006, uses a monocular vision system. The system determines the three-dimensional coordinates of the ball from the geometric relationship between four elements: the monocular camera, the light, the ball and its shadow. The robot’s single-round return success rate is about 60%. Because it realized the basic motion control servo platform, it laid a foundation for the research team’s further work.
Theoretical analysis
In table tennis, in order to achieve a shot, the end effector (the racket) must have a certain velocity in Cartesian space. Therefore, it is necessary to study the Jacobian matrix of the six-degree-of-freedom arm in order to convert the hitting velocity of the racket into joint space.
Jacobian matrix solution
The relationship between the joint space velocity and the Cartesian space velocity can be expressed as [16]:
The element in column j of row i is
Among them, the first three rows give the transfer ratio of the end linear velocity v, and the last three rows give the transfer ratio of the end angular velocity w. Therefore, it can be expressed as [17]:
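In standard textbook notation, which is assumed here and may differ in symbols from the paper’s own equations, the three relations above read as follows. Here x is the end pose, q the vector of n joint variables, and J_{li} and J_{ai} are the linear and angular transfer ratios of joint i.

```latex
\dot{x} = J(q)\,\dot{q}, \qquad
J_{ij}(q) = \frac{\partial x_i(q)}{\partial q_j}, \qquad
\begin{bmatrix} v \\ w \end{bmatrix}
=
\begin{bmatrix}
J_{l1} & J_{l2} & \cdots & J_{ln} \\
J_{a1} & J_{a2} & \cdots & J_{an}
\end{bmatrix}
\dot{q}
```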
The Jacobian matrix can be obtained by the definition method, the vector product method and the differential transformation method. The definition method requires a large number of derivative calculations, while the procedures of the vector product and differential transformation methods are more complicated.
Joint variables can be defined as
As shown in Fig. 1, according to rigid body kinematics, the motion of joint i is transferred to the end effector n, and the resulting end linear velocity and angular velocity are

Fig. 1. Schematic diagram of the solution of the Jacobian matrix.
According to the definition of the D-H coordinate system and joint variables, the rotating joint has [19]:
The moving (prismatic) joint has:
After formulas (7) and (8) are substituted into formula (6), the following formula is obtained.
Equation (9) is the formula for finding the Jacobian matrix by the vector product method. For the table tennis hitting process, the six joints are all rotating joints, that is,
This is the table tennis arm vector product expression.
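A minimal Python sketch of the vector product method for an arm whose joints are all rotating joints is given below. It assumes the standard D-H convention, in which joint i rotates about the z axis of frame i-1, so that column i of the Jacobian is the pair (z × (p_n − p), z) expressed in base coordinates. The function names and the (d, a, alpha) parameter layout are illustrative assumptions, not the parameters of the arm used in this paper.

```python
import numpy as np

def dh_transform(theta, d, a, alpha):
    """Homogeneous transform between consecutive frames (standard D-H convention)."""
    ct, st = np.cos(theta), np.sin(theta)
    ca, sa = np.cos(alpha), np.sin(alpha)
    return np.array([
        [ct, -st * ca, st * sa, a * ct],
        [st, ct * ca, -ct * sa, a * st],
        [0.0, sa, ca, d],
        [0.0, 0.0, 0.0, 1.0],
    ])

def revolute_jacobian(thetas, dh_params):
    """Return the 6 x n Jacobian of an all-revolute arm by the vector product method.

    thetas: joint angles; dh_params: list of (d, a, alpha), one tuple per joint
    (hypothetical values supplied by the caller).
    """
    n = len(thetas)
    T = np.eye(4)
    origins = [T[:3, 3].copy()]  # origin of the frame each joint rotates in
    z_axes = [T[:3, 2].copy()]   # rotation axis of each joint, base coordinates
    for theta, (d, a, alpha) in zip(thetas, dh_params):
        T = T @ dh_transform(theta, d, a, alpha)
        origins.append(T[:3, 3].copy())
        z_axes.append(T[:3, 2].copy())
    p_end = origins[-1]
    J = np.zeros((6, n))
    for i in range(n):
        J[:3, i] = np.cross(z_axes[i], p_end - origins[i])  # linear velocity part
        J[3:, i] = z_axes[i]                                # angular velocity part
    return J
```

For a six-joint arm, the caller would pass six angles and six parameter tuples; the first three rows then map joint velocities to the end linear velocity and the last three rows to the end angular velocity.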
However, in the solution process, the calculation of z_i × p_in is more difficult. Therefore, this paper combines the vector product method and the definition method to obtain the Jacobian matrix.
The Jacobian matrix depends on the configuration q. When a particular configuration makes det(J(q)) = 0, the Jacobian matrix can no longer be inverted to convert the Cartesian velocity to the joint velocity. Such a configuration is called a singular point of the Jacobian matrix and should be avoided during use.
According to the above analysis, the determinant of the Jacobian matrix is found, and whether the determinant is zero can be used as the criterion for the singularity of the Jacobian matrix [20].
It can be seen from the calculation that the determinant of the Jacobian matrix is as follows
Because there are many candidate hitting points, the singularity of the Jacobian matrix is checked whenever a configuration is calculated. If the matrix is singular, that hitting point is abandoned, thereby avoiding the singularity [21].
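The selection rule above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the implementation used in this paper: the function names, the (J, twist) candidate format and the determinant threshold `det_tol` are hypothetical, and a near-zero determinant is treated as singular because an exact zero rarely occurs in floating-point arithmetic.

```python
import numpy as np

def joint_velocity_or_none(J, twist, det_tol=1e-6):
    """Map a desired end velocity to joint velocities, or return None if J is singular.

    J: 6 x 6 Jacobian at the candidate configuration.
    twist: desired end velocity [vx, vy, vz, wx, wy, wz].
    det_tol: hypothetical threshold below which |det(J)| counts as zero.
    """
    J = np.asarray(J, dtype=float)
    if abs(np.linalg.det(J)) < det_tol:
        return None  # singular configuration: this hitting point is abandoned
    return np.linalg.solve(J, np.asarray(twist, dtype=float))

def first_usable_hit_point(candidates, det_tol=1e-6):
    """Return (index, joint velocities) of the first non-singular candidate, else None.

    candidates: iterable of (J, twist) pairs, one per candidate hitting point.
    """
    for index, (J, twist) in enumerate(candidates):
        q_dot = joint_velocity_or_none(J, twist, det_tol)
        if q_dot is not None:
            return index, q_dot
    return None
```

The determinant threshold depends on the units of the Jacobian, so in practice it would have to be tuned for the arm in question.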
A recurrent neural network with LSTM units is a special neural network structure. It can process time series data by recording and processing the input of the previous N frames (N ≥ 1) and the intermediate results of the network, thereby integrating multi-frame information to complete tasks such as classification, regression and prediction. The LSTM network has the same basic form as other networks and is composed of an input layer, a hidden layer and an output layer. The difference is that the hidden layer accepts not only the current frame input but also the previous frame output and the intermediate results of the network as input. The basic structure of the network is shown in Fig. 2, which shows an LSTM network with one hidden layer unrolled over five time steps [22].

Fig. 2. Schematic diagram of recurrent neural network.
The LSTM unit is in fact a special design of the hidden layer node of the recurrent neural network, and a hidden layer can contain multiple LSTM units. The detailed internal structure of the unit is shown in Fig. 3. Each unit consists of a core node (the Cell), an input gate, a forgetting gate and an output gate [23].

Fig. 3. Schematic diagram of LSTM unit structure.
Assuming that the current input value is x_t, that the output of the Cell node in the same LSTM unit at the previous time is s_{t-1}, and that the outputs of the other nodes at the previous time are also given, the output of the input gate is obtained by applying the activation function f to the weighted sum of these three terms.
Among them, w_il is the weight parameter from the input term to the input gate; w_cl is the weight parameter from the Cell output at the previous moment to the input gate; w_hl is the weight parameter from the outputs of the other nodes at the previous moment to the input gate; f is a simple nonlinear activation function. Unless otherwise specified, the nonlinear activation functions used in this paper are all Sigmoid functions.
Similar to the input gate, the output of the forgetting gate is obtained from the same three inputs with its own weight parameters.
The output s of the Cell node in the current time unit is determined by the input of the current time, the output of the current time input gate, the output of the current moment forgetting gate, the output of the Cell at the previous moment, and the output of other nodes at the previous moment:
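Among them, g is also a Sigmoid function. The output of the output gate is obtained in the same way as that of the input gate.
Finally, the total output of the unit is obtained from the output of the output gate and the activated Cell output.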
Among them, h is the Sigmoid function. Based on the above analysis, the output of the unit can be obtained.
As can be seen from the above equation, the output of a single LSTM unit at time t is related to the network input at time t and the internal node outputs of the unit at time t − 1. Moreover, the outputs of these internal nodes result from passing all inputs at moments [0, 1, ..., t − 1] through the weight parameters and nonlinear activation functions. This operating mechanism enables the RNN with LSTM units to predict the future trajectory of the ball from past trajectory data.
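The forward pass described above can be sketched for one layer of LSTM units as follows. This is a sketch under assumptions, not the exact equations of this paper: all three gates are assumed to see the Cell output of the previous moment, the activation functions f, g and h are all Sigmoid as stated in the text, and the weight names in the dictionary `W` are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, s_prev, b_prev, W):
    """One forward step of a layer of LSTM units.

    x_t: current input, shape (n_in,).
    s_prev: Cell outputs at the previous moment, shape (n_units,).
    b_prev: unit outputs at the previous moment, shape (n_units,).
    W: dict of weights (hypothetical names); "x_*" have shape (n_units, n_in),
       "h_*" have shape (n_units, n_units), "c_*" have shape (n_units,).
    """
    # Input gate and forgetting gate: input, previous outputs, previous Cell output.
    gate_in = sigmoid(W["x_in"] @ x_t + W["h_in"] @ b_prev + W["c_in"] * s_prev)
    gate_fg = sigmoid(W["x_fg"] @ x_t + W["h_fg"] @ b_prev + W["c_fg"] * s_prev)
    # Candidate Cell input, passed through g (Sigmoid here, as in the text).
    cell_in = sigmoid(W["x_c"] @ x_t + W["h_c"] @ b_prev)
    # New Cell output: kept part of the old state plus the gated new input.
    s_t = gate_fg * s_prev + gate_in * cell_in
    # Output gate, then the unit output through h (Sigmoid here, as in the text).
    gate_out = sigmoid(W["x_out"] @ x_t + W["h_out"] @ b_prev + W["c_out"] * s_prev)
    b_t = gate_out * sigmoid(s_t)
    return s_t, b_t
```

Calling `lstm_step` repeatedly while feeding `s_t` and `b_t` back in as `s_prev` and `b_prev` unrolls the layer over time, which is how information from earlier frames reaches the current output.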
As in other forms of neural networks, the back-propagation algorithm propagates the error between layers. For recurrent neural networks it must also propagate the error over time, i.e., back propagation through time. The RNN with LSTM units has a large number of weight parameters to be trained, and a large part of them are the weights that connect the network output at time t-1 to the network at time t. According to the gradient descent method and the chain rule, this paper derives the propagation of the timing error in the trajectory prediction network.
Assuming that the output of the network at time t is the N-dimensional vector O_N and the input label is y_n, the error between the predicted value and the label value is calculated by the Euclidean loss (EuclideanLoss):
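In the usual convention for this loss, the error at time t and its gradient with respect to each output component can be written as below. The factor 1/2 and the per-component indexing O_n, y_n (n = 1, ..., N) are assumptions made here for illustration.

```latex
E_t = \frac{1}{2}\sum_{n=1}^{N}\left(O_n - y_n\right)^2,
\qquad
\frac{\partial E_t}{\partial O_n} = O_n - y_n
```

The gradient on the right is the quantity that the chain rule then carries back through the output layer into the LSTM units.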
It is also known that at time t, O_N is obtained from the output of the LSTM units in the hidden layer.
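According to the chain rule, the gradient of the error with respect to the output of the LSTM unit is obtained.
The chain rule continues to be used, and the gradient of the error with respect to the Cell output is obtained.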
It can be seen from Equation (4.6) that, as the core node of the LSTM unit, the Cell node influences the unit output at time t, the Cell output at time t + 1, the input gate at time t + 1, and the output gate at time t + 1 in the forward pass of the network. Therefore, when the gradient is calculated in the backward pass, the errors propagated back from these four directions should all be considered.
It can be seen from the equation that the error conduction of the Cell node needs the internal calculation results of the node at the next moment. Therefore, when a recurrent neural network with LSTM units is trained, it is usually unrolled over a fixed number of time steps, and the network parameters within one sequence are updated simultaneously.
Using the chain rule, the gradients of the remaining nodes inside the unit are obtained in the same way.
At this point, the gradient of the error between the predicted value and the label value has been derived for each node inside the LSTM unit. Then, according to the gradient descent method, the weights of each node are updated by applying these gradient values.
In this paper, 1800 trajectories are randomly selected to form the training data set, and another 200 trajectories constitute the test data set. The above network is then trained and tested. In order to compare how the unrolling step size affects accuracy and to select the step size most suitable for the trajectory prediction task, three different step sizes are used to train the network. Figures 4 and 5 show the convergence of the training error and the test error over 50,000 training iterations for three networks with unrolling steps of 5, 10, and 15, respectively. The error (Loss) is measured in m and represents the average spatial distance between the predicted position P(x_p, y_p, z_p) and the true position P_r(x_r, y_r, z_r) over the 200 tests. With M = 200 tests indexed by k, the calculation is as follows:
$$\mathrm{Loss} = \frac{1}{M}\sum_{k=1}^{M}\sqrt{\left(x_p^{(k)}-x_r^{(k)}\right)^2+\left(y_p^{(k)}-y_r^{(k)}\right)^2+\left(z_p^{(k)}-z_r^{(k)}\right)^2}$$
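A minimal Python sketch of this error measure is given below, assuming the predicted and true positions are stored as arrays with one row per test and columns x, y, z in metres; the function name is hypothetical.

```python
import numpy as np

def mean_position_error(predicted, actual):
    """Average Euclidean distance between predicted and true 3D positions.

    predicted, actual: array-likes of shape (M, 3), one row per test.
    """
    predicted = np.asarray(predicted, dtype=float)
    actual = np.asarray(actual, dtype=float)
    distances = np.linalg.norm(predicted - actual, axis=1)  # one distance per test
    return float(np.mean(distances))
```

For example, two tests with distances of 5 and 0 between prediction and truth give a Loss of 2.5 in the same unit as the inputs.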

Fig. 4. Training error image.

Fig. 5. Test error image.
Table 1 and Fig. 5 show the training error and test error values after training the networks with the above three step sizes. It can be seen from the table that after sufficient training, all three networks converge to a training and test accuracy of 5 mm or better, and the test error is slightly larger than the training error. The network with an unrolling step size of 5 has better training and test accuracy than the other two networks, and the accuracy of the other two decreases as the step size increases. However, the method predicts the long-term trajectory iteratively, that is, the network output at the current time is used as the network input at the next time. The error therefore accumulates, and the overall prediction error increases with the number of predicted frames. As mentioned in the overview, table tennis robots generally need to complete the prediction of the next 400 ms of flight trajectory within 100 ms. The visual system of the hardware platform used in this paper has a frame rate of 120 fps, which means that the network needs to predict the next 40 frames based on about 10 frames of data.
In order to measure the impact of the cumulative error on the overall error, the first 10 frames of data are input to the trained network to predict the spatial position of the ball at the 14th, 19th, 24th, 29th, 34th, and 39th future frames. The resulting prediction errors are shown in Table 2 and Fig. 6.
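The iterative long-term prediction described above can be sketched in Python as follows. The callable `predict_next` stands in for the trained network, and the fixed-length sliding window of recent positions is an assumption made for illustration; the function names are hypothetical.

```python
from collections import deque
import numpy as np

def rollout(predict_next, observed, n_future):
    """Predict n_future positions by feeding each prediction back as input.

    predict_next: callable taking an array of shape (window, 3) with the most
        recent positions and returning the next position, shape (3,).
    observed: measured positions, shape (window, 3), e.g. the first 10 frames.
    n_future: number of future frames to predict (at least 1).
    """
    window = deque((np.asarray(p, dtype=float) for p in observed), maxlen=len(observed))
    future = []
    for _ in range(n_future):
        nxt = np.asarray(predict_next(np.stack(list(window))), dtype=float)
        future.append(nxt)
        window.append(nxt)  # the prediction becomes input, so its error is carried forward
    return np.stack(future)
```

Because every predicted frame is computed from earlier predictions rather than from measurements, any single-step error is reused in all later steps, which is the accumulation effect discussed above.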

Fig. 6. Network prediction error versus the number of predicted frames.

Fig. 7. Time-consuming statistics diagram of network prediction.

Fig. 8. Error statistics diagram of collision points.
Table 1. Training error and test error.
Table 2. Error table of long-term forecast.
For the vision system of a table tennis robot, real-time performance is as important a requirement as prediction accuracy. This paper also uses the test data set to calculate the average time taken by the above three networks to predict the future trajectory of the ball, and the test hardware platform is the same as the training hardware platform. Table 3 shows the average time used by the three networks to predict future frames 1, 19, and 39. It should be noted that the statistics in the table do not include the time for preparation work such as network initialization, parameter loading, and generation of calculation graphs. It can be seen from the table that, because the recurrent neural network structure proposed in this paper is relatively complicated, its computation is relatively slow, and the operation time increases with the number of predicted frames.
Table 3. Time-consuming statistics table of network prediction.
During the flight of the ball, when the ball collides with the table, the height z of the ball’s centroid is equal to the ball’s radius r. At this moment, the error of the collision point only needs to be considered in the two dimensions x and y. The prediction at the collision point reflects the accuracy and adaptability of a prediction method particularly well. For the same test data set, the statistical results of the collision point error of the proposed method and the comparison method are shown in Table 4. In the table, MSE is the mean square error between the predicted value and the true value; RMSE is the root mean square error between the predicted value and the true value; Bias is the average offset between the predicted value and the true value; SD is the standard deviation of the predicted values about their mean. Among these indicators, MSE and RMSE reflect the accuracy of the prediction: the smaller the two values, the higher the accuracy. Bias reflects the extent to which the predicted values deviate from the true values as a whole: the smaller the value, the closer the predictions are to the truth. SD reflects the dispersion of the predicted values: the smaller the value, the more concentrated the predictions are, the smaller the probability that a prediction deviates significantly from the true value, and the better the adaptability of the prediction method.
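The four indicators can be computed for one dimension (x or y) of the collision point as in the sketch below. It assumes that Bias and SD are taken over the prediction errors, that is, predicted value minus true value, which is one reading of the definitions above; the function name and the dictionary keys are hypothetical.

```python
import numpy as np

def collision_point_stats(predicted, actual):
    """MSE, RMSE, Bias and SD of the collision point error in one dimension.

    predicted, actual: array-likes of shape (M,), one entry per test trajectory.
    """
    err = np.asarray(predicted, dtype=float) - np.asarray(actual, dtype=float)
    mse = float(np.mean(err ** 2))
    return {
        "MSE": mse,
        "RMSE": float(np.sqrt(mse)),
        "Bias": float(np.mean(err)),  # average signed offset
        "SD": float(np.std(err)),     # spread of the errors about their mean
    }
```

With these definitions the indicators are linked by MSE = Bias² + SD², so a method can only reach a small MSE if both its systematic offset and its dispersion are small.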
Table 4. Error statistics table of collision points.
It can be seen from the above figures and tables that the predictions obtained by the recurrent neural network method proposed in this paper have smaller deviations and are more tightly clustered. Moreover, its prediction accuracy and adaptability in the x and y dimensions are greatly improved compared with the prediction method based on the continuous motion model. Figure 9 gives a more visual representation of the error distribution of the predicted drop points on the test data set for the method of this paper and the comparison method. In the figure, each point represents the error between the predicted value and the true value in one test. The green points are the prediction results of the proposed algorithm, and the red points are the prediction results of the comparison algorithm.

Fig. 9. Distribution image of collision point error.
Traditional trajectory prediction methods, which build a physical motion model from force analysis of the ball, are often poorly adapted to table tennis balls in various rotating states, and their accuracy is limited by the model. This paper does not rely on prior knowledge and proposes a table tennis trajectory prediction method based on a recurrent neural network. By combining a large number of nonlinear LSTM units to approximate the high-order nonlinear motion of the ball, it realizes long-term trajectory prediction of a flying ball in various rotating states. The experimental results support the prediction accuracy of the proposed method and show that it meets the real-time requirements of the table tennis robot’s visual perception system.
The ball is regarded as a spring model, and the collision process between the ball and the table is theoretically analyzed and mathematically derived. The study found that the collision between the ball and the table is a continuous, gradual process. The support force on the ball is a sinusoidal function, its maximum amplitude is linearly related to the incident velocity of the ball before the collision, and the coefficient depends on the physical properties of the ball. At the same time, the collision period between the ball and the table is a constant that depends only on the physical properties of the ball and is not related to its state of motion before the collision. We then design an experiment with an ultra-high-speed camera to observe the collision of the ball and the table at different incident speeds, and use the observations to further analyze and verify how the collision period and the maximum amplitude of the support force relate to the motion state.
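These properties follow directly from an ideal linear spring, as the short derivation below illustrates. The symbols m (ball mass), k (equivalent stiffness), z (compression) and v_0 (normal incident velocity) are assumptions introduced here, and damping and gravity during the contact are neglected.

```latex
m\ddot{z} = -kz, \quad z(0) = 0, \quad \dot{z}(0) = v_0
\;\Longrightarrow\;
z(t) = \frac{v_0}{\omega}\sin(\omega t), \qquad \omega = \sqrt{\frac{k}{m}}

F(t) = k\,z(t) = v_0\sqrt{km}\,\sin(\omega t), \qquad 0 \le t \le T_c

F_{\max} = \sqrt{km}\;v_0, \qquad T_c = \frac{\pi}{\omega} = \pi\sqrt{\frac{m}{k}}
```

Under these assumptions the support force is sinusoidal, its maximum is proportional to the incident velocity with a coefficient that depends only on m and k, and the contact duration T_c does not depend on v_0.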
For trajectory prediction, this paper builds a single-input, single-output recurrent neural network that predicts only the spatial position of the next frame. This structure keeps the network small and improves real-time prediction accuracy. However, long-term trajectories must then be predicted iteratively, so the error accumulates during the iteration, which affects the long-term prediction accuracy to a certain extent and also increases the computation time of long-term prediction. In future research, a multi-input, multi-output recurrent neural network could be built so that the network learns to predict long-term trajectories directly, thus saving computation time and further improving prediction accuracy.
The neural network based ball positioning and trajectory prediction method proposed in this paper does not depend on any prior model. The internal parameters of both the recognition and positioning network and the trajectory prediction network are all trained from the data set, so the method is a model-free, open framework. In addition, the two networks are built with an open source toolkit that is highly modular and makes it easy to change the layer structure. To apply the method to other similar fields, such as the recognition and trajectory prediction of other balls such as tennis balls, it is only necessary to retrain on a corresponding data set with rich object features and to make the necessary adjustments to the layer structure.
The overall perception of the environment is the main task of the table tennis robot visual servo system, and also the basis for the robot to play table tennis. The work of this paper solves the robot’s perception of rotating table tennis, including accurate recognition, precise positioning and long-term trajectory prediction. However, table tennis is an interactive movement, and the robot needs to have the ability to perceive the opponent, including opponent motion recognition, intention recognition, target prediction and so on. This work has not been carried out in this paper and further research is needed.
Conclusion
Using neural network methods, which adapt and generalize better to complex environments, this paper studies the real-time recognition, location and trajectory prediction of a spinning table tennis ball in flight on the platform of a humanoid table tennis robot system. In table tennis, in order to achieve a shot, the end effector must have a certain velocity in Cartesian space. Therefore, the Jacobian matrix of the six-degree-of-freedom arm is studied in order to convert the hitting velocity of the racket into joint space. In this paper, 1800 trajectories are randomly selected to form the training data set, and another 200 trajectories constitute the test data set used to train and test the network. In order to compare the influence of different step sizes on the accuracy to which the network converges and to select the step size most suitable for the trajectory prediction task, three different step sizes are used to train the network. The research shows that the predictions of the recurrent neural network method proposed in this paper deviate less from the true values and are more tightly clustered. Moreover, the prediction accuracy and adaptability of the method in the x and y dimensions are greatly improved compared with the prediction method based on the continuous motion model.
Acknowledgment
This work was supported by Topics of Guangdong Sports Bureau: analysis of the technical and tactical characteristics of Guangdong table tennis player Liu Shiwen’s main opponents in the Rio Olympic Games. Number: GDSS 2016133.
