Abstract
Traditional sports match analysis relies mostly on manual observation and recording, which is time-consuming and laborious and produces subjective, inaccurate judgments, leading to deviations in match data analysis and statistics. This paper studies an artificial intelligence system that automatically analyzes and evaluates the effect of both sides in volleyball matches. The system is divided into two steps: detection and tracking of moving targets, and recognition and classification of players’ behaviors and actions. For moving target detection and tracking, this paper proposes a fast moving-target detection framework that combines several mainstream techniques, and a MeanShift target tracking method based on Kalman filtering and adaptive target region size. For behavior and action recognition and classification, this paper proposes a classifier combining a BP neural network and a support vector machine. Experimental results show that the proposed algorithms and classifier are effective: the proposed classifier reaches a classification accuracy of 98%.
Keywords
Introduction
As one of the three major sports in the world, volleyball has not developed as well as football and basketball in recent years, and its popularity and development worldwide have been relatively slow [1]. Moreover, there is a certain gap between older and newer generations of volleyball players. As the level of competition improves, training methods that rely only on the coach’s intuition can no longer raise performance effectively [2]. A robot analysis system, a new intelligent analysis technology, is more accurate and more timely than the human eye [3, 4]. Building on low-level processing of the video image sequence, the nature of the objects in the image and their relationships can be further investigated to understand and interpret the layout of both sides in a volleyball match, and thus to provide guidance and planning based on the actions in the match [5, 6].
Although computer technology has developed rapidly, and volleyball match detection systems and monitoring equipment have improved quickly in function and performance, traditional monitoring systems remain passive in a way that human operators cannot change or avoid [7]. This weakness, together with human limitations, restricts the development of detection systems. An intelligent video analysis system helps users analyze changes in video images actively and in real time, and gives early warning of dangerous events [8, 9]. Domestic and foreign research on motion video recognition has made important achievements in theory and method, but most of it aims to identify specific objects in specific settings, and the number of objects studied is usually no more than two, mostly treated as independent objects [10]. So far, there has been no research report on the collective actions of volleyball players [11]. Target detection is strongly affected by changes in external conditions such as light and weather, and is difficult in complex dynamic environments [12]. Background motion itself is also picked up by detection. When targets are unknown, deform or rotate, or move in unknown directions, existing algorithms cannot handle them well [13]. It is also difficult to achieve both real-time performance and accuracy: some methods detect well but require too much computation to run in real time, while methods that run in real time cannot reach the required accuracy [14]. In target tracking, translation, rotation and scaling of the target during tracking make its motion complex and difficult to follow.
The collected frames are easily affected by complex backgrounds, noise, occlusion and other environmental factors. Occlusion in particular is a major difficulty in tracking and undermines its reliability and practicality [15]. Research also focuses on extending single-target tracking to simultaneous tracking of multiple targets while avoiding mutual interference between targets [16]. At the same time, the tracking process must meet real-time processing requirements. Therefore, studying the formation effect of both sides in a volleyball match, together with the detection theory, methods and key technologies for players’ hitting actions, in order to develop an intelligent detection system for volleyball match video, has important theoretical value and social benefit [17]. For these reasons, this study takes volleyball game video as its object, studies an accurate extraction algorithm for volleyball formations, and uses the formation effect of each game to investigate the key technologies for intelligent analysis of players’ actions. This includes designing the overall scheme of the system, proposing a collection method for the video sequence, and, by establishing constraints on the physical characteristics of volleyball and combining image analysis with pattern recognition technology, achieving key technologies such as volleyball target detection and tracking and the detection, classification and recognition of players’ hitting behavior [18, 19].
In this paper, the volleyball game intelligent analysis system is divided into two steps: the detection and tracking of moving targets, and the recognition and classification of players’ behaviors and actions. For the detection and tracking of moving targets, this paper proposes a fast detection framework based on a mixture of mainstream technologies and a Mean Shift target tracking method based on Kalman filtering and adaptive target region size [20]. For behavior recognition and classification, this paper proposes a classifier combining a BP neural network and a support vector machine. The experimental results show that the proposed algorithms and classifier are effective.
Literature review
Bobick and Davis use the motion energy image (MEI) and the motion history image (MHI) to describe the motion of objects in an image sequence [21, 22]. Polana and Nelson use two-dimensional mesh features to identify the movement of objects [23]. First, the optical flow field between consecutive frames is calculated, and each frame is decomposed along the X and Y directions on a spatial grid. The amplitudes in each cell are accumulated to form a high-dimensional feature vector for recognition. To normalize the duration of the motion, they assume that the trajectory of the moving object is periodic, and finally use the nearest neighbor algorithm to identify the behavior of the moving object [24]. Template matching has the advantages of low computational complexity and simple implementation, but it is sensitive to noise and to changes in the time interval of the motion [25]. Based on the statistical decomposition of ball dynamics at different levels of abstraction, Bregler proposed a comprehensive ball motion recognition network [26, 27]. The recognition process is completed by HMMs using the maximum posterior probability. Although the state space method can overcome the shortcomings of template matching, it usually involves complex iterative operations and has low efficiency [28]. Hong proposed a new method to generate natural language descriptions of athletes’ behaviors from real-time video images. First, the head region representing the whole human body is extracted from each frame, and its three-dimensional pose and position are estimated by a model-based method. Then, these parameter trajectories are decomposed into single motion primitives; conceptual features, such as the degree of change in posture and position of each primitive and the relative distance between each primitive and other targets in the environment, are evaluated; and the most appropriate words and other semantic elements are selected.
Finally, the natural language text of athletes’ behavior description is generated by using machine translation technology [29]. Remaining et al. proposed an event-based visual monitoring system, which monitors basketball and players [30, 31]. The system can describe the dynamic activities of objects in the 3D scene.
Fang Shuai, Cao Yang, Wang Hao and others proposed a new method of behavior understanding: an architecture for behavior recognition in video monitoring that models behavior with a hierarchical statement model [32]. The model is easy to modify, extend and describe. They also proposed an automatic action recognition algorithm based on the Bayesian network [33]. In this algorithm, target characteristics are connected to different levels of action, and a Bayesian causality model is established that can analyze and infer the actions of a moving target [34]. The system consists of three parts: a scene model, an action model and an action recognition model. Xiong Rongyan studied the gray-scale characteristics of video sequence images and, combining the advantages of background-subtraction target detection, proposed a fast detection and tracking method based on eigenvalues for a still camera [35, 36]. The detection and tracking algorithm builds on an analysis of the features of the sequence images and of the differences between the gray features of target and background [37]. Its principle is to search for the moving target by matching target shape and gray features after the target has been detected by background differencing [38]. For volleyball trajectory tracking, some typical parameters representing the moving target in the image are investigated. The shape-feature invariants, the gray mean and the gray variance mean are significantly improved.
Target detection tracking and behavior classification algorithm
Target detection and tracking algorithm
(1) Target detection algorithm
In this paper, a moving target detection framework for dynamic backgrounds is proposed. It combines several methods, such as epipolar geometry and Lucas-Kanade (LK) optical flow, and detects moving targets in real time from the frame sequence captured by mobile video capture equipment. The first step extracts the Harris corner points of the previous frame. The second step uses the pyramidal LK optical flow method to compute the displacement vector of each corner point relative to the subsequent frame, which gives the corresponding point of each corner in that frame. In the third step, the set of point pairs formed by the corner points and their corresponding points is used as the sample set, and the RANSAC algorithm is run repeatedly to estimate an accurate and stable fundamental matrix; the fundamental matrix is then used to classify the corner points into background corners and moving-target (foreground) corners. Finally, the foreground corner points are quickly clustered to form a moving target region for each detected target.
The corner response function is defined as:
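The equation itself is not reproduced in this text. For orientation, the standard Harris formulation is sketched below, assuming that this is the form intended. The symbols $A$ (local structure tensor), $\Omega$ (window around the pixel), $w$ (window weighting), $I_x$, $I_y$ (image gradients) and $k$ (an empirical constant, commonly chosen around 0.04 to 0.06) are notation assumed here, not taken from the paper.

```latex
R = \det(A) - k \,\bigl(\operatorname{trace} A\bigr)^2,
\qquad
A = \sum_{(u,v) \in \Omega} w(u,v)
\begin{bmatrix}
I_x^2 & I_x I_y \\
I_x I_y & I_y^2
\end{bmatrix}
```

A pixel is usually accepted as a corner when $R$ is a local maximum above a chosen threshold.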
The optical flow field equation is shown in Equation (2):

$$\nabla I(x, y, t) \cdot V + I_t(x, y, t) = 0 \tag{2}$$

where $\nabla I(x, y, t)$ represents the spatial gradient of the image grayscale, $V$ is the velocity field at the image point $(x, y)$, and $I_t(x, y, t)$ is the time-domain derivative of the image grayscale.
The fundamental matrix estimated by the RANSAC algorithm is shown in Equation (3):

$$F = M^{-T} E M^{-1} \tag{3}$$

where $F$ is the fundamental matrix, $E$ is the essential matrix, and $M$ is the camera intrinsic parameter matrix.
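The corner classification step can be illustrated with a small sketch. Assuming a fundamental (basic) matrix has already been estimated, a corner pair that follows the camera motion lies close to its epipolar line, while a corner on an independently moving target generally does not. The function names, the threshold value and the demo matrix (a camera translating along the x axis with identity intrinsics) are illustrative assumptions, not taken from the paper.

```python
import math

def epipolar_distance(F, p1, p2):
    """Pixel distance from p2 (second frame) to the epipolar line F * [p1, 1]."""
    x1 = (p1[0], p1[1], 1.0)
    # Epipolar line a*u + b*v + c = 0 in the second frame.
    a, b, c = (sum(F[r][k] * x1[k] for k in range(3)) for r in range(3))
    return abs(a * p2[0] + b * p2[1] + c) / math.hypot(a, b)

def split_corners(F, pairs, threshold=1.0):
    """Split corner pairs into background (consistent with F) and foreground."""
    background, foreground = [], []
    for p1, p2 in pairs:
        if epipolar_distance(F, p1, p2) > threshold:
            foreground.append((p1, p2))
        else:
            background.append((p1, p2))
    return background, foreground

# Hypothetical demo: pure camera translation along x, so background
# corners keep their row coordinate between the two frames.
F_demo = [[0.0, 0.0, 0.0],
          [0.0, 0.0, -1.0],
          [0.0, 1.0, 0.0]]
pairs = [((10.0, 20.0), (15.0, 20.0)),   # shifts along its epipolar line
         ((30.0, 40.0), (33.0, 48.0))]   # leaves its epipolar line
background, foreground = split_corners(F_demo, pairs)
```

A target that happens to move along its epipolar line is not separated by this test alone, which is one reason such a check is normally combined with other cues.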
(2) Player tracking algorithm
The Mean Shift multi-target tracking method based on the Kalman filter tracks, in real time, moving targets that have been detected or selected in a video sequence captured while the camera is moving. The kernel-weighted color histogram is used as the search feature. When the video acquisition equipment moves during shooting, the background of the captured frame sequence moves, the target may be partially occluded, and the target may deform and rotate, all of which make tracking difficult. The Bhattacharyya coefficient is used as the similarity function between the candidate region and the target region, and the candidate region with the maximum similarity is searched for. Because the MeanShift algorithm converges quickly, it iteratively computes the MeanShift vector and converges to the region of maximum similarity, that is, the new location of the target. In addition, by relating the MeanShift vector to image moments, the moments of the target window can be computed to obtain its mass (zeroth-order moment) and the target orientation, so that the length and width of the target region are adapted to the size of the target.
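The similarity measure used in the search can be sketched as follows. This is a minimal illustration, assuming plain (unweighted) color histograms given as lists of bin counts; the kernel weighting, the Kalman prediction and the Mean Shift iteration itself are omitted, and all function names are hypothetical.

```python
import math

def normalize(hist):
    """Scale a histogram so that its bins sum to 1."""
    total = float(sum(hist))
    if total <= 0:
        raise ValueError("histogram must have positive mass")
    return [h / total for h in hist]

def bhattacharyya(p, q):
    """Bhattacharyya coefficient: 1 for identical distributions, 0 for disjoint ones."""
    return sum(math.sqrt(pi * qi) for pi, qi in zip(normalize(p), normalize(q)))

def best_candidate(target_hist, candidate_hists):
    """Return (index, score) of the candidate region most similar to the target."""
    scores = [bhattacharyya(target_hist, c) for c in candidate_hists]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]

# Hypothetical 4-bin histograms for a target and three candidate regions.
target = [4, 4, 0, 0]
candidates = [[0, 0, 5, 5], [3, 5, 0, 0], [4, 4, 0, 0]]
index, score = best_candidate(target, candidates)
```

In a full tracker the candidates would not be enumerated exhaustively; the Mean Shift vector moves the window toward the location where this coefficient is largest.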
(1) Common behavior recognition and classification methods
1) Artificial neural network
The construction steps of the BP neural network are as follows:
Neural network initialization. The main purpose of this phase is to set the initial values of the thresholds of all neurons and of the weight matrices between neurons in adjacent layers. The input layer has n neurons, the hidden (middle) layer has p neurons, and the output layer has q neurons. The connection weight matrices between the input and hidden layers and between the hidden and output layers are denoted by $W_{ij}$ and $V_{jt}$, respectively. The thresholds of the hidden layer and the output layer are denoted by $\theta_j$ and $\gamma_t$, respectively.
Training samples and labels. Select enough samples as training samples. The main task of this phase is to create a label for each training sample, that is, to determine the desired standard output of the network when that sample is input.
Forward propagation of data. Assume that the k-th training sample fed into the network is $A_k = [x_1, x_2, \ldots, x_n]$. The input $s_j$ of the j-th hidden layer neuron is obtained by equation (4).
With an appropriate activation function selected (equation (5) gives a commonly used one), the output $b_j$ of each neuron in the hidden layer can be calculated.
The computation from the hidden layer neurons to the output layer neurons is the same as that from the input layer neurons to the hidden layer neurons. The input of the t-th neuron in the output layer can be calculated by formula (7). Similarly, after an appropriate activation function is selected, the output $c_t$ of the t-th neuron in the output layer can be calculated by formula (8).
Error calculation and adjustment of weight matrices and thresholds. Suppose the label of the training sample is $Y_k = [y_1, y_2, \ldots, y_q]$, that is, the expected output of the t-th neuron in the output layer is $y_t$. The correction error of that neuron can be calculated by formula (9).
According to the correction errors of the neurons in the output layer, the connection weight matrices and thresholds between neurons in adjacent layers are adjusted by formulas (11) and (12).
Here, α represents the learning rate of the neural network in the training stage, and N represents the number of training iterations.
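The steps above can be sketched in a few lines of Python. Because the numbered equations are not reproduced in this text, the sketch rests on stated assumptions: a sigmoid activation, thresholds subtracted from the weighted sums, a squared-error loss, and one weight update per sample. The class name and the toy data are hypothetical; `W`, `V`, `theta`, `gamma` and `alpha` follow the symbols used above.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

class BPNetwork:
    """Three-layer BP network with n inputs, p hidden and q output neurons."""

    def __init__(self, n, p, q, alpha=0.5, seed=0):
        rng = random.Random(seed)
        self.W = [[rng.uniform(-1, 1) for _ in range(p)] for _ in range(n)]  # W[i][j]
        self.V = [[rng.uniform(-1, 1) for _ in range(q)] for _ in range(p)]  # V[j][t]
        self.theta = [rng.uniform(-1, 1) for _ in range(p)]  # hidden thresholds
        self.gamma = [rng.uniform(-1, 1) for _ in range(q)]  # output thresholds
        self.alpha = alpha

    def forward(self, x):
        """Return hidden outputs b and network outputs c for one sample."""
        b = [sigmoid(sum(self.W[i][j] * x[i] for i in range(len(x))) - self.theta[j])
             for j in range(len(self.theta))]
        c = [sigmoid(sum(self.V[j][t] * b[j] for j in range(len(b))) - self.gamma[t])
             for t in range(len(self.gamma))]
        return b, c

    def train_step(self, x, y):
        """One forward pass and one gradient-descent update; returns the sample error."""
        b, c = self.forward(x)
        # Correction errors of the output layer, then of the hidden layer.
        d = [(y[t] - c[t]) * c[t] * (1 - c[t]) for t in range(len(c))]
        e = [sum(d[t] * self.V[j][t] for t in range(len(d))) * b[j] * (1 - b[j])
             for j in range(len(b))]
        for j in range(len(b)):
            for t in range(len(d)):
                self.V[j][t] += self.alpha * d[t] * b[j]
        for t in range(len(d)):
            self.gamma[t] -= self.alpha * d[t]
        for i in range(len(x)):
            for j in range(len(e)):
                self.W[i][j] += self.alpha * e[j] * x[i]
        for j in range(len(e)):
            self.theta[j] -= self.alpha * e[j]
        return 0.5 * sum((y[t] - c[t]) ** 2 for t in range(len(c)))

# Hypothetical toy data: two samples with opposite labels.
samples = [([0.0, 1.0], [1.0]), ([1.0, 0.0], [0.0])]
net = BPNetwork(n=2, p=3, q=1)
first_error = sum(net.train_step(x, y) for x, y in samples)
for _ in range(2000):
    last_error = sum(net.train_step(x, y) for x, y in samples)
```

The sign conventions for the thresholds are a design choice of this sketch; with thresholds added instead of subtracted, the two threshold updates change sign accordingly.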
2) Support vector machines
A support vector machine (SVM) is an algorithm based on statistical learning theory. SVMs were originally used to solve binary classification problems. The method has clear advantages for small-sample, nonlinear and high-dimensional recognition problems. Even a low-complexity model can learn well in this setting; in other words, the model generalizes better from the samples.
Consider a sample set $(x_i, y_i)$, $i = 1, 2, \ldots, k$, $x_i \in R^n$, where $x_i$ is the feature vector of the i-th sample, with dimension n, and $y_i \in \{-1, +1\}$ is the class of the i-th sample. k is the total number of samples. The purpose of SVM classification is to find the optimal classification hyperplane $w \cdot x + b = 0$, $w \in R^n$, $b \in R$, in the high-dimensional feature space. This hyperplane divides the data into two separate half-spaces, and the data are classified according to the half-space they fall into. By the definition of the optimal classification hyperplane, finding it in the high-dimensional space can be transformed into a constrained quadratic programming problem. There are two requirements. First, the samples must be classified and recognized correctly in the high-dimensional feature space. Second, the distance from the nearest sample points to the classification hyperplane (the margin) must be as large as possible.
The mathematical expression of the optimal classification hyperplane is as follows:
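The expression itself is not reproduced in this text. Under the two requirements stated above, the standard hard-margin formulation is the following, given here as a sketch assuming linearly separable data; the symbols $w$, $b$, $x_i$, $y_i$ and $k$ are those defined above.

```latex
\min_{w,\,b} \ \frac{1}{2} \lVert w \rVert^2
\quad \text{subject to} \quad
y_i \,(w \cdot x_i + b) \ge 1, \qquad i = 1, 2, \ldots, k
```

The constraint enforces correct classification of every sample, and minimizing $\lVert w \rVert$ maximizes the margin $2 / \lVert w \rVert$ between the two classes.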
(2) Joint classification and recognition algorithm combining BP neural network and support vector machine
Considering the advantages and disadvantages of the above classifiers, a joint BP neural network-support vector machine classifier is constructed for classification and recognition. The construction process of the joint classifier is shown in Fig. 1.

The construction process of the joint classifier.
Data collection
The test data set used in the experiment was selected from the visual tracking benchmark data set. Four groups of video sequences featuring background movement, target occlusion, sudden changes of motion state, and target deformation and rotation were selected as the test group. The People sequence is people1 from the Hopkins155 data set, and the Car sequences are car1-car7 from the Hopkins155 data set. All images are 640 × 480 pixels.
Experimental environment
(1) Hardware platform
The HiSilicon Hi3531A is used as the main chip in the volleyball game analysis system. The Hi3531A is a professional SoC developed for multi-channel SD and HD DVR products. It is built around an ARM A9 dual-core processor and an H.264 video coding engine, and it integrates several video and image processing engines for complex algorithms, giving relatively high performance. The chip has a rich set of peripheral interfaces and supports native HDMI/VGA HD display. As a back-end video processing chip, it provides a high-performance, high-quality video solution. A photograph of the HiSilicon Hi3531A chip is shown in Fig. 2.

The physical picture of HiSilicon Hi3531A chip.
(2) Software environment
This paper uses the C++ language to implement the software of the intelligent recognition and analysis system for volleyball matches. C++ supports inheritance and polymorphism and allows object-oriented development. It is practical and robust. The software runs on the Hi3531A platform. The specific software environment is shown in Table 1.
Software environment
The intelligent analysis system of the volleyball match layout effect is mainly composed of four parts: camera equipment, video capture card, computer and software system. The intelligent analysis system of the volleyball match layout effect is shown in Fig. 3.

The intelligent analysis system of the volleyball match layout effect.
In actual live broadcasts, the camera is usually set up 10 meters behind the volleyball court and 5 meters above the ground, with an angle of less than 15° between the camera and the horizontal plane. Video obtained this way is well suited to later processing and analysis, so the same setup is adopted in this design. The camera is started to obtain the sequence of video frames. The captured frame sequence is sent to the video capture card and input to the computer, where it is processed and analyzed by the software system, and the result is output by the computer.
(1) Target detection algorithm test results analysis
In this paper, the proposed algorithm and two comparison algorithms, those of Sheikh and Yi, were evaluated statistically on the four test sets. For each algorithm and each sequence, the total number of targets, the number of missed detections and the number of false detections were counted, and the missed detection rate and the false detection rate were calculated. With missed detections and false detections weighted equally, the multiple object detection accuracy (MODA) is 1 minus the sum of the missed detection rate and the false detection rate. These three measurements provide a convincing reference for the moving target detection accuracy of each algorithm. The test results of the target detection algorithm are shown in Table 2 and Fig. 4.
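The accuracy measure described above can be written as a short function. This is a sketch assuming equal weights for missed and false detections; the function name and the counts in the example are hypothetical and are not results from the experiments.

```python
def moda(total_targets, missed, false_detections):
    """Detection accuracy: 1 minus (missed rate + false detection rate)."""
    if total_targets <= 0:
        raise ValueError("total_targets must be positive")
    missed_rate = missed / total_targets
    false_rate = false_detections / total_targets
    return 1.0 - (missed_rate + false_rate)

# Hypothetical counts: 200 targets, 2 missed, 4 false detections.
example_accuracy = moda(total_targets=200, missed=2, false_detections=4)
```

With these illustrative counts the missed rate is 1%, the false detection rate is 2%, and the accuracy is 97%.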
Test results of the target detection algorithm

Target detection algorithm test results.
Table 2 and Fig. 4 compare the test statistics of the algorithm proposed in this paper with those of the two comparison algorithms on the four test sets. Across the four sequences, the proposed algorithm has a missed detection rate of less than 1%, a false detection rate of less than 2%, and a detection accuracy of more than 97%. The Sheikh algorithm has a missed detection rate of 4% or less, a false detection rate of 10% or less, and a detection accuracy of 89% or higher. The Yi algorithm has a missed detection rate of less than 1%, a false detection rate of less than 10%, and a detection accuracy higher than 89%. In general, the Yi algorithm has the lowest missed detection rate, the algorithm proposed in this paper has the lowest false detection rate, and the Sheikh algorithm has the highest false detection rate.
(2) Test results analysis of player tracking algorithm
Using the tracking algorithm proposed in this paper and two comparison algorithms, the tracking results on the four test data sets were counted: for each sequence, the total number of targets and, for each algorithm, the numbers of missed targets, false positives and mismatches, from which the miss rate, false positive rate and mismatch rate were obtained. The multiple object tracking accuracy (MOTA) equals 1 minus the sum of the miss rate, the false positive rate and the mismatch rate. These tracking performance metrics give a convincing reference for judging the strengths and weaknesses of each algorithm in each scenario, and hence the accuracy with which each algorithm tracks moving targets. The test results of the target tracking algorithms are shown in Fig. 5.
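A sketch of the tracking accuracy computation follows, assuming that errors are accumulated over all frames of a sequence before the ratio is taken. The per-frame tuple layout, the function name and the example counts are hypothetical, not data from the experiments.

```python
def mota(frames):
    """Tracking accuracy over a sequence.

    frames: iterable of (targets, misses, false_positives, mismatches) per frame.
    """
    frames = list(frames)
    total_targets = sum(f[0] for f in frames)
    if total_targets <= 0:
        raise ValueError("sequence contains no targets")
    total_errors = sum(f[1] + f[2] + f[3] for f in frames)
    return 1.0 - total_errors / total_targets

# Hypothetical three-frame sequence with six targets per frame.
sequence = [(6, 0, 0, 0), (6, 1, 0, 0), (6, 0, 1, 1)]
example_mota = mota(sequence)
```

Summing over frames before dividing keeps frames with many targets from being under-weighted relative to averaging per-frame accuracies.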

Target tracking algorithm test results.
As can be seen from Fig. 5, both the CSRT algorithm and the tracking algorithm proposed in this paper track the target well. The KCF algorithm, however, loses the target after a large part of it is occluded and cannot recover correct tracking, so its tracking accuracy is very low. In the volleyball sequence with a moving background, where the target moves rapidly, rotates and deforms, the proposed tracking algorithm performs well, and CSRT also performs well when the target does not deform. When the target rotates, and especially when the target scale changes greatly, CSRT tracking shows large deviations and may fail, which lowers its tracking precision. The Car sequence has a complex background and involves multi-target tracking, yet every algorithm tracks it with high precision. From the overall data, the tracking algorithm proposed in this paper and the CSRT algorithm track more accurately and reliably than the KCF algorithm, and the CSRT algorithm has higher overall tracking accuracy than the tracking algorithm proposed in this paper.
(1) Analysis of frame-rate test results for the target detection and tracking algorithms
In this experimental environment, the number of frames processed per second by the detection and tracking algorithms was measured on the four data sets. The results are shown in Fig. 6.

Frame-rate test results of the detection and tracking algorithms.
As can be seen from Fig. 6, the algorithm proposed in this paper processes more frames per second than the Yi fast detection algorithm, and both process far more frames per second than the Sheikh algorithm. The detection time of the proposed algorithm is less than 0.2 s per frame, so detection is fast. The experimental results show that both the proposed algorithm and the Yi algorithm can process the video sequence in real time, while the Sheikh algorithm cannot meet real-time requirements. The throughput of the KCF tracking algorithm depends on whether the target size changes and fluctuates greatly, while the throughput of the CSRT tracking algorithm depends on the number of targets. The throughput of both the proposed tracking algorithm and the KCF algorithm exceeds 11 frames per second, which meets the real-time requirement and is much higher than that of the CSRT tracking algorithm.
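Per-frame time and frames per second are two views of the same quantity: a processing time below 0.2 s per frame corresponds to a rate above 5 frames per second. The sketch below shows the conversion and one simple way to measure throughput. How the paper measured it is not stated, so the timing helper, its name and the dummy workload are assumptions.

```python
import time

def fps_from_latency(seconds_per_frame):
    """Convert a per-frame processing time into frames per second."""
    if seconds_per_frame <= 0:
        raise ValueError("seconds_per_frame must be positive")
    return 1.0 / seconds_per_frame

def measure_fps(process_frame, frames):
    """Run process_frame over all frames and return the achieved frames per second."""
    start = time.perf_counter()
    for frame in frames:
        process_frame(frame)
    elapsed = time.perf_counter() - start
    return len(frames) / elapsed if elapsed > 0 else float("inf")

# Hypothetical stand-in for a detector or tracker applied to 50 frames.
def dummy_process(frame):
    return sum(range(1000))

measured = measure_fps(dummy_process, list(range(50)))
```

Measuring over a whole sequence rather than a single frame smooths out the per-frame fluctuations noted above for KCF and CSRT.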
(2) Analysis of target behavior recognition and classification test results
To verify the performance of the classifier, the BP neural network, the support vector machine and the joint BP neural network-support vector machine classifier are each used to classify the actions, so that the effectiveness of the proposed joint classifier can be verified. The results of the target behavior recognition and classification test are shown in Table 3 and Fig. 7.
Target behavior recognition and classification test results

Target behavior recognition classification test results.
As can be seen from Fig. 7, the BP neural network, the support vector machine and the joint BP neural network-support vector machine classifier constructed in this paper can all effectively recognize the specified training actions, with correct recognition rates above 94%. In terms of time efficiency, the joint classifier proposed in this paper takes slightly longer: feature extraction and classification take 76 ms per sample. However, it achieves a classification accuracy of 98%, the best result of the three classifiers. The combination of the BP neural network and SVM can effectively classify and recognize these actions. The experimental data demonstrate the effectiveness of the proposed joint classifier.
In this article, to avoid processing all pixels, the extracted corner points are used as image features. Based on background corner points and geometric constraints across consecutive frames shot by the mobile camera, the RANSAC algorithm is run repeatedly to improve the accuracy of fast moving-target corner detection, and the moving target regions are clustered quickly to achieve real-time detection of moving targets. The experiment verifies the expected effect of the detection framework proposed in this paper and shows the advantages of the proposed target detection algorithm.
This paper also proposes a Mean Shift real-time tracking method based on the Kalman filter and adaptive target region size, which makes up for shortcomings of existing target tracking algorithms. The experimental results show that the algorithm achieves high target tracking accuracy while meeting real-time processing requirements.
