Abstract
At present, most sports video is encoded with MPEG, which is the mainstream approach to video compression. Compression lowers video quality and introduces other practical problems. Existing methods for detecting scene conversion in sports video perform poorly on MPEG compressed video and often produce missed and false detections. To solve this problem, this paper proposes a fuzzy logic based method for detecting sports scene conversion in MPEG compressed video. Introducing fuzzy logic into scene conversion detection is the highlight of the method. First, the video images are preprocessed in the conventional way. The recognition and extraction of image features are then described in detail, and the extraction algorithm for image color features is further optimized. In the design of the detection method, the main innovation is to combine fuzzy logic with macroblock information. As in existing detection methods, different detection schemes are given for abrupt scene changes and gradual scene changes. Finally, to verify the practical effect of the method, an experimental comparison with the keyframe complexity detection method is carried out. Experiments on scene transition detection and processing time, followed by analysis of the data, show that the method has good accuracy and recall.
Introduction
With the significant improvement of multimedia compression technology and computer performance [1], as well as the rapid development of broadband networks, multimedia information systems that include MPEG digital video have become more and more widely used. Because MPEG is the mainstream format for dynamic video, its compression has obvious advantages, but an efficient scene transformation detection method is needed to go with it. Traditional browsing relies on manual search, such as fast forward and fast rewind, which is troublesome and time-consuming. It can no longer meet the needs of multimedia databases: users often want to give an example or a feature description and have the system automatically find the required video clips. For this reason, scene conversion detection on MPEG compressed video has become a research hotspot in this field. Scene transformation detection is the first step of video content analysis, and also the first problem to be solved. It can be realized in the pixel domain or in the compression domain. Detection in the compression domain reduces the computation spent on encoding and decoding, and makes detection simpler and more accurate [2]. Therefore, many current scene transformation detection algorithms work in the compressed domain. In video coding, the macroblock type information in the compressed domain is easy to extract and calculate, and several scene transformation detection algorithms based on this information have been proposed. Generally, such algorithms compare the number of macroblocks of each type in two successive frames, or apply a fixed threshold [3]. In practice, false and missed detections occur easily, so it is very important to find an effective adaptive scene transformation detection algorithm.
Video detection is a process in which the system automatically finds the required video clips once the user gives an example or a feature description of the clip to be detected [4]. On the one hand, video data is generated by camera shooting and some compression processing [5], and it has no structural division based on human understanding of the kind found in text data. On the other hand, the basic content of video data consists of physical signals such as images, which contain a great deal of complex physical information. Unlike text data, video data does not lend itself to simple content extraction and content comparison. Therefore, to achieve video detection, it is necessary to study the structure of video data and divide it into independent video units according to that structure, so as to obtain the structural characteristics of the video. Identifying each structural unit is the task of video scene transformation detection technology in content-based video detection [6–8].
In this paper, based on research into macroblock type information, a fuzzy logic based method for detecting sports scene conversion in MPEG compressed video is proposed. The core of the technique is to combine fuzzy logic with existing macroblock information detection, and to use the macroblock type information of B frames and P frames together for adaptive scene transformation detection. The algorithm uses a median filter to remove noise, and sliding window detection to locate the scene transformation position. It improves the detection rate while greatly shortening the detection time. Applying fuzzy logic to sports video scene detection simplifies the computation and skips tedious detection steps, making detection fast, convenient, and accurate. The method is also compared with the keyframe complexity detection method, and the experimental data further show that the method in this paper has a higher detection rate.
The rest of this paper is arranged as follows. The second section introduces related contributions, the technical problems of existing detection methods, and the optimizations made in this paper. The third section introduces the basic theory of video feature extraction. The fourth section, addressing the shortcomings of previous algorithms, proposes a sports video scene detection method based on fuzzy logic that handles the two cases of abrupt and gradual change. The fifth section compares the advantages and disadvantages of this method and the traditional method through experiments. The sixth section summarizes the advantages and practical application value of the algorithm.
Related works
Scene transform detection plays an important role in video retrieval, video summarization, target tracking, and video management. Over the past few decades, scene transform detection algorithms have been widely studied. These algorithms can be divided into pixel domain and compression domain approaches. In the pixel domain, a video editing model for shot boundary detection has been proposed that uses template matching, edge-based, model-based, and histogram detection methods [9]. Its disadvantage is that the modeling process is very complex, and each mutation type needs to be modeled. Li Fei [10] proposed a scene mutation detection method combined with the SIFT algorithm. The disadvantage of template matching is that it is strictly tied to pixel position and therefore very sensitive to camera or object motion. Noise and object motion increase the difference between frames, resulting in false scene transformation detections. The histogram method is also often unsatisfactory: because it distinguishes frames only by color data, it easily ignores differences in structure, and frames with similar colors are often misdetected. Moreover, detecting scene transformation in the pixel domain requires decoding the compressed video, which leads to a large number of operations and is not conducive to real-time processing. At present, scene transform detection is therefore mainly implemented in the compression domain; in an MPEG video stream in particular, the motion vectors and DC coefficients can be used for this purpose.
Existing macroblock information detection methods are limited by their feature thresholds. Because they do not consider all the information features of macroblocks, many false and missed detections appear. For MPEG video, Jiang Xing [11] proposed statistical features of the selection and truncation errors in the intra coding process, together with features based on macroblock mode obtained in the same process; connecting these two groups of features yields a powerful detection capability. However, because the video sequence shows multiple peaks around a scene mutation, the frame at which the mutation occurs cannot be located accurately [12]. For scene overlay, the correction formula for scene mutation detection is used and the number of bidirectionally predicted macroblocks in B frames is not considered, so the start and end frames of a scene gradient cannot be located accurately. Thus, although there is much research on video scene transformation detection, some technical problems remain. To solve them, this paper proposes a fuzzy logic based method for detecting sports scene conversion in MPEG compressed video.
Video feature extraction
Introduction to MPEG standard
MPEG is the abbreviation of the Moving Picture Experts Group. The expert group was established in 1988 to develop video and audio compression standards. It was initially authorized to develop coding standards for "moving pictures", and its scope was then extended to "and associated audio" and their combination. Later, because of differing application requirements, the restriction "for digital storage media" was lifted, and the group now develops standards for "moving picture and audio coding". MPEG has become the most popular and widely used technology in the field of video compression. With the development of the Internet and broadband, MPEG technology is used in more and more fields.
Color feature extraction
To a large extent, the content of an image is related to its color. Color features are pixel-based features [13], and the color histogram is the most frequently used one. A color histogram describes the proportion of different colors in the whole image, but it cannot describe the specific objects in the image. Histogram intersection, center distance, and distance measures are commonly used feature matching methods. A color model, also called a color space, may be RGB, HSV, Lab, Luv, and so on. Among them, RGB and HSV are the two most commonly used. Choosing the right color space plays an important role in feature extraction. These two color spaces are briefly introduced here [14, 15].
(1) RGB color model
The RGB color model is a widely used color model, which consists of R, G, and B values related to color brightness. Each of the three values ranges from 0 to 255. The three axes of the RGB model are R, G, and B. Three vertices of the color cube, on the coordinate axes, represent the primaries R, G, and B, and three others represent the complementary colors of the three primaries.
(2) HSV color model
Among the various color models, the HSV model is closer to human perception of color than the RGB model.
(3) RGB to HSV conversion
Both the RGB and HSV color spaces are used for common digital images, so colors must be converted between the two before the color histogram can be computed.
Suppose a color in RGB space has the values (r, g, b), where r, g, b ∈ {0, 1, 2, …, 255}, and let (h, s, v) be the corresponding color in HSV space. Set v' = max(r, g, b), and define r', g', b' as:
The conversion from RGB color space to HSV color space is then given by:
where h ∈ [0, 360], s ∈ [0, 1], and v ∈ [0, 1].
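The conversion can be illustrated with a short Python sketch. It follows one standard hexcone formulation that is consistent with the definitions above (v' = max(r, g, b) and the intermediate values r', g', b'); the function name `rgb_to_hsv` and the choice of hue 0 for black and gray pixels are assumptions made for illustration, and the exact formulas intended here may differ in detail.

```python
def rgb_to_hsv(r, g, b):
    """Convert 8-bit RGB values to (h, s, v).

    h is in degrees in [0, 360), s and v are in [0, 1].
    Assumption: hue is reported as 0 when it is undefined (black or gray).
    """
    v_max = max(r, g, b)              # v' in the text
    v_min = min(r, g, b)
    v = v_max / 255.0
    if v_max == 0:                    # black: saturation and hue undefined
        return 0.0, 0.0, 0.0
    s = (v_max - v_min) / v_max
    if v_max == v_min:                # gray: hue undefined
        return 0.0, 0.0, v

    span = v_max - v_min
    r1 = (v_max - r) / span           # r'
    g1 = (v_max - g) / span           # g'
    b1 = (v_max - b) / span           # b'

    # Select the hue sector from which channel is largest and which is smallest.
    if r == v_max:
        h1 = 5 + b1 if g == v_min else 1 - g1
    elif g == v_max:
        h1 = 1 + r1 if b == v_min else 3 - b1
    else:
        h1 = 3 + g1 if r == v_min else 5 - r1

    h = (60.0 * h1) % 360.0
    return h, s, v
```

Working on the integer channel values and dividing only at the end keeps the comparisons with the maximum and minimum exact, which is why the sector selection uses `==` safely.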
Moment estimation is a simple method for the quantitative analysis of histograms, which uses statistical quantities such as the mean and variance to analyze the histogram curve. Each evaluation parameter has its own characteristics, and one or several parameters are selected according to the actual application [16–18]. The evaluation parameters are defined as follows:
(1) Mean:
(2) Variance:
(3) Skewness:
(4) Kurtosis:
where k_i is the gray value corresponding to gray level i = 0, 1, 2, ⋯, N − 1, N is the number of gray levels, and f(k_i) is the probability of gray value k_i in the normalized gray histogram.
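The four parameters can be computed from a histogram as in the following Python sketch. It assumes the usual statistical definitions (mean as the first moment, variance as the second central moment, and skewness and kurtosis as the third and fourth central moments divided by σ³ and σ⁴), takes k_i = i, and reports kurtosis without subtracting 3; the function name `histogram_moments` is hypothetical, and the definitions intended here may differ in normalization.

```python
import math


def histogram_moments(hist):
    """Return (mean, variance, skewness, kurtosis) of a gray-level histogram.

    hist[i] is the count (or probability) of gray value k_i = i.
    Assumption: standard moment definitions; kurtosis is not excess kurtosis.
    """
    total = float(sum(hist))
    if total == 0:
        raise ValueError("empty histogram")
    f = [c / total for c in hist]                 # normalized histogram f(k_i)

    mean = sum(k * p for k, p in enumerate(f))
    var = sum((k - mean) ** 2 * p for k, p in enumerate(f))
    if var == 0:
        # A single gray level: higher moments are undefined, report 0.
        return mean, 0.0, 0.0, 0.0

    sigma = math.sqrt(var)
    skew = sum((k - mean) ** 3 * p for k, p in enumerate(f)) / sigma ** 3
    kurt = sum((k - mean) ** 4 * p for k, p in enumerate(f)) / sigma ** 4
    return mean, var, skew, kurt
```

For a symmetric histogram such as `[1, 2, 1]` the skewness is zero, which is a quick sanity check on any implementation.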
After the first step, shot boundary detection, there are still many redundant frames in each video shot. Selecting keyframes eliminates these redundant frames, which greatly reduces the amount of data in the video index and improves retrieval efficiency. Keyframes are extracted because they represent, in static form, the key content of the video. In general, keyframes are extracted conservatively: when it is unclear which frame is the right one, several keyframes are chosen. At present, the popular keyframe extraction methods are based on shot boundaries, motion analysis, image information, PGF, shot active frames, and video clustering [19].
Scene boundary detection
Scene boundary detection is also called scene segmentation. The steps are as follows: after detecting the shot boundaries, extract keyframes according to the shot content, combine similar shots into shot classes according to certain rules, and finally analyze the correlation between shot classes to complete the scene boundary detection. In this process, the video content is understood mainly at the semantic level, so the quality of automatic semantic feature extraction affects the effect and quality of video retrieval to a certain extent [20].
Video scene detection method of fuzzy system
Basic thought
Existing scene switching retrieval methods can be divided into two groups: compressed domain based and uncompressed domain based. Results show that compressed domain based detection performs better than uncompressed domain based detection. The common approach is to extract useful features, such as DCT coefficients, DC coefficients, and motion vectors, directly from the compressed data, which greatly reduces the computational complexity. Considering the characteristics of MPEG video data and of sports video, we find that the fuzzy logic method is the most suitable for scene switching retrieval in MPEG sports video [21]. This section therefore analyzes how to use fuzzy logic to detect scene switching.
The basic theory of fuzzy logic
In 1965, Zadeh introduced fuzzy set theory to represent and process data and information containing nonstatistical uncertainty. The theory was proposed for the mathematical representation of uncertainty and fuzziness, and it provides a powerful tool for dealing with the imprecision of many problems. Fuzzy logic provides a form of reasoning that allows approximate, human-like reasoning to be applied to systems with unknown or uncertain models [22]. A fuzzy inference system is a mathematical framework based on fuzzy set theory, fuzzy if-then rules, and fuzzy reasoning, and it can handle uncertain and imprecise information. It is mainly composed of four parts: fuzzification, the fuzzy rule base, fuzzy inference, and defuzzification. The system diagram is shown in Fig. 1:

Structure of the fuzzy reasoning system.
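The four stages of a fuzzy inference system can be illustrated with a small Python sketch. It is not the rule base used in this paper: the two inputs (`intra_ratio`, the fraction of I-mode macroblocks in a reference frame, and `forward_ratio`, the fraction of F-mode macroblocks in the preceding B frames), the membership breakpoints 0.3 and 0.7, the three rules, and their output values are all hypothetical, chosen only to show how fuzzification, rules, inference, and defuzzification fit together.

```python
def ramp_up(x, lo, hi):
    """Membership that rises linearly from 0 at lo to 1 at hi."""
    if x <= lo:
        return 0.0
    if x >= hi:
        return 1.0
    return (x - lo) / (hi - lo)


def scene_change_score(intra_ratio, forward_ratio):
    """Toy fuzzy inference: returns a scene-change score in [0, 1].

    All membership breakpoints and rule outputs are illustrative assumptions.
    """
    # 1) Fuzzification: map crisp ratios to degrees of "high" and "low".
    intra_high = ramp_up(intra_ratio, 0.3, 0.7)
    intra_low = 1.0 - intra_high
    fwd_high = ramp_up(forward_ratio, 0.3, 0.7)
    fwd_low = 1.0 - fwd_high

    # 2) Rule base and 3) inference: AND is taken as min; each rule has a
    #    crisp output level (a zero-order Sugeno-style consequent).
    rules = [
        (min(intra_high, fwd_high), 1.0),  # intra high AND forward high -> change
        (min(intra_high, fwd_low), 0.6),   # intra high AND forward low  -> possible
        (intra_low, 0.0),                  # intra low                   -> no change
    ]

    # 4) Defuzzification: weighted average of the rule outputs.
    total = sum(weight for weight, _ in rules)
    if total == 0:
        return 0.0
    return sum(weight * out for weight, out in rules) / total
```

A weighted-average defuzzifier is used here because it needs no numerical integration; a centroid over output membership functions would be the usual alternative.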
An abrupt scene change usually refers to a scene conversion that is completed within a very short time. It may occur between a P frame and an I frame, or between a Bf frame and a Br frame. Considering the macroblock information of P frames and B frames, this paper divides abrupt scene changes into three types according to where the switch occurs: before a P or I frame, before the Bf frame, and before the Br frame.
(1) The relationship between the video frame and macroblock
1) If the scene switches before a P frame or I frame, most of the macroblocks in the Pr frame are in I mode, because the Pr frame is the first frame of the new scene. In the B-frame pair (the Bf and Br frames), macroblocks are mostly in F mode, because these frames are more similar to the I frame than to the Pr frame.
2) If the scene switches before the Bf frame, a considerable number of macroblocks in the Pr frame are in I mode, since the Pr frame is the first reference frame of the new scene.
3) If the scene switches before the Br frame, many macroblocks in the Pr frame are in I mode. In addition, the macroblocks of the Bf frame are mostly in F mode, and those of the Br frame are mostly in B mode.
(2) Solutions
We find that the macroblock information of each kind of abrupt scene change has its own characteristics, and that the macroblock pattern depends on the location of the scene switch. Based on these characteristics, we propose the following processing method: considering the changes in P frames and B frames, there are 7 cases in total, as shown in Table 1.
Decision rules of scene mutation type
According to observation, the following 5 modes (210440740270) indicate a scene mutation point.
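The relationship between macroblock modes and the position of an abrupt change, as described in cases 1) to 3) above, can be sketched in Python. This is an illustration and not the decision table of this paper: the mode labels, the function names, the 0.5 dominance threshold, and the rule for a switch before the Bf frame (both B frames dominated by B mode, which the text does not state explicitly) are assumptions.

```python
from collections import Counter


def dominant_mode(modes, threshold=0.5):
    """Return the macroblock mode covering more than `threshold` of a frame.

    `modes` is a list of labels such as "I", "F", "B", "Bi".
    Returns None if the frame is empty or no mode dominates.
    """
    if not modes:
        return None
    mode, count = Counter(modes).most_common(1)[0]
    return mode if count / len(modes) > threshold else None


def classify_abrupt_change(pr_modes, bf_modes, br_modes, threshold=0.5):
    """Guess where an abrupt scene change falls in a (Bf, Br, Pr) group.

    Returns "before Pr", "before Bf", "before Br", or None.
    """
    pr = dominant_mode(pr_modes, threshold)
    bf = dominant_mode(bf_modes, threshold)
    br = dominant_mode(br_modes, threshold)

    if pr != "I":
        return None                 # Pr frame is not intra-dominated
    if bf == "F" and br == "F":
        return "before Pr"          # case 1: both B frames look backward in time
    if bf == "F" and br == "B":
        return "before Br"          # case 3: change falls between Bf and Br
    if bf == "B" and br == "B":
        return "before Bf"          # case 2 (assumed pattern, see lead-in)
    return None
```

In a real detector the hard threshold would be replaced by fuzzy membership degrees, so that frames near the boundary are not forced into one class.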
A scene transition gradient is a slow transition from one scene to another without an obvious scene jump [23]. In related work, a forward macroblock rate was proposed to detect the scene transition gradient. However, because the transition was not subdivided into types and no supplementary conditions were adopted, the test results were not ideal (the recall rate was only about 55%). By observing and analyzing scene change information in sports video, we find that the scene transition gradient in sports video varies in length. According to the characteristics of sports video, we divide the scene transition gradient into two types: a short scene transition gradient, which appears in only one frame, and a long scene transition gradient, which lasts for a period of time and is reflected in several frames.
(1) The relationship between scene gradient and macroblock information
1) Short scene transition gradient
In a short scene transition gradient, I-mode macroblocks are dominant in the P frame or B frame, as the macroblock type statistics show. Although this feature is similar to scab and Scarb, the number of macroblocks in this mode differs greatly: in a short scene transition gradient, the number of I-mode macroblocks is significantly higher than in scab and Scarb. In other words, when a short scene transition gradient occurs the frame image type is 1, and when scab or Scarb occurs the frame image type is 7.
2) Long scene transition gradient
In a long scene transition gradient, the macroblock information shows certain time-domain distribution characteristics. For example, in B frames most macroblocks tend toward the Bi mode. We find that within the long gradient interval, the backward macroblocks of each frame have certain characteristics, which we describe with the backward macroblock rate (BMBR). The BMBR is defined as follows:
where num_f is the number of macroblock information entries in each frame of the image, and num_b is the number of macroblocks in B mode in each frame of the image.
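A possible way to compute and use the BMBR is sketched below in Python. The sketch assumes that the BMBR is the ratio num_b / num_f, and it combines the rate with the median filter and sliding window mentioned in the Introduction. The filter size, the threshold of 0.5, the minimum run length, and the assumption that a long gradient appears as a sustained run of high BMBR values are all illustrative and are not taken from this paper.

```python
from statistics import median


def bmbr(num_b, num_f):
    """Backward macroblock rate, assumed here to be num_b / num_f."""
    return num_b / num_f if num_f else 0.0


def median_filter(values, size=3):
    """Odd-sized median filter; windows are truncated at the sequence edges."""
    half = size // 2
    return [
        median(values[max(0, i - half): i + half + 1])
        for i in range(len(values))
    ]


def long_gradient_intervals(rates, threshold=0.5, window=3):
    """Return (start, end) frame index pairs of candidate long gradients.

    A candidate is a run of at least `window` consecutive frames whose
    median-filtered rate is at or above `threshold` (assumed criterion).
    """
    smooth = median_filter(rates)
    intervals = []
    start = None
    # A sentinel below any threshold closes a run that reaches the last frame.
    for i, rate in enumerate(smooth + [float("-inf")]):
        if rate >= threshold and start is None:
            start = i
        elif rate < threshold and start is not None:
            if i - start >= window:
                intervals.append((start, i - 1))
            start = None
    return intervals
```

The median filter suppresses isolated spikes, so a single noisy frame does not open a candidate interval, while the run-length check plays the role of the sliding window in locating the start and end frames.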
(2) Solutions
Based on the above analysis of the macroblock information for the two types of scene transition gradient, we detect a short transition gradient from the frame image type, that is, from a B frame or P frame in which one macroblock mode is present in large numbers. When the image type is 1, a short scene switch has occurred. The advantage is that the complex calculation of the BMBR value in B frames is avoided, so the amount of computation is reduced. Because the method controls the amount of information at the source, it not only reduces computation but also improves detection accuracy.
Experimental results and analysis of scene mutation
To illustrate the performance of the proposed algorithm, it is compared with an existing algorithm. Many scene mutation algorithms have been proposed; since the algorithm in this paper mainly uses macroblock type information and fuzzy logic to detect scene mutation, the keyframe complexity detection method is chosen as the reference. The algorithm proposed in this paper is denoted algorithm 1, and the keyframe complexity detection method is denoted algorithm 2. The two algorithms are tested on the same video library.
It can be seen from Table 2 and Fig. 2 that the detection performance of the algorithm in this paper is at least 10 percentage points higher than that of algorithm 2, both on each single video library and overall, and that its recall rate and precision rate both exceed 90%. The data show that the proposed fuzzy logic based scene detection method detects scene mutations accurately and effectively.
Scene mutation test results of different algorithms

Analysis chart of scene mutation experiment results of different algorithms.
From Table 3 and Fig. 3, it can be seen that the results of the fuzzy logic based detection method in this paper are significantly better than those of the traditional detection method, with every detection rate above 80%. The main cause of false detection is a change of color or lighting in some video clips, which also changes the macroblock type information, as do dissolve, fade-in, and fade-out effects. The main causes of missed detection are the sliding window, the median filter, and the chosen threshold. This analysis shows that the proposed algorithm can effectively locate the start and end frames of a scene gradient, and that, compared with algorithm 2, it greatly reduces both false and missed detections.
Scene gradient experimental results of different algorithms

Analysis diagram of scene gradient experiment results of different algorithms.
To calculate the processing time of this method, it is necessary to measure the playback time of the original MPlayer and the processing time of the keyframe complexity detector.
From the experimental results in Table 4 and Fig. 4, it can be seen that the processing time of this system is very short: the average processing time over four experiments (more than 400 minutes of video) is 0.28 ms. Therefore, the method can detect scene switching in real time and is suitable for sports video with strict processing time requirements. Compared with algorithm 2, the method is also much faster, which makes it better suited to sports video.
Comparative analysis of video processing time of different algorithms

Comparative analysis of video processing time of different algorithms.
The comparison of scene detection based on frequent shot sets shows the effectiveness of the algorithm. The comparison experiment tests the correlation for threshold values from 10 to 1000 in video library 1, video library 2, and video library 3.
From the experimental results in Table 5 and Fig. 5, it can be seen that the accuracy and recall rate of the method in this paper are better than those of algorithm 2, and that its F value is more than 35% higher. The three video libraries contain 57, 63, and 64 scenes, respectively. In the detection process, in contrast to the NCuts stage of the keyframe complexity detection method, this method directly uses the scene number as the parameter input. The results show that the fuzzy system based video detection method performs better on this dataset.
Comparative analysis of scene detection of frequent lens sets with different algorithms

F-value analysis chart under frequent shot set scene detection of different algorithms.
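The evaluation measures used in this section can be computed from detection counts as in the following Python sketch. It assumes that the F value is the harmonic mean of precision and recall (the F1 score), which is the usual choice but is not stated explicitly here; the function name and the counts in the usage line are hypothetical and are not experimental data from this paper.

```python
def precision_recall_f(correct, false_alarms, missed):
    """Return (precision, recall, F) from detection counts.

    correct      -- detected transitions that are real transitions
    false_alarms -- detected transitions that are not real (false detections)
    missed       -- real transitions that were not detected (missed detections)
    Assumption: F is the harmonic mean of precision and recall (F1).
    """
    detected = correct + false_alarms
    actual = correct + missed
    precision = correct / detected if detected else 0.0
    recall = correct / actual if actual else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_value = 2 * precision * recall / (precision + recall)
    return precision, recall, f_value


# Hypothetical counts for illustration only.
p, r, f = precision_recall_f(correct=90, false_alarms=10, missed=10)
```

Reducing false detections raises precision and reducing missed detections raises recall, so the F value rewards a method only when both improve together.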
Conclusion
With the development of network and intelligent technology, people place higher requirements on dynamic video. Sports video is one of the most common kinds of dynamic video, and it is usually broadcast live so that viewers see the game as it happens. Because the objects tracked in sports video are often moving targets and the scenes change over a large span, sports video is a major test for existing scene conversion detection technology. How to detect scene conversion in sports video on the basis of MPEG compressed video is one of the research hotspots in the video field. In view of this, this paper proposes a video scene detection method based on fuzzy logic. The method combines fuzzy logic with existing detection methods and uses fuzzy theory to optimize the algorithm and steps of the traditional detection method. The core of the method is to use macroblock type information to detect scene conversion, and to convert the color model during image feature extraction.
