Abstract
Music education content currently suffers information loss during real-time transmission, which degrades teaching effectiveness. This study presents a video image acquisition and processing system built on an FPGA platform, on which video acquisition, conversion, storage, display, and transmission are implemented and simulated. The design addresses long-distance real-time transmission of high-definition video streams and, to improve the subjective quality of the video recovered at the receiving end, adds algorithmic processing that reduces blockiness in the displayed video. A validation test was designed to verify the approach. The experimental results show that the system works stably, realizes video image acquisition, conversion, display, and transmission, and achieves the design goal. Owing to time constraints and other conditions, the design still needs further improvement before it can serve as a basis for subsequent music education information communication technology.
Introduction
Compared with traditional education, online education is a new educational model that can break through the limitations of time and space, help people learn anywhere, anytime, and let more learners share excellent educational resources. At the same time, online education is characterized by openness, interactivity, collaboration, and autonomy. Therefore, more people, especially those who cannot study on campus, can receive higher education, which makes online education the preferred form of lifelong education [1].
With the gradual improvement of living standards, people’s demand for music education is increasing. However, compared with general education, music education is more limited by geographical conditions and teachers. Multimedia online education can provide more people with the opportunity to receive higher music education and provide better learning conditions.
However, online music education requires high-fidelity audio and video courseware to be delivered smoothly over the network, together with advanced services [2]. At present, its biggest bottleneck is limited bandwidth and technical means. Jingfang Yuan, dean of the School of Modern Distance Music Education at the Central Conservatory of Music, once suggested that remote means cannot solve the problem of music teaching: sound and images transmitted over the existing bandwidth are seriously distorted, so the precision of face-to-face oral instruction cannot be achieved. However, as home broadband expands from the recently popular 512 K to the current 1 M, 2 M, or even 4 M, as network education technology keeps improving, and as server performance rapidly strengthens, online music education faces an excellent development opportunity. We should therefore make full use of Internet technology and bring online music education to a practical level that fully satisfies teaching purposes [3].
The application of networked digital transmission systems to music education information in China still has many shortcomings, and China lags behind developed countries in digital transmission systems. Areas such as power system dispatching and civil aviation management continue to use foreign transmission equipment, which is far from the goals of Made in China 2025. Moreover, in some civil and other settings in China, networked digital transmission is realized mainly through optical network interfaces [4], which inevitably brings cost pressure and tedious maintenance work. The history of video transmission and the current situation at home and abroad show that the quality requirements for high-definition video keep rising in China, and that the requirements on transmission distance and delay for high-definition video streams are becoming more demanding as well. The transmission of high-definition video streams therefore has broad application prospects [5]. Based on this, this study applies SVGA video transmission technology to music education information dissemination, aiming to improve real-time video transmission and promote the development of computer technology in the education industry.
Related work
One of the main purposes of video coding is to reduce the temporal and spatial correlation between video images while maintaining a certain quality, and to remove redundancy within video sequences to achieve high compression efficiency. Building on Shannon information theory and digital signal processing, predictive coding, transform coding, and statistical coding effectively remove the temporal and spatial correlation of video and form the basis of current general-purpose coding standards. Over more than 50 years of development, image compression coding has made significant progress in both theoretical research and practical applications, and with the development of computer and network technology it has advanced especially rapidly in the past decade. At present, there are two major families of video coding standards. One is the H.26x series developed by the International Telecommunication Union (ITU), which is used mainly for real-time video communication such as video conferencing and video telephony. The other is the MPEG-x series developed by the International Organization for Standardization and the International Electrotechnical Commission (ISO/IEC), which is used mainly for video storage, broadcast television, and video streaming over networks [6].
In practical video communication services, real-time video transmission has a wide range of applications. However, because the Internet is based on a “best effort” service model, end-to-end real-time performance and transmission quality cannot be fully guaranteed. Improving the quality of real-time video transmission over the network faces several challenges: available bandwidth, delay, packet loss, and bit rate fluctuation. Available bandwidth: The Internet does not reserve bandwidth for video transmission applications, and traditional routers do not participate in network congestion control. Therefore, when the available bandwidth is lower than the encoding bit rate of the video, video quality cannot be guaranteed. Delay: Video communication is a real-time service. Although the codec has some buffering capability, the buffer in a real-time system is generally small. The end-to-end delay must also be tightly controlled (typically within 500 ms); when the delay exceeds what the buffer can absorb, late packets are discarded by the receiver, which degrades image quality. Packet loss: Packet loss is the root cause of image quality degradation. Because media packets are interrelated and different data differ in importance for reconstructing images, even a small amount of packet loss (such as the loss of I-frame data) may cause the decoder to discard other related packets, resulting in a significant reduction in image quality [7]. Rate fluctuation: The receiver needs a stable bitstream for playback. When the video bit rate fluctuates greatly, the decoder actively drops packets to keep the video quality consistent, which also degrades image quality. The IPv4 protocol did not take the needs of real-time transmission services into account when it was first developed. 
Although many improvements and extensions have been made since then, it is still not well suited to real-time multimedia video communication. To transmit real-time video over an IP network, the characteristics of video transmission and the existing network protocols must be fully studied, and an appropriate transmission protocol should be selected to match the video characteristics. Multimedia applications are characterized by large data volumes, and many of them require strong real-time performance. The traditional TCP protocol is connection-oriented, and the delays introduced by its retransmission and congestion control mechanisms are unacceptable in real-time transmission systems [8].
RTP (Real-time Transport Protocol) is a protocol for transmitting real-time data over the Internet. It is an application-layer protocol for real-time streaming media, and its details are specific to the application. The Internet Engineering Task Force (IETF) has also published standards (RFCs) for carrying specific media over RTP, namely RFC 2190 for H.263 and RFC 2250 for MPEG-1/2. When an MPEG video stream is encapsulated in RTP, feature information of the MPEG stream such as the time stamp and coding hierarchy is copied into the RTP header, and the MPEG video is parsed and packetized down to the slice layer. The terminal system can therefore analyze the RTP headers to detect transmission errors and choose suitable error control and error concealment methods [9]. In addition, the IETF has developed RFC 3984 for H.264 transmission, taking the coding characteristics of H.264 into account. RTP is typically run on top of UDP. RTP itself only carries real-time data; it does not guarantee reliable or in-order delivery of packets, nor does it provide flow control or congestion control. Instead, it relies on RTCP to support these services. RTCP is responsible for monitoring transmission quality and exchanging control information between the participating application processes [10]. Receivers periodically send RTCP packets to the sender, and the sender uses the statistics they carry, such as the number of packets transmitted and the number of packets lost, to adjust its transmission rate dynamically to network conditions. The combination of RTP and RTCP optimizes transmission efficiency through effective feedback and minimal overhead and is especially suitable for real-time transmission of video [11]. 
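As a minimal illustration of the header information discussed above, the following Python sketch packs and parses the 12-byte fixed RTP header defined in RFC 3550. The payload type, SSRC, and timestamp values used with it are arbitrary example values, not values taken from this system, and the function names are hypothetical.

```python
import struct

RTP_VERSION = 2
# Fixed header layout: V/P/X/CC byte, M/PT byte, sequence number, timestamp, SSRC
_HEADER = struct.Struct("!BBHII")


def pack_rtp_header(payload_type, seq, timestamp, ssrc, marker=False):
    """Build the 12-byte fixed RTP header (no padding, extension, or CSRC list)."""
    byte0 = RTP_VERSION << 6                              # P = 0, X = 0, CC = 0
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)    # marker bit + payload type
    return _HEADER.pack(byte0, byte1, seq & 0xFFFF,
                        timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)


def parse_rtp_header(data):
    """Decode the fixed header fields from the first 12 bytes of a packet."""
    byte0, byte1, seq, timestamp, ssrc = _HEADER.unpack(data[:12])
    return {
        "version": byte0 >> 6,
        "marker": bool(byte1 >> 7),
        "payload_type": byte1 & 0x7F,
        "seq": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
    }
```

A receiver can detect lost or reordered packets from gaps in the `seq` field, which is the kind of information that RTCP reports summarize for the sender.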
Subjective evaluation standards and methods for video are already well established, but multimedia communication services also need communication-related quality evaluation criteria. The following criteria are often used to describe the subjective visual effect of the received video. Image hopping: the motion between images is not smooth and resembles “fast forward”. Possible causes are packet loss due to network congestion, and packet or frame dropping or a reduced frame rate introduced by the encoder under a constant bit rate (CBR) constraint. Blockiness: Blockiness can occur with all compression based on the DCT [12]. The main cause is transmission errors: because the DCT is performed on an entire block, an error corrupts the result for the whole block. Encoding at a high compression ratio also causes a certain degree of blockiness. Blur: Blur is caused by the loss of high-frequency detail in the image. It may be introduced deliberately by the encoder to meet a fixed bit rate, and transmission errors and packet loss also cause blurring [13].
Network congestion control
With the development of Internet and multimedia technologies, real-time video transmission has become one of the important applications in the network. However, the video transmission delay and packet loss caused by network congestion have a great impact on the video transmission quality. The most essential reason for congestion during transmission is that the user’s demand for resources in the network is much larger than the upper limit that the resource can provide. Since a large number of users share various resources in the network and the scale of Internet usage has soared in recent years, the probability of congestion is greatly increased. The cause of congestion is attributed to the following two points:
(1) Network transmission bandwidth is limited. According to Shannon’s channel capacity theorem, the capacity of the channel must be greater than or equal to the rate of the transmitting end; when the channel capacity is less than the signal transmission rate, accurate transmission is impossible. A bandwidth bottleneck therefore forms at the low-speed links of the network, and when such a link cannot meet the bandwidth requirements of all senders and receivers, the system becomes congested [14].
(2) Storage space is too small. During video stream transmission, buffer space often runs short, so a large amount of data is discarded, and the loss is more serious when a burst video stream is present. To some extent, packet dropping can be relieved by increasing the buffer space. However, even in a router with unlimited storage capacity, packets waiting in the queue to be forwarded may time out, so increasing the storage capacity of the router does not reduce the burden of transmitting the video stream and can make congestion more serious.
Internet transmission is a complex system, and from a control perspective there are many difficulties in ensuring real-time video transmission, such as the dynamic variability of the transmission environment, the difficulty of network modeling, and the uncertainty about which control strategy to use. Despite this, researchers have made great efforts in applying control theory and optimization theory to the analysis and design of real-time multimedia transmission over networks. Among them, the TCP/AQM differential equation model given by Misra provides a thorough treatment of how to model congestion control. Based on this model, the real-time queue length at the Internet source side is used as input, and various control strategies are designed to adapt to the transmission of complex and variable video streams in the network [15].
Although the RED algorithm can avoid global synchronization, improve link bandwidth utilization, and reduce the average queue length, many problems remain to be solved. For example, when congestion occurs under a given traffic condition, it causes a steady-state error and delay jitter in the packet queue length. Solving these problems helps ensure the real-time performance of video transmission and the decoding quality at the receiving end. Based on an effective estimate of network congestion, the algorithm can reduce network delay and the packet loss rate and improve video transmission efficiency. To judge the dynamic characteristics of the network under different flow conditions, certain assumptions must be met to establish a fluid model [16]. Under these assumptions, the following equation is used to describe the control mechanism [17]:
The change in queue length can be described by a differential equation as:
Here W(t) is the window size of the TCP flow, q(t) is the queue length, T_p is the transmission delay, R(t) is the round-trip transmission delay, C is the link capacity, and N is the number of TCP connections. A block diagram of the video fluid model can be drawn according to Equations (1) and (2), as shown in Fig. 1 [18].
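For reference, a commonly cited form of the TCP/AQM fluid model of Misra et al., written with the symbols defined above, is given below. It is assumed, not confirmed by the text, to correspond to Equations (1) and (2); the symbol $p(t)$, the packet marking or dropping probability, is introduced here for the sketch.

```latex
\begin{align}
\dot{W}(t) &= \frac{1}{R(t)}
  - \frac{W(t)\,W\bigl(t-R(t)\bigr)}{2\,R\bigl(t-R(t)\bigr)}\,p\bigl(t-R(t)\bigr), \\
\dot{q}(t) &= \frac{W(t)}{R(t)}\,N(t) - C,
  \qquad R(t) = \frac{q(t)}{C} + T_p .
\end{align}
```

The first relation models additive increase and multiplicative decrease of the window; the second states that the queue grows at the aggregate arrival rate minus the link capacity.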

Fig. 1. Block diagram of the video fluid model.
In the literature [7], assuming
where R_0 = q_0/C + T_p and N(t) = N_0.
At the equilibrium point, the local range is defined:
where W_R = W(t − R).
From Equations (5) and (6), we can obtain:
So:
Among them,

Fig. 2. Linearization block diagram of the congestion mechanism.
H1 denotes the transfer function from δp to δW, and H2 denotes the transfer function from δW to δq [19].
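In the standard small-signal analysis of the TCP/AQM fluid model around an operating point $(W_0, q_0, p_0)$, the simplified linearization and the resulting transfer functions are usually written as below. This form is assumed here for illustration and is not taken from the text; the negative sign of the $\delta p$ term is conventionally absorbed into the negative feedback junction of the block diagram, so $H_1$ is shown with a positive gain.

```latex
\begin{align}
\delta\dot{W}(t) &= -\frac{2N_0}{R_0^{2}C}\,\delta W(t)
  - \frac{R_0C^{2}}{2N_0^{2}}\,\delta p(t-R_0), \\
\delta\dot{q}(t) &= \frac{N_0}{R_0}\,\delta W(t) - \frac{1}{R_0}\,\delta q(t), \\
H_1(s) &= \frac{R_0C^{2}/(2N_0^{2})}{s + 2N_0/(R_0^{2}C)}, \qquad
H_2(s) = \frac{N_0/R_0}{s + 1/R_0},
\end{align}
```

with $W_0 = R_0C/N_0$ and $W_0^{2}p_0 = 2$ at equilibrium.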
A block diagram of the flow-controlled transfer function model can be obtained according to Equations (15) and (16), as shown in Fig. 3.

Fig. 3. Block diagram of the transfer function model.
As can be seen from Fig. 3, the transfer function of the entire controlled system is:
For the derived model, it can be seen from Equation (17) that the system model is second-order and all poles are located in the left half-plane. If the system model is accurate, the classic PI control can make the system converge to the expected value quickly. However, in the actual transmission system, each parameter may change randomly. Therefore, in the process of designing the control strategy, it is mainly considered whether the algorithm can quickly adapt to changes in network parameters, whether it can have self-adjusting ability and anti-interference ability when the network parameters change, and whether it can quickly return to the steady state value [20].
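The statement that the model is second-order with both poles in the left half-plane can be checked numerically. The sketch below assumes the controlled system has the commonly used form P(s) = (C²/(2N)) e^(−sR0) / ((s + 2N/(R0²C)) (s + 1/R0)), which is an assumption rather than a formula quoted from the text; the function names are hypothetical.

```python
def plant_poles(n_flows, capacity_pps, rtt_s):
    """Return the two real poles of the assumed plant (the delay term adds no poles)."""
    pole_tcp = -2.0 * n_flows / (rtt_s ** 2 * capacity_pps)   # window dynamics
    pole_queue = -1.0 / rtt_s                                  # queue dynamics
    return pole_tcp, pole_queue


def dc_gain(n_flows, capacity_pps, rtt_s):
    """Steady-state gain P(0) = (R0 * C)^3 / (4 * N^2) of the assumed plant."""
    return (rtt_s * capacity_pps) ** 3 / (4.0 * n_flows ** 2)


def is_open_loop_stable(n_flows, capacity_pps, rtt_s):
    """True when both poles lie strictly in the left half-plane."""
    return all(p < 0 for p in plant_poles(n_flows, capacity_pps, rtt_s))
```

Because N, C, and R0 are all positive, both poles are negative for any physically meaningful operating point. Their locations move as the number of flows or the round-trip time changes, which is why the controller needs to tolerate parameter variation.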
The system adopts the two-dimensional integer discrete cosine transform (DCT) as its intraframe compression algorithm, but the image recovered at the receiving end is still seriously distorted. An analysis of the conventional coefficient quantization and reconstruction processes shows that the distortion is caused mainly by coefficient quantization in the two-dimensional integer DCT. The coefficient quantization and reconstruction processes of the classical integer DCT are as follows:
In the formula, F_Q(u, v) is the quantized two-dimensional integer DCT coefficient, F_D(u, v) is the two-dimensional integer DCT coefficient before quantization, Q(u, v) is the quantization step size of the quantization process, F_R(u, v) is the reconstructed two-dimensional integer DCT coefficient, and round(*) is the rounding function. Then, the error of the two-dimensional integer DCT coefficient quantization is:
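The quantities defined above follow the usual uniform-quantizer pattern: F_Q = round(F_D / Q), F_R = F_Q · Q, and error e = F_D − F_R. The Python sketch below assumes exactly that form, with ties rounded away from zero (an assumption, since the rounding rule is not specified), and uses plain nested lists for a coefficient block.

```python
import math


def round_half_away(x):
    """Round to the nearest integer, ties away from zero.

    Python's built-in round() rounds ties to even, so it is not used here.
    """
    return int(math.copysign(math.floor(abs(x) + 0.5), x))


def quantize(f_d, q):
    """F_Q(u, v) = round(F_D(u, v) / Q(u, v)) for every coefficient."""
    return [[round_half_away(c / s) for c, s in zip(row_c, row_s)]
            for row_c, row_s in zip(f_d, q)]


def reconstruct(f_q, q):
    """F_R(u, v) = F_Q(u, v) * Q(u, v)."""
    return [[c * s for c, s in zip(row_c, row_s)]
            for row_c, row_s in zip(f_q, q)]


def quantization_error(f_d, q):
    """e(u, v) = F_D(u, v) - F_R(u, v); its magnitude is at most Q(u, v) / 2."""
    f_r = reconstruct(quantize(f_d, q), q)
    return [[d - r for d, r in zip(row_d, row_r)]
            for row_d, row_r in zip(f_d, f_r)]
```

The error of each coefficient is bounded by half its quantization step, so larger steps at a block boundary allow larger jumps between neighboring blocks.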
Before the two-dimensional integer discrete cosine transform is applied, adjacent pixel blocks in the original single-frame picture are still strongly correlated. The DCT, however, breaks the correlation between adjacent pixel blocks. In addition, because each 8×8 pixel block in the image is quantized as a whole and independently of the others, the quantization errors of different blocks are unrelated. Therefore, even a small jump in quantization error between adjacent pixel blocks makes an originally smooth texture fluctuate at the block boundary, that is, blockiness appears. As shown in Fig. 4, the edge jumps in the image reconstructed at the receiving end give a poor visual impression. Moreover, given the limited bandwidth of the physical transmission channel, some image distortion is inevitable. It is therefore necessary to optimize the quantization so as to preserve as much of the original image information as possible.

Fig. 4. Comparison of the effects of two-dimensional integer DCT compression before and after partial magnification.
As can be seen from Fig. 4, when the image is partially enlarged, the poor visual quality of the reconstructed image is clearly visible. The reason is that extremely limited transmission bandwidth and low-latency requirements conflict with the real-time transmission of large volumes of data, so a high-definition video source must be compressed effectively before its data stream can be transmitted with low latency. However, the two-dimensional integer DCT greatly weakens the correlation between neighboring pixel blocks of the original image during the domain transform, and conventional quantization of the resulting DCT coefficients then makes the distortion of the image reconstructed at the receiving end more severe.
Many algorithms for reducing blockiness have been proposed. Some researchers have analyzed and verified the causes of blockiness theoretically and reduced it by iterative calculation. However, this approach is not suitable for practical engineering. Despite the high-speed parallel processing capability of the FPGA, its complexity increases exponentially with the number of elements in a pixel block, so the number of iterations for a single 8×8 pixel block becomes very large, which is very difficult to reconcile with the low-latency transmission required by the design. Another approach filters the restored image to reduce blockiness, as in the loop deblocking filtering algorithm. However, because original image information has already been lost, this approach does not reduce blockiness in a single frame as well, especially in the fine details of the frame content.
As can be seen from Fig. 5, pixel block 1 has four boundaries. If the quantization error jumps across any one of these boundaries, that is, if the boundary error is discontinuous, blockiness appears at that pixel block. Owing to the spatial redundancy of the image, pixel values change little within a pixel block of uniform color at a given position in the picture. In addition, given the visual characteristics of the human eye, blockiness is most noticeable at the edges of the single-frame picture content. Therefore, the quantization error of pixel block 1 is compared with that of each of the four adjacent pixel blocks; if the average difference is small, the block is not processed. If the boundary error is large and the region is prone to blockiness, the block is quantized again with the optimized method. This sub-region quantization re-quantizes the DCT coefficients by region to reduce the quantization error of the current pixel block, thereby enhancing the continuity of the boundary error.

Fig. 5. Distribution of pixel block 1 and adjacent pixel blocks.
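The neighbor comparison described above can be sketched as follows. This is a simplified illustration under stated assumptions: each block is summarized by the mean of its quantization errors rather than by its boundary pixels, and `threshold` is a hypothetical tuning parameter that the text does not specify.

```python
def mean_error(err_block):
    """Average quantization error over one pixel block (nested lists)."""
    values = [e for row in err_block for e in row]
    return sum(values) / len(values)


def needs_requantization(center_err, neighbor_errs, threshold):
    """Decide whether the center block should be re-quantized.

    center_err    -- quantization error block of pixel block 1
    neighbor_errs -- error blocks of the four adjacent pixel blocks
    threshold     -- hypothetical limit on the average error jump
    """
    center = mean_error(center_err)
    jumps = [abs(center - mean_error(nb)) for nb in neighbor_errs]
    average_jump = sum(jumps) / len(jumps)
    return average_jump > threshold
```

Blocks whose errors are close to those of their neighbors are left untouched, so the extra quantization work is spent only where a visible boundary is likely.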
The specific optimization process is as follows:
(1) After DCT transform and compression, the DCT coefficients are obtained.
(2) Using a fast edge detection algorithm for image content, the regions at the edges of the single-frame image content where the pixel block boundary error is prone to jumps are located.
(3) A value n of the fixed sampling matrix is chosen to divide the DCT coefficients into two regions (upper left and lower right). Most of the energy in the image is concentrated in the upper-left DCT coefficients, so the quantization method is adjusted for the coefficients in the upper-left region. The new quantization formulas (21) and (22) are as follows:
Therefore, each DCT coefficient in the upper left corner region has two possible quantized values. This yields 2^n different quantized coefficient matrices, and each quantized coefficient matrix is reconstructed into
Extensive debugging and testing show that blockiness can be reduced in this way. However, the integer DCT is a unitary transform and conserves energy, and the inverse DCT may cause a slight decrease in the signal-to-noise ratio of the image. Weighing both effects on the subjective visual quality of the image, together with the influence of the DCT coefficients F(0,0), F(1,0), F(0,1), F(2,0), F(0,2), etc. on blockiness, the value of n is chosen to be no less than 7 to better meet the requirements of the system.
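The search over the 2^n quantized matrices can be sketched as below. Several points are assumptions made for illustration: the two candidate values of each selected coefficient are taken to be the floor and the ceiling of F_D/Q, the selected positions are passed in by the caller, and the selection criterion is a caller-supplied `cost` function, since the text does not state these details.

```python
import itertools
import math


def candidate_matrices(f_d, q, positions):
    """Yield the 2**n quantized matrices for the n selected low-frequency positions.

    Coefficients outside `positions` keep their conventionally rounded value.
    """
    base = [[round(c / s) for c, s in zip(row_c, row_s)]
            for row_c, row_s in zip(f_d, q)]
    options = []
    for u, v in positions:
        ratio = f_d[u][v] / q[u][v]
        options.append((math.floor(ratio), math.ceil(ratio)))  # assumed two choices
    for choice in itertools.product(*options):
        candidate = [row[:] for row in base]
        for (u, v), value in zip(positions, choice):
            candidate[u][v] = value
        yield candidate


def best_candidate(f_d, q, positions, cost):
    """Return the candidate matrix with the lowest cost (hypothetical criterion)."""
    return min(candidate_matrices(f_d, q, positions), key=cost)
```

With n = 7 the search covers 128 candidates per flagged block, which is a fixed, small amount of work compared with iterative methods whose cost grows with the block size.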
This chapter first gives a general description of the real-time transmission of remote video, then designs the software solution of the video transmission system on that basis, and describes the overall software framework for each version of the remote video real-time transmission system. Finally, the functions of the common subsystem modules are described in general, and the functions of the subsystem modules specific to each version are described separately.
Figure 6 shows the overall network topology of the remote video real-time transmission system, on which the design simulation is based. sys_top.v is the top-level module of the system; it mainly instantiates each functional module to realize the overall function of the system.

Fig. 6. Overall network topology of the remote video real-time transmission system.
Sys_cfg.v: system configuration file. It mainly includes the SDRAM register configuration, the coordinate position and content of the OSD character display, and the adjustment of the color space conversion coefficients.
Sys_rtl_tb: system test vector module. It includes the test files of the main modules. A testbench applies stimulus to the input signals and verifies the correctness of the function from the output signals. Accurate and complete test vectors are the key to verifying the functionality of the system.
Sys_bt_pattern: file module of the BT.656 test vector. It generates a BT.656 video stream as defined by the BT.656 specification, so the required video content can be produced as needed during simulation and on-board testing.
Sys_clk_dcm: clock management module. The input 100 MHz clock passes through this module to generate stable working clocks: 100 MHz for the SDRAM controller, 40 MHz for the SVGA pixel clock and system logic clock, 27 MHz for the BT.656 code stream clock, and 100 kHz for the I2C working clock.
Sys_decode: BT.656 decoding module. The single-channel D1 format video stream and the 4-channel CIF format video stream can be decoded by the selection of the global signal sys_mode_sel. The decoding of any one of the four channels of D1 video stream can be realized by the selection of sys_chnal_sel.
Sys_converison: color space conversion module. It implements color conversion from YCbCr 4:2:2 to RGB. The conversion parameters can be modified through the configuration file to achieve different display effects.
sys_sdram_ctrl: image cache module. The folder includes not only the SDRAM controller logic but also the front-end FIFO module, the read-write arbitration module, and the address generation module.
Sys_osd: character display module. The video channel information is shown on the display in 8×16 half-width characters, and the display position and content can be configured through the system configuration module.
Sys_vga: VGA display module. It displays the 800×600@60 Hz SVGA format with 16-bit 5:6:5 pixels. While outputting the line sync signal, it sends the necessary digital-to-analog conversion signals to the ADV7123.
Sys_usbd: USB slave FIFO controller. The FX2 USB 2.0 device works in SlaveFIFO mode, and the BT.656 video stream is transmitted to the USB host computer through the controller.
Sys_model: system model module. It provides the SDRAM simulation model and the I2C slave model for system simulation.
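To make the data path of the color space conversion module concrete, the Python sketch below converts one YCbCr 4:2:2 group into two 16-bit 5:6:5 RGB pixels. The coefficients are the widely used BT.601 studio-range values; the text says only that the coefficients are configurable, so these particular numbers and the function names are assumptions.

```python
def clamp8(x):
    """Round and limit a value to the 8-bit range 0..255."""
    return max(0, min(255, int(round(x))))


def ycbcr_to_rgb(y, cb, cr):
    """Convert one studio-range YCbCr sample to 8-bit RGB (assumed BT.601 coefficients)."""
    c, d, e = y - 16, cb - 128, cr - 128
    r = clamp8(1.164 * c + 1.596 * e)
    g = clamp8(1.164 * c - 0.392 * d - 0.813 * e)
    b = clamp8(1.164 * c + 2.017 * d)
    return r, g, b


def pack_rgb565(r, g, b):
    """Pack 8-bit RGB into a 16-bit word: 5 bits red, 6 bits green, 5 bits blue."""
    return ((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3)


def convert_422_group(cb, y0, cr, y1):
    """One 4:2:2 group (Cb, Y0, Cr, Y1) yields two pixels that share the chroma pair."""
    return (pack_rgb565(*ycbcr_to_rgb(y0, cb, cr)),
            pack_rgb565(*ycbcr_to_rgb(y1, cb, cr)))
```

In hardware the same arithmetic is normally done with fixed-point multipliers, and changing the coefficients in the configuration file corresponds to changing the constants above.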
Figure 7 shows the simulation waveform of the system. The signal in the waveform is the external interface signal of the system. At this point, the video source is provided by the bt656_pattern module. Therefore, bt_dat has no data in the waveform, and the burst read and write length of the SDRAM is 8. The RGB data in the VGA module outputs valid data when the line signals hsync and vysnc are valid, and the I2C signal line is set to high impedance in an idle state.

Fig. 7. System simulation waveform.
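The relation between the 40 MHz pixel clock and the 800×600@60 Hz display mode can be verified with a short calculation. The porch and sync widths below are the standard VESA timing values for this mode; the text does not list the timing parameters of the design, so using them here is an assumption.

```python
PIXEL_CLOCK_HZ = 40_000_000

# Assumed VESA 800x600@60 Hz timing: active, front porch, sync pulse, back porch
H_ACTIVE, H_FRONT, H_SYNC, H_BACK = 800, 40, 128, 88      # in pixel clocks
V_ACTIVE, V_FRONT, V_SYNC, V_BACK = 600, 1, 4, 23         # in lines


def frame_totals():
    """Total pixel clocks per line and total lines per frame, blanking included."""
    h_total = H_ACTIVE + H_FRONT + H_SYNC + H_BACK
    v_total = V_ACTIVE + V_FRONT + V_SYNC + V_BACK
    return h_total, v_total


def line_rate_hz():
    """Horizontal scan frequency."""
    h_total, _ = frame_totals()
    return PIXEL_CLOCK_HZ / h_total


def refresh_rate_hz():
    """Vertical refresh frequency."""
    h_total, v_total = frame_totals()
    return PIXEL_CLOCK_HZ / (h_total * v_total)
```

With these values each line takes 1056 pixel clocks and each frame 628 lines, which gives a refresh rate slightly above 60 Hz. RGB data are valid only during the 800×600 active region.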
A 100 Mbit/s network cable or a wireless channel is adopted as the physical transmission medium, so that the HDMI interface can be extended over a wired or wireless network. However, the HDMI interface rate does not match the rate of the transmission medium used by the system, so uncompressed point-to-point real-time transmission cannot be achieved. Compression of the original data stream is therefore necessary. Only by compressing the data, processing it according to the above algorithm, and adding a suitable hardware platform can video streams be transmitted through the extended HDMI hardware interface as intended.
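The rate mismatch can be quantified with simple arithmetic. The sketch below assumes, purely for illustration, a 1920×1080 source at 60 frames per second with 24 bits per pixel, and it ignores blanking intervals and protocol overhead; none of these figures come from the text.

```python
def raw_bitrate_bps(width, height, fps, bits_per_pixel):
    """Uncompressed video bit rate of the active picture area."""
    return width * height * fps * bits_per_pixel


def required_compression_ratio(raw_bps, link_bps, link_efficiency=1.0):
    """Minimum compression ratio needed to fit the stream into the link.

    link_efficiency is a hypothetical factor (0..1) for protocol overhead.
    """
    return raw_bps / (link_bps * link_efficiency)
```

Under these assumptions the raw stream is about 2.99 Gbit/s, so fitting it into a 100 Mbit/s link requires a compression ratio of roughly 30:1, and more once protocol overhead is taken into account.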
Test purpose: (1) to realize long-distance transmission of the HDMI interface video stream; (2) to realize low-latency real-time transmission of the HDMI interface video stream. Test platform: the Zynq-7000 platform is adopted at the transmitting end and the Exynos-4412 platform at the receiving end.
The system consists of two parts: the sender and the receiver. To meet market demand, a wired network or wireless channel is used to extend the HDMI interface, thereby realizing long-distance, low-delay transmission of the high-definition video stream. To handle the large volume of data to be transmitted, the design builds on the above theory, simulation verification, and verification of each module unit. The physical test results are shown in Fig. 8.

Fig. 8. The rendering of the physical demonstration.
As can be seen from the figure above, when the space key is held down at the receiving end, the receiving end immediately stops receiving the data stream. The display terminal shows that the visual effect is relatively good and the video is very clear. Moreover, there is no obvious blockiness at the edges of the picture content, which meets the market demand. The result of feature extraction on the image is shown in Fig. 9.

Fig. 9. Feature extraction image.
In many cases, it is more useful to transmit video streams over wireless channels, such as concerts in some older buildings, or music evenings on cruise ships near the coast. In order to better meet the various occasions in the market, this paper uses the wireless channel to test the physical object, and the test results are shown in Figs. 10 and 11.

Fig. 10. Transmission effect of complex background SVGA music video.

Fig. 11. Feature map extraction of complex background SVGA music video.
As can be seen from the figures above, when a wireless channel is used to transmit the video stream, the channel is highly susceptible to external interference, which in turn affects the display at the receiving end. Because display quality at the receiving end takes priority, the resolution had to be reduced to 1600×900, and all surrounding hotspots such as WiFi were turned off so that they would not interfere with the wireless channel carrying the video stream.
With the growth of comprehensive national strength and the development of the social economy, traditional long-distance video transmission methods no longer meet new industrial and civilian demands. Some settings and applications, such as remote video conferencing systems and old buildings without reserved fiber interfaces, require long-distance video transmission. Some industrial applications in particular place more stringent requirements on low-latency transmission of long-distance video.
This research applies SVGA video transmission technology to music education information dissemination to improve the real-time video transmission effect and promote the development of computer technology in the education industry.
The experimental results show that the system works stably and realizes the functions of video image acquisition, conversion, display and transmission, so the design goal is achieved. However, due to time constraints and other conditions, the entire system needs further improvement, as follows: (1) The hardware circuit of the whole system is connected to the daughter board through the external expansion interface, which limits the data transmission bandwidth to a certain extent. To support more video channels and display higher-resolution video streams at the same time, the hardware circuit should be improved and made more compact. (2) Due to time constraints, video-related algorithms such as the 3A algorithms, target detection, and image filtering could not be implemented on the FPGA platform. In addition, if the TVP5158 is used in the design to support simultaneous input of four video channels, the system would also be well suited to image panorama synthesis. (3) After the video data were received on the PC host computer, this study verified the correctness of the data only through simulation. The next step is to implement software decoding and other processing of BT.656 video on the PC platform. In summary, using an FPGA as a development platform for video image acquisition and processing has great advantages and can be widely applied in various fields. Moreover, with the development of electronic technology, video image technology will develop further.
Conclusion
To address clarity and real-time performance in the video transmission of music education information, this study uses a 100 Mbit/s wired network cable or a wireless channel as the physical transmission medium to extend the HDMI interface. However, the HDMI interface rate does not match the rate of the transmission medium used by the system, so point-to-point real-time transmission cannot be realized directly. The problem can be solved only by compressing the data, processing it according to the above algorithm, and adding a suitable hardware platform. The design solves the problem of long-distance real-time transmission of high-definition video streams and, to improve the subjective quality of the video recovered at the receiving end, adds algorithmic processing to reduce blockiness in the video displayed at the receiving end. At the same time, the design does not blur the fine details of the image and is practical for transmitting video over long distances with low delay.
Acknowledgments
This paper was supported by the Humanities and Social Sciences Research Planning Fund Project of the Ministry of Education: Theoretical and practical research on career education skill strategy (No. 14YJA880019).
