Abstract
With the rapid growth of video and image content on the Internet, image data contain a large amount of redundancy. Image compression aims to represent an image, or the information it contains, with as little data as possible. By reducing redundancy, images can be stored at a low bit rate and occupy less storage space. General-purpose image compression methods adopt a hybrid coding framework consisting of prediction, transformation, quantization, entropy coding and other steps. Each step uses a fixed, manually designed algorithm, and the steps are not globally optimized. Meanwhile, there has been extensive research on deep-learning-based super-resolution networks. These networks reconstruct high-resolution images from low-resolution ones in place of conventional magnification such as linear interpolation, and they greatly improve image resolution, denoising, deblurring and so on. However, there is no effective way to apply super-resolution networks to improve the quality of compressed and reconstructed images. This paper proposes a new method that uses a residual-learning image super-resolution network to improve the quality of compressed images. In our method, the downscaled image is encoded into a content stream, and the corresponding parameters are encoded into a model stream. First, the original image is scaled down to half the size of the source image, and the small image is encoded into the content stream with an existing codec. Second, the decoded image is upscaled with the selected resizing method and filtered by the residual-learning super-resolution (SR) network, which improves the reconstruction of edge features. Our results show a significant quality improvement over H.265 for low-bit-rate reconstruction (below 0.1 bits per pixel).
Introduction
At present, deep learning plays an important role in image and speech processing. In both image recognition and super-resolution, deep learning has become a key technology in image research, and its combination with image compression has recently become a research hotspot [1]. With the popularity of live streaming and short videos, the demand for image compression keeps increasing, yet the technology used to compress and transmit images has remained basically unchanged. Image compression is mainly divided into prediction, transformation, quantization, entropy coding and other steps [2]. This hybrid framework was adopted long ago, later methods are modifications of it, and these techniques are still in use in the current mainstream standards such as VVC (H.266) and HEVC (H.265), where each step is optimized by manually designed algorithms [3], with recent efforts reported in [4, 5]. To improve reconstructed image quality and coding efficiency, many studies use deep learning to boost image compression. There are currently three main approaches. In the first, one or more modules of a conventional codec, such as intra prediction or inter motion vector prediction, are replaced or enhanced by a convolutional neural network; for example, in [6], convolutional neural network filtering is applied to HEVC inter prediction to improve reconstructed image quality. The second approach uses deep-learning post-processing to improve the quality of the decoded image, as in [8, 9], which remove compression artifacts and improve reconstructed image quality. The third approach is end-to-end image compression [7].
Deep-learning image super-resolution has achieved good performance in increasing image resolution and in image denoising. However, because super-resolution converts a low-resolution image into a high-resolution one, it cannot be applied directly within current image compression methods. We therefore propose combining image super-resolution with traditional image compression. The original image is downscaled to half its size to obtain a low-frequency image, which is then encoded with a standard codec (such as H.265) to obtain the content stream. Because the scaling method and the image quality both affect the quality of the reconstructed image, we encode the image scaling parameters and the image-quality scenario as a model stream. These parameters are selected by rate-distortion optimization (RDO), which further improves the reconstructed image quality. In the image super-resolution network, we propose optimizing the network structure and using 1
Our main contributions are as follows:
We combine super-resolution with traditional video compression by scaling down the original image so that the codec output suits the super-resolution stage. The super-resolution network in this paper is a typical lightweight residual learning network: the model focuses on learning the residual output, which greatly helps to improve image quality. A super-resolution network improves the resolution and subjective quality of an image, but each network is trained for low-resolution images of a specific quality. Because the quality of the reconstructed image is determined by the quantization parameter (QP), a single network performs worse when it is applied across different QP settings of the codec. Therefore, a separate set of super-resolution network parameters is trained for each QP value, so that the model matches the low-resolution images it is applied to and improves their transformation to high resolution. In addition, because the proposed SR network is a residual learning network, the choice of image scaling method affects the filtering performance. The scaling method is therefore selected by RDO at the encoder, and its parameter is encoded into the bitstream. At the decoder, image quality is improved using the selected scaling method.
The remainder of this paper is organized as follows. Section 2 reviews related research on image compression and deep learning. Section 3 describes our proposed compression method and network architecture. Section 4 describes the structure and parameter configuration of the convolutional neural network and presents model training and prediction results. Section 5 concludes our work and outlines future improvements.
Our proposed image compression framework.
There are many image coding standards, such as JPEG, WebP, BPG and VVC. These traditional methods adopt a hybrid coding framework that mainly removes redundancy in the image with manually designed algorithms, including prediction, transformation, quantization and entropy coding. Some of the latest hybrid-framework image codecs reuse tools from video coding standards: WebP adopts VP8 video coding technology, and BPG adopts HEVC (H.265) video coding technology. In HEVC, intra prediction is based on angular prediction; the prediction residual is transformed with an integer discrete cosine transform or integer discrete sine transform, and after the quantized coefficients are binarized, context-adaptive binary arithmetic coding is applied. The codec compresses the image block by block, and one block size may be 16
Therefore, this paper proposes a method that combines an existing traditional image codec with a super-resolution neural network. The method produces two bitstreams: a content stream and a model stream. For the content stream, the original image is scaled down to half size and the small image is encoded. For the model stream, a neural-network-based super-resolution network interpolates and filters the decoded image to obtain a reconstruction at the original size, and the information about the interpolation method is encoded as the model stream. Applying the convolutional neural network to the decoded image in this way improves image quality at low bit rates while remaining compatible with the original codec, and thus improves compression efficiency.
The proposed compression method
The overall architecture of the proposed compression method
The architecture of our image compression framework is shown in Fig. 1. The framework combines image super-resolution with a traditional image codec: a traditional video encoder encodes the low-resolution image to obtain the content stream, and the image scaling parameters and image-quality scenario are encoded as the model stream. First, the input RGB data are converted to YUV420, the original image is scaled down to obtain a small image, and the small image is encoded with an H.265 encoder to produce the content stream. Second, because the quantization parameter (QP) and the upscaling method affect image quality, these parameters are used by the super-resolution convolutional neural network and are entropy-coded into the model stream.
Content stream: the original image is first scaled down to half size to obtain the image to be encoded, and the H.265 video codec then encodes this small image to generate a bitstream, from which the image can be reconstructed at the decoder. Because this bitstream carries the pixel content, it is called the content stream.
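The content-stream preparation above can be sketched as follows. This is a minimal illustration under stated assumptions: the paper does not specify the downscaling filter, so a 2×2 box (averaging) filter is assumed, a plane is represented as a list of rows of integer samples, and the name `downscale_half` is hypothetical. The H.265 encoding of the result is not shown.

```python
def downscale_half(plane):
    """Halve width and height by averaging each 2x2 block (box filter).

    Assumption: integer samples, even height and width; odd sizes would
    need padding or cropping before this step.
    """
    height, width = len(plane), len(plane[0])
    if height % 2 or width % 2:
        raise ValueError("this sketch expects even height and width")
    small = []
    for y in range(0, height, 2):
        row = []
        for x in range(0, width, 2):
            block_sum = (plane[y][x] + plane[y][x + 1]
                         + plane[y + 1][x] + plane[y + 1][x + 1])
            # Round the block mean to the nearest integer sample value.
            row.append((block_sum + 2) // 4)
        small.append(row)
    return small


# A 4x4 plane becomes a 2x2 plane (the half-size image to be encoded).
plane = [
    [10, 20, 30, 40],
    [10, 20, 30, 40],
    [50, 60, 70, 80],
    [50, 60, 70, 80],
]
print(downscale_half(plane))  # [[15, 35], [55, 75]]
```

A box filter keeps the sketch short; any other downscaling filter could be substituted before encoding.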
Model stream: the quantization parameter (QP) and the upscaling method that the decoder uses to upsample the small image both affect image quality, so we encode the image scaling parameters and the image-quality scenario as the model stream. At the encoder, these parameters are obtained by rate-distortion optimization (RDO), which compares the different upscaling methods and QP values, and the selected parameters are then entropy-coded. Signalling the RDO-selected upscaling method in the model stream in this way effectively improves the quality of the decoded image.
Based on the parameters decoded from the model stream, a lightweight super-resolution convolutional neural network, with few parameters and low computational cost, is applied to improve the quality of the reconstructed image.
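The encoder-side selection described above can be sketched as a Lagrangian cost comparison. This assumes the common RDO form J = D + λR, with distortion D given as mean squared error against the original image and rate R in bits; the function name `rdo_select`, the value of λ and the candidate numbers are hypothetical.

```python
def rdo_select(candidates, lam):
    """Pick the candidate with the smallest rate-distortion cost.

    candidates maps a candidate name (e.g. an upscaling method) to a
    (distortion, rate_bits) pair; the cost is J = D + lam * R.
    """
    if not candidates:
        raise ValueError("at least one candidate is required")
    return min(candidates,
               key=lambda name: candidates[name][0] + lam * candidates[name][1])


# Hypothetical measurements: MSE after upscaling the decoded small image
# with each method, and the 2 bits needed to signal the method.
measured = {
    "nearest": (30.0, 2),
    "linear": (22.5, 2),
    "cubic_spline": (20.1, 2),
    "region": (24.0, 2),
}
print(rdo_select(measured, lam=0.5))  # cubic_spline
```

Because every method costs the same 2 bits here, the choice reduces to the lowest distortion; the rate term only changes the outcome when candidates, such as different QP settings, differ in bit cost.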
Introduction to the traditional image codec
As shown in Fig. 2, the proposed image compression framework includes transform coding, quantization, intra prediction, deblocking filtering and entropy coding, like other traditional codec architectures. Compression is performed with the H.265 video codec, which codes the image as an I frame. The original image is in RGB format and is scaled down to a small RGB image, which is then converted to YUV420 so that the H.265 codec can encode it into the content stream.
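The RGB-to-YUV420 conversion step mentioned above can be sketched as follows. The paper does not state which colour matrix or chroma filter is used; this sketch assumes the full-range BT.601 (JFIF) YCbCr equations and forms each 4:2:0 chroma sample by averaging a 2×2 block. The function names are hypothetical.

```python
def rgb_to_ycbcr(r, g, b):
    """Full-range BT.601 (JFIF) RGB -> YCbCr for one pixel (floats)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr


def to_uint8(value):
    """Round and clip a sample to the 8-bit range [0, 255]."""
    return max(0, min(255, int(round(value))))


def rgb_to_yuv420(image):
    """Convert rows of (r, g, b) tuples to Y, Cb, Cr planes in 4:2:0.

    Y keeps full resolution; Cb and Cr are averaged over 2x2 blocks,
    so the image height and width must be even.
    """
    height, width = len(image), len(image[0])
    if height % 2 or width % 2:
        raise ValueError("this sketch expects even height and width")
    converted = [[rgb_to_ycbcr(*pixel) for pixel in row] for row in image]
    y_plane = [[to_uint8(px[0]) for px in row] for row in converted]

    def subsample(channel):
        # Average one chroma channel over each 2x2 block.
        return [
            [
                to_uint8((converted[y][x][channel] + converted[y][x + 1][channel]
                          + converted[y + 1][x][channel]
                          + converted[y + 1][x + 1][channel]) / 4)
                for x in range(0, width, 2)
            ]
            for y in range(0, height, 2)
        ]

    return y_plane, subsample(1), subsample(2)


white = [[(255, 255, 255)] * 2 for _ in range(2)]
print(rgb_to_yuv420(white))  # ([[255, 255], [255, 255]], [[128]], [[128]])
```

The luma plane alone corresponds to the Y-only data used for training in the experimental setup.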
Traditional codec compression framework.
The codec can be split into two major parts: 1) the encoder and 2) the decoder. First, the image is partitioned into coding tree units (CTUs) from 16
After the original image is scaled down to half size, the H.265 video codec encodes the small image to generate a bitstream, called the content stream. The image is reconstructed at the decoder, and the super-resolution network is then used to filter and enhance the reconstruction.
As shown in Fig. 3, our super-resolution model is a CNN architecture. When the reconstructed frame is input to the super-resolution convolutional neural network, the network outputs a higher-quality frame that is upscaled by a factor of two, with the same width and height as the original image. In this way, the super-resolution network reduces image noise and recovers finer image details.
Super-resolution data flow.
H.265 has a QP (quantization parameter) setting: the larger the QP value, the coarser the quantization and the worse the reconstructed image quality; the smaller the QP value, the better the reconstructed image quality. We train a separate super-resolution model for each QP value. Because each trained model is matched to its image quality, it adapts better, reduces image noise, captures image edge characteristics more effectively and improves reconstruction details. At the encoder, the original image is scaled down to a small image and encoded with H.265 to obtain the content stream, which can be decoded into a reconstructed small image. The reconstructed small images and the original images form the training dataset for the super-resolution network. The super-resolution network has two important steps: upsampling the reconstructed small image and reconstruction with a residual learning network. Because different upsampling methods lead to different reconstructed image quality, we choose the best upsampling method at the encoder by RDO (rate-distortion optimization), whose general purpose is to obtain as little image distortion as possible at the lowest possible coding rate, at which point the coding efficiency of the encoder is highest. D denotes the image distortion between the original image and the encoded reconstructed image, measured here by PSNR. R denotes the coding rate, i.e., the total coded data that must be transmitted for the final prediction mode, which depends on the selected coding parameters, quantization parameters, number of motion vectors, reference frame indices, prediction residuals and so on.
For each scaling method, we calculate the PSNR and the corresponding rate, which here mainly refers to the bits required to signal that scaling method. The four methods, nearest-neighbor interpolation, linear interpolation, cubic spline interpolation and region interpolation, are binary-coded as 00, 01, 10 and 11, respectively. The selected upsampling method and the QP value are entropy-coded into the model bitstream to improve coding efficiency.
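The quantities in the paragraph above are commonly combined as follows. This is the standard Lagrangian form of RDO and the usual PSNR definition for 8-bit images, assumed here because the paper does not write out its exact cost; λ is the Lagrange multiplier, M the set of candidate upscaling methods, and I and Î the original and reconstructed images of width W and height H.

```latex
J(m) = D(m) + \lambda R(m), \qquad
m^{*} = \arg\min_{m \in \mathcal{M}} J(m)

\mathrm{MSE} = \frac{1}{WH} \sum_{i=1}^{H} \sum_{j=1}^{W}
  \bigl( I(i,j) - \hat{I}(i,j) \bigr)^{2}, \qquad
\mathrm{PSNR} = 10 \log_{10} \frac{255^{2}}{\mathrm{MSE}}
```

Since PSNR increases as distortion decreases, D can be taken as the MSE (or any decreasing function of PSNR) when minimizing J.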
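The 2-bit method codes listed above, together with the QP, can be packed into a model-stream payload as sketched below. The method codes come from the text; the method labels, and representing the QP as a fixed-length 3-bit index into the seven trained QP values, are assumptions. This sketch writes plain bit strings rather than performing the entropy coding the paper describes, and all names are hypothetical.

```python
# 2-bit codes for the four upscaling methods, as listed in the text.
METHOD_CODES = {"nearest": "00", "linear": "01", "cubic_spline": "10", "region": "11"}
CODE_TO_METHOD = {code: name for name, code in METHOD_CODES.items()}

# QP values for which SR models are trained (see the experimental setup).
QP_VALUES = [28, 32, 36, 40, 44, 48, 52]
QP_BITS = 3  # assumption: fixed-length index; 7 values fit in 3 bits


def pack_model_params(method, qp):
    """Return the model-stream bits for an upscaling method and a QP."""
    qp_index = QP_VALUES.index(qp)  # raises ValueError for untrained QPs
    return METHOD_CODES[method] + format(qp_index, f"0{QP_BITS}b")


def unpack_model_params(bits):
    """Inverse of pack_model_params: recover (method, qp)."""
    method = CODE_TO_METHOD[bits[:2]]
    qp = QP_VALUES[int(bits[2:2 + QP_BITS], 2)]
    return method, qp


bits = pack_model_params("cubic_spline", 40)
print(bits)                       # 10011
print(unpack_model_params(bits))  # ('cubic_spline', 40)
```

An entropy coder would replace the fixed-length fields when some methods or QPs are chosen far more often than others.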
In this paper, the super-resolution network is a convolutional neural network whose structure is optimized so that it runs faster and suits low-complexity devices. The network that extracts image features is called the feature extraction network. After the features are extracted, a reconstruction network restores the image details. Each layer consists of CNN weights, biases and a non-linear activation. All hidden-layer outputs are passed to the reconstruction network through skip connections.
Two parallel CNNs, A1 and B1/B2, reconstruct the image details after all features are concatenated. The last CNN layer, L1, outputs a four-channel image, which is rearranged into an image of the original size; this is the residual detail image. The input reconstructed image is decoded from the content stream. Finally, the output image is obtained by adding the residual image produced by the convolutional neural network to the input reconstructed image after upsampling. Because the network concentrates on extracting edge features, it recovers image details more easily.
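The final addition step described above can be sketched as follows. The sketch assumes the decoded small image is first enlarged 2× (here with nearest-neighbour interpolation, one of the four candidate methods) so that its size matches the residual image, and that samples are clipped to the 8-bit range; the function names are hypothetical and the CNN that produces the residual is not shown.

```python
def upscale_nearest_2x(plane):
    """Enlarge a plane 2x in each dimension by repeating every sample."""
    return [[value for value in row for _ in range(2)]
            for row in plane for _ in range(2)]


def add_residual(base, residual, max_value=255):
    """Add the CNN's residual detail image to the upscaled base image."""
    return [
        [min(max_value, max(0, b + r)) for b, r in zip(base_row, res_row)]
        for base_row, res_row in zip(base, residual)
    ]


decoded_small = [[100, 200]]
base = upscale_nearest_2x(decoded_small)  # two rows of [100, 100, 200, 200]
residual = [[5, -3, 60, 0], [0, 0, -10, 2]]
print(add_residual(base, residual))  # [[105, 97, 255, 200], [100, 100, 190, 202]]
```

Learning only this residual, rather than the full image, is what lets a small network concentrate on edges and fine detail.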
Super-resolution network architecture.
The content bitstream decoded image is defined as
This network is mainly used to extract features. The convolutional neural network extracts local and global features from the decoded image, and the outputs of all layers are concatenated to assemble the features. The network consists of 7 filter layers, whose numbers of output features are 16, 14, 12, 11, 10, 9 and 8. To prevent overfitting, we also use parametric ReLU (PReLU) units as activation functions. Because the output image has four times as many pixels as the reconstructed input image (twice the width and height), the final layer of the convolutional neural network outputs four channels of data at the size of the reconstructed input image; rearranging these pixel values yields the final output.
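The pixel rearrangement described above (often called depth-to-space or pixel shuffle) can be sketched as follows, together with the channel count that results if the outputs of all seven feature-extraction layers are concatenated, as the text describes. The channel ordering (channel 2i + j fills row offset i and column offset j, matching the convention of PyTorch's `PixelShuffle`) is an assumption, and the function name is hypothetical.

```python
# Output feature counts of the seven feature-extraction layers (from the text).
FEATURE_COUNTS = [16, 14, 12, 11, 10, 9, 8]
# If all layer outputs are concatenated, the reconstruction network sees:
CONCAT_CHANNELS = sum(FEATURE_COUNTS)  # 80


def pixel_shuffle_2x(channels):
    """Rearrange four HxW channels into one 2H x 2W plane.

    channels[2 * i + j][y][x] is written to out[2 * y + i][2 * x + j],
    so each input position expands into a 2x2 output block.
    """
    if len(channels) != 4:
        raise ValueError("a 2x upscale needs exactly 4 channels")
    height, width = len(channels[0]), len(channels[0][0])
    out = [[0] * (2 * width) for _ in range(2 * height)]
    for i in range(2):
        for j in range(2):
            plane = channels[2 * i + j]
            for y in range(height):
                for x in range(width):
                    out[2 * y + i][2 * x + j] = plane[y][x]
    return out


four_channels = [[[1, 5]], [[2, 6]], [[3, 7]], [[4, 8]]]  # each is 1x2
print(CONCAT_CHANNELS)                  # 80
print(pixel_shuffle_2x(four_channels))  # [[1, 2, 5, 6], [3, 4, 7, 8]]
```

Producing four low-resolution channels and rearranging them keeps every convolution at the small input size, which is cheaper than convolving at the output resolution.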
Image detail reconstruction network
Because the previous feature extraction network adopts 3
Model training
Learning mapping function
Where
Experimental setup
All experiments were performed on images from BSDS200 [20]. The networks were trained solely on the merged training and test parts of BSDS200, which contain 200 images. The images were converted to YUV420 using the YCbCr colour model, keeping only the luma component Y. Specifically, we use images compressed with QP values of 28, 32, 36, 40, 44, 48 and 52.
We train the models on an NVIDIA Tesla T4 (16 GB) machine. At the beginning of training, a large learning rate with a correspondingly small mini-batch size is used, which trains the model quickly and yields optimized parameters. In the later stage, a small learning rate is used to fine-tune the model, and overfitting is prevented by using different image sizes and data augmentation. During training, the Set5 [21] dataset is used to evaluate model performance. For testing, we use the Kodak [23] dataset, as shown in Fig. 5.
Sample images from the Kodak dataset.
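The coarse-to-fine learning-rate strategy described above can be sketched as a step-decay schedule. The paper does not report its learning rates, batch sizes or decay points, so every number below (initial rate, decay factor, step length, floor) is a hypothetical placeholder, as is the function name.

```python
def step_decay_lr(epoch, initial_lr=1e-3, decay=0.5, step=20, min_lr=1e-5):
    """Learning rate for a given epoch under step decay.

    The rate starts large for fast early training and is multiplied by
    `decay` every `step` epochs, never falling below `min_lr`, so that
    later epochs fine-tune with small updates.
    """
    if epoch < 0:
        raise ValueError("epoch must be non-negative")
    return max(min_lr, initial_lr * decay ** (epoch // step))


for epoch in (0, 20, 40, 200):
    print(epoch, step_decay_lr(epoch))
```

Other schedules, such as reducing the rate when the Set5 validation score stops improving, would fit the same description.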
We trained one set of model parameters for each scenario, corresponding to the H.265 QP settings 28, 32, 36, 40, 44, 48 and 52. After the reconstructed small image is obtained from the content stream, the super-resolution model parameters are selected according to the QP decoded from the model stream.
As shown in Fig. 6, our results show a significant performance improvement over H.265 for low-bit-rate reconstruction (bits per pixel below 0.1). Bit rate is the number of bits transmitted per second; in general, the higher the bit rate, the higher the image quality. Bits per pixel (bpp) measures the compressed size: the number of bits occupied by the compressed picture divided by its total number of pixels. In each QP test scenario, bpp is obtained by dividing the total size of all encoded image bitstreams by the total number of pixels of all images. We use PSNR as the evaluation criterion for the reconstruction quality of the compressed images.
We test performance on the 24 images of the Kodak dataset in three ways: (1) Same-size decoding: the source image is encoded and decoded without downsampling, and the PSNR between the decoded image and the original image is calculated. (2) Bicubic scale down and up: the image is first downsampled and then upsampled, and the PSNR between the resulting image and the original image is calculated. (3) Proposed: the content stream is decoded to obtain the reduced reconstructed image, the super-resolution network enlarges it to obtain the final image, and the PSNR between the final image and the original image is calculated.
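The decoder-side model selection described above can be sketched as a lookup keyed by the decoded QP. The file-naming scheme and function names are hypothetical, and falling back to the nearest trained QP (the smaller one on a tie) for a QP outside the trained set is an assumption beyond the paper, which trains one model per listed QP.

```python
TRAINED_QPS = [28, 32, 36, 40, 44, 48, 52]


def select_model_qp(decoded_qp, trained_qps=TRAINED_QPS):
    """Return the trained QP whose SR model should be used.

    An exact match is used when available; otherwise the nearest
    trained QP is chosen, preferring the smaller QP on a tie.
    """
    return min(trained_qps, key=lambda qp: (abs(qp - decoded_qp), qp))


def model_path(qp):
    """Hypothetical location of the parameters trained for one QP."""
    return f"models/sr_qp{qp}.bin"


print(model_path(select_model_qp(40)))  # models/sr_qp40.bin
print(model_path(select_model_qp(38)))  # models/sr_qp36.bin
```

Preferring the smaller QP on a tie picks the model trained on slightly better-quality inputs; the opposite choice is equally defensible.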
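The bpp and PSNR definitions above can be computed as sketched below. Bitstream sizes are assumed to be given in bytes, images are single 8-bit planes stored as lists of rows, and the function names and example numbers are hypothetical.

```python
import math


def bits_per_pixel(stream_sizes_bytes, image_sizes):
    """Total encoded bits of all images divided by their total pixel count.

    stream_sizes_bytes: size of each image's bitstream in bytes.
    image_sizes: (width, height) of each original image.
    """
    total_bits = 8 * sum(stream_sizes_bytes)
    total_pixels = sum(width * height for width, height in image_sizes)
    return total_bits / total_pixels


def psnr(original, reconstructed, max_value=255):
    """PSNR in dB between two equally sized planes of samples."""
    squared_error = 0
    count = 0
    for row_o, row_r in zip(original, reconstructed):
        for o, r in zip(row_o, row_r):
            squared_error += (o - r) ** 2
            count += 1
    mse = squared_error / count
    if mse == 0:
        return math.inf  # identical images
    return 10 * math.log10(max_value ** 2 / mse)


# Two hypothetical 768x512 images coded into 4,000 and 5,000 bytes.
print(bits_per_pixel([4000, 5000], [(768, 512), (768, 512)]))  # 0.091552734375
print(round(psnr([[0, 0], [0, 0]], [[0, 0], [0, 10]]), 2))    # 34.15
```

Pooling bits and pixels over the whole test set, as the text describes, weights each image by its size rather than averaging per-image bpp values.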
Proposed reconstructed image quality boosts.
In this work, we combine a traditional image compression method with image super-resolution. By converting the original image from high to low resolution, the super-resolution network becomes compatible with traditional image compression methods. To adapt to different image scenes and image quality, we also design a parameter transfer model whose parameters are chosen by rate-distortion optimization at the encoder, which more effectively improves image quality. Experiments show that the method significantly improves reconstructed image quality at low bit rates. The network also has few parameters, which makes it practical to deploy widely. In future work, we will focus on optimizing the model, building a lightweight model, and improving coding efficiency at high bit rates.
Acknowledgments
The authors acknowledge the following funding: the 2022 Basic and Applied Basic Research Project of the Guangzhou Basic Research Plan (research on video compression algorithm based on dual neural network, Grant: 202201011753); Young Innovative Talents in the Department of Education of Guangdong Province, China (Grant: 2019GKQNCX043); Sichuan Science and Technology Program (Grant: 2021JDRC0063); Special Projects in Key Fields of Colleges and Universities in Guangdong Province, China (Grant: 2021ZDZX3040); 2022 Annual Scientific Research Project of Guangdong Polytechnic of Science and Technology (Grant: XJPY202205); 2022 Annual Scientific Research Project of Guangdong Polytechnic of Science and Technology (Grant: XJPY202206); Research on Classified and Accurate Training of Higher Vocational IT Talents based on Education Big Data Under the Background of Enrolment Expansion (Grant: 2021GXJK714); and the Innovative Research Team in Universities of Guangdong Province of China (Grant: 2021KCXTD079).
