Abstract
Variable bit rate (VBR) encoded video bit streams are suitable for a wide range of high-delay applications such as video streaming. In these applications, the visual quality of the encoded video and the buffering constraint must be considered simultaneously during encoding. In this paper, a fuzzy video rate controller for variable bit rate applications of the new high efficiency video coding (HEVC) standard is proposed. The proposed controller takes a given buffer size and a long-term target bit rate as its constraints. It produces a variable bit rate video bit stream by controlling the quantization parameter (QP) at the frame level. The bit rate of each frame is controlled by a fuzzy controller that minimizes the fluctuations of QP and PSNR, which leads to high visual quality. The proposed algorithm is implemented in the HEVC standard reference software (HM), and experimental results show that it provides an average quality, in terms of PSNR, similar to that of the HM rate controller and the constant QP case while fully obeying the buffer constraint. In addition, compared with the HM rate controller, the fluctuations of QP and PSNR are smaller with the proposed algorithm, which indicates a higher visual quality.
Introduction
The latest video coding standard, high efficiency video coding (HEVC/H.265), was recently introduced to improve coding efficiency by about 50% or more at equal perceptual visual quality relative to the previous standard, AVC/H.264 [6]. Owing to the new capabilities included in this standard, it has the potential to be widely used in applications such as video streaming, mobile TV, video chat, and UHD TV.
Rate control (RC) is the process of adjusting the number of output bits of the encoded video to satisfy constraints such as bandwidth and delay. It is not part of the standard; it is designed by the user according to the constraints of the application. In the literature, there are four video RC scenarios: constant QP (CQP), constant bit rate (CBR), constant quality (CQ), and variable bit rate (VBR) [13]. Figure 1 shows the operating regions of the different RC scenarios in rate-distortion (R-D) space.
In the CQP scenario, a constant quantization parameter (QP) is assigned to the whole video sequence and the bit rate is not controlled. In CBR, the rate controller tries to maintain a short-term constant bit rate close to a reference target rate while encoding each video segment. This scenario provides strict RC, but the QP and the quality of the compressed video vary considerably. In CBR, the buffering delay is small, so it is suitable for low-delay video communication applications such as video conferencing. As shown in Fig. 1, an ideal CBR algorithm operates in a narrow region parallel to the distortion axis in R-D space. In the CQ scenario, the controller tries to keep the quality of the encoded video as constant as possible and close to a reference target quality. Consequently, the bit rate of the produced bit stream varies considerably with the coding complexity of the video segments. In R-D space, an ideal constant quality method operates in a narrow region parallel to the rate axis. Finally, a VBR controller can operate over a wide range in R-D space, as shown in Fig. 1. The operating region of a VBR controller overlaps with those of the CQ and CBR algorithms. In this scenario, bit rate and quality are controlled at the same time, while the average bit rate of the encoded video reaches a target value in the long term. In VBR, the variation in bit rate is larger than in CBR and smaller than in the CQP and CQ cases, and the opposite holds for the variation in quality. A more constant quality can be achieved by allowing more variation in the bit rate, which results in longer transmission and buffering delays [22]. In applications with unidirectional communication that can tolerate an increase in delay, VBR is preferred to the other scenarios because it provides bit streams with higher visual quality while controlling the bit rate to meet the bandwidth limitation and the buffer constraint.
In HEVC, RC can be performed at different levels, such as the group of pictures (GOP) level, the frame or picture level, and the coding unit level. When an RC algorithm operates at a lower level, such as the coding unit level, it can provide a more constant rate for the encoded video, and vice versa. In this paper, a rate control algorithm for VBR video applications of HEVC that operates at the frame level is proposed.
The rest of the paper is organized as follows. Section 2 reviews recently proposed RC methods for HEVC. Section 3 presents our method in detail. Section 4 contains experimental results, and Section 5 concludes the paper.
Background
Many RC algorithms have been proposed for previous video coding standards, but since HEVC introduces structural changes and new capabilities, new RC schemes are needed. Several methods have been proposed recently, but most of them are suited to constant bit rate applications.
In [8], an adaptive bit allocation scheme for RC at the largest coding unit (LCU) level is presented for HEVC. The proposed method uses a quasi-linear relationship between complexity and bit rate to predict the required bit budget for each LCU. Another RC scheme for the HEVC standard, designed in [12], aims at enhancing the quality of regions of interest. The method is developed for a video conferencing system, in which a higher bit rate is allocated to the region of interest while the global bit rate is kept close to the assigned target value. A rate controller for HEVC intra frames is proposed in [25], which is composed of a linear R-D model, a PID controller with feedback from the buffer, and an incremental approach to computing the QP. The proposed method uses a new term called bit per weight to enhance the quality of face regions, especially at the facial features. In [19], a frame-level RC algorithm for HEVC is proposed that is based on the reference picture set (RPS) mechanism. In this algorithm, a hierarchical partition structure based on RPS is designed and an efficient bit allocation scheme is proposed. Also, a header bits ratio prediction method is provided to improve the accuracy of bit allocation. Finally, a prediction scheme is provided for a quadratic rate-quantization (R-Q) model to calculate the QP. New R-D models based on new features of the HEVC standard, such as ALF (adaptive loop filter) and SAO (sample adaptive offset), are presented in [17]. In the rate model, models for header bits and coefficient bits are proposed separately. For distortion modeling, quantization distortion and the distortion reduction induced by SAO/ALF are modeled jointly. In [10], an efficient bit allocation and coding tree unit (CTU) level RC algorithm is proposed for HEVC. On top of the RC algorithm, a bit allocation method that considers the HEVC hierarchical coding structure is proposed. The QP at the CTU level is calculated based on the frame-level QP with feedback from the coding status of CTUs. Then, to improve the coding performance of the RC algorithm, a QP adjustment strategy is proposed and incorporated into the algorithm. A buffer-constrained RC algorithm with hierarchical GOP structures is proposed in [18]. In the proposed method, QP is increased from one temporal layer to the next by applying a QP cascading approach while considering the buffer fullness level. In [15], an R-Q model is proposed for intra frames and an adaptive RC method is implemented for intra frames based on a rate estimation method. A rate-complexity-QP model for HEVC intra picture RC is proposed in [11]. The proposed model includes a linear distortion-quantization (D-Q) model, an exponential R-Q model, and a linear rate-complexity model.
In [1], a perceptual video coding method is proposed to improve upon HEVC based on a structural similarity index (SSIM)-inspired divisive normalization scheme. The method attempts to transform the DCT domain frame prediction residuals to a perceptually uniform space before encoding. Based on the residual divisive normalization process, it defines a distortion model for mode selection with a simplified R-D optimization procedure. In [9], a complexity-based bit allocation method is proposed to improve the encoding performance. First, the relationship between bit rate and texture complexity is modeled by a linear function. Then, by considering the spatial-temporal correlations, a model is proposed to measure texture complexity. Finally, based on the proposed rate function and texture complexity measurement model, an adaptive bit allocation scheme for RC is proposed. A frame-level RC scheme for HEVC based on texture and non-texture rate models is proposed in [3]. In the proposed method, a texture rate model is constructed for the transform residues. For each category of transform residues, according to the depth level of the coding unit (CU), a single Laplacian PDF is applied to derive an R-Q model. In addition, a rate model for non-texture bits is proposed, which also takes into account the different characteristics of non-texture bits occurring at various CU depths. In [2], according to the different characteristics of transform residues, and based on the CU depth levels, Laplacian PDFs are proposed for three CU types: low, medium, and high texture.
In [24], an inter-dependent R-D model is proposed that derives a D-Q model and an R-Q model based on the relationship between the predicted residual of one frame and the distortion of its reference frame. A window-based RC scheme is then proposed with complexity-based frame bit allocation and video quality optimization. To overcome lagging RC parameter settings when encoding video sequences with discontinuous scenes, an RC algorithm at the GOP level is proposed for HEVC in [21]. By establishing the correlation between the bit allocation of every GOP and the intensity of scene change, a new bit allocation is presented. Moreover, the impact of RC parameter updating at the frame level is investigated to obtain highly accurate bit allocation for discontinuous scenes. For the low-delay video coding configuration of HEVC, an RC algorithm is proposed in [23]. In the proposed algorithm, the relationship between bit rate and quantization step is first exploited to formulate an accurate quadratic R-Q model. Then, a method of determining the QP for the first frame within a GOP is proposed. Afterward, an accurate frame-level bit allocation method is proposed. In [20], a new ρ-domain rate-GOP based frame-level RC scheme is proposed that takes into account new coding tools in HEVC, including RPS. By considering the dependency between a frame and its reference frame, inter-frame dependency based distortion and bit rate models are proposed.
To maintain consistent objective quality in HEVC, a new RC algorithm is developed in [5]. In the proposed algorithm, the PDF of the transformed coefficients is modeled by a Laplacian function that considers the quadtree coding structure. Then, D-Q and R-Q models are derived accordingly. In [7], a pixel-based RC scheme based on a unified R-Q model is proposed for HEVC, which was adopted in the standard reference software HM-6.0. The RC method operates at all levels, including the GOP, frame, and coding unit levels. More recently, it has been shown in [4] that QP is no longer the critical factor in RC. Therefore, the authors proposed a λ-domain RC algorithm for HEVC, which is implemented in the latest versions of the standard reference software HM. In this algorithm, bit allocation is performed at the GOP, picture, and coding unit levels. After bit allocation, the λ value and the QP are determined and encoding is executed. The conventional R-λ model implemented in the HM reference software allocates bits based on bits per pixel (bpp), but [16] argues that bpp does not reflect the variation in the visual importance of pixels. It therefore proposes a weight-based R-λ scheme for RC to improve the perceived visual quality of conversational videos. In [14], a gradient-based R-λ model is proposed for intra frame RC, where the gradient effectively measures the frame-content complexity and enhances the performance of the traditional R-λ method with a new CTU-level bit allocation method.
With regard to operating regions in R-D space, all of the RC algorithms reviewed above, including those proposed in [1–4, 23–25], use a target bit rate as the reference for the controller and try to reach it in the short term. Therefore, they can be categorized as CBR algorithms. Moreover, the algorithms proposed in [1], [20], and [24] do not consider the buffer constraint, which is essential for many applications.
In a different rate control approach, a semi-fuzzy RC algorithm for VBR video applications of H.264/AVC is proposed in [13]. It uses a fuzzy controller and a quality controller for inter-predicted pictures and several other controllers for intra pictures. It can provide video bit streams over a wide range, from constant quality to almost constant bit rate. This algorithm can operate under given buffer size, delay, and quality constraints.
Inspired by the semi-fuzzy RC algorithm, this paper proposes a fuzzy RC algorithm with a buffer constraint for VBR applications of the HEVC standard. Compared with the semi-fuzzy RC algorithm, which is designed for AVC/H.264, the proposed algorithm is optimized for the HEVC/H.265 standard. Moreover, the semi-fuzzy algorithm uses a quality controller, a fuzzy controller, and several other controllers, while the proposed algorithm uses only a fuzzy controller. Furthermore, the semi-fuzzy algorithm can operate over a wide range from CQ to almost CBR, while the proposed algorithm is optimized only for VBR applications.
In the proposed RC algorithm, the fuzzy controller is designed to minimize the variations of QP in order to provide encoded videos with high and stable visual quality. The rate of the encoded bit stream and the buffer status are used as feedback inputs. The QP of the previously encoded frame is used as the reference for calculating the QP of the current frame, while the required change in QP around the reference is computed by the fuzzy controller. Because QP carries information about both rate and distortion, the proposed controller can operate over a wide range in R-D space, as expected for a VBR controller.
Proposed RC method
The proposed RC algorithm controls the bit rate by adjusting the QP for each picture. The fuzzy controller uses a virtual buffer to impose bandwidth and delay constraints on the bit stream. The block diagram of the proposed RC algorithm is depicted in Fig. 2. The main parts of the rate controller are the virtual buffer and the fuzzy controller.
The virtual buffer simulates a decoder buffer at the receiver side of a constant bandwidth communication channel. The buffer fullness is updated after encoding each picture as:
where $O_B(i)$ denotes the occupancy of the virtual buffer after encoding the $i$th frame, $R_T$ is the target bit rate of the bit stream, $f$ is the video frame rate, and $B_F(i)$ is the number of bits used to encode the $i$th frame.
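As an illustration of the buffer update described above, the following minimal Python sketch tracks the occupancy frame by frame. The sign convention is an assumption based on the description of the buffer as a decoder-side buffer fed by a constant-bandwidth channel: the channel adds $R_T/f$ bits per frame interval and decoding a frame removes $B_F(i)$ bits. The function name and the numbers are hypothetical and are not taken from the experiments.

```python
def update_virtual_buffer(occupancy_bits, frame_bits, target_rate_bps, frame_rate_fps):
    """Return the virtual-buffer occupancy after one more frame.

    Assumed decoder-side convention (not confirmed by the text): the channel
    delivers R_T / f bits per frame interval, and removing the frame for
    decoding drains B_F(i) bits.
    """
    return occupancy_bits + target_rate_bps / frame_rate_fps - frame_bits


# Illustrative numbers only: 1 Mbit/s target rate at 50 fps, i.e. 20000 bits
# arrive per frame interval.
occupancy = 500_000.0
for bits in (30_000, 10_000, 25_000):
    occupancy = update_virtual_buffer(occupancy, bits, 1_000_000, 50)
```

Under this convention, frames that cost more than $R_T/f$ bits lower the occupancy and cheaper frames raise it. An encoder-side buffer would use the opposite signs.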
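The fuzzy controller calculates the change in QP rather than the QP itself. The QP for the current picture is the sum of the QP used for encoding the previous picture and the output of the fuzzy controller, i.e., $QP(i) = QP(i-1) + \Delta QP(i)$, where $\Delta QP(i)$ denotes the controller output for the $i$th picture.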
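The incremental QP update can be sketched in a few lines of Python. Rounding the controller output to an integer and clipping the result to the range 0 to 51 (the HEVC QP range for 8-bit video) are assumptions added for illustration; the text does not state how these steps are handled. The function name is hypothetical.

```python
def next_qp(prev_qp, delta_qp, qp_min=0, qp_max=51):
    """QP of the current picture from the previous QP and the controller output.

    Rounding and clipping are illustrative assumptions, not taken from the paper.
    """
    candidate = prev_qp + int(round(delta_qp))
    return max(qp_min, min(qp_max, candidate))


qp = 30
for delta in (1.4, -0.6, 0.2):
    qp = next_qp(qp, delta)
```

Because only the change is computed, a controller output near zero leaves the QP untouched, which is what keeps QP fluctuations small.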
The fuzzy controller is adopted because the nonlinear relations that exist in video rate control can be easily captured in fuzzy rules and membership functions. The inputs of the fuzzy controller are the buffer fullness and the resulting bit rate, which are normalized by dividing by the buffer size and the target bit rate, respectively. The normalized fuzzy inputs are formulated as:
Because we use the random access configuration of HM for the experiments, its hierarchical structure yields four layers of predicted pictures. According to the well-known QP cascading technique, a QP offset is added to the computed QP in each layer. As a result of QP cascading, four types of B-frames exist in each GOP. To account for the complexity of each frame type in the QP calculation, we calculate a target bit budget for each frame type as:
The fuzzy rules are shown in Table 1. The entries of the table are the outputs of the fuzzy controller. The letters in the table are linguistic labels, as follows. L: Low, H: High, M: Medium, V: Very, E: Extra, and U: Ultra. As an example, the following fuzzy rule can be read from the table:
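A minimal Python sketch of the normalization follows. It assumes that the first input is the buffer fullness divided by the buffer size and the second is the measured bit rate divided by the target bit rate, following the order in which the two inputs are listed. How the measured bit rate is obtained (for example, over which window) is not specified here, so it is simply passed in. All names are hypothetical.

```python
def normalized_inputs(occupancy_bits, buffer_size_bits, measured_rate_bps, target_rate_bps):
    """Return (I1, I2): normalized buffer fullness and normalized bit rate.

    The mapping of I1 to the buffer and I2 to the rate is an assumption based
    on the order in which the inputs are described.
    """
    i1 = occupancy_bits / buffer_size_bits
    i2 = measured_rate_bps / target_rate_bps
    return i1, i2


# Illustrative values: a quarter-full buffer and a bit rate 20% above target.
i1, i2 = normalized_inputs(250_000, 1_000_000, 1_200_000, 1_000_000)
```

Normalizing both inputs makes the same rule table and membership functions usable across different buffer sizes and target rates.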
IF I1 is ML AND IF I2 is ML,
For I1 and I2, nine and seven trapezoidal membership functions are employed, respectively, with the boundaries shown in Tables 2(a) and 2(b). These membership functions are depicted graphically in Fig. 3. The fuzzy rules and membership functions were designed based on our previous experience in [13]. The nonlinearity of the R-D function, the different buffer states, and the prevention of unnecessary QP changes are the key considerations in designing the membership functions.
After the preliminary design of the fuzzy system, an optimization process was performed to tune the fuzzy membership functions. In the optimization process, several parameters were considered, including the average bit rate, average QP, average PSNR, and the standard deviation of PSNR. The desired central values for the output of the fuzzy system are shown in Table 3.
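For readers unfamiliar with trapezoidal membership functions, a generic one can be written as a short Python function. The four breakpoints used in the example are placeholders; the actual boundaries are those listed in Tables 2(a) and 2(b) and are not reproduced here.

```python
def trapezoid(x, a, b, c, d):
    """Membership degree of x for a trapezoid with feet a, d and shoulders b, c.

    Requires a <= b <= c <= d. Setting a == b or c == d gives an open
    (shoulder-shaped) function at the left or right end of the input range.
    """
    if x < a or x > d:
        return 0.0
    if b <= x <= c:
        return 1.0
    if x < b:
        return (x - a) / (b - a)  # rising edge
    return (d - x) / (d - c)      # falling edge


# Placeholder breakpoints, for illustration only.
degree = trapezoid(0.3, 0.2, 0.4, 0.6, 0.8)
```

The flat top between b and c gives a band of inputs with the same membership degree.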
A simple fuzzy system with two inputs using product inference engine, singleton fuzzifier, and center average defuzzifier as in [13] yields:
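For a two-input system with a product inference engine, singleton fuzzifier, and center average defuzzifier, the standard textbook form of the output is the weighted average of the rule centers, where each weight is the product of the two membership degrees. The Python sketch below implements that general form. The two-by-two rule table and the linear membership functions are toy placeholders and are not the 9 by 7 design of Tables 1 to 3.

```python
def fuzzy_output(i1, i2, mfs1, mfs2, centers):
    """Product inference with center-average defuzzification.

    mfs1, mfs2: lists of membership functions (callables) for inputs I1 and I2.
    centers[r][c]: output center of the rule "I1 is mfs1[r] AND I2 is mfs2[c]".
    """
    numerator = 0.0
    denominator = 0.0
    for r, mu1 in enumerate(mfs1):
        w1 = mu1(i1)
        for c, mu2 in enumerate(mfs2):
            weight = w1 * mu2(i2)  # product inference: AND as multiplication
            numerator += weight * centers[r][c]
            denominator += weight
    # If no rule fires, return 0 (no change); this fallback is an assumption.
    return numerator / denominator if denominator > 0.0 else 0.0


# Toy membership functions on [0, 1] (hypothetical, for illustration only).
low = lambda x: max(0.0, min(1.0, 1.0 - x))
high = lambda x: max(0.0, min(1.0, x))
toy_centers = [[-2.0, 0.0],
               [0.0, 2.0]]
y = fuzzy_output(0.25, 0.5, [low, high], [low, high], toy_centers)
```

With these toy values the firing weights are 0.375, 0.375, 0.125, and 0.125, so the output is the weighted average -0.5.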
The output of the fuzzy system is multiplied by a control gain and by a term that is inversely proportional to the delay, in order to adapt the required change in QP to the buffer size and target rate, as follows:
Experimental results
The proposed RC algorithm was implemented in the HEVC reference software HM-16.0 and a set of experiments was run. To evaluate the proposed fuzzy algorithm, two sets of sequences were used. The first set consists of four well-known video sequences with different frame rates and contents: Blowing bubbles, Basketball pass, Flower vase, and Keiba. The frame rate of Blowing bubbles and Basketball pass is 50 fps, and the frame rate of the other two sequences is 30 fps. All four sequences are 300 frames long. The second set includes longer sequences made by concatenating short sequences that have similar frame rates, i.e., Blowing bubbles is concatenated with Basketball pass, and Racehorses is concatenated with Keiba. These longer sequences are more suitable for evaluating the performance of VBR RC algorithms, which control the average bit rate in the long term.
Table 4 shows the results of the experiments. To evaluate the performance of the proposed fuzzy RC algorithm from different points of view, we compared its encoding results with those of the CQP scenario and of the HM-16.0 RC algorithm in terms of PSNR, standard deviation of PSNR, average QP, standard deviation of QP, and required minimum buffering delay at similar bit rates. The buffer size is set equal to one second of buffering of a bit stream at the target bit rate. The fuzzy gain $G_F$, $\alpha$, and $\beta$ are set to 0.5, 0.4, and 0.6, respectively. The QP cascading technique was used in the proposed method and also in the CQP case. According to the average results presented in Table 4, the proposed fuzzy algorithm provides a much lower buffering delay than CQP and the HM RC algorithm while preserving the average PSNR. Moreover, the standard deviations of PSNR and QP for the fuzzy algorithm lie between those of the other two algorithms. This means that the visual quality of videos encoded by the proposed algorithm is lower than that of CQP and higher than that of the HM RC algorithm, as expected.
We also encoded the second test video set at four different R-D points (QP values of 10, 20, 30, and 40 for CQP) with the three algorithms in order to compare them using R-D graphs and the Bjontegaard difference measures. The R-D graphs for the long Blowing bubbles-Basketball pass sequence are shown in Fig. 4. The graphs show a slightly higher R-D performance for the proposed algorithm. Table 5 presents the computed Bjontegaard Delta PSNR and Delta Rate between the proposed algorithm and the other algorithms for the concatenated sequences. The small positive values of Delta PSNR and negative values of Delta Rate in Table 5 indicate a slightly higher R-D performance for the proposed algorithm.
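For reference, the Bjontegaard Delta PSNR between two four-point R-D curves can be sketched in pure Python as follows. The sketch follows the commonly used procedure: fit a cubic to PSNR as a function of log10 of the bit rate for each curve, integrate both cubics over the overlapping log-rate interval, and take the difference of the averages. It assumes exactly four points with distinct rates per curve, so that the cubic is the exact interpolant and Simpson's rule integrates it exactly. The function names and the sample R-D points are hypothetical and are not values from Table 5.

```python
import math


def _cubic_through(xs, ys, x):
    """Evaluate at x the Lagrange polynomial through the points (xs, ys)."""
    total = 0.0
    for j, (xj, yj) in enumerate(zip(xs, ys)):
        term = yj
        for m, xm in enumerate(xs):
            if m != j:
                term *= (x - xm) / (xj - xm)
        total += term
    return total


def _mean_over(xs, ys, lo, hi):
    """Average of the interpolating cubic on [lo, hi] (Simpson, exact for cubics)."""
    mid = 0.5 * (lo + hi)
    return (_cubic_through(xs, ys, lo)
            + 4.0 * _cubic_through(xs, ys, mid)
            + _cubic_through(xs, ys, hi)) / 6.0


def bd_psnr(rates_ref, psnr_ref, rates_test, psnr_test):
    """Average PSNR gain (dB) of the test curve over the reference curve."""
    log_ref = [math.log10(r) for r in rates_ref]
    log_test = [math.log10(r) for r in rates_test]
    lo = max(min(log_ref), min(log_test))
    hi = min(max(log_ref), max(log_test))
    return _mean_over(log_test, psnr_test, lo, hi) - _mean_over(log_ref, psnr_ref, lo, hi)


# Hypothetical R-D points (kbit/s, dB); the test curve is the reference
# shifted up by 0.5 dB, so the result is close to 0.5 by construction.
ref_rates = [500.0, 1000.0, 2000.0, 4000.0]
ref_psnr = [32.0, 35.0, 37.5, 39.5]
test_psnr = [p + 0.5 for p in ref_psnr]
gain_db = bd_psnr(ref_rates, ref_psnr, ref_rates, test_psnr)
```

A positive value means the test curve delivers higher PSNR at the same bit rate on average. Delta Rate is computed analogously by swapping the roles of the two axes.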
As sample results, detailed graphical experimental results including buffer fullness, PSNR, and QP for Flower vase and Racehorses-Keiba are presented in Figs. 5 and 6, respectively.
As shown in the figures, the bit streams encoded by the proposed RC algorithm completely obeyed the buffer constraint, with no buffer overflow or underflow, while the bit streams produced by the other algorithms caused buffer overflow and underflow at some points.
Conclusion
In this paper, a novel video rate control algorithm for the new HEVC/H.265 video coding standard is proposed. The proposed algorithm is optimized for variable bit rate video applications with a buffer constraint. The algorithm uses a fuzzy controller and a virtual buffer that simulates a decoder buffer at the receiver side of a communication channel with constant bandwidth. The fuzzy controller uses two feedback input signals, the buffer fullness and the resulting bit rate, to compute a QP for each picture. Unlike conventional rate control algorithms, the fuzzy controller computes the change in QP relative to the QP used for encoding the previous picture. In effect, at each operating point in R-D space, the QP is used as a reference for the controller. Because QP carries information about both rate and quality, it allows the controller to operate over a wide area in R-D space. The fuzzy controller is designed to minimize the fluctuation of QP and thereby improve the visual quality. Experimental results show that the bit streams encoded by the proposed fuzzy rate control algorithm completely obeyed the buffer constraint while the quality in terms of average PSNR was preserved. In comparison with the HM rate control algorithm, the proposed controller provided higher PSNR, lower buffering delay, and lower PSNR standard deviation for the tested videos.
