Video decoding method, video encoding method, apparatus, device, and storage medium
By determining and encoding only the valid quantization matrices used during encoding, the method addresses the high computational complexity issue in VVC decoding, enhancing processing efficiency and reducing bit overhead.
Patent Information
- Authority / Receiving Office: JP
- Patent Type: Applications
- Current Assignee / Owner: TENCENT TECHNOLOGY (SHENZHEN) CO LTD
- Filing Date: 2026-04-23
- Publication Date: 2026-07-02
Abstract
Description
Technical Field
[0001] Embodiments of the present application relate to the technical field of video encoding and decoding, and in particular to video decoding methods, video encoding methods, devices, equipment, and storage media.
[0002] This application claims the priority of a Chinese patent application with an application number of 201911309768.6 and an invention title of "Video Decoding Method, Device, Equipment, and Storage Media", which was filed on December 18, 2019, and the entire content thereof is incorporated herein by reference.
Background Art
[0003] H.266 is a next-generation video coding technology that improves on H.265 / HEVC (High Efficiency Video Coding). It has been officially named VVC (Versatile Video Coding) and continues to be updated and refined under the guidance of the JVET (Joint Video Experts Team).
[0004] At the 14th JVET meeting, it was decided that frequency-related scaling can be supported by using two types of quantization matrices in VVC: the default quantization matrix and the user-defined quantization matrix. When enabling the quantization matrix, individual quantization can be performed on the transform coefficients in the TB (Transform Block) based on the quantization coefficients (i.e., integer weighting values) included in the quantization matrix.
[0005] Currently, the decoding method of the quantization matrix adopted in VVC has a relatively high computational complexity on the decoder side.
Summary of the Invention
Problems to be Solved by the Invention
[0006] Embodiments of the present invention provide a video decoding method, a video encoding method, an apparatus, a device, and a storage medium, which can reduce the computational complexity on the decoder side. The technical means are as follows.
Means for Solving the Problem
[0007] In one embodiment, the present invention provides a video decoding method, the method including: a step of obtaining a first parameter set corresponding to a video frame to be decoded, wherein the first parameter set includes a parameter set used to define syntax elements related to the QM (Quantization Matrix); a step of determining a valid QM based on the syntax elements included in the first parameter set, wherein the valid QM refers to the QM actually used when performing inverse quantization on the quantized transformation coefficients in the decoding process of the video frame to be decoded; and a step of decoding the valid QM.
[0008] In another embodiment, an embodiment of the present application provides a video encoding method, the method including: a step of determining a valid QM corresponding to a video frame to be encoded, wherein the valid QM refers to the QM actually used when quantizing the transformation coefficients in the encoding process of the video frame to be encoded; and a step of generating a code stream corresponding to a first parameter set by encoding the valid QM and the syntax elements used to determine the valid QM, wherein the first parameter set includes a parameter set used to define syntax elements related to the QM.
[0009] In yet another embodiment, an embodiment of the present application provides a video decoding device, the device comprising a parameter acquisition module, a QM determination module, and a QM decoding module. The parameter acquisition module is used to acquire a first parameter set corresponding to the video frame to be decoded, wherein the first parameter set includes a parameter set used to define syntax elements related to the QM. The QM determination module is used to determine a valid QM based on the syntax elements included in the first parameter set, wherein the valid QM refers to the QM actually used when performing inverse quantization on the quantized transformation coefficients in the decoding process of the video frame to be decoded. The QM decoding module is used to decode the valid QM.
[0010] In yet another embodiment, an embodiment of the present application provides a video encoding apparatus, the apparatus including a QM determination module and a QM encoding module. The QM determination module is used to determine a valid QM corresponding to the video frame to be encoded, wherein the valid QM refers to the QM actually used when quantizing the transformation coefficients in the encoding process of the video frame to be encoded. The QM encoding module is used to encode the valid QM and the syntax elements used to determine the valid QM, so as to generate a code stream corresponding to a first parameter set, wherein the first parameter set includes a parameter set used to define the syntax elements related to the QM.
[0011] In yet another embodiment, an embodiment of the present application provides a computer device comprising a processor and memory, wherein at least one instruction, at least one program, code set or instruction set is stored in the memory, and the at least one instruction, at least one program, code set or instruction set is loaded and executed by the processor to realize the video decoding method or the video encoding method.
[0012] In yet another embodiment, an embodiment of the present application provides a computer-readable storage medium in which at least one instruction, at least one program, a code set or instruction set is stored, and the at least one instruction, the at least one program, the code set or instruction set is loaded and executed by a processor to realize the video decoding method or the video encoding method.
[0013] In yet another embodiment, the present invention provides a computer program product which, when executed by a processor, is used to implement the video decoding method or the video encoding method.
Effects of the Invention
[0014] The technical means provided by the embodiments of this application may include the following beneficial effects.
[0015] By obtaining a first parameter set corresponding to the video frame to be decoded, a valid QM is determined based on the syntax elements included in the first parameter set. This valid QM refers to the QM actually used when quantizing the transformation coefficients during the encoding process that generated the video frame to be decoded. Decoding is then performed on this valid QM. In this way, the decoder only needs to decode the valid QM, thereby reducing the computational complexity on the decoder side.
Brief Description of the Drawings
[0016] [Figure 1] This is a schematic diagram of video coding as illustrated in the present invention.
[Figure 2] This is a simplified block diagram of a communication system provided by one embodiment of the present invention.
[Figure 3] This is a schematic diagram illustrating the arrangement of a video encoder and video decoder in a streaming environment, as exemplified in this application.
[Figure 4] It is an encoding schematic diagram under the inter-frame prediction mode provided by one embodiment of the present application.
[Figure 5] It is an encoding schematic diagram under the intra-frame prediction mode provided by one embodiment of the present application.
[Figure 6] It is a schematic diagram of the functional modules of a video encoder provided by one embodiment of the present application.
[Figure 7] It is a schematic diagram of the functional modules of a video decoder provided by one embodiment of the present application.
[Figure 8] It is a schematic diagram of generating QM by downsampling copy provided by one embodiment of the present application.
[Figure 9] It is a schematic diagram of the diagonal scanning order provided by one embodiment of the present application.
[Figure 10] It is a flowchart of a video decoding method provided by one embodiment of the present application.
[Figure 11] It is a flowchart of a video encoding method provided by one embodiment of the present application.
[Figure 12] It is a block diagram of a video decoding device provided by one embodiment of the present application.
[Figure 13] It is a block diagram of a video decoding device provided by another embodiment of the present application.
[Figure 14] It is a block diagram of a video encoding device provided by one embodiment of the present application.
[Figure 15] It is a structural block diagram of a computer device provided by one embodiment of the present application.
Embodiments for Carrying Out the Invention
[0017] To make the objectives, technical means, and advantages of the present application clearer, the embodiments of the present application will be described in more detail below with reference to the drawings.
[0018] As shown in Figure 1, the current block 101 contains samples that the encoder has found, during the motion search process, to be predictable from a previous block of the same size that has been spatially shifted. Instead of encoding the Motion Vector (MV) directly, the MV can be derived from metadata associated with one or more reference pictures. For example, the MV can be derived from the metadata of the most recent reference picture (in decoding order) by using the MV associated with any one of the five surrounding samples A0, A1, B0, B1, and B2 (corresponding to 102-106, respectively).
[0019] Figure 2 illustrates a simplified block diagram of a communication system provided by one embodiment of the present invention. The communication system 200 includes a plurality of devices that can communicate with each other, for example, via a network 250. For example, the communication system 200 includes a first device 210 and a second device 220 interconnected by the network 250. In the embodiment of Figure 2, the first device 210 and the second device 220 perform one-way data transmission. For example, the first device 210 can encode video data, such as a video picture stream captured by the first device 210, and transmit it to the second device 220 via the network 250. The encoded video data is transmitted in the form of one or more encoded video code streams. The second device 220 can receive the encoded video data from the network 250, decode it to recover the video data, and display video pictures based on the recovered video data. One-way data transmission is common in applications such as media services.
[0020] In another embodiment, the communication system 200 includes a third device 230 and a fourth device 240 that perform bidirectional transmission of encoded video data, which can occur, for example, during a video conference. For bidirectional data transmission, each of the third device 230 and the fourth device 240 can encode video data (e.g., a video picture stream collected by the device) and transmit it to the other of the third device 230 and the fourth device 240 via the network 250. Each of the third device 230 and the fourth device 240 can further receive the encoded video data transmitted by the other of the third device 230 and the fourth device 240, decode the encoded video data to recover the video data, and display the video picture on an accessible display device based on the recovered video data.
[0021] In the embodiment shown in Figure 2, the first device 210, the second device 220, the third device 230, and the fourth device 240 may be computer devices such as servers, personal computers, and smartphones, but the principles disclosed herein are not limited thereto. The embodiments of this application apply to PCs (Personal Computers), mobile phones, tablet computers, media players, and / or dedicated video conferencing equipment. Network 250 represents any number of networks that transmit encoded video data between the first device 210, the second device 220, the third device 230, and the fourth device 240, including, for example, wired connections and / or wireless communication networks. Communication network 250 can exchange data over circuit-switched and / or packet-switched channels. The network may include electronic communication networks, local area networks, wide area networks, and / or the Internet. For the purposes of this application, the architecture and topology of network 250 may be irrelevant to the operations disclosed herein, unless otherwise interpreted below.
[0022] As an example, Figure 3 shows a configuration of a video encoder and video decoder in a streaming environment. The subject matter disclosed herein is similarly applicable to other applications that support video, such as video conferencing, digital television, and storing compressed video on digital media including CDs (Compact Discs), DVDs (Digital Versatile Discs), and Memory Sticks.
[0023] The streaming system may include a capture subsystem 313, which may include a video source 301, such as a digital camera, that creates an uncompressed video picture stream 302. In this embodiment, the video picture stream 302 includes samples captured by the digital camera. The video picture stream 302 is drawn as a thick line to emphasize its high data volume compared to the encoded video data 304 (or encoded video code stream), and it may be processed by an electronic device 320. The electronic device 320 includes a video encoder 303 coupled to the video source 301. The video encoder 303 may include hardware, software, or a combination of software and hardware to realize or implement each aspect of the disclosed subject matter, which is described in more detail below. The encoded video data 304 (or encoded video code stream 304) is drawn as a thin line to emphasize its relatively low data volume compared to the video picture stream 302, and it may be stored in a streaming server 305 for future use. One or more streaming client terminal subsystems, such as client terminal subsystems 306 and 308 in Figure 3, can access the streaming server 305 to retrieve copies 307 and 309 of the encoded video data 304. Client terminal subsystem 306 may include, for example, a video decoder 310 in an electronic device 330. The video decoder 310 decodes the incoming copy 307 of the encoded video data and produces an output video picture stream 311 that can be presented on a display 312 (e.g., a display screen) or another presentation device (not shown). In some streaming systems, the encoded video data 304, video data 307, and video data 309 (e.g., video code streams) can be encoded according to certain video encoding / compression standards.
[0024] It should be noted that electronic devices 320 and 330 may include other components (not shown). For example, electronic device 320 may include a video decoder (not shown), and electronic device 330 may further include a video encoder (not shown). The video decoder is used to decode the received encoded video data, and the video encoder is used to encode the video data.
[0025] When encoding image blocks in a video frame, either inter-frame prediction mode or intra-frame prediction mode can be used to generate a prediction block based on one or more encoded reference blocks. The prediction block may be an estimated version of the original block. A residual block can be generated by subtracting the prediction block from the original block, or vice versa, and this residual block can be used to represent the prediction residual (or prediction error). Since the amount of data required to represent the prediction residual is usually less than the amount of data required to represent the original block, encoding the residual block can achieve a relatively high compression ratio. For example, as shown in Figure 4, in inter-frame prediction mode, the encoded reference block 41 and the block to be encoded 42 are located in two different video frames. As shown in Figure 5, in intra-frame prediction mode, the encoded reference block 51 and the block to be encoded 52 are located in the same video frame.
[0026] Next, the residual values of the residual blocks in the spatial domain can be converted into transformation coefficients in the frequency domain. This conversion can be achieved by a two-dimensional transformation similar to, for example, the Discrete Cosine Transform (DCT). In the transformation matrix, low-index transformation coefficients (e.g., located in the upper left region) can correspond to large spatial features and have relatively large values, while high-index transformation coefficients (e.g., located in the lower right region) can correspond to small spatial features and have relatively small values. Furthermore, a quantization matrix containing quantization coefficients can be applied to the transformation matrix to quantize all transformation coefficients, resulting in quantized transformation coefficients. As a result of quantization, the scale or quantity of the transformation coefficients may be reduced. Some high-index transformation coefficients can be reduced to zero and then skipped in subsequent scanning and coding steps.
[0027] Figure 6 shows a portion of an exemplary video encoder 60, including a transformation module 62, a quantization module 64, and an entropy coding module 66. Although not shown in Figure 6, it should be understood that the video encoder 60 may include other modules, such as a prediction module, a dequantization module, and a reconstruction module. In operation, the video encoder 60 can acquire video frames, each of which may contain multiple image blocks. For simplicity, the encoding of a single image block is considered here as an example. To encode an image block, a prediction block is first generated to estimate the image block. As described above, the prediction block can be generated by the prediction module in inter-frame prediction or intra-frame prediction mode. Subsequently, the difference between the image block and the prediction block is calculated to generate a residual block. The residual block can be converted into transformation coefficients by the transformation module 62. During the transformation, the residual values in the spatial domain, which include large and small features, are converted into transformation coefficients in the frequency domain, which includes high-frequency and low-frequency bands. Subsequently, the quantization module 64 can quantize the transformation coefficients using the QM, thereby generating quantized transformation coefficients. Furthermore, these quantized transformation coefficients can be encoded by the entropy coding module 66 and finally transmitted from the video encoder 60 as part of the bitstream.
[0028] Figure 7 shows a portion of an exemplary video decoder 70, including an entropy decoding module 72, an inverse quantization (dequantization) module 74, and an inverse transform module 76. Although not shown in Figure 7, it should be understood that the video decoder 70 may include other modules, such as a prediction module, a transform module, and a quantization module. In operation, the video decoder 70 receives a bitstream output from the video encoder 60, performs decoding on the bitstream according to inter-frame prediction or intra-frame prediction mode, and outputs a reconstructed video frame. Here, the entropy decoding module 72 can generate quantized transformation coefficients by performing entropy decoding on the input bitstream. The inverse quantization module 74 can perform inverse quantization on the quantized transformation coefficients based on QM to obtain the inversely quantized transformation coefficients. The inverse transform module 76 performs an inverse transform on the inversely quantized transformation coefficients to generate a reconstructed residual block. Subsequently, a reconstructed image block is generated based on the reconstructed residual block and prediction block.
[0029] As can be seen from the above, the QM is an essential part of the video encoding and decoding process. The setting of the QM determines how much information in the transformation coefficients is retained or filtered out, and therefore the QM can affect encoding performance and encoding quality. In fact, both encoders and decoders require the QM. Specifically, so that the decoder can accurately decode an image, the encoder needs to encode the information regarding the quantization coefficients in the QM and transmit this information to the decoder. In video encoding and decoding techniques and standards, the QM may also be referred to as a scaling matrix or weight matrix. Therefore, the term "QM" as used herein is a general term that covers quantization matrix, scaling matrix, weight matrix, and other equivalent terms.
[0030] The following provides an introduction and explanation of some basic concepts relating to the embodiments of this application.
[0031] 1. Quantization Matrix
[0032] In the latest version of VTM (VVC Test Model), namely VTM7, not only square TBs but also non-square TBs are permitted, resulting in a relatively large number of QMs. To reduce the number of bits and the memory required for QM signaling, VVC employs upsampling and copy designs for both non-square TBs and large square TBs.
[0033] Non-square QMs do not exist in VVC bitstreams; they are obtained by copying the corresponding square QMs on the decoder side. More specifically, a 32x4 QM is obtained by copying rows 0, 8, 16, and 24 of a 32x32 QM. As shown in Figure 8, a 32x4 QM is obtained by downsampling the 32x32 QM. Rows 0, 8, 16, and 24, which are shaded, are copied from the 32x32 QM to the 32x4 QM.
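The row copy described in this paragraph can be sketched as follows. This is a minimal illustration, not the VTM implementation; the helper name `downsample_rows` and the toy matrix contents are assumptions made for the example.

```python
def downsample_rows(qm, num_rows):
    """Keep every (len(qm) // num_rows)-th row of a square QM (rows 0, 8, 16, 24 for 32 -> 4)."""
    step = len(qm) // num_rows
    return [list(qm[r]) for r in range(0, len(qm), step)]

# Toy 32x32 matrix in which every entry stores its own row index,
# so the copied rows are easy to identify in the result.
qm32 = [[row] * 32 for row in range(32)]
qm32x4 = downsample_rows(qm32, 4)   # 4 rows of 32 entries each
```

Because the non-square QM is derived entirely from the square one, nothing extra has to be carried in the bitstream for it.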
[0034] When the size of a square TB is larger than 8x8, the corresponding QM size in VTM7 is constrained to 8x8. Upsampling methods are applied to these 8x8 QMs to create 16x16, 32x32, and 64x64 QMs. More specifically, to create a 16x16 QM, each element in the corresponding 8x8 QM is upsampled and copied to a 2x2 area, and to create a 32x32 QM, each element in the corresponding 8x8 QM is upsampled and copied to a 4x4 area.
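The replication step can be sketched in a few lines. This is an illustrative sketch only; the function name `upsample_copy` and the sample values are assumptions, and the separately coded DC coefficient is not handled here.

```python
def upsample_copy(qm8, factor):
    """Copy each element of a square QM into a factor x factor area."""
    size = len(qm8) * factor
    return [[qm8[r // factor][c // factor] for c in range(size)]
            for r in range(size)]

# Toy 8x8 matrix with distinct entries so the copied areas can be checked.
qm8 = [[8 * r + c for c in range(8)] for r in range(8)]
qm16 = upsample_copy(qm8, 2)   # each element fills a 2x2 area
qm32 = upsample_copy(qm8, 4)   # each element fills a 4x4 area
```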
[0035] VTM7 requires encoding a large number of QMs, 28 in total. Table 1 determines the QM identifier variable (id) based on the variables sizeId and matrixId specified in Tables 2 and 3, respectively. Here, sizeId represents the size of the quantization matrix, and matrixId is an identifier for the QM type based on the prediction mode (predMode) and color component (cIdx).
[0036] [Table 1]
[0037] [Table 2]
[0038] [Table 3]
[0039] In Table 2, when sizeId is greater than 3, it has a DC (Direct Current) coefficient, and the DC coefficient is the element value at the (0,0) position in the QM. In VVC, when the DC value is 0, the QM may use the default QM, but it can still be transmitted. The main reason is that an unencoded QM may need to refer to it. When the DC value is not 0, the QM uses a user-defined QM and is transmitted after being encoded using the encoding scheme described below.
[0040] In Table 3, MODE_INTRA represents the intra-frame prediction mode, MODE_INTER represents the inter-frame prediction mode, and MODE_IBC represents the IBC (Intra Block Copy) prediction mode. Y represents luminance, and Cb and Cr represent color difference.
[0041] 2. Quantization Matrix Coding Scheme
[0042] To reduce bit overhead, VTM7 employs intra-frame and inter-frame predictive coding to encode the 28 QMs.
[0043] In the intra-frame prediction mode, DPCM (Differential Pulse Code Modulation) coding is applied to the QM in diagonal scan order. The DPCM intra-frame residuals also need to be transmitted in the bitstream. For example, for a 4x4 QM as shown in Figure 9, the diagonal scan order is (0,0), (1,0), (0,1), (2,0), (1,1), ..., (2,3), (3,3).
[0044] There are two types of inter-frame prediction modes: copy mode and prediction mode. In copy mode, the current QM to be encoded is exactly the same as an already decoded QM, called the reference QM. Copy mode therefore has zero inter-frame residuals, and no residual needs to be signaled. The encoder needs to transmit the ID delta between the current QM and its reference QM so that the decoder can reconstruct the current QM by directly copying the reference QM. Prediction mode is similar to copy mode but has additional inter-frame residuals. DPCM coding is applied to the inter-frame residuals in diagonal scan order, and the encoder needs to transmit the DPCM inter-frame residuals in the bitstream.
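The scan order and the DPCM step can be sketched as follows. The sketch reproduces the position sequence listed in this paragraph; treating each pair as (row, column) and seeding the first difference with 0 are assumptions made for the example, and all function names are hypothetical.

```python
def diag_scan(n):
    """Return the positions of an n x n block in diagonal scan order."""
    order = []
    for d in range(2 * n - 1):                       # one anti-diagonal at a time
        for row in range(min(d, n - 1), max(0, d - n + 1) - 1, -1):
            order.append((row, d - row))
    return order

def dpcm_encode(qm, start=0):
    """Differences between consecutive scanned values (start is an assumed seed)."""
    prev, deltas = start, []
    for r, c in diag_scan(len(qm)):
        deltas.append(qm[r][c] - prev)
        prev = qm[r][c]
    return deltas

def dpcm_decode(deltas, n, start=0):
    """Rebuild the matrix by accumulating the differences along the scan."""
    qm = [[0] * n for _ in range(n)]
    prev = start
    for (r, c), d in zip(diag_scan(n), deltas):
        prev += d
        qm[r][c] = prev
    return qm

qm4 = [[16, 17, 18, 20], [17, 18, 20, 22], [18, 20, 22, 24], [20, 22, 24, 28]]
deltas = dpcm_encode(qm4)
```

Neighbouring entries along a diagonal tend to be close in value, so most differences are small, which is what makes the DPCM residuals cheap to code.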
[0045] As described above, when the sizeId of a QM is greater than 3, an upsampling algorithm is applied to copy each element in the QM into a larger square region. Since the DC coefficient at the (0,0) position is most important for video reconstruction, VTM7 encodes it directly rather than copying it from the corresponding element of another QM. For each QM, mode determination is used to calculate the bit costs of the three candidate modes of the QM (i.e., the copy mode of the inter-frame prediction mode, the prediction mode of the inter-frame prediction mode, and the intra-frame prediction mode), and the one with the smallest bit cost is selected as the final optimal mode. Then, encoding is performed on the QM using this optimal mode.
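The mode decision at the end of this paragraph amounts to picking the minimum of three costs. The sketch below assumes the bit costs have already been measured by some cost function that is not shown; the mode labels and the numeric costs are made up for illustration.

```python
def choose_mode(bit_costs):
    """Return the candidate mode with the smallest bit cost."""
    return min(bit_costs, key=bit_costs.get)

# Hypothetical bit costs for one QM (illustrative values only).
costs = {"inter_copy": 6, "inter_pred": 41, "intra_dpcm": 97}
best = choose_mode(costs)
```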
[0046] 3. Quantization Matrix Signaling
[0047] By using QM, VVC supports frequency-related quantization of the transformation block. Assuming QM is W, W[x][y] represents the QM weights of the transformation coefficients at position (x,y) in TB. For the transformation coefficients coeff[x][y], the quantized transformation coefficients level[x][y] are calculated using Equation 1 below.
[0048] [Equation 1]
[0049] Here, QP is the quantization parameter (which may also be called the quantization step size), and offset is the offset value. W[x][y]=16 indicates that no weighting is applied to the transformation coefficient at position (x,y). Also, when the values of all elements in the QM are equal to 16, the effect is the same as not using a QM.
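The neutral role of the weight 16 can be illustrated with a small sketch. Equation 1 is not reproduced in this text, so the formula inside `quantize` is an assumed, commonly used form of weighted scalar quantization and not necessarily the exact expression of the original; the function name, the rounding offset of 0.5, and the sample values are likewise assumptions.

```python
def quantize(coeff, w, qstep, offset=0.5):
    """Assumed form: scale the coefficient by 16 / w, divide by the step, then round."""
    sign = -1 if coeff < 0 else 1
    return sign * int(abs(coeff) * 16 / (w * qstep) + offset)

unweighted = quantize(100, 16, 10)   # w == 16: same as plain division by the step
coarser = quantize(100, 32, 10)      # w == 32: the coefficient is quantized more coarsely
```

Under this assumed form, a weight above 16 makes quantization coarser for that frequency position and a weight below 16 makes it finer.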
[0050] The SPS (Sequence Parameter Set) syntax element sps_scaling_list_enable_flag indicates whether the QM is enabled for pictures whose Picture Header (PH) references that SPS. When sps_scaling_list_enable_flag is enabled, additional flags in the PH control whether to use the default QM, in which all elements are equal to 16, or a user-defined QM. In VTM7, user-defined QMs are signaled in the APS (Adaptation Parameter Set). If user-defined QMs are enabled in both the SPS and the PH, one APS index can be sent in the PH to specify the QM set for the pictures that reference this PH.
[0051] In a single APS, as many as 28 groups of QM coding modes, Δid (delta id) values, and AC and DC coefficients should be signaled. In each APS, the 28 groups of QMs are encoded and decoded in increasing order of id.
[0052] In VVC Draft 7, the definitions of the QM coding mode, Δid (increment id), and the syntax and semantics of the AC and DC coefficients are shown in Table 4 below.
[0053] [Table 4]
[0054] If scaling_list_copy_mode_flag[id] is equal to 1, it means that the element values of the current QM and its reference QM are the same. The reference QM is represented by scaling_list_pred_id_delta[id]. If scaling_list_copy_mode_flag[id] is equal to 0, it means that scaling_list_pred_mode_flag exists.
[0055] A value of scaling_list_pred_mode_flag[id] equal to 1 indicates that the current QM can be predicted from the reference QM. The reference QM is indicated by scaling_list_pred_id_delta[id]. A value of scaling_list_pred_mode_flag[id] equal to 0 indicates that the element values of the current QM are explicitly signaled. When not present, the value of scaling_list_pred_mode_flag[id] is inferred to be equal to 0.
[0056] scaling_list_pred_id_delta[id] indicates the reference QM used to derive the predicted QM, i.e., ScalingMatrixPred[id]. When not present, the value of scaling_list_pred_id_delta[id] is inferred to be equal to 0. The value of scaling_list_pred_id_delta[id] should be within the range of 0 to maxIdDelta, where maxIdDelta is derived from id as shown in Formula 2 below.
[0057] maxIdDelta = ( id < 2 ) ? id : ( ( id < 8 ) ? ( id - 2 ) : ( id - 8 ) ) Formula 2
[0058] In other words, if id < 2, then maxIdDelta = id; if id ≥ 2 and < 8, then maxIdDelta = id - 2; and if id ≥ 8, then maxIdDelta = id - 8.
[0059] The variables refId and matrixSize are calculated using the following formulas.
[0060] refId = id - scaling_list_pred_id_delta[ id ] Formula 3
matrixSize = ( id < 2 ) ? 2 : ( ( id < 8 ) ? 4 : 8 ) Formula 4
[0061] In other words, if id < 2, then matrixSize = 2; if id ≥ 2 and < 8, then matrixSize = 4; and if id ≥ 8, then matrixSize = 8.
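The piecewise definitions of maxIdDelta, refId, and matrixSize given above translate directly into code. The following is a minimal sketch; the Python function names are chosen for the example, and the range check in `ref_id` simply enforces the stated range of scaling_list_pred_id_delta.

```python
def max_id_delta(qm_id):
    """maxIdDelta: id for id < 2, id - 2 for 2 <= id < 8, id - 8 otherwise."""
    if qm_id < 2:
        return qm_id
    return qm_id - 2 if qm_id < 8 else qm_id - 8

def matrix_size(qm_id):
    """matrixSize: 2 for id < 2, 4 for 2 <= id < 8, 8 otherwise."""
    if qm_id < 2:
        return 2
    return 4 if qm_id < 8 else 8

def ref_id(qm_id, pred_id_delta):
    """refId = id - scaling_list_pred_id_delta[id], with the delta range checked."""
    if not 0 <= pred_id_delta <= max_id_delta(qm_id):
        raise ValueError("scaling_list_pred_id_delta out of range")
    return qm_id - pred_id_delta

# The largest allowed delta always lands on the first id of the same size group.
first_ids = {ref_id(i, max_id_delta(i)) for i in range(28)}
```

One consequence of these definitions is that a reference QM always has the same matrixSize as the current QM, so a copy or prediction never has to resample the reference.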
[0062] The QM prediction matrix of matrixSize × matrixSize is represented as ScalingMatrixPred[x][y], where x∈[0,matrixSize-1] and y∈[0,matrixSize-1], and the variable ScalingMatrixDCPred represents the predicted value of DC, which is calculated specifically as follows.
[0063] When both scaling_list_copy_mode_flag[id] and scaling_list_pred_mode_flag[id] are equal to 0, all elements of ScalingMatrixPred are set to be equal to 8, and the value of ScalingMatrixDCPred is set to be equal to 8.
[0064] Otherwise, when scaling_list_pred_id_delta[id] is equal to 0, all elements of ScalingMatrixPred are set equal to 16, and the value of ScalingMatrixDCPred is set equal to 16.
[0065] Otherwise, when scaling_list_copy_mode_flag[id] or scaling_list_pred_mode_flag[id] is equal to 1 and scaling_list_pred_id_delta[id] is greater than 0, ScalingMatrixPred is set equal to ScalingMatrixRec[refId], and the value of ScalingMatrixDCPred is calculated as follows: if refId is greater than 13, the value of ScalingMatrixDCPred is set equal to ScalingMatrixDCRec[refId - 14]; otherwise (i.e., refId is 13 or less), the value of ScalingMatrixDCPred is set equal to ScalingMatrixPred[0][0].
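The three cases for deriving the prediction can be sketched as one function. This is an illustrative sketch: the function and parameter names are hypothetical, `rec` is assumed to map each already processed id to its reconstructed matrix, and `dc_rec` is assumed to map (id - 14) to the reconstructed DC value.

```python
def derive_prediction(qm_id, copy_flag, pred_flag, pred_id_delta, rec, dc_rec, size):
    """Return (ScalingMatrixPred, ScalingMatrixDCPred) for one QM."""
    if not copy_flag and not pred_flag:
        # Explicitly signaled QM: flat predictor of 8.
        return [[8] * size for _ in range(size)], 8
    if pred_id_delta == 0:
        # No reference selected: flat predictor of 16.
        return [[16] * size for _ in range(size)], 16
    ref = qm_id - pred_id_delta
    pred = [row[:] for row in rec[ref]]          # assumed: reference is the reconstructed QM
    dc_pred = dc_rec[ref - 14] if ref > 13 else pred[0][0]
    return pred, dc_pred

rec = {2: [[20 + r + c for c in range(4)] for r in range(4)],
       14: [[30] * 8 for _ in range(8)]}
dc_rec = {0: 40}
pred_a, dc_a = derive_prediction(3, 1, 0, 1, rec, dc_rec, 4)    # reference id 2, no DC entry
pred_b, dc_b = derive_prediction(15, 0, 1, 1, rec, dc_rec, 8)   # reference id 14, DC from dc_rec
```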
[0066] The variable scaling_list_dc_coef[id - 14] is used to calculate the value of the variable ScalingMatrixDCRec[id - 14] when id is greater than 13, as shown in Formula 5 below.
[0067] ScalingMatrixDCRec[ id - 14 ] = ( ScalingMatrixDCPred + scaling_list_dc_coef[ id - 14 ] + 256 ) % 256 Formula 5
[0068] Here, % represents finding the remainder.
[0069] If it does not exist, the value of scaling_list_dc_coef[ id - 14 ] is inferred to be equal to 0. The value of scaling_list_dc_coef[ id - 14 ] should be within the range of -128 to 127 (including -128 and 127). The value of ScalingMatrixDCRec[ id - 14 ] should be greater than 0.
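The DC reconstruction and its two constraints can be sketched as follows. The function name is hypothetical, and raising an exception for a constraint violation is a choice made for the example rather than behaviour taken from the text.

```python
def reconstruct_dc(dc_pred, dc_coef=0):
    """Apply Formula 5; dc_coef defaults to 0, the value inferred when it is absent."""
    if not -128 <= dc_coef <= 127:
        raise ValueError("scaling_list_dc_coef must be in [-128, 127]")
    dc = (dc_pred + dc_coef + 256) % 256
    if dc <= 0:
        raise ValueError("ScalingMatrixDCRec must be greater than 0")
    return dc

dc_plain = reconstruct_dc(16, 4)       # 20
dc_wrapped = reconstruct_dc(200, 100)  # (200 + 100 + 256) % 256 = 44
```

The modulo keeps the result within 8 bits, so a large positive difference wraps around to a small value, as the second call shows.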
[0070] scaling_list_delta_coef[id][i] represents the difference between the current matrix coefficient ScalingList[id][i] and the previous matrix coefficient ScalingList[id][i-1] when scaling_list_copy_mode_flag[id] is equal to 0. The value of scaling_list_delta_coef[id][i] should be within the range of -128 to 127 (including -128 and 127). When scaling_list_copy_mode_flag[id] is equal to 1, all elements of ScalingList[id] are set equal to 0.
[0071] The ScalingMatrixRec[id] of a QM of matrixSize × matrixSize can be calculated using the following formula 6.
[0072] ScalingMatrixRec[ id ][ x ][ y ] = ( ScalingMatrixPred[ x ][ y ] + ScalingList[ id ][ k ] + 256 ) % 256 Formula 6
[0073] Here, % represents finding the remainder, and k ∈ [0, (matrixSize × matrixSize - 1)].
[0074] x = DiagScanOrder[ Log2( matrixSize ) ][ Log2( matrixSize ) ][ k ][ 0 ], and y = DiagScanOrder[ Log2( matrixSize ) ][ Log2( matrixSize ) ][ k ][ 1 ].
[0075] The value of ScalingMatrixRec[id][x][y] should be greater than 0.
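Formula 6 can be sketched as a loop over the scan positions. In this sketch the helper `diag_positions` stands in for DiagScanOrder and reproduces the position sequence listed in paragraph [0043]; treating the first coordinate of each pair as x is an assumption, as are the function names and sample values.

```python
def diag_positions(n):
    """Stand-in for DiagScanOrder: the k-th entry is the k-th (x, y) position."""
    pos = []
    for d in range(2 * n - 1):
        for a in range(min(d, n - 1), max(0, d - n + 1) - 1, -1):
            pos.append((a, d - a))
    return pos

def reconstruct_qm(pred, scaling_list):
    """Apply Formula 6 to every position of a matrixSize x matrixSize QM."""
    n = len(pred)
    rec = [[0] * n for _ in range(n)]
    for k, (x, y) in enumerate(diag_positions(n)):
        value = (pred[x][y] + scaling_list[k] + 256) % 256
        if value <= 0:
            raise ValueError("ScalingMatrixRec values must be greater than 0")
        rec[x][y] = value
    return rec

flat16 = [[16, 16], [16, 16]]
copied = reconstruct_qm(flat16, [0, 0, 0, 0])      # copy mode: ScalingList is all zero
refined = reconstruct_qm(flat16, [4, -2, 10, 1])   # prediction plus residuals
```

With an all-zero ScalingList the result equals the prediction, which is exactly the copy mode behaviour described earlier.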
[0076] Consider the decoding process of a single QM, that is, the process of decoding based on the syntax elements described above to obtain ScalingMatrixRec[id][x][y] and ScalingMatrixDCRec.
[0077] 4. Limiting the size of TB using SPS
[0078] In VVC Draft 7, the definitions of SPS syntax and semantics related to TB size constraints are shown in Table 5 below.
[0079] [Table 5]
[0080] A value of sps_max_luma_transform_size_64_flag equal to 1 indicates that the maximum transform block size, in luminance samples, is equal to 64. A value of sps_max_luma_transform_size_64_flag equal to 0 indicates that the maximum transform block size, in luminance samples, is equal to 32.
[0081] chroma_format_idc specifies the chrominance sampling relative to the luminance sampling, as shown in Table 6.
[0082] [Table 6]
[0083] In Table 6 above, SubWidthC and SubHeightC represent the width and height of the CTU (Coding Tree Unit) corresponding to the color difference component, respectively, while Monochrome indicates the absence of a color difference component.
[0084] A value of 1 for `separate_colour_plane_flag` indicates that each of the three color components of the 4:4:4 chrominance format is encoded separately. A value of 0 for `separate_colour_plane_flag` indicates that no color component is encoded separately. If `separate_colour_plane_flag` is not present, its value is inferred to be equal to 0.
[0085] When separate_colour_plane_flag is equal to 1, the encoded image consists of three separate components, each component consisting of an encoded sample of one color plane (Y, Cb, or Cr), and using a monochromatic encoding syntax. In this case, each color plane is associated with a specific colour_plane_id value.
[0086] `colour_plane_id` specifies the slice associated with a PH and the associated color plane. When `separate_colour_plane_flag` is equal to 1, the value of `colour_plane_id` should be in the range of 0 to 2 (including 0 and 2). Values 0, 1, and 2 of `colour_plane_id` correspond to the Y, Cb, and Cr planes, respectively. It is important to note that there is no dependency between the decoding processes of images with different `colour_plane_id` values.
[0087] sps_log2_ctu_size_minus5 + 5 represents the size of the luminance coding tree block of each CTU. It is a requirement of bitstream conformance that the value of sps_log2_ctu_size_minus5 be 2 or less.
[0088] Based on sps_log2_ctu_size_minus5, the maximum luminance coding block size can be calculated.
[0089] CtbLog2SizeY = sps_log2_ctu_size_minus5 + 5
CtbSizeY = 1 << CtbLog2SizeY
[0090] Here, CtbSizeY represents the maximum luminance coding block size, CtbLog2SizeY represents the base-2 logarithm of CtbSizeY, and << is the left shift operator.
[0091] log2_min_luma_coding_block_size_minus2 + 2 represents the minimum luminance coding block size. The value of log2_min_luma_coding_block_size_minus2 should be within the range of 0 to sps_log2_ctu_size_minus5 + 3, inclusive.
[0092] The calculation process for the variables MinCbLog2SizeY, MinCbSizeY, and VSize is as follows.
[0093] MinCbLog2SizeY = log2_min_luma_coding_block_size_minus2 + 2 (Formula 7)
MinCbSizeY = 1 << MinCbLog2SizeY (Formula 8)
VSize = Min( 64, CtbSizeY ) (Formula 9)
[0094] Here, MinCbSizeY represents the smallest luminance coding block size, MinCbLog2SizeY represents the base-2 logarithm of MinCbSizeY, VSize represents the largest luminance coding block size, and << is the left shift operator. The value of MinCbSizeY should be less than or equal to VSize.
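The derivation of CtbSizeY, MinCbSizeY, and VSize described above can be sketched in Python as follows. The function name luma_block_sizes and the dictionary it returns are assumptions for illustration; the range checks mirror the constraints stated in the preceding paragraphs.

```python
def luma_block_sizes(sps_log2_ctu_size_minus5: int,
                     log2_min_luma_coding_block_size_minus2: int) -> dict:
    """Derive CtbSizeY, MinCbSizeY and VSize (Formulas 7 to 9)."""
    if not 0 <= sps_log2_ctu_size_minus5 <= 2:
        raise ValueError("sps_log2_ctu_size_minus5 must be in [0, 2]")
    upper = sps_log2_ctu_size_minus5 + 3
    if not 0 <= log2_min_luma_coding_block_size_minus2 <= upper:
        raise ValueError("log2_min_luma_coding_block_size_minus2 out of range")

    ctb_log2_size_y = sps_log2_ctu_size_minus5 + 5
    ctb_size_y = 1 << ctb_log2_size_y
    min_cb_log2_size_y = log2_min_luma_coding_block_size_minus2 + 2  # Formula 7
    min_cb_size_y = 1 << min_cb_log2_size_y                          # Formula 8
    v_size = min(64, ctb_size_y)                                     # Formula 9
    if min_cb_size_y > v_size:
        raise ValueError("MinCbSizeY must be less than or equal to VSize")
    return {"CtbSizeY": ctb_size_y, "MinCbSizeY": min_cb_size_y, "VSize": v_size}
```

For example, sps_log2_ctu_size_minus5 = 2 and log2_min_luma_coding_block_size_minus2 = 0 give a 128-sample coding tree block, a minimum coding block size of 4, and VSize = 64.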
[0095] The width and height of each chrominance CTB (Coding Tree Block), i.e., the variables CtbWidthC and CtbHeightC, are determined using the following method.
[0096] If chroma_format_idc is equal to 0 (monochrome) or separate_colour_plane_flag is equal to 1, then both CtbWidthC and CtbHeightC are equal to 0.
[0097] Otherwise, CtbWidthC and CtbHeightC are calculated using the following formulas.
[0098] CtbWidthC = CtbSizeY / SubWidthC (Formula 10)
CtbHeightC = CtbSizeY / SubHeightC (Formula 11)
[0099] Here, CtbSizeY represents the size of the luminance CTB.
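A short Python sketch of the chrominance CTB derivation follows. Because Table 6 is not reproduced in this passage, the SUBSAMPLING mapping below uses the commonly specified (SubWidthC, SubHeightC) values for each chroma_format_idc and should be read as an assumption; the function name chroma_ctb_size is likewise hypothetical.

```python
# Assumed (SubWidthC, SubHeightC) per chroma_format_idc:
# 0 = monochrome, 1 = 4:2:0, 2 = 4:2:2, 3 = 4:4:4.
SUBSAMPLING = {0: (1, 1), 1: (2, 2), 2: (2, 1), 3: (1, 1)}


def chroma_ctb_size(ctb_size_y: int, chroma_format_idc: int,
                    separate_colour_plane_flag: int = 0) -> tuple:
    """Return (CtbWidthC, CtbHeightC) per Formulas 10 and 11."""
    if chroma_format_idc == 0 or separate_colour_plane_flag == 1:
        # No chrominance CTB exists in these two cases.
        return (0, 0)
    sub_width_c, sub_height_c = SUBSAMPLING[chroma_format_idc]
    return (ctb_size_y // sub_width_c, ctb_size_y // sub_height_c)
```

With a 128-sample luminance CTB, the assumed 4:2:0 mapping yields a 64 × 64 chrominance CTB, while 4:2:2 yields 64 × 128.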
[0100] Currently, the quantization matrix encoding method employed by VVC encodes all 28 QMs and transmits them in the APS. QM signaling therefore occupies a relatively large number of codewords, resulting in high bit overhead and increased computational complexity on the decoder side. In the technical means provided by the embodiment of the present invention, a first parameter set corresponding to the video frame to be decoded is obtained, and the valid QMs are determined based on the syntax elements included in the first parameter set. The valid QMs are the QMs actually used when quantizing the transform coefficients in the process of encoding and generating the video frame to be decoded. Decoding is then performed on the valid QMs. In this way, the encoder encodes and transmits only the valid QMs, which saves the codewords occupied by QM signaling and reduces bit overhead, and the decoder only needs to decode the valid QMs, which reduces computational complexity on the decoder side.
[0101] It should be noted that the technical means provided by the embodiments of this application are applicable to the H.266 / VVC standard or next-generation video coding and decoding standards, but the embodiments of this application are not limited thereto.
[0102] It is worth further explanation that in the video decoding method provided by the embodiment of the present application, the entity executing each step is the decoding device, and in the video encoding method provided by the embodiment of the present application, the entity executing each step is the encoding device, and both the decoding device and the encoding device may be computer devices. Such computer devices refer to electronic devices equipped with data calculation, processing, and storage capabilities, such as PCs, mobile phones, tablet computers, media players, dedicated video conferencing equipment, or servers.
[0103] Furthermore, the methods provided herein may be used alone or in combination with other methods in any order. Encoders and decoders based on the methods provided herein may be implemented by one or more processors or one or more integrated circuits. The technical means of the present invention will be introduced and described below by several embodiments.
[0104] Figure 10 illustrates a flowchart of a video decoding method provided by one embodiment of the present invention. In this embodiment, the method is primarily described as being applied to the decoding device described above. The method may include the following steps (1001-1003).
[0105] Step 1001: Obtain the first parameter set corresponding to the video frame to be decoded.
[0106] The video frame to be decoded may be any single video frame (or image frame) to be decoded in the video to be decoded. The first parameter set includes a set of parameters used to define syntax elements related to the QM, for example, the decoding device can decode and obtain the QM based on the syntax elements in the first parameter set.
[0107] Optionally, the first parameter set is the APS. Of course, in some other embodiments, the first parameter set may not be the APS but may be the SPS or another parameter set, and the embodiments of this application are not limited thereto.
[0108] Step 1002: Determine the valid QM based on the syntax elements included in the first parameter set. The valid QM refers to the QM actually used when performing inverse quantization on the quantized transformation coefficients during the decoding process of the video frame to be decoded.
[0109] Assuming that the number of QMs that may be used when performing inverse quantization on quantized transformation coefficients is n, the effective number of QMs may be less than n, or equal to n, where n is a positive integer. For example, if all n QMs are actually used when performing inverse quantization on quantized transformation coefficients, the effective number of QMs is n; if only some of the n QMs (e.g., m QMs, where m is a positive integer less than n) are actually used when performing inverse quantization on quantized transformation coefficients, the effective number of QMs is m.
[0110] By defining syntax elements used to determine valid QMs in the first parameter set, the decoding device can determine which QMs are valid and which are not by reading these syntax elements. QMs that are not valid (may be called invalid QMs), i.e., QMs that were not actually used when quantizing the transformation coefficients in the process of encoding and generating the video frame to be decoded, do not need to be decoded by the decoding device.
[0111] Optionally, for QMs that are not valid QMs, all of their elements are predefined to be a default value. Optionally, the default value is 16. Referring to Equation 1, in this case the scaling quantization coefficients of all transform coefficients in the TB are 1, which is equivalent to not using a QM.
[0112] Step 1003: Decode the valid QM.
[0113] After determining the valid QMs, since there may be one or multiple valid QMs, the decoding device needs to decode each valid QM. Taking any one valid QM as an example, when decoding a valid QM, the encoding mode corresponding to that valid QM can be determined, and then the valid QM is decoded based on that encoding mode.
[0114] For example, referring to Table 1 above, if there are 28 QMs that may be used when quantizing the transform coefficients, and 12 of them are determined to be valid QMs, then the decoding device only needs to decode the 12 valid QMs and does not need to decode the remaining 16 invalid QMs.
[0115] As described above, in the technical means provided by the embodiment of the present application, a first parameter set corresponding to the video frame to be decoded is obtained, and the valid QMs are determined based on the syntax elements included in the first parameter set. The valid QMs are the QMs actually used when quantizing the transform coefficients in the process of encoding and generating the video frame to be decoded, and decoding is then performed on the valid QMs. In this way, the decoder only needs to decode the valid QMs, thereby reducing the computational complexity on the decoder side.
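Steps 1001 to 1003 can be summarized by the following Python sketch. It assumes that the set of valid QM identifiers has already been determined from the first parameter set and that a per-QM decoding routine is supplied by the caller; the names decode_qms, qm_sizes, valid_ids, and decode_one are hypothetical and introduced only for this illustration.

```python
from typing import Callable, Dict, List, Set

DEFAULT_COEF = 16  # default element value for QMs that are not decoded


def decode_qms(qm_sizes: Dict[int, int], valid_ids: Set[int],
               decode_one: Callable[[int], List[List[int]]]) -> Dict[int, List[List[int]]]:
    """Decode only the valid QMs and fill every other QM with the default value.

    qm_sizes maps each QM id to its coded matrix size; decode_one(id) stands
    in for the mode-dependent decoding of a single valid QM.
    """
    result: Dict[int, List[List[int]]] = {}
    for qm_id, size in qm_sizes.items():
        if qm_id in valid_ids:
            result[qm_id] = decode_one(qm_id)
        else:
            # Invalid QMs are never parsed; their elements are all 16.
            result[qm_id] = [[DEFAULT_COEF] * size for _ in range(size)]
    return result
```

The saving described above comes from the fact that decode_one is invoked once per valid QM rather than once per possible QM.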
[0116] In an exemplary embodiment, the above step of determining a valid QM based on the syntax elements included in the first parameter set includes several substeps.
[0117] 1: Determine the valid size range of the QM based on the syntax elements included in the first parameter set.
[0118] The effective size range of the QM defines the minimum and maximum sizes of the QM actually used when performing inverse quantization on the quantized transformation coefficients during the decoding process. The numerical values of the QM size are powers of 2, such as 2, 4, 8, 16, 32, and 64.
[0119] 2: Determine that QMs that fall within the valid size range are valid QMs.
[0120] For example, when the valid size range for QM is [4,32], valid QMs include 4x4, 8x8, 16x16, and 32x32 size QMs. Also, for example, when the valid size range for QM is [8,16], valid QMs include 8x8 and 16x16 size QMs.
[0121] For example, assuming that the valid size range for QM is determined to be [8,16], referring to Tables 1 and 2 above, the sizeId corresponding to an 8x8 QM is 3, and the sizeId corresponding to a 16x16 QM is 4. The decoding device determines that a total of 12 QMs with IDs from 8 to 19 are valid QMs, and the remaining 16 QMs with IDs from 0 to 7 and 20 to 27 are invalid QMs.
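The size-range example above can be reproduced with a small Python sketch. Tables 1 and 2 are not reproduced in this passage, so the id-to-size layout in QM_SIZE_BY_ID is an assumption chosen to agree with the example (IDs 8 to 13 are 8x8 and IDs 14 to 19 are 16x16, out of 28 QMs in total); the function name valid_qm_ids is also hypothetical.

```python
# Assumed layout of the 28 QM ids by size: two 2x2, six each of 4x4, 8x8,
# 16x16 and 32x32, and two 64x64. Only the 8x8 and 16x16 id ranges are
# confirmed by the example in the text.
QM_SIZE_BY_ID = [2] * 2 + [4] * 6 + [8] * 6 + [16] * 6 + [32] * 6 + [64] * 2


def valid_qm_ids(min_size: int, max_size: int) -> list:
    """Return the ids of all QMs whose size lies in [min_size, max_size]."""
    return [qm_id for qm_id, size in enumerate(QM_SIZE_BY_ID)
            if min_size <= size <= max_size]


valid = valid_qm_ids(8, 16)
invalid_count = len(QM_SIZE_BY_ID) - len(valid)
```

Under the assumed layout, the range [8, 16] selects the 12 QMs with IDs 8 to 19 and leaves 16 invalid QMs, matching the example.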
[0122] In one example, the following method is employed to determine the effective size range of the QM based on the syntax elements included in the first parameter set.
[0123] 1.1: Based on the syntax elements included in the first parameter set, the minimum luminance coding block size, the luminance coding tree block size, and the maximum luminance TB size are determined.
[0124] Optionally, a first syntax element is defined in the first parameter set to indicate the minimum luminance coding block size, a second syntax element is defined in the first parameter set to indicate the luminance coding tree block size, and a third syntax element is defined in the first parameter set to indicate the maximum luminance TB size. The decoding device reads the first, second, and third syntax elements from the first parameter set to determine the minimum luminance coding block size, the luminance coding tree block size, and the maximum luminance TB size.
[0125] 1.2: Determine the effective size range of the luminance QM based on the minimum luminance coding block size, the block size of the luminance coding tree, and the maximum luminance TB size. Here, the effective size range of the luminance QM includes the minimum and maximum sizes of the luminance QM.
[0126] Optionally, the decoding device determines the minimum size of the luminance QM based on the minimum luminance coding block size; for example, the minimum luminance coding block size is used as the minimum size of the luminance QM. The decoding device determines the maximum size of the luminance QM to be the larger of the luminance coding tree block size and the maximum luminance TB size. That is, when the luminance coding tree block size is greater than the maximum luminance TB size, the luminance coding tree block size is used as the maximum size of the luminance QM; when it is less than the maximum luminance TB size, the maximum luminance TB size is used; and when the two are equal, either may be used, since the result is the same.
[0127] 1.3: Based on the effective size range of the luminance QM and the sampling rate of the chrominance component relative to the luminance component, the effective size range of the chrominance QM is determined, where the effective size range of the chrominance QM includes the minimum and maximum sizes of the chrominance QM.
[0128] Optionally, a fourth syntax element is defined in the first parameter set, which is used to specify the sampling rate of the chrominance component relative to the luminance component.
[0129] Optionally, the decoding device calculates the minimum size of the chrominance QM based on the minimum size of the luminance QM and the sampling rate of the chrominance component relative to the luminance component, and calculates the maximum size of the chrominance QM based on the maximum size of the luminance QM and the sampling rate of the chrominance component relative to the luminance component.
[0130] In an exemplary embodiment, if the first parameter set is APS, the syntax elements and syntax structure table included in APS are shown in Table 7 below.
[0131] [Table 7]
[0132] The `aps_qm_size_info_present_flag` flag indicates whether syntax elements related to the QM size are present in the bitstream. A value of 1 means that syntax elements related to the QM size appear in the bitstream, and based on this, the valid size range of the QM can be determined, thereby determining which sizes of QM need to be decoded. A value of 0 means that syntax elements related to the QM size are not present in the bitstream, and all sizes of QM need to be decoded.
[0133] For aps_log2_ctu_size_minus5, the value + 5 indicates the block size of the luminance coding tree. This value is specified to be the same as the numerical value of the syntax element sps_log2_ctu_size_minus5.
[0134] For aps_log2_min_luma_coding_block_size_minus2, its value + 2 indicates the minimum luminance coding block size. This value is specified to be the same as the numerical value of the SPS syntax element log2_min_luma_coding_block_size_minus2.
[0135] For the aps_max_luma_transform_size_64_flag, a value of 1 indicates that the maximum luminance TB size is 64, and a value of 0 indicates that the maximum luminance TB size is 32. It is specified that this value is the same as the numerical value of the syntax element sps_max_luma_transform_size_64_flag.
[0136] aps_chroma_format_idc indicates the sampling rate of the chrominance component relative to the luminance component, as shown specifically in Table 6. Its value is specified to be the same as the numerical value of the syntax element chroma_format_idc.
[0137] Based on the above syntax elements, the derivation process for the variables minQMSizeY (representing the minimum size of luminance QM) and maxQMSizeY (representing the maximum size of luminance QM) is as follows.
[0138] When the value of the syntax element aps_qm_size_info_present_flag is 1, the following applies:
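[0139] minQMSizeY = 1 << ( aps_log2_min_luma_coding_block_size_minus2 + 2 ) (Formula 12)
maxQMSizeY = Max( 1 << ( aps_log2_ctu_size_minus5 + 5 ), aps_max_luma_transform_size_64_flag ? 64 : 32 ) (Formula 13)
[0140] Here, << is the left shift operator, Max( a, b ) returns the larger of a and b, and a ? b : c evaluates to b when a is nonzero and to c otherwise.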
[0141] When the value of the syntax element aps_qm_size_info_present_flag is 0, the following applies:
[0142] minQMSizeY = 4, maxQMSizeY = 64.
[0143] The derivation process for the variables minQMSizeUV (representing the minimum size of the chrominance QM) and maxQMSizeUV (representing the maximum size of the chrominance QM) is as follows.
[0144] When the value of the syntax element aps_qm_size_info_present_flag is 1, the following applies:
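[0145] minQMSizeUV = !aps_chroma_format_idc ? 0 : minQMSizeY / SubWidthC (Formula 14)
maxQMSizeUV = !aps_chroma_format_idc ? 0 : maxQMSizeY / SubHeightC (Formula 15)
[0146] Here, ! represents the logical negation operation, and a ? b : c evaluates to b when a is true and to c otherwise.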
[0147] Formulas 14 and 15 above are interpreted as follows.
[0148] If aps_chroma_format_idc is equal to 0 (monochrome, so that its logical negation is true), then minQMSizeUV = 0; otherwise, minQMSizeUV = minQMSizeY / SubWidthC. Likewise, if aps_chroma_format_idc is equal to 0, then maxQMSizeUV = 0; otherwise, maxQMSizeUV = maxQMSizeY / SubHeightC.
[0149] When the value of the syntax element aps_qm_size_info_present_flag is 0, the following applies:
[0150] minQMSizeUV = 2, maxQMSizeUV = 32.
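The derivation of the luminance and chrominance QM size ranges from the APS syntax elements of this example can be sketched in Python as follows. Several points are assumptions: the maximum luminance QM size is computed as the larger of the coding tree block size and the maximum TB size, following the rule stated in paragraph [0126]; the (SubWidthC, SubHeightC) mapping stands in for Table 6, which is not reproduced here; and the function and parameter names are hypothetical.

```python
# Assumed (SubWidthC, SubHeightC) per chroma format idc (stand-in for Table 6).
SUBSAMPLING = {0: (1, 1), 1: (2, 2), 2: (2, 1), 3: (1, 1)}


def aps_qm_size_ranges(present_flag: int,
                       log2_ctu_size_minus5: int = 0,
                       log2_min_luma_cb_size_minus2: int = 0,
                       max_luma_transform_size_64_flag: int = 0,
                       chroma_format_idc: int = 1) -> tuple:
    """Return ((minQMSizeY, maxQMSizeY), (minQMSizeUV, maxQMSizeUV))."""
    if not present_flag:
        # No size information signalled: every QM size has to be decoded.
        return (4, 64), (2, 32)

    min_y = 1 << (log2_min_luma_cb_size_minus2 + 2)
    ctb_size = 1 << (log2_ctu_size_minus5 + 5)
    max_tb_size = 64 if max_luma_transform_size_64_flag else 32
    max_y = max(ctb_size, max_tb_size)  # "larger of" rule, assumed from [0126]

    if chroma_format_idc == 0:
        # Monochrome: no chrominance QM is needed.
        return (min_y, max_y), (0, 0)
    sub_width_c, sub_height_c = SUBSAMPLING[chroma_format_idc]
    return (min_y, max_y), (min_y // sub_width_c, max_y // sub_height_c)
```

For instance, a 32-sample coding tree block, a minimum coding block size of 8, a maximum TB size of 32, and the assumed 4:2:0 mapping give a luminance range of [8, 32] and a chrominance range of [4, 16].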
[0151] In the syntax structure table shown in Table 7, the variable cIdx represents the color component corresponding to the current QM. For the luminance component Y, its value is 0; for the chrominance component Cb, its value is 1; and for the chrominance component Cr, its value is 2. The variable matrixSize represents the actual encoded size of the current QM and is indicated by the third column of Table 2. The variable matrixQMSize represents the TB size corresponding to the current QM and is indicated by Tables 1 and 2.
[0152] In the syntax structure table shown in Table 7, the decoding device first makes a judgment against the two conditions proposed in this application, and then decides whether or not to decode the current QM. Taking the determination of whether the first QM is a valid QM as an example (the first QM may be any one usable QM, i.e., any one of the 28 QMs mentioned above), if the first QM satisfies either the first or second condition, the first QM is determined to be a valid QM.
[0153] Here, the first condition is cIdx==0 && (matrixQMSize >= minQMSizeY && matrixQMSize <= maxQMSizeY ), which means that the first QM belongs to the luminance component and is used in the quantization process of a luminance TB, and that the first QM is within the effective size range of the luminance QM [minQMSizeY, maxQMSizeY], where minQMSizeY represents the minimum size of the luminance QM and maxQMSizeY represents the maximum size of the luminance QM. The second condition is cIdx!=0 && (matrixQMSize >= minQMSizeUV && matrixQMSize <= maxQMSizeUV ), which means that the first QM belongs to a chrominance component and is used in the quantization process of a chrominance TB, and that the first QM is within the effective size range of the chrominance QM [minQMSizeUV, maxQMSizeUV], where minQMSizeUV represents the minimum size of the chrominance QM and maxQMSizeUV represents the maximum size of the chrominance QM.
[0154] In the above example, the decoding device needs to calculate the valid size range of the QM based on the syntax elements included in the first parameter set, and then determine the valid QMs based on that valid size range. In the example presented below, syntax elements indicating the valid size range of the luminance QM are defined directly in the first parameter set. After reading these syntax elements, the decoding device directly obtains the valid size range of the luminance QM, and then determines the valid size range of the chrominance QM from the valid size range of the luminance QM together with the chrominance format. Specifically, this is as follows.
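The two conditions can be expressed as a single predicate, sketched here in Python. The function name is_valid_qm and the representation of each size range as a (minimum, maximum) tuple are assumptions made for illustration.

```python
def is_valid_qm(c_idx: int, matrix_qm_size: int,
                luma_range: tuple, chroma_range: tuple) -> bool:
    """Return True when the QM satisfies the first or the second condition.

    c_idx is 0 for Y and 1 or 2 for Cb or Cr; matrix_qm_size is the TB size
    that the QM corresponds to.
    """
    min_size, max_size = luma_range if c_idx == 0 else chroma_range
    return min_size <= matrix_qm_size <= max_size
```

Because a QM belongs either to the luminance component or to a chrominance component, exactly one of the two ranges is consulted for any given QM.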
[0155] In another example, the following method is employed to determine the effective size range of the QM based on the syntax elements included in the first parameter set.
[0156] 1.1: Based on the syntax elements included in the first parameter set, the effective size range of the luminance QM is determined, where the effective size range of the luminance QM includes the minimum and maximum sizes of the luminance QM.
[0157] Optionally, a fifth syntax element is defined in the first parameter set to indicate the minimum size of the luminance QM, and a sixth syntax element is defined in the first parameter set to indicate the maximum size of the luminance QM. The decoding device reads the fifth and sixth syntax elements from the first parameter set to determine the minimum and maximum sizes of the luminance QM.
[0158] 1.2: The effective size range of the chrominance QM is determined based on the effective size range of the luminance QM and the sampling rate of the chrominance component relative to the luminance component. Here, the effective size range of the chrominance QM includes the minimum and maximum sizes of the chrominance QM.
[0159] Optionally, a fourth syntax element is defined in the first parameter set, which is used to specify the sampling rate of the chrominance component relative to the luminance component.
[0160] Optionally, the decoding device calculates the minimum size of the chrominance QM based on the minimum size of the luminance QM and the sampling rate of the chrominance component relative to the luminance component, and calculates the maximum size of the chrominance QM based on the maximum size of the luminance QM and the sampling rate of the chrominance component relative to the luminance component.
[0161] In an exemplary embodiment, taking APS as an example, the syntax elements and syntax structure table included in APS are shown in Table 8 below.
[0162] [Table 8]
[0163] The `aps_qm_size_info_present_flag` flag indicates whether syntax elements related to the QM size are present in the bitstream. A value of 1 means that syntax elements related to the QM size appear in the bitstream, and based on this, the valid size range of the QM can be determined, thereby determining which sizes of QM need to be decoded. A value of 0 means that syntax elements related to the QM size are not present in the bitstream, and all sizes of QM need to be decoded.
[0164] Regarding aps_log2_min_luma_qm_size_minus2, its value + 2 indicates the minimum size of the luminance QM.
[0165] Regarding aps_log2_max_luma_qm_size_minus5, adding 5 to this value indicates the maximum size of the luminance QM.
[0166] Based on the above syntax elements, the derivation process for the variables minQMSizeY (representing the minimum size of luminance QM) and maxQMSizeY (representing the maximum size of luminance QM) is as follows.
[0167] When the value of the syntax element aps_qm_size_info_present_flag is 1, the following applies:
[0168] minQMSizeY = 1 << ( aps_log2_min_luma_qm_size_minus2 + 2 ) (Formula 16)
maxQMSizeY = 1 << ( aps_log2_max_luma_qm_size_minus5 + 5 ) (Formula 17)
[0169] Here, << is the left shift operator.
[0170] When the value of aps_qm_size_info_present_flag is 1, the values of minQMSizeY and maxQMSizeY are specified to be the same as the values of the block size variables MinCbSizeY and VSize, respectively, which are calculated from the SPS syntax elements.
[0171] When the value of the syntax element aps_qm_size_info_present_flag is 0, the following applies:
[0172] minQMSizeY = 4, maxQMSizeY = 64.
[0173] aps_chroma_format_idc indicates the sampling rate of the chrominance component relative to the luminance component, as shown specifically in Table 6. Its value is specified to be the same as the numerical value of the syntax element chroma_format_idc.
[0174] The derivation process for the variables minQMSizeUV (representing the minimum size of the chrominance QM) and maxQMSizeUV (representing the maximum size of the chrominance QM) is as follows.
[0175] When the value of the syntax element aps_qm_size_info_present_flag is 1, the following applies:
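[0176] minQMSizeUV = !aps_chroma_format_idc ? 0 : minQMSizeY / SubWidthC
maxQMSizeUV = !aps_chroma_format_idc ? 0 : maxQMSizeY / SubHeightC
[0177] Here, ! represents the logical negation operation, and a ? b : c evaluates to b when a is true and to c otherwise.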
[0178] When the value of the syntax element aps_qm_size_info_present_flag is 0, the following applies:
[0179] minQMSizeUV = 2, maxQMSizeUV = 32.
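For the variant in which the luminance QM size range is signalled directly, the derivation reduces to two shifts and two divisions. The Python sketch below illustrates this under the same assumptions as before: the (SubWidthC, SubHeightC) mapping stands in for Table 6, and the function and parameter names are hypothetical.

```python
# Assumed (SubWidthC, SubHeightC) per chroma format idc (stand-in for Table 6).
SUBSAMPLING = {0: (1, 1), 1: (2, 2), 2: (2, 1), 3: (1, 1)}


def aps_qm_size_ranges_direct(present_flag: int,
                              log2_min_luma_qm_size_minus2: int = 0,
                              log2_max_luma_qm_size_minus5: int = 0,
                              chroma_format_idc: int = 1) -> tuple:
    """Return ((minQMSizeY, maxQMSizeY), (minQMSizeUV, maxQMSizeUV))."""
    if not present_flag:
        return (4, 64), (2, 32)
    min_y = 1 << (log2_min_luma_qm_size_minus2 + 2)  # Formula 16
    max_y = 1 << (log2_max_luma_qm_size_minus5 + 5)  # Formula 17
    if chroma_format_idc == 0:
        return (min_y, max_y), (0, 0)
    sub_width_c, sub_height_c = SUBSAMPLING[chroma_format_idc]
    # The minimum is divided by SubWidthC and the maximum by SubHeightC,
    # as in the chrominance derivation described in the text.
    return (min_y, max_y), (min_y // sub_width_c, max_y // sub_height_c)
```

Compared with the first example, the decoder no longer needs the coding tree block size or the maximum TB size in order to obtain the luminance range.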
[0180] In some other examples, the decoding device can also determine the valid QM based on the syntax elements contained in the SPS. Specifically, the decoding device can calculate the valid size range for luminance QM [MinQMSizeY, MaxQMSizeY] and the valid size range for chrominance QM [MinQMSizeUV, MaxQMSizeUV] based on the syntax elements contained in the SPS. Here, the variable MinQMSizeY represents the minimum size of luminance QM, the variable MaxQMSizeY represents the maximum size of luminance QM, the variable MinQMSizeUV represents the minimum size of chrominance QM, and the variable MaxQMSizeUV represents the maximum size of chrominance QM.
[0181] As can be seen in conjunction with the SPS syntax structure table shown in Table 5 above, the above variables can be calculated and obtained using the following formulas.
[0182] [Formulas for MinQMSizeY, MaxQMSizeY, MinQMSizeUV, and MaxQMSizeUV; equation image not reproduced]
[0183] Here, << is the left shift operator, and ! represents the logical negation operator.
[0184] Compared to determining a valid QM based on the syntax elements contained in the SPS, defining relevant syntax elements in the APS and determining a valid QM based on those defined in the APS eliminates the parsing dependency between the code streams of the APS and the SPS, thereby eliminating the need for APS decoding to depend on the syntax elements of the SPS.
[0185] In an exemplary embodiment, the above step of determining a valid QM based on the syntax elements included in the first parameter set includes several substeps.
[0186] 1: Read the value of the flag syntax element corresponding to the first QM from the first parameter set.
[0187] 2: If the value of the flag syntax element corresponding to the first QM is the first number, then it is determined that the first QM belongs to a valid QM.
[0188] 3: If the value of the flag syntax element corresponding to the first QM is the second number, then it is determined that the first QM does not belong to a valid QM.
[0189] In this embodiment, a flag syntax element is defined in the APS to indicate whether a QM belongs to a valid QM or not. The descriptor of the flag syntax element may be u(1), which represents a 1-bit unsigned integer. For example, a value of 1 for the flag syntax element indicates that the QM belongs to a valid QM and needs to be decoded, while a value of 0 indicates that the QM does not belong to a valid QM and does not need to be decoded. For QMs that do not need to be decoded, all of their elements are predefined to be a default value. Optionally, the default value is 16. Referring to Equation 1, in this case the scaling quantization coefficients of all transform coefficients in the TB are 1, which is equivalent to not using a QM.
[0190] Furthermore, the first QM may be any one available QM, that is, any one of the 28 QMs listed above.
[0191] Optionally, the first parameter set is the APS. Of course, in some other embodiments, the first parameter set does not have to be the APS, and the embodiments of this application are not limited thereto.
[0192] In an exemplary embodiment, taking APS as an example, the syntax elements and syntax structure table included in APS are shown in Table 9 below.
[0193] [Table 9]
[0194] Optionally, the above flag syntax element is scaling_matrix_present_flag. For scaling_matrix_present_flag[id], a value of 1 indicates that the current QM needs to be decoded, and a value of 0 indicates that the current QM does not need to be decoded, in which case the decoding device infers that all elements of the QM are 16.
[0195] Optionally, each luminance QM corresponds to its own flag syntax element indicating whether or not that luminance QM needs to be decoded. A first chrominance QM (i.e., a QM corresponding to Cb) and a second chrominance QM (i.e., a QM corresponding to Cr) that have the same prediction mode and the same size share one flag syntax element indicating whether or not both of them need to be decoded. In other words, the first and second chrominance QMs do not each need a separate flag syntax element, which further saves bit overhead in QM coding signaling.
[0196] In an exemplary embodiment, taking APS as an example, the syntax elements and syntax structure table included in APS are shown in Table 10 below.
[0197] [Table 10]
[0198] A value of 1 for scaling_matrix_present_flag[ predMode != MODE_INTRA ][ cIdx != 0 ][ sizeId ] indicates the following: for a luminance QM, the luminance QM is encoded in the APS and needs to be decoded; for chrominance QMs, the QMs corresponding to Cb and Cr that have the prediction mode predMode and the same size are encoded in the APS and need to be decoded. A value of 0 for this syntax element indicates that the luminance QM, or the two chrominance QMs, do not need to be decoded, and the decoding device can infer that all of their elements are 16.
[0199] It should be noted that when the encoding device sets the value of the flag syntax element corresponding to each QM, that is, when it determines which QMs need to be encoded and which do not, it may do so based on the size of the QM, based on the coding prediction mode corresponding to the QM, based on the YUV color component corresponding to the QM, or based on a combination of two or more of the QM size, the coding prediction mode, and the YUV color component, and the embodiments of this application are not limited thereto.
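The sharing of one flag between the Cb and Cr QMs follows directly from the way the flag is indexed, as the following Python sketch illustrates. The constants MODE_INTRA and MODE_INTER, the dictionary used to hold the parsed flags, and the function names are assumptions for illustration; Table 10 itself is not reproduced in this passage.

```python
MODE_INTRA = "MODE_INTRA"  # hypothetical stand-ins for the prediction modes
MODE_INTER = "MODE_INTER"


def flag_index(pred_mode: str, c_idx: int, size_id: int) -> tuple:
    """Map a QM to the index of its scaling_matrix_present_flag entry."""
    # Cb (c_idx 1) and Cr (c_idx 2) both give c_idx != 0, so they share a flag.
    return (int(pred_mode != MODE_INTRA), int(c_idx != 0), size_id)


def needs_decoding(flags: dict, pred_mode: str, c_idx: int, size_id: int) -> bool:
    """Return True when the flag covering this QM is 1; absent flags count as 0."""
    return flags.get(flag_index(pred_mode, c_idx, size_id), 0) == 1
```

With this indexing, a single flag value of 1 at index (0, 1, 3) marks both intra chrominance QMs of sizeId 3 for decoding, while the luminance QM of the same size is governed by the separate entry at (0, 0, 3).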
[0200] In this embodiment, by defining a single flag syntax element in the first parameter set, it is possible to indicate whether a QM belongs to a valid QM or not, thereby providing more flexibility in indicating whether or not each QM needs to be decoded.
[0201] Figure 11 illustrates a flowchart of a video encoding method provided by one embodiment of the present invention. In this embodiment, the method is primarily described as being applied to the encoding device described above. The method may include the following steps (1101-1102).
[0202] Step 1101: Determine the valid QM corresponding to the video frame to be encoded. The valid QM refers to the QM actually used when quantizing the transformation coefficients during the encoding process of the video frame to be encoded.
[0203] The video frame to be encoded may be any single video frame (or image frame) in the video to be encoded.
[0204] Assuming that the number of QMs that may be used when quantizing the transformation coefficients is n, the effective number of QMs may be less than n, or equal to n, where n is a positive integer. For example, if all n QMs are actually used when quantizing the transformation coefficients, the effective number of QMs is n; if only some of the n QMs (e.g., m QMs, where m is a positive integer less than n) are actually used when quantizing the transformation coefficients, the effective number of QMs is m.
[0205] For any QM that is not a valid QM, all of its elements are predefined as a default value. Optionally, the default value is 16; referring to Equation 1, in this case the scaling quantization coefficients of all transform coefficients in the TB are 1, which is equivalent to not using a QM.
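The equivalence stated above can be checked with a small sketch. Equation 1 is not reproduced in this passage, so the sketch assumes that each QM element is normalized by 16, which is consistent with the statement that an element value of 16 yields a scaling coefficient of 1. The names `scaling_weight` and `flat_qm` are hypothetical.

```python
DEFAULT_QM_VALUE = 16  # default element value stated in the text


def scaling_weight(qm_element: int) -> float:
    """Weight applied to one transform coefficient.

    Assumption: the element is normalized by 16, so 16 maps to a weight of 1.
    """
    return qm_element / 16


def flat_qm(size: int) -> list:
    """Build a size x size QM whose elements are all the default value."""
    return [[DEFAULT_QM_VALUE] * size for _ in range(size)]


weights = [[scaling_weight(v) for v in row] for row in flat_qm(4)]
# True when every coefficient in the TB keeps its original scale.
all_unity = all(w == 1.0 for row in weights for w in row)
print(all_unity)
```

Under this assumption, a QM filled with the default value leaves every coefficient unscaled, so nothing is lost by not transmitting it.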
[0206] Step 1102: Encode the valid QM and the syntax elements used to determine the valid QM to generate a code stream corresponding to the first parameter set, where the first parameter set is a parameter set used to define the syntax elements related to the QM.
[0207] Since there may be one or more valid QMs, the encoding device encodes each valid QM separately once the valid QMs have been determined. Taking any one valid QM as an example, the encoding device determines the optimal mode corresponding to that valid QM and then encodes the valid QM based on the optimal mode. Here, the optimal mode may be the mode with the minimum bit cost, selected from the three candidate modes introduced above: the copy mode of the inter-frame prediction mode, the prediction mode of the inter-frame prediction mode, and the intra-frame prediction mode.
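The minimum-bit-cost selection can be sketched as follows. The cost estimators are placeholders returning fixed numbers; how bit cost is actually measured for each mode is not specified in this passage, and the mode labels and the function name `select_optimal_mode` are hypothetical.

```python
from typing import Callable, Dict, List

QM = List[List[int]]


def select_optimal_mode(qm: QM, cost_fns: Dict[str, Callable[[QM], int]]) -> str:
    """Return the candidate mode whose estimated bit cost for this QM is smallest."""
    return min(cost_fns, key=lambda mode: cost_fns[mode](qm))


# Placeholder estimators (assumed values, for illustration only).
example_costs = {
    "inter_copy": lambda qm: 40,
    "inter_prediction": lambda qm: 25,
    "intra_prediction": lambda qm: 60,
}

qm_example = [[16] * 4 for _ in range(4)]
best_mode = select_optimal_mode(qm_example, example_costs)
print(best_mode)
```

The selection is repeated independently for each valid QM, so different valid QMs may end up coded in different modes.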
[0208] For example, referring to Table 1 above, assume that there are 28 QMs that may be used when quantizing the transform coefficients and that 12 of them are determined to be valid QMs. The encoding device then only needs to encode the 12 valid QMs and does not need to encode the remaining 16 QMs.
[0209] Furthermore, the encoding device needs to encode not only the valid QM but also the syntax elements used to determine the valid QM, so that the decoding device can determine the valid QM based on those syntax elements. The encoding device encodes the syntax elements used to determine the valid QM and the valid QM, and generates a code stream corresponding to the first parameter set. The first parameter set may be an APS, or another parameter set used to define the syntax elements related to the QM, and the embodiments of this application are not limited thereto.
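As a rough illustration of what the encoding device places in the first parameter set, the sketch below uses the 28-candidate, 12-valid example. It assumes the flag-based signaling variant with a flag value of 1 meaning "valid", and it stops short of entropy coding. The dictionary layout and the name `build_first_parameter_set` are hypothetical and do not represent actual APS syntax.

```python
from typing import Dict, List, Set

QM = List[List[int]]


def build_first_parameter_set(candidate_qms: Dict[int, QM], used_ids: Set[int]) -> dict:
    """Collect one flag per candidate QM plus the payload of the valid QMs only."""
    # Assumption: flag value 1 marks a valid QM, 0 marks a QM that is not coded.
    flags = {qm_id: 1 if qm_id in used_ids else 0 for qm_id in candidate_qms}
    valid_qms = {qm_id: qm for qm_id, qm in candidate_qms.items() if qm_id in used_ids}
    return {"flags": flags, "valid_qms": valid_qms}


# 28 candidate QMs, of which the first 12 are actually used (illustrative choice).
candidates = {qm_id: [[16, 16], [16, 16]] for qm_id in range(28)}
used = set(range(12))

parameter_set = build_first_parameter_set(candidates, used)
encoded_count = len(parameter_set["valid_qms"])
skipped_count = len(candidates) - encoded_count
print(encoded_count, skipped_count)
```

Only the flags and the 12 valid QMs are carried; the 16 remaining QMs contribute no matrix data to the code stream.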
[0210] As described above, in the technical solution provided by the embodiment of the present application, the valid QM corresponding to the video frame to be encoded is determined, where the valid QM refers to the QM actually used when quantizing the transform coefficients in the encoding process of the video frame to be encoded. The syntax elements used to determine the valid QM and the valid QM are then encoded to generate a code stream corresponding to the first parameter set. In this way, the encoder side encodes and transmits only the valid QM, which saves the codewords occupied by QM signaling and reduces bit overhead, and the decoder side only needs to decode the valid QM, which reduces the computational complexity on the decoder side.
[0211] Furthermore, the encoding process of the encoding device corresponds to the decoding process of the decoding device. Details not explained in detail in the encoding process can be found in the introduction and explanation of the above-mentioned embodiment of the decoding process, and will not be explained in detail again here.
[0212] The following are apparatus embodiments of the present application, which can be used to carry out the method embodiments of the present application. For details not disclosed in the apparatus embodiments, refer to the method embodiments of the present application.
[0213] Figure 12 shows a block diagram of a video decoding device provided by one embodiment of the present invention. The device has the function of implementing the example of the video decoding method described above, and the function may be implemented by hardware or by hardware running corresponding software. The device may be the decoding-side device described above or may be installed on the decoding-side device. The device 1200 may include a parameter acquisition module 1210, a QM determination module 1220, and a QM decoding module 1230.
[0214] The parameter acquisition module 1210 is used to acquire a first parameter set corresponding to the video frame to be decoded, and this first parameter set is a parameter set used to define syntax elements related to QM.
[0215] The QM determination module 1220 is used to determine a valid QM based on the syntax elements included in the first parameter set, and the valid QM refers to the QM actually used when performing inverse quantization on the quantized transform coefficients in the decoding process of the video frame to be decoded.
[0216] The QM decoding module 1230 is used to decode the above valid QM.
[0217] In an exemplary embodiment, as shown in Figure 13, the QM determination module 1220 includes a range determination unit 1221 and a QM determination unit 1222.
[0218] The range determination unit 1221 is used to determine the effective size range of the QM based on the syntax elements included in the first parameter set described above.
[0219] The QM determination unit 1222 is used to determine QMs that fall within the above-mentioned effective size range as valid QMs.
[0220] In an exemplary embodiment, the range determination unit 1221 is used to: determine the minimum luminance coding block size, the luminance coding tree block size, and the maximum luminance TB size based on the syntax elements included in the first parameter set; determine the effective size range of the luminance QM based on the minimum luminance coding block size, the luminance coding tree block size, and the maximum luminance TB size, wherein the effective size range of the luminance QM includes the minimum and maximum sizes of the luminance QM; and determine the effective size range of the chrominance QM based on the effective size range of the luminance QM and the sampling rate of the chrominance component relative to the luminance component, wherein the effective size range of the chrominance QM includes the minimum and maximum sizes of the chrominance QM.
[0221] In an exemplary embodiment, the range determination unit 1221 is used to: determine the minimum size of the luminance QM based on the minimum luminance coding block size; and determine the larger of the luminance coding tree block size and the maximum luminance TB size as the maximum size of the luminance QM.
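The derivation of the luminance QM size range can be sketched as follows. The text states only that the minimum size is determined "based on" the minimum luminance coding block size; the sketch assumes it is taken over directly. The function and parameter names are hypothetical, and the input values are illustrative.

```python
from typing import Tuple


def luma_qm_size_range(min_cb_size_y: int, ctb_size_y: int, max_tb_size_y: int) -> Tuple[int, int]:
    """Return (minimum, maximum) size of the luminance QM."""
    # Assumption: the minimum coding block size is used directly as the minimum QM size.
    min_qm_size_y = min_cb_size_y
    # Per the text: the larger of the coding tree block size and the maximum TB size.
    max_qm_size_y = max(ctb_size_y, max_tb_size_y)
    return min_qm_size_y, max_qm_size_y


luma_range = luma_qm_size_range(min_cb_size_y=4, ctb_size_y=64, max_tb_size_y=32)
print(luma_range)
```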
[0222] In an exemplary embodiment, the range determination unit 1221 is used to: determine the effective size range of the luminance QM based on the syntax elements included in the first parameter set, wherein the effective size range of the luminance QM includes the minimum and maximum sizes of the luminance QM; and determine the effective size range of the chrominance QM based on the effective size range of the luminance QM and the sampling rate of the chrominance component relative to the luminance component, wherein the effective size range of the chrominance QM includes the minimum and maximum sizes of the chrominance QM.
[0223] In an exemplary embodiment, the range determination unit 1221 is used to: calculate the minimum size of the chrominance QM based on the minimum size of the luminance QM and the sampling rate of the chrominance component relative to the luminance component; and calculate the maximum size of the chrominance QM based on the maximum size of the luminance QM and the sampling rate of the chrominance component relative to the luminance component.
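One plausible reading of this calculation is sketched below. The exact formula is not given in this passage; the sketch assumes that each luminance bound is divided by a subsampling factor (for example, 2 when the chrominance component is sampled at half the luminance resolution). The function name and the `subsampling_factor` parameter are hypothetical.

```python
from typing import Tuple


def chroma_qm_size_range(min_qm_size_y: int, max_qm_size_y: int, subsampling_factor: int) -> Tuple[int, int]:
    """Return (minimum, maximum) size of the chrominance QM.

    Assumption: both luminance bounds are scaled down by the same subsampling factor.
    """
    if subsampling_factor < 1:
        raise ValueError("subsampling_factor must be at least 1")
    return min_qm_size_y // subsampling_factor, max_qm_size_y // subsampling_factor


chroma_range = chroma_qm_size_range(min_qm_size_y=4, max_qm_size_y=64, subsampling_factor=2)
print(chroma_range)
```

With a factor of 1 (no subsampling), the chrominance range is identical to the luminance range.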
[0224] In an exemplary embodiment, the QM determination unit 1222 is used to determine the first QM as the valid QM if the first QM satisfies either the first condition or the second condition. The first condition is cIdx == 0 && (matrixQMSize >= minQMSizeY && matrixQMSize <= maxQMSizeY), which indicates that the first QM belongs to the luminance component and is used in the quantization process of the luminance TB, and that the first QM is within the effective size range [MinQMSizeY, MaxQMSizeY] of the luminance QM, where MinQMSizeY represents the minimum size of the luminance QM and MaxQMSizeY represents the maximum size of the luminance QM. The second condition is cIdx != 0 && (matrixQMSize >= minQMSizeUV && matrixQMSize <= maxQMSizeUV), which indicates that the first QM belongs to the chrominance component and is used in the quantization process of the chrominance TB, and that the first QM is within the effective size range [MinQMSizeUV, MaxQMSizeUV] of the chrominance QM, where MinQMSizeUV represents the minimum size of the chrominance QM and MaxQMSizeUV represents the maximum size of the chrominance QM.
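The two conditions translate directly into a predicate. The sketch below mirrors the expressions in the text; only the Python function name and the snake_case parameter names are introduced for illustration.

```python
def is_valid_qm(c_idx: int, matrix_qm_size: int,
                min_qm_size_y: int, max_qm_size_y: int,
                min_qm_size_uv: int, max_qm_size_uv: int) -> bool:
    """Return True if the QM satisfies the first or the second condition."""
    # First condition: luminance QM within [min_qm_size_y, max_qm_size_y].
    first = c_idx == 0 and min_qm_size_y <= matrix_qm_size <= max_qm_size_y
    # Second condition: chrominance QM within [min_qm_size_uv, max_qm_size_uv].
    second = c_idx != 0 and min_qm_size_uv <= matrix_qm_size <= max_qm_size_uv
    return first or second


# Illustrative ranges: luminance [4, 64], chrominance [2, 32].
luma_ok = is_valid_qm(0, 8, 4, 64, 2, 32)
luma_too_small = is_valid_qm(0, 2, 4, 64, 2, 32)
chroma_ok = is_valid_qm(1, 2, 4, 64, 2, 32)
chroma_too_large = is_valid_qm(2, 64, 4, 64, 2, 32)
print(luma_ok, luma_too_small, chroma_ok, chroma_too_large)
```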
[0225] In an exemplary embodiment, as shown in Figure 13, the QM determination module 1220 includes an element reading unit 1223 and a QM judgment unit 1224.
[0226] The element reading unit 1223 is used to read the value of the flag syntax element corresponding to the first QM from the first parameter set described above.
[0227] The QM judgment unit 1224 is used to determine that the first QM belongs to the valid QM if the value of the flag syntax element corresponding to the first QM is a first numerical value, and to determine that the first QM does not belong to the valid QM if the value of the flag syntax element corresponding to the first QM is a second numerical value.
[0228] In an exemplary embodiment, a first chrominance QM and a second chrominance QM that have the same prediction mode and the same size share the same flag syntax element.
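Flag sharing between the two chrominance QMs can be pictured as a lookup in which both chrominance components resolve to the same key. The sketch assumes that cIdx values 1 and 2 denote the two chrominance components and that the first numerical value is 1; the key layout and the names `flag_key` and `is_valid` are hypothetical.

```python
from typing import Dict, Tuple

FlagKey = Tuple[str, int, str]


def flag_key(pred_mode: str, size: int, c_idx: int) -> FlagKey:
    """Key under which a QM's flag is stored; both chrominance components share one key."""
    component = "luma" if c_idx == 0 else "chroma"
    return (pred_mode, size, component)


def is_valid(flags: Dict[FlagKey, int], pred_mode: str, size: int, c_idx: int,
             first_value: int = 1) -> bool:
    """A QM is valid when its (possibly shared) flag equals the first numerical value."""
    return flags[flag_key(pred_mode, size, c_idx)] == first_value


# One flag for the luminance QM and one shared flag for the two chrominance QMs.
flags = {("intra", 8, "luma"): 1, ("intra", 8, "chroma"): 0}
shared = flag_key("intra", 8, 1) == flag_key("intra", 8, 2)
print(shared, is_valid(flags, "intra", 8, 0), is_valid(flags, "intra", 8, 2))
```

Sharing the flag means that only two flags, rather than three, are signaled for each combination of prediction mode and size.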
[0229] In an exemplary embodiment, the flag syntax element is scaling_matrix_present_flag.
[0230] In an exemplary embodiment, the first parameter set is an APS.
[0231] In an exemplary embodiment, all elements of any QM that does not belong to the valid QM are predefined as a default value.
[0232] In an exemplary embodiment, the default value is 16.
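On the decoder side, the combined effect of the flag and the default value can be sketched as follows: only flagged QMs are passed to the QM decoding routine, and every other QM is filled with 16 without reading any matrix data. The flag value 1 for "valid", the `decode_qm` callback, and the dictionary layout are assumptions made for illustration.

```python
from typing import Callable, Dict, List

QM = List[List[int]]
DEFAULT_VALUE = 16  # default element value stated in the text


def reconstruct_qms(flags: Dict[int, int], sizes: Dict[int, int],
                    decode_qm: Callable[[int], QM]) -> Dict[int, QM]:
    """Decode valid QMs and fill all other QMs with the default value."""
    qms = {}
    for qm_id, flag in flags.items():
        if flag == 1:  # assumption: 1 is the first numerical value (valid QM)
            qms[qm_id] = decode_qm(qm_id)
        else:
            n = sizes[qm_id]
            qms[qm_id] = [[DEFAULT_VALUE] * n for _ in range(n)]
    return qms


decoded_ids = []


def stub_decode(qm_id: int) -> QM:
    """Stand-in for the real QM decoding process; records which QMs were decoded."""
    decoded_ids.append(qm_id)
    return [[17, 18], [19, 20]]


result = reconstruct_qms({0: 1, 1: 0}, {0: 2, 1: 2}, stub_decode)
print(decoded_ids, result[1])
```

The decoding routine is invoked once, for the flagged QM only, which is where the reduction in decoder-side computation comes from.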
[0233] As described above, in the technical solution provided by the embodiment of the present application, a first parameter set corresponding to the video frame to be decoded is obtained, and a valid QM is determined based on the syntax elements included in the first parameter set. The valid QM refers to the QM actually used when quantizing the transform coefficients in the encoding process that generated the video frame to be decoded. Decoding is then performed on the valid QM. In this way, the decoder only needs to decode the valid QM, which reduces the computational complexity on the decoder side.
[0234] Figure 14 shows a block diagram of a video encoding device provided by one embodiment of the present invention. The device has the function of implementing the example of the video encoding method described above, and the function may be implemented by hardware or by hardware running corresponding software. The device may be the encoding device described above or may be installed on the encoding device. The device 1400 may include a QM determination module 1410 and a QM encoding module 1420.
[0235] The QM determination module 1410 is used to determine the valid QM corresponding to the video frame to be encoded. The valid QM refers to the QM actually used when quantizing the transform coefficients during the encoding process of the video frame to be encoded.
[0236] The QM encoding module 1420 is used to encode the syntax elements used to determine the valid QM and the valid QM to generate a code stream corresponding to the first parameter set. Here, the first parameter set is a parameter set used to define the syntax elements related to the QM.
[0237] As described above, in the technical solution provided by the embodiment of the present application, the valid QM corresponding to the video frame to be encoded is determined, where the valid QM refers to the QM actually used when quantizing the transform coefficients in the encoding process of the video frame to be encoded. The syntax elements used to determine the valid QM and the valid QM are then encoded to generate a code stream corresponding to the first parameter set. In this way, the encoder side encodes and transmits only the valid QM, which saves the codewords occupied by QM signaling and reduces bit overhead, and the decoder side only needs to decode the valid QM, which reduces the computational complexity on the decoder side.
[0238] Note that the above device embodiments describe the realization of the functions using only one example division into functional modules. In actual applications, the functions may be assigned to different functional modules as needed; that is, the internal structure of the device may be divided into different functional modules to complete all or some of the functions described above. Furthermore, the device provided in the above embodiments belongs to the same concept as the method embodiments; for its specific implementation process, refer to the method embodiments, which will not be repeated here.
[0239] FIG. 15 shows a structural block diagram of a computer device provided by one embodiment of the present application. The computer device may be the encoding-side device introduced above, or may be the decoding-side device introduced above. The computer device 150 may include a processor 151, a memory 152, a communication interface 153, an encoder / decoder 154, and a bus 155.
[0240] The processor 151 includes one or more processing cores, and the processor 151 executes various functional applications and information processing by running software programs and modules.
[0241] The memory 152 can be used to store a computer program, and the processor 151 is used to execute the computer program, thereby realizing the above video encoding method or realizing the above video decoding method.
[0242] The communication interface 153 can be used to communicate with other devices, for example, to transmit and receive audio and video data.
[0243] The encoder / decoder 154 can be used to realize encoding and decoding functions, for example, to perform encoding and decoding on audio and video data.
[0244] The memory 152 is connected to the processor 151 by the bus 155.
[0245] The memory 152 may be implemented by any type of volatile or non-volatile storage device or a combination thereof. Such storage devices include, but are not limited to, magnetic disks or optical disks, EEPROM (Electrically Erasable Programmable Read-Only Memory), EPROM (Erasable Programmable Read-Only Memory), SRAM (Static Random-Access Memory), ROM (Read-Only Memory), magnetic memory, flash memory, and PROM (Programmable Read-Only Memory).
[0246] As can be understood by those skilled in the art, the structure shown in FIG. 15 does not constitute a limitation on the computer device 150; the computer device may include more or fewer components than shown, may combine some components, or may adopt a different arrangement of components.
[0247] In an exemplary embodiment, a computer-readable storage medium is further provided, and at least one instruction, at least one program, a code set or an instruction set is stored in the computer-readable storage medium. When the at least one instruction, the at least one program, the code set or the instruction set is executed by a processor, the video decoding method is realized, or the video encoding method is realized.
[0248] In an exemplary embodiment, a computer program product is further provided, and when the computer program product is executed by a processor, it is used to realize the video decoding method or the video encoding method.
[0249] It should be understood that “plural” as used herein refers to two or more. “And / or” describes the relationship between related objects and indicates that three kinds of relationships are possible; for example, A and / or B may represent three situations: A existing alone, A and B existing simultaneously, or B existing alone. The character “ / ” generally indicates that the preceding and following related objects are in an “or” relationship.
[0250] The foregoing are merely illustrative embodiments of the present application and are not intended to limit it. Any modifications, substitutions with equivalents, or improvements made within the spirit and principles of the present application should also be included within the scope of protection.
[Explanation of symbols]
[0251]
60 Video encoder
62 Transform module
64 Quantization module
66 Entropy coding module
70 Video decoder
72 Entropy decoding module
74 Inverse quantization module
76 Inverse transform module
150 Computer device
151 Processor
152 Memory
153 Communication interface
154 Encoder / decoder
155 Bus
200 Communication system
210 First device
220 Second device
230 Third device
240 Fourth device
250 Network
301 Video source
302 Video picture stream
303 Video encoder
304 Video data
305 Streaming server
310 Video decoder
311 Output video picture stream
312 Display
313 Collection subsystem
1210 Parameter acquisition module
1220 QM determination module
1221 Range determination unit
1222 QM determination unit
1223 Element reading unit
1224 QM judgment unit
1230 QM decoding module
1410 QM determination module
1420 QM encoding module
Claims
1. A video decoding method, comprising:
a step of obtaining a first parameter set corresponding to a video frame to be decoded, wherein the first parameter set includes syntax elements related to a quantization matrix (QM);
a step of determining a valid QM based on the syntax elements included in the first parameter set, wherein the valid QM refers to the QM used when performing inverse quantization on the quantized transform coefficients in the decoding process of the video frame to be decoded; and
a step of decoding the valid QM.
2. The method according to claim 1, wherein the step of determining a valid QM based on the syntax elements included in the first parameter set comprises:
a step of determining an effective size range of the QM based on the syntax elements included in the first parameter set; and
a step of determining a QM that falls within the effective size range as the valid QM.
3. The method according to claim 2, wherein the step of determining an effective size range of the QM based on the syntax elements included in the first parameter set comprises:
a step of determining the minimum luminance coding block size, the luminance coding tree block size, and the maximum luminance transform block (TB) size based on the syntax elements included in the first parameter set;
a step of determining an effective size range of the luminance QM based on the minimum luminance coding block size, the luminance coding tree block size, and the maximum luminance TB size, wherein the effective size range of the luminance QM includes the minimum and maximum sizes of the luminance QM; and
a step of determining an effective size range of the chrominance QM based on the effective size range of the luminance QM and the sampling rate of the chrominance component relative to the luminance component, wherein the effective size range of the chrominance QM includes the minimum and maximum sizes of the chrominance QM.
4. The method according to claim 3, wherein the step of determining an effective size range of the luminance QM based on the minimum luminance coding block size, the luminance coding tree block size, and the maximum luminance TB size comprises:
a step of determining the minimum size of the luminance QM based on the minimum luminance coding block size; and
a step of determining the larger of the luminance coding tree block size and the maximum luminance TB size as the maximum size of the luminance QM.
5. The method according to claim 2, wherein the step of determining an effective size range of the QM based on the syntax elements included in the first parameter set comprises:
a step of determining an effective size range of the luminance QM based on the syntax elements included in the first parameter set, wherein the effective size range of the luminance QM includes the minimum and maximum sizes of the luminance QM; and
a step of determining an effective size range of the chrominance QM based on the effective size range of the luminance QM and the sampling rate of the chrominance component relative to the luminance component, wherein the effective size range of the chrominance QM includes the minimum and maximum sizes of the chrominance QM.
6. The method according to claim 2 or 5, wherein the step of determining the effective size range of the chrominance QM based on the effective size range of the luminance QM and the sampling rate of the chrominance component relative to the luminance component comprises:
a step of calculating the minimum size of the chrominance QM based on the minimum size of the luminance QM and the sampling rate of the chrominance component relative to the luminance component; and
a step of calculating the maximum size of the chrominance QM based on the maximum size of the luminance QM and the sampling rate of the chrominance component relative to the luminance component.
7. The method according to claim 3 or 5, wherein the step of determining a QM that falls within the effective size range as the valid QM comprises a step of determining a first QM as the valid QM if the first QM satisfies either a first condition or a second condition, wherein:
the first condition is cIdx == 0 && (matrixQMSize >= minQMSizeY && matrixQMSize <= maxQMSizeY), which indicates that the first QM belongs to the luminance component and is used in the quantization process of the luminance TB, and that the first QM is within the effective size range [MinQMSizeY, MaxQMSizeY] of the luminance QM, where MinQMSizeY represents the minimum size of the luminance QM and MaxQMSizeY represents the maximum size of the luminance QM; and
the second condition is cIdx != 0 && (matrixQMSize >= minQMSizeUV && matrixQMSize <= maxQMSizeUV), which indicates that the first QM belongs to the chrominance component and is used in the quantization process of the chrominance TB, and that the first QM is within the effective size range [MinQMSizeUV, MaxQMSizeUV] of the chrominance QM, where MinQMSizeUV represents the minimum size of the chrominance QM and MaxQMSizeUV represents the maximum size of the chrominance QM.
8. The method according to claim 1, wherein the step of determining a valid QM based on the syntax elements included in the first parameter set comprises:
a step of reading the value of the flag syntax element corresponding to a first QM from the first parameter set;
a step of determining that the first QM belongs to the valid QM if the value of the flag syntax element corresponding to the first QM is a first numerical value; and
a step of determining that the first QM does not belong to the valid QM if the value of the flag syntax element corresponding to the first QM is a second numerical value.
9. The method according to claim 8, wherein a first chrominance QM and a second chrominance QM having the same prediction mode and the same size share the same flag syntax element.
10. The method according to claim 8, wherein the flag syntax element is scaling_matrix_present_flag.
11. The method according to claim 1, wherein the first parameter set is an adaptation parameter set (APS).
12. The method according to claim 1, wherein all elements of any QM that does not belong to the valid QM are predefined as a default value.
13. The method according to claim 12, wherein the default value is 16.
14. A video encoding method, comprising:
a step of determining a valid quantization matrix (QM) corresponding to a video frame to be encoded, wherein the valid QM refers to the QM used when quantizing the transform coefficients in the encoding process of the video frame to be encoded; and
a step of generating a code stream corresponding to a first parameter set by encoding the syntax elements used to determine the valid QM and the valid QM, wherein the first parameter set includes syntax elements related to the QM.
15. A video decoding device, comprising a parameter acquisition module, a QM determination module, and a QM decoding module, wherein:
the parameter acquisition module is used to acquire a first parameter set corresponding to a video frame to be decoded, the first parameter set including syntax elements related to a quantization matrix (QM);
the QM determination module is used to determine a valid QM based on the syntax elements included in the first parameter set, wherein the valid QM refers to the QM used when performing inverse quantization on the quantized transform coefficients in the decoding process of the video frame to be decoded; and
the QM decoding module is used to perform decoding on the valid QM.
16. A computer device comprising a processor and memory, wherein at least one program, code set or instruction set is stored in the memory, and the at least one program, code set or instruction set is loaded and executed by the processor to realize the method according to any one of claims 1 to 14.
17. A computer-readable storage medium wherein at least one program, code set, or instruction set is stored in the computer-readable storage medium, and the at least one program, code set, or instruction set is loaded and executed by a processor to realize the method according to any one of claims 1 to 14.