Video encoding method, computing device, computer-readable recording medium, and method for storing a bitstream

PROF and BDOF refine motion predictions in video coding, addressing inefficiencies in VVC by aligning bit depth and workflows, enhancing encoding efficiency and reducing complexity in hardware implementations.

JP2026110786A · Pending · Publication Date: 2026-07-02 · BEIJING DAJIA INTERNET INFORMATION TECH CO LTD

Patent Information

Authority / Receiving Office
JP · JP
Patent Type
Applications
Current Assignee / Owner
BEIJING DAJIA INTERNET INFORMATION TECH CO LTD
Filing Date
2026-04-27
Publication Date
2026-07-02

Figure 2026110786000001_ABST

Abstract

[Problem] To provide a video encoding method. [Solution] Two GCI (general constraint information) level control flags are determined. The two GCI level control flags include a first GCI level control flag and a second GCI level control flag: the first GCI level control flag indicates whether a first SPS (sequence parameter set) level control flag is equal to 0, and the second GCI level control flag indicates whether a second SPS level control flag is equal to 0.

Description

[Technical Field]

[0001] This disclosure relates to video coding and compression. More specifically, this disclosure relates to methods and apparatus for two inter prediction tools investigated in the VVC (versatile video coding) standard, namely PROF (prediction refinement with optical flow) and BDOF (bi-directional optical flow).

[Background Art]

[0002] Various video coding techniques may be used to compress video data. Video coding is performed according to one or more video coding standards. For example, video coding standards include VVC (versatile video coding), JEM (joint exploration model), H.265/HEVC (high-efficiency video coding), H.264/AVC (advanced video coding), and MPEG (Moving Picture Experts Group) coding. Video coding generally utilizes prediction methods (e.g., inter prediction, intra prediction) that take advantage of redundancy present in video images or sequences. An important goal of video coding techniques is to compress video data into a form that uses a lower bit rate while avoiding or minimizing degradation of video quality.

[Summary of Invention]

[Problems to Be Solved by the Invention]

[0003] Examples of this disclosure provide methods and apparatus for PROF (prediction refinement with optical flow) and BDOF (bi-directional optical flow) in video coding.

[Means for Solving the Problems]

[0004] According to a first aspect of the present disclosure, a method of PROF is provided. The method includes a decoder obtaining a first reference picture I^(0) and a second reference picture I^(1) associated with a video block in a video signal that is coded in affine mode. The decoder can also obtain first and second horizontal and vertical gradient values based on a first prediction sample I^(0)(i,j) and a second prediction sample I^(1)(i,j) associated with the first reference picture I^(0) and the second reference picture I^(1) of the video block. The decoder can further obtain first and second horizontal and vertical motion refinements based on control point motion vectors (CPMVs) associated with the first reference picture I^(0) and the second reference picture I^(1) of the video block. The decoder can also obtain a first prediction refinement ΔI^(0)(i,j) and a second prediction refinement ΔI^(1)(i,j) based on the first and second horizontal and vertical gradient values and the first and second horizontal and vertical motion refinements. The decoder can further obtain final prediction samples of the video block based on the first prediction sample I^(0)(i,j), the second prediction sample I^(1)(i,j), the first prediction refinement ΔI^(0)(i,j), the second prediction refinement ΔI^(1)(i,j), and prediction parameters. The prediction parameters can include weight and offset parameters for weighted prediction (WP) and parameters for bi-prediction with CU-level weights (BCW).
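As a rough illustration of the first aspect, the sketch below adds each prediction refinement to its prediction sample and then combines the two refined predictions with a BCW-style weight. It is a simplified sketch, not the disclosed method: it works on plain sample values rather than the intermediate high-precision representation, omits the WP weight/offset path, and assumes the VVC BCW weight set {−2, 3, 4, 5, 10} (in units of 1/8) with the combination ((8 − w)·P0 + w·P1 + 4) >> 3. The function name `combine_refined` is hypothetical.

```python
# Simplified sketch (assumption): PROF-refined bi-prediction combined with a
# BCW-style weight w applied to list 1 and (8 - w) applied to list 0.
from typing import List

BCW_WEIGHTS = (-2, 3, 4, 5, 10)  # assumed VVC BCW weight candidates, units of 1/8


def clip3(lo: int, hi: int, x: int) -> int:
    """Clip x to the range [lo, hi]."""
    return max(lo, min(hi, x))


def combine_refined(i0: List[int], i1: List[int],
                    d_i0: List[int], d_i1: List[int],
                    w: int = 4, bit_depth: int = 10) -> List[int]:
    """Return final samples from I0 + dI0 and I1 + dI1 (flattened block)."""
    if w not in BCW_WEIGHTS:
        raise ValueError(f"unsupported BCW weight {w}")
    max_val = (1 << bit_depth) - 1
    out = []
    for p0, p1, r0, r1 in zip(i0, i1, d_i0, d_i1):
        refined0 = p0 + r0  # first prediction sample plus its refinement
        refined1 = p1 + r1  # second prediction sample plus its refinement
        weighted = (8 - w) * refined0 + w * refined1 + 4  # +4 rounds before >> 3
        out.append(clip3(0, max_val, weighted >> 3))
    return out
```

With w = 4 the two refined predictions are simply averaged; the other weights skew the result toward one list.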

[0005] According to a second aspect of this disclosure, a method of PROF is provided. The method may include an encoder signaling two general constraint information (GCI) level control flags: a first GCI level control flag indicating whether BDOF is enabled for the current video sequence, and a second GCI level control flag indicating whether PROF is enabled for the current video sequence. The encoder may also signal two sequence parameter set (SPS) level control flags, which indicate whether BDOF and PROF are enabled for the current video block in the current video sequence. Based on a determination that BDOF is applied to derive the motion refinement of a current video block coded in non-affine mode from a first prediction sample I^(0)(i,j) and a second prediction sample I^(1)(i,j), the first SPS level control flag can signal that BDOF is enabled for the current video block. Based on a determination that PROF is applied to derive the motion refinement of a current video block coded in affine mode from the first prediction sample I^(0)(i,j) and the second prediction sample I^(1)(i,j), the second SPS level control flag can signal that PROF is enabled for the current video block.
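The flag relationships in the second aspect can be sketched as follows. All field and function names are hypothetical (they are not VVC syntax element names). The sketch assumes that a GCI flag equal to 1 constrains the corresponding SPS flag to 0, as the abstract describes, and that BDOF is considered only for bi-predicted non-affine blocks while PROF is considered only for affine blocks.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ControlFlags:
    # Hypothetical names. A GCI flag equal to 1 requires the matching SPS flag to be 0.
    gci_no_bdof: int
    gci_no_prof: int
    sps_bdof_enabled: int
    sps_prof_enabled: int


def derive_flags(gci_no_bdof: int, gci_no_prof: int,
                 use_bdof: bool, use_prof: bool) -> ControlFlags:
    """Encoder side: enable a tool at SPS level only if the GCI level allows it."""
    return ControlFlags(
        gci_no_bdof, gci_no_prof,
        int(use_bdof and not gci_no_bdof),
        int(use_prof and not gci_no_prof),
    )


def is_conformant(f: ControlFlags) -> bool:
    """Check that no SPS flag is 1 while its GCI constraint flag is 1."""
    return (not (f.gci_no_bdof and f.sps_bdof_enabled)
            and not (f.gci_no_prof and f.sps_prof_enabled))


def refinement_tool(f: ControlFlags, is_affine: bool, is_bi_pred: bool) -> Optional[str]:
    """Select the optical-flow refinement applied to one block, if any."""
    if is_affine:
        return "PROF" if f.sps_prof_enabled else None
    if is_bi_pred and f.sps_bdof_enabled:
        return "BDOF"
    return None
```

Keeping the GCI check separate from the per-block decision mirrors the two signaling levels: the GCI flags bound what the SPS may enable, and the SPS flags gate the block-level choice.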

[0006] According to a third aspect of this disclosure, a computing device is provided. The computing device may include one or more processors and a non-transitory computer-readable memory storing instructions executable by the one or more processors. The one or more processors may be configured to obtain a first reference picture I^(0) and a second reference picture I^(1) associated with a video block in a video signal that is coded in affine mode. The one or more processors may also be configured to obtain first and second horizontal and vertical gradient values based on a first prediction sample I^(0)(i,j) and a second prediction sample I^(1)(i,j) associated with the first reference picture I^(0) and the second reference picture I^(1) of the video block. The one or more processors may be configured to obtain first and second horizontal and vertical motion refinements based on the CPMVs associated with the first reference picture I^(0) and the second reference picture I^(1) of the video block. The one or more processors may also be configured to obtain a first prediction refinement ΔI^(0)(i,j) and a second prediction refinement ΔI^(1)(i,j) based on the first and second horizontal and vertical gradient values and the first and second horizontal and vertical motion refinements. The one or more processors may be further configured to obtain final prediction samples of the video block based on the first prediction sample I^(0)(i,j), the second prediction sample I^(1)(i,j), the first prediction refinement ΔI^(0)(i,j), the second prediction refinement ΔI^(1)(i,j), and prediction parameters. The prediction parameters may include weight and offset parameters for WP and parameters for BCW.

[0007] According to a fourth aspect of this disclosure, a non-transitory computer-readable storage medium storing instructions is provided. When executed by one or more processors of a device, the instructions can cause the device to signal two GCI level control flags: a first GCI level control flag indicating whether BDOF is enabled for the current video sequence, and a second GCI level control flag indicating whether PROF is enabled for the current video sequence. The instructions can also cause the device to signal two SPS level control flags, which indicate whether BDOF and PROF are enabled for the current video block of the current video sequence. Based on a determination that BDOF is applied to derive the motion refinement of a current video block coded in non-affine mode from a first prediction sample I^(0)(i,j) and a second prediction sample I^(1)(i,j), the first SPS level control flag can signal that BDOF is enabled for the current block. Based on a determination that PROF is applied to derive the motion refinement of a current video block coded in affine mode from the first prediction sample I^(0)(i,j) and the second prediction sample I^(1)(i,j), the second SPS level control flag can signal that PROF is enabled for the current block.

[Brief Description of Drawings]

[0008]
[Figure 1] A block diagram of an encoder according to an example of this disclosure.
[Figure 2] A block diagram of a decoder according to an example of this disclosure.
[Figure 3A] A diagram of a block partition in a multi-type tree structure according to an example of this disclosure.
[Figure 3B] A diagram of a block partition in a multi-type tree structure according to an example of this disclosure.
[Figure 3C] A diagram of a block partition in a multi-type tree structure according to an example of this disclosure.
[Figure 3D] A diagram of a block partition in a multi-type tree structure according to an example of this disclosure.
[Figure 3E] A diagram of a block partition in a multi-type tree structure according to an example of this disclosure.
[Figure 4] A diagram of a BDOF (bi-directional optical flow) model according to an example of this disclosure.
[Figure 5A] A diagram of an affine model according to an example of this disclosure.
[Figure 5B] A diagram of an affine model according to an example of this disclosure.
[Figure 6] A diagram of an affine model according to an example of this disclosure.
[Figure 7] A diagram of PROF (prediction refinement with optical flow) according to an example of this disclosure.
[Figure 8] A BDOF workflow according to an example of this disclosure.
[Figure 9] A PROF workflow according to an example of this disclosure.
[Figure 10] A BDOF method according to an example of this disclosure.
[Figure 11] A method of BDOF and PROF according to an example of this disclosure.
[Figure 12] A diagram of a PROF workflow for bi-prediction according to an example of this disclosure.
[Figure 13] A diagram of the pipeline stages of the BDOF and PROF processes according to an example of this disclosure.
[Figure 14] A diagram of a method for deriving the gradients of BDOF according to an example of this disclosure.
[Figure 15] A diagram of a method for deriving the gradients of PROF according to an example of this disclosure.
[Figure 16A] A diagram of the derivation of template samples for affine mode according to an example of this disclosure.
[Figure 16B] A diagram of the derivation of template samples for affine mode according to an example of this disclosure.
[Figure 17A] A diagram of exclusively enabling PROF and LIC for affine mode according to an example of this disclosure.
[Figure 17B] A diagram of jointly enabling PROF and LIC for affine mode according to an example of this disclosure.
[Figure 18A] A diagram of a proposed padding method applied to a 16×16 BDOF CU according to an example of this disclosure.
[Figure 18B] A diagram of a proposed padding method applied to a 16×16 BDOF CU according to an example of this disclosure.
[Figure 18C] A diagram of a proposed padding method applied to a 16×16 BDOF CU according to an example of this disclosure.
[Figure 18D] A diagram of a proposed padding method applied to a 16×16 BDOF CU according to an example of this disclosure.
[Figure 19] A diagram of a computing environment coupled with a user interface according to an example of this disclosure.

[Mode for Carrying Out the Invention]

[0009] It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not intended to limit this disclosure.

[0010] The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate examples consistent with the present disclosure and, together with the description, serve to explain the principles of the present disclosure.

[0011] Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. The following description refers to the accompanying drawings, in which the same numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations set forth in the following description of exemplary embodiments do not represent all implementations consistent with the present disclosure. Instead, they are merely examples of apparatus and methods consistent with aspects related to the present disclosure as recited in the appended claims.

[0012] The terminology used in this disclosure is for the purpose of describing particular embodiments only and is not intended to limit this disclosure. As used in this disclosure and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. Furthermore, the term "and/or" as used herein is intended to signify and include any or all possible combinations of one or more of the associated listed items.

[0013] Terms such as "first," "second," and "third" may be used herein to describe various kinds of information, but the information should not be limited by these terms. These terms are used only to distinguish one category of information from another. For example, without departing from the scope of this disclosure, first information may be termed second information, and similarly, second information may be termed first information. As used herein, the term "if" may be understood to mean "when," "upon," or "in response to a determination," depending on the context.

[0014] The first version of the HEVC standard was finalized in October 2013 and offers approximately 50% bit-rate saving at equivalent perceptual quality compared with the prior-generation video coding standard H.264/MPEG AVC. Although the HEVC standard provides significant coding improvements over its predecessor, there is evidence that coding efficiency superior to HEVC can be achieved with additional coding tools. On this basis, both VCEG and MPEG started exploring new coding technologies for future video coding standardization. In October 2015, ITU-T VCEG and ISO/IEC MPEG formed the Joint Video Exploration Team (JVET) to begin significant study of advanced technologies that could enable substantial improvements in coding efficiency. JVET maintains a reference software called the Joint Exploration Model (JEM), built by integrating several additional coding tools on top of the HEVC test model (HM).

[0015] In October 2017, the ITU-T and ISO / IEC published a Joint Call for Proposals (CfP) for video compression beyond HEVC. In April 2018, at the 10th JVET meeting, 23 CfP responses were received and evaluated, showing an improvement in compression efficiency of approximately 40% compared to HEVC. Based on these evaluation results, JVET launched a new project to develop a next-generation video coding standard called Versatile Video Coding (VVC). In the same month, a reference software codebase called the VVC test model (VTM) was established to demonstrate a reference implementation of the VVC standard.

[0016] Like HEVC, VVC is built on a block-based hybrid video encoding framework.

[0017] Figure 1 shows a general diagram of a block-based video encoder for VVC. Specifically, Figure 1 shows a typical encoder 100. The encoder 100 has video input 110, motion compensation 112, motion estimation 114, intra/inter mode decision 116, block predictor 140, adder 128, transform 130, quantization 132, prediction-related information 142, intra prediction 118, picture buffer 120, inverse quantization 134, inverse transform 136, adder 126, memory 124, in-loop filter 122, entropy coding 138, and bitstream 144.

[0018] In encoder 100, a video frame is partitioned into a plurality of video blocks for processing. For each given video block, a prediction is formed based on either an inter prediction approach or an intra prediction approach.

[0019] A prediction residual, representing the difference between the current video block (part of video input 110) and its predictor (part of block predictor 140), is sent from adder 128 to transform 130. Transform coefficients are then sent from transform 130 to quantization 132 for entropy reduction. The quantized coefficients are then fed to entropy coding 138 to generate a compressed video bitstream. As shown in Figure 1, prediction-related information 142 from intra/inter mode decision 116, such as video block partition information, motion vectors (MVs), reference picture indices, and intra prediction modes, is also fed through entropy coding 138 and saved into compressed bitstream 144. Compressed bitstream 144 includes a video bitstream.

[0020] Encoder 100 also requires decoder-related circuitry to reconstruct pixels for prediction purposes. First, the prediction residual is reconstructed through inverse quantization 134 and inverse transform 136. This reconstructed prediction residual is combined with block predictor 140 to generate unfiltered reconstructed pixels for the current video block.

[0021] Spatial prediction (or "intra prediction") predicts the current video block using pixels from samples of already encoded adjacent blocks (called reference samples) within the same video frame as the current video block.

[0022] Temporal prediction (also referred to as "inter prediction") predicts the current video block using reconstructed pixels from already coded video pictures. Temporal prediction reduces the temporal redundancy inherent in video signals. The temporal prediction signal for a given coding unit (CU) or coding block is usually signaled by one or more MVs that indicate the amount and direction of motion between the current CU and its temporal reference. Further, if multiple reference pictures are supported, one reference picture index is additionally sent, which is used to identify from which reference picture in the reference picture store the temporal prediction signal comes.
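To make the roles of the MV and the reference picture index concrete, here is a minimal sketch of translational motion compensation restricted to integer-pel MVs. Fractional-sample interpolation is omitted, and clamping coordinates to the picture boundary stands in for reference picture padding; both simplifications are assumptions of the sketch, and `motion_compensate` is a hypothetical name.

```python
from typing import List, Sequence, Tuple

Picture = List[List[int]]


def motion_compensate(ref_pictures: Sequence[Picture], ref_idx: int,
                      mv: Tuple[int, int], x0: int, y0: int,
                      w: int, h: int) -> Picture:
    """Fetch a w x h prediction block for the block whose top-left is (x0, y0)."""
    ref = ref_pictures[ref_idx]  # reference picture selected by the index
    pic_h, pic_w = len(ref), len(ref[0])
    mvx, mvy = mv                # integer-pel displacement
    pred = []
    for y in range(h):
        ry = min(max(y0 + y + mvy, 0), pic_h - 1)  # clamping emulates boundary padding
        pred.append([ref[ry][min(max(x0 + x + mvx, 0), pic_w - 1)]
                     for x in range(w)])
    return pred
```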

[0023] The motion estimation unit 114 takes in signals from the video input 110 and the picture buffer 120 and outputs the motion estimation signal to the motion compensation unit 112. The motion compensation unit 112 takes in signals from the video input 110, the picture buffer 120, and the motion estimation signal from the motion estimation unit 114 and outputs the motion compensation signal to the intra-mode / inter-mode determination unit 116.

[0024] After spatial and/or temporal prediction is performed, intra/inter mode decision 116 in encoder 100 chooses the best prediction mode, for example based on a rate-distortion optimization method. Block predictor 140 is then subtracted from the current video block, and the resulting prediction residual is decorrelated using transform 130 and quantization 132. The resulting quantized residual coefficients are inverse-quantized by inverse quantization 134 and inverse-transformed by inverse transform 136 to form the reconstructed residual, which is then added back to the prediction block to form the reconstructed signal of the CU. Further, in-loop filtering 122, such as a deblocking filter, SAO (sample adaptive offset), and/or ALF (adaptive loop filter), may be applied to the reconstructed CU before it is put in the reference picture store of picture buffer 120 and used to code future video blocks. To form output video bitstream 144, the coding mode (inter or intra), prediction mode information, motion information, and quantized residual coefficients are all sent to entropy coding unit 138 to be further compressed and packed to form the bitstream.
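The residual path of the encoder, and the matching reconstruction the encoder performs so that its reference pictures stay identical to the decoder's, can be illustrated with a toy example. To keep it short, the sketch skips the transform (it quantizes the residual directly, as in transform skip) and uses a plain uniform quantizer with step `qstep`; real VVC quantization, transforms, and entropy coding are far more involved. The function names are hypothetical.

```python
from typing import List


def quantize(residual: List[int], qstep: int) -> List[int]:
    """Uniform scalar quantization with round-half-away-from-zero."""
    levels = []
    for r in residual:
        sign = -1 if r < 0 else 1
        levels.append(sign * ((abs(r) + qstep // 2) // qstep))
    return levels


def encode_block(cur: List[int], pred: List[int], qstep: int) -> List[int]:
    """Prediction residual -> quantized levels (what entropy coding would receive)."""
    residual = [c - p for c, p in zip(cur, pred)]
    return quantize(residual, qstep)


def reconstruct_block(levels: List[int], pred: List[int], qstep: int,
                      bit_depth: int = 8) -> List[int]:
    """Dequantize, add back the prediction, and clip (shared by encoder and decoder)."""
    max_val = (1 << bit_depth) - 1
    return [min(max(p + lvl * qstep, 0), max_val) for lvl, p in zip(levels, pred)]
```

Because the encoder runs the same `reconstruct_block` step as the decoder, the reconstruction error is bounded by half the quantization step, and `qstep = 1` is lossless.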

[0025] Figure 1 shows a block diagram of a typical block-based hybrid video coding system. The input video signal is processed block by block, in units called CUs. In VTM-1.0, a CU can be up to 128×128 pixels. However, unlike HEVC, which partitions blocks based only on quadtrees, VVC splits one coding tree unit (CTU) into CUs based on quadtree, binary-tree, and ternary-tree partitioning to adapt to varying local characteristics. Furthermore, the concept of multiple partition unit types in HEVC is removed: the separation of CU, PU (prediction unit), and TU (transform unit) no longer exists in VVC, and each CU is always used as the basic unit for both prediction and transform without further partitioning. In the multi-type tree structure, one CTU is first partitioned by a quadtree structure. Each quadtree leaf node can then be further partitioned by a binary or ternary tree structure. As shown in Figures 3A, 3B, 3C, 3D, and 3E, there are five split types: quaternary partitioning, horizontal binary partitioning, vertical binary partitioning, horizontal ternary partitioning, and vertical ternary partitioning.

[0026] Figure 3A shows a block division into four parts in a multi-type tree structure according to this disclosure.

[0027] Figure 3B shows a vertical binary (two-part) division of a block in a multi-type tree structure according to the present disclosure.

[0028] Figure 3C shows a horizontal binary (two-part) division of a block in a multi-type tree structure according to this disclosure.

[0029] Figure 3D shows a diagram illustrating the vertical three-part division of a block in a multi-type tree structure according to this disclosure.

[0030] Figure 3E shows a horizontal division of a block into three parts in a multi-type tree structure according to this disclosure.
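The five split types can be expressed as a small function that returns the child blocks produced by one split. The sketch assumes that ternary splits divide a block in a 1:2:1 ratio (as in VVC) and does not enforce the multi-type tree rules, for example that quadtree splitting is no longer allowed below a binary or ternary node, or minimum block sizes. The names are hypothetical.

```python
from enum import Enum
from typing import List, Tuple

Block = Tuple[int, int, int, int]  # (x, y, width, height)


class Split(Enum):
    QUAD = "quad"
    BT_HOR = "horizontal binary"
    BT_VER = "vertical binary"
    TT_HOR = "horizontal ternary"
    TT_VER = "vertical ternary"


def split_block(block: Block, split: Split) -> List[Block]:
    """Return the child blocks produced by applying one split to `block`."""
    x, y, w, h = block
    if split is Split.QUAD:
        hw, hh = w // 2, h // 2
        return [(x, y, hw, hh), (x + hw, y, hw, hh),
                (x, y + hh, hw, hh), (x + hw, y + hh, hw, hh)]
    if split is Split.BT_HOR:  # two halves stacked vertically
        return [(x, y, w, h // 2), (x, y + h // 2, w, h // 2)]
    if split is Split.BT_VER:  # two halves side by side
        return [(x, y, w // 2, h), (x + w // 2, y, w // 2, h)]
    if split is Split.TT_HOR:  # 1:2:1 split of the height
        q = h // 4
        return [(x, y, w, q), (x, y + q, w, 2 * q), (x, y + 3 * q, w, q)]
    if split is Split.TT_VER:  # 1:2:1 split of the width
        q = w // 4
        return [(x, y, q, h), (x + q, y, 2 * q, h), (x + 3 * q, y, q, h)]
    raise ValueError(f"unknown split {split}")
```

A multi-type tree is then just repeated application: for example, a 128×128 CTU is split with `Split.QUAD`, and a resulting leaf may be split again with one of the binary or ternary types.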

[0031] In Figure 1, spatial and/or temporal prediction may be performed. Spatial prediction (or "intra prediction") uses pixels from samples of already coded neighboring blocks (called reference samples) in the same video picture/slice to predict the current video block. Spatial prediction reduces the spatial redundancy inherent in the video signal. Temporal prediction (also called "inter prediction" or "motion-compensated prediction") uses reconstructed pixels from already coded video pictures to predict the current video block. Temporal prediction reduces the temporal redundancy inherent in the video signal. The temporal prediction signal for a given CU is usually signaled by one or more motion vectors (MVs) indicating the amount and direction of motion between the current CU and its temporal reference. If multiple reference pictures are supported, one reference picture index is additionally sent; it is used to identify from which reference picture in the reference picture store the temporal prediction signal comes. After spatial and/or temporal prediction, the mode decision block in the encoder chooses the best prediction mode, for example based on a rate-distortion optimization method. The prediction block is then subtracted from the current video block, and the prediction residual is decorrelated using a transform and quantized. The quantized residual coefficients are inverse-quantized and inverse-transformed to form the reconstructed residual, which is then added back to the prediction block to form the reconstructed signal of the CU. Further, in-loop filtering such as deblocking filters, SAO (sample adaptive offset), and ALF (adaptive loop filter) may be applied to the reconstructed CU before it is placed in the reference picture store and used to code future video blocks. To form the output video bitstream, the coding mode (inter or intra), prediction mode information, motion information, and quantized residual coefficients are all sent to the entropy coding unit to be further compressed and packed to form the bitstream.

[0032] Figure 2 shows a typical block diagram of a video decoder for VVC. Specifically, Figure 2 shows a block diagram of a typical decoder 200. The decoder 200 has a bitstream 210, entropy decoding 212, inverse quantization 214, inverse transform 216, adder 218, intra / intermode selection 220, intra prediction 222, memory 230, in-loop filter 228, motion compensation 224, picture buffer 226, prediction-related information 234, and video output 232.

[0033] Decoder 200 is similar to the reconstruction-related portion present in encoder 100 in Figure 1. In decoder 200, the input video bitstream 210 is first decoded via entropy decoding 212 to derive quantized coefficient levels and prediction-related information. The quantized coefficient levels are then processed via inverse quantization 214 and inverse transform 216 to obtain reconstructed prediction residuals. The block prediction mechanism implemented in intra / intermode selection 220 is configured to perform either intra prediction 222 or motion compensation 224 based on the decoded prediction information. The reconstructed prediction residuals from the inverse transform 216 and the prediction output generated by the block prediction mechanism are summed using adder 218 to obtain a set of unfiltered reconstructed pixels.

[0034] The reconstructed blocks can pass through the in-loop filter 228 before being stored in the picture buffer 226, which functions as a reference picture store. The reconstructed video in the picture buffer 226 may be used not only to predict future video blocks but also to drive the display device. When the in-loop filter 228 is on, filtering operations are performed on these reconstructed pixels to derive the final reconstructed video output 232.

[0035] Figure 2 shows a general block diagram of a block-based video decoder. The video bitstream is first entropy decoded in the entropy decoding unit. The coding mode and prediction information are sent to either the spatial prediction unit (if intra coded) or the temporal prediction unit (if inter coded) to form the prediction block. The residual transform coefficients are sent to the inverse quantization unit and the inverse transform unit to reconstruct the residual block. The prediction block and the residual block are then added together. The reconstructed block may further go through in-loop filtering before it is stored in the reference picture store. The reconstructed video in the reference picture store is then sent out to drive a display device and is also used to predict future video blocks.

[0036] In general, the basic inter prediction techniques applied in VVC remain the same as those of HEVC, except that several modules are further extended and/or enhanced. In particular, in all preceding video standards, one coded block is associated with only a single MV when the block is uni-predicted, or with two MVs when the block is bi-predicted. Because of this limitation of conventional block-based motion compensation, small motion can remain within the prediction samples after motion compensation, which negatively affects the overall efficiency of motion compensation. To improve both the granularity and the precision of the MVs, two sample-wise refinement methods based on optical flow, namely BDOF (bi-directional optical flow) and PROF (prediction refinement with optical flow) for affine mode, are currently being investigated for the VVC standard. The main technical aspects of the two inter coding tools are briefly reviewed below.

[0037] Bi-Directional Optical Flow (BDOF)
In VVC, BDOF is applied to refine the prediction samples of bi-predicted coding blocks. Specifically, as shown in FIG. 4, BDOF is a sample-wise motion refinement performed on top of block-based motion-compensated prediction when bi-prediction is used.

[0038] FIG. 4 shows a diagram of the BDOF model according to the present disclosure.

[0039] The motion refinement (v_x, v_y) of each 4×4 subblock is calculated by minimizing the difference between the L0 and L1 prediction samples after BDOF is applied within one 6×6 window Ω around the subblock. Specifically, the value of (v_x, v_y) is derived as follows.

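$$
\begin{aligned}
v_x &= S_1 > 0\;?\;\operatorname{clip3}\!\left(-th_{BDOF},\ th_{BDOF},\ -\left(\left(S_3 \cdot 2^3\right) \gg \left\lfloor \log_2 S_1 \right\rfloor\right)\right) : 0\\
v_y &= S_5 > 0\;?\;\operatorname{clip3}\!\left(-th_{BDOF},\ th_{BDOF},\ -\left(\left(S_6 \cdot 2^3 - \left(\left(\left(v_x S_{2,m}\right) \ll n_{S_2}\right) + v_x S_{2,s}\right)/2\right) \gg \left\lfloor \log_2 S_5 \right\rfloor\right)\right) : 0
\end{aligned}
\tag{1}
$$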

[0040] Here, ⌊·⌋ is the floor function; clip3(min, max, x) is a function that clips a given value x to the range [min, max]; the symbol >> represents a bitwise right shift; the symbol << represents a bitwise left shift; and th_BDOF is the motion refinement threshold used to prevent error propagation due to irregular local motion, equal to 1 << max(5, bit-depth − 7), where bit-depth is the internal bit depth.

[0041] In (1), S_{2,m} and S_{2,s} are defined as follows.

$$
S_{2,m} = S_2 \gg n_{S_2}, \qquad S_{2,s} = S_2 \,\&\, \left(2^{n_{S_2}} - 1\right)
$$
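The derivation in (1), including the split of S_2 into S_{2,m} and S_{2,s}, can be sketched in integer Python as below. Splitting S_2 keeps each multiplication by v_x narrow, while ((v_x·S_{2,m}) << n_{S_2}) + v_x·S_{2,s} still equals v_x·S_2 exactly. The default n_{S_2} = 12 is an assumption taken from the VVC draft design, the division by 2 is implemented here as an arithmetic right shift by 1 (also an assumption), and the function names are hypothetical.

```python
def floor_log2(n: int) -> int:
    """floor(log2(n)) for n > 0."""
    return n.bit_length() - 1


def clip3(lo: int, hi: int, x: int) -> int:
    """Clip x to the range [lo, hi]."""
    return max(lo, min(hi, x))


def bdof_motion_refinement(s1: int, s2: int, s3: int, s5: int, s6: int,
                           bit_depth: int = 10, n_s2: int = 12):
    """Return (v_x, v_y) for one 4x4 subblock from the window sums S1..S6."""
    th_bdof = 1 << max(5, bit_depth - 7)  # motion refinement threshold
    vx = 0
    if s1 > 0:
        vx = clip3(-th_bdof, th_bdof, -((s3 * 8) >> floor_log2(s1)))  # S3 * 2^3
    vy = 0
    if s5 > 0:
        s2_m = s2 >> n_s2                      # high part of S2
        s2_s = s2 & ((1 << n_s2) - 1)          # low n_s2 bits of S2
        cross = ((vx * s2_m) << n_s2) + vx * s2_s  # equals vx * S2
        vy = clip3(-th_bdof, th_bdof,
                   -((s6 * 8 - (cross >> 1)) >> floor_log2(s5)))
    return vx, vy
```

When S_1 or S_5 is zero (a flat window), the corresponding component falls back to 0, and large raw values are clamped to ±th_BDOF (±32 at 10-bit).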

[0042] The values of S1, S2, S3, S5, and S6 are calculated as follows.

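$$
\begin{aligned}
S_1 &= \sum_{(i,j)\in\Omega} \psi_x(i,j)\,\psi_x(i,j), &\qquad S_3 &= \sum_{(i,j)\in\Omega} \theta(i,j)\,\psi_x(i,j)\\
S_2 &= \sum_{(i,j)\in\Omega} \psi_x(i,j)\,\psi_y(i,j) & &\\
S_5 &= \sum_{(i,j)\in\Omega} \psi_y(i,j)\,\psi_y(i,j), &\qquad S_6 &= \sum_{(i,j)\in\Omega} \theta(i,j)\,\psi_y(i,j)
\end{aligned}
\tag{2}
$$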

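where

$$
\begin{aligned}
\psi_x(i,j) &= \left(\frac{\partial I^{(1)}}{\partial x}(i,j) + \frac{\partial I^{(0)}}{\partial x}(i,j)\right) \gg \max(1,\ \text{bit-depth} - 11)\\
\psi_y(i,j) &= \left(\frac{\partial I^{(1)}}{\partial y}(i,j) + \frac{\partial I^{(0)}}{\partial y}(i,j)\right) \gg \max(1,\ \text{bit-depth} - 11)\\
\theta(i,j) &= \left(I^{(1)}(i,j) \gg \max(4,\ \text{bit-depth} - 8)\right) - \left(I^{(0)}(i,j) \gg \max(4,\ \text{bit-depth} - 8)\right)
\end{aligned}
\tag{3}
$$

and I^(k)(i,j) (k = 0, 1) is the sample value at coordinate (i,j) of the list-k prediction signal, whose horizontal and vertical gradients are obtained as the difference between its two neighboring samples:

$$
\begin{aligned}
\frac{\partial I^{(k)}}{\partial x}(i,j) &= \left(I^{(k)}(i+1,j) - I^{(k)}(i-1,j)\right) \gg \max(6,\ \text{bit-depth} - 6)\\
\frac{\partial I^{(k)}}{\partial y}(i,j) &= \left(I^{(k)}(i,j+1) - I^{(k)}(i,j-1)\right) \gg \max(6,\ \text{bit-depth} - 6)
\end{aligned}
\tag{4}
$$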
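A direct, unoptimized rendering of (2) to (4) for one 4×4 subblock is sketched below. It assumes the two prediction arrays are indexed as p[y][x] and extend at least two samples beyond the subblock on every side, so that gradients exist across the whole 6×6 window Ω. The shift values max(6, bit-depth − 6), max(1, bit-depth − 11) and max(4, bit-depth − 8) are assumptions modelled on the VVC draft BDOF design, padding of samples outside the CU is not modelled, and the function name is hypothetical.

```python
from typing import List, Tuple

Pred = List[List[int]]


def bdof_correlations(p0: Pred, p1: Pred, x0: int, y0: int,
                      bit_depth: int = 10) -> Tuple[int, int, int, int, int]:
    """Return (S1, S2, S3, S5, S6) for the 4x4 subblock whose top-left is (x0, y0)."""
    sh_grad = max(6, bit_depth - 6)
    sh_psi = max(1, bit_depth - 11)
    sh_theta = max(4, bit_depth - 8)

    def gx(p: Pred, x: int, y: int) -> int:  # horizontal gradient
        return (p[y][x + 1] - p[y][x - 1]) >> sh_grad

    def gy(p: Pred, x: int, y: int) -> int:  # vertical gradient
        return (p[y + 1][x] - p[y - 1][x]) >> sh_grad

    s1 = s2 = s3 = s5 = s6 = 0
    for y in range(y0 - 1, y0 + 5):      # 6x6 window around the subblock
        for x in range(x0 - 1, x0 + 5):
            psi_x = (gx(p1, x, y) + gx(p0, x, y)) >> sh_psi
            psi_y = (gy(p1, x, y) + gy(p0, x, y)) >> sh_psi
            theta = (p1[y][x] >> sh_theta) - (p0[y][x] >> sh_theta)
            s1 += psi_x * psi_x
            s2 += psi_x * psi_y
            s3 += theta * psi_x
            s5 += psi_y * psi_y
            s6 += theta * psi_y
    return s1, s2, s3, s5, s6
```

For a horizontal ramp where the L1 prediction is the L0 prediction plus a constant, only S_1 and S_3 are non-zero, so only a horizontal refinement would result.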

[0043] Based on the motion refinement derived in (1), the final bi-prediction samples of the CU are calculated by interpolating the L0/L1 prediction samples along the motion trajectory based on the optical flow model, as follows:

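$$
\begin{aligned}
\mathrm{pred}_{BDOF}(x,y) &= \left(I^{(0)}(x,y) + I^{(1)}(x,y) + b + o_{offset}\right) \gg shift\\
b &= \operatorname{rnd}\!\left(\frac{v_x\left(\frac{\partial I^{(1)}}{\partial x}(x,y) - \frac{\partial I^{(0)}}{\partial x}(x,y)\right)}{2}\right) + \operatorname{rnd}\!\left(\frac{v_y\left(\frac{\partial I^{(1)}}{\partial y}(x,y) - \frac{\partial I^{(0)}}{\partial y}(x,y)\right)}{2}\right)
\end{aligned}
\tag{5}
$$

where shift and o_offset are the right-shift value and the offset applied to combine the L0 and L1 prediction signals for bi-prediction, equal to 15 − bit-depth and (1 << (14 − bit-depth)) + 2·(1 << 13), respectively.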
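The combination in (5) can be sketched per sample as follows. The sketch assumes the inputs are 14-bit intermediate predictions stored with a −8192 offset (which is what the 2·(1 << 13) term in the offset compensates), uses shift = 15 − bit-depth and o_offset = (1 << (14 − bit-depth)) + 2·(1 << 13), approximates rnd(·/2) with an arithmetic right shift by 1, and clips the result to the valid sample range. These choices, and the function names, are assumptions of the sketch.

```python
def clip3(lo: int, hi: int, x: int) -> int:
    """Clip x to the range [lo, hi]."""
    return max(lo, min(hi, x))


def to_intermediate(sample: int, bit_depth: int = 10) -> int:
    """Map a sample to the assumed 14-bit intermediate domain with -8192 offset."""
    return (sample << (14 - bit_depth)) - 8192


def bdof_final_sample(i0: int, i1: int, vx: int, vy: int,
                      gx0: int, gx1: int, gy0: int, gy1: int,
                      bit_depth: int = 10) -> int:
    """Final BDOF bi-prediction sample at one position."""
    shift = 15 - bit_depth
    offset = (1 << (14 - bit_depth)) + 2 * (1 << 13)
    # b: optical-flow correction along the motion trajectory
    b = ((vx * (gx1 - gx0)) >> 1) + ((vy * (gy1 - gy0)) >> 1)
    return clip3(0, (1 << bit_depth) - 1, (i0 + i1 + b + offset) >> shift)
```

With zero motion refinement the function reduces to the ordinary rounded average of the two predictions, e.g. two intermediate samples for the value 100 give back 100.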

[0044] Affine Mode
In HEVC, only a translational motion model is applied for motion-compensated prediction. In the real world, however, there are many kinds of motion, such as zoom in/out, rotation, perspective motion, and other irregular motions. In VVC, affine motion-compensated prediction is applied by signaling one flag for each inter-coded block to indicate whether the translational motion model or the affine motion model is applied for inter prediction. The current VVC design supports two affine modes for one affine-coded block: the 4-parameter affine mode and the 6-parameter affine mode.

[0045] The 4-parameter affine model has the following parameters: two parameters for translational motion in the horizontal and vertical directions, one parameter for zoom motion, and one parameter for rotational motion, with the zoom and rotation parameters shared by both directions. That is, the horizontal zoom parameter is equal to the vertical zoom parameter, and the horizontal rotation parameter is equal to the vertical rotation parameter. To better accommodate the motion vectors and affine parameters, in VVC these affine parameters are translated into two MVs (also called control point motion vectors (CPMVs)) located at the top-left and top-right corners of the current block. As shown in Figures 5A and 5B, the affine motion field of the block is described by the two control point MVs (V0, V1).

[0046] Figure 5A shows a diagram of the four-parameter affine model according to this disclosure.

[0047] Figure 5B shows a diagram of the four-parameter affine model according to this disclosure.

[0048] Based on the control point motion, the motion field (v_x, v_y) of one affine-coded block is described as follows:

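$$
\begin{aligned}
v_x &= \frac{v_{1x} - v_{0x}}{w}\,x - \frac{v_{1y} - v_{0y}}{w}\,y + v_{0x}\\
v_y &= \frac{v_{1y} - v_{0y}}{w}\,x + \frac{v_{1x} - v_{0x}}{w}\,y + v_{0y}
\end{aligned}
\tag{6}
$$

where (v_{0x}, v_{0y}) and (v_{1x}, v_{1y}) are the control point MVs V0 and V1, and w is the width of the block.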
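A minimal sketch of (6) follows. It uses exact fractions instead of the fixed-point MV precision a real codec uses, which is an assumption made for readability. The two CPMVs are passed as (x, y) tuples, `w` is the block width, and the function name is hypothetical.

```python
from fractions import Fraction
from typing import Tuple

MV = Tuple[int, int]


def affine4_mv(v0: MV, v1: MV, w: int, x, y) -> Tuple[Fraction, Fraction]:
    """Motion vector at position (x, y) of a 4-parameter affine block."""
    a = Fraction(v1[0] - v0[0], w)  # zoom term, shared by both directions
    b = Fraction(v1[1] - v0[1], w)  # rotation term, shared by both directions
    vx = a * x - b * y + v0[0]
    vy = b * x + a * y + v0[1]
    return vx, vy
```

Evaluating the model at (0, 0) reproduces V0 and at (w, 0) reproduces V1, and equal CPMVs give a purely translational field.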

[0049] The 6-parameter affine mode has the following parameters: two parameters for translational motion in the horizontal and vertical directions, one parameter for zoom motion and one parameter for rotational motion in the horizontal direction, and one parameter for zoom motion and one parameter for rotational motion in the vertical direction. The 6-parameter affine motion model is coded with three MVs at three CPMVs.

[0050] Figure 6 shows a diagram of the 6-parameter affine model according to this disclosure.

[0051] As shown in Figure 6, the three control points of a 6-parameter affine block are located at the top-left, top-right, and bottom-left corners of the block. The motion at the top-left control point relates to translational motion, the motion at the top-right control point relates to rotation and zoom motion in the horizontal direction, and the motion at the bottom-left control point relates to rotation and zoom motion in the vertical direction. Compared with the 4-parameter affine motion model, the horizontal rotation and zoom motion of the 6-parameter model need not be the same as the vertical rotation and zoom motion. Assuming (V0, V1, V2) are the MVs of the top-left, top-right, and bottom-left corners of the current block in Figure 6, the motion vector (v_x, v_y) of each subblock is derived using the three MVs at the control points as follows.

number
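The six-parameter model (7) can be checked the same way. In the sketch below (function names hypothetical, floating point only), each control point MV is reproduced at its own corner, and the helper `implied_v2` shows that the four-parameter model is the special case in which V2 is fixed by V0 and V1 through the shared zoom and rotation terms.

```python
def affine6_mv(v0, v1, v2, w, h, x, y):
    """Evaluate the six-parameter affine model (7) at position (x, y).

    v0, v1, v2: top-left, top-right and bottom-left control point MVs.
    w, h: block width and height.
    """
    vx = v0[0] + (v1[0] - v0[0]) * x / w + (v2[0] - v0[0]) * y / h
    vy = v0[1] + (v1[1] - v0[1]) * x / w + (v2[1] - v0[1]) * y / h
    return vx, vy


def implied_v2(v0, v1, w, h):
    """Bottom-left MV implied by the four-parameter model (equal zoom/rotation)."""
    return v0[0] - (v1[1] - v0[1]) * h / w, v0[1] + (v1[0] - v0[0]) * h / w


v0, v1, v2 = (1.0, 2.0), (3.0, 5.0), (-2.0, 4.0)
print(affine6_mv(v0, v1, v2, 8, 4, 0, 0))  # (1.0, 2.0)
print(affine6_mv(v0, v1, v2, 8, 4, 8, 0))  # (3.0, 5.0)
print(affine6_mv(v0, v1, v2, 8, 4, 0, 4))  # (-2.0, 4.0)
```

Feeding `implied_v2(v0, v1, w, h)` as the third control point makes `affine6_mv` agree with the four-parameter formula (6) at every position.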

[0052] Prediction refinement with optical flow for affine mode. To improve the accuracy of affine motion compensation, PROF is currently being investigated in VVC; it refines subblock-based affine motion compensation based on the optical flow model. Specifically, after subblock-based affine motion compensation is performed, each luma prediction sample of an affine block is modified by a sample refinement value derived from the optical flow equation. In detail, PROF can be summarized in the following four steps.

[0053] Step 1: Subblock-based affine motion compensation is performed to generate the subblock prediction I(i,j), using the subblock MVs derived by (6) for the four-parameter affine model and by (7) for the six-parameter affine model.

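[0054] Step 2: The spatial gradients $g_x(i,j)$ and $g_y(i,j)$ of each prediction sample are calculated as follows:

$$
\begin{aligned}
g_x(i,j) &= I(i+1,j) - I(i-1,j)\\
g_y(i,j) &= I(i,j+1) - I(i,j-1)
\end{aligned}
\tag{8}
$$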

[0055] To calculate the gradient, one additional row / column of prediction samples needs to be generated on each side of a subblock. To reduce memory bandwidth and complexity, samples on the extended boundary are copied from the nearest integer pixel position in the reference picture to avoid additional interpolation processes.
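Step 2, together with the boundary extension described above, can be sketched as follows. This is an illustrative pure-Python sketch: the helper names, the [row][column] array layout, the round-half-up rule used for "nearest integer position", and clamping at the picture edges are assumptions rather than details taken from this text.

```python
import math


def nearest_int(v):
    """Nearest integer position, ties rounded up (assumed convention)."""
    return math.floor(v + 0.5)


def pad_from_reference(interior, ref, x0, y0, mv):
    """Surround an H x W subblock prediction with a one-sample border.

    Border samples are copied from the reference picture at the integer
    position nearest to their fractional position, so no extra interpolation
    is needed. Positions are clamped to the picture area (assumption).
    """
    H, W = len(interior), len(interior[0])
    ref_h, ref_w = len(ref), len(ref[0])
    out = [[0] * (W + 2) for _ in range(H + 2)]
    for j in range(-1, H + 1):
        for i in range(-1, W + 1):
            if 0 <= i < W and 0 <= j < H:
                out[j + 1][i + 1] = interior[j][i]
            else:
                rx = min(max(nearest_int(x0 + i + mv[0]), 0), ref_w - 1)
                ry = min(max(nearest_int(y0 + j + mv[1]), 0), ref_h - 1)
                out[j + 1][i + 1] = ref[ry][rx]
    return out


def prof_gradients(padded):
    """Central-difference gradients of equation (8) on a padded block.

    Sample I(i, j) sits at padded[j + 1][i + 1]. Returns H x W lists (gx, gy).
    """
    H, W = len(padded) - 2, len(padded[0]) - 2
    gx = [[padded[j + 1][i + 2] - padded[j + 1][i] for i in range(W)] for j in range(H)]
    gy = [[padded[j + 2][i + 1] - padded[j][i + 1] for i in range(W)] for j in range(H)]
    return gx, gy


# Linear ramp reference; with an integer MV the prediction is a plain copy.
ref = [[4 * x + 2 * y for x in range(10)] for y in range(10)]
x0, y0 = 3, 3
interior = [[ref[y0 + j][x0 + i] for i in range(4)] for j in range(4)]
gx, gy = prof_gradients(pad_from_reference(interior, ref, x0, y0, (0.0, 0.0)))
print(gx[0], gy[0])  # [8, 8, 8, 8] [4, 4, 4, 4]
```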

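[0056] Step 3: The luma prediction refinement is calculated as follows:

$$
\Delta I(i,j) = g_x(i,j)\,\Delta v_x(i,j) + g_y(i,j)\,\Delta v_y(i,j)
\tag{9}
$$

where Δv(i,j) is the difference between the MV computed for sample position (i,j) and the MV of the subblock to which the sample belongs.

[Equation not reproduced in this text]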
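A floating-point sketch of Step 3 follows: it evaluates (9) per sample and adds the refinement to the subblock prediction. The function name is hypothetical, and the fixed-point shifts, rounding, and clipping of an actual implementation (including the scaling between gradient units and MV-difference units) are deliberately omitted.

```python
def prof_refine(pred, gx, gy, dvx, dvy):
    """Refine a subblock prediction with the optical-flow term of (9).

    All arguments are H x W lists indexed [j][i]. Returns I(i,j) + dI(i,j).
    """
    refined = []
    for j in range(len(pred)):
        row = []
        for i in range(len(pred[0])):
            delta = gx[j][i] * dvx[j][i] + gy[j][i] * dvy[j][i]  # dI(i, j)
            row.append(pred[j][i] + delta)
        refined.append(row)
    return refined


print(prof_refine([[100, 100]], [[8, -4]], [[4, 0]],
                  [[0.25, 0.5]], [[-0.5, 1.0]]))  # [[100.0, 98.0]]
```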

[0057] Figure 7 shows the PROF process for affine mode according to this disclosure.

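[0058] Because the affine model parameters and the sample positions relative to the subblock center do not change from one subblock to another, Δv(i,j) can be calculated for the first subblock and reused for the other subblocks in the same CU. Let Δx and Δy be the horizontal and vertical offsets from the sample position (i,j) to the center of the subblock to which the sample belongs; then Δv(i,j) can be derived as follows:

$$
\begin{aligned}
\Delta v_x(i,j) &= c\,\Delta x + d\,\Delta y\\
\Delta v_y(i,j) &= e\,\Delta x + f\,\Delta y
\end{aligned}
\tag{10}
$$

[0059] Based on the affine subblock MV derivation equations (6) and (7), the coefficients of the MV difference Δv(i,j) can be derived. Specifically, for the four-parameter affine model:

$$
\begin{cases}
c = f = \dfrac{v_{1x}-v_{0x}}{w}\\[6pt]
e = -d = \dfrac{v_{1y}-v_{0y}}{w}
\end{cases}
$$

[0060] For the six-parameter affine model:

$$
\begin{cases}
c = \dfrac{v_{1x}-v_{0x}}{w}, & d = \dfrac{v_{2x}-v_{0x}}{h}\\[6pt]
e = \dfrac{v_{1y}-v_{0y}}{w}, & f = \dfrac{v_{2y}-v_{0y}}{h}
\end{cases}
$$

where $(v_{0x}, v_{0y})$, $(v_{1x}, v_{1y})$, and $(v_{2x}, v_{2y})$ are the top-left, top-right, and bottom-left control point MVs, and $w$ and $h$ are the width and height of the block.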
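The reuse argument of [0058] can be made concrete: the Δv pattern depends only on the sample offset inside a subblock, so it is computed once. In this floating-point sketch the function names are hypothetical, and placing the subblock center at (sb − 1)/2 in sample coordinates is an assumed convention.

```python
def affine_params(v0, v1, w, h, v2=None):
    """Coefficients (c, d, e, f) of (10); four-parameter model when v2 is None."""
    if v2 is None:
        c = f = (v1[0] - v0[0]) / w
        e = (v1[1] - v0[1]) / w
        d = -e
    else:
        c = (v1[0] - v0[0]) / w
        d = (v2[0] - v0[0]) / h
        e = (v1[1] - v0[1]) / w
        f = (v2[1] - v0[1]) / h
    return c, d, e, f


def mv_diff_pattern(params, sb=4):
    """Per-sample (dvx, dvy) for one sb x sb subblock, indexed [j][i].

    The pattern is identical for every subblock of the CU, so it can be
    computed for the first subblock and reused.
    """
    c, d, e, f = params
    center = (sb - 1) / 2  # assumed subblock center in sample coordinates
    pattern = []
    for j in range(sb):
        row = []
        for i in range(sb):
            dx, dy = i - center, j - center
            row.append((c * dx + d * dy, e * dx + f * dy))
        pattern.append(row)
    return pattern


p6 = affine_params((1.0, 2.0), (3.0, 5.0), 8, 4, v2=(-2.0, 4.0))
print(p6)                        # (0.25, -0.75, 0.375, 0.5)
print(mv_diff_pattern(p6)[0][3])  # (1.5, -0.1875)
```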

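[0061] Local illumination compensation. LIC (local illumination compensation) is a coding tool used to address local illumination changes between temporally neighboring pictures. A pair of weight and offset parameters is applied to the reference samples to obtain the prediction samples of the current block. The general mathematical model is as follows:

$$
P[x] = \alpha\,P_r[x+v] + \beta
\tag{11}
$$

where $P[x]$ is the prediction sample at position x, $P_r[x+v]$ is the reference sample pointed to by the motion vector v, and α and β are the weight and offset. The parameters are derived by the LLMSE method over a template of neighboring samples:

$$
\alpha = \frac{I\sum_{i} P_c[x_i]\,P_r[x_i] - \sum_{i} P_c[x_i]\sum_{i} P_r[x_i]}{I\sum_{i} P_r[x_i]^2 - \left(\sum_{i} P_r[x_i]\right)^2},
\qquad
\beta = \frac{\sum_{i} P_c[x_i] - \alpha\sum_{i} P_r[x_i]}{I}
\tag{12}
$$

[0062] Here, I is the number of samples in the template, $P_c[x_i]$ is the i-th template sample of the current block, and $P_r[x_i]$ is the reference sample of the i-th template sample, located using the motion vector v.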
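The least-squares derivation (12) and the model (11) can be sketched in floating point as follows. The function names are hypothetical; the fallback to an offset-only model when the denominator is zero (a flat reference template) is an assumption added so that the sketch never divides by zero, and an actual codec would use integer arithmetic.

```python
def lic_params(cur_template, ref_template):
    """Weight and offset (alpha, beta) of (12) from template samples.

    cur_template: P_c[x_i]; ref_template: P_r[x_i]; both of length I.
    """
    n = len(cur_template)
    sum_c = sum(cur_template)
    sum_r = sum(ref_template)
    sum_cr = sum(c * r for c, r in zip(cur_template, ref_template))
    sum_rr = sum(r * r for r in ref_template)
    denom = n * sum_rr - sum_r * sum_r
    if denom == 0:  # flat reference template: offset-only fallback (assumption)
        return 1.0, (sum_c - sum_r) / n
    alpha = (n * sum_cr - sum_c * sum_r) / denom
    beta = (sum_c - alpha * sum_r) / n
    return alpha, beta


def lic_predict(ref_samples, alpha, beta):
    """Apply the linear model (11) to motion-compensated reference samples."""
    return [alpha * r + beta for r in ref_samples]


alpha, beta = lic_params([25, 45, 65, 85], [10, 20, 30, 40])
print(alpha, beta)                           # 2.0 5.0
print(lic_predict([12, 18], alpha, beta))    # [29.0, 41.0]
```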

[0063] In addition to regular inter blocks, which have at most one motion vector per prediction direction (L0 or L1), LIC is also applied to affine-mode coded blocks, in which a coded block can be further divided into multiple smaller subblocks, each associated with different motion information. To derive the reference samples for LIC of an affine-mode coded block, as shown in Figures 16A and 16B (described later), the reference samples for the top template of the affine coded block are fetched using the motion vector of each subblock in the top subblock row, while the reference samples for the left template are fetched using the motion vectors of the subblocks in the left subblock column. Then, the same LLMSE derivation as in (12) is applied to derive the LIC parameters based on the composite template.
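The composite-template fetch described above can be sketched as follows. The sketch assumes integer subblock MVs and performs no picture-boundary handling; the function name and the dictionary layout of `sub_mvs` are hypothetical. The two returned lists would play the role of $P_r[x_i]$ in (12).

```python
def affine_lic_ref_template(ref, x0, y0, w, h, sub_mvs, sb=4):
    """Fetch reference samples for the top and left LIC templates of an affine CU.

    ref: reference picture as a list of rows; (x0, y0): CU top-left position.
    sub_mvs[(sx, sy)]: integer MV (mvx, mvy) of the subblock at CU offset (sx, sy).
    The top template row (y0 - 1) uses the MVs of the top subblock row; the
    left template column (x0 - 1) uses the MVs of the left subblock column.
    """
    top = []
    for x in range(w):
        mvx, mvy = sub_mvs[((x // sb) * sb, 0)]
        top.append(ref[y0 - 1 + mvy][x0 + x + mvx])
    left = []
    for y in range(h):
        mvx, mvy = sub_mvs[(0, (y // sb) * sb)]
        left.append(ref[y0 + y + mvy][x0 - 1 + mvx])
    return top, left


ref = [[100 * y + x for x in range(20)] for y in range(20)]
mvs = {(0, 0): (0, 0), (4, 0): (1, 0), (0, 4): (0, 2), (4, 4): (3, 3)}
top, left = affine_lic_ref_template(ref, 8, 8, 8, 8, mvs)
print(top)   # [708, 709, 710, 711, 713, 714, 715, 716]
print(left)  # [807, 907, 1007, 1107, 1407, 1507, 1607, 1707]
```

Note that the MV of the bottom-right subblock (4, 4) is never used, since it touches neither template.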

[0064] Figure 16A shows a diagram for deriving a template sample for affine mode according to this disclosure. The diagram includes Cur Frame 1620 and Cur CU 1622. Cur Frame 1620 is the current frame. Cur CU 1622 is the current coding unit.

[0065] Figure 16B shows a diagram for deriving template samples for affine mode. The diagram includes Ref Frame 1640, Col CU 1642, A Ref 1643, B Ref 1644, C Ref 1645, D Ref 1646, E Ref 1647, F Ref 1648, and G Ref 1649. Ref Frame 1640 is the reference frame. Col CU 1642 is the collocated coding unit. A Ref 1643, B Ref 1644, C Ref 1645, D Ref 1646, E Ref 1647, F Ref 1648, and G Ref 1649 are the reference samples.

[0066] Inefficiencies of prediction refinement with optical flow for affine mode. Although PROF can improve the coding efficiency of affine mode, its design can still be improved. In particular, because both PROF and BDOF are built on the optical flow concept, it is highly desirable to harmonize their designs as much as possible so that PROF can reuse the existing BDOF logic to facilitate hardware implementation. Based on these considerations, this disclosure identifies the following inefficiencies in the interaction between the current PROF and BDOF designs.

[0067] Firstly, as described in the section "Prediction refinement with optical flow for affine mode", the accuracy of the gradient in equation (8) is determined by the internal bit depth. On the other hand, the MV differences $\Delta v_x$ and $\Delta v_y$ are always derived at 1 / 32-pel accuracy. Correspondingly, based on equation (9), the accuracy of the derived PROF refinement depends on the internal bit depth. However, similar to BDOF, PROF is applied to prediction sample values at an intermediate high bit depth (i.e., 16 bits) to maintain higher PROF derivation accuracy. Therefore, regardless of the internal coding bit depth, the accuracy of the prediction refinement derived by PROF should match that of the intermediate prediction samples, i.e., 16 bits. In other words, the representation bit depths of the MV difference and the gradient in the existing PROF design are not properly aligned with the prediction sample accuracy (16 bits), which prevents accurate prediction refinement. Furthermore, a comparison of equations (1), (4), and (8) shows that the existing PROF and BDOF use different accuracies to represent the sample gradient and the MV difference. As noted above, such an inconsistent design is undesirable for hardware because it prevents the reuse of existing BDOF logic.

[0068] Secondly, as discussed in the section "Prediction refinement with optical flow for affine mode", when the current affine block is bi-predicted, PROF is applied separately to the L0 and L1 prediction samples, and the refined L0 and L1 prediction signals are then averaged to produce the final bi-prediction signal. In contrast, instead of deriving a PROF refinement separately for each prediction direction, BDOF derives the prediction refinement once and applies it to enhance the combined L0 and L1 prediction signal. Figures 8 and 9 (described later) compare the current BDOF and PROF workflows for bi-prediction. In practical codec hardware pipeline designs, a different major encoding / decoding module is usually assigned to each pipeline stage so that more coded blocks can be processed in parallel. However, the difference between the BDOF and PROF workflows makes it difficult for BDOF and PROF to share one pipeline design, which is disadvantageous for practical codec implementations.

[0069] Figure 8 shows the workflow of BDOF according to this disclosure. Workflow 800 includes L0 motion compensation 810, L1 motion compensation 820, and BDOF 830. L0 motion compensation 810 may be, for example, a list of motion compensation samples from a previous reference picture, that is, a reference picture that precedes the current picture in the video. L1 motion compensation 820 may be, for example, a list of motion compensation samples from a next reference picture, that is, a reference picture that follows the current picture in the video. BDOF 830 takes the motion compensation samples from L0 motion compensation 810 and L1 motion compensation 820 and outputs prediction samples, as described above with respect to Figure 4.

[0070] Figure 9 shows the workflow of the existing PROF according to this disclosure. Workflow 900 includes L0 motion compensation 910, L1 motion compensation 920, L0 PROF 930, L1 PROF 940, and average 960. L0 motion compensation 910 may be, for example, a list of motion compensation samples from a previous reference picture, that is, a reference picture that precedes the current picture in the video. L1 motion compensation 920 may be, for example, a list of motion compensation samples from a next reference picture, that is, a reference picture that follows the current picture in the video. L0 PROF 930 takes the L0 motion compensation samples from L0 motion compensation 910 and outputs a motion refinement value, as described above with respect to Figure 7. L1 PROF 940 takes the L1 motion compensation samples from L1 motion compensation 920 and outputs a motion refinement value, as described above with respect to Figure 7. Average 960 averages the motion refinement value outputs of L0 PROF 930 and L1 PROF 940.

[0071] Thirdly, for both BDOF and PROF, the gradient must be calculated for each sample in the current coded block, which requires generating one additional row / column of prediction samples on each side of the block. To avoid the additional computational complexity of sample interpolation, the prediction samples in the extended region around the block are copied directly from the reference samples at integer positions (i.e., without interpolation). However, the existing designs select integer samples at different positions to generate the gradient values for BDOF and PROF. Specifically, BDOF uses the integer reference sample located to the left of the prediction sample (for horizontal gradients) and above the prediction sample (for vertical gradients), while PROF uses the integer reference sample closest to the prediction sample for gradient calculation. Similar to the bit depth representation problem, such non-uniform gradient calculation methods are undesirable for hardware codec implementations.

[0072] Fourthly, as noted above, the motivation for PROF is to compensate for the small difference between the MV of each sample and the subblock MV derived at the center of the subblock to which the sample belongs. According to the current PROF design, PROF is always invoked when a coded block is predicted by affine mode. However, as shown in equations (6) and (7), the subblock MVs of an affine block are derived from the control point MVs. Therefore, when the differences between the control point MVs are relatively small, the MVs at the different sample positions should be nearly identical. In such cases, the benefit of applying PROF may be very limited, and considering the performance / complexity trade-off, performing PROF may not be worthwhile.

[0073] Improvement of prediction refinement with optical flow for affine mode. This disclosure provides methods for improving and simplifying the existing PROF design to facilitate hardware codec implementation. Particular attention is paid to harmonizing the designs of BDOF and PROF so that PROF can share as much of the existing BDOF logic as possible. The main aspects of the proposed techniques are summarized below.

[0074] Firstly, to improve the coding efficiency of PROF while achieving a more unified design, a method is proposed to unify the representation bit depths of the sample gradient and the MV difference used by BDOF and PROF.

[0075] Secondly, to facilitate hardware pipeline design, it is proposed to harmonize the PROF workflow with the BDOF workflow for bi-prediction. Specifically, unlike the existing PROF, which derives prediction refinements separately for L0 and L1, the proposed method derives the prediction refinement once and applies it to the combined L0 and L1 prediction signal.

[0076] Thirdly, two methods are proposed to harmonize the derivation of the integer reference samples used to compute the gradient values for BDOF and PROF.

[0077] Fourthly, to reduce computational complexity, we propose an early termination method that adaptively disables the PROF process for affine-coded blocks when certain conditions are met.

[0078] Improved bit depth representation design for the PROF gradient and MV difference. As analyzed in the "Problem Description" section, the bit depth representations of the MV difference and the sample gradient in the current PROF are not aligned for deriving accurate prediction refinement. Furthermore, the bit depth representations of the sample gradient and the MV difference are inconsistent between BDOF and PROF, which is unfriendly to hardware. This section proposes an improved bit depth representation method that extends the BDOF bit depth representation method to PROF. Specifically, the proposed method calculates the horizontal and vertical gradients at each sample position as follows:

[Equation (13): not reproduced in this text]

[0079] In addition, assuming that Δx and Δy are the horizontal and vertical offsets expressed with 1 / 4-pel accuracy from one sample position to the center of the subblock to which the sample belongs, the corresponding PROF MV difference Δv(x,y) at the sample position can be derived as follows:

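[Equation (14): not reproduced in this text]

[Equation not reproduced in this text]

[0080] In a 6-parameter affine model, the following applies:

[Equation not reproduced in this text]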

[0081] In the above description, a pair of fixed right shifts is applied to calculate the gradient and the MV difference values, as shown in equations (13) and (14). In practice, different bitwise right shifts can be applied in (13) and (14) to achieve different representation accuracies of the gradient and the MV difference, for different trade-offs between intermediate computation accuracy and bit width in the internal PROF derivation. For example, when the input video contains a lot of noise, the derived gradient may not reliably represent the true local horizontal / vertical gradient at each sample. In such cases, it is more reasonable to use more bits to represent the MV difference than the gradient. On the other hand, when the input video shows steady motion, the MV differences derived from the affine model should be very small. In such cases, using a high-precision MV difference brings no additional benefit to the accuracy of the derived PROF refinement; in other words, it is more beneficial to use more bits to represent the gradient values. Based on these considerations, one or more embodiments of this disclosure propose the following general method for calculating the gradient and the MV difference for PROF. Specifically, the horizontal and vertical gradients at each sample position are calculated by applying an $n_a$-bit right shift to the difference between the neighboring prediction samples:

$$
\begin{aligned}
g_x(i,j) &= \big(I(i+1,j) - I(i-1,j)\big) \gg n_a\\
g_y(i,j) &= \big(I(i,j+1) - I(i,j-1)\big) \gg n_a
\end{aligned}
\tag{15}
$$

[Equation not reproduced in this text]

[Equation not reproduced in this text]

[0082] In some embodiments of this disclosure, another PROF bit depth control method is proposed. In this method, the horizontal and vertical gradients at each sample position are still calculated as in (15), by applying an $n_a$-bit right shift to the difference between the neighboring prediction samples. The corresponding PROF MV difference Δv(x,y) at the sample position is calculated as follows:

[Equation not reproduced in this text]

[0083] Furthermore, to keep the entire PROF derivation at an appropriate internal bit depth, clipping is applied to the derived MV difference as follows:

[Equation not reproduced in this text]

[Equation not reproduced in this text]
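Equation (15) and the clipping step can be sketched with integer arithmetic as follows. The value of $n_a$, the clipping bound `limit`, and the symmetric range [−limit, limit − 1] are hypothetical placeholders, because the exact shift and range values are not reproduced in this text; the function names are likewise illustrative.

```python
def clip3(lo, hi, v):
    """Clip v into [lo, hi] (the Clip3 operator used in video coding specs)."""
    return lo if v < lo else hi if v > hi else v


def shifted_gradients(pred, n_a):
    """Equation (15): neighbour differences right-shifted by n_a bits.

    pred: (H+2) x (W+2) padded integer prediction block, indexed [row][col].
    Python's >> on negative ints is an arithmetic (floor) shift.
    """
    H, W = len(pred) - 2, len(pred[0]) - 2
    gx = [[(pred[j + 1][i + 2] - pred[j + 1][i]) >> n_a for i in range(W)]
          for j in range(H)]
    gy = [[(pred[j + 2][i + 1] - pred[j][i + 1]) >> n_a for i in range(W)]
          for j in range(H)]
    return gx, gy


def clip_mv_diff(dv, limit):
    """Clip an integer MV difference (dvx, dvy) to [-limit, limit - 1].

    The symmetric range and the value of `limit` are hypothetical.
    """
    return clip3(-limit, limit - 1, dv[0]), clip3(-limit, limit - 1, dv[1])


pred = [[0, 0, 0],
        [100, 0, -100],
        [0, 50, 0]]
print(shifted_gradients(pred, 2))      # ([[-50]], [[12]])
print(clip_mv_diff((300, -500), 256))  # (255, -256)
```

A larger $n_a$ narrows the gradient's dynamic range, and hence the multiplier width in (9), at the cost of gradient precision, which is the trade-off discussed in [0081].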

[0084] Furthermore, in one or more embodiments of this disclosure, a PROF bit depth control solution is proposed. In this method, the horizontal and vertical PROF motion refinements at each sample position (i, j) are derived as follows:

[Equation not reproduced in this text]

[0085] Furthermore, the derived horizontal and vertical motion refinements are clipped as follows:

[Equation not reproduced in this text]

[0086] Given the motion refinements derived above, the final PROF sample refinement at position (i, j) is calculated as follows:

[Equation not reproduced in this text]

[0087] In another embodiment, a different PROF bit depth control solution is proposed. In this second solution, the horizontal and vertical PROF motion refinements at sample position (i, j) are derived as follows:

[Equation not reproduced in this text]

[0088] Next, the derived motion refinements are clipped as follows:

[Equation not reproduced in this text]

[0089] Given the motion refinements derived above, the final PROF sample refinement at position (i, j) is then calculated as follows:

[Equation not reproduced in this text]

[0090] In one or more embodiments of this disclosure, it is proposed to combine the motion refinement accuracy control of the first solution with the PROF sample refinement derivation of the second solution. Specifically, in this method, the horizontal and vertical PROF motion refinements at each sample position (i, j) are derived as follows:

[Equation not reproduced in this text]

[0091] Furthermore, the derived horizontal and vertical motion refinements are clipped as follows:

[Equation not reproduced in this text]

[0092] Finally, given the motion refinements derived above, the final PROF sample refinement at position (i, j) is calculated as follows:

[Equation not reproduced in this text]

[0093] Harmonized workflow of PROF and BDOF for bi-prediction. As discussed earlier, when an affine coded block is bi-predicted, the current PROF is applied in a unidirectional manner. More specifically, the PROF sample refinements are derived and applied separately to the L0 and L1 prediction samples. The refined L0 and L1 prediction signals are then averaged to produce the final bi-prediction signal of the block. This contrasts with the BDOF design, in which the sample refinements are derived and applied to the bi-prediction signal. Such a difference between the bi-prediction workflows of BDOF and PROF can be unfriendly to practical codec pipeline design.

[0094] To facilitate hardware pipeline design, one simplification method in this disclosure modifies the PROF bi-prediction process so that the workflows of the two prediction refinement methods are harmonized. Specifically, instead of applying the refinement separately for each prediction direction, the proposed PROF method first derives the prediction refinement based on the control point MVs of lists L0 and L1. The derived prediction refinement is then applied to the combined L0 and L1 prediction signal to enhance its quality. Specifically, based on the MV difference derived in equation (14), the final bi-prediction sample of an affine coded block is calculated by the proposed method as follows:

[Equation not reproduced in this text]
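The structural difference between the two workflows can be illustrated in floating point. The combination rule in `harmonized_bipred_prof` (summing the L0 and L1 refinements and applying the result once to the combined signal) is an assumption chosen for illustration; it is not the proposed equation, which is not reproduced here, and the function names are hypothetical. Ignoring rounding, both orders give the same value, so the harmonization changes where the refinement is applied (one stage after both motion compensations, as in BDOF) rather than the ideal result; with fixed-point rounding the two orders can differ slightly.

```python
def existing_bipred_prof(i0, i1, d0, d1):
    """Existing workflow: refine L0 and L1 separately, then average."""
    return [((a + da) + (b + db)) / 2 for a, b, da, db in zip(i0, i1, d0, d1)]


def harmonized_bipred_prof(i0, i1, d0, d1):
    """Harmonized workflow (assumed form): one refinement on the combined signal."""
    combined = [a + b for a, b in zip(i0, i1)]        # L0 + L1 prediction
    refinement = [da + db for da, db in zip(d0, d1)]  # derived once from both lists
    return [(s + r) / 2 for s, r in zip(combined, refinement)]


i0, i1 = [100, 120], [104, 118]  # L0 / L1 prediction samples
d0, d1 = [2, -1], [-1, 3]        # per-list PROF refinements
print(existing_bipred_prof(i0, i1, d0, d1))    # [102.5, 120.0]
print(harmonized_bipred_prof(i0, i1, d0, d1))  # [102.5, 120.0]
```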

[0095] Figure 12 shows the corresponding PROF process when the proposed bi-prediction PROF method is applied. PROF process 1200 includes L0 motion compensation 1210, L1 motion compensation 1220, and bi-prediction PROF 1230. L0 motion compensation 1210 may be, for example, a list of motion compensation samples from a previous reference picture, that is, a reference picture that precedes the current picture in the video. L1 motion compensation 1220 may be, for example, a list of motion compensation samples from a next reference picture, that is, a reference picture that follows the current picture in the video. Bi-prediction PROF 1230 takes the motion compensation samples from L0 motion compensation 1210 and L1 motion compensation 1220, as described above, and outputs bi-prediction samples.

[0096] To demonstrate the potential benefit of the proposed method for hardware pipeline design, Figure 13 shows an example of the pipeline stages when both BDOF and the proposed PROF are applied. In Figure 13, the decoding process of an inter block mainly consists of three steps. First, the MVs of the coded block are parsed / decoded and the reference samples are fetched. Second, the L0 and / or L1 prediction signals of the coded block are generated. Third, if the coded block is predicted by a non-affine mode, sample-wise refinement of the generated bi-prediction samples is performed by BDOF, and if the coded block is predicted by affine mode, PROF is performed.

[0097] Figure 13 shows an example of the pipeline stages when both BDOF and the proposed PROF are applied according to this disclosure, illustrating the potential advantage of the proposed method for hardware pipeline design. Pipeline stage 1300 includes parsing / decoding the MV and fetching the reference samples 1310, motion compensation 1320, and BDOF / PROF 1330. Pipeline stage 1300 processes video blocks BLK0, BLK1, BLK2, BLK3, and BLK4. Each video block starts with parsing / decoding the MV and fetching the reference samples 1310, then moves to motion compensation 1320, and then to BDOF / PROF 1330. This means that BLK1 does not start processing in stage 1310 until BLK0 moves on to motion compensation 1320. The same holds for all stages and video blocks as time progresses from T0 to T1, T2, T3, and T4.

[0099] As shown in Figure 13, after the proposed harmonization method is applied, both BDOF and PROF are applied directly to the bi-prediction samples. Given that BDOF and PROF are applied to different types of coded blocks (i.e., BDOF is applied to non-affine blocks and PROF is applied to affine blocks), the two coding tools cannot be invoked simultaneously. Therefore, their corresponding decoding processes can share the same pipeline stage. This is more efficient than the existing PROF design, in which it is difficult to assign the same pipeline stage to both BDOF and PROF because of their different bi-prediction workflows.
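The stage sharing described above can be pictured with a small schedule simulation. It is an idealized sketch (in-order issue, one time slot per stage per block); the stage labels follow Figure 13, and the function name is hypothetical. Because every block is either affine (PROF) or non-affine (BDOF), a single "BDOF / PROF" stage is enough, and each stage holds at most one block per slot.

```python
STAGES = ("parse MV / fetch ref 1310", "motion compensation 1320", "BDOF / PROF 1330")


def pipeline_schedule(blocks, n_stages=len(STAGES)):
    """Return {time_slot: [(stage, block), ...]} for an in-order pipeline.

    Block k occupies stage s during time slot k + s (idealized assumption),
    so block k + 1 enters stage 0 exactly when block k moves to stage 1.
    """
    schedule = {}
    for k, blk in enumerate(blocks):
        for s in range(n_stages):
            schedule.setdefault(k + s, []).append((s, blk))
    return schedule


sched = pipeline_schedule(["BLK0", "BLK1", "BLK2", "BLK3", "BLK4"])
for t in sorted(sched):
    print(f"T{t}: " + ", ".join(f"{b} in {STAGES[s]}" for s, b in sched[t]))
```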

[0100] Although the proposed method only considers harmonizing the BDOF and PROF workflows, in the existing designs the two coding tools also use different basic unit sizes. For example, for BDOF, a coding block is divided into multiple subblocks of size $W_s \times H_s$, where $W_s = \min(W, 16)$ and $H_s = \min(H, 16)$, and W and H are the width and height of the coding block. BDOF operations, such as gradient calculation and sample refinement derivation, are performed independently for each subblock. On the other hand, as mentioned above, an affine coded block is divided into 4x4 subblocks, each assigned an individual MV derived from either the four-parameter or the six-parameter affine model. Because PROF is applied only to affine blocks, its basic unit of operation is the 4x4 subblock. Similar to the bi-prediction workflow problem, using a basic unit size for PROF that differs from that of BDOF is unfriendly to hardware implementation and makes it difficult for BDOF and PROF to share the same pipeline stage throughout the decoding process. To solve this problem, one or more embodiments propose making the subblock size of affine mode the same as that of BDOF.

[0101] According to the proposed method, when a coded block is coded in affine mode, it is divided into subblocks of size $W_s \times H_s$, where $W_s = \min(W, 16)$ and $H_s = \min(H, 16)$, and W and H are the width and height of the coding block. Each subblock is assigned one individual MV and is treated as an independent PROF computation unit. Note that an independent PROF computation unit ensures that the PROF operations on it are performed without referencing information from neighboring PROF computation units. Specifically, the PROF MV difference at a sample position is calculated as the difference between the MV at that sample position and the MV at the center of the PROF computation unit in which the sample is located, and the gradients used in the PROF derivation are calculated by padding samples along the boundaries of each PROF computation unit. The claimed advantages of the proposed method mainly include the following aspects: 1) a simplified pipeline architecture with a unified basic computation unit size for both motion compensation and BDOF / PROF refinement; 2) reduced memory bandwidth usage due to the enlarged subblock size for affine motion compensation; and 3) reduced per-sample computational complexity of fractional sample interpolation.
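The unit partition of [0100] and [0101] can be sketched as follows. The function name is hypothetical, taking the unit center as (x + Ws/2, y + Hs/2) is an assumed convention, and the sketch assumes W and H are multiples of the unit size (which holds when they are powers of two).

```python
def prof_units(w, h):
    """Split a W x H affine CU into Ws x Hs PROF computation units.

    Ws = min(W, 16), Hs = min(H, 16). Returns (Ws, Hs, units), where each unit
    is ((x, y) top-left offset, (cx, cy) center). The center is where the
    unit MV is evaluated and from which per-sample MV differences are measured.
    """
    ws, hs = min(w, 16), min(h, 16)
    units = [((x, y), (x + ws / 2, y + hs / 2))
             for y in range(0, h, hs) for x in range(0, w, ws)]
    return ws, hs, units


ws, hs, units = prof_units(32, 8)
print(ws, hs, units)  # 16 8 [((0, 0), (8.0, 4.0)), ((16, 0), (24.0, 4.0))]
print(len(prof_units(64, 64)[2]), (64 // 4) * (64 // 4))  # 16 256
```

For a 64x64 affine CU this gives 16 computation units instead of 256 4x4 subblocks, each with its own MV.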

[0102] It should be noted that, owing to the reduced computational complexity of the proposed method (i.e., item 3), the existing 6-tap interpolation filter constraint for affine-coded blocks can be removed. Instead, the default 8-tap interpolation used for non-affine-coded blocks is also used for affine-coded blocks. In this case, the overall computational complexity remains comparable to that of the existing PROF design (based on 4x4 subblocks with 6-tap interpolation filters).
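A rough operation count supports the comparability claim. The sketch assumes separable two-pass interpolation with both MV components fractional (worst case): a horizontal pass over H + T − 1 rows, then a vertical pass over H rows, counting one multiplication per filter tap. Memory bandwidth and rounding operations are not modeled, and the function name is hypothetical.

```python
def mults_per_sample(w, h, taps):
    """Filter multiplications per predicted sample for separable interpolation.

    Horizontal pass: (h + taps - 1) rows x w columns; vertical pass: h x w.
    """
    horizontal = (h + taps - 1) * w
    vertical = h * w
    return (horizontal + vertical) * taps / (w * h)


print(mults_per_sample(4, 4, 6))    # 19.5 (existing: 4x4 subblocks, 6-tap)
print(mults_per_sample(16, 16, 8))  # 19.5 (proposed: 16x16 units, 8-tap)
print(mults_per_sample(4, 4, 8))    # 30.0 (4x4 subblocks with 8-tap)
```

Under these assumptions, the larger unit size offsets the longer filter: the per-sample count of 16x16 units with 8 taps matches that of 4x4 subblocks with 6 taps.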

[0103] Harmonization of gradient derivation for BDOF and PROF. As mentioned earlier, both BDOF and PROF calculate the gradient of each sample in the current coded block, which requires access to one additional row / column of prediction samples on each side of the block. To avoid the complexity of additional interpolation, the required prediction samples in the extended region around the block boundary are copied directly from integer reference samples. However, as noted in the "Problem Description" section, integer samples at different positions are used to calculate the gradient values for BDOF and PROF.

[0104] To achieve a more uniform design, two methods are proposed to unify the gradient derivation methods used by BDOF and PROF. The first method aligns the gradient derivation method of PROF with that of BDOF. Specifically, in the first method, the integer positions used to generate the prediction samples in the extended region are determined by rounding the fractional sample positions down; that is, the selected integer sample positions lie to the left of the fractional sample positions (for horizontal gradients) and above them (for vertical gradients). The second method aligns the gradient derivation method of BDOF with that of PROF; more specifically, when the second method is applied, the integer reference sample closest to the prediction sample is used for gradient calculation.
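The two position rules can be compared directly. In this sketch the function name and method labels are hypothetical, and rounding ties upward for the "nearest" rule is an assumed convention; positions are in units of integer samples.

```python
import math


def extended_sample_position(frac_pos, method):
    """Integer reference position used to fill an extended-region sample.

    method "floor": first method (BDOF style), integer sample to the left / above.
    method "nearest": second method (PROF style), nearest integer sample.
    """
    if method == "floor":
        return math.floor(frac_pos)
    if method == "nearest":
        return math.floor(frac_pos + 0.5)  # ties rounded up (assumption)
    raise ValueError(f"unknown method: {method}")


for pos in (5.25, 5.75, -0.25):
    print(pos, extended_sample_position(pos, "floor"),
          extended_sample_position(pos, "nearest"))
# 5.25 5 5
# 5.75 5 6
# -0.25 -1 0
```

The two rules agree whenever the fractional part is below one half, and differ by one sample otherwise, which is why a single shared rule is needed for one padding module to serve both tools.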

[0105] Figure 14 shows an example of using the gradient derivation method for BDOF according to this disclosure. In Figure 14, white circles represent reference samples at integer positions, triangles represent fractional predicted samples for the current block, and black circles represent integer reference samples used to fill the extension region of the current block.

[0106] Figure 15 shows an example of using the PROF gradient derivation method according to this disclosure. In Figure 15, white circles represent reference samples at integer positions, triangles represent fractional predicted samples for the current block, and black circles represent integer reference samples used to fill the extended region of the current block.

[0107] Figures 14 and 15 show the corresponding integer sample locations used to derive the gradients of BDOF and PROF when the first method (Figure 14) and the second method (Figure 15) are applied, respectively. In Figures 14 and 15, white circles represent reference samples at integer locations, triangles represent fractional prediction samples of the current block, and patterned circles represent the integer reference samples used to fill the extended region of the current block for gradient derivation.

[0108] Furthermore, in the existing BDOF and PROF designs, prediction sample padding is performed at different coding levels. Specifically, for BDOF, padding is applied along the boundaries of each sbWidth x sbHeight subblock, where sbWidth = min(CUWidth, 16) and sbHeight = min(CUHeight, 16), and CUWidth and CUHeight are the width and height of the CU. On the other hand, padding for PROF is always applied at the 4x4 subblock level. In the methods described above, only the padding method is unified between BDOF and PROF, while the padding subblock sizes still differ. This is also not hardware-friendly, because different modules need to be implemented for the padding processes of BDOF and PROF. To achieve a more unified design, it is proposed to unify the subblock padding size of BDOF and PROF. In one or more embodiments of this disclosure, it is proposed to apply the prediction sample padding of BDOF at the 4x4 level. Specifically, this method first divides the CU into multiple 4x4 subblocks; after motion compensation of each 4x4 subblock, the extended samples along the top / bottom and left / right boundaries are padded by copying the samples at the corresponding integer positions. Figures 18A, 18B, 18C, and 18D show an example of the proposed padding method applied to a 16x16 BDOF CU, where the dashed lines represent the 4x4 subblock boundaries and the black bands represent the padded samples of each 4x4 subblock.
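The 4x4-level padding can be sketched by listing, for each 4x4 subblock of a CU, the positions (relative to the CU) whose samples are filled by copying integer reference samples. The function names are hypothetical, and omitting the four corner samples is an assumption based on the [-1, 0, 1] horizontal and vertical gradients not needing them.

```python
def padded_positions(x0, y0, sb=4):
    """Extended-sample positions of the sb x sb subblock at CU offset (x0, y0).

    Only the rows above / below and the columns left / right of the subblock
    are listed; corner samples are omitted (assumption, see lead-in).
    """
    top = [(x0 + i, y0 - 1) for i in range(sb)]
    bottom = [(x0 + i, y0 + sb) for i in range(sb)]
    left = [(x0 - 1, y0 + j) for j in range(sb)]
    right = [(x0 + sb, y0 + j) for j in range(sb)]
    return top + bottom + left + right


def cu_padding_4x4(cu_w, cu_h, sb=4):
    """Map each sb x sb subblock of a CU to the positions padded for it."""
    return {(x, y): padded_positions(x, y, sb)
            for y in range(0, cu_h, sb) for x in range(0, cu_w, sb)}


pads = cu_padding_4x4(16, 16)
print(len(pads))                # 16
print((8, 4) in pads[(4, 4)])  # True
```

The second print shows the key property of subblock-level padding: position (8, 4) lies inside the CU, yet for subblock (4, 4) it is padded independently rather than taken from the neighboring subblock's motion-compensated output.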

[0109] Figure 18A shows the proposed padding method applied to a 16×16 BDOF CU according to this disclosure, where the dashed line represents the upper left 4×4 subblock boundary 1820.

[0110] Figure 18B shows a proposed padding method applied to a 16×16 BDOF CU, where the dashed line represents the upper right 4×4 subblock boundary 1840 according to this disclosure.

[0111] Figure 18C shows the proposed padding method applied to a 16×16 BDOF CU according to this disclosure, where the dashed line indicates the lower left 4×4 subblock boundary 1860.

[0112] Figure 18D shows the proposed padding method applied to a 16×16 BDOF CU according to this disclosure, where the dashed line indicates the lower right 4×4 subblock boundary 1880.

[0113] High-level signaling syntax for enabling / disabling BDOF, PROF, and DMVR. In the existing BDOF and PROF designs, two different flags are signaled in the sequence parameter set (SPS) to control the enabling / disabling of the two coding tools separately. However, because of the similarity between BDOF and PROF, it is more desirable to enable and / or disable BDOF and PROF at a high level with one single control flag. Based on this consideration, a new flag called sps_bdof_prof_enabled_flag is introduced in the SPS, as shown in Table 1. As shown in Table 1, the enabling and disabling of BDOF depends solely on sps_bdof_prof_enabled_flag. When the flag is equal to 1, BDOF is enabled for coding the video content in the sequence. Otherwise, when sps_bdof_prof_enabled_flag is equal to 0, BDOF is not applied. On the other hand, in addition to sps_bdof_prof_enabled_flag, the SPS-level affine control flag sps_affine_enabled_flag is also used to conditionally enable and disable PROF. When both sps_bdof_prof_enabled_flag and sps_affine_enabled_flag are equal to 1, PROF is enabled for all coded blocks coded in affine mode. When sps_bdof_prof_enabled_flag is equal to 1 and sps_affine_enabled_flag is equal to 0, PROF is disabled. [Table 1] [Table 2] Table 1: Changes to the SPS syntax table based on proposed BDOF / PROF enable / disable flags

[0114] The `sps_bdof_prof_enabled_flag` flag specifies whether bidirectional optical flow and prediction refinement with optical flow are enabled. If `sps_bdof_prof_enabled_flag` is equal to 0, both bidirectional optical flow and prediction refinement with optical flow are disabled. If `sps_bdof_prof_enabled_flag` is equal to 1 and `sps_affine_enabled_flag` is equal to 1, both bidirectional optical flow and prediction refinement with optical flow are enabled. Otherwise (`sps_bdof_prof_enabled_flag` is equal to 1 and `sps_affine_enabled_flag` is equal to 0), bidirectional optical flow is enabled and prediction refinement with optical flow is disabled.
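The semantics above reduce to a small truth table. The sketch below (function name hypothetical) takes the two flag values as integers and returns which tools are enabled for the sequence.

```python
def tools_enabled(sps_bdof_prof_enabled_flag, sps_affine_enabled_flag):
    """Return (bdof_enabled, prof_enabled) under the proposed SPS semantics."""
    bdof = sps_bdof_prof_enabled_flag == 1
    prof = bdof and sps_affine_enabled_flag == 1
    return bdof, prof


for flags in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(flags, tools_enabled(*flags))
```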

[0115] sps_bdof_prof_dmvr_slice_present_flag specifies whether the flag slice_disable_bdof_prof_dmvr_flag is signaled at the slice level. When this flag is equal to 1, the syntax element slice_disable_bdof_prof_dmvr_flag is signaled for each slice that refers to the current sequence parameter set. Otherwise (sps_bdof_prof_dmvr_slice_present_flag is equal to 0), slice_disable_bdof_prof_dmvr_flag is not signaled at the slice level. When slice_disable_bdof_prof_dmvr_flag is not signaled, it is inferred to be 0.

[0116] Furthermore, if the proposed SPS-level BDOF/PROF control flag is used, the corresponding control flag no_bdof_constraint_flag in the general constraint information syntax must also be modified as follows: [Table 3] [Table 4]

[0117] no_bdof_prof_constraint_flag equal to 1 specifies that sps_bdof_prof_enabled_flag is equal to 0. no_bdof_prof_constraint_flag equal to 0 does not impose such a constraint.

[0118] In addition to the above SPS BDOF/PROF syntax, it is proposed to introduce another control flag at the slice level. Specifically, slice_disable_bdof_prof_dmvr_flag is introduced to disable BDOF, PROF, and DMVR. The SPS flag sps_bdof_prof_dmvr_slice_present_flag is signaled in the SPS when either the DMVR or the BDOF/PROF SPS-level control flag is true, and it indicates whether slice_disable_bdof_prof_dmvr_flag is present. If it is present, slice_disable_bdof_prof_dmvr_flag is signaled. Table 2 shows the modified slice header syntax table after the proposed syntax is applied. In another embodiment, it is proposed to use two control flags in the slice header to control the enabling/disabling of BDOF and DMVR and the enabling/disabling of PROF separately. Specifically, one flag, slice_disable_bdof_dmvr_slice_flag, controls the on/off state of BDOF, while the other flag, disable_prof_slice_flag, independently controls the on/off state of PROF. [Table 5] Table 2. Proposed changes to the slice header syntax table using the BDOF/PROF enable/disable flags.
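The presence logic in [0115] and [0118] can be sketched as a conditional parse. In the sketch below, `BitReader` and the two parse functions are hypothetical helpers, a single bit list stands in for both the SPS and the slice header, and inferring 0 for an absent slice_disable_bdof_prof_dmvr_flag is an assumption made for illustration.

```python
from typing import Iterator, List


class BitReader:
    """Hypothetical reader that returns one u(1) flag per call from a list of bits."""

    def __init__(self, bits: List[int]) -> None:
        self._bits: Iterator[int] = iter(bits)

    def read_flag(self) -> int:
        return next(self._bits)


def parse_slice_present_flag(reader: BitReader, sps_bdof_prof_enabled_flag: int,
                             sps_dmvr_enabled_flag: int) -> int:
    """SPS side: sps_bdof_prof_dmvr_slice_present_flag is sent only if a tool is enabled."""
    if sps_bdof_prof_enabled_flag or sps_dmvr_enabled_flag:
        return reader.read_flag()
    return 0  # not signaled: inferred to be 0


def parse_slice_disable_flag(reader: BitReader, sps_bdof_prof_dmvr_slice_present_flag: int) -> int:
    """Slice side: slice_disable_bdof_prof_dmvr_flag is sent only if the SPS says it is present."""
    if sps_bdof_prof_dmvr_slice_present_flag:
        return reader.read_flag()
    return 0  # assumption: when absent, the tools are not disabled for this slice


# Usage: the SPS enables BDOF/PROF and signals presence = 1; the slice then disables the tools.
reader = BitReader([1, 1])
present = parse_slice_present_flag(reader, sps_bdof_prof_enabled_flag=1, sps_dmvr_enabled_flag=0)
print(present, parse_slice_disable_flag(reader, present))
```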

[0119] In another embodiment, it is proposed to control BDOF and PROF separately using two different SPS flags. Specifically, two separate SPS flags, sps_bdof_enabled_flag and sps_prof_enabled_flag, are introduced to enable/disable the two tools individually. Furthermore, to allow the PROF tool to be forcibly disabled, a high-level control flag, no_prof_constraint_flag, must be added to the general_constraint_info() syntax table. [Table 6] [Table 7]

[0120] The sps_bdof_enabled_flag specifies whether to enable bidirectional optical flow. If sps_bdof_enabled_flag is 0, bidirectional optical flow is disabled. If sps_bdof_enabled_flag is 1, bidirectional optical flow is enabled.

[0121] `sps_prof_enabled_flag` specifies whether prediction refinement with optical flow is enabled. If `sps_prof_enabled_flag` is equal to 0, prediction refinement with optical flow is disabled. If `sps_prof_enabled_flag` is equal to 1, prediction refinement with optical flow is enabled. [Table 8] [Table 9]

[0122] no_prof_constraint_flag equal to 1 specifies that sps_prof_enabled_flag is equal to 0. no_prof_constraint_flag equal to 0 does not impose such a constraint.
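For the separate-flag design, the general constraint information acts as an upper bound on the SPS. A minimal conformance check might look like the sketch below. It assumes that no_bdof_constraint_flag constrains sps_bdof_enabled_flag in the same way that no_prof_constraint_flag constrains sps_prof_enabled_flag, and `check_gci_constraints` is a hypothetical helper.

```python
from typing import List


def check_gci_constraints(no_bdof_constraint_flag: int, no_prof_constraint_flag: int,
                          sps_bdof_enabled_flag: int, sps_prof_enabled_flag: int) -> List[str]:
    """Return a list of constraint violations (an empty list means the SPS conforms)."""
    violations = []
    # A GCI flag equal to 1 requires the matching SPS flag to be 0.
    if no_bdof_constraint_flag == 1 and sps_bdof_enabled_flag != 0:
        violations.append("no_bdof_constraint_flag=1 requires sps_bdof_enabled_flag=0")
    if no_prof_constraint_flag == 1 and sps_prof_enabled_flag != 0:
        violations.append("no_prof_constraint_flag=1 requires sps_prof_enabled_flag=0")
    # A GCI flag equal to 0 imposes no constraint, so nothing else is checked.
    return violations


# Usage: PROF is forbidden by the GCI but still enabled in the SPS.
print(check_gci_constraints(0, 1, 1, 1))
```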

[0123] At the slice level, in one or more embodiments of the disclosure, it is proposed to introduce a single control flag, slice_disable_bdof_prof_dmvr_flag, to disable BDOF, PROF, and DMVR together. In another embodiment, it is proposed to add two distinct flags at the slice level, namely slice_disable_bdof_dmvr_flag and slice_disable_prof_flag. The first flag (i.e., slice_disable_bdof_dmvr_flag) is used to adaptively switch BDOF and DMVR on/off for a single slice, and the second flag (i.e., slice_disable_prof_flag) is used to control the enabling and disabling of the PROF tool at the slice level. Furthermore, when the second method is applied, slice_disable_bdof_dmvr_flag should be signaled only if either the SPS BDOF flag or the SPS DMVR flag is enabled, and slice_disable_prof_flag should be signaled only if the SPS PROF flag is enabled.
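The signaling conditions of the second slice-level method in [0123] can be sketched as follows. The function name, the dictionary output, and the inference of 0 for each absent flag are assumptions made for illustration; the DMVR SPS flag is called sps_dmvr_enabled_flag here as an assumed name.

```python
from typing import Dict, Iterator


def parse_slice_tool_flags(bits: Iterator[int], sps_bdof_enabled_flag: int,
                           sps_dmvr_enabled_flag: int, sps_prof_enabled_flag: int) -> Dict[str, bool]:
    """Read the two slice-level flags (when present) and return the per-slice tool states."""
    slice_disable_bdof_dmvr_flag = 0  # assumed inferred value when not signaled
    slice_disable_prof_flag = 0       # assumed inferred value when not signaled
    # Signaled only if the SPS enables BDOF or DMVR.
    if sps_bdof_enabled_flag or sps_dmvr_enabled_flag:
        slice_disable_bdof_dmvr_flag = next(bits)
    # Signaled only if the SPS enables PROF.
    if sps_prof_enabled_flag:
        slice_disable_prof_flag = next(bits)
    return {
        "bdof": bool(sps_bdof_enabled_flag) and not slice_disable_bdof_dmvr_flag,
        "dmvr": bool(sps_dmvr_enabled_flag) and not slice_disable_bdof_dmvr_flag,
        "prof": bool(sps_prof_enabled_flag) and not slice_disable_prof_flag,
    }


# Usage: all three tools are enabled in the SPS; this slice keeps BDOF/DMVR and turns PROF off.
print(parse_slice_tool_flags(iter([0, 1]), 1, 1, 1))
```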

[0124] Figure 11 shows a BDOF and PROF method, which can be applied, for example, in a decoder.

[0125] In step 1110, the decoder can receive two GCI (general constraint information) level control flags. The two GCI level control flags can be signaled by the encoder and can include a first GCI level control flag and a second GCI level control flag. The first GCI level control flag indicates whether BDOF is permitted for decoding the current video sequence. The second GCI level control flag indicates whether PROF is permitted for decoding the current video sequence.

[0126] In step 1112, the decoder can receive two SPS level control flags. The two SPS level control flags are signaled by the encoder in the SPS and indicate whether BDOF and PROF, respectively, are enabled for the current video block.

[0127] In step 1114, when the first SPS level control flag is enabled and the video block is not coded in affine mode, the decoder can apply BDOF to derive the motion refinement of the video block based on the first prediction sample I(0)(i, j) and the second prediction sample I(1)(i, j).

[0128] In step 1116, when the second SPS level control flag is enabled and the video block is coded in affine mode, the decoder can apply PROF to derive the motion refinement of the video block based on the first prediction sample I(0)(i, j) and the second prediction sample I(1)(i, j).

[0129] In step 1118, the decoder can obtain the prediction samples of the video block based on the motion refinement.
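The per-block decision in steps 1110 to 1116 of Figure 11 can be expressed as a small selector. This Python sketch is deliberately simplified: it ignores the other conditions a real decoder checks before applying BDOF (for example, bi-prediction requirements), treats a GCI flag equal to 1 as forcing the matching SPS flag to 0, and uses the hypothetical name `select_refinement_tool`.

```python
from typing import Optional


def select_refinement_tool(no_bdof_constraint_flag: int, no_prof_constraint_flag: int,
                           sps_bdof_enabled_flag: int, sps_prof_enabled_flag: int,
                           is_affine: bool) -> Optional[str]:
    """Steps 1110 to 1116 of Figure 11 as a per-block decision (simplified sketch)."""
    # Steps 1110/1112: a GCI flag equal to 1 forces the matching SPS flag to be 0.
    if no_bdof_constraint_flag and sps_bdof_enabled_flag:
        raise ValueError("BDOF is enabled in the SPS but forbidden by the GCI flag")
    if no_prof_constraint_flag and sps_prof_enabled_flag:
        raise ValueError("PROF is enabled in the SPS but forbidden by the GCI flag")
    # Step 1114: BDOF for blocks that are not coded in affine mode.
    if not is_affine and sps_bdof_enabled_flag:
        return "BDOF"
    # Step 1116: PROF for blocks coded in affine mode.
    if is_affine and sps_prof_enabled_flag:
        return "PROF"
    # Neither tool applies to this block.
    return None


print(select_refinement_tool(0, 0, 1, 1, is_affine=False), select_refinement_tool(0, 0, 1, 1, is_affine=True))
```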

[0130] Early termination of PROF based on the control point MV difference. According to the current PROF design, PROF is always invoked when a coding block is predicted in affine mode. However, as shown in equations (6) and (7), the sub-block MVs of an affine block are derived from the control point MVs. Therefore, when the differences between the control point MVs are relatively small, the MVs at the individual sample positions should be similar, and the benefit of applying PROF can be very limited. Thus, to further reduce the average computational complexity of PROF, it is proposed to adaptively skip the PROF-based sample refinement based on the maximum difference between the sample-wise MV and the sub-block-wise MV within one 4x4 sub-block. Since the PROF MV differences of the samples inside a 4x4 sub-block are symmetric about the sub-block center, the maximum horizontal and vertical PROF MV differences can be calculated based on equation (10) as follows:

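[Equation (19): maximum horizontal and vertical PROF MV differences Δv_x^max and Δv_y^max within one 4x4 sub-block]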

[0131] According to the present disclosure, different metrics can be used when determining whether the MV difference is small enough to skip the PROF process.

[0132] In one example, based on equation (19), the PROF process can be skipped when the sum of the absolute maximum horizontal MV difference and the absolute maximum vertical MV difference is smaller than a predetermined threshold:

$$\left|\Delta v_x^{max}\right| + \left|\Delta v_y^{max}\right| < \text{threshold}$$

[0133] In another example, the PROF process can be skipped when the larger of |Δv_x^max| and |Δv_y^max| is below the threshold:

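$$\mathrm{MAX}\left(\left|\Delta v_x^{max}\right|, \left|\Delta v_y^{max}\right|\right) < \text{threshold}$$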

[0134] MAX(a, b) is a function that returns the larger value between the input values a and b.
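Equation (10) is not reproduced in this section, so the sketch below assumes the usual PROF model in which the per-sample MV difference is Δv_x = c·Δx + d·Δy and Δv_y = e·Δx + f·Δy, where (Δx, Δy) is the offset from the sub-block center and c, d, e, f are the affine parameters. Rather than using the closed form of equation (19), it finds the maxima by evaluating every sample position, then applies the two skip tests of [0132] and [0133]. The function names and threshold values are hypothetical.

```python
from itertools import product
from typing import Tuple


def max_prof_mv_diff(c: float, d: float, e: float, f: float, size: int = 4) -> Tuple[float, float]:
    """Largest |dvx| and |dvy| over the samples of one size x size sub-block.

    Assumes dvx = c*dx + d*dy and dvy = e*dx + f*dy, with (dx, dy) the sample
    offset from the sub-block center.
    """
    center = (size - 1) / 2
    offsets = [i - center for i in range(size)]  # -1.5, -0.5, 0.5, 1.5 for size 4
    max_dvx = max(abs(c * dx + d * dy) for dx, dy in product(offsets, repeat=2))
    max_dvy = max(abs(e * dx + f * dy) for dx, dy in product(offsets, repeat=2))
    return max_dvx, max_dvy


def skip_prof_by_sum(max_dvx: float, max_dvy: float, thres: float) -> bool:
    # First example: |dvx_max| + |dvy_max| below the threshold.
    return abs(max_dvx) + abs(max_dvy) < thres


def skip_prof_by_max(max_dvx: float, max_dvy: float, thres: float) -> bool:
    # Second example: MAX(|dvx_max|, |dvy_max|) below the threshold.
    return max(abs(max_dvx), abs(max_dvy)) < thres


dvx, dvy = max_prof_mv_diff(0.25, 0.5, 0.0, 0.0)
print(dvx, dvy, skip_prof_by_sum(dvx, dvy, 1.0), skip_prof_by_max(dvx, dvy, 2.0))
```

Because the offsets are symmetric about the center, the brute-force maximum equals 1.5·(|c| + |d|) for a 4x4 sub-block, which is one way to sanity-check a closed-form implementation.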

[0135] Furthermore, in addition to the above two examples, the idea of the present disclosure is also applicable when other metrics are used to determine whether the MV difference is small enough to skip the PROF process. In the above methods, PROF is skipped based on the magnitude of the MV difference. However, besides the MV difference, the PROF sample refinement also depends on the local gradient information at each sample position within a motion compensation block. In prediction blocks with fewer high-frequency details (such as flat regions), the gradient values tend to be small, so the derived sample refinement values are also small. Considering this, according to another embodiment, it is proposed to apply PROF only to the prediction samples of blocks that contain sufficient high-frequency information.

[0136] Different metrics can be used to determine whether a block contains enough high-frequency information for it to be worthwhile to invoke the PROF process for the block. In one example, the decision is made based on the average of the gradient magnitudes (i.e., absolute values) of the samples within the prediction block. If the average magnitude is smaller than a threshold, the prediction block is classified as a flat region and PROF should not be applied; otherwise, the prediction block is considered to contain sufficient high-frequency details, and PROF is still applied. In another example, the maximum gradient magnitude of the samples within the prediction block can be used: if the maximum magnitude is smaller than a threshold, PROF is skipped for the block. In yet another example, the difference between the maximum sample value and the minimum sample value of the prediction block, I_max - I_min, can be used to determine whether PROF is applied to the block: if this difference is smaller than a threshold, PROF is skipped for the block. It should be noted that the idea of this disclosure is also applicable when other metrics are used to determine whether a given block contains sufficient high-frequency information.
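The three high-frequency tests in [0136] can be sketched as follows. The central-difference gradient filter, the use of |g_h| + |g_v| as the per-sample gradient magnitude, and the function names are assumptions; the document does not specify the thresholds.

```python
from typing import List, Tuple

Block = List[List[int]]


def central_gradients(block: Block) -> Tuple[List[int], List[int]]:
    """Horizontal/vertical central differences at interior samples (assumed gradient filter)."""
    h, w = len(block), len(block[0])
    gh, gv = [], []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gh.append(block[y][x + 1] - block[y][x - 1])
            gv.append(block[y + 1][x] - block[y - 1][x])
    return gh, gv


def has_enough_high_frequency(block: Block, metric: str, threshold: float) -> bool:
    """Return True if PROF is considered worthwhile for the block under the chosen metric."""
    if metric == "range":
        # Third example: I_max - I_min of the prediction block.
        samples = [s for row in block for s in row]
        return max(samples) - min(samples) >= threshold
    gh, gv = central_gradients(block)  # needs a block of at least 3x3 samples
    magnitudes = [abs(a) + abs(b) for a, b in zip(gh, gv)]
    if metric == "mean":
        # First example: average gradient magnitude.
        return sum(magnitudes) / len(magnitudes) >= threshold
    if metric == "max":
        # Second example: maximum gradient magnitude.
        return max(magnitudes) >= threshold
    raise ValueError(f"unknown metric: {metric}")


flat = [[100] * 4 for _ in range(4)]
ramp = [[10 * x for x in range(4)] for _ in range(4)]
print([has_enough_high_frequency(b, m, 5) for b in (flat, ramp) for m in ("mean", "max", "range")])
```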

[0137] Handling the interaction between PROF and LIC in affine mode. Because LIC uses the adjacent reconstructed samples (i.e., the template) of the current block to derive its linear model parameters, decoding an LIC-coded block depends on the complete reconstruction of its adjacent samples. Due to this dependency, in practical hardware implementations LIC must be performed in the reconstruction stage, where the adjacent reconstructed samples become available for LIC parameter derivation. Since block reconstruction must be performed sequentially (i.e., one block after another), throughput (i.e., the amount of work that can be performed in parallel per unit time) is an important issue to consider when other coding methods are combined with LIC-coded blocks. In this section, two methods are proposed to handle the interaction when both PROF and LIC are enabled for affine mode.

[0138] In a first embodiment of this disclosure, it is proposed to apply PROF and LIC exclusively to an affine coding block. As stated above, in the existing design PROF is implicitly applied to all affine blocks without signaling, while an LIC flag is signaled or inherited at the coding block level to indicate whether LIC is applied to an affine block. According to the method of the present disclosure, it is proposed to apply PROF conditionally, based on the value of the LIC flag of the affine block. If the LIC flag is equal to 1, only LIC is applied, adjusting the prediction samples of the entire coding block with the LIC weight and offset. Otherwise (i.e., if the LIC flag is equal to 0), PROF is applied to the affine coding block to refine the prediction samples of each sub-block based on the optical flow model.

[0139] Figure 17A shows an exemplary flowchart of the decoding process based on the proposed method, in which PROF and LIC are not allowed to be applied simultaneously.

[0140] Figure 17A illustrates an example of a decoding process based on the proposed method, in which PROF and LIC are not permitted together. Decoding process 1720 includes an LIC flag check 1722, LIC 1724, and PROF 1726. The LIC flag check 1722 determines whether the LIC flag is on and selects the next step accordingly. If the LIC flag is on, LIC 1724 is applied; if the LIC flag is not on, PROF 1726 is applied.
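The exclusive behavior of the first embodiment (Figure 17A) amounts to a single branch on the LIC flag. This sketch uses plain numeric samples and hypothetical parameter names; it shows only the control flow, not the derivation of the LIC parameters or of the PROF refinement.

```python
from typing import List

Samples = List[List[float]]


def predict_affine_block(pred: Samples, lic_flag: int, lic_weight: float, lic_offset: float,
                         prof_refinement: Samples) -> Samples:
    """First embodiment: LIC and PROF are mutually exclusive for one affine block."""
    if lic_flag == 1:
        # LIC only: one weight/offset pair for the whole coding block, no PROF.
        return [[lic_weight * s + lic_offset for s in row] for row in pred]
    # LIC off: PROF adds its per-sample refinement to the sub-block prediction.
    return [[s + r for s, r in zip(row, rrow)] for row, rrow in zip(pred, prof_refinement)]


print(predict_affine_block([[10, 20]], 1, 2, 3, [[1, -1]]))  # LIC path
print(predict_affine_block([[10, 20]], 0, 2, 3, [[1, -1]]))  # PROF path
```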

[0141] In a second embodiment of the present disclosure, it is proposed to apply LIC after PROF to generate the prediction samples of an affine block. Specifically, after sub-block-based affine motion compensation is performed, the prediction samples are refined with the PROF sample refinement, and LIC is then performed by applying the weight and offset pair (derived from the template and its reference samples) to the PROF-adjusted prediction samples to obtain the final prediction samples of the block as follows:

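$$P[x] = \alpha \cdot \left(I[x] + \Delta I[x]\right) + \beta$$

where I[x] is the prediction sample generated by sub-block-based affine motion compensation, ΔI[x] is the PROF refinement, α and β are the LIC weight and offset, and P[x] is the final prediction sample.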

[0142] Figure 17B shows a diagram of the decoding process in which PROF and LIC are applied according to this disclosure. Decoding process 1760 includes affine motion compensation 1762, LIC parameter derivation 1764, PROF 1766, and LIC sample adjustment 1768. Affine motion compensation 1762 performs affine motion compensation, and its output is an input to both LIC parameter derivation 1764 and PROF 1766. LIC parameter derivation 1764 derives the LIC parameters. PROF 1766 applies PROF. LIC sample adjustment 1768 applies the LIC weight and offset parameters to the PROF-refined samples.

[0143] Figure 17B shows an example of the decoding workflow when the second method is applied. As shown in Figure 17B, since LIC computes the LIC linear model using a template (i.e., adjacent reconstructed samples), the LIC parameters can be derived as soon as adjacent reconstructed samples become available. This means that PROF refinement and LIC parameter derivation can be performed simultaneously.

[0144] The LIC weight and offset (i.e., α and β) and the PROF refinement (i.e., ΔI[x]) are generally floating-point numbers. For hardware-friendly implementation, these floating-point operations are typically implemented as a multiplication by an integer value followed by a right shift by a certain number of bits. In the current LIC and PROF designs, the two tools are designed separately, so two different right shifts, of N_LIC bits and N_PROF bits respectively, are applied in two stages.

[0145] According to a third embodiment, to improve the coding gain when PROF and LIC are jointly applied to an affine coding block, it is proposed to perform the LIC-based and PROF-based sample adjustments at high precision. This is done by combining their two right-shift operations into one and applying it at the end to derive the final prediction samples of the current block (as shown in (12)).
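A worked integer example helps show why merging the two right shifts can improve precision. In the sketch below, the PROF refinement is stored as an integer scaled by 2^N_PROF, the LIC weight as an integer scaled by 2^N_LIC, and the offset β as a plain integer added at the end; these representations, the round-half-up rule, the function names, and the numbers are all hypothetical.

```python
from fractions import Fraction


def round_shift(value: int, n: int) -> int:
    """Right shift by n bits with round-half-up (n >= 1)."""
    return (value + (1 << (n - 1))) >> n


def lic_after_prof_two_stage(i: int, d_int: int, a_int: int, b: int, n_prof: int, n_lic: int) -> int:
    # Separate designs: shift the PROF refinement first, then shift again after the LIC weight.
    refined = i + round_shift(d_int, n_prof)
    return round_shift(a_int * refined, n_lic) + b


def lic_after_prof_single_shift(i: int, d_int: int, a_int: int, b: int, n_prof: int, n_lic: int) -> int:
    # Third embodiment (sketch): keep full precision and apply one combined shift at the end.
    refined_hp = (i << n_prof) + d_int
    return round_shift(a_int * refined_hp, n_lic + n_prof) + b


# Hypothetical values: dI = 8/16 = 0.5 (n_prof = 4), alpha = 83/64 (n_lic = 6), beta = 2.
args = (100, 8, 83, 2, 4, 6)
exact = Fraction(83, 64) * (100 + Fraction(8, 16)) + 2  # 132.3359375
print(lic_after_prof_two_stage(*args), lic_after_prof_single_shift(*args), round(exact))
```

With these values, the two-stage version rounds ΔI = 0.5 up to 1 before weighting and returns 133, while the single combined shift returns 132, which matches rounding the exact result.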

[0146] Addressing the multiplication overflow problem when combining PROF with weighted prediction (WP) and bi-prediction with CU-level weights (BCW). According to the current PROF design in the VVC working draft, PROF can be applied in combination with weighted prediction (WP). Specifically, when they are combined, the prediction signal of an affine CU is generated by the following procedure:

[0147] First, for each sample at position (x, y), the L0 prediction refinement ΔI0(x, y) is calculated based on PROF and added to the original L0 prediction sample I0(x, y):

$$\Delta I_0(x,y) = g_{h0}(x,y)\,\Delta v_{x0}(x,y) + g_{v0}(x,y)\,\Delta v_{y0}(x,y), \qquad I_0'(x,y) = I_0(x,y) + \Delta I_0(x,y) \qquad (23)$$

where I0'(x, y) is the refined sample, and g_h0(x, y), g_v0(x, y), Δv_x0(x, y), and Δv_y0(x, y) are the L0 horizontal/vertical gradients and the L0 horizontal/vertical motion refinements at position (x, y).

[0148] Second, for each sample at position (x, y), the L1 prediction refinement ΔI1(x, y) is calculated based on PROF and added to the original L1 prediction sample I1(x, y):

$$\Delta I_1(x,y) = g_{h1}(x,y)\,\Delta v_{x1}(x,y) + g_{v1}(x,y)\,\Delta v_{y1}(x,y), \qquad I_1'(x,y) = I_1(x,y) + \Delta I_1(x,y) \qquad (24)$$

where I1'(x, y) is the refined sample, and g_h1(x, y), g_v1(x, y), Δv_x1(x, y), and Δv_y1(x, y) are the L1 horizontal/vertical gradients and the L1 horizontal/vertical motion refinements at position (x, y).

[0149] Third, the refined L0 and L1 prediction samples are combined:

$$P(x,y) = \left(W_0 \cdot I_0'(x,y) + W_1 \cdot I_1'(x,y) + \mathrm{Offset}\right) \gg \mathrm{shift} \qquad (25)$$

where W0 and W1 are the WP and BCW weights applied to the L0 and L1 samples, and Offset and shift are the offset and right shift applied to the weighted average of the L0 and L1 prediction signals for WP and BCW bi-prediction. Here, the WP parameters include W0, W1, and Offset, and the BCW parameters include W0, W1, and shift.
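The three steps in [0147] to [0149] can be sketched per sample as follows. All inputs are assumed to be integers already at the working precision, the intermediate shifts and rounding used by an actual PROF implementation are omitted, and the function names are hypothetical.

```python
from typing import List


def prof_refine(pred: List[int], gh: List[int], gv: List[int],
                dvx: List[int], dvy: List[int]) -> List[int]:
    """(23)/(24): I'(x,y) = I(x,y) + gh*dvx + gv*dvy, evaluated per sample."""
    return [p + h * vx + v * vy for p, h, v, vx, vy in zip(pred, gh, gv, dvx, dvy)]


def weighted_bi_pred(i0r: List[int], i1r: List[int], w0: int, w1: int,
                     offset: int, shift: int) -> List[int]:
    """(25): (W0*I0' + W1*I1' + Offset) >> shift for each sample."""
    return [(w0 * a + w1 * b + offset) >> shift for a, b in zip(i0r, i1r)]


# Two samples, flattened; all inputs are hypothetical integers.
i0r = prof_refine([100, 200], gh=[2, 0], gv=[0, 3], dvx=[1, 0], dvy=[0, -1])
i1r = prof_refine([104, 196], gh=[0, 0], gv=[0, 0], dvx=[0, 0], dvy=[0, 0])
print(i0r, i1r, weighted_bi_pred(i0r, i1r, w0=1, w1=1, offset=1, shift=1))
```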

[0150] As can be seen from the above equations, because of the sample-wise refinements ΔI0(x,y) and ΔI1(x,y), the prediction samples after PROF (i.e., I0'(x,y) and I1'(x,y)) have a dynamic range one bit larger than the original prediction samples (i.e., I0(x,y) and I1(x,y)). Because the refined prediction samples are then multiplied by the WP and BCW weights, this increases the required multiplier bit width. For example, in the current design, when the internal coding bit depth is in the range of 8 to 12 bits, the dynamic range of the prediction signals I0(x,y) and I1(x,y) is 16 bits, whereas after PROF the dynamic range of I0'(x,y) and I1'(x,y) is 17 bits. Therefore, when PROF is applied, it can cause a 16-bit multiplication overflow. Several methods are proposed below to solve this overflow problem.
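The one-bit growth described above is easy to check numerically. This sketch assumes, for illustration only, that both the original sample and the refinement fit in 16-bit two's complement, and shows that their sum can require 17 bits; `signed_bits` is a hypothetical helper.

```python
def signed_bits(v: int) -> int:
    """Minimum two's-complement width that can hold v."""
    return (v if v >= 0 else ~v).bit_length() + 1


INT16_MAX, INT16_MIN = (1 << 15) - 1, -(1 << 15)

# Each operand fits in 16 bits, but their sum (like I + dI after PROF) may need 17 bits.
print(signed_bits(INT16_MAX), signed_bits(INT16_MIN))
print(signed_bits(INT16_MAX + INT16_MAX), signed_bits(INT16_MIN + INT16_MIN))
```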

[0151] In the first method, it is proposed to disable WP and BCW when PROF is applied to an affine CU.

[0152] In the second method, it is proposed to apply a clipping operation to the derived sample refinement before adding it to the original prediction sample, so that the refined prediction samples I0'(x,y) and I1'(x,y) have the same dynamic range as the original prediction samples I0(x,y) and I1(x,y). Specifically, in this method, the sample refinements ΔI0(x,y) and ΔI1(x,y) in (23) and (24) are modified by introducing a clipping operation as follows:

[Equations: clipped sample refinements ΔI0(x,y) and ΔI1(x,y)]

[0153] Figure 10 shows a PROF method, which can be applied, for example, in a decoder.

[0154] In step 1010, the decoder can obtain a first reference picture I(0) and a second reference picture I(1) associated with a video block that is coded in affine mode in the video signal.

[0155] In step 1012, the decoder can obtain first and second horizontal gradient values and vertical gradient values based on a first prediction sample I(0)(i,j) and a second prediction sample I(1)(i,j) associated with the first reference picture I(0) and the second reference picture I(1).

[0156] In step 1014, the decoder can obtain first and second horizontal and vertical motion refinements based on the CPMVs associated with the first reference picture I(0) and the second reference picture I(1).

[0157] In step 1016, the decoder can obtain a first prediction refinement ΔI(0)(i,j) and a second prediction refinement ΔI(1)(i,j) based on the first and second horizontal and vertical gradient values and the first and second horizontal and vertical motion refinements.

[0158] In step 1018, the decoder can obtain the final prediction samples of the video block based on the first prediction sample I(0)(i,j), the second prediction sample I(1)(i,j), the first prediction refinement ΔI(0)(i,j), the second prediction refinement ΔI(1)(i,j), and prediction parameters. The prediction parameters can include the weight and offset parameters for WP and BCW.

[0159] In the third method, it is proposed to clip the refined prediction samples directly, instead of clipping the sample refinements, so that the refined samples have the same dynamic range as the original prediction samples. Specifically, the refined L0 and L1 samples in the third method are obtained as follows:

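[Equation: clipped refined L0 prediction sample I0'(x,y)]

[Equation: clipped refined L1 prediction sample I1'(x,y)]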
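The second and third methods differ only in where the clip is applied. The sketch below assumes a 16-bit signed range for the original prediction samples; the refinement bounds used for the second method are hypothetical parameters, since the actual clipping equations are not reproduced here.

```python
INT16_MIN, INT16_MAX = -(1 << 15), (1 << 15) - 1


def clip3(lo: int, hi: int, v: int) -> int:
    """Clamp v to [lo, hi]."""
    return max(lo, min(hi, v))


def refine_clip_refinement(pred: int, refinement: int, lo: int, hi: int) -> int:
    # Second method (sketch): clip dI to a hypothetical range [lo, hi] before adding it.
    return pred + clip3(lo, hi, refinement)


def refine_clip_sample(pred: int, refinement: int) -> int:
    # Third method (sketch): clip I + dI to the assumed 16-bit range of the original samples.
    return clip3(INT16_MIN, INT16_MAX, pred + refinement)


print(refine_clip_refinement(1000, 5000, -2048, 2047), refine_clip_sample(32760, 100))
```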

[0160] In the fourth method, it is proposed to apply a right shift to the refined L0 and L1 prediction samples before WP and BCW, and then to restore the final prediction samples to the original precision with an additional left shift. Specifically, the final prediction samples are derived as follows:

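[Equation: final prediction samples derived with a right shift before WP/BCW and a compensating left shift afterwards]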
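A sketch of the fourth method follows. It assumes a one-bit pre-shift (s = 1) and that the rounding offset is scaled down by the same amount; the exact form of the equation is not reproduced here, and `weighted_bi_pred_prescaled` is a hypothetical name.

```python
from typing import List


def weighted_bi_pred_prescaled(i0r: List[int], i1r: List[int], w0: int, w1: int,
                               offset: int, shift: int, s: int = 1) -> List[int]:
    """Fourth method (sketch): right-shift the refined samples by s before WP/BCW,
    then left-shift the result by s to restore the original scale.

    Assumes the rounding offset is scaled down by s as well.
    """
    out = []
    for a, b in zip(i0r, i1r):
        a_s, b_s = a >> s, b >> s  # drop s LSBs so the multiplier inputs are narrower
        out.append(((w0 * a_s + w1 * b_s + (offset >> s)) >> shift) << s)
    return out


print(weighted_bi_pred_prescaled([200], [100], 1, 1, 1, 1))
print(weighted_bi_pred_prescaled([201], [101], 1, 1, 1, 1))
```

With W0 = W1 = 1, Offset = 1, and shift = 1, the inputs (200, 100) give 150, the same as the unshifted computation, whereas (201, 101) also give 150 instead of 151, which illustrates the precision lost by the pre-shift.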

[0161] In the fifth method, it is proposed to split each multiplication of the L0/L1 prediction samples by the corresponding WP/BCW weights in (25) into two multiplications, neither of which exceeds 16 bits, as follows:

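[Equation: each weighted multiplication in (25) split into two multiplications that each stay within 16 bits]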
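One possible way to split each multiplication, assumed here for illustration because the actual equation is not reproduced, is to distribute the weight over the original sample and its refinement, W0·(I0 + ΔI0) = W0·I0 + W0·ΔI0, so that each multiplier input stays within 16 bits even when the refined sample does not. The function names are hypothetical.

```python
INT16_MIN, INT16_MAX = -(1 << 15), (1 << 15) - 1


def fits_int16(v: int) -> bool:
    return INT16_MIN <= v <= INT16_MAX


def weighted_sum_split(i0: int, d0: int, i1: int, d1: int,
                       w0: int, w1: int, offset: int, shift: int) -> int:
    """W0*(I0 + dI0) computed as W0*I0 + W0*dI0 (same for L1), so each multiplier
    input is a 16-bit value rather than the 17-bit refined sample."""
    assert all(fits_int16(v) for v in (i0, d0, i1, d1)), "operands expected to fit in 16 bits"
    return (w0 * i0 + w0 * d0 + w1 * i1 + w1 * d1 + offset) >> shift


# The refined L0 sample 30000 + 5000 would not fit in 16 bits, but each operand does.
print(weighted_sum_split(30000, 5000, 100, -5, 3, 5, 4, 3), fits_int16(30000 + 5000))
```

By distributivity the split result is identical to the unsplit weighted sum; only the width of each multiplier input changes.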

[0162] The above methods may be implemented using an apparatus that includes one or more circuits, including application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components. The apparatus may use the circuits in combination with other hardware or software components to perform the above methods. Each module, submodule, unit, or subunit disclosed above may be at least partially implemented using one or more circuits.

[0163] Figure 19 shows a computing environment 1910 coupled to a user interface 1960. The computing environment 1910 can be part of a data processing server. The computing environment 1910 includes a processor 1920, memory 1940, and an I / O interface 1950.

[0164] Processor 1920 typically controls the overall operation of the computing environment 1910, such as operations related to display, data acquisition, data communication, and image processing. Processor 1920 may include one or more processors to execute instructions that perform all or some of the steps of the methods described above. Furthermore, processor 1920 may include one or more modules that facilitate interaction between processor 1920 and other components. These processors may be a central processing unit (CPU), a microprocessor, a single-chip machine, a GPU, or the like.

[0165] Memory 1940 is configured to store various types of data to support the operation of the computing environment 1910. Memory 1940 may include pre-determined software 1942. Examples of such data include instructions for any application or method operating on the computing environment 1910, video datasets, image data, etc. Memory 1940 can be implemented by using any type of volatile or non-volatile memory device, or a combination thereof, such as SRAM (static random access memory), EEPROM (electrically erasable programmable read-only memory), EPROM (erasable programmable read-only memory), PROM (programmable read-only memory), ROM (read-only memory), magnetic memory, flash memory, magnetic disks, or optical disks.

[0166] The I / O interface 1950 provides an interface between the processor 1920 and peripheral interface modules such as a keyboard, click wheel, and buttons. The buttons may include, but are not limited to, a home button, a start scan button, and a stop scan button. The I / O interface 1950 can be coupled to an encoder and decoder.

[0167] In some embodiments, a non-transitory computer-readable recording medium is also provided, which contains multiple programs, such as those contained in memory 1940, executable by processor 1920 in the computing environment 1910 to carry out the methods described above. For example, the non-transitory computer-readable recording medium may be ROM, RAM, CD-ROM, magnetic tape, a floppy disk, an optical data storage device, or the like.

[0168] A non-transitory computer-readable recording medium stores multiple programs for execution by a computing device having one or more processors, where the multiple programs, when executed by the one or more processors, cause the computing device to perform the motion prediction method described above.

[0169] In some embodiments, the computing environment 1910 may be implemented using one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), graphical processing units (GPUs), controllers, microcontrollers, microprocessors, or other electronic components to perform the above-described method.

[0170] The descriptions in this disclosure are provided for illustrative purposes only and are not intended to be exhaustive or limiting. Many modifications, variations, and alternative embodiments will be apparent to those skilled in the art having the benefit of the teachings presented in the foregoing description and the accompanying drawings.

[0171] The examples were chosen and described to explain the principles of the disclosure and to enable others skilled in the art to understand the disclosure for various implementations and to best utilize the underlying principles and various implementations with various modifications as are suited to the particular use contemplated. Therefore, it should be understood that the scope of the disclosure is not limited to the specific examples of the disclosed implementations, and that modifications and other implementations are intended to be included within the scope of the disclosure.

[0172] [Cross-reference to related applications] This application is based on and claims priority to Provisional Application No. 62/901,774, filed on 17 September 2019, and Provisional Application No. 62/904,330, filed on 23 September 2019. The entire contents of these applications are incorporated herein by reference for all purposes.

Claims

1. A video encoding method, comprising: determining two GCI (general constraint information) level control flags, the two GCI level control flags including a first GCI level control flag and a second GCI level control flag, wherein the first GCI level control flag indicates whether a first SPS (sequence parameter set) level control flag is equal to 0, and the second GCI level control flag indicates whether a second SPS level control flag is equal to 0.

2. The video encoding method according to claim 1, further comprising: determining the first SPS level control flag and the second SPS level control flag, wherein the first SPS level control flag indicates whether BDOF (bi-directional optical flow) is enabled for the current video sequence, and the second SPS level control flag indicates whether PROF (prediction refinement with optical flow) is enabled for the current video sequence; when the first SPS level control flag indicates that BDOF is enabled for the current video sequence and the video block is encoded in a non-affine mode, applying BDOF to derive a motion refinement of the video block based on the first predicted sample I(0)(i,j) and the second predicted sample I(1)(i,j); when the second SPS level control flag indicates that PROF is enabled for the current video sequence and the video block is encoded in affine mode, applying PROF to derive a motion refinement of the video block based on the first predicted sample I(0)(i,j) and the second predicted sample I(1)(i,j); and obtaining prediction samples of the current video block based on the motion refinement.

3. The video encoding method according to claim 1, further comprising: signaling the first GCI level control flag to indicate whether the first SPS level control flag is equal to 0; and signaling the second GCI level control flag to indicate whether the second SPS level control flag is equal to 0.

4. The video encoding method according to claim 1, further comprising: when the first SPS level control flag indicates that BDOF is enabled, determining a first control flag in the slice header, the first control flag signaling whether BDOF is disabled for the video blocks of the slice; and when the second SPS level control flag indicates that PROF is enabled, determining a second control flag in the slice header, the second control flag signaling whether PROF is enabled for the video blocks of the slice.

5. A computing device, comprising: one or more processors; a memory coupled to the one or more processors; and a plurality of programs stored in the memory, wherein the plurality of programs, when executed by the one or more processors, cause the computing device to perform operations comprising: determining two GCI (general constraint information) level control flags, the two GCI level control flags including a first GCI level control flag and a second GCI level control flag, wherein the first GCI level control flag indicates whether a first SPS (sequence parameter set) level control flag is equal to 0, and the second GCI level control flag indicates whether a second SPS level control flag is equal to 0.

6. The computing device according to claim 5, wherein the plurality of programs, when executed by the one or more processors, further cause the computing device to perform operations comprising: determining the first SPS level control flag and the second SPS level control flag, wherein the first SPS level control flag indicates whether BDOF (bi-directional optical flow) is enabled for the current video sequence, and the second SPS level control flag indicates whether PROF (prediction refinement with optical flow) is enabled for the current video sequence; when the first SPS level control flag indicates that BDOF is enabled for the current video sequence and the video block is not encoded in affine mode, applying BDOF to derive a motion refinement of the video block based on the first predicted sample I(0)(i,j) and the second predicted sample I(1)(i,j); when the second SPS level control flag indicates that PROF is enabled for the current video sequence and the video block is encoded in affine mode, applying PROF to derive the motion refinement of the video block based on the first predicted sample I(0)(i,j) and the second predicted sample I(1)(i,j); and obtaining prediction samples of the current video block based on the motion refinement.

7. The computing device according to claim 5, wherein the plurality of programs, when executed by the one or more processors, further cause the computing device to perform operations comprising: signaling the first GCI level control flag to indicate whether the first SPS level control flag is equal to 0; and signaling the second GCI level control flag to indicate whether the second SPS level control flag is equal to 0.

8. The computing device according to claim 5, wherein the plurality of programs, when executed by the one or more processors, further cause the computing device to perform operations comprising: when the first SPS level control flag indicates that BDOF is enabled, determining a first control flag in the slice header, the first control flag signaling whether BDOF is disabled for the video blocks of the slice; and when the second SPS level control flag indicates that PROF is enabled, determining a second control flag in the slice header, the second control flag signaling whether PROF is enabled for the video blocks of the slice.

9. A non-transitory computer-readable recording medium storing instructions that, when executed by one or more processors of an encoding device, cause the encoding device to perform operations comprising: determining two GCI (general constraint information) level control flags, the two GCI level control flags including a first GCI level control flag and a second GCI level control flag, wherein the first GCI level control flag indicates whether a first SPS (sequence parameter set) level control flag is equal to 0, and the second GCI level control flag indicates whether a second SPS level control flag is equal to 0.

10. The non-transitory computer-readable recording medium according to claim 9, wherein the instructions further cause the encoding device to perform operations comprising: determining the first SPS level control flag and the second SPS level control flag, wherein the first SPS level control flag indicates whether BDOF (bi-directional optical flow) is enabled for the current video sequence, and the second SPS level control flag indicates whether PROF (prediction refinement with optical flow) is enabled for the current video sequence; when the first SPS level control flag indicates that BDOF is enabled for the current video sequence and the video block is not encoded in affine mode, applying BDOF to derive a motion refinement of the video block based on the first predicted sample I(0)(i,j) and the second predicted sample I(1)(i,j); when the second SPS level control flag indicates that PROF is enabled for the current video sequence and the video block is encoded in affine mode, applying PROF to derive a motion refinement of the video block based on the first predicted sample I(0)(i,j) and the second predicted sample I(1)(i,j); and obtaining prediction samples of the current video block based on the motion refinement.

11. The non-transitory computer-readable recording medium according to claim 9, wherein the instructions further cause the encoding device to perform operations comprising: signaling the first GCI level control flag to indicate whether the first SPS level control flag is equal to 0; and signaling the second GCI level control flag to indicate whether the second SPS level control flag is equal to 0.

12. The non-transitory computer-readable recording medium according to claim 9, wherein the instructions further cause the encoding device to perform operations comprising: in response to the first SPS level control flag indicating that BDOF is enabled, determining a first control flag in a slice header, the first control flag signaling whether BDOF is disabled for video blocks of the slice; and in response to the second SPS level control flag indicating that PROF is enabled, determining a second control flag in the slice header, the second control flag signaling whether PROF is enabled for video blocks of the slice.
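The conditional presence of the slice-header flags can be sketched as a write/read pair: each slice-level flag is written only when the corresponding SPS level control flag enables the tool. The bit polarities follow the claim as worded (first flag: 1 means BDOF disabled; second flag: 1 means PROF enabled). Treating an absent flag as "tool off for the slice" is an assumption of this sketch, and all function names are hypothetical.

```python
def write_slice_flags(sps_bdof_enabled: int, sps_prof_enabled: int,
                      bdof_disabled_flag: int, prof_flag: int) -> list[int]:
    """Emit slice-header bits; each flag is present only if its SPS flag enables the tool."""
    bits: list[int] = []
    if sps_bdof_enabled:
        bits.append(bdof_disabled_flag)  # first control flag: 1 = BDOF disabled for the slice
    if sps_prof_enabled:
        bits.append(prof_flag)           # second control flag: 1 = PROF enabled for the slice
    return bits


def read_slice_flags(bits: list[int], sps_bdof_enabled: int, sps_prof_enabled: int) -> tuple[bool, bool]:
    """Return (bdof_on, prof_on) for the slice; absent flags mean the tool is off (assumption)."""
    it = iter(bits)
    # Short-circuit evaluation consumes a bit only when the flag is present.
    bdof_on = bool(sps_bdof_enabled) and next(it) == 0
    prof_on = bool(sps_prof_enabled) and next(it) == 1
    return bdof_on, prof_on


bits = write_slice_flags(1, 1, bdof_disabled_flag=0, prof_flag=1)
print(bits, read_slice_flags(bits, 1, 1))  # [0, 1] (True, True)
```

Skipping a flag whose tool is already disabled at the sequence level saves a bit per slice and keeps the parser's behavior fully determined by the SPS.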

13. A method for storing a bitstream, comprising: generating the bitstream by performing a video encoding method; and storing the bitstream, wherein the video encoding method comprises: determining two GCI (general constraint information) level control flags, the two GCI level control flags including a first GCI level control flag and a second GCI level control flag, wherein the first GCI level control flag indicates whether a first SPS (sequence parameter set) level control flag is equal to 0, and the second GCI level control flag indicates whether a second SPS level control flag is equal to 0.

14. The method for storing a bitstream according to claim 13, wherein the video encoding method further comprises: determining the first SPS level control flag and the second SPS level control flag, wherein the first SPS level control flag indicates whether BDOF (bi-directional optical flow) is enabled for a current video sequence, and the second SPS level control flag indicates whether PROF (prediction refinement with optical flow) is enabled for the current video sequence; when the first SPS level control flag indicates that BDOF is enabled for the current video sequence and a video block is not coded in affine mode, applying BDOF to derive a motion refinement of the video block based on a first prediction sample I(0)(i,j) and a second prediction sample I(1)(i,j); when the second SPS level control flag indicates that PROF is enabled for the current video sequence and the video block is coded in affine mode, applying PROF to derive a motion refinement of the video block based on the first prediction sample I(0)(i,j) and the second prediction sample I(1)(i,j); and obtaining prediction samples of the video block based on the motion refinement.