Method and apparatus of adjusting prediction refinement adaptively in video coding systems
Adaptive prediction refinement techniques in video coding systems address inefficiencies by using position-related weights and sample-based offsets, reducing complexity and improving efficiency through optimized prediction refinement.
Patent Information
- Authority / Receiving Office
- WO · WO
- Patent Type
- Applications
- Current Assignee / Owner
- MEDIATEK INC
- Filing Date
- 2025-12-23
- Publication Date
- 2026-07-02
Abstract
Description
METHOD AND APPARATUS OF ADJUSTING PREDICTION REFINEMENT ADAPTIVELY IN VIDEO CODING SYSTEMS
CROSS REFERENCE TO RELATED APPLICATIONS
[0001] The present invention is a non-Provisional Application of and claims priority to U.S. Provisional Patent Application No. 63/739,158, filed on December 27, 2024, and U.S. Provisional Patent Application No. 63/770,443, filed on March 12, 2025. The U.S. Provisional Patent Applications are hereby incorporated by reference in their entireties.
FIELD OF THE INVENTION
[0002] The present invention relates to video coding systems. In particular, the present invention relates to adaptive prediction refinement to improve coding efficiency by using multiple models.
BACKGROUND
[0003] Versatile video coding (VVC) is the latest international video coding standard developed by the Joint Video Experts Team (JVET) of the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG). The standard has been published as an ISO standard: ISO/IEC 23090-3:2021, Information technology - Coded representation of immersive media - Part 3: Versatile video coding, published Feb. 2021. VVC is developed based on its predecessor HEVC (High Efficiency Video Coding) by adding more coding tools to improve coding efficiency and also to handle various types of video sources including 3-dimensional (3D) video signals.
[0004] Fig. 1A illustrates an exemplary adaptive Inter / Intra video encoding system incorporating loop processing. For Intra Prediction 110, the prediction data is derived based on previously coded video data in the current picture. For Inter Prediction 112, Motion Estimation (ME) is performed at the encoder side and Motion Compensation (MC) is performed based on the result of ME to provide prediction data derived from other picture (s) and motion data. Switch 114 selects Intra Prediction 110 or Inter Prediction 112 and the selected prediction data is supplied to Adder 116 to form prediction errors, also called residues. The prediction error is then processed by Transform (T) 118 followed by Quantization (Q) 120. The transformed and quantized residues are then coded by Entropy Encoder 122 to be included in a video bitstream corresponding to the compressed video data. The bitstream associated with the transform coefficients is then packed with side information such as motion and coding modes associated with Intra prediction and Inter prediction, and other information such as parameters associated with loop filters applied to underlying image area. The side information associated with Intra Prediction 110, Inter prediction 112 and in-loop filter 130, is provided to Entropy Encoder 122 as shown in Fig. 1A. When an Inter-prediction mode is used, a reference picture or pictures have to be reconstructed at the encoder end as well. Consequently, the transformed and quantized residues are processed by Inverse Quantization (IQ) 124 and Inverse Transformation (IT) 126 to recover the residues. The residues are then added back to prediction data 136 at Reconstruction (REC) 128 to reconstruct video data. The reconstructed video data may be stored in Reference Picture Buffer 134 and used for prediction of other frames.
[0005] As shown in Fig. 1A, incoming video data undergoes a series of processing steps in the encoding system. The reconstructed video data from REC 128 may be subject to various impairments introduced by this processing. Accordingly, in-loop filter 130 is often applied to the reconstructed video data before the reconstructed video data are stored in the Reference Picture Buffer 134 in order to improve video quality. For example, deblocking filter (DF), Sample Adaptive Offset (SAO) and Adaptive Loop Filter (ALF) may be used. The loop filter information may need to be incorporated in the bitstream so that a decoder can properly recover the required information. Therefore, loop filter information is also provided to Entropy Encoder 122 for incorporation into the bitstream. In Fig. 1A, Loop filter 130 is applied to the reconstructed video before the reconstructed samples are stored in the reference picture buffer 134. The system in Fig. 1A is intended to illustrate an exemplary structure of a typical video encoder. It may correspond to the High Efficiency Video Coding (HEVC) system, VP8, VP9, H.264 or VVC.
[0006] The decoder, as shown in Fig. 1B, can use some of the same functional blocks as the encoder. For example, the decoder can reuse Inverse Quantization 124 and Inverse Transform 126; however, Transform 118 and Quantization 120 are not needed at the decoder. Instead of Entropy Encoder 122, the decoder uses an Entropy Decoder 140 to decode the video bitstream into quantized transform coefficients and needed coding information (e.g. ILPF information, Intra prediction information and Inter prediction information). The Intra prediction 150 at the decoder side does not need to perform the mode search. Instead, the decoder only needs to generate Intra prediction according to Intra prediction information received from the Entropy Decoder 140. Furthermore, for Inter prediction, the decoder only needs to perform motion compensation (MC 152) according to Inter prediction information received from the Entropy Decoder 140 without the need for motion estimation.
[0007] In VVC, the Sequence Parameter Set (SPS) and the Picture Parameter Set (PPS) contain high-level syntax elements that apply to entire coded video sequences and pictures, respectively. The Picture Header (PH) and Slice Header (SH) contain high-level syntax elements that apply to a current coded picture and a current coded slice, respectively.
[0008] In VVC, a coded picture is partitioned into non-overlapped square block regions represented by the associated coding tree units (CTUs) . A coded picture can be represented by a collection of slices, each comprising an integer number of CTUs. The individual CTUs in a slice are processed in raster-scan order. A bi-predictive (B) slice may be decoded using intra prediction or inter prediction with at most two motion vectors and reference indices to predict the sample values of each block. A predictive (P) slice is decoded using intra prediction or inter prediction with at most one motion vector and reference index to predict the sample values of each block. An intra (I) slice is decoded using intra prediction only.
[0009] 1.1 Affine Motion Compensated Prediction
[0010] In HEVC, only a translational motion model is applied for motion compensation prediction (MCP). In the real world, however, there are many kinds of motion, e.g. zoom in/out, rotation, perspective motions and other irregular motions. In VVC, a block-based affine transform motion compensation prediction is applied. As shown in Figs. 2A-B, the affine motion field of the blocks 210 and 220 is described by the motion information of two control point motion vectors (4-parameter) in Fig. 2A or three control point motion vectors (6-parameter) in Fig. 2B.
[0011] For 4-parameter affine motion model, motion vector at sample location (x, y) in a block is derived as:
[0012] For 6-parameter affine motion model, motion vector at sample location (x, y) in a block is derived as:
[0013] Where (mv0x, mv0y) is motion vector of the top-left corner control point, (mv1x, mv1y) is motion vector of the top-right corner control point, and (mv2x, mv2y) is motion vector of the bottom-left corner control point.
[0014] In order to simplify the motion compensation prediction, block based affine transform prediction is applied. To derive motion vector of each 4×4 luma subblock, the motion vector of the centre sample of each subblock, as shown in Fig. 3, is calculated according to above equations, and rounded to 1 / 16 fraction accuracy. Then, the motion compensation interpolation filters are applied to generate the prediction of each subblock with the derived motion vector. The subblock size of chroma-components is also set to be 4×4. The MV of a 4×4 chroma subblock is calculated as the average of the MVs of the top-left and bottom-right luma subblocks in the collocated 8x8 luma region.
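The equations for the two affine models are not reproduced in this text. As a minimal sketch, assuming the affine formulation commonly given for VVC (the function names, the use of floating point, and the MV unit of luma samples are illustrative assumptions, not part of the disclosure), the per-subblock derivation described above could look as follows:

```python
def affine_mv(x, y, cpmv, width, height=None):
    """MV at sample (x, y) of a block, from 2 or 3 control-point MVs.

    cpmv holds (mv0x, mv0y), (mv1x, mv1y) and optionally (mv2x, mv2y):
    top-left, top-right and bottom-left control points.
    The formula is the commonly cited VVC form (an assumption here).
    """
    (mv0x, mv0y), (mv1x, mv1y) = cpmv[0], cpmv[1]
    a = (mv1x - mv0x) / width
    b = (mv1y - mv0y) / width
    if len(cpmv) == 2:  # 4-parameter model: rotation/zoom plus translation
        return (a * x - b * y + mv0x, b * x + a * y + mv0y)
    mv2x, mv2y = cpmv[2]  # 6-parameter model
    c = (mv2x - mv0x) / height
    d = (mv2y - mv0y) / height
    return (a * x + c * y + mv0x, b * x + d * y + mv0y)


def subblock_mvs(cpmv, width, height, sb=4):
    """One MV per sb x sb luma subblock, taken at the subblock centre
    and rounded to 1/16-sample accuracy."""
    mvs = {}
    for sy in range(0, height, sb):
        for sx in range(0, width, sb):
            mvx, mvy = affine_mv(sx + sb / 2, sy + sb / 2, cpmv, width, height)
            mvs[(sx, sy)] = (round(mvx * 16) / 16, round(mvy * 16) / 16)
    return mvs
```

When all control-point MVs are equal, every subblock receives the same MV, so the model degenerates to the translational case.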
[0015] As with translational-motion inter prediction, there are two affine motion inter prediction modes: affine merge mode and affine AMVP mode.
[0016] 1.2 Affine Subblock BDOF Refinement
[0017] BDOF subblock MV refinement and sample adjustment are applied to an affine or SbTMVP coded block with subblock MC when BDOF condition is satisfied.
[0018] An affine coded block, such as affine regular merge mode, affine BM merge mode, affine AMVP mode, derives MVs for each 4×4 subblock from the affine model. The BDOF process starts with the 4×4 subblock grouping with identical MVs. The first iteration of BDOF MV refinement is processed in 8x8 subblock grid as in ECM-10.0. When the grouped subblock size is less than 256, the second iteration of BDOF MV refinement is processed in 4×4 subblock grid, and otherwise in 8×8 subblock grid. When the grouped subblock size is 4xN or Nx4, the first iteration of BDOF MV refinement is bypassed.
[0019] 1.3 Bi-prediction with CU-level weight (BCW)
[0020] In HEVC, the bi-prediction signal is generated by averaging two prediction signals obtained from two different reference pictures and/or using two different motion vectors. In VVC, the bi-prediction mode is extended beyond simple averaging to allow weighted averaging of the two prediction signals.
$$P_{\text{bi-pred}} = ((8 - w) \cdot P_0 + w \cdot P_1 + 4) \gg 3 \qquad (3)$$
[0021] Five weights are allowed in the weighted averaging bi-prediction, w ∈ {-2, 3, 4, 5, 10}. For each bi-predicted CU, the weight w is determined in one of two ways: 1) for a non-merge CU, the weight index is signalled after the motion vector difference; and 2) for a merge CU, the weight index is inferred from neighbouring blocks based on the merge candidate index. BCW is only applied to CUs with 256 or more luma samples (i.e., CU width times CU height is greater than or equal to 256). For low-delay pictures, all 5 weights are used. For non-low-delay pictures, only 3 weights (w ∈ {3, 4, 5}) are used. At the encoder, fast search algorithms are applied to find the weight index without significantly increasing the encoder complexity. These algorithms are summarized as follows. Further details can be found in the VTM software and document JVET-L0646.
- When combined with AMVR, unequal weights are only conditionally checked for 1-pel and 4-pel motion vector precisions if the current picture is a low-delay picture.
- When combined with affine, affine ME will be performed for unequal weights if and only if the affine mode is selected as the current best mode.
- When the two reference pictures in bi-prediction are the same, unequal weights are only conditionally checked.
- Unequal weights are not searched when certain conditions are met, depending on the POC distance between current picture and its reference pictures, the coding QP, and the temporal level.
[0022] The BCW weight index is coded using one context coded bin followed by bypass coded bins. The first context coded bin indicates if equal weight is used; and if unequal weight is used, additional bins are signalled using bypass coding to indicate which unequal weight is used.
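As an illustration of equation (3) with the five weights listed above, a minimal sketch for a single pair of integer prediction samples is given below. The function and constant names are illustrative assumptions.

```python
BCW_WEIGHTS = (-2, 3, 4, 5, 10)  # allowed values of w; w = 4 is equal weighting


def bcw_allowed(cu_width, cu_height):
    """BCW applies only to CUs with at least 256 luma samples."""
    return cu_width * cu_height >= 256


def bcw_blend(p0, p1, w):
    """Weighted bi-prediction of equation (3) for one sample pair."""
    if w not in BCW_WEIGHTS:
        raise ValueError("w must be one of %s" % (BCW_WEIGHTS,))
    return ((8 - w) * p0 + w * p1 + 4) >> 3
```

With w = 4 the expression reduces to (P0 + P1 + 1) >> 1, i.e. the conventional average.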
[0023] Weighted prediction (WP) is a coding tool supported by the H.264/AVC and HEVC standards to efficiently code video content with fading. Support for WP was also added into the VVC standard. WP allows weighting parameters (weight and offset) to be signalled for each reference picture in each of the reference picture lists L0 and L1. Then, during motion compensation, the weight(s) and offset(s) of the corresponding reference picture(s) are applied. WP and BCW are designed for different types of video content. To avoid interactions between WP and BCW, which would complicate the VVC decoder design, the BCW weight index is not signalled when a CU uses WP, and the weight w is inferred to be 4, corresponding to equal weighting. For a merge CU, the weight index is inferred from neighbouring blocks based on the merge candidate index. This can be applied to both normal merge mode and inherited affine merge mode. For constructed affine merge mode, the affine motion information is constructed based on the motion information of up to 3 blocks. The BCW index for a CU using the constructed affine merge mode is simply set equal to the BCW index of the first control point MV.
[0024] In VVC, CIIP and BCW cannot be jointly applied for a CU. When a CU is coded with CIIP mode, the BCW index of the current CU is set to 2, corresponding to equal weight.
[0025] 1.4 Subblock-based Temporal Motion Vector Prediction (SbTMVP)
[0026] VVC supports the subblock-based temporal motion vector prediction (SbTMVP) method. Similar to the temporal motion vector prediction (TMVP) in HEVC, SbTMVP uses the motion field in the collocated picture to improve motion vector prediction and merge mode for CUs in the current picture. The same collocated picture used by TMVP is used for SbTMVP. SbTMVP differs from TMVP in the following two main aspects:
- TMVP predicts motion at CU level but SbTMVP predicts motion at sub-CU level;
- Whereas TMVP fetches the temporal motion vectors from the collocated block in the collocated picture (the collocated block is the bottom-right or center block relative to the current CU), SbTMVP applies a motion shift before fetching the temporal motion information from the collocated picture, where the motion shift is obtained from the motion vector from one of the spatial neighbouring blocks of the current CU.
[0027] The SbTMVP process is illustrated in Figs. 4A-B. SbTMVP predicts the motion vectors of the sub-CUs within the current CU in two steps. In the first step, the spatial neighbour A1 in Fig. 4A is examined. If A1 has a motion vector that uses the collocated picture as its reference picture, this motion vector is selected to be the motion shift to be applied. If no such motion is identified, then the motion shift is set to (0, 0) .
[0028] In the second step, the motion shift identified in Step 1 is applied (i.e. added to the current block’s coordinates) to obtain sub-CU level motion information (motion vectors and reference indices) from the collocated picture as shown in Fig. 4B. The example in Fig. 4B assumes the motion shift is set to block A1’s motion, where frame 420 corresponds to the current picture and frame 430 corresponds to a reference picture (i.e., a collocated picture) . Then, for each sub-CU, the motion information of its corresponding block (the smallest motion grid that covers the centre sample) in the collocated picture is used to derive the motion information for the sub-CU. After the motion information of the collocated sub-CU is identified, it is converted to the motion vectors and reference indices of the current sub-CU in a similar way as the TMVP process of HEVC, where temporal motion scaling is applied to align the reference pictures of the temporal motion vectors to those of the current CU. In Fig. 4B, the arrow (s) in each subblock of the collocated picture 430 correspond (s) to the motion vector (s) of a collocated subblock (thick-lined arrow for L0 MV and thin-lined arrow for L1 MV) . For the current picture 420, the arrow (s) in each subblock correspond (s) to the scaled motion vector (s) of a current subblock (thick-lined arrow for L0 MV and thin-lined arrow for L1 MV) .
[0029] In VVC, a combined subblock based merge list, which contains both SbTMVP candidate and affine merge candidates, is used for the signalling of subblock based merge mode. The SbTMVP mode is enabled / disabled by a sequence parameter set (SPS) flag. If the SbTMVP mode is enabled, the SbTMVP predictor is added as the first entry of the list of subblock based merge candidates, and followed by the affine merge candidates. The size of subblock based merge list is signalled in SPS and the maximum allowed size of the subblock based merge list is 5 in VVC.
[0030] The sub-CU size used in SbTMVP is fixed to be 8x8, and as done for the affine merge mode, SbTMVP mode is only applicable to a CU whose width and height are both larger than or equal to 8.
[0031] The encoding processing flow of the additional SbTMVP merge candidate is the same as for the other merge candidates, that is, for each CU in P or B slice, an additional RD check is performed to decide whether to use the SbTMVP candidate.
[0032] 1.5 Multi-pass decoder-side motion vector refinement (MP-DMVR)
[0033] In ECM-2.0, a multi-pass decoder-side motion vector refinement (DMVR) method is applied in regular merge mode if the selected merge candidate meets the DMVR conditions. In the first pass, bilateral matching (BM) is applied to the coding block. In the second pass, BM is applied to each 16x16 subblock within the coding block. In the third pass, MV in each 8x8 subblock is refined by applying bi-directional optical flow (BDOF) .
[0034] Similar to the DMVR in VVC, BM refines a pair of motion vectors MV0 and MV1 under the constraint that MVD0 (MV0’ - MV0) is equal in magnitude and opposite in sign to MVD1 (MV1’ - MV1), as illustrated in Fig. 5.
[0035] In JVET-X0049, adaptive decoder side motion vector refinement is proposed. The adaptive decoder side motion vector refinement method consists of the two new merge modes introduced to refine MV only in one direction, either L0 or L1, of the bi prediction for the merge candidates that meet the DMVR conditions. The multi-pass DMVR process is applied for the selected merge candidate to refine the motion vectors, however either MVD0 or MVD1 is set to zero in the first pass (i.e. PU level) DMVR.
[0036] Like the regular merge mode, merge candidates for the proposed merge modes are derived from the spatial neighbouring coded blocks, TMVPs, non-adjacent blocks, HMVPs, and pair-wise candidate. The difference is that only those that meet the DMVR conditions are added into the candidate list. The same merge candidate list is used by the two proposed merge modes, and the merge index is coded as in regular merge mode. Two syntax elements, bmMergeFlag and bmDirFlag, indicate this mode. bmMergeFlag indicates whether this kind of prediction (refining the MV in only one direction) is on or off. bmDirFlag indicates the refined MV direction. For example, when bmDirFlag is equal to 0, the refined MV is from List0; when bmDirFlag is equal to 1, the refined MV is from List1, as shown in Table 1.
Table 1. Signalling of bmMergeFlag and bmDirFlag
[0037] After decoding bm_merge_flag and bm_dir_flag, bmDir can be decided. For example, if bm_merge_flag is equal to 1 and bm_dir_flag is equal to 0, bmDir is set to 1, indicating that the adaptive MP-DMVR refines only the MV in List0. As another example, if bm_merge_flag is equal to 1 and bm_dir_flag is equal to 1, bmDir is set to 2, indicating that the adaptive MP-DMVR refines only the MV in List1.
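The mapping from the two decoded flags to bmDir can be sketched as below. The value returned when bm_merge_flag is 0 is not stated in the text; returning 0 in that case is an assumption made only for illustration.

```python
def derive_bm_dir(bm_merge_flag, bm_dir_flag):
    """Map the decoded flags to bmDir.

    1: adaptive MP-DMVR refines only the List0 MV.
    2: adaptive MP-DMVR refines only the List1 MV.
    0: one-directional refinement not used (assumed value, not from the text).
    """
    if bm_merge_flag != 1:
        return 0  # assumption: mode is off
    return 1 if bm_dir_flag == 0 else 2
```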
[0038] 1.6 Bi-directional Optical Flow (BIO) / BDOF
[0039] Bi-directional optical flow (BIO) was proposed in the third JCTVC meeting and the 52nd VCEG meeting, and it is disclosed in the documents JCTVC-C204 and VCEG-AZ05. BIO derives the sample-level motion refinement based on the assumptions of optical flow and steady motion as shown in Fig. 6, where a current pixel 622 in a B-slice (bi-prediction slice) 620 is predicted by one pixel (pixel A 632) in reference picture 0 (630) and one pixel (pixel B 612) in reference picture 1 (610). It is applied only to truly bi-directionally predicted blocks, which are predicted from two reference frames, one being a previous frame and the other a subsequent frame. In VCEG-AZ05, BIO utilizes one 5x5 window to derive the motion refinement of one sample. Therefore, for one NxN block, the motion compensated results and corresponding gradient information of one (N+4) x (N+4) block are required to derive the sample-based motion refinement of the current block. A 6-tap gradient filter and a 6-tap interpolation filter are used to generate the gradient information in BIO. Therefore, the computation complexity of BIO is much higher than that of traditional bi-directional prediction. In order to further improve the performance of BIO, the following methods are proposed.
[0040] In a conventional bi-prediction in HEVC, the predictor is generated using equation (4), in which P^{(0)} and P^{(1)} are the list0 and list1 predictor, respectively.
$$P_{\text{Conventional}}[i, j] = (P^{(0)}[i, j] + P^{(1)}[i, j] + 1) \gg 1 \qquad (4)$$
[0041] In JCTVC-C204 and VCEG-AZ05, the BIO predictor is generated using equation (5).
$$P_{\text{OpticalFlow}} = (P^{(0)}[i, j] + P^{(1)}[i, j] + v_x[i, j]\,(I_x^{(0)} - I_x^{(1)}[i, j]) + v_y[i, j]\,(I_y^{(0)} - I_y^{(1)}[i, j]) + 1) \gg 1 \qquad (5)$$
[0042] In equation (5), Ix(0) and Ix(1) represent the x-directional gradient in the list0 and list1 predictor, respectively; Iy(0) and Iy(1) represent the y-directional gradient in the list0 and list1 predictor, respectively; vx and vy represent the offsets in the x- and y-direction, respectively. These gradients can be directly derived based on interpolated results or calculated by using another set of gradient filters and interpolation filters. One additional shift, the gradient shift, is also introduced to normalize the gradient values in the derivation process of gradients. The derivation process of vx and vy is shown in the following. First, the cost function is defined as diffCost (x, y) to find the best values vx and vy. In order to find the best values vx and vy to minimize the cost function, diffCost (x, y), one 5x5 window is used. The solutions of vx and vy can be represented by using S1, S2, S3, S5, and S6.
[0043] The minimum cost function, min diffCost (x, y) can be derived according to:
[0044] By solving equations (5) and (6) , vx and vy can be solved according to eqn. (7) : where,
[0045] The minimum cost function, min diffCost (x, y) can be derived according to:
[0046] By solving equations (3) and (4) , vx and vy can be solved according to eqn. (5) : where,
[0047] Or, in some related art, the S2 can be ignored and then we can further simplify the equations as: where
[0048] We can find that the required bitdepth is large in the BIO process, especially for calculating S1, S2, S3, S5, and S6. For example, if the bitdepth of pixel values in video sequences is 10 bits and the bitdepth of gradients is increased by the fractional interpolation filter or gradient filter, then 16 bits are required to represent one x-directional gradient or one y-directional gradient. These 16 bits may be further reduced by a gradient shift equal to 4, so one gradient needs 12 bits to represent the value. Even if the magnitude of the gradient can be reduced to 12 bits by the gradient shift, the required bitdepth of BIO operations is still large. One multiplier with 13 bits by 13 bits is required to calculate S1, S2, and S5. And another multiplier with 13 bits by 17 bits is required to get S3 and S6. When the window size is large, more than 32 bits are required to represent S1, S2, S3, S5, and S6.
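To make the relation between equations (4) and (5) concrete, the sketch below evaluates both predictors for one sample position. It assumes integer sample values, gradients and offsets that are already available; the derivation of vx and vy from S1 to S6 is not covered, and the function names are illustrative assumptions.

```python
def conventional_bipred(p0, p1):
    """Equation (4): rounded average of the list0 and list1 predictors."""
    return (p0 + p1 + 1) >> 1


def bio_pred(p0, p1, vx, vy, ix0, ix1, iy0, iy1):
    """Equation (5): the average of (4) plus a gradient-based correction.

    ix0/ix1 and iy0/iy1 are the x- and y-directional gradients of the
    list0/list1 predictors at this sample; vx, vy are the BIO offsets.
    """
    correction = vx * (ix0 - ix1) + vy * (iy0 - iy1)
    return (p0 + p1 + correction + 1) >> 1
```

When vx and vy are both zero, or the list0 and list1 gradients coincide, the correction term vanishes and equation (5) yields the same value as equation (4).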
[0049] 1.7 Geometric Partitioning Mode (GPM)
[0050] In VVC, a geometric partitioning mode is supported for inter prediction. The geometric partitioning mode is signalled using a CU-level flag as one kind of merge mode, with other merge modes including the regular merge mode, the MMVD mode, the CIIP mode and the subblock merge mode. In total 64 partitions are supported by geometric partitioning mode for each possible CU size w×h = 2^m × 2^n with m, n ∈ {3, …, 6}, excluding 8x64 and 64x8.
[0051] When this mode is used, a CU is split into two parts by a geometrically located straight line (Fig. 7). The location of the splitting line is mathematically derived from the angle and offset parameters of a specific partition. Each part of a geometric partition in the CU is inter-predicted using its own motion; only uni-prediction is allowed for each partition, that is, each part has one motion vector and one reference index. The uni-prediction motion constraint is applied to ensure that, as in conventional bi-prediction, only two motion compensated predictions are needed for each CU.
[0052] 1.8 GPM Blending Along the Geometric Partitioning Edge
[0053] After predicting each part of a geometric partition using its own motion, blending is applied to the two prediction signals to derive samples around the geometric partition edge. The blending weights for each position of the CU are derived based on the distance between the individual position and the partition edge.
[0054] The distance from a position (x, y) to the partition edge is derived as: where i, j are the indices for angle and offset of a geometric partition, which depend on the signalled geometric partition index. The signs of ρx,j and ρy,j depend on angle index i.
[0055] The weights for each part of a geometric partition are derived as follows:
wIdxL (x, y) = partIdx ? 32 + d (x, y) : 32 - d (x, y)
w1 (x, y) = 1 - w0 (x, y)
[0056] The partIdx depends on the angle index i. One example of weight w0 is illustrated in Fig. 8. In Fig. 8, line 840 corresponds to the GPM partition boundary and two thresholds (i.e., -τ and τ) correspond to lines 842 and 844 in Fig. 8. Furthermore, the angle 810 and offset ρi 820 are indicated for GPM index i and point 830 corresponds to the centre of the block.
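The expression for w0 is not reproduced in this text. As a hedged sketch, the code below assumes the form commonly given in the VVC algorithm description, w0 = Clip3(0, 8, (wIdxL + 4) >> 3) / 8, and takes the distance d(x, y) as an already computed integer input. The function names are illustrative assumptions.

```python
def clip3(lo, hi, v):
    """Clamp v to the range [lo, hi]."""
    return max(lo, min(hi, v))


def gpm_weights(d, part_idx):
    """Blending weights (w0, w1) for one sample of a GPM-coded CU.

    d is the signed distance of the sample to the partition edge.
    The w0 expression is assumed from the VVC description, not from
    this document.
    """
    w_idx_l = 32 + d if part_idx else 32 - d
    w0 = clip3(0, 8, (w_idx_l + 4) >> 3) / 8
    return w0, 1 - w0
```

A sample lying on the partition edge (d = 0) receives equal weights, while samples far from the edge saturate to weights of 0 and 1.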
[0057] 1.9 Enhanced Bi-Directional Motion Compensation
[0058] In bi-directional motion compensation, the out of boundary (OOB) prediction samples are discarded and only the non-OOB predictors, when available, are used to generate the final predictor. Specifically, let Pos_x_{i,j} and Pos_y_{i,j} denote the position of one prediction sample in one current block, and (x = 0, 1) denote the MV of the current block; PosLeftBdry, PosRightBdry, PosTopBdry and PosBottomBdry are the positions of four boundaries of the picture. One prediction sample is regarded as OOB when at least one of the following conditions is satisfied: where half_pixel is equal to 8 that represents the half-pel sample distance in the 1/16-pel sample precision.
[0059] After examining the OOB condition for each sample, the final prediction samples of one bi-directional block are generated as follows:
[0060] OOB checking process is also applicable when BCW is enabled.
[0061] Finally, note that this sample-adaptive bi-prediction process only applies to prediction units for which at least one reference block is first detected as partially or entirely out-of-bounds. Thus, a block-level OOB criterion is first checked. If both prediction blocks are non-OOB, then the usual bi-prediction takes place.
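The OOB conditions and the final-prediction rule are not reproduced in this text. The following sketch is one plausible reading: a sample is treated as OOB when its motion-shifted position lies more than half_pixel outside a picture boundary (all values in 1/16-sample units), and the fallback to the conventional average when both or neither predictor is OOB is an assumption. All names are illustrative.

```python
HALF_PIXEL = 8  # half-pel distance in 1/16-pel precision


def is_oob(pos_x, pos_y, mv_x, mv_y, left, right, top, bottom):
    """Assumed OOB test for one prediction sample (1/16-sample units)."""
    ref_x, ref_y = pos_x + mv_x, pos_y + mv_y
    return (ref_x > right + HALF_PIXEL or ref_x < left - HALF_PIXEL
            or ref_y > bottom + HALF_PIXEL or ref_y < top - HALF_PIXEL)


def oob_aware_bipred(p0, p1, oob0, oob1):
    """Discard the OOB predictor when exactly one predictor is OOB."""
    if oob0 and not oob1:
        return p1
    if oob1 and not oob0:
        return p0
    return (p0 + p1 + 1) >> 1  # assumed fallback: conventional average
```

For example, with a picture 120 samples wide (right boundary at 16 * 119), a sample at x = 16 * 100 with a horizontal MV of 16 * 30 is flagged as OOB, and only the other list's predictor is used for it.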
[0062] 1.10 Local Illumination Compensation (LIC)
[0063] LIC is an inter prediction technique to model local illumination variation between current block and its prediction block as a function of that between the current block template and the reference block template. The parameters of the function can be denoted by a scale α and an offset β, which forms a linear equation, that is, α*p [x] +β to compensate illumination changes, where p [x] is a reference sample pointed to by MV at a location x on reference picture. When wrap around motion compensation is enabled, the MV shall be clipped with wrap around offset taken into consideration. Since α and β can be derived based on the current block template and the reference block template, no signalling overhead is required for them.
[0064] The local illumination compensation proposed in JVET-O0066 is used for inter-coded CUs with the following modifications.
· Intra neighbour samples can be used in LIC parameter derivation.
· LIC is disabled for blocks with less than 32 luma samples.
· Samples of the reference block template are generated by using MC with the block MV without rounding it to integer-pel precision.
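The linear model α*p[x]+β can be illustrated with the sketch below. The text does not state how α and β are derived from the two templates; an ordinary least-squares fit in floating point is assumed here purely for illustration, and the function names are hypothetical.

```python
def derive_lic_params(ref_template, cur_template):
    """Fit cur ~ alpha * ref + beta over the template samples.

    Least squares is an assumed derivation method, not taken from the text.
    """
    n = len(ref_template)
    sx = sum(ref_template)
    sy = sum(cur_template)
    sxx = sum(r * r for r in ref_template)
    sxy = sum(r * c for r, c in zip(ref_template, cur_template))
    denom = n * sxx - sx * sx
    if denom == 0:  # flat reference template: keep scale 1, fit offset only
        return 1.0, (sy - sx) / n
    alpha = (n * sxy - sx * sy) / denom
    beta = (sy - alpha * sx) / n
    return alpha, beta


def apply_lic(pred_samples, alpha, beta):
    """Apply alpha * p[x] + beta to each motion-compensated sample."""
    return [alpha * p + beta for p in pred_samples]
```

Because both templates are available at the decoder, the same fit can be repeated there, which is why no signalling overhead is needed for α and β.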
[0065] 1.11 Overlapped Block Motion Compensation (OBMC)
[0066] When OBMC is applied, top and left boundary pixels of a CU are refined using neighbouring block’s motion information with a weighted prediction as described in JVET-L0101.
[0067] Conditions of not applying OBMC are as follows:
· When OBMC is disabled at SPS level
· When current block has intra mode or IBC mode
· When current luma block area is smaller than or equal to 32
[0068] Additionally, OBMC is adaptively controlled on a block level as follows:
· OBMC flag is inherited from a neighbouring affine block for affine merge mode.
· OBMC is not applied to a block if there is a neighbour block coded with IBC, palette, or BDPCM modes.
· When applying OBMC to a block, block boundary check regarding whether OBMC is applied to the boundary is further made based on the reference samples of the current block. If any absolute difference between the prediction sample and non-interpolated (integer pel) reference sample is greater than a threshold, the OBMC is not applied to that boundary.
[0069] A subblock-boundary OBMC is performed by applying the same blending to the top, left, bottom, and right subblock boundary pixels using neighbouring subblocks’ motion information. It is enabled for the subblock based coding tools:
· Affine AMVP modes;
· Affine merge modes and subblock-based temporal motion vector prediction (SbTMVP);
· Subblock-based bilateral matching.
[0070] When OBMC mode is used in CIIP mode with LMCS (Luma Mapping with Chroma Scaling) , inter blending is performed prior to LMCS mapping of inter samples. LMCS is applied to blended inter samples which are combined with LMCS applied intra samples in CIIP mode, where InterpredY represents the samples predicted by the motion of current block in the original domain, IntrapredY represents the samples predicted in the mapped domain, OBMCpredY represents the samples predicted by the motion of neighbouring blocks in the original domain, and w0 and w1 are the weights.
[0071] When OBMC mode is used in a LIC coded block, the LIC parameters are applied to generate the corresponding prediction samples for the OBMC of the LIC coded block. Besides, to reduce the complexity, the OBMC is only applied to the top and left CU boundaries while being always disabled for the boundaries of the internal sub-blocks of the LIC coded block.
[0072] 1.12 Decoder side intra mode derivation (DIMD)
[0073] When DIMD is applied, up to five intra modes are derived from the reconstructed neighbour samples, and those five predictors are combined with the non-directional predictor (planar or block vector based predictor) with the weights derived from the histogram of gradients as described in JVET-O0449. The decision between the non-directional modes is taken according to the template cost. Specifically, the block vectors of all adjacent and non-adjacent merge candidates (coded in IntraTMP or IBC) are compared to planar prediction on the reconstructed template. The template cost (SATD) is used to select the best predictor among them.
[0074] The division operations in weight derivation are performed utilizing the same lookup table (LUT) based integerization scheme used by the CCLM. For example, the division operation in the orientation calculation Orient = Gy / Gx is computed by the following LUT-based scheme:
x = Floor (Log2 (Gx) )
normDiff = ( (Gx << 4) >> x) & 15
x += (3 + (normDiff != 0) ? 1 : 0)
Orient = (Gy * (DivSigTable [normDiff] | 8) + (1 << (x - 1) ) ) >> x
where DivSigTable [16] = {0, 7, 6, 5, 5, 4, 4, 3, 3, 2, 2, 1, 1, 1, 1, 0}.
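The LUT-based scheme can be sketched as follows. The update of x is read here as x += 3 + (normDiff != 0 ? 1 : 0), which is an interpretation of the pseudo-code rather than something the text states explicitly; it is the reading under which the result approximates Gy / Gx. A positive integer Gx is assumed, and the function name is illustrative.

```python
DIV_SIG_TABLE = [0, 7, 6, 5, 5, 4, 4, 3, 3, 2, 2, 1, 1, 1, 1, 0]


def lut_divide(gy, gx):
    """Approximate gy / gx with shifts and a 16-entry table (gx > 0)."""
    if gx <= 0:
        raise ValueError("gx must be positive")
    x = gx.bit_length() - 1             # Floor(Log2(gx))
    norm_diff = ((gx << 4) >> x) & 15   # 4 fractional bits of the mantissa
    # Assumed precedence: add 3, plus 1 more when the mantissa is not exact.
    x += 3 + (1 if norm_diff != 0 else 0)
    return (gy * (DIV_SIG_TABLE[norm_diff] | 8) + (1 << (x - 1))) >> x
```

For a power-of-two divisor the result is exact, e.g. lut_divide(64, 8) gives 8; for other divisors the table supplies an approximate reciprocal mantissa, so the quotient can deviate slightly from true division.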
[0075] For a block of size W×H, the weight for each of the five derived modes is modified if one of the above or left histogram magnitudes is twice as large as the other one. In this case, the weights are location dependent and computed as follows:
[0076] If the above histogram is twice the left, then:
[0077] If the left histogram is twice the above, then: where wDimdi is the unmodified uniform weight of the DIMD selected as in JVET-O0449, Δi is pre-defined and set to 10.
[0078] Derived intra modes are included into the primary list of intra most probable modes (MPM) , so the DIMD process is performed before the MPM list is constructed. The primary derived intra mode of a DIMD block is stored with a block and is used for MPM list construction of the neighbouring blocks.
[0079] Finally, note that the region of neighbouring reconstructed samples used for computing the histogram of gradients is modified compared to the JVET-O0449 method, depending on the availability of reconstructed samples. The region of decoded reference samples of the current W×H luma CB is extended towards the above-right side if available, up to W additional columns. It is extended towards the bottom-left side if available, up to H additional rows.
[0080] 1.13 Combined Inter and Intra Prediction (CIIP)
[0081] In VVC, when a CU is coded in merge mode, if the CU contains at least 64 luma samples (that is, CU width times CU height is equal to or larger than 64), and if both CU width and CU height are less than 128 luma samples, an additional flag is signalled to indicate if the combined inter / intra prediction (CIIP) mode is applied to the current CU. As its name indicates, the CIIP prediction combines an inter prediction signal with an intra prediction signal. The inter prediction signal in the CIIP mode Pinter is derived using the same inter prediction process applied to regular merge mode; and the intra prediction signal Pintra is derived following the regular intra prediction process with the planar mode. Then, the intra and inter prediction signals are combined using weighted averaging, where the weight value is calculated depending on the coding modes of the top and left neighbouring blocks (depicted in Fig. 9) as follows:
– If the top neighbour is available and intra coded, then set isIntraTop to 1, otherwise set isIntraTop to 0;
– If the left neighbour is available and intra coded, then set isIntraLeft to 1, otherwise set isIntraLeft to 0;
– If (isIntraLeft + isIntraTop) is equal to 2, then wt is set to 3;
– Otherwise, if (isIntraLeft + isIntraTop) is equal to 1, then wt is set to 2;
– Otherwise, set wt to 1.
[0082] The CIIP prediction is formed as follows: PCIIP= ( (4-wt) *Pinter+wt*Pintra+2) >>2
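The weight derivation and blending described above can be sketched as follows. This is only an illustrative sketch: the function names ciip_weight and ciip_blend are hypothetical, and each neighbour flag is assumed to be already False when that neighbour is unavailable.

```python
def ciip_weight(top_is_intra: bool, left_is_intra: bool) -> int:
    """Return wt from the intra status of the top and left neighbours.

    Assumption: a flag is False when the neighbour is unavailable
    or not intra coded (isIntraTop / isIntraLeft in the text).
    """
    n_intra = int(top_is_intra) + int(left_is_intra)
    if n_intra == 2:
        return 3
    if n_intra == 1:
        return 2
    return 1


def ciip_blend(p_inter: int, p_intra: int, wt: int) -> int:
    """P_CIIP = ((4 - wt) * P_inter + wt * P_intra + 2) >> 2."""
    return ((4 - wt) * p_inter + wt * p_intra + 2) >> 2
```

For instance, with both neighbours intra coded (wt = 3), an inter sample of 100 and an intra sample of 200 blend to 175.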
[0083] 1.14 Multi-Hypothesis Prediction (MHP)
[0084] In the multi-hypothesis inter prediction mode (JVET-M0425) , one or more additional motion-compensated prediction signals are signalled, in addition to the conventional bi-prediction signal. The resulting overall prediction signal is obtained by sample-wise weighted superposition. With the bi-prediction signal pbi and the first additional inter prediction signal / hypothesis h3, the resulting prediction signal p3 is obtained as follows: p3= (1-α) pbi+αh3.
[0085] The weighting factor α is specified by the new syntax element add_hyp_weight_idx, according to the following mapping table: Table 2. Mapping between syntax element add_hyp_weight_idx, and weighting factor α
[0086] Analogously to above, more than one additional prediction signal can be used. The resulting overall prediction signal is accumulated iteratively with each additional prediction signal: p_{n+1} = (1 - α_{n+1}) p_n + α_{n+1} h_{n+1}
[0087] The resulting overall prediction signal is obtained as the last p_n (i.e., the p_n having the largest index n). Up to two additional prediction signals can be used (i.e., n is limited to 2).
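The iterative accumulation can be illustrated with the short sketch below. The function name mhp_accumulate is hypothetical, floating-point arithmetic is used for readability rather than the integer arithmetic a codec would use, and the weighting factors passed in are placeholders: the actual values come from the mapping in Table 2.

```python
def mhp_accumulate(p_bi, hypotheses, alphas):
    """Accumulate additional hypotheses onto the bi-prediction signal.

    p_bi       : list of bi-prediction samples
    hypotheses : list of additional prediction signals (each a list of samples)
    alphas     : one weighting factor per additional hypothesis (placeholder values)
    """
    if len(hypotheses) != len(alphas):
        raise ValueError("one weighting factor is needed per hypothesis")
    if len(hypotheses) > 2:
        raise ValueError("at most two additional prediction signals are allowed")
    p = [float(v) for v in p_bi]
    for h, a in zip(hypotheses, alphas):
        # p_{n+1} = (1 - alpha_{n+1}) * p_n + alpha_{n+1} * h_{n+1}
        p = [(1 - a) * pv + a * hv for pv, hv in zip(p, h)]
    return p
```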
[0088] The motion parameters of each additional prediction hypothesis can be signalled either explicitly by specifying the reference index, the motion vector predictor index, and the motion vector difference, or implicitly by specifying a merge index. A separate multi-hypothesis merge flag distinguishes between these two signalling modes.
[0089] For inter AMVP mode, MHP is only applied if non-equal weight in BCW is selected in bi-prediction mode.
[0090] Combination of MHP and BDOF is possible, however the BDOF is only applied to the bi-prediction signal part of the prediction signal (i.e., the ordinary first two hypotheses) .
[0091] In the present invention, techniques of adaptive prediction refinement are disclosed, where the prediction refinement is adjusted based on coding parameters or the prediction refinement uses multi-set PRWs selected based on classification for prediction refinement.
BRIEF SUMMARY OF THE INVENTION
[0092] A method and apparatus of video coding for adaptive prediction refinement are disclosed. According to one method, input data associated with a current block is received, wherein the input data comprise pixel data for the current block to be encoded at an encoder side or coded data associated with the current block to be decoded at a decoder side. One or more sets of Position-Related Weights (PRWs) for template samples in one or more templates and prediction samples in the current block are determined, wherein each weight of each set of said one or more sets of PRWs is dependent on a first position of a target template sample in said one or more templates and / or a second position of each of the prediction samples in the current block. An SPO (Sample-Based Prediction Offset) is derived for a target prediction sample in the current block, wherein the SPO comprises a sum of template differences weighted by said one or more sets of respective PRWs, wherein the template differences are derived based on the template samples. One or more coding parameters associated with the current block are determined for the target prediction sample. The SPO is adjusted according to said one or more coding parameters to derive an adjusted SPO. The target prediction sample is refined using the adjusted SPO to generate a refined prediction sample. The current block is encoded or decoded by using information comprising the refined prediction sample.
[0093] In one embodiment, the SPO further comprises a second weighted sum of the target prediction sample and neighbouring prediction samples of the target prediction sample.
[0094] In one embodiment, said one or more coding parameters comprise a block variance or a mean of absolute differences of the prediction samples of the current block. In one embodiment, the block variance or the mean of absolute differences of the prediction samples of the current block is derived based on down-sampled prediction samples.
[0095] In one embodiment, said one or more coding parameters comprise a local variance or a local mean of absolute differences of the target prediction sample derived based on the target prediction sample and one or more neighbouring prediction samples of the target prediction sample.
[0096] In one embodiment, said one or more coding parameters comprise an average of absolute differences between a reference template and a reconstructed template.
[0097] In one embodiment, said one or more coding parameters comprise block size, block area, temporal ID or slice QP.
[0098] In one embodiment, a scale is derived based on said one or more coding parameters according to a linear function or a non-linear function, and the SPO is scaled by the scale to derive the adjusted SPO. In one embodiment, the scale is clipped to a range before the scale is used to adjust the SPO. In one embodiment, when the scale is derived based on said one or more coding parameters according to the non-linear function, values of the scale for said one or more coding parameters are stored in a lookup table.
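A linear scale derivation with clipping might look like the sketch below. Everything numeric here is an assumption for illustration only: the slope, intercept, clipping range, and the 6-bit fixed-point precision of the scale are not specified in the text, and the names derive_scale and adjust_spo are hypothetical.

```python
def derive_scale(param: int, slope: int, intercept: int, lo: int, hi: int) -> int:
    """Linear mapping from a coding parameter to a scale, clipped to [lo, hi]."""
    scale = slope * param + intercept
    return max(lo, min(hi, scale))


def adjust_spo(spo: int, scale: int, shift: int = 6) -> int:
    """Scale the SPO by a fixed-point scale with `shift` fractional bits (assumed).

    A rounding offset is added before the right shift.
    """
    return (spo * scale + (1 << (shift - 1))) >> shift
```

For a non-linear mapping, derive_scale could be replaced by a table lookup indexed by the (possibly quantized) coding parameter, in line with the lookup-table embodiment.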
[0099] According to another method, input data associated with a current block is received, wherein the input data comprise pixel data for the current block to be encoded at an encoder side or coded data associated with the current block to be decoded at a decoder side. A classification for the current block is determined to select a target class among multiple classes. An SPO (Sample-Based Prediction Offset) for a target prediction sample in the current block is derived according to the target class, wherein multiple sets of Position-Related Weights (PRWs) are used for the multiple classes and the SPO comprises a sum of template differences weighted by a target set of PRWs associated with the target class, wherein the template differences are derived based on reconstructed samples and reference samples of one or more templates. The target prediction sample is refined using the SPO to generate a refined prediction sample. The current block is encoded or decoded by using information comprising the refined prediction sample.
[0100] In one embodiment, the classification for the current block is determined according to block variance, local variance, intensity of one or more samples, or a combination thereof.
[0101] In one embodiment, the classification for the current block is determined according to template differences, similarity of template differences between reconstructed template samples and reference template samples, similarity of differences between the target prediction sample and the reference template samples, or a combination thereof.
[0102] In one embodiment, the classification for the current block is determined according to slice video resolution, slice QP (Quantization Parameter), slice TID (Temporal ID), or a combination thereof.
BRIEF DESCRIPTION OF THE DRAWINGS
[0103] Fig. 1A illustrates an exemplary adaptive Inter / Intra video coding system incorporating loop processing.
[0104] Fig. 1B illustrates a corresponding decoder for the encoder in Fig. 1A.
[0105] Fig. 2A illustrates an example of the affine motion field of a block described by motion information of two control point motion vectors (4-parameter).
[0106] Fig. 2B illustrates an example of the affine motion field of a block described by motion information of three control point motion vectors (6-parameter).
[0107] Fig. 3 illustrates an example of block based affine transform prediction, where the motion vector of each 4×4 luma subblock is derived from the control-point MVs.
[0108] Fig. 4A illustrates an example of subblock-based Temporal Motion Vector Prediction (SbTMVP) in VVC, where the spatial neighbouring blocks are checked for availability of motion information.
[0109] Fig. 4B illustrates an example of SbTMVP for deriving sub-CU motion field by applying a motion shift from spatial neighbour and scaling the motion information from the corresponding collocated sub-CUs.
[0110] Fig. 5 illustrates an example of bilateral matching search process.
[0111] Fig. 6 illustrates an example of Bi-directional optical flow (BIO) that utilizes the assumptions of optical flow and steady motion to achieve the sample-level motion refinement.
[0112] Fig. 7 illustrates examples of the GPM splits grouped by identical angles.
[0113] Fig. 8 illustrates exemplified generation of a blending weight w0 using geometric partitioning mode.
[0114] Fig. 9 illustrates top and left neighbouring blocks used in CIIP weight derivation.
[0115] Fig. 10 illustrates an example of the samples surrounding the centre sample IC used for calculating local variance.
[0116] Fig. 11 illustrates a flowchart of an exemplary video coding system that adjusts prediction refinement based on coding parameters according to an embodiment of the present invention.
[0117] Fig. 12 illustrates a flowchart of an exemplary video coding system that uses multi-set PRWs selected based on classification for prediction refinement according to an embodiment of the present invention.
DETAILED DESCRIPTION OF THE INVENTION
[0118] It will be readily understood that the components of the present invention, as generally described and illustrated in the figures herein, may be arranged and designed in a wide variety of different configurations. Thus, the following more detailed description of the embodiments of the systems and methods of the present invention, as represented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. References throughout this specification to “one embodiment, ” “an embodiment, ” or similar language mean that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the present invention. Thus, appearances of the phrases “in one embodiment” or “in an embodiment” in various places throughout this specification are not necessarily all referring to the same embodiment.
[0119] Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. One skilled in the relevant art will recognize, however, that the invention can be practiced without one or more of the specific details, or with other methods, components, etc. In other instances, well-known structures, or operations are not shown or described in detail to avoid obscuring aspects of the invention. The illustrated embodiments of the invention will be best understood by reference to the drawings, wherein like parts are designated by like numerals throughout. The following description is intended only by way of example, and simply illustrates certain selected embodiments of apparatus and methods that are consistent with the invention as claimed herein.
[0120] PROPOSED METHOD
[0121] 2. Multi-Model Classification
[0122] 2.1 Multi-Model Classification by CU Skip Mode
[0123] In one embodiment, a classification method for neighbouring reconstructed samples is used to select the model for prediction generation or refinement. The classification method utilizes the coded mode of the CU to which the neighbouring reconstructed sample belongs. The neighbouring reconstructed samples are used for the prediction generation or refinement with model-0 or model-1 according to the classification method. If the coded mode is a non-skip mode (i.e. skipModeFlag = 0), the classification method chooses model-0 for the neighbouring reconstructed samples. On the other hand, if the coded mode is skip mode (i.e. skipModeFlag = 1), the classification method chooses model-1 for the neighbouring reconstructed samples. The predictions generated or refined with model-0 and model-1 can be used or blended to derive the final prediction.
[0124] Example 1: LIC is the prediction refinement tool in the inter prediction process. Model-0 is the linear model derived from the neighbouring template samples from the current reconstructed template with skipModeFlag = 0 and the corresponding neighbouring prediction samples from the reference template. In contrast, model-1 is the linear model derived from the neighbouring template samples from the current reconstructed template with skipModeFlag =1 and the corresponding neighbouring prediction samples from the reference template. Due to the nature that samples coded by non-skip mode have lower distortion than samples coded by skip mode, model-1 is more reliable than model-0. We blend model-0 and model-1 to generate the final prediction refinement with higher weighting to model-1. In this example, the blending weighting is 3 / 4 for model-1 and 1 / 4 for model-0. In another example, we give a larger weighting for the predictors generated from model-0.
[0125] 2.2 Multi-model Classification by TB CBF Flag
[0126] In another embodiment, a classification method for neighbouring reconstructed samples is used to select the model for prediction generation or refinement. The classification method utilizes the coded CBF flag of the TB to which the neighbouring reconstructed sample belongs. The neighbouring reconstructed samples are used for the prediction generation or refinement with model-0 or model-1 according to the classification method. If the CBF flag is 0, the classification method chooses model-0 for the neighbouring reconstructed samples. On the other hand, if the CBF flag is 1, the classification method chooses model-1 for the neighbouring reconstructed samples. The predictions generated or refined with model-0 and model-1 can be used or blended to derive the final prediction.
[0127] Example 2: LIC is the prediction refinement tool in the inter prediction process. Model-0 is the linear model derived from the neighbouring template samples of the current reconstructed template with CBF flag = 0 and the corresponding neighbouring prediction samples from the reference template. In contrast, model-1 is the linear model derived from the neighbouring template samples of the current reconstructed template with CBF flag = 1 and the corresponding neighbouring prediction samples from the reference template. Because the reconstructed samples coded with CBF flag = 1 have lower distortion than the samples coded with CBF flag = 0, model-1 is more reliable than model-0. We blend model-0 and model-1 to generate the final prediction refinement with a higher weighting for model-1. In this example, the blending weighting is 3 / 4 for model-1 and 1 / 4 for model-0. In another example, we give a larger weighting to the predictors generated from model-0.
[0128] 2.3 Multi-model Classification by CU Skip Mode and TB CBF Flag
[0129] In another embodiment, a classification method for neighbouring reconstructed samples is used to select the model for prediction generation or refinement. The classification method utilizes the coded mode of the CU to which the neighbouring reconstructed sample belongs and the coded CBF flag of the TB to which the neighbouring reconstructed sample belongs. The neighbouring reconstructed samples are used for the prediction generation or refinement with model-0, model-1 or model-2 according to the classification method. If the coded mode is a non-skip mode (i.e. skipModeFlag = 0) and the CBF flag is 0, the classification method chooses model-0 for the neighbouring reconstructed samples. If the coded mode is a non-skip mode and the CBF flag is 1, the classification method chooses model-1 for the neighbouring reconstructed samples. If the coded mode is skip mode (i.e. skipModeFlag = 1), the classification method chooses model-2 for the neighbouring reconstructed samples. The predictions generated or refined with model-0, model-1 and model-2 can be used or blended to derive the final prediction.
[0130] Example 3: LIC is the prediction refinement tool in the inter prediction process. Model-0 is the linear model derived from the neighbouring template samples of the current reconstructed template with skipModeFlag = 0 and CBF flag = 0 and the corresponding neighbouring prediction samples from the reference template. Model-1 is the linear model derived from the neighbouring template samples of the current reconstructed template with skipModeFlag = 0 and CBF flag = 1 and the corresponding neighbouring prediction samples from the reference template. Model-2 is the linear model derived from the neighbouring template samples of the current reconstructed template with skipModeFlag = 1 and the corresponding neighbouring prediction samples from the reference template. As in Examples 1 and 2, we blend model-0, model-1 and model-2 to generate the final prediction refinement with a higher weighting for model-1. In this example, the blending weighting is 1 / 4 for model-0, 2 / 4 for model-1 and 1 / 4 for model-2.
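The three-way classification and the 1 / 4, 2 / 4, 1 / 4 blending of Example 3 can be sketched as below. The function names are hypothetical, and the integer rounding offset in the blend is an assumption that the text does not specify.

```python
def classify_template_sample(skip_mode_flag: int, cbf_flag: int) -> int:
    """Map a neighbouring reconstructed sample to model-0, model-1 or model-2."""
    if skip_mode_flag == 1:
        return 2                      # skip mode: model-2, regardless of CBF
    return 1 if cbf_flag == 1 else 0  # non-skip: CBF selects model-1 or model-0


def blend_three_models(pred0, pred1, pred2):
    """Blend per-sample predictions with weights 1/4, 2/4 and 1/4.

    The "+ 2" rounding offset before the shift is an assumption.
    """
    return [(a + 2 * b + c + 2) >> 2 for a, b, c in zip(pred0, pred1, pred2)]
```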
[0131] In the above embodiment, “the coded mode is skip mode or not” or “the CBF flag is 1 or not” can be replaced by other coded information in one CU / CB or one TU / TB. For example, the condition of the coded mode can be changed to Merge mode, AMVP mode, affine mode, GPM mode, motion vector, reference frame selection, subblock usage, or the combination of coded information in one CU / CB. The condition of “CBF flag is 1 or not” can be changed to the number of non-zero coefficients, the absolute sum of quantized coefficients, or the combination of coded information in one TB / TU.
[0132] 3. Prediction Refinement with Multi-Model Parameters
[0133] 3.1 Classification by CU Skip Mode and / or TU CBF Flag
[0134] In one embodiment, a sample-based prediction offset (SPO) is used to refine the predictor with different models. The model can be chosen by the method in Section 2.1 (multi-model classification by CU coded mode, i.e. skip mode or not), the method in Section 2.2 (multi-model classification by TB CBF flag), or the method in Section 2.3 (multi-model classification by CU coded mode and TB CBF flag). The SPO is derived from position-related weighting (PRW) and the difference between the reconstructed template and the reference template (DRR). The template can be one line or multiple lines. The PRW for each sample in the predictor to be refined is related to the template position and the sample position according to the block size. The SPO for each sample in the predictor to be refined is the sum of the DRR values multiplied by the sample’s PRWs. The SPO is then added to the predictor for each sample. Clipping may optionally be applied to the result before output.
[0135] Example 4. We derive a sample-based prediction offset to refine the predictor. In the first step, we calculate the template difference DiffTemp, x, y between the reconstructed template RecTemp, x, y of the current block and the reference template RefTemp, x, y of the predictor, where x and y give the position relative to the top-left (TL) sample in the predictor. For example, x = 0 and y = -1 denotes the template sample directly above the TL sample. DiffTemp, x, y = RecTemp, x, y - RefTemp, x, y.
[0136] The position-based weighting of each sample in the predictor comes from a pre-trained table LUTPRW, w, h, x, y, i, j, model according to the block width w, height h and model. As in Example 1, LUTPRW, w, h, x, y, i, j, model-0 is trained on the neighbouring template samples of the current reconstructed template with skipModeFlag = 0 and the corresponding neighbouring prediction samples from the reference template. In contrast, LUTPRW, w, h, x, y, i, j, model-1 is trained on the neighbouring template samples of the current reconstructed template with skipModeFlag = 1 and the corresponding neighbouring prediction samples from the reference template. Here i and j give the position relative to the top-left (TL) sample in the predictor. For example, i = 1 and j = 0 denotes the sample immediately to the right of the TL sample.
[0137] The prediction offset of each sample OffsetSPO, i, j for the predictor is the sum of DiffTemp, x, y multiplied by LUTPRW, w, h, x, y, i, j, model.
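A minimal sketch of this weighted sum is given below. Several details are assumptions made only for illustration: the table for one (w, h, model) combination is represented as a dictionary keyed by (x, y, i, j), the weights are integers with 6 fractional bits, missing table entries count as zero, and the refined predictor is clipped to a 10-bit range. The names spo_offsets and refine_predictor are hypothetical.

```python
def spo_offsets(diff_temp, prw, width, height, shift=6):
    """Offset(i, j) = sum over template positions of Diff(x, y) * PRW(x, y, i, j).

    diff_temp : dict {(x, y): template difference}
    prw       : dict {(x, y, i, j): integer weight} for one (w, h, model) table
    shift     : number of fractional bits in the weights (assumed)
    """
    rounding = 1 << (shift - 1)
    offsets = [[0] * width for _ in range(height)]
    for j in range(height):
        for i in range(width):
            acc = sum(d * prw.get((x, y, i, j), 0)
                      for (x, y), d in diff_temp.items())
            offsets[j][i] = (acc + rounding) >> shift
    return offsets


def refine_predictor(pred, offsets, bit_depth=10):
    """Add the offsets to the predictor and clip to the valid sample range."""
    max_val = (1 << bit_depth) - 1
    return [[min(max_val, max(0, p + o)) for p, o in zip(p_row, o_row)]
            for p_row, o_row in zip(pred, offsets)]
```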
[0138] 3.2. Classification by Block Variance or Local Variance
[0139] In another embodiment, a sample-based prediction offset (SPO) is used to refine the predictor with different models. The to-be-processed block or sample is classified by a variance classification method (VCM). The VCM utilizes block-based variance or sample-based local variance. The block-based variance can use the traditional definition of variance, while the local variance can use a simplified measure, such as the sum of absolute differences (SAD) between the surrounding sample values and the centre sample value in the calculation region. The model is determined by comparing the derived variance with one or more predefined values. The SPO is derived from position-related weighting (PRW) and the difference between the reconstructed template and the reference template (DRR), and the derived offset is added to the predictor for each sample, following the same flow as in Section 3.1.
[0140] Example 5. Similar to Example 4, we derive a sample-based prediction offset to refine the predictor with different models according to the VCM. In this example, LUTPRW, w, h, x, y, i, j, model-0 is trained by the predictor samples with local variance less than a predefined threshold, the neighbouring template samples from the current reconstructed template, and the corresponding neighbouring prediction samples from the reference template. LUTPRW, w, h, x, y, i, j, model-1 is trained by the predictor samples with local variance equal to or greater than a predefined threshold, the neighbouring template samples from the current reconstructed template, and the corresponding neighbouring prediction samples from the reference template.
[0141] The local variance is calculated in one local region. For example, a diamond-shaped region including the centre sample IC and its neighbouring samples in the predictor is defined as one local region for calculating the local variance. The samples surrounding the centre sample IC are shown in Fig. 10, where A, B, L and R stand for above, below, left and right, and NW, NE, SW and SE stand for north-west, etc. Likewise, AA stands for above-above, BB for below-below, etc. If a surrounding sample lies outside the block boundary, padding is applied. The local variance is the sum of absolute differences (SAD) between IC and its neighbouring samples.
[0142] For example, the predefined threshold is set to 128. LUTPRW, w, h, x, y, i, j, model-0 is trained with the predictor sample with local variance less than 128. In contrast, LUTPRW, w, h, x, y, i, j, model-1 is trained with the predictor sample with local variance equal to or greater than 128.
[0143] The prediction offset of each sample OffsetSPO, i, j for the predictor is the sum of DiffTemp, x, y multiplied by LUTPRW, w, h, x, y, i, j, model as in Example 4.
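The local-variance classification of Example 5 can be sketched as follows. The exact neighbour set is an assumption: a 12-sample diamond (A, B, L, R, NW, NE, SW, SE, AA, BB, LL, RR) is used here, and padding is implemented by replicating the nearest sample inside the block. The function names are hypothetical; the threshold of 128 is the example value from the text.

```python
# (dx, dy) offsets of the assumed diamond: A, B, L, R, NW, NE, SW, SE, AA, BB, LL, RR
DIAMOND = [(0, -1), (0, 1), (-1, 0), (1, 0),
           (-1, -1), (1, -1), (-1, 1), (1, 1),
           (0, -2), (0, 2), (-2, 0), (2, 0)]


def local_variance(pred, i, j):
    """SAD between the centre sample pred[j][i] and its diamond neighbours."""
    height, width = len(pred), len(pred[0])
    centre = pred[j][i]
    total = 0
    for dx, dy in DIAMOND:
        # Padding by edge replication (assumed): clamp to the block boundary.
        x = min(max(i + dx, 0), width - 1)
        y = min(max(j + dy, 0), height - 1)
        total += abs(pred[y][x] - centre)
    return total


def select_model(pred, i, j, threshold=128):
    """model-0 if the local variance is below the threshold, else model-1."""
    return 0 if local_variance(pred, i, j) < threshold else 1
```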
[0144] 3.3 Classification by Intensity of the Sample
[0145] In another embodiment, a sample-based prediction offset (SPO) is used to refine the predictor with different models. The model is selected by an intensity classification method (ICM). The ICM utilizes intensity information. The intensity information can be the predictor’s sample value, the reference template sample value and / or the reconstructed template sample value. A model is determined by comparing the derived intensity with predefined values. The SPO is derived from position-related weighting (PRW) and the difference between the reconstructed template and the reference template (DRR), and is added to the predictor for each sample, following the same flow as in Section 3.1.
[0146] 3.4 Classification by Template Differences
[0147] In another embodiment, a sample-based prediction offset (SPO) is used to refine the predictor with different models. The model is selected according to the absolute values of template sample differences (ATSD) between reconstructed template samples and reference template samples. A model is determined by comparing the derived ATSD with one or more predefined values. The SPO is derived from position-related weighting (PRW) and the differences between the reconstructed template and the reference template (DRR), and is added to the predictor for each sample, following the same flow as in Section 3.1.
[0148] 3.5. Classification by Similarity of Template Differences
[0149] In another embodiment, a sample-based prediction offset (SPO) is used to refine the predictor with different models. The model is selected according to the similarity of the template sample differences (SMTSD). The template sample differences (TSD) between the reconstructed template samples and the reference template samples are first calculated. The similarity of the TSD in the specified region is then checked. The similarity can be based on the variance or the sum of absolute differences. A model is determined by comparing the derived SMTSD with one or more predefined values. The SPO is derived from position-related weighting (PRW) and the difference between the reconstructed template and the reference template (DRR), and is added to the predictor for each sample, following the same flow as in Section 3.1.
[0150] 3.6. Classification by Similarity of Differences between Prediction and Template
[0151] In another embodiment, a sample-based prediction offset (SPO) is used to refine the predictor with different models. The model is selected according to the similarity of prediction and template (SPT). The SPT is calculated according to the absolute differences (AD) between the sample value of the predictor and the sample values of the reference template. A model is determined by comparing the derived SPT with predefined values. The SPO is derived from position-related weighting (PRW) and the differences between the reconstructed template and the reference template (DRR), and is added to the predictor for each sample, following the same flow as in Section 3.1.
[0152] 3.7 Classification by Combined Methods
[0153] In another embodiment, a sample-based prediction offset (SPO) is used to refine the predictor with different models. The model is selected according to the combined information from the proposed methods in Sections 3.1 to 3.6. The SPO is derived from position-related weighting (PRW) and the difference between the reconstructed template and the reference template (DRR), and is added to the predictor for each sample, following the same flow as in Section 3.1.
[0154] 4. Prediction Refinement with Multi-Set Parameters
[0155] 4.1 Classification by Slice Video Resolution
[0156] In one embodiment, a sample-based prediction offset (SPO) is used to refine the predictor with parameters according to the video resolution. The SPO is derived from position-related weighting (PRW) and the difference between the reconstructed template and the reference template (DRR). The template can be one line or multiple lines. The PRW for each sample in the predictor to be refined is related to the template position and sample position according to the block size and video resolution. The SPO for each sample in the predictor to be refined is the sum of the DRR values multiplied by the sample’s PRWs. The SPO is then added to the predictor for each sample. Clipping may optionally be applied to the result before output.
[0157] 4.2 Classification by Slice QP
[0158] In another embodiment, a sample-based prediction offset (SPO) is used to refine the predictor with parameters according to the slice QP setting. The SPO is derived from position-related weighting (PRW) and the difference between the reconstructed template and the reference template (DRR). The template can be one line or multiple lines. The PRW for each sample in the predictor to be refined is related to the template position and sample position according to the block size and the slice QP setting. The SPO for each sample in the predictor to be refined is the sum of the DRR values multiplied by the sample’s PRWs. The SPO is then added to the predictor for each sample. Clipping may optionally be applied to the result before output.
[0159] 4.3 Classification by Slice TID Layer
[0160] In another embodiment, a sample-based prediction offset (SPO) is used to refine the predictor with parameters according to the slice TID (Temporal ID) value. The SPO is derived from position-related weighting (PRW) and the difference between the reconstructed template and the reference template (DRR). The template can be one line or multiple lines. The PRW for each sample in the predictor to be refined is related to the template position and sample position according to the block size and the slice TID value. The SPO for each sample in the predictor to be refined is the sum of the DRR values multiplied by the sample’s PRWs. The SPO is then added to the predictor for each sample. Clipping may optionally be applied to the result before output.
[0161] 4.4 Classification by Multi-Set Parameters
[0162] In another embodiment, a sample-based prediction offset (SPO) is used to refine the predictor according to the mixed settings mentioned in the proposed methods in Sections 4.1 to 4.3. The SPO is derived from position-related weighting (PRW) and the difference between the reconstructed template and the reference template (DRR), and is added to the predictor for each sample, following the same flow as in Section 4.1.
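One way to picture the mixed setting is as a key that selects which PRW parameter set to use. The sketch below is purely illustrative: the two-class split per setting and all three thresholds are hypothetical values chosen for the example, and the function name prw_set_key is not from the text.

```python
def prw_set_key(width, height, slice_qp, tid,
                area_threshold=1920 * 1080, qp_threshold=32, tid_threshold=3):
    """Return a (resolution class, QP class, TID class) key for PRW set selection.

    All thresholds are hypothetical; a real design could use more classes
    or different split points.
    """
    res_class = 1 if width * height >= area_threshold else 0
    qp_class = 1 if slice_qp >= qp_threshold else 0
    tid_class = 1 if tid >= tid_threshold else 0
    return (res_class, qp_class, tid_class)
```

The returned tuple could then index a collection of pre-trained PRW tables, one per combination of classes.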
[0163] 4.5 Classification by Combined Methods and Multi-Set Parameters
[0164] In another embodiment, a sample-based prediction offset (SPO) is used to refine the predictor with different models according to the mixed setting described in Section 4.4. The model is classified by the combined information described in Section 3.7. The SPO is derived from position-related weighting (PRW) and the difference between the reconstructed template and the reference template (DRR), and is added to the predictor for each sample following the same flow as in Section 4.1.
[0165] 5. Prediction Refinement Using Decoder-Side Available Coding Parameters
[0166] In this disclosure, we propose several numerical models for refining reference predictors. The proposed numerical model generates a sample-based prediction offset (SPO) to refine the predictor. The refinement applied to the predictor is adjusted adaptively based on coding parameters that are available at the decoder side.
[0167] Our proposed methods include three parts. The first part enumerates various numerical models used for refining the predictor. The second part lists the coding parameters that may be used for adaptively fine-tuning the SPO correction amount. The third part describes how to fine-tune the SPO based on the coding parameters available at the decoder side. We describe possible embodiments for each part in the following sections. A proposed method is formed by selecting one embodiment from each part.
[0168] 5.1 Numerical Models Used to Generate Sample-Based Prediction Offset
[0169] 5.1.1 Only using template differences to derive sample-based prediction offset
[0170] We derive a sample-based prediction offset to refine the predictor. In the first step, we calculate the template differences $Diff_{above,i}$ for the above template and $Diff_{left,j}$ for the left template, between the reconstructed templates ($Rec_{above,i}$ and $Rec_{left,j}$) of the current block and the reference templates ($Ref_{above,i}$ and $Ref_{left,j}$) of the predictor. The indices i and j are the horizontal and vertical positions relative to the top-left (TL) sample in the predictor. For example, i = 0 denotes the template sample above the TL sample and i = 1 denotes the template sample immediately to the right of sample i = 0.
$Diff_{above,i} = Rec_{above,i} - Ref_{above,i}$
$Diff_{left,j} = Rec_{left,j} - Ref_{left,j}$
[0171] In the second step, we obtain the position-based weight of each sample. The position-based weight of each sample in the predictor comes from pre-trained tables $LUT_{above,w,h,i,j}$ and $LUT_{left,w,h,i,j}$, selected according to the block width w and height h. The indices i and j correspond to the horizontal and vertical positions relative to the top-left (TL) sample in the predictor.
[0172] The prediction offset of each sample in the predictor, $Offset_{SPO,i,j}$, is the sum of $Diff_{above,i}$ multiplied by $LUT_{above,w,h,j}$ and $Diff_{left,j}$ multiplied by $LUT_{left,w,h,i}$. Assume the template size is M, where M is a positive integer, and N is a non-negative integer.
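The two steps above can be sketched in Python for the simplest case. This is only an illustration under stated assumptions: a single-line template (M = 1), one weight per row for the above-template difference and one weight per column for the left-template difference, and made-up weight values. The function names and the indexing of the weight lists are hypothetical and are not the pre-trained tables of the disclosure.

```python
def template_diffs(rec_above, ref_above, rec_left, ref_left):
    """Step 1: per-sample difference between reconstructed and reference templates."""
    diff_above = [rec - ref for rec, ref in zip(rec_above, ref_above)]
    diff_left = [rec - ref for rec, ref in zip(rec_left, ref_left)]
    return diff_above, diff_left


def spo_offsets(diff_above, diff_left, lut_above, lut_left):
    """Step 2: weighted sum of the two template differences for each sample.

    lut_above[j] weights the above-template difference for a sample in row j;
    lut_left[i] weights the left-template difference for a sample in column i.
    The result is indexed as offset[j][i].
    """
    width, height = len(diff_above), len(diff_left)
    return [
        [diff_above[i] * lut_above[j] + diff_left[j] * lut_left[i]
         for i in range(width)]
        for j in range(height)
    ]


# Example for a 2x2 block with illustrative weights that decay with distance.
diff_above, diff_left = template_diffs([10, 12], [8, 12], [5, 5], [4, 7])
offsets = spo_offsets(diff_above, diff_left, [0.5, 0.25], [0.5, 0.25])
```

In this example the sample nearest both templates receives the largest correction, which is the behaviour a distance-dependent weight is meant to produce.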
[0173] 5.1.2 Using template differences and predictor sample to derive sample-based prediction offset
[0174] Based on the description in Section 5.1.1, we further introduce the predictor sample and its neighbouring samples to calculate the sample-based prediction offset.
[0175] Assume the template size is M, where M is a positive integer, N is a non-negative integer, and S is a non-negative integer.
[0176] 5.2 Useful Coding Parameters Available at Decoder Side
[0177] In this section, we list some coding parameters that can be used to adaptively adjust the refinement generated by the numerical models described in Section 5.1. In one embodiment, block variance can be used. Specifically, we compute the variance of the predictor:
[0178] In another embodiment, we can simplify the calculation of block variance by using subsampling. Here is an example embodiment:
[0179] In another embodiment, we can replace block variance with local variance, which involves only a small portion of the predictor samples. Here is an example embodiment:
[0180] where h is a non-negative integer and h is smaller than the height of the block, H. In another example embodiment, predictor samples at the corner are used to compute the local variance:
[0181] where n is a non-negative integer and n is smaller than W / 2 and H / 2. In one embodiment, the mean absolute difference (MAD) of the block can be used.
[0182] In another embodiment, we can simplify the calculation of block MAD by using subsampling. According to another embodiment:
[0183] In another embodiment, we can replace block MAD with local MAD, which involves only a small portion of the predictor samples. Here is a corresponding embodiment: where h is a non-negative integer and h is smaller than the height of the block, H. In another embodiment, predictor samples at the corner are used to compute local MAD, where n is a non-negative integer and n is smaller than W / 2 and H / 2.
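The formulas for these statistics do not appear in the text above, so the following Python sketch relies on assumptions: variance and MAD are taken as the mean squared and mean absolute deviation from the sample mean, subsampling keeps every `step`-th sample in both directions, and the local variant uses rows 0 to h inclusive. The helper names are hypothetical, and the corner-based variant is not covered.

```python
def mean(samples):
    return sum(samples) / len(samples)


def variance(samples):
    """Mean squared deviation from the sample mean."""
    m = mean(samples)
    return sum((s - m) ** 2 for s in samples) / len(samples)


def mad(samples):
    """Mean absolute deviation from the sample mean (assumed MAD definition)."""
    m = mean(samples)
    return sum(abs(s - m) for s in samples) / len(samples)


def select_samples(pred, step=1, last_row=None):
    """Collect predictor samples from a 2-D block given as a list of rows.

    step     -- keep every step-th row and column (subsampled statistic)
    last_row -- if given, use only rows 0..last_row (local statistic)
    """
    rows = pred if last_row is None else pred[: last_row + 1]
    return [value for row in rows[::step] for value in row[::step]]


pred = [[1, 3], [5, 7]]
block_var = variance(select_samples(pred))          # full block
block_mad = mad(select_samples(pred))               # full block
local_mad = mad(select_samples(pred, last_row=0))   # top row only
```

Subsampling and the local variant both reduce the number of samples visited, trading accuracy of the statistic for lower computation.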
[0184] In another embodiment, we can replace block MAD with the average absolute difference between the reference template and the reconstructed template. Here is an example embodiment: we calculate the template differences $Diff_{above,i}$ for the above template and $Diff_{left,j}$ for the left template, between the reconstructed templates ($Rec_{above,i}$ and $Rec_{left,j}$) of the current block and the reference templates ($Ref_{above,i}$ and $Ref_{left,j}$) of the predictor. The indices i and j are the horizontal and vertical positions relative to the top-left (TL) sample in the predictor. For example, i = 0 denotes the template sample above the TL sample and i = 1 denotes the template sample immediately to the right of sample i = 0.
$Diff_{above,i} = Rec_{above,i} - Ref_{above,i}$
$Diff_{left,j} = Rec_{left,j} - Ref_{left,j}$
[0185] The average absolute difference between the reference template and the reconstructed template is another simplified method to evaluate block variance:
[0186] If the template size is larger than 1, we can rewrite the equation as follows: where M represents the template size and M is a positive integer larger than or equal to 1.
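A minimal Python sketch of this measure follows. It assumes that the average is taken over all above and left template samples across the M template lines; the exact normalization is not shown in the text above, and the function name `template_aad` is hypothetical.

```python
def template_aad(diff_above_lines, diff_left_lines):
    """Average absolute template difference over M template lines.

    Each argument is a list of M lines; every line is a list of differences
    (reconstructed template sample minus reference template sample).
    """
    lines = list(diff_above_lines) + list(diff_left_lines)
    total = sum(abs(d) for line in lines for d in line)
    count = sum(len(line) for line in lines)
    return total / count


# Single-line template (M = 1) for a 2x2 block.
aad_single = template_aad([[2, 0]], [[1, -2]])
# Two-line template (M = 2).
aad_double = template_aad([[2, 0], [4, 0]], [[1, -2], [1, 2]])
```

Because it reuses the template differences that the SPO derivation already needs, this measure adds little extra computation.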
[0187] In addition to the embodiments mentioned above, we can also adaptively adjust the refinement applied to the predictor based on block size, block area, temporal ID or slice QP.
[0188] 5.3 Relationship between Coding Parameter and Refinement Applied on Predictor
[0189] As mentioned in Section 5.1, the output of the numerical model is $Offset_{SPO,i,j}$, which represents the sample-based prediction offset for the predictor at (i, j). The final predictor will be
$Pred_{i,j} = Pred_{i,j} + Offset_{SPO,i,j} \times Scale(\text{coding parameters})$
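The update of the predictor can be sketched in Python as below. The sketch assumes that the scale is a plain multiplicative factor, that results are rounded half up to integers, and that clipping to the sample range is applied only when a bit depth is supplied. The rounding rule, the `bit_depth` parameter and the function name `apply_spo` are assumptions made for illustration.

```python
import math


def apply_spo(pred, offset, scale, bit_depth=None):
    """Return pred + offset * scale per sample, with optional clipping.

    pred and offset are 2-D blocks given as lists of rows of equal size.
    """
    max_val = None if bit_depth is None else (1 << bit_depth) - 1
    refined = []
    for pred_row, offset_row in zip(pred, offset):
        row = []
        for p, o in zip(pred_row, offset_row):
            value = math.floor(p + o * scale + 0.5)  # round half up
            if max_val is not None:
                value = min(max(value, 0), max_val)  # clip to sample range
            row.append(value)
        refined.append(row)
    return refined
```

With a scale of 0 the predictor is left unchanged, so the scale function effectively switches the refinement on, off, or anywhere in between.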
[0190] Scale is a function of decoder-side available coding parameters. In this section, we will enumerate how to implement the scale function. The relationship between Scale and coding parameters mainly involves two types: linear transformation and nonlinear transformation. The nonlinear transformation part can be presented using a lookup table.
[0191] In one embodiment, scale is positively correlated with block MAD, and the relationship between them is linear. According to one embodiment:
$Scale = s \times (\text{block MAD} - c) + b$
$Scale = clip(Scale, min = 0)$
where s represents the slope of the linear transform and s is a positive floating number, and c and b are positive floating numbers.
[0192] In another embodiment, we can use different linear transforms for different block areas to obtain the scale.
[0193] According to another embodiment:
$Scale = s \times (\text{block MAD} - c) + b$
$Scale = clip(Scale, min = 0)$
[0194] If the block area is larger than 16×16, (s, b, c) = (0.5, 96, 0). Otherwise, (s, b, c) = (0.3, 96, 0).
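The linear mapping and its area-dependent parameters can be written as a short Python sketch. The parameter triples are those given above; the function names are hypothetical, and "larger than 16×16" is interpreted here as width × height > 256. How the resulting value is normalized before it multiplies the offset is not stated in this section, so the sketch returns the raw value.

```python
def linear_scale(block_mad, s, b, c):
    """Scale = s * (block MAD - c) + b, clipped to a minimum of 0."""
    return max(0.0, s * (block_mad - c) + b)


def scale_for_block(block_mad, width, height):
    """Pick (s, b, c) by block area, then apply the linear mapping."""
    if width * height > 16 * 16:
        s, b, c = 0.5, 96.0, 0.0
    else:
        s, b, c = 0.3, 96.0, 0.0
    return linear_scale(block_mad, s, b, c)
```

For example, a 32×32 block with a block MAD of 8 yields 0.5 × 8 + 96 = 100, while the lower clip only takes effect when the linear term drops below zero.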
[0195] In another embodiment, scale is positively correlated with block MAD, and the relationship between them is non-linear. The nonlinear transformation will be presented using a lookup table, as shown in Table 3, where d is a positive floating number. Table 3: Nonlinear relationship between scale and block MAD
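Table 3 is not reproduced in this text. Purely as an illustration of how a lookup table can represent a non-linear, non-decreasing mapping from block MAD to scale, the Python sketch below uses made-up interval boundaries and scale values; none of the numbers come from Table 3, and the names are hypothetical.

```python
from bisect import bisect_right

# Hypothetical interval boundaries on block MAD and one scale value per interval.
MAD_THRESHOLDS = [2.0, 4.0, 8.0]
SCALE_VALUES = [0.0, 0.5, 0.75, 1.0]  # non-decreasing: larger MAD, larger scale


def lut_scale(block_mad):
    """Return the scale of the interval that contains block_mad."""
    return SCALE_VALUES[bisect_right(MAD_THRESHOLDS, block_mad)]
```

A table of this kind needs only a few comparisons per block and avoids any multiplication when deriving the scale.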
[0196] There is another way to adaptively adjust the refinement amount applied to the predictor. For the numerical model described in Section 5.1.1, we can refer to the following equation: $Pred_{i,j} = Pred_{i,j} + Offset_{SPO,i,j}$;
[0197] For the numerical model described in Section 5.1.2, we can refer to the following equation: $Pred_{i,j} = Pred_{i,j} + Offset_{SPO,i,j}$, where $codingParam_{above}$ and $codingParam_{left}$ can be different or the same.
[0198] According to yet another embodiment, Scale is positively and linearly correlated with local MAD. The relationship is as follows:
$Scale_{above} = s \times (\text{local MAD}_{above} - c) + b$
$Scale_{above} = clip(Scale_{above}, min = 0)$
$Scale_{left} = s \times (\text{local MAD}_{left} - c) + b$
$Scale_{left} = clip(Scale_{left}, min = 0)$
[0199] The calculation method for local MAD is as follows: where h is a non-negative integer smaller than the height of the block, H, and w is a non-negative integer smaller than the width of the block, W.
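Since the local MAD formulas are not shown above, the following Python sketch makes its assumptions explicit: the local MAD for the above direction is computed over predictor rows 0 to h inclusive, the local MAD for the left direction over columns 0 to w inclusive, and MAD is the mean absolute deviation from the mean of the selected samples. The function names are hypothetical.

```python
def mad(samples):
    m = sum(samples) / len(samples)
    return sum(abs(s - m) for s in samples) / len(samples)


def local_mads(pred, h, w):
    """Local MAD near the above template (rows 0..h) and left template (cols 0..w)."""
    top = [value for row in pred[: h + 1] for value in row]
    left = [value for row in pred for value in row[: w + 1]]
    return mad(top), mad(left)


def directional_scales(pred, h, w, s, b, c):
    """Separate linear scales for the above and left contributions, clipped at 0."""
    mad_above, mad_left = local_mads(pred, h, w)
    scale_above = max(0.0, s * (mad_above - c) + b)
    scale_left = max(0.0, s * (mad_left - c) + b)
    return scale_above, scale_left


pred = [[1, 3], [5, 7]]
scale_above, scale_left = directional_scales(pred, h=0, w=0, s=0.5, b=1.0, c=0.0)
```

Using one scale per direction lets the above-template and left-template contributions be weighted differently when the predictor is smoother along one edge than the other.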
[0200] Our proposed method can be applied not only to the luma component but also to the chroma component.
[0201] The scale function for luma and the scale function for chroma can be different or the same.
[0202] Alternatively, there can be a linear relationship between the chroma scale and the luma scale, such as Scalechroma=a×Scaleluma+b.
[0203] Any of the foregoing proposed methods of adaptive prediction refinement can be implemented in encoders and / or decoders. For example, any of the proposed methods can be implemented in an inter / intra / prediction module of an encoder, and / or an inter / intra / prediction module of a decoder. Alternatively, any of the proposed methods can be implemented as a circuit coupled to the inter / intra / prediction module of the encoder and / or the inter / intra / prediction module of the decoder, so as to provide the information needed by the inter / intra / prediction module.
[0204] With reference to the exemplary encoder and decoder in Fig. 1A and Fig. 1B, the proposed methods can be implemented in the intra / inter prediction modules. For example, at the encoder side, the required processing can be implemented as part of the Inter-Pred. unit 112 or Intra Pred. unit 110 as shown in Fig. 1A. However, the encoder may also use an additional processing unit to implement the required processing. At the decoder side, the required processing can be implemented as part of the MC unit 152 or Intra Pred. 150 as shown in Fig. 1B. However, the decoder may also use an additional processing unit to implement the required processing. While the Inter-Pred. 112 and Intra Pred. 110 at the encoder side and MC 152 and Intra Pred. 150 at the decoder side are shown as individual processing units, they may correspond to executable software or firmware codes stored on a medium, such as a hard disk or flash memory, for a CPU (Central Processing Unit) or programmable devices (e.g. DSP (Digital Signal Processor) or FPGA (Field Programmable Gate Array)).
[0205] Fig. 11 illustrates a flowchart of an exemplary video coding system that adjusts prediction refinement based on coding parameters according to an embodiment of the present invention. The steps shown in the flowchart may be implemented as program codes executable on one or more processors (e.g., one or more CPUs) at the encoder side. The steps shown in the flowchart may also be implemented based on hardware such as one or more electronic devices or processors arranged to perform the steps in the flowchart. According to one method, input data associated with a current block is received in step 1110, wherein the input data comprise pixel data for the current block to be encoded at an encoder side or coded data associated with the current block to be decoded at a decoder side. One or more sets of Position-Related Weights (PRWs) for template samples in one or more templates and prediction samples in the current block are determined in step 1120, wherein each weight of each set of said one or more sets of PRWs is dependent on a first position of a target template sample in said one or more templates and / or a second position of each of the prediction samples in the current block. An SPO (Sample-Based Prediction Offset) is derived for a target prediction sample in the current block in step 1130, wherein the SPO comprises a sum of template differences weighted by said one or more sets of respective PRWs, wherein the template differences are derived based on the template samples. One or more coding parameters associated with the current block are determined for the target prediction sample in step 1140. The SPO is adjusted according to said one or more coding parameters to derive an adjusted SPO in step 1150. The target prediction sample is refined using the adjusted SPO to generate a refined prediction sample in step 1160. The current block is encoded or decoded by using information comprising the refined prediction sample in step 1170.
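Steps 1120 to 1160 can be strung together in a compact Python sketch. It is an illustration only and makes several assumptions: a single-line template, one weight per row and per column as the PRWs, block MAD of the predictor as the coding parameter, and a caller-supplied `scale_fn` that turns the coding parameter into a multiplicative scale. The function name `refine_block` and all numeric values are hypothetical.

```python
def refine_block(pred, rec_above, ref_above, rec_left, ref_left,
                 prw_above, prw_left, scale_fn):
    """Refine a 2-D predictor block (list of rows) with a scaled SPO."""
    height, width = len(pred), len(pred[0])
    # Template differences used by step 1130.
    diff_above = [rec - ref for rec, ref in zip(rec_above, ref_above)]
    diff_left = [rec - ref for rec, ref in zip(rec_left, ref_left)]
    # Step 1140: coding parameter, here the block MAD of the predictor.
    flat = [value for row in pred for value in row]
    mean = sum(flat) / len(flat)
    block_mad = sum(abs(value - mean) for value in flat) / len(flat)
    scale = scale_fn(block_mad)
    refined = []
    for j in range(height):
        row = []
        for i in range(width):
            # Step 1130: SPO as a PRW-weighted sum of template differences.
            spo = diff_above[i] * prw_above[j] + diff_left[j] * prw_left[i]
            # Steps 1150 and 1160: adjust the SPO, then refine the sample.
            row.append(pred[j][i] + scale * spo)
        refined.append(row)
    return refined


refined = refine_block(
    pred=[[10, 10], [10, 10]],
    rec_above=[12, 10], ref_above=[10, 10],
    rec_left=[11, 8], ref_left=[10, 10],
    prw_above=[0.5, 0.25], prw_left=[0.5, 0.25],
    scale_fn=lambda block_mad: 1.0,  # constant scale, for illustration
)
```

Every input used here is available at both the encoder and the decoder, so the same routine could run on either side without extra signalling.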
[0206] Fig. 12 illustrates a flowchart of an exemplary video coding system that uses multi-set PRWs selected based on classification for prediction refinement according to an embodiment of the present invention. According to this method, input data associated with a current block is received in step 1210, wherein the input data comprise pixel data for the current block to be encoded at an encoder side or coded data associated with the current block to be decoded at a decoder side. Classification for the current block to select a target class among multiple classes is determined in step 1220. An SPO (Sample-Based Prediction Offset) for a target prediction sample in the current block is derived according to the target class in step 1230, wherein multiple Position-Related Weights (PRWs) are used for the multiple classes and the SPO comprises a sum of template differences weighted by a target PRWs associated with the target class, wherein the template differences are derived based on reconstructed samples and reference samples of said one or more templates. The target prediction sample is refined using the SPO to generate a refined prediction sample in step 1240. The current block is encoded or decoded by using information comprising the refined prediction sample in step 1250.
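The class-dependent selection of steps 1220 and 1230 can be illustrated with a small Python sketch. It assumes classification by skip mode and coded block flag (CBF), one of the criteria named in the claims, with three classes and made-up weights; the class names, the table `PRW_SETS` and its values are hypothetical.

```python
# Hypothetical PRW sets, one per class: (weights for the above-template
# difference indexed by row j, weights for the left-template difference
# indexed by column i). Values are illustrative only.
PRW_SETS = {
    "skip": ([0.5, 0.25], [0.5, 0.25]),
    "no_residual": ([0.25, 0.125], [0.25, 0.125]),
    "residual": ([0.125, 0.0], [0.125, 0.0]),
}


def classify_block(is_skip, cbf):
    """Step 1220: select a target class from skip mode and CBF."""
    if is_skip:
        return "skip"
    return "residual" if cbf else "no_residual"


def spo_for_sample(i, j, diff_above, diff_left, target_class):
    """Step 1230: SPO for the sample at column i, row j using the class's PRWs."""
    prw_above, prw_left = PRW_SETS[target_class]
    return diff_above[i] * prw_above[j] + diff_left[j] * prw_left[i]
```

Because the classification uses information that the decoder already has, the choice of PRW set does not have to be signalled in the bitstream.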
[0207] The flowcharts shown are intended to illustrate an example of video coding according to the present invention. A person skilled in the art may modify each step, rearrange the steps, split a step, or combine steps to practice the present invention without departing from the spirit of the present invention. In the disclosure, specific syntax and semantics have been used to illustrate examples to implement embodiments of the present invention. A skilled person may practice the present invention by substituting the syntax and semantics with equivalent syntax and semantics without departing from the spirit of the present invention.
[0208] The above description is presented to enable a person of ordinary skill in the art to practice the present invention as provided in the context of a particular application and its requirements. Various modifications to the described embodiments will be apparent to those with skill in the art, and the general principles defined herein may be applied to other embodiments. Therefore, the present invention is not intended to be limited to the particular embodiments shown and described, but is to be accorded the widest scope consistent with the principles and novel features herein disclosed. In the above detailed description, various specific details are illustrated in order to provide a thorough understanding of the present invention. Nevertheless, it will be understood by those skilled in the art that the present invention may be practiced without these specific details.
[0209] Embodiments of the present invention as described above may be implemented in various hardware, software codes, or a combination of both. For example, an embodiment of the present invention can be one or more circuits integrated into a video compression chip or program code integrated into video compression software to perform the processing described herein. An embodiment of the present invention may also be program code to be executed on a Digital Signal Processor (DSP) to perform the processing described herein. The invention may also involve a number of functions to be performed by a computer processor, a digital signal processor, a microprocessor, or a field programmable gate array (FPGA). These processors can be configured to perform particular tasks according to the invention, by executing machine-readable software code or firmware code that defines the particular methods embodied by the invention. The software code or firmware code may be developed in different programming languages and different formats or styles. The software code may also be compiled for different target platforms. However, different code formats, styles and languages of software codes and other means of configuring code to perform the tasks in accordance with the invention will not depart from the spirit and scope of the invention.
[0210] The invention may be embodied in other specific forms without departing from its spirit or essential characteristics. The described examples are to be considered in all respects only as illustrative and not restrictive. The scope of the invention is, therefore, indicated by the appended claims rather than by the foregoing description. All changes which come within the meaning and range of equivalency of the claims are to be embraced within their scope.
Claims
1. A method of video coding, the method comprising: receiving input data associated with a current block, wherein the input data comprise pixel data for the current block to be encoded at an encoder side or coded data associated with the current block to be decoded at a decoder side; determining one or more sets of Position-Related Weights (PRWs) for template samples in one or more templates and prediction samples in the current block, wherein each weight of each set of said one or more sets of PRWs is dependent on a first position of a target template sample in said one or more templates and / or a second position of each of the prediction samples in the current block; deriving an SPO (Sample-Based Prediction Offset) for a target prediction sample in the current block, wherein the SPO comprises a sum of template differences weighted by said one or more sets of respective PRWs, wherein the template differences are derived based on the template samples; determining one or more coding parameters associated with the current block for the target prediction sample; adjusting the SPO according to said one or more coding parameters to derive an adjusted SPO; refining the target prediction sample using the adjusted SPO to generate a refined prediction sample; and encoding or decoding the current block by using information comprising the refined prediction sample.
2. The method of Claim 1, wherein the SPO further comprises a second weighted sum of the target prediction sample and neighbouring prediction samples of the target prediction sample.
3. The method of Claim 1, wherein said one or more coding parameters comprise a block variance or a mean of absolute differences of the prediction samples of the current block.
4. The method of Claim 3, wherein the block variance or the mean of absolute differences of the prediction samples of the current block is derived based on down-sampled prediction samples.
5. The method of Claim 1, wherein said one or more coding parameters comprise a local variance or a local mean of absolute differences of the target prediction sample derived based on the target prediction sample and one or more neighbouring prediction samples of the target prediction sample.
6. The method of Claim 1, wherein said one or more coding parameters comprise an average of absolute differences between a reference template and a reconstructed template.
7. The method of Claim 1, wherein said one or more coding parameters comprise block size, block area, temporal ID or slice QP.
8. The method of Claim 1, wherein a scale is derived based on said one or more coding parameters according to a linear function or a non-linear function, and the SPO is scaled by the scale to derive the adjusted SPO.
9. The method of Claim 8, wherein the scale is clipped to a range before the scale is used to adjust the SPO.
10. The method of Claim 8, wherein when the scale is derived based on said one or more coding parameters according to the non-linear function, values of the scale for said one or more coding parameters are stored in a lookup table.
11. An apparatus for video coding, the apparatus comprising one or more electronics or processors arranged to: receive input data associated with a current block, wherein the input data comprise pixel data for the current block to be encoded at an encoder side or coded data associated with the current block to be decoded at a decoder side; determine one or more sets of Position-Related Weights (PRWs) for template samples in one or more templates and prediction samples in the current block, wherein each weight of each set of said one or more sets of PRWs is dependent on a first position of a target template sample in said one or more templates and / or a second position of each of the prediction samples in the current block; derive an SPO (Sample-Based Prediction Offset) for a target prediction sample in the current block, wherein the SPO comprises a sum of template differences weighted by said one or more sets of respective PRWs, wherein the template differences are derived based on the template samples; determine one or more coding parameters associated with the current block for the target prediction sample; adjust the SPO according to said one or more coding parameters to derive an adjusted SPO; refine the target prediction sample using the adjusted SPO to generate a refined prediction sample; and encode or decode the current block by using the refined prediction samples.
12. A method of video coding, the method comprising: receiving input data associated with a current block, wherein the input data comprise pixel data for the current block to be encoded at an encoder side or coded data associated with the current block to be decoded at a decoder side; determining classification for the current block to select a target class among multiple classes; deriving an SPO (Sample-Based Prediction Offset) for a target prediction sample in the current block according to the target class, wherein multiple Position-Related Weights (PRWs) are used for the multiple classes and the SPO comprises a sum of template differences weighted by a target PRWs associated with the target class, wherein the template differences are derived based on reconstructed samples and reference samples of said one or more templates; refining the target prediction sample using the SPO to generate a refined prediction sample; and encoding or decoding the current block by using information comprising the refined prediction sample.
13. The method of Claim 12, wherein the classification for the current block is determined according to whether the current block is coded in skip mode, CBF (Coded Block Flag) flag, or both.
14. The method of Claim 12, wherein the classification for the current block is determined according to block variance, local variance, intensity of one or more samples, or a combination thereof.
15. The method of Claim 12, wherein the classification for the current block is determined according to template differences, similarity of template differences between reconstructed template samples and reference template samples, similarity of differences between the target prediction sample and the reference template samples, or a combination thereof.
16. The method of Claim 12, wherein the classification for the current block is determined according to slice video resolution, slice QP (Quantization Parameter), slice TID (Temporal ID), or a combination thereof.
17. An apparatus for video coding, the apparatus comprising one or more electronics or processors arranged to: receive input data associated with a current block, wherein the input data comprise pixel data for the current block to be encoded at an encoder side or coded data associated with the current block to be decoded at a decoder side; determine classification for the current block to select a target class among multiple classes; derive an SPO (Sample-Based Prediction Offset) for a target prediction sample in the current block according to the target class, wherein multiple Position-Related Weights (PRWs) are used for the multiple classes and the SPO comprises a sum of template differences weighted by a target PRWs associated with the target class, wherein the template differences are derived based on reconstructed samples and reference samples of said one or more templates; refine the target prediction sample using the SPO to generate a refined prediction sample; and encode or decode the current block by using information comprising the refined prediction sample.