Methods and apparatus of prediction blending using neighbouring samples for video coding
By blending predictors at neighboring locations using specific shapes and weights, the method addresses inefficiencies in existing video coding technologies, enhancing prediction accuracy and coding efficiency for complex video content.
Patent Information
- Authority / Receiving Office
- WO · WO
- Patent Type
- Applications
- Current Assignee / Owner
- MEDIATEK INC
- Filing Date
- 2025-12-24
- Publication Date
- 2026-07-02
AI Technical Summary
Existing video coding technologies face challenges in improving prediction accuracy and coding efficiency, particularly in handling complex video content with high computational complexity and bit-depth requirements in processes like bi-directional optical flow and geometric partitioning.
The method involves generating a blended predictor by combining predictors at neighboring locations of a target sample using blending weights, which can include neighboring samples within a specific footprint shape, and applying this approach in various coding modes such as bi-prediction with CU-level weights, geometric partitioning, combined inter and intra prediction, and intra fusion.
This approach enhances prediction accuracy and coding efficiency by reducing computational complexity and bit-depth requirements, leading to improved video quality and compression performance.
Description
METHODS AND APPARATUS OF PREDICTION BLENDING USING NEIGHBOURING SAMPLES FOR VIDEO CODING
CROSS REFERENCE TO RELATED APPLICATIONS
[0001] The present invention is a non-Provisional Application of and claims priority to U.S. Provisional Patent Application No. 63/738,633, filed on December 24, 2024. The U.S. Provisional Patent Application is hereby incorporated by reference in its entirety.
FIELD OF THE INVENTION
[0002] The present invention relates to video coding using a blended predictor. In particular, the present invention discloses blended prediction using predictors at neighbouring locations of a target sample to be predicted.
BACKGROUND AND RELATED ART
[0003] Versatile Video Coding (VVC) is the latest international video coding standard developed by the Joint Video Experts Team (JVET) of the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG). The standard has been published as an ISO standard: ISO/IEC 23090-3:2021, Information technology - Coded representation of immersive media - Part 3: Versatile video coding, published Feb. 2021. VVC was developed based on its predecessor HEVC (High Efficiency Video Coding) by adding more coding tools to improve coding efficiency and to handle various types of video sources, including 3-dimensional (3D) video signals.
[0004] Fig. 1A illustrates an exemplary adaptive Inter / Intra video encoding system incorporating loop processing. For Intra Prediction 110, the prediction data is derived based on previously coded video data in the current picture. For Inter Prediction 112, Motion Estimation (ME) is performed at the encoder side and Motion Compensation (MC) is performed based on the result of ME to provide prediction data derived from other picture (s) and motion data. Switch 114 selects Intra Prediction 110 or Inter Prediction 112 and the selected prediction data is supplied to Adder 116 to form prediction errors, also called residues. The prediction error is then processed by Transform (T) 118 followed by Quantization (Q) 120. The transformed and quantized residues are then coded by Entropy Encoder 122 to be included in a video bitstream corresponding to the compressed video data. The bitstream associated with the transform coefficients is then packed with side information such as motion and coding modes associated with Intra prediction and Inter prediction, and other information such as parameters associated with loop filters applied to underlying image area. The side information associated with Intra Prediction 110, Inter prediction 112 and in-loop filter 130, is provided to Entropy Encoder 122 as shown in Fig. 1A. When an Inter-prediction mode is used, a reference picture or pictures have to be reconstructed at the encoder end as well. Consequently, the transformed and quantized residues are processed by Inverse Quantization (IQ) 124 and Inverse Transformation (IT) 126 to recover the residues. The residues are then added back to prediction data 136 at Reconstruction (REC) 128 to reconstruct video data. The reconstructed video data may be stored in Reference Picture Buffer 134 and used for prediction of other frames.
[0005] As shown in Fig. 1A, incoming video data undergoes a series of processing in the encoding system. The reconstructed video data from REC 128 may be subject to various impairments due to a series of processing. Accordingly, in-loop filter 130 is often applied to the reconstructed video data before the reconstructed video data are stored in the Reference Picture Buffer 134 in order to improve video quality. For example, deblocking filter (DF) , Sample Adaptive Offset (SAO) and Adaptive Loop Filter (ALF) may be used. The loop filter information may need to be incorporated in the bitstream so that a decoder can properly recover the required information. Therefore, loop filter information is also provided to Entropy Encoder 122 for incorporation into the bitstream. In Fig. 1A, Loop filter 130 is applied to the reconstructed video before the reconstructed samples are stored in the reference picture buffer 134. The system in Fig. 1A is intended to illustrate an exemplary structure of a typical video encoder. It may correspond to the High Efficiency Video Coding (HEVC) system, VP8, VP9, H. 264 or VVC.
[0006] The decoder, as shown in Fig. 1B, can use some of the same functional blocks as the encoder. For example, the decoder can reuse Inverse Quantization 124 and Inverse Transform 126; however, Transform 118 and Quantization 120 are not needed at the decoder. Instead of Entropy Encoder 122, the decoder uses an Entropy Decoder 140 to decode the video bitstream into quantized transform coefficients and the needed coding information (e.g. ILPF information, Intra prediction information and Inter prediction information). Intra Prediction 150 at the decoder side does not need to perform the mode search. Instead, the decoder only needs to generate Intra prediction according to the Intra prediction information received from the Entropy Decoder 140. Furthermore, for Inter prediction, the decoder only needs to perform motion compensation (MC 152) according to the Inter prediction information received from the Entropy Decoder 140, without the need for motion estimation.
[0007] In VVC, the Sequence Parameter Set (SPS) and the Picture Parameter Set (PPS) contain high-level syntax elements that apply to entire coded video sequences and pictures, respectively. The Picture Header (PH) and Slice Header (SH) contain high-level syntax elements that apply to a current coded picture and a current coded slice, respectively.
[0008] In VVC, a coded picture is partitioned into non-overlapped square block regions represented by the associated coding tree units (CTUs) . A coded picture can be represented by a collection of slices, each comprising an integer number of CTUs. The individual CTUs in a slice are processed in raster-scan order. A bi-predictive (B) slice may be decoded using intra prediction or inter prediction with at most two motion vectors and reference indices to predict the sample values of each block. A predictive (P) slice is decoded using intra prediction or inter prediction with at most one motion vector and reference index to predict the sample values of each block. An intra (I) slice is decoded using intra prediction only.
[0009] Bi-Prediction with CU-level Weight (BCW)
[0010] In HEVC, the bi-prediction signal, P_bi-pred, is generated by averaging two prediction signals, P0 and P1, obtained from two different reference pictures and/or using two different motion vectors. In VVC, the bi-prediction mode is extended beyond simple averaging to allow weighted averaging of the two prediction signals:
$$P_{\text{bi-pred}} = \big((8 - w) \cdot P_0 + w \cdot P_1 + 4\big) \gg 3 \qquad (3)$$
[0011] Five weights are allowed in the weighted averaging bi-prediction, w ∈ {-2, 3, 4, 5, 10}. For each bi-predicted CU, the weight w is determined in one of two ways: 1) for a non-merge CU, the weight index is signalled after the motion vector difference; 2) for a merge CU, the weight index is inferred from neighbouring blocks based on the merge candidate index. BCW is only applied to CUs with 256 or more luma samples (i.e., CU width times CU height is greater than or equal to 256). For low-delay pictures, all 5 weights are used. For non-low-delay pictures, only 3 weights (w ∈ {3, 4, 5}) are used. At the encoder, fast search algorithms are applied to find the weight index without significantly increasing the encoder complexity. These algorithms are summarized as follows. The details are disclosed in the VTM software and document JVET-L0646 (Yu-Chi Su, et al., “CE4-related: Generalized bi-prediction improvements combined from JVET-L0197 and JVET-L0296”, Joint Video Experts Team (JVET) of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29, 12th Meeting: Macao, CN, 3–12 Oct. 2018, Document: JVET-L0646).
– When combined with AMVR, unequal weights are only conditionally checked for 1-pel and 4-pel motion vector precisions if the current picture is a low-delay picture.
– When combined with affine, affine ME will be performed for unequal weights if and only if the affine mode is selected as the current best mode.
– When the two reference pictures in bi-prediction are the same, unequal weights are only conditionally checked.
– Unequal weights are not searched when certain conditions are met, depending on the POC distance between the current picture and its reference pictures, the coding QP, and the temporal level.
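The weighted averaging of equation (3) with the five allowed weights can be sketched as follows. This is a minimal illustration only: the function name `bcw_blend` and the flat-list sample layout are assumptions made here, and clipping of the result to the valid sample range is omitted.

```python
# Allowed values of w, as listed in the text above.
BCW_WEIGHTS = (-2, 3, 4, 5, 10)


def bcw_blend(p0, p1, w):
    """Blend two predictor sample lists with ((8 - w) * P0 + w * P1 + 4) >> 3."""
    if w not in BCW_WEIGHTS:
        raise ValueError(f"unsupported BCW weight: {w}")
    # Python's >> on integers is an arithmetic shift, so negative
    # intermediate sums (possible with w = -2 or w = 10) are handled.
    return [((8 - w) * a + w * b + 4) >> 3 for a, b in zip(p0, p1)]
```

With w = 4 the expression reduces to the rounded average of the two predictors, which is the equal-weight case.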
[0012] The BCW weight index is coded using one context coded bin followed by bypass coded bins. The first context coded bin indicates if equal weight is used; and if unequal weight is used, additional bins are signalled using bypass coding to indicate which unequal weight is used.
[0013] Weighted prediction (WP) is a coding tool supported by the H. 264 / AVC and HEVC standards to efficiently code video content with fading. Support for WP is also added into the VVC standard. WP allows weighting parameters (weight and offset) to be signalled for each reference picture in each of the reference picture lists L0 and L1. Then, during motion compensation, the weight (s) and offset (s) of the corresponding reference picture (s) are applied. WP and BCW are designed for different types of video content. In order to avoid interactions between WP and BCW, which will complicate VVC decoder design, if a CU uses WP, then the BCW weight index is not signalled, and weight w is inferred to be 4 (i.e. equal weight is applied) . For a merge CU, the weight index is inferred from neighbouring blocks based on the merge candidate index. This can be applied to both the normal merge mode and inherited affine merge mode. For the constructed affine merge mode, the affine motion information is constructed based on the motion information of up to 3 blocks. The BCW index for a CU using the constructed affine merge mode is simply set equal to the BCW index of the first control point MV.
[0014] In VVC, CIIP and BCW cannot be jointly applied to a CU. When a CU is coded with CIIP mode, the BCW index of the current CU is set to 2 (i.e., w=4 for equal weight). Equal weight is the default value for the BCW index.
[0015] Bi-directional Optical Flow (BIO) / BDOF
[0016] Bi-directional optical flow (BIO or BDOF) is a motion estimation/compensation technique disclosed in JCTVC-C204 (E. Alshina, et al., Bi-directional optical flow, Joint Collaborative Team on Video Coding (JCT-VC) of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11, 3rd Meeting: Guangzhou, CN, 7-15 October, 2010, Document: JCTVC-C204) and VCEG-AZ05 (E. Alshina, et al., Known tools performance investigation for next generation video coding, ITU-T SG 16 Question 6, Video Coding Experts Group (VCEG), 52nd Meeting: 19–26 June 2015, Warsaw, Poland, Document: VCEG-AZ05). BIO derives the sample-level motion refinement based on the assumptions of optical flow and steady motion, as shown in Fig. 2, where a current pixel 222 in a B-slice (bi-prediction slice) 220 is predicted by pixel A 232 in reference picture 0 (230) and pixel B 212 in reference picture 1 (210). In Fig. 2, vx and vy are the pixel displacements in the x-direction and y-direction, which are derived using a bi-directional optical flow (BIO) model. BIO is applied only to truly bi-directionally predicted blocks, which are predicted from two reference pictures corresponding to the previous picture and the latter picture. In VCEG-AZ05, BIO utilizes a 5x5 window to derive the motion refinement of each sample. Therefore, for an NxN block, the motion compensated results and corresponding gradient information of an (N+4)x(N+4) block are required to derive the sample-based motion refinement for the NxN block. According to VCEG-AZ05, a 6-tap gradient filter and a 6-tap interpolation filter are used to generate the gradient information for BIO. Therefore, the computational complexity of BIO is much higher than that of traditional bi-directional prediction. In order to further improve the performance of BIO, the following methods are proposed.
[0017] In conventional bi-prediction in HEVC, the predictor is generated using equation (1), where P^(0) and P^(1) are the list0 and list1 predictors, respectively:
$$P_{\text{Conventional}}[i,j] = \big(P^{(0)}[i,j] + P^{(1)}[i,j] + 1\big) \gg 1 \qquad (1)$$
[0018] In JCTVC-C204 and VCEG-AZ05, the BIO predictor is generated using equation (2):
$$P_{\text{OpticalFlow}} = \big(P^{(0)}[i,j] + P^{(1)}[i,j] + v_x[i,j]\,(I_x^{(0)} - I_x^{(1)}[i,j]) + v_y[i,j]\,(I_y^{(0)} - I_y^{(1)}[i,j]) + 1\big) \gg 1 \qquad (2)$$
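To make the difference between equations (1) and (2) concrete, the following per-sample sketch evaluates both. It assumes integer inputs for a single sample position and takes the displacements vx and vy as given; their derivation is not covered here. The function and parameter names are illustrative, and any fixed-point scaling of vx, vy or the gradients used in a real codec is omitted.

```python
def conventional_bipred(p0, p1):
    """Equation (1): rounded average of the list0 and list1 predictors."""
    return (p0 + p1 + 1) >> 1


def optical_flow_bipred(p0, p1, vx, vy, gx0, gx1, gy0, gy1):
    """Equation (2): equation (1) plus gradient-based correction terms.

    gx0/gx1 are the x-directional gradients of the list0/list1 predictors,
    gy0/gy1 the y-directional gradients (all for the same sample position).
    """
    correction = vx * (gx0 - gx1) + vy * (gy0 - gy1)
    return (p0 + p1 + correction + 1) >> 1
```

When vx and vy are both zero, the optical-flow predictor reduces to the conventional one.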
[0019] In equation (2) , Ix (0) and Ix (1) represent the x-directional gradient in list0 and list1 predictor, respectively; Iy (0) and Iy (1) represent the y-directional gradient in list0 and list1 predictor, respectively; vx and vy represent the offsets or displacements in x-and y-direction, respectively. The derivation process of vx and vy is shown in the following. First, the cost function is defined as diffCost (x, y) to find the best values vx and vy. In order to find the best values vx and vy to minimize the cost function, diffCost (x, y) , one 5x5 window is used. The solutions of vx and vy can be represented by using S1, S2, S3, S5, and S6.
[0020] The minimum cost function, min diffCost (x, y) can be derived according to:
[0021] By solving equations (3) and (4) , vx and vy can be solved according to eqn. (5) : where,
[0022] In some related art, the S2 can be ignored and then we can further simplify the equations as: where
[0023] We can find that the required bit-depth is large in BIO process, especially for calculating S1, S2, S3, S5, and S6. For example, if the bit-depth of pixel value in video sequences is 10 bits and the bit-depth of gradients is increased by fractional interpolation filter or gradient filter, then 16 bits are required to represent one x-directional gradient or one y-directional gradient. These 16 bits may be further reduced by gradient shift equal to 4, so one gradient needs 12 bits to represent the value. Even if the magnitude of gradient can be reduced to 12 bits by gradient shift, the required bit-depth of BIO operations is still large. One multiplier with 13 bits by 13 bits is required to calculate S1, S2, and S5, and another multiplier with 13 bits by 17 bits is required to get S3, and S6. When the window size is large, more than 32 bits may be required to represent S1, S2, S3, S5, and S6.
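The bit-width concern can be checked with simple arithmetic. The sketch below is an illustrative upper-bound estimate, not a formula from the cited documents: it assumes signed operands, bounds each product by the sum of the operand widths, and adds the growth caused by accumulating the products over the window. The name `accumulator_bits` is hypothetical.

```python
def accumulator_bits(a_bits, b_bits, window_samples):
    """Conservative bit width for a sum of `window_samples` products
    of an a_bits-wide and a b_bits-wide signed operand."""
    # Summing K values adds at most ceil(log2(K)) bits.
    growth = (window_samples - 1).bit_length()
    return a_bits + b_bits + growth


# 5x5 window (25 samples), as used in VCEG-AZ05.
s1_bits = accumulator_bits(13, 13, 25)  # 13-bit by 13-bit multiplier
s3_bits = accumulator_bits(13, 17, 25)  # 13-bit by 17-bit multiplier
```

Under these assumptions the 13x17 products already exceed 32 bits once accumulated over a 5x5 window, and larger windows add further bits.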
[0024] Geometric Partitioning Mode (GPM)
[0025] In VVC, a Geometric Partitioning Mode (GPM) is supported for inter prediction as described in JVET-W2002 (Adrian Browne, et al., Algorithm description for Versatile Video Coding and Test Model 14 (VTM 14), ITU-T/ISO/IEC Joint Video Exploration Team (JVET), 23rd Meeting, by teleconference, 7–16 July 2021, document: JVET-W2002). The geometric partitioning mode is signalled using a CU-level flag as one kind of merge mode, with other merge modes including the regular merge mode, the MMVD mode, the CIIP mode and the subblock merge mode. A total of 64 partitions are supported by the geometric partitioning mode for each possible CU size, w×h = 2^m × 2^n with m, n ∈ {3…6}, excluding 8x64 and 64x8. The GPM mode can be applied to skip or merge CUs having a size within the above limit and having at least two regular merge modes.
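The CU size constraint can be enumerated directly. This is a minimal sketch that reads the constraint as width 2^m and height 2^n with m and n from 3 to 6; the helper name `gpm_cu_sizes` is an assumption made for illustration.

```python
def gpm_cu_sizes():
    """List the (width, height) pairs for which GPM is allowed."""
    excluded = {(8, 64), (64, 8)}
    sizes = []
    for m in range(3, 7):        # m in {3, 4, 5, 6}
        for n in range(3, 7):    # n in {3, 4, 5, 6}
            size = (1 << m, 1 << n)
            if size not in excluded:
                sizes.append(size)
    return sizes
```

Sixteen combinations of m and n minus the two excluded shapes leaves 14 CU sizes.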
[0026] When this mode is used, a CU is split into two parts by a geometrically located straight line at a certain angle. In VVC, a total of 20 angles and 4 offset distances are used for GPM; the number of angles was reduced from 24 in an earlier draft. The location of the splitting line is mathematically derived from the angle and offset parameters of a specific partition. In VVC, there are a total of 64 partitions as shown in Fig. 3, where the partitions are grouped according to their angles and dashed lines indicate redundant partitions. Each part of a geometric partition in the CU is inter-predicted using its own motion; only uni-prediction is allowed for each partition, that is, each part has one motion vector and one reference index. In Fig. 3, each line corresponds to the boundary of one partition. For example, partition group 310 consists of three vertical GPM partitions (i.e., 90°). Partition group 320 consists of four slant GPM partitions with a small angle from the vertical direction. Also, partition group 330 consists of three vertical GPM partitions (i.e., 270°) similar to those of group 310, but with an opposite direction. The uni-prediction motion constraint is applied to ensure that only two motion-compensated predictions are needed for each CU, the same as in conventional bi-prediction. The uni-prediction motion for each partition is derived using the process described later.
[0027] GPM Blending Along the Geometric Partitioning Edge
[0028] After predicting each part of a geometric partition using its own motion, blending is applied to the two prediction signals to derive samples around the geometric partition edge. The blending weights for each position of the CU are derived based on the distance between the individual position and the partition edge.
[0029] The distance from a position (x, y) to the partition edge is derived as: where i, j are the indices for the angle and offset of a geometric partition, which depend on the signalled geometric partition index. The signs of ρx, j and ρy, j depend on angle index i.
[0030] Fig. 4 illustrates an example of GPM blending according to ECM 4.0 (Muhammed Coban, et al., “Algorithm description of Enhanced Compression Model 4 (ECM 4)”, Joint Video Experts Team (JVET) of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29, 26th Meeting, by teleconference, 20–29 April 2022, JVET-Y2025). In Fig. 4, the size of the blending region on each side of the partition boundary is indicated by θ. The weights for each part of a geometric partition are derived as follows:
$$wIdxL(x,y) = partIdx \; ? \; 32 + d(x,y) : 32 - d(x,y) \qquad (11)$$
$$w_1(x,y) = 1 - w_0(x,y) \qquad (13)$$
[0031] The partIdx depends on the angle index i. One example of weight w0 is illustrated in Fig. 4. In Fig. 4, line 440 corresponds to the GPM partition boundary and two thresholds (i.e., -τ and τ) correspond to lines 442 and 444 in Fig. 4. Furthermore, the angle 410 and offset ρi 420 are indicated for GPM index i, and point 430 corresponds to the centre of the block.
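The weight derivation can be sketched per sample as below. Equations (11) and (13) are taken from the text; the intermediate step that maps wIdxL to w0 is not reproduced there, so the clipping formula used here, w0 = Clip3(0, 8, (wIdxL + 4) >> 3) / 8, is an assumption based on the VVC design and may differ from the ECM 4.0 variant shown in Fig. 4. The function name `gpm_weights` and the integer distance input `d` are also illustrative.

```python
def gpm_weights(d, part_idx):
    """Return (w0, w1) for a sample at signed distance d from the edge."""
    # Equation (11): the sign of the distance depends on partIdx.
    w_idx_l = 32 + d if part_idx else 32 - d
    # Assumed mapping to [0, 1] in steps of 1/8 (VVC-style clipping).
    w0 = min(8, max(0, (w_idx_l + 4) >> 3)) / 8
    # Equation (13): the two weights sum to one.
    return w0, 1 - w0
```

On the partition edge (d = 0) both predictors receive weight 1/2; far from the edge one weight saturates at 1 and the other at 0.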
[0032] Combined Inter and Intra Prediction (CIIP)
[0033] In VVC, when a CU is coded in merge mode, if the CU contains at least 64 luma samples (that is, CU width times CU height is equal to or larger than 64), and if both CU width and CU height are less than 128 luma samples, an additional flag is signalled to indicate if the combined inter/intra prediction (CIIP) mode is applied to the current CU. As its name indicates, the CIIP prediction combines an inter prediction signal with an intra prediction signal. The inter prediction signal in the CIIP mode, Pinter, is derived using the same inter prediction process applied to regular merge mode; and the intra prediction signal, Pintra, is derived following the regular intra prediction process with the planar mode. Then, the intra and inter prediction signals are combined using weighted averaging, where the weight value wt is calculated depending on the coding modes of the top and left neighbouring blocks (as shown in Fig. 5) of current CU 510 as follows:
– If the top neighbour is available and intra coded, then set isIntraTop to 1, otherwise set isIntraTop to 0;
– If the left neighbour is available and intra coded, then set isIntraLeft to 1, otherwise set isIntraLeft to 0;
– If (isIntraLeft + isIntraTop) is equal to 2, then wt is set to 3;
– Otherwise, if (isIntraLeft + isIntraTop) is equal to 1, then wt is set to 2;
– Otherwise, set wt to 1.
[0034] The CIIP prediction is formed as follows:
$$P_{\text{CIIP}} = \big((4 - wt) \cdot P_{\text{inter}} + wt \cdot P_{\text{intra}} + 2\big) \gg 2 \qquad (14)$$
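The weight rules and equation (14) can be sketched per sample as follows. The function names are illustrative; each boolean argument is assumed to be true only when the corresponding neighbour is both available and intra coded.

```python
def ciip_weight(top_is_intra, left_is_intra):
    """Derive wt from the coding modes of the top and left neighbours."""
    intra_count = int(top_is_intra) + int(left_is_intra)
    # Two intra neighbours -> 3, one -> 2, none -> 1.
    return {2: 3, 1: 2, 0: 1}[intra_count]


def ciip_blend(p_inter, p_intra, wt):
    """Equation (14): ((4 - wt) * Pinter + wt * Pintra + 2) >> 2."""
    return ((4 - wt) * p_inter + wt * p_intra + 2) >> 2
```

The more intra-coded neighbours there are, the more the blend leans towards the intra prediction signal.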
[0035] Intra Prediction Fusion
[0036] This intra prediction method derives predicted samples as a weighted combination of multiple predictors generated from different reference lines. In this process, multiple intra predictors are generated and then fused by weighted averaging. The process of deriving the predictors to be used in the fusion process is described as follows:
1) For angular intra prediction modes, including the single-mode case of TIMD and DIMD, the intra prediction is derived by weighting intra predictions obtained from multiple reference lines, represented as p_fusion = w0·p_line + w1·p_line+1, where p_line is the intra prediction from the default reference line and p_line+1 is the prediction from the line above the default reference line. The weights are set as w0 = 3/4 and w1 = 1/4.
2) For TIMD mode with blending, p_line is used for the first mode (w0 = 1, w1 = 0) and p_line+1 is used for the second mode (w0 = 0, w1 = 1).
3) For DIMD mode with blending, the number of predictors selected for a weighted average is increased from 3 to 6.
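For the angular case with w0 = 3/4 and w1 = 1/4, an integer form of the fusion might look like the sketch below. The rounding offset and right shift are assumptions made here for illustration; the text only gives the fractional weights. The function name is hypothetical.

```python
def fuse_reference_lines(p_line, p_line_plus1):
    """Fuse two intra predictions with weights 3/4 and 1/4.

    Integer form (assumed): (3 * p_line + p_line_plus1 + 2) >> 2.
    """
    return (3 * p_line + p_line_plus1 + 2) >> 2
```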
[0037] The angular intra prediction fusion method is applied to luma blocks when the angular intra mode has a non-integer slope (requiring reference sample interpolation) and the block size is greater than 16. It is used with MRL and is not applied to ISP-coded blocks. In the method studied in sub-test a, PDPC is applied for the intra prediction mode using the reference line closest to the current block.
[0038] The TIMD mode with blending method is applied when all the following conditions are satisfied:
- both the first and second modes are angular prediction modes;
- the current block is not an ISP-coded block;
- all of the following conditions are false:
○ abs (predModeIntra1 - predModeIntra2) is greater than Threshold. The value of Threshold is set to 8 or 4 depending on block size.
○ (predModeIntra1 - EXT_HOR_IDX) * (predModeIntra2 - EXT_HOR_IDX) is less than 0.
○ (predModeIntra1 - EXT_VER_IDX) * (predModeIntra2 - EXT_VER_IDX) is less than 0.
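The condition list can be expressed as a single predicate. In this sketch the threshold and the two mode indices are passed in as parameters, because the text does not give the values of EXT_HOR_IDX and EXT_VER_IDX or the block-size rule that selects between 8 and 4; the function and parameter names are illustrative.

```python
def timd_blending_allowed(mode1, mode2, both_angular, is_isp,
                          threshold, ext_hor_idx, ext_ver_idx):
    """Return True when TIMD mode with blending may be applied."""
    if not both_angular or is_isp:
        return False
    # Each of the following must be false for blending to apply.
    too_far_apart = abs(mode1 - mode2) > threshold
    straddles_hor = (mode1 - ext_hor_idx) * (mode2 - ext_hor_idx) < 0
    straddles_ver = (mode1 - ext_ver_idx) * (mode2 - ext_ver_idx) < 0
    return not (too_far_apart or straddles_hor or straddles_ver)
```

The two product tests reject mode pairs that lie on opposite sides of the horizontal or vertical direction.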
[0039] In the present invention, methods and apparatus to derive blended prediction using predictors at neighbouring locations of a target sample to be predicted are disclosed to improve the coding performance.
BRIEF SUMMARY OF THE INVENTION
[0040] A method and apparatus for video coding with blended prediction are disclosed. According to the method, input data associated with a current block is received, wherein the input data comprise pixel data for the current block to be encoded at an encoder side or coded data associated with the current block to be decoded at a decoder side. Two or more predictors for a target location of the current block are determined. One or more neighbouring predictors corresponding to said two or more predictors located at one or more neighbouring locations of the target location of the current block are determined. A blended predictor is generated by blending said two or more predictors and said one or more neighbouring predictors using blending weights. The current block is encoded or decoded by using the blended predictor.
[0041] In one embodiment, a blending footprint is formed by said one or more neighbouring locations of the target location of the current block and the target location of the current block, wherein the blending footprint has a shape comprising a square, rectangle, diamond, or cross shape, or any combination thereof. In one embodiment, the shape of the blending footprint is a 3x3 square shape or a 3x3 cross shape. In one embodiment, if any of said one or more neighbouring locations of the target location of the current block is not available, repetitive padding is used to generate the unavailable sample. In one embodiment, if any of said one or more neighbouring locations of the target location of the current block is not available, a larger prediction block is used to generate the unavailable sample.
[0042] In one embodiment, the blending weights are fixed at both the encoder side and the decoder side. In another embodiment, the blending weights are signalled in a sequence, picture, slice, CTU, or CU level.
[0043] In one embodiment, the blending weights are different for different block sizes and/or different positions. In another embodiment, the blending weights are different for different CU (Coding Unit) modes.
[0044] In one embodiment, different blending weights are selected implicitly. In one embodiment, the different blending weights are selected according to variance calculated between current and neighbouring samples.
[0045] In one embodiment, when the shape of the blending footprint is symmetric, the blending weights are also symmetric.
[0046] In one embodiment, the blended predictor is applied when the current block is coded in a target coding mode comprising BCW (Bi-Prediction with CU-level Weight) , GPM (Geometric Partitioning Mode) , CIIP (Combined Inter and Intra Prediction) , or intra fusion mode.
[0047] In one embodiment, one or more flags in a sequence, picture, slice, CTU, CU, PU level, or a combination thereof are used to indicate whether to apply the blended predictor to the current block.
BRIEF DESCRIPTION OF THE DRAWINGS
[0048] Fig. 1A illustrates an exemplary adaptive Inter / Intra video coding system incorporating loop processing.
[0049] Fig. 1B illustrates a corresponding decoder for the encoder in Fig. 1A.
[0050] Fig. 2 illustrates the process of Bi-directional optical flow (BIO) or Bi-Directional Optical Flow (BDOF) .
[0051] Fig. 3 illustrates an example of the 64 partitions used in the VVC standard, where the partitions are grouped according to their angles and dashed lines indicate redundant partitions.
[0052] Fig. 4 illustrates an example of blending weight ω0 using the geometric partitioning mode.
[0053] Fig. 5 illustrates an example of the weight value derivation for Combined Inter and Intra Prediction (CIIP) according to the coding modes of the top and left neighbouring blocks.
[0054] Fig. 6 illustrates an example of square shape for blending target prediction with neighbouring predictions around a target position to be predicted according to an embodiment of the present invention.
[0055] Fig. 7 illustrates an example of different shapes for prediction blending according to embodiments of the present invention.
[0056] Fig. 8 illustrates a flowchart of an exemplary video coding system that derives blended prediction using predictors at neighbouring locations of a target sample to be predicted according to an embodiment of the present invention.
DETAILED DESCRIPTION OF THE INVENTION
[0057] It will be readily understood that the components of the present invention, as generally described and illustrated in the figures herein, may be arranged and designed in a wide variety of different configurations. Thus, the following more detailed description of the embodiments of the systems and methods of the present invention, as represented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. References throughout this specification to “one embodiment, ” “an embodiment, ” or similar language mean that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the present invention. Thus, appearances of the phrases “in one embodiment” or “in an embodiment” in various places throughout this specification are not necessarily all referring to the same embodiment.
[0058] Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. One skilled in the relevant art will recognize, however, that the invention can be practiced without one or more of the specific details, or with other methods, components, etc. In other instances, well-known structures, or operations are not shown or described in detail to avoid obscuring aspects of the invention. The illustrated embodiments of the invention will be best understood by reference to the drawings, wherein like parts are designated by like numerals throughout. The following description is intended only by way of example, and simply illustrates certain selected embodiments of apparatus and methods that are consistent with the invention as claimed herein.
[0059] In order to improve the prediction accuracy or coding performance of cross-component prediction, various schemes related to inheriting cross-component models are disclosed.
[0060] PROPOSED METHOD
[0061] 1. Prediction Blending with Different Shapes
[0062] In the existing design, the blended output at position (x, y) only depends on the value at (x, y) for each predictor. For example, for BCW blending as shown in the following equation, the output P_bi-pred (x, y) only depends on P0 (x, y) and P1 (x, y):
$$P_{\text{bi-pred}}(x,y) = \big((8 - w) \cdot P_0(x,y) + w \cdot P_1(x,y) + 4\big) \gg 3$$
[0063] In the above equation, P0 (x, y) and P1 (x, y) correspond to two predictors to be blended.
[0064] In the present invention, improved blending prediction is disclosed by including one or more neighbouring samples of the present sample while the conventional approach only uses predictors at the present sample location. In one embodiment, the blended output at position (x, y) not only depends on the value at (x, y) for each predictor, but also depends on its neighbouring values. For example, neighbouring 3x3 values at each predictor can be used to generate the blended output:
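The equation for this example is not reproduced in the text. One plausible form, given here only as an assumption consistent with the surrounding description, blends two predictors P0 and P1 over a 3x3 neighbourhood centred on (x, y), with one weight w_{k,i,j} per predictor k and neighbour offset (i, j):

```latex
P_{\text{blended}}(x, y) \;=\; \sum_{k=0}^{1} \; \sum_{i=-1}^{1} \; \sum_{j=-1}^{1}
    w_{k,i,j} \, P_k(x + i,\; y + j)
```

Setting every weight to zero except w_{0,0,0} and w_{1,0,0} recovers the conventional case in which only the co-located samples contribute.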
[0065] For integer implementation of the blending process, an additional rounding term and shift operation can be included.
[0066] In one embodiment, a similar concept can be used when more than two predictors are used for blending. For example, in some fusion modes, up to M predictors are blended as shown in the following equation: where wk, i, j and Pk (i, j) are all integer values and the bit depth is a predefined value or is determined by the encoder/decoder configuration. For example, Pk (i, j) is in 10-bit precision and wk, i, j is in 4-bit precision. The rounding term C can be 1 << (N-1).
[0067] For the neighbouring 3x3, the shape of the blending footprint is shown in Fig. 6.
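An integer version of the blend for M predictors might be sketched as follows. The data layout is an assumption made for illustration: each predictor is a 2-D list of integers, and each predictor has its own footprint given as a dictionary from (dx, dy) offsets to integer weights. The sketch also assumes that all weights together sum to 1 << N so that the output stays in the input range, and it clamps coordinates at the block edge as one way of handling unavailable neighbours.

```python
def blend_sample(preds, footprints, x, y, shift):
    """Blend M predictors at (x, y) with rounding term C = 1 << (shift - 1)."""
    height, width = len(preds[0]), len(preds[0][0])
    total = 0
    for pred, footprint in zip(preds, footprints):
        for (dx, dy), weight in footprint.items():
            # Clamp to the block so every offset maps to a valid sample.
            xx = min(max(x + dx, 0), width - 1)
            yy = min(max(y + dy, 0), height - 1)
            total += weight * pred[yy][xx]
    return (total + (1 << (shift - 1))) >> shift
```

For example, with two predictors, a cross-shaped footprint of weights {centre: 4, four neighbours: 1 each} per predictor and N = 4, the weights sum to 16 and the two predictors contribute equally.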
[0068] In one embodiment, up to N neighbouring samples around the current position (x, y) can be used for generating the blending output. All or a part of the neighbouring samples are used. For example, the cross shape, diamond shape, cross + diamond shape, or any other shape can be used. For another example, wherein the blending footprint may have a shape comprising a square, rectangular, diamond shape, or cross-shape, or any combination thereof. The phrase “any combination thereof” here refers to a composite shape formed by the union of at least two of the following shapes: square, rectangle, diamond, and cross. These composite shapes may be generated by combining the respective shapes to achieve a desired blending footprint. Some examples are shown in Fig. 7, where 5 different shapes (710-750) are shown. Shape 710 is a 3x3 square. Shape 720 is a 3x3 cross-shape. Shape 730 is form by a 3x3 square shape combined with a cross shape whose arms extend two cells from the central cell of the 3x3 square shape. Shape 740 is formed by a 3x3 square shape combined with a cross shape whose arms extend four cells from the central cell of the 3x3 square shape. Shape 750 is a cross shape with a central cell and arms extending four cells in each direction from the central cell.
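The footprint shapes of Fig. 7 can be described as sets of (dx, dy) offsets around the central cell, with composite shapes built as set unions. This is a sketch of one possible representation; the helper names and the reading of "arms extend k cells" as offsets up to distance k from the centre are assumptions.

```python
def square(radius):
    """(2*radius+1) x (2*radius+1) square centred on (0, 0)."""
    r = range(-radius, radius + 1)
    return {(dx, dy) for dx in r for dy in r}


def cross(arm):
    """Central cell plus horizontal and vertical arms of length `arm`."""
    r = range(-arm, arm + 1)
    return {(d, 0) for d in r} | {(0, d) for d in r}


def diamond(radius):
    """All offsets whose city-block distance from the centre is <= radius."""
    return {(dx, dy) for dx, dy in square(radius) if abs(dx) + abs(dy) <= radius}


shape_710 = square(1)             # 3x3 square
shape_720 = cross(1)              # 3x3 cross
shape_730 = square(1) | cross(2)  # square plus arms of two cells
shape_740 = square(1) | cross(4)  # square plus arms of four cells
shape_750 = cross(4)              # cross with arms of four cells
```

Representing a footprint as a set makes "any combination thereof" a plain union, and the number of weights needed per predictor is simply the size of the set.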
[0069] In one embodiment, the blending weights for current and neighbouring samples are fixed at both encoder and decoder sides. In another embodiment, the blending weights are signalled in a sequence, picture, slice, CTU, CU level.
[0070] In one embodiment, the blending weights can be different for different block sizes. In one embodiment, the blending weights can be different for each position and each block size. In one embodiment, the blending weights can be different for different CU modes. In one embodiment, different blending weights are selected by implicit methods. For example, the variance between current and neighbouring samples are calculated, and blending weights are selected accordingly.
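The implicit, variance-based selection mentioned above could be sketched as follows. Everything specific here is an assumption made for illustration: the two candidate kernels, the threshold, the 3x3 window, and the choice to use the smoothing kernel in low-variance areas are not stated in the text.

```python
# Two hypothetical candidate kernels, each summing to 16
SMOOTH_WEIGHTS = [[1, 2, 1], [2, 4, 2], [1, 2, 1]]
CENTRE_ONLY_WEIGHTS = [[0, 0, 0], [0, 16, 0], [0, 0, 0]]


def scaled_variance(samples):
    """Return n^2 times the population variance, using integers only."""
    n = len(samples)
    return n * sum(v * v for v in samples) - sum(samples) ** 2


def select_weights(pred, x, y, threshold):
    """Pick a kernel from the variance of the 3x3 window centred at (x, y)."""
    window = [pred[y + j][x + i] for j in (-1, 0, 1) for i in (-1, 0, 1)]
    if scaled_variance(window) <= threshold:
        return SMOOTH_WEIGHTS
    return CENTRE_ONLY_WEIGHTS
```

Because the decision depends only on prediction samples that are available at both sides, an encoder and a decoder running the same rule would reach the same choice without extra signalling.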
[0071] In one embodiment, if the shape used for blending is symmetric, the constraint that the blending weights should be symmetric can be used to reduce the size for storing blending weights or reduce the size for signalling the blending weights.
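The storage saving from symmetric weights can be illustrated with a small sketch. It assumes a square footprint whose weights are mirrored both horizontally and vertically, so that only the top-left quadrant (including the centre row and column) needs to be stored or signalled; the function name `expand_symmetric` is hypothetical.

```python
def expand_symmetric(quadrant):
    """Expand a stored top-left quadrant into a full mirrored kernel.

    The last row and last column of `quadrant` are the centre row and
    centre column of the full kernel, so they are not duplicated.
    """
    top = [row + row[-2::-1] for row in quadrant]  # mirror left to right
    return top + top[-2::-1]                       # mirror top to bottom


# Four stored values instead of nine for a 3x3 kernel
full_kernel = expand_symmetric([[1, 2], [2, 4]])
```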
[0072] In one embodiment, in a hardware implementation, the blending weights are stored in a look-up table (LUT). For example, the blending weights are size and position dependent, and are stored in a LUT called BLDW [sizeIdx] [Height] [Width]. The blending weights for a given block with size HxW at position (x, y) can be found at BLDW [sizeIdx] [y] [x]. The sizeIdx of a block is derived from the height H and width W of the current block. For each BLDW [sizeIdx] [y] [x], there may be multiple weights for different predictions and neighbouring samples. In this case, to generate the blending prediction, the following process should be applied:
sizeIdx = mapping (H, W)
For y within blockHeight:
    For x within blockWidth:
        P(x, y) blended = (sum + C) >> N
where sum is the weighted sum of the predictor samples in the blending footprint, using the weights read from BLDW [sizeIdx] [y] [x].
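The LUT-driven process above can be sketched in software as follows. The layout is an assumption made for illustration: `size_index` is a hypothetical stand-in for the mapping from (H, W) to sizeIdx, each LUT entry holds one 3x3 kernel per predictor, and each predictor is assumed to carry a one-sample margin on every side so that all neighbours exist.

```python
def size_index(height, width):
    """Hypothetical mapping from block dimensions to a LUT key."""
    return (height.bit_length() - 1, width.bit_length() - 1)


def blend_block(preds, bldw, height, width, shift):
    """Blend a block using position-dependent 3x3 kernels from a LUT.

    preds: list of (height + 2) x (width + 2) predictor arrays (one-sample margin)
    bldw:  dict mapping size_index(H, W) to table[y][x] = [kernel per predictor]
    shift: N; each position's kernels are assumed to sum to 1 << N
    """
    rounding = 1 << (shift - 1)
    table = bldw[size_index(height, width)]
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            acc = 0
            for pred, kernel in zip(preds, table[y][x]):
                for j in range(3):
                    for i in range(3):
                        # (y + j, x + i) in padded coordinates covers offsets -1..1
                        acc += kernel[j][i] * pred[y + j][x + i]
            out[y][x] = (acc + rounding) >> shift
    return out
```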
[0073] In one embodiment, the above LUT can be further simplified by using shared weights for different block sizes, different positions, etc.
[0074] In one embodiment, if the neighbouring sample is outside the block boundary, repetitive padding is used for the out-of-boundary sample.
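Repetitive padding can be sketched as a coordinate clamp: any out-of-boundary position reads the nearest sample inside the block. The function name `pad_repeat` and the list-of-lists layout are illustrative choices.

```python
def pad_repeat(block, margin):
    """Extend a 2-D block by `margin` samples per side, repeating edge samples."""
    height, width = len(block), len(block[0])

    def clamp(value, upper):
        return max(0, min(value, upper))

    return [
        [block[clamp(y, height - 1)][clamp(x, width - 1)]
         for x in range(-margin, width + margin)]
        for y in range(-margin, height + margin)
    ]


padded = pad_repeat([[1, 2], [3, 4]], 1)
```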
[0075] In one embodiment, if the neighbouring sample is outside the current prediction boundary, but the reconstruction samples (also called reconstructed samples in this disclosure) are available, the reconstruction samples can be used instead of using repetitive padding. For example, in the case that the above and left reconstruction samples are available for the current position at (0, 0) and 3x3 blending is used, the above 3 and left 3 neighbouring samples can be filled with reconstruction samples.
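A sketch of this variant for a one-sample margin is given below. It assumes the above row, the left column and the above-left corner of reconstructed samples are all available, and it falls back to repetitive padding on the right and bottom sides and at the above-right and below-left corners. Those fallback choices and the function name are assumptions.

```python
def pad_with_reconstruction(block, above, left, corner):
    """Build a one-sample margin from reconstructed neighbours where available.

    above:  reconstructed samples in the row above the block (length = width)
    left:   reconstructed samples in the column left of the block (length = height)
    corner: reconstructed above-left sample
    """
    rows = [[corner] + list(above) + [above[-1]]]  # above row, last sample repeated
    for y, row in enumerate(block):
        rows.append([left[y]] + list(row) + [row[-1]])  # right side repeats edge
    rows.append([left[-1]] + list(block[-1]) + [block[-1][-1]])  # bottom repeats
    return rows


padded = pad_with_reconstruction([[5, 6], [7, 8]], above=[1, 2], left=[3, 4], corner=0)
```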
[0076] In one embodiment, if N lines of neighbouring samples are required on each side, N additional lines of prediction samples (also called predicted samples in this disclosure) should be generated on each side. For example, in the case that 3x3 neighbouring samples and a desired 8x8 blending output are used, 1 additional line is required on each side; thus, the size of each prediction should be 10x10.
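The size arithmetic in this example can be checked with a short sketch, assuming a square footprint of odd size centred on the target sample; the function name is hypothetical.

```python
def required_prediction_size(out_height, out_width, footprint_size):
    """Return the (height, width) of the prediction needed for the footprint."""
    margin = (footprint_size - 1) // 2  # extra lines needed on each side
    return out_height + 2 * margin, out_width + 2 * margin


size_3x3 = required_prediction_size(8, 8, 3)
```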
[0077] 2. Usage of Prediction Blending on Different Modes
[0078] In one embodiment, the proposed prediction blending methods in Section 1 can be applied to any process that includes blending.
[0079] For example, for bi-prediction mode, the proposed blending methods can replace the original BCW blending. For another example, for CIIP modes, the proposed blending methods can replace the blending process of inter and intra predictors. In another example, for intra prediction fusion, the proposed blending methods can replace the original blending method.
[0080] For another example, for GPM modes, the proposed methods can replace the original blending method, and for different GPM partitions, different blending weights can be used. For example, one LUT can be used for storing blending weights for each GPM mode with different block sizes and gpm_partition_idx. In one embodiment, the LUT for storing GPM blending weights can be shared to reduce the LUT size. For example, since partitions with angles 45° and -45° are symmetric, the LUT for these two partitions can be shared by using the symmetry property. As another example, the LUT for the same angle with different distances can be shared since the shift relationship can be used.
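One way to picture the LUT sharing is a single base table that is read with a mirrored or displaced column index. The sketch below is purely illustrative: it assumes the two symmetric partitions are horizontal mirror images of each other, models a change of partition distance as a horizontal displacement with clamping at the table edges, and uses one scalar weight per sample instead of per-neighbour kernels. None of these details are given in the text.

```python
def shared_weight(base_table, x, y, mirror=False, shift_x=0):
    """Read a weight from a shared base table.

    mirror:  read the horizontally mirrored entry (assumed symmetry case)
    shift_x: horizontal displacement modelling a different partition distance
    """
    width = len(base_table[0])
    col = (width - 1 - x) if mirror else x
    col = max(0, min(col + shift_x, width - 1))  # clamp at the table edges
    return base_table[y][col]


# Hypothetical 3x3 base table of weights out of 8
base = [[8, 4, 0], [8, 8, 4], [8, 8, 8]]
```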
[0081] For CU with BDOF, the blending process includes refinement terms derived according to the BDOF equations. In one embodiment, the proposed prediction blending methods are disabled for BDOF. In another embodiment, BDOF is disabled when proposed blending methods are applied. In one embodiment, the proposed blending process can be applied first, and the refinement terms of BDOF can be added to the final blended predictions.
[0082] In one embodiment, a sequence, picture, slice, CTU, CU, and / or PU level flag is used to indicate whether to apply the proposed blending methods. In one embodiment, an implicit method can be used to determine whether to enable the proposed blending method. For example, the proposed blending method and the original blending method can be applied to the reference template area, and the blending method to use can be selected according to the template cost between the current reconstruction template area and the blended reference template area.
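The template-based implicit decision can be sketched as a cost comparison. The sum of absolute differences (SAD) is used here as the template cost, which is an assumption since the text does not name a cost measure; the function names and the tie-breaking rule in favour of the original method are also illustrative.

```python
def sad(a, b):
    """Sum of absolute differences between two equally sized 2-D arrays."""
    return sum(abs(p - q) for row_a, row_b in zip(a, b) for p, q in zip(row_a, row_b))


def choose_blending(recon_template, blended_original, blended_proposed):
    """Select the blending method whose template result is closer to the reconstruction."""
    cost_original = sad(recon_template, blended_original)
    cost_proposed = sad(recon_template, blended_proposed)
    return "proposed" if cost_proposed < cost_original else "original"
```

Since the template consists of already reconstructed samples, the same comparison can be repeated at the decoder, so no flag is needed in this variant.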
[0083] Any of the foregoing proposed methods of prediction blending including predictors at neighbouring locations of a current prediction sample can be implemented in encoders and / or decoders. For example, any of the proposed methods can be implemented in one module of an encoder and / or decoder. Alternatively, any of the proposed methods can be implemented as a circuit coupled to one module of the encoder and / or decoder, so as to provide the information needed by the module used in the encoder and / or decoder.
[0084] The blended prediction as described above can be implemented at an encoder side or a decoder side. For example, any of the proposed methods can be implemented in an Intra / Inter coding module (e.g. Intra Pred. 150 / MC 152 in Fig. 1B) in a decoder or an Intra / Inter coding module in an encoder (e.g. Intra Pred. 110 / Inter Pred. 112 in Fig. 1A). Any of the proposed methods can also be implemented as a circuit coupled to the intra / inter coding module at the decoder or the encoder. However, the decoder or encoder may also use an additional processing unit to implement the required blended prediction processing. While the Intra / Inter prediction units (e.g. units 110 / 112 in Fig. 1A and units 150 / 152 in Fig. 1B) are shown as individual processing units, they may correspond to executable software or firmware codes stored on a medium, such as a hard disk or flash memory, for a CPU (Central Processing Unit) or programmable devices (e.g. DSP (Digital Signal Processor) or FPGA (Field Programmable Gate Array)).
[0085] Fig. 8 illustrates a flowchart of an exemplary video coding system that derives blended prediction using predictors at neighbouring locations of a target sample to be predicted according to an embodiment of the present invention. The steps shown in the flowchart may be implemented as program codes executable on one or more processors (e.g., one or more CPUs) at the encoder side. The steps shown in the flowchart may also be implemented based on hardware such as one or more electronic devices or processors arranged to perform the steps in the flowchart. According to the method, input data associated with a current block is received in step 810, wherein the input data comprise pixel data for the current block to be encoded at an encoder side or coded data associated with the current block to be decoded at a decoder side. Two or more predictors for a target location of the current block are determined in step 820. One or more neighbouring predictors corresponding to said two or more predictors located at one or more neighbouring locations of the target location of the current block are determined in step 830. A blended predictor is generated by blending said two or more predictors and said one or more neighbouring predictors using blending weights in step 840. The current block is encoded or decoded by using the blended predictor in step 850.
[0086] The flowchart shown is intended to illustrate an example of video coding according to the present invention. A person skilled in the art may modify each step, re-arrange the steps, split a step, or combine steps to practice the present invention without departing from the spirit of the present invention. In the disclosure, specific syntax and semantics have been used to illustrate examples to implement embodiments of the present invention. A skilled person may practice the present invention by substituting the syntax and semantics with equivalent syntax and semantics without departing from the spirit of the present invention.
[0087] The above description is presented to enable a person of ordinary skill in the art to practice the present invention as provided in the context of a particular application and its requirement. Various modifications to the described embodiments will be apparent to those with skill in the art, and the general principles defined herein may be applied to other embodiments. Therefore, the present invention is not intended to be limited to the particular embodiments shown and described, but is to be accorded the widest scope consistent with the principles and novel features herein disclosed. In the above detailed description, various specific details are illustrated in order to provide a thorough understanding of the present invention. Nevertheless, it will be understood by those skilled in the art that the present invention may be practiced.
[0088] Embodiments of the present invention as described above may be implemented in various hardware, software codes, or a combination of both. For example, an embodiment of the present invention can be one or more circuits integrated into a video compression chip or program code integrated into video compression software to perform the processing described herein. An embodiment of the present invention may also be program code to be executed on a Digital Signal Processor (DSP) to perform the processing described herein. The invention may also involve a number of functions to be performed by a computer processor, a digital signal processor, a microprocessor, or a field programmable gate array (FPGA). These processors can be configured to perform particular tasks according to the invention, by executing machine-readable software code or firmware code that defines the particular methods embodied by the invention. The software code or firmware code may be developed in different programming languages and different formats or styles. The software code may also be compiled for different target platforms. However, different code formats, styles and languages of software codes and other means of configuring code to perform the tasks in accordance with the invention will not depart from the spirit and scope of the invention.
[0089] The invention may be embodied in other specific forms without departing from its spirit or essential characteristics. The described examples are to be considered in all respects only as illustrative and not restrictive. The scope of the invention is, therefore, indicated by the appended claims rather than by the foregoing description. All changes which come within the meaning and range of equivalency of the claims are to be embraced within their scope.
Claims
1. A method of video coding, the method comprising: receiving input data associated with a current block, wherein the input data comprise pixel data for the current block to be encoded at an encoder side or coded data associated with the current block to be decoded at a decoder side; determining two or more predictors for a target location of the current block; determining one or more neighbouring predictors corresponding to said two or more predictors located at one or more neighbouring locations of the target location of the current block; generating a blended predictor by blending said two or more predictors and said one or more neighbouring predictors using blending weights; and encoding or decoding the current block by using the blended predictor.
2. The method of Claim 1, wherein a blending footprint is formed by said one or more neighbouring locations of the target location of the current block and the target location of the current block, and wherein the blending footprint has a shape comprising a square, rectangular, diamond shape, or cross-shape, or any combination thereof.
3. The method of Claim 2, wherein the shape of the blending footprint is a 3x3 square shape or a 3x3 cross shape.
4. The method of Claim 1, wherein if any of said one or more neighbouring locations of the target location of the current block is not available, repetitive padding is used to generate unavailable sample.
5. The method of Claim 1, wherein if any of said one or more neighbouring locations of the target location of the current block is not available, a larger prediction block is used to generate unavailable sample.
6. The method of Claim 1, wherein the blending weights are fixed at both the encoder side and the decoder side.
7. The method of Claim 1, wherein the blending weights are signalled in a sequence, picture, slice, CTU, or CU level.
8. The method of Claim 1, wherein the blending weights are different for different block sizes and / or different positions.
9. The method of Claim 1, wherein the blending weights are different for different CU (Coding Unit) modes.
10. The method of Claim 1, wherein different blending weights are selected implicitly.
11. The method of Claim 10, wherein the different blending weights are selected according to variance calculated between current and neighbouring samples.
12. The method of Claim 2, wherein when the shape of the blending footprint is symmetric, the weights are also symmetric.
13. The method of Claim 1, wherein the blended predictor is applied when the current block is coded in target coding mode comprising BCW (Bi-Prediction with CU-level Weight), GPM (Geometric Partitioning Mode), CIIP (Combined Inter and Intra Prediction), or intra fusion mode.
14. The method of Claim 1, wherein one or more flags in sequence, picture, slice, CTU, CU, PU level, or a combination thereof are used to indicate whether to apply the blended predictor to the current block.
15. An apparatus for video coding, the apparatus comprising one or more electronics or processors arranged to: receive input data associated with a current block, wherein the input data comprise pixel data for the current block to be encoded at an encoder side or coded data associated with the current block to be decoded at a decoder side; determine two or more predictors for a target location of the current block; determine one or more neighbouring predictors corresponding to said two or more predictors located at one or more neighbouring locations of the target location of the current block; generate a blended predictor by blending said two or more predictors and said one or more neighbouring predictors using blending weights; and encode or decode the current block by using the blended predictor.