Prediction refinement using template area for intra block copy and intra template matching

The use of template areas and block vectors in video coding addresses inefficiencies in handling local illumination and spatial correlations, enhancing prediction accuracy and efficiency in video compression.

WO2026139054A1 · PCT designated stage · Publication Date: 2026-07-02 · MEDIATEK INC

Patent Information

Authority / Receiving Office: WO · WO
Patent Type: Applications
Current Assignee / Owner: MEDIATEK INC
Filing Date: 2025-12-26
Publication Date: 2026-07-02

Application Information

Patent Timeline

26 Dec 2025: Application

02 Jul 2026: Publication (WO2026139054A1)

IPC: H04N19/52; H04N19/176

AI Tagging

Technology Topics

Template matching Algorithm

AI Technical Summary

Technical Problem

Existing video coding standards face challenges in efficiently handling local illumination variations and spatial correlations within video frames, particularly in screen content, leading to inefficiencies in prediction processes.

Method used

The method employs prediction refinement techniques using template areas identified by block vectors, which include intra block copy (IBC) and intra template matching (IntraTMP), along with local illumination compensation (LIC) and various filtering methods, to enhance prediction accuracy and efficiency.

Benefits of technology

This approach improves prediction accuracy and coding efficiency by compensating for local illumination changes and leveraging spatial correlations within frames, resulting in enhanced video compression performance.

✦ Generated by Eureka AI based on patent content.

Abstract

A video coder determines a template area of the current block having a set of template samples and uses a block vector to locate a reference block. The video coder determines Position-Related Weights (PRWs) for the template samples and derives a Sample-Based Prediction Offset (SPO) for a target prediction sample of the current block from a sum of derived template samples weighted by the respective PRWs. The derived template samples are derived from a reconstructed template and a target template. Each PRW is dependent on a first position of a target template sample with respect to the template area and a second position of the target prediction sample with respect to the current block. The target prediction sample is generated by using the reference block. The video coder applies the SPO to the target prediction sample to generate a refined prediction sample for generating a prediction of the current block.

Description

PREDICTION REFINEMENT USING TEMPLATE AREA FOR INTRA BLOCK COPY AND INTRA TEMPLATE MATCHING

CROSS REFERENCE TO RELATED PATENT APPLICATION(S)

[0001] The present disclosure is part of a non-provisional application that claims the priority benefit of U.S. Provisional Patent Application No. 63/738,891, filed on 26 December 2024. Content of the above-listed application is herein incorporated by reference.

TECHNICAL FIELD

[0002] The present disclosure relates generally to video coding. In particular, the present disclosure relates to methods of coding pixel blocks by prediction refinement using template areas.

BACKGROUND

[0003] Unless otherwise indicated herein, approaches described in this section are not prior art to the claims listed below and are not admitted as prior art by inclusion in this section.

[0004] High-Efficiency Video Coding (HEVC) is an international video coding standard developed by the Joint Collaborative Team on Video Coding (JCT-VC). HEVC is based on the hybrid block-based motion-compensated DCT-like transform coding architecture. The basic unit for compression, termed coding unit (CU), is a 2Nx2N square block of pixels, and each CU can be recursively split into four smaller CUs until the predefined minimum size is reached. Each CU contains one or multiple prediction units (PUs).

[0005] Versatile video coding (VVC) is the latest international video coding standard developed by the Joint Video Expert Team (JVET) of ITU-T SG16 WP3 and ISO/IEC JTC1/SC29/WG11. The input video signal is predicted from the reconstructed signal, which is derived from the coded picture regions. The prediction residual signal is processed by a block transform. The transform coefficients are quantized and entropy coded together with other side information in the bitstream. The reconstructed signal is generated from the prediction signal and the reconstructed residual signal after inverse transform on the de-quantized transform coefficients. The reconstructed signal is further processed by in-loop filtering for removing coding artifacts. The decoded pictures are stored in the frame buffer for predicting the future pictures in the input video signal.

[0006] In VVC, a coded picture is partitioned into non-overlapped square block regions represented by the associated coding tree units (CTUs). The leaf nodes of a coding tree correspond to the coding units (CUs). A coded picture can be represented by a collection of slices, each comprising an integer number of CTUs. The individual CTUs in a slice are processed in raster-scan order. A bi-predictive (B) slice may be decoded using intra prediction or inter prediction with at most two motion vectors (MVs) and reference indices to predict the sample values of each block. A predictive (P) slice is decoded using intra prediction or inter prediction with at most one motion vector and reference index to predict the sample values of each block. An intra (I) slice is decoded using intra prediction only.

[0007] A CTU can be partitioned into one or multiple non-overlapped coding units (CUs) using the quadtree (QT) with nested multi-type-tree (MTT) structure to adapt to various local motion and texture characteristics.

[0008] Each CU contains one or more prediction units (PUs). The prediction unit, together with the associated CU syntax, works as a basic unit for signaling the predictor information. The specified prediction process is employed to predict the values of the associated pixel samples inside the PU. Each CU may contain one or more transform units (TUs) for representing the prediction residual blocks. A transform unit (TU) comprises a transform block (TB) of luma samples and two corresponding transform blocks of chroma samples, and each TB corresponds to one residual block of samples from one color component. An integer transform is applied to a transform block. The level values of quantized coefficients together with other side information are entropy coded in the bitstream. The terms coding tree block (CTB), coding block (CB), prediction block (PB), and transform block (TB) are defined to specify the 2-D sample array of one color component associated with CTU, CU, PU, and TU, respectively. Thus, a CTU consists of one luma CTB, two chroma CTBs, and associated syntax elements. A similar relationship is valid for CU, PU, and TU.

[0009] For each inter-predicted CU, motion parameters consisting of motion vectors, reference picture indices and reference picture list usage index, and additional information are used for inter-predicted sample generation. The motion parameters can be signalled in an explicit or implicit manner. When a CU is coded with skip mode, the CU is associated with one PU and has no significant residual coefficients, no coded motion vector delta or reference picture index. A merge mode is specified whereby the motion parameters for the current CU are obtained from neighbouring CUs, including spatial and temporal candidates, and additional schedules introduced in VVC. The merge mode can be applied to any inter-predicted CU. The alternative to merge mode is the explicit transmission of motion parameters, where motion vector, corresponding reference picture index for each reference picture list and reference picture list usage flag and other needed information are signalled explicitly for each CU.

SUMMARY

[0010] The following summary is illustrative only and is not intended to be limiting in any way. That is, the following summary is provided to introduce concepts, highlights, benefits and advantages of the novel and non-obvious techniques described herein. Select and not all implementations are further described below in the detailed description. Thus, the following summary is not intended to identify essential features of the claimed subject matter, nor is it intended for use in determining the scope of the claimed subject matter.

[0011] Some embodiments of the disclosure provide a method for coding pixel blocks using prediction refinement based on template areas identified by block vectors. A video coder determines a template area of the current block. The template area of the current block comprises a set of template samples. The video coder uses a block vector associated with the current block to locate a reference block in the current picture. The video coder determines one or more sets of Position-Related Weights (PRWs) for the set of template samples. The video coder derives a Sample-Based Prediction Offset (SPO) for a target prediction sample of the current block from a sum of derived template samples weighted by said one or more sets of respective PRWs. The derived template samples are derived from a reconstructed template and a target template. Each weight of each set of said one or more sets of PRWs is dependent on a first position of a target template sample with respect to the template area and a second position of the target prediction sample with respect to the current block. The target prediction sample is generated by using the located reference block. The video coder refines the target prediction sample using the SPO to generate a refined prediction sample. The video coder encodes or decodes the current block by using prediction comprising the refined prediction sample.

[0012] In some embodiments, the target template may be a reference template neighboring the located reference block. The target prediction sample may be a sample within an initial predictor generated by intra block copy (IBC) and the block vector is signaled in a bitstream. The reference template may be from an area of the current picture that is valid for IBC mode. The target prediction sample may be a sample within an initial predictor generated by intra template matching prediction (IntraTMP) and the block vector is derived by searching a reconstructed portion of the current picture. The target template used to derive the SPO may be identically shaped as a template neighboring the current block used for deriving the block vector.

[0013] In some embodiments, a same set of weights is applicable to generate the SPO for when the block vector is signaled in a bitstream and when the block vector is derived by searching a reconstructed portion of the current picture. In some embodiments, different sets of weights are used to generate the SPO for when different coding tools are used to generate the target prediction sample.

[0014] In some embodiments, a linear filter model is applied to an initial predictor, from which the target prediction sample is derived, before the SPO is applied to the target prediction sample. In some embodiments, the SPO is not applied when the current picture is coded with samples vertically flipped or horizontally flipped.

[0015] In some embodiments, N lines above the reference block and N lines right of the reference block are identified as the target template used to generate the SPO when the current picture is coded with samples horizontally flipped. In some embodiments, N lines below the reference block and N lines left of the reference block are identified as the reference template used to generate the SPO when the current picture is coded with samples vertically flipped. In some embodiments, one or more syntax elements are used to indicate whether the current picture is coded with samples horizontally flipped, vertically flipped, or not flipped. In some embodiments, the SPO is not applied when the current picture is coded with samples vertically flipped or horizontally flipped. In some embodiments, a linear filter model is applied to the prediction comprising the refined prediction sample before being used for encoding or decoding the current block.

BRIEF DESCRIPTION OF THE DRAWINGS

[0016] The accompanying drawings are included to provide a further understanding of the present disclosure, and are incorporated in and constitute a part of the present disclosure. The drawings illustrate implementations of the present disclosure and, together with the description, serve to explain the principles of the present disclosure. It is appreciable that the drawings are not necessarily in scale as some components may be shown to be out of proportion than the size in actual implementation in order to clearly illustrate the concept of the present disclosure.

[0017] FIG. 1 shows search areas used for Intra template matching prediction (IntraTMP) .

[0018] FIG. 2 illustrates intra block copy or current picture referencing.

[0019] FIG. 3 illustrates the reference area for intra block copy (IBC) when coding a CTU.

[0020] FIG. 4 illustrates using IntraTMP block vector for IBC block.

[0021] FIGS. 5A-5B illustrate BV adjustment for horizontal flip and vertical flip.

[0022] FIG. 6 illustrates a template area of a current block.

[0023] FIG. 7 shows example template areas of the current block and a reference block that is identified using the BV of the current block.

[0024] FIG. 8 illustrates an example video encoder that may implement template-based prediction refinement.

[0025] FIG. 9 illustrates portions of the video encoder that implement prediction refinement using template areas identified by block vectors.

[0026] FIG. 10 conceptually illustrates a process that encodes a pixel block with prediction refinement using template areas identified by block vectors.

[0027] FIG. 11 illustrates an example video decoder that may implement template-based prediction refinement.

[0028] FIG. 12 illustrates portions of the video decoder that implement prediction refinement using template areas identified by block vectors.

[0029] FIG. 13 conceptually illustrates a process that decodes a pixel block with prediction refinement using template areas identified by block vectors.

[0030] FIG. 14 conceptually illustrates an electronic system with which some embodiments of the present disclosure are implemented.

DETAILED DESCRIPTION

[0031] In the following detailed description, numerous specific details are set forth by way of examples in order to provide a thorough understanding of the relevant teachings. Any variations, derivatives and/or extensions based on teachings described herein are within the protective scope of the present disclosure. In some instances, well-known methods, procedures, components, and/or circuitry pertaining to one or more example implementations disclosed herein may be described at a relatively high level without detail, in order to avoid unnecessarily obscuring aspects of teachings of the present disclosure.

I. Local Illumination Compensation (LIC)

[0032] LIC is an inter prediction technique to model local illumination variation between the current block and its prediction block as a function of that between the current block template and the reference block template. The parameters of the function can be denoted by a scale α and an offset β, which form a linear equation, that is, α*p[x] + β, to compensate illumination changes, where p[x] is a reference sample pointed to by the MV at a location x on the reference picture. When wrap-around motion compensation is enabled, the MV shall be clipped with the wrap-around offset taken into consideration. Since α and β can be derived based on the current block template and the reference block template, no signaling overhead is required for them.
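As an illustration only, the linear model α*p[x] + β can be sketched in Python as below. The text above does not state how α and β are fitted, so the ordinary least-squares fit, the floating-point arithmetic, and the function names derive_lic_params and apply_lic are assumptions made for this sketch, not the codec's normative derivation.

```python
def derive_lic_params(ref_template, cur_template):
    """Fit cur ~= alpha * ref + beta over the template samples.

    Ordinary least squares is an assumed fitting rule for this sketch.
    """
    n = len(ref_template)
    if n == 0 or n != len(cur_template):
        raise ValueError("templates must be non-empty and equally sized")
    sum_x = sum(ref_template)
    sum_y = sum(cur_template)
    sum_xx = sum(x * x for x in ref_template)
    sum_xy = sum(x * y for x, y in zip(ref_template, cur_template))
    denom = n * sum_xx - sum_x * sum_x
    if denom == 0:
        # Flat reference template: fall back to a pure offset model.
        return 1.0, (sum_y - sum_x) / n
    alpha = (n * sum_xy - sum_x * sum_y) / denom
    beta = (sum_y - alpha * sum_x) / n
    return alpha, beta


def apply_lic(ref_block, alpha, beta, bit_depth=10):
    """Apply alpha * p[x] + beta to each reference sample, then clip."""
    max_val = (1 << bit_depth) - 1
    return [[min(max(int(round(alpha * p + beta)), 0), max_val) for p in row]
            for row in ref_block]
```

Because both templates are available to the encoder and the decoder, both sides can run the same derivation, which is why no parameters need to be signaled.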

[0033] The local illumination compensation is used for inter-coded CUs with the following modifications: (1) Intra neighbor samples can be used in LIC parameter derivation; (2) LIC is disabled for blocks with less than 32 luma samples; (3) Samples of the reference block template are generated by using MC with the block MV without rounding it to integer-pel precision.

II. Current Picture Referencing

[0034] A. Intra Template Matching (IntraTMP)

[0035] Intra template matching prediction (IntraTMP) is a special intra prediction mode that copies the best prediction block from the reconstructed part of the current frame, whose L-shaped template matches the current template. For a predefined search range, the encoder searches for the most similar template to the current template in a reconstructed part of the current frame and uses the corresponding block as a prediction block. The encoder then signals the usage of this mode, and the same prediction operation is performed at the decoder side.

[0036] The prediction signal is generated by matching the L-shaped, Top-only or Left-only causal neighbor of the current block with another block in a predefined search area, as shown in FIG. 1. As illustrated, there are 6 predefined search areas, i.e., R1 to R6, that contain the reconstructed samples from the top and left CTUs as well as part of the reconstructed samples within the current CTU 110 that are located above, left, bottom-left and top-right to the current block 110.

[0037] Sum of absolute differences (SAD) is used as a cost function during the search for matching template. A given search order of the 6 regions is utilized, i.e., R4, R5, R6, R1, R2, and R3. Within each region, the decoder constructs a candidate list of (e.g., up to 19) template matching block vectors that are ranked in ascending order according to the template cost (SAD). The following modes are supported: (1) Single predictor: A single predictor is selected from the candidate list; (2) Fusion of multiple predictors: Multiple predictors are selected from the candidate list and blended to derive the final prediction block. The blending weights are either computed from the template matching cost of each predictor, or derived with a Wiener-filter based weight derivation method; (3) Sub-pel precision: When a single predictor is used, sub-pel precisions are supported. A new candidate list is constructed by including the selected integer block vector and the surrounding 1/2-pel and 1/4-pel sub-pel positions. The list is sorted based on the same cost function used for the integer BV search. After that, the first two candidates are allowed to be selected with one single flag being signaled from encoder to decoder; and (4) Linear filter model: A linear filter can be learned between the reference template and the current template and applied to the reference block. This mode can be used for single predictor when sub-pel precision is not used.
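The SAD-based ranking of template matching block vectors can be sketched as follows. This is a simplified illustration under stated assumptions: the picture is a 2-D list of reconstructed samples, the template is the L-shape formed by t rows above and t columns left of the block (the top-left corner is omitted), and the caller supplies only candidate block vectors whose templates lie inside the reconstructed area. The helper names are hypothetical.

```python
def sad(a, b):
    """Sum of absolute differences between two equally sized sample lists."""
    return sum(abs(x - y) for x, y in zip(a, b))


def l_shaped_template(picture, x0, y0, w, h, t=1):
    """Collect t rows above and t columns left of the w x h block at (x0, y0)."""
    above = [picture[y][x] for y in range(y0 - t, y0) for x in range(x0, x0 + w)]
    left = [picture[y][x] for y in range(y0, y0 + h) for x in range(x0 - t, x0)]
    return above + left


def rank_candidates(picture, cur_pos, block_size, candidates,
                    max_candidates=19, t=1):
    """Return up to max_candidates (cost, bv) pairs in ascending SAD order."""
    x0, y0 = cur_pos
    w, h = block_size
    cur_t = l_shaped_template(picture, x0, y0, w, h, t)
    scored = []
    for bvx, bvy in candidates:
        ref_t = l_shaped_template(picture, x0 + bvx, y0 + bvy, w, h, t)
        scored.append((sad(cur_t, ref_t), (bvx, bvy)))
    scored.sort(key=lambda item: item[0])  # lowest template cost first
    return scored[:max_candidates]
```

In single-predictor mode the first entry of the returned list would supply the block vector; the fusion mode would instead blend the blocks pointed to by several of the top entries.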

[0038] Additionally, IntraTMP with local illumination compensation is allowed. The following considerations are taken: (1) Use of LIC and use of FLM (CCCM-like filtering) are mutually exclusive for a given CU; (2) Use of LIC together with fusion in IntraTMP is allowed; (3) Top-only and Left-only template usage for LIC model determination is allowed for screen content coding. For camera-captured content, only the top-left template is employed; (4) Multi Mode Linear filter model (MMLM) is supported similarly to IBC-LIC, for screen content coding.

[0039] When LIC is used for a given CU, the Intra TMP search process employs MRSAD rather than SAD distortion function.
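The effect of switching the cost function can be illustrated with a short sketch. It assumes MRSAD denotes the mean-removed SAD, i.e., each template has its own mean subtracted before the absolute differences are summed; the text above does not expand the acronym, and the floating-point means are a simplification.

```python
def sad(a, b):
    """Plain sum of absolute differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def mrsad(a, b):
    """Mean-removed SAD: compare the templates after removing each one's mean."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    return sum(abs((x - mean_a) - (y - mean_b)) for x, y in zip(a, b))
```

Under this reading, a candidate whose template differs from the current template only by a constant brightness offset has a large SAD but a zero MRSAD, so it is not penalized for an illumination change that LIC would compensate afterwards.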

[0040] B. Intra Block Copy (IBC)

[0041] Block matching and copy has been tried to allow selecting the reference block from within the same picture. It is observed to be not efficient when applying this concept to camera-captured videos. However, the spatial correlation among pixels within the same picture is different for screen content. For a typical video with text and graphics, there are usually repetitive patterns within the same picture. Hence, intra (picture) block compensation has been observed to be very effective. The intra block copy (IBC) mode, also called current picture referencing (CPR), utilizes this characteristic. In the CPR mode, a prediction unit (PU) is predicted from a previously reconstructed block within the same picture. Further, a displacement vector (called block vector or BV) is used to signal the relative displacement from the position of the current block to that of the reference block. The prediction errors are then coded using transformation, quantization and entropy coding. FIG. 2 illustrates intra block copy or current picture referencing, which uses a block vector 220 to locate a reference block 230 within the current picture for coding the current block 210.

[0042] The block vector for IBC may identify a reference block in a particular reference area of the current picture, which is also referred to as the IBC valid region. The reference area for IBC may be extended to two CTU rows above. FIG. 3 illustrates the reference area for IBC when coding a CTU. In the figure, for a current CTU 310, the unshaded CTUs are the reference area, and the hashed CTUs are invalid for the reference area. Specifically, for a CTU (m, n) to be coded, the reference area includes CTUs with index (m–2, n–2) … (W, n–2), (0, n–1) … (W, n–1), (0, n) … (m, n), where W denotes the maximum horizontal index within the current tile, slice or picture. When the CTU size is 256, the reference area is limited to one CTU row above. This setting ensures that for CTU size being 128 or 256, IBC does not require extra memory in the current ETM platform. The per-sample block vector search (also called local search) range is limited to [–(C << 1), C >> 2] horizontally and [–C, C >> 2] vertically to adapt to the reference area extension, where C denotes the CTU size.
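The local search range can be made concrete with a small sketch. It assumes the bounds are inclusive and that the block vector components are expressed in the same sample units as the CTU size C; the function names are hypothetical. The check covers only the search window, not whether the referenced samples fall inside the CTU-based reference area described above.

```python
def local_search_range(ctu_size):
    """Return ((h_min, h_max), (v_min, v_max)) for a CTU of size C."""
    c = ctu_size
    horizontal = (-(c << 1), c >> 2)  # [-(C << 1), C >> 2]
    vertical = (-c, c >> 2)           # [-C, C >> 2]
    return horizontal, vertical


def bv_in_local_search_range(bv, ctu_size):
    """True when both BV components fall inside the local search window."""
    (h_min, h_max), (v_min, v_max) = local_search_range(ctu_size)
    bvx, bvy = bv
    return h_min <= bvx <= h_max and v_min <= bvy <= v_max
```

For C = 128, the window is [–256, 32] horizontally and [–128, 32] vertically.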

[0043] C. IBC-LIC

[0044] Intra block copy with local illumination compensation (IBC-LIC) is a coding tool which compensates the local illumination variation within a picture between the CU coded with IBC and its prediction block with a linear equation. The parameters of the linear equation are derived in the same way as in LIC for inter prediction, except that the reference template is generated using the block vector in IBC-LIC. IBC-LIC can be applied to IBC AMVP mode and IBC merge mode. For IBC AMVP mode, an IBC-LIC flag is signalled to indicate the use of IBC-LIC. For IBC merge mode, the IBC-LIC flag is inferred from the merge candidate.

[0045] D. IBC-GPM

[0046] Intra block copy with geometry partitioning mode (IBC-GPM) is a coding tool which divides a CU into two sub-partitions geometrically. The prediction signals of the two sub-partitions are generated using IBC and intra prediction. IBC-GPM can be applied to regular IBC merge mode or IBC TM merge mode. An intra prediction mode (IPM) candidate list is constructed using the same method as GPM with inter and intra prediction for intra prediction, and the IPM candidate list size is pre-defined as 3. There are 48 geometry partitioning modes in total, which are divided into two geometry partitioning mode sets as follows:

Table 1: Geometry partitioning modes in the first geometry partitioning mode set

Table 2: Geometry partitioning modes in the second geometry partitioning mode set

[0047] When IBC-GPM is used, an IBC-GPM geometry partitioning mode set flag is signalled to indicate whether the first or the second geometry partitioning mode set is selected, followed by the geometry partitioning mode index. An IBC-GPM intra flag is signalled to indicate whether intra prediction is used for the first sub-partition. When intra prediction is used for a sub-partition, an intra prediction mode index is signalled. When IBC is used for a sub-partition, a merge index is signalled.

[0048] E. IBC-CIIP

[0049] Combined intra block copy and intra prediction (IBC-CIIP) is a coding tool for a CU which uses IBC with merge mode and intra prediction to obtain two prediction signals, and the two prediction signals are combined by a weighted sum to generate the final prediction. Specifically, if the intra prediction is planar or DC mode, the final prediction is obtained as follows:

$$P = \big(w_{ibc} \cdot P_{ibc} + ((1 \ll \mathrm{shift}) - w_{ibc}) \cdot P_{intra} + (1 \ll (\mathrm{shift} - 1))\big) \gg \mathrm{shift}$$

[0050] wherein $P_{ibc}$ and $P_{intra}$ denote the IBC prediction signal and the intra prediction signal, respectively. $(w_{ibc}, \mathrm{shift})$ are set equal to (13, 4) and (1, 1) for IBC merge mode and IBC AMVP mode, respectively.
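A minimal integer sketch of this blend is given below. It reads the weighting of the intra signal as ((1 << shift) – wibc), so that the two weights sum to 1 << shift; that grouping is an interpretation of the printed formula, and the function name is hypothetical.

```python
def ibc_ciip_blend(p_ibc, p_intra, w_ibc, shift):
    """Weighted sum of one IBC sample and one intra sample with rounding."""
    rounding = 1 << (shift - 1)
    w_intra = (1 << shift) - w_ibc  # weights sum to 1 << shift
    return (w_ibc * p_ibc + w_intra * p_intra + rounding) >> shift


# Parameter pairs quoted in the text: (w_ibc, shift)
IBC_MERGE_PARAMS = (13, 4)
IBC_AMVP_PARAMS = (1, 1)
```

With (13, 4) the IBC signal receives 13/16 of the weight and the intra signal 3/16; with (1, 1) the two signals are averaged.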

[0051] F. Filtered IBC Prediction

[0052] An additional filtered IBC mode is introduced, where a filter is applied to the IBC predictor. The filter is derived by minimizing the MSE between the current template and the reference template. The output of the filter is calculated as follows:

$$\mathrm{predLumaVal} = c_0 C + c_1 N + c_2 S + c_3 E + c_4 W + c_5 P + c_6 B$$

[0053] The nonlinear term P is represented as the square of the center sample C, scaled to the sample value range of the content:

$$P = (C \cdot C + \mathrm{midVal}) \gg \mathrm{bitDepth}$$

[0054] The bias term B represents a scalar offset between the input and output and is set to middle luma value (512 for 10-bit content) .
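The filter output for one sample can be sketched as below. Several points are assumptions made for illustration, since the text does not define them: N, S, E and W are taken to be the samples above, below, right of and left of the center sample C in the IBC predictor; midVal is taken to equal the middle luma value 1 << (bitDepth – 1); and the coefficients are plain numbers rather than fixed-point values. The derivation of the coefficients by MSE minimization is not shown.

```python
def filtered_ibc_sample(pred, x, y, coeffs, bit_depth=10):
    """Evaluate c0*C + c1*N + c2*S + c3*E + c4*W + c5*P + c6*B at (x, y).

    pred is a 2-D list indexed as pred[row][col]; (x, y) must have all four
    neighbors available. The neighbor layout and mid_val are assumptions.
    """
    mid_val = 1 << (bit_depth - 1)        # 512 for 10-bit content
    c = pred[y][x]
    n = pred[y - 1][x]                    # assumed: sample above
    s = pred[y + 1][x]                    # assumed: sample below
    e = pred[y][x + 1]                    # assumed: sample to the right
    w = pred[y][x - 1]                    # assumed: sample to the left
    p = (c * c + mid_val) >> bit_depth    # nonlinear term
    b = mid_val                           # bias term
    terms = (c, n, s, e, w, p, b)
    return sum(k * t for k, t in zip(coeffs, terms))
```

Setting coeffs to (1, 0, 0, 0, 0, 0, 0) reduces the filter to the unfiltered IBC predictor, which is a convenient sanity check.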

[0055] This filtered mode is used as an additional mode for non-merge IBC blocks, and it is not used together with IBC-LIC, IBC-CIIP or RR-IBC. For IBC merge modes, this filtering mode is inherited when merge mode list is constructed. The mode flag is signalled before the IBC-LIC flag.

[0056] G. IntraTMP derived block vector candidate for IBC

[0057] A block vector (BV) derived from the intra template matching prediction (IntraTMP) may be used for intra block copy (IBC). FIG. 4 illustrates using an IntraTMP block vector for an IBC block. In the figure, a current block 410 is coded by IBC, while the neighboring block 420 is coded by IntraTMP by searching and finding a reference block 430 having a template 435 that best matches the template 425. The reference block 430 is located by a BV 405. In some embodiments, the IntraTMP BV 405 of the neighboring block 420 is stored along with IBC BVs that are used as spatial BV candidates in IBC candidate list construction. The IntraTMP block vector is stored in an IBC block vector buffer, and the current IBC block can use both the IBC BVs and the IntraTMP BVs of neighboring blocks as BV candidates for the IBC BV candidate list. IntraTMP block vectors are added to the IBC block vector candidate list as spatial candidates. IntraTMP block vectors are stored in quarter-pel resolution for coding of IBC block vectors and HMVP.

[0058] H. Reconstruction-Reordered IBC (RR-IBC)

[0059] A Reconstruction-Reordered IBC (RR-IBC) mode is allowed for IBC coded blocks. When RR-IBC is applied, the samples in a reconstruction block are flipped according to a flip type of the current block. At the encoder side, the original block may be flipped before motion search and residual calculation, while the prediction block is derived without flipping. At the decoder side, the reconstruction block is flipped back to restore the original block.

[0060] Two flip methods, horizontal flip and vertical flip, are supported for RR-IBC coded blocks. A syntax flag is firstly signalled for an IBC AMVP coded block, indicating whether the reconstruction is flipped, and if it is flipped, another flag is further signaled specifying the flip type. For IBC merge and BV candidates in SGPM mode, the flip type is inherited from neighbouring blocks, without syntax signalling. Considering the horizontal or vertical symmetry, the current block and the reference block are normally aligned horizontally or vertically. Therefore, when a horizontal flip is applied, the vertical component of the BV is not signaled and inferred to be equal to 0. Similarly, the horizontal component of the BV is not signaled and inferred to be equal to 0 when a vertical flip is applied.

[0061] To better utilize the symmetry property, a flip-aware BV adjustment approach is applied to refine the block vector candidate. FIGS. 5A-5B illustrate BV adjustment for horizontal flip and for vertical flip. In the figures, (xc, yc) and (xn, yn) represent the coordinates of the center sample of a current block and its neighboring block, respectively. BVc and BVn denote the BVs of the current block and the neighboring block, respectively. Instead of directly inheriting the BVn from the neighboring block, an adjustment is made to BVc depending on whether it’s horizontal flip or vertical flip.

[0062] FIG. 5A illustrates BV adjustment for horizontal flip. As illustrated, for a current block 510 and its neighboring block 515, the horizontal component of BVc (denoted as $BV_{c,h}$) after the horizontal flip is calculated by adding a motion shift to the horizontal component of BVn (denoted as $BV_{n,h}$), i.e., $BV_{c,h} = 2(x_n - x_c) + BV_{n,h}$.
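The horizontal-flip adjustment is a one-line computation; the sketch below only restates the formula with hypothetical argument names and assumes integer center coordinates and BV components in the same units.

```python
def adjust_bv_h_for_horizontal_flip(bv_n_h, x_c, x_n):
    """Horizontal BV component of the current block after a horizontal flip.

    bv_n_h: horizontal BV component of the neighboring block
    x_c, x_n: horizontal center coordinates of the current and neighboring blocks
    """
    motion_shift = 2 * (x_n - x_c)
    return motion_shift + bv_n_h
```

For example, with x_c = 20, x_n = 12 and a neighboring horizontal component of –16, the adjusted component is 2 * (12 – 20) – 16 = –32.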

[0063] FIG. 5B illustrates BV adjustment for vertical flip. As illustrated, for a current block 520 and its neighboring block 525, the vertical component of BVc (denoted as $BV_{c,v}$) after the vertical flip is calculated by adding a motion shift to the vertical component of BVn (denoted as $BV_{n,v}$), i.e., $BV_{c,v} = 2(y_n - y_c) + BV_{n,v}$.

III. Prediction Refinement Using Template Area

[0064] A. Prediction Refinement Using Template with Motion from Current Block

[0065] In some embodiments, a sample-based prediction offset (SPO) is used to refine the predictor. The SPO is derived from position-related weighting (PRW) and the difference between the reconstructed template and the reference template (DRR). The PRW for each sample in the predictor to be refined is related to the template position and the sample position according to the block size. The SPO for each sample in the predictor to be refined is the sum of the DRR values multiplied by the sample’s PRWs. The SPO is then added to the predictor for each sample. Clipping may optionally be applied to the result before output.

[0066] The template can be one line or multiple lines. Top-left, top-right, and bottom-left areas can be viewed as template area whenever these areas are available on both the encoder side and the decoder side. FIG. 6 illustrates a template area of a current block. As illustrated, the neighboring closest N rows and/or the neighboring closest M columns of the current block above or left of the current block (N and M are integers) are used as the template area of the current block. The template area includes $M + w_{ext}$ columns above the current block and $N + h_{ext}$ rows left of the current block.

[0067] In some embodiments, $w_{ext}$ and $h_{ext}$ can be defined as any positive number specified in the standard. For example, the positive number is set as a fixed number such as 4, 8, 16, or 32, etc. For another example, the positive number $w_{ext}$ is decided according to W, such as k*W, where k can be 1, 2, 3, or 4, etc. For another example, the positive number $w_{ext}$ is decided according to W and H, such as k*(W+H), where k can be 1, 2, 3, or 4, etc. For another example, the positive number is decided according to at least one syntax element signaled at CU, CTU, SPS, PPS, slice, tile, picture, or sequence level.

[0068] In some embodiments, (example 1) a sample-based prediction offset may be derived to refine the predictor. In the first step, the template difference $\mathrm{Diff}_{Temp,x,y}$ is calculated between the reconstructed template $\mathrm{Rec}_{Temp,x,y}$ of the current block and the reference template $\mathrm{Ref}_{Temp,x,y}$ of the predictor. Here x and y give the position relative to the top-left (TL) sample in the predictor. For example, x = 0 and y = -1 means the template sample above the TL sample.

$$\mathrm{Diff}_{Temp,x,y} = \mathrm{Rec}_{Temp,x,y} - \mathrm{Ref}_{Temp,x,y}$$

[0069] The position-based weighting of each sample in the predictor comes from a pretrained table $LUT_{PRW,w,h,x,y,i,j}$ selected according to the block width w and height h. Here i and j give the position relative to the top-left (TL) sample in the predictor. For example, i = 1 and j = 0 denotes the sample immediately to the right of the TL sample. The prediction offset $Offset_{SPO,i,j}$ of each sample in the predictor is the sum of $Diff_{Temp,x,y}$ multiplied by $LUT_{PRW,w,h,x,y,i,j}$ over all template positions (x, y):

$$Offset_{SPO,i,j} = \sum_{x,y} Diff_{Temp,x,y} \cdot LUT_{PRW,w,h,x,y,i,j}$$

[0070] In some embodiments, if some of the template areas are not available, the corresponding $Diff_{Temp,x,y}$ term is set to zero. For example, if the current block is on the left boundary of the picture, $Diff_{Temp,x,y}$ is set to zero for the left template area (i.e., the x < 0 region).
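The derivation of example 1, including the zeroing of unavailable template samples, might be sketched as follows. The weight table here is a toy dictionary with made-up values, not a pretrained LUT, and the function names and the availability callback are assumptions made for illustration.

```python
def template_diff(positions, rec_temp, ref_temp, is_available):
    """Diff[(x, y)] = Rec[(x, y)] - Ref[(x, y)], or 0 where the template is unavailable."""
    diff = {}
    for pos in positions:
        if is_available(pos):
            diff[pos] = rec_temp[pos] - ref_temp[pos]
        else:
            diff[pos] = 0  # unavailable template samples contribute nothing
    return diff


def spo_example1(diff, lut, width, height):
    """Offset[j][i] = sum over template positions (x, y) of Diff * LUT[(x, y, i, j)]."""
    return [[sum(d * lut.get((x, y, i, j), 0.0) for (x, y), d in diff.items())
             for i in range(width)]
            for j in range(height)]


# Toy 2x2 block on the left picture boundary: only the above template exists.
positions = [(0, -1), (1, -1), (-1, 0), (-1, 1)]
rec_temp = {(0, -1): 104, (1, -1): 108}
ref_temp = {(0, -1): 100, (1, -1): 100}
diff = template_diff(positions, rec_temp, ref_temp, lambda p: p[0] >= 0)

# Hypothetical weights: each above-template sample affects its own column,
# with weight 0.5 in the first row and 0.25 in the second row.
toy_lut = {(i, -1, i, j): (0.5 if j == 0 else 0.25)
           for i in range(2) for j in range(2)}
offset = spo_example1(diff, toy_lut, width=2, height=2)
```

The offset is indexed as `offset[j][i]` (row, then column), so `offset[0][1]` corresponds to i = 1, j = 0 in the notation of the text.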

[0071] In some embodiments (example 2), a sample-based prediction offset is derived to refine the predictor. In the first step, the template differences of the above template, $Diff_{above,i}$, and of the left template, $Diff_{left,j}$, are calculated between the reconstructed templates ($Rec_{above,i}$ and $Rec_{left,j}$) of the current block and the reference templates ($Ref_{above,i}$ and $Ref_{left,j}$) of the predictor:

$$Diff_{above,i} = Rec_{above,i} - Ref_{above,i}$$

$$Diff_{left,j} = Rec_{left,j} - Ref_{left,j}$$

[0072] Here i and j are the horizontal and vertical positions relative to the top-left (TL) sample in the predictor. For example, i = 0 denotes the template sample above the TL sample and i = 1 denotes the template sample immediately to the right of sample i = 0. The position-based weighting of each sample in the predictor comes from pretrained tables $LUT_{above,w,h,j}$ and $LUT_{left,w,h,i}$ selected according to the block width w and height h, where i and j are again the horizontal and vertical positions relative to the TL sample in the predictor. For example, j = 1 denotes the sample directly below the TL sample.

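[0073] The prediction offset $Offset_{SPO,i,j}$ of each sample in the predictor is the sum of $Diff_{above,i}$ multiplied by $LUT_{above,w,h,j}$ and $Diff_{left,j}$ multiplied by $LUT_{left,w,h,i}$:

$$Offset_{SPO,i,j} = Diff_{above,i} \cdot LUT_{above,w,h,j} + Diff_{left,j} \cdot LUT_{left,w,h,i}$$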
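The separable form of example 2 can be illustrated with a short sketch. The two weight lists stand in for the tables selected for one block size (w, h); their values are invented for the example, and the function name is hypothetical.

```python
def spo_example2(diff_above, diff_left, lut_above, lut_left):
    """Offset[j][i] = Diff_above[i] * LUT_above[j] + Diff_left[j] * LUT_left[i].

    lut_above is indexed by the vertical position j (distance from the above
    template) and lut_left by the horizontal position i (distance from the
    left template), both for one fixed block size (w, h).
    """
    width, height = len(diff_above), len(diff_left)
    return [[diff_above[i] * lut_above[j] + diff_left[j] * lut_left[i]
             for i in range(width)]
            for j in range(height)]


# Toy 2x2 block with made-up weights that decay away from each template.
diff_above = [4, 8]      # Rec_above - Ref_above for i = 0, 1
diff_left = [2, -2]      # Rec_left - Ref_left for j = 0, 1
lut_above = [0.5, 0.25]  # hypothetical weights for j = 0, 1
lut_left = [0.5, 0.25]   # hypothetical weights for i = 0, 1
offset = spo_example2(diff_above, diff_left, lut_above, lut_left)
```

Compared with example 1, each sample needs only two multiplications, and the tables grow with w + h rather than with the product of template size and block size.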

[0074] In some embodiments, $Rec_{Temp,x,y}$ and $Ref_{Temp,x,y}$ are used directly to refine the prediction instead of their difference. For example, the SPO can be derived using the following equation:

[0075]

[0076] B. Prediction Refinement Using Template with Motion from Neighboring Block

[0077] In some embodiments, the SPO is derived from the neighboring reconstructed residual data (NRR) instead of the DRR described in Section III.A. The first step is replaced by calculating the template difference $Diff_{Temp,x,y}$ between the reconstructed template $Rec_{Temp,x,y}$ of the current block and the template $Nei_{Temp,x,y}$ derived from the neighboring predictor with motion from the neighboring block:

$$Diff_{Temp,x,y} = Rec_{Temp,x,y} - Nei_{Temp,x,y}$$

[0078] Here x and y give the position relative to the top-left (TL) sample in the predictor. For example, x = 0 and y = -1 denotes the template sample directly above the TL sample. In some embodiments, the neighboring reconstructed residual data (NRR) is taken from the residual of the neighboring block:

$$Diff_{Temp,x,y} = Residual_{Temp,x,y}$$

[0079] In some embodiments, the SPO is derived from the NRR and DRR together.

[0080] C. Different LUTs for Prediction Refinement Using Template

[0081] In some embodiments, the LUT in Sections III.A and III.B above can differ according to the QP value, block size, CU mode, motion vector of the current CU, or any other implicit or explicit rule. For example, an index is signaled at the sequence, picture, slice, CTU, or CU level to select the LUT. For another example, a classifier such as DIMD can be applied to the template area or the current prediction to choose the LUT.

[0082] In some embodiments, the LUT can be replaced with online-trained parameters, and the parameters can be signaled at the sequence, picture, slice, CTU, or CU level. In some embodiments, the parameters can be derived from information available at both the encoder and decoder sides, for example, the reconstruction data. In this case, the parameters are not signaled.

IV. Prediction Modes with Prediction Refinement

[0083] A. Prediction Refinement on IntraTMP mode

[0084] In some embodiments, a prediction refinement technique such as SPO can be applied to a CU coded in IntraTMP mode whenever the top and/or left template area of a reference block is ready. The template area of the reference block can be derived by using the BV of the current IntraTMP CU. FIG. 7 shows example template areas of the current block and a reference block that is identified using the BV of the current block. The figure illustrates a current block 720 in a current picture 700. The current block 720 has a block vector 705 that locates a reference block 730 in the current picture 700. An area 725 neighboring the current block 720 is used as a template area of the current block 720. An area 735 neighboring the reference block 730 is used as a template area of the reference block 730.
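The relationship between the two template areas can be sketched in picture coordinates: the reference template is the current template displaced by the block vector. The function names and the top-rows/left-columns layout below are illustrative assumptions; the extended top-left, top-right, and bottom-left areas are omitted for brevity.

```python
def template_positions(x0, y0, w, h, n_rows, m_cols):
    """Picture coordinates of the N rows above and M columns left of a block at (x0, y0)."""
    top = [(x, y) for y in range(y0 - n_rows, y0) for x in range(x0, x0 + w)]
    left = [(x, y) for y in range(y0, y0 + h) for x in range(x0 - m_cols, x0)]
    return top + left


def reference_template_positions(x0, y0, w, h, bv, n_rows, m_cols):
    """Template of the reference block located by displacing the current block by the BV."""
    bvx, bvy = bv
    return template_positions(x0 + bvx, y0 + bvy, w, h, n_rows, m_cols)


# Current 2x2 block at (16, 16) with a one-line template and BV = (-8, -4).
bv = (-8, -4)
cur = template_positions(16, 16, 2, 2, n_rows=1, m_cols=1)
ref = reference_template_positions(16, 16, 2, 2, bv, n_rows=1, m_cols=1)
```

Because both lists are produced in the same order, the k-th reference template sample pairs with the k-th current template sample when the template difference is formed.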

[0085] In some embodiments, if an LIC model is applied on the prediction, the LIC model is also applied on the template area of the reference block for prediction refinement. In some embodiments, the template area used for prediction refinement follows the matching shape of the IntraTMP CU. For example, if top-only template matching mode is used for the current CU, the top-only template is used for prediction refinement. In some embodiments, a template area (of the current block or a reference block) is used for prediction refinement whenever it is available.

[0086] In some embodiments, for IntraTMP with linear filter model, the prediction refinement is applied before applying the linear filter. The linear filter is applied on the refined predictor afterward. In some embodiments, the prediction refinement is applied after the linear filter. The linear filter model is also applied on template area of reference block for prediction refinement processing.

[0087] In some embodiments, the post processing (e.g., LIC model, linear filter) that is applied on the prediction of reference block is also applied on the template area of reference block.

[0088] In some embodiments, the template area can be generated along with reference block to reduce latency. For example, NxN reference block and 2 above and left template lines are required to generate the final prediction. (N+2) x (N+2) reference block can be generated at the same time to reduce latency.
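The joint fetch described above might look like the following sketch, which reads one (N+2) x (N+2) patch and then splits it into the block and its two-line templates. It assumes the picture is a list of rows and that the block lies at least two samples away from the top and left picture edges; the function name is hypothetical.

```python
def fetch_block_with_template(picture, x0, y0, n, lines=2):
    """Fetch an (n + lines) x (n + lines) patch in one pass and split it.

    Assumes x0 >= lines and y0 >= lines so the template lies inside the picture.
    """
    patch = [row[x0 - lines:x0 + n] for row in picture[y0 - lines:y0 + n]]
    block = [row[lines:] for row in patch[lines:]]          # n x n reference block
    top_template = [row[lines:] for row in patch[:lines]]   # `lines` rows above
    left_template = [row[:lines] for row in patch[lines:]]  # `lines` columns left
    return patch, block, top_template, left_template


# Toy 8x8 picture whose sample value encodes its position as 10*y + x.
picture = [[10 * y + x for x in range(8)] for y in range(8)]
patch, block, top_t, left_t = fetch_block_with_template(picture, x0=3, y0=4, n=2)
```

A single read of the enlarged patch replaces three separate reads (block, top template, left template), which is the latency benefit the paragraph refers to.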

[0089] In some embodiments, the weights used in a prediction refinement technique such as SPO are dedicated to IntraTMP mode. In some embodiments, the weights used in a prediction refinement technique such as SPO are shared among different modes. For example, weights for inter prediction and weights for IntraTMP can be shared.

[0090] B. Prediction Refinement on IBC mode

[0091] In some embodiments, a prediction refinement technique such as SPO can be applied to a CU coded in IBC mode whenever the top and/or left template area is ready. The template area of the reference block can be derived by using the BV of the current IBC CU, similarly to IntraTMP mode. In some embodiments, if an LIC model is applied on the prediction, the LIC model should also be applied on the template area of the reference block for prediction refinement. In some embodiments, if filtered IBC prediction mode is used, the filtering process is also applied on the template area of the reference block.

[0092] In some embodiments, the post processing (e.g., LIC model) that is applied on the prediction of the reference block should also be applied on the template area of the reference block.

[0093] In some embodiments, the template area may be generated along with reference block to reduce latency. For example, NxN reference block and 2 above and left template lines are required to generate the final prediction. (N+2) x (N+2) reference block may be generated at the same time to reduce latency.

[0094] If bi-prediction mode is used for IBC coded blocks, in some embodiments, the prediction refinement is applied on each of the predictors. In some embodiments, the prediction refinement is applied after the blended predictor is generated. In some embodiments, the prediction refinement is disabled for the bi-prediction mode of IBC, so that refinement can be applied only to uni-prediction IBC with a single predictor.

[0095] In some embodiments, if IBC is combined with CIIP or GPM mode, the prediction refinement is applied on each of the predictors. For example, for GPM mode with P0 and P1 prediction, P0 and P1 are refined separately. If the reference and/or current block template area is unavailable for a predictor, the prediction refinement is disabled for that predictor. For example, for CIIP mode, since there is no reference template area for intra prediction, the prediction refinement is disabled on the intra part and applied only on the inter / IBC part. In some embodiments, the prediction refinement is applied on the final blended predictor if all the reference and current template areas are available for each predictor. The reference template area is blended in the same way as the blended predictor. After that, the prediction refinement can be applied using the final blended prediction and the final blended template area. In some embodiments, if IBC TM merge mode is used, the prediction refinement can be applied before TM refinement.

[0096] In some embodiments, if IBC TM merge mode is used, the prediction refinement can be applied after TM refinement. In some embodiments, the above-mentioned methods may also be applied on chroma direct block vector (DBV) mode.

[0097] In some of the above-mentioned embodiments, the templates may consider RRIBC type during the generation. For example, if horizontal flip type is used, the top N lines and right N lines of reference block are generated. If vertical flip type is used, the left N lines and bottom N lines of reference block are generated. In some embodiments, if RRIBC is enabled, the above-mentioned prediction refinement is disabled.
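The choice of template sides per RRIBC flip type can be sketched as a function that returns the rectangles to generate around the reference block. Rectangles are written as (x, y, width, height); the flip-type strings and the function name are hypothetical labels, and the sample ordering inside each rectangle after flipping is left out of this sketch.

```python
def reference_template_rects(xr, yr, w, h, n, flip_type="none"):
    """Rectangles (x, y, width, height) of the N-line templates around a reference block.

    (xr, yr) is the top-left sample of the reference block of size w x h.
    """
    top = (xr, yr - n, w, n)
    bottom = (xr, yr + h, w, n)
    left = (xr - n, yr, n, h)
    right = (xr + w, yr, n, h)
    if flip_type == "horizontal":   # top N lines and right N lines
        return {"top": top, "right": right}
    if flip_type == "vertical":     # left N lines and bottom N lines
        return {"bottom": bottom, "left": left}
    if flip_type == "none":         # regular case: top and left
        return {"top": top, "left": left}
    raise ValueError(f"unknown flip type: {flip_type}")


horizontal = reference_template_rects(8, 8, 4, 4, 2, "horizontal")
vertical = reference_template_rects(8, 8, 4, 4, 2, "vertical")
```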

[0098] In some embodiments, the templates generated for above-mentioned prediction refinement shall be within IBC valid region (described by reference to FIG. 3 above) . For example, in some embodiments, if the to-be generated top templates are outside the IBC valid region, top templates will be treated as unavailable. Or if the to-be generated top templates are outside the IBC valid region, the corresponding templates within the IBC valid region will be generated. For another example, in some embodiments, if the to-be generated left templates are outside the IBC valid region, left templates will be treated as unavailable, or if the to-be generated left templates are outside the IBC valid region, the corresponding templates within the IBC valid region will be generated.
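The two alternatives above (treat the template as unavailable, or generate only the part inside the valid region) can be sketched with a validity predicate. The predicate used here is a toy 64x64 rectangle standing in for the IBC valid region of FIG. 3, and reading "the corresponding templates within the IBC valid region" as "keep only the samples inside the region" is an interpretation made for this sketch.

```python
def rect_samples(rect):
    """All (x, y) sample positions inside a rectangle (x, y, width, height)."""
    x0, y0, w, h = rect
    return [(x, y) for y in range(y0, y0 + h) for x in range(x0, x0 + w)]


def template_available(rect, is_valid):
    """Alternative 1: the template is usable only if every sample is in the valid region."""
    return all(is_valid(x, y) for x, y in rect_samples(rect))


def template_clipped_to_valid(rect, is_valid):
    """Alternative 2 (one reading): keep only the samples that fall inside the valid region."""
    return [(x, y) for x, y in rect_samples(rect) if is_valid(x, y)]


def toy_valid(x, y):
    """Hypothetical valid region: a 64x64 area starting at the picture origin."""
    return 0 <= x < 64 and 0 <= y < 64


top_outside = template_available((4, -1, 4, 1), toy_valid)
top_inside = template_available((4, 0, 4, 1), toy_valid)
left_partial = template_clipped_to_valid((-1, 2, 2, 1), toy_valid)
```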

[0099] In some embodiments, the weights used in a prediction refinement technique such as SPO are dedicated to IBC mode. In some embodiments, the weights used in a prediction refinement technique such as SPO are shared among different modes. For example, weights for inter prediction and weights for IBC can be shared.

[0100] Any of the foregoing proposed methods can be implemented in encoders and/or decoders. For example, any of the proposed methods can be implemented in one module of an encoder and/or decoder. Alternatively, any of the proposed methods can be implemented as a circuit coupled to one module of the encoder and/or decoder, so as to provide the information needed by the module of the encoder and/or decoder.

V. Example Video Encoder

[0101] FIG. 8 illustrates an example video encoder 800 that may implement template-based prediction refinement. As illustrated, the video encoder 800 receives an input video signal from a video source 805 and encodes the signal into a bitstream 895. The video encoder 800 has several components or modules for encoding the signal from the video source 805, at least including some components selected from a transform module 810, a quantization module 811, an inverse quantization module 814, an inverse transform module 815, an intra estimation module 824, an intra prediction module 825, a motion compensation module 830, a motion estimation module 835, an in-loop filter 845, a reconstructed picture buffer 850, an MV buffer 865, an MV prediction module 875, and an entropy encoder 890. The motion compensation module 830 and the motion estimation module 835 are part of an inter-prediction module 840. The intra-prediction module 825 and the intra-estimation module 824 are part of a current picture prediction module 820, which uses current picture reconstructed samples as reference samples for prediction of the current block.

[0102] In some embodiments, the modules 810 –890 are modules of software instructions being executed by one or more processing units (e.g., a processor) of a computing device or electronic apparatus. In some embodiments, the modules 810 –890 are modules of hardware circuits implemented by one or more integrated circuits (ICs) of an electronic apparatus. Though the modules 810 –890 are illustrated as being separate modules, some of the modules can be combined into a single module.

[0103] The video source 805 provides a raw video signal that presents pixel data of each video frame without compression. A subtractor 808 computes the difference between the raw video pixel data of the video source 805 and the predicted pixel data 813 from the motion compensation module 830 or intra-prediction module 825 as prediction residual 809. The transform module 810 converts the difference (or the residual pixel data or residual signal 809) into transform coefficients (e.g., by performing Discrete Cosine Transform, or DCT). The quantization module 811 quantizes the transform coefficients into quantized data (or quantized coefficients) 812, which is encoded into the bitstream 895 by the entropy encoder 890.

[0104] The inverse quantization module 814 de-quantizes the quantized data (or quantized coefficients) 812 to obtain transform coefficients 818, and the inverse transform module 815 performs inverse transform on the transform coefficients 818 to produce reconstructed residual 819. The reconstructed residual 819 is added with the predicted pixel data 813 to produce reconstructed pixel data 817. In some embodiments, the reconstructed pixel data 817 is temporarily stored in a line buffer 827 (or intra prediction buffer) for intra-picture prediction and spatial MV prediction. The reconstructed pixels are filtered by the in-loop filter 845 and stored in the reconstructed picture buffer 850. In some embodiments, the reconstructed picture buffer 850 is a storage external to the video encoder 800. In some embodiments, the reconstructed picture buffer 850 is a storage internal to the video encoder 800.
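The residual and reconstruction path described in the two paragraphs above can be illustrated with a deliberately simplified sketch. It assumes an identity transform in place of the DCT and a uniform scalar quantizer with a single step size; both are simplifications chosen for illustration, not the behavior of modules 810 to 815.

```python
def quantize(value, step):
    """Uniform scalar quantizer with rounding to the nearest level (toy model)."""
    sign = -1 if value < 0 else 1
    return sign * ((abs(value) + step // 2) // step)


def dequantize(level, step):
    """Inverse of the toy quantizer: scale the level back by the step size."""
    return level * step


def encode_and_reconstruct(original, predicted, step):
    """Mirror of the encoder loop: residual -> quantize -> dequantize -> add prediction."""
    residual = [o - p for o, p in zip(original, predicted)]       # subtractor
    levels = [quantize(r, step) for r in residual]                # data to be entropy coded
    recon_residual = [dequantize(lv, step) for lv in levels]      # decoder-side residual
    reconstructed = [p + r for p, r in zip(predicted, recon_residual)]
    return levels, reconstructed


levels, reconstructed = encode_and_reconstruct(
    original=[100, 104, 97], predicted=[98, 98, 98], step=4)
```

The reconstructed samples differ from the originals because of quantization. The encoder keeps the reconstructed values rather than the originals so that its later predictions, including templates, match what the decoder will have.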

[0105] The intra estimation module 824 derives intra-prediction data (e.g., intra prediction modes) based on the reconstructed pixel data 817 (stored in the line buffer 827) . The intra-prediction data is provided to the entropy encoder 890 to be encoded into bitstream 895. The intra-prediction data is also used by the intra-prediction module 825 to produce the predicted pixel data 813.

[0106] The motion estimation module 835 performs inter-prediction by producing MVs to reference pixel data of previously decoded frames stored in the reconstructed picture buffer 850. These MVs are provided to the motion compensation module 830 to produce predicted pixel data.

[0107] Instead of encoding the complete actual MVs in the bitstream, the video encoder 800 uses MV prediction to generate predicted MVs, and the difference between the MVs used for motion compensation and the predicted MVs is encoded as residual motion data and stored in the bitstream 895.

[0108] The MV prediction module 875 generates the predicted MVs based on reference MVs that were generated for encoding previous video frames, i.e., the motion compensation MVs that were used to perform motion compensation. The MV prediction module 875 retrieves reference MVs from previous video frames from the MV buffer 865. The video encoder 800 stores the MVs generated for the current video frame in the MV buffer 865 as reference MVs for generating predicted MVs.

[0109] The MV prediction module 875 uses the reference MVs to create the predicted MVs. The predicted MVs can be computed by spatial MV prediction or temporal MV prediction. The difference between the predicted MVs and the motion compensation MVs (MC MVs) of the current frame (residual motion data) is encoded into the bitstream 895 by the entropy encoder 890.
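The residual motion data described above can be sketched as a simple difference on the encoder side and a sum on the decoder side. How the predicted MV itself is chosen (spatial or temporal prediction) is outside this sketch, and the function names are hypothetical.

```python
def mv_residual(mc_mv, predicted_mv):
    """Encoder side: residual motion data = MC MV - predicted MV, per component."""
    return (mc_mv[0] - predicted_mv[0], mc_mv[1] - predicted_mv[1])


def reconstruct_mv(residual, predicted_mv):
    """Decoder side: MC MV = residual motion data + predicted MV."""
    return (residual[0] + predicted_mv[0], residual[1] + predicted_mv[1])


mc_mv = (5, -3)
predicted_mv = (4, -1)   # assumed to be derived identically at encoder and decoder
mvd = mv_residual(mc_mv, predicted_mv)
decoded_mv = reconstruct_mv(mvd, predicted_mv)
```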

[0110] The entropy encoder 890 encodes various parameters and data into the bitstream 895 by using entropy-coding techniques such as context-adaptive binary arithmetic coding (CABAC) or Huffman encoding. The entropy encoder 890 encodes various header elements, flags, along with the quantized transform coefficients 812, and the residual motion data as syntax elements into the bitstream 895. The bitstream 895 is in turn stored in a storage device or transmitted to a decoder over a communications medium such as a network.

[0111] The in-loop filter 845 performs filtering or smoothing operations on the reconstructed pixel data 817 to reduce the artifacts of coding, particularly at boundaries of pixel blocks. In some embodiments, the filtering or smoothing operations performed by the in-loop filter 845 include deblock filter (DBF) , sample adaptive offset (SAO) , and / or adaptive loop filter (ALF) . In some embodiments, luma mapping chroma scaling (LMCS) is performed before the loop filters.

[0112] FIG. 9 illustrates portions of the video encoder 800 that implement prediction refinement using template areas identified by block vectors. As illustrated, the current picture prediction module 820 has access to reconstructed pixel data stored in the reconstructed picture buffer 850 and the line buffer 827, and uses their content to perform prediction and generate an initial, unrefined predictor 925, from which the target prediction sample is derived.

[0113] The current picture prediction module 820 includes an IBC prediction module 910 and an IntraTMP prediction module 920, in addition to the intra prediction module 825 and the intra estimation module 824. The IntraTMP prediction module 920 performs the IntraTMP process by searching current picture data 902 from the reconstructed picture buffer 850 to identify one or more reference blocks having a reference template that matches the template region neighboring the current block. The identified reference blocks are used to generate the unrefined predictor 925. The IBC prediction module 910 may generate or receive a block vector 915, use the block vector to locate a reference block from the reconstructed picture buffer 850, and use the reference block to generate the unrefined predictor 925. In some embodiments, the block vector 915 used by the IBC prediction module 910 is provided by the IntraTMP prediction module 920. The block vector 915 may be signaled in the bitstream 895 for IBC prediction though not for IntraTMP prediction.

[0114] A refinement generator 940 generates a prediction refinement 945. The prediction refinement may include a sample-based prediction offset (SPO) that is generated by applying position-related weights (PRW) 935 to template data 905 of a reference template neighboring the reference block. The refinement generator 940 may use the block vector 915 provided by the current picture prediction module 820 to locate the reference block and retrieve its neighboring reference template from the reconstructed picture buffer 850 as the template data 905. A look-up table (LUT) 930 is used to provide a set of PRW 935 to the refinement generator 940. In some embodiments, a same set of PRW is used for refining predictors generated by different coding tools. In some embodiments, different sets of PRW are used for refining predictors generated by different coding tools.

[0115] In some embodiments, the refinement generator 940 receives an indicator of whether the current picture is coded with samples horizontally or vertically flipped. The refinement generator 940 may use N lines above the reference block and N lines right of the reference block as the reference template when the current picture is coded with samples horizontally flipped, and may use N lines below the reference block and N lines left of the reference block as the reference template when the current picture is coded with samples vertically flipped. The entropy encoder 890 may signal syntax elements to indicate whether the current picture is coded with samples horizontally flipped, vertically flipped, or not flipped at all.

[0116] A predictor modifier 950 applies (e.g., by adding) the generated prediction refinement 945 to the unrefined predictor 925 to generate a refined predictor (prediction comprising the refined prediction sample) 955. The refined predictor 955 is then used as the predicted pixel data 813. In some embodiments, the predictor modifier 950 applies a linear filter model 960 to the initial unrefined predictor 925 before the prediction refinement is applied to the initial predictor to generate the refined predictor 955. In some embodiments, the linear filter model is applied to the refined predictor before the refined predictor is used for encoding or decoding the current block.
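The two orderings of the linear filter relative to the refinement can be sketched as follows. The filter is modeled as a per-sample function purely as a placeholder for the linear filter model, clipping to the sample range is the optional step mentioned earlier, and all names are hypothetical.

```python
def clip_sample(value, bit_depth=8):
    """Clip a sample to the valid range [0, 2^bit_depth - 1]."""
    return max(0, min((1 << bit_depth) - 1, value))


def apply_refinement(predictor, offset, linear_filter=None, filter_first=True, bit_depth=8):
    """Add the SPO to the predictor, with an optional filter before or after the addition."""
    def filtered(block):
        return [[linear_filter(s) for s in row] for row in block]

    if linear_filter is not None and filter_first:
        predictor = filtered(predictor)           # filter, then refine
    refined = [[p + o for p, o in zip(p_row, o_row)]
               for p_row, o_row in zip(predictor, offset)]
    if linear_filter is not None and not filter_first:
        refined = filtered(refined)               # refine, then filter
    return [[clip_sample(s, bit_depth) for s in row] for row in refined]


def toy_filter(sample):
    """Placeholder per-sample linear model (scale and offset), not the real filter."""
    return sample // 2 + 8


predictor = [[250, 10]]
offset = [[10, -20]]
plain = apply_refinement(predictor, offset)
filter_then_refine = apply_refinement(predictor, offset, toy_filter, filter_first=True)
refine_then_filter = apply_refinement(predictor, offset, toy_filter, filter_first=False)
```

The two orderings generally give different results, so encoder and decoder must agree on which one is in use.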

[0117] FIG. 10 conceptually illustrates a process 1000 that encodes a pixel block with prediction refinement using template areas identified by block vectors. In some embodiments, one or more processing units (e.g., a processor) of a computing device implementing the encoder 800 performs the process 1000 by executing instructions stored in a computer readable medium. In some embodiments, an electronic apparatus implementing the encoder 800 performs the process 1000.

[0118] The encoder receives (at block 1010) data to be encoded as a current block of pixels in a current picture. The encoder determines (at block 1020) a template area of the current block, wherein the template area of the current block comprises a set of template samples. The encoder uses (at block 1030) a block vector associated with the current block to locate a reference block in the current picture.

[0119] The encoder determines (at block 1040) one or more sets of Position-Related Weights (PRWs) for the set of template samples. The encoder derives (at block 1050) a Sample-Based Prediction Offset (SPO) for a target prediction sample of the current block from a sum of derived template samples weighted by the one or more sets of respective PRWs. The template samples are derived from a reconstructed template and a target template. Each weight of each set of the one or more sets of PRWs is dependent on a first position of a target template sample with respect to the template area and a second position of the target prediction sample with respect to the current block. The target prediction sample is generated by using the located reference block.

[0120] The target template may be a reference template neighboring the located reference block. The target prediction sample may be a sample within an initial predictor generated by intra block copy (IBC) and the block vector is signaled in a bitstream. The reference template may be from an area of the current picture that is valid for IBC mode. The target prediction sample may be a sample within an initial predictor generated by intra template matching prediction (IntraTMP) and the block vector is derived by searching a reconstructed portion of the current picture. The target template used to derive the SPO may be identically shaped as a template neighboring the current block used for deriving the block vector.

[0121] In some embodiments, a same set of weights is applicable to generate the SPO for when the block vector is signaled in a bitstream and when the block vector is derived by searching a reconstructed portion of the current picture. In some embodiments, different sets of weights are used to generate the SPO for when different coding tools are used to generate the target prediction sample.

[0122] The encoder applies (at block 1060) the SPO to the target prediction sample to generate a refined prediction sample. In some embodiments, a linear filter model is applied to an initial predictor, from which the target prediction sample is derived, before the SPO is applied to the target prediction sample. In some embodiments, the SPO is not applied when the current picture is coded with samples vertically flipped or horizontally flipped.

[0123] In some embodiments, N lines above the reference block and N lines right of the reference block are identified as the target template used to generate the SPO when the current picture is coded with samples horizontally flipped. In some embodiments, N lines below the reference block and N lines left of the reference block are identified as the reference template used to generate the SPO when the current picture is coded with samples vertically flipped. In some embodiments, one or more syntax elements are used to indicate whether the current picture is coded with samples horizontally flipped, vertically flipped, or not flipped. In some embodiments, the SPO is not applied when the current picture is coded with samples vertically flipped or horizontally flipped.

[0124] The encoder encodes (at block 1070) the current block by using prediction comprising the refined prediction sample. In some embodiments, a linear filter model is applied to the prediction comprising the refined prediction sample before being used for encoding or decoding the current block.

VI. Example Video Decoder

[0125] In some embodiments, an encoder may signal (or generate) one or more syntax elements in a bitstream, such that a decoder may parse said one or more syntax elements from the bitstream.

[0126] FIG. 11 illustrates an example video decoder 1100 that may implement template-based prediction refinement. As illustrated, the video decoder 1100 is an image-decoding or video-decoding circuit that receives a bitstream 1195 and decodes the content of the bitstream into pixel data of video frames for display. The video decoder 1100 has several components or modules for decoding the bitstream 1195, including some components selected from an inverse quantization module 1114, an inverse transform module 1115, an intra-prediction module 1125, a motion compensation module 1130, an in-loop filter 1145, a decoded picture buffer 1150, a MV buffer 1165, a MV prediction module 1175, and a parser 1190. The motion compensation module 1130 is part of an inter-prediction module 1140. The intra-prediction module 1125 is part of a current picture prediction module 1120, which uses current picture reconstructed samples as reference samples for prediction of the current block.

[0127] In some embodiments, the modules 1114 –1190 are modules of software instructions being executed by one or more processing units (e.g., a processor) of a computing device. In some embodiments, the modules 1114 –1190 are modules of hardware circuits implemented by one or more ICs of an electronic apparatus. Though the modules 1114 –1190 are illustrated as being separate modules, some of the modules can be combined into a single module.

[0128] The parser 1190 (or entropy decoder) receives the bitstream 1195 and performs initial parsing according to the syntax defined by a video-coding or image-coding standard. The parsed syntax element includes various header elements, flags, as well as quantized data (or quantized coefficients) 1112. The parser 1190 parses out the various syntax elements by using entropy-coding techniques such as context-adaptive binary arithmetic coding (CABAC) or Huffman encoding.

[0129] The inverse quantization module 1114 de-quantizes the quantized data (or quantized coefficients) 1112 to obtain transform coefficients 1118, and the inverse transform module 1115 performs inverse transform on the transform coefficients 1118 to produce reconstructed residual signal 1119. The reconstructed residual signal 1119 is added with predicted pixel data 1113 from the intra-prediction module 1125 or the motion compensation module 1130 to produce decoded pixel data 1117. The decoded pixel data is filtered by the in-loop filter 1145 and stored in the decoded picture buffer 1150. In some embodiments, the decoded picture buffer 1150 is a storage external to the video decoder 1100. In some embodiments, the decoded picture buffer 1150 is a storage internal to the video decoder 1100.

[0130] The intra-prediction module 1125 receives intra-prediction data from bitstream 1195 and according to which, produces the predicted pixel data 1113 from the decoded pixel data 1117 stored in the decoded picture buffer 1150. In some embodiments, the decoded pixel data 1117 is also stored in a line buffer 1127 (or intra prediction buffer) for intra-picture prediction and spatial MV prediction.

[0131] In some embodiments, the content of the decoded picture buffer 1150 is used for display. A display device 1105 either retrieves the content of the decoded picture buffer 1150 for display directly, or retrieves the content of the decoded picture buffer to a display buffer. In some embodiments, the display device receives pixel values from the decoded picture buffer 1150 through a pixel transport.

[0132] The motion compensation module 1130 produces predicted pixel data 1113 from the decoded pixel data 1117 stored in the decoded picture buffer 1150 according to motion compensation MVs (MC MVs) . These motion compensation MVs are decoded by adding the residual motion data received from the bitstream 1195 with predicted MVs received from the MV prediction module 1175.

[0133] The MV prediction module 1175 generates the predicted MVs based on reference MVs that were generated for decoding previous video frames, e.g., the motion compensation MVs that were used to perform motion compensation. The MV prediction module 1175 retrieves the reference MVs of previous video frames from the MV buffer 1165. The video decoder 1100 stores the motion compensation MVs generated for decoding the current video frame in the MV buffer 1165 as reference MVs for producing predicted MVs.

[0134] The in-loop filter 1145 performs filtering or smoothing operations on the decoded pixel data 1117 to reduce the artifacts of coding, particularly at boundaries of pixel blocks. In some embodiments, the filtering or smoothing operations performed by the in-loop filter 1145 include deblock filter (DBF) , sample adaptive offset (SAO) , and / or adaptive loop filter (ALF) . In some embodiments, luma mapping chroma scaling (LMCS) is performed before the loop filters.

[0135] FIG. 12 illustrates portions of the video decoder 1100 that implement prediction refinement using template areas identified by block vectors. As illustrated, the current picture prediction module 1120 has access to reconstructed pixel data stored in the decoded picture buffer 1150 and the line buffer 1127, and uses their content to perform prediction and generate an initial, unrefined predictor 1225, from which the target prediction sample is derived.

[0136] The current picture prediction module 1120 includes an IBC prediction module 1210 and an IntraTMP prediction module 1220, in addition to the intra prediction module 1125 and the intra estimation module 1124. The IntraTMP prediction module 1220 performs the IntraTMP process by searching current picture data 1202 from the decoded picture buffer 1150 to identify one or more reference blocks having a reference template that matches the template region neighboring the current block. The identified reference blocks are used to generate the unrefined predictor 1225. The IBC prediction module 1210 may receive a block vector 1215 from the entropy decoder 1190, use the block vector to locate a reference block from the decoded picture buffer 1150, and use the reference block to generate the unrefined predictor 1225. The block vector 1215 may be signaled in the bitstream 1195 for IBC prediction.

[0137] A refinement generator 1240 generates a prediction refinement 1245. The prediction refinement may include a sample-based prediction offset (SPO) that is generated by applying position-related weights (PRW) 1235 to template data 1205 of a reference template neighboring the reference block. The refinement generator 1240 may use the block vector 1215 provided by the current picture prediction module 1120 to locate the reference block and retrieve its neighboring reference template from the decoded picture buffer 1150 as the template data 1205. A look-up table (LUT) 1230 is used to provide a set of PRW 1235 to the refinement generator 1240. In some embodiments, a same set of PRW is used for refining predictors generated by different coding tools. In some embodiments, different sets of PRW are used for refining predictors generated by different coding tools.

[0138] In some embodiments, the refinement generator 1240 receives an indicator of whether the current picture is coded with samples horizontally or vertically flipped. The refinement generator 1240 may use N lines above the reference block and N lines right of the reference block as the reference template when the current picture is coded with samples horizontally flipped, and may use N lines below the reference block and N lines left of the reference block as the reference template when the current picture is coded with samples vertically flipped. The entropy decoder 1190 may receive syntax elements from the bitstream 1195 that indicate whether the current picture is coded with samples horizontally flipped, vertically flipped, or not flipped at all.
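The flip-dependent choice of reference template can be sketched as follows. The function name reference_template and the string values of the flip argument are hypothetical. The use of N lines above and N lines left for the non-flipped case is an assumption, since the paragraph above states the template only for the two flipped cases.

```python
def reference_template(ref_x, ref_y, w, h, n, flip="none"):
    """Return the sample positions (x, y) of the reference template around a
    w x h reference block whose top-left corner is (ref_x, ref_y)."""
    if flip == "horizontal":
        rows_above, cols_left = True, False   # above + right, per the description
    elif flip == "vertical":
        rows_above, cols_left = False, True   # below + left, per the description
    elif flip == "none":
        rows_above, cols_left = True, True    # assumption: conventional above + left
    else:
        raise ValueError("flip must be 'none', 'horizontal', or 'vertical'")

    # N horizontal lines, either directly above or directly below the block.
    ys = range(ref_y - n, ref_y) if rows_above else range(ref_y + h, ref_y + h + n)
    # N vertical lines, either directly left or directly right of the block.
    xs = range(ref_x - n, ref_x) if cols_left else range(ref_x + w, ref_x + w + n)

    rows = [(x, y) for y in ys for x in range(ref_x, ref_x + w)]
    cols = [(x, y) for x in xs for y in range(ref_y, ref_y + h)]
    return rows + cols
```

In embodiments where the SPO is not applied to flipped pictures, a caller would skip this step entirely when the flip indicator is set.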

[0139] A predictor modifier 1250 applies (e.g., by adding) the generated prediction refinement 1245 to the unrefined predictor 1225 to generate a refined predictor 1255 (a prediction comprising the refined prediction samples). The refined predictor 1255 is then used as the predicted pixel data 1113.

[0140] In some embodiments, the predictor modifier 1250 applies a linear filter model 1260 to the initial unrefined predictor 1225 before the prediction refinement is applied to the initial predictor to generate the refined predictor 1255. In some embodiments, the linear filter model is applied to the refined predictor before the refined predictor is used for decoding the current block.
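The two orderings of the linear filter model and the prediction refinement can be sketched as below. This is an illustration under stated assumptions: the linear filter model is reduced to a single fixed-point scale and offset per block, the final clipping to the sample bit depth is added for safety, and the names clip, apply_linear_model, and refine_predictor are hypothetical.

```python
def clip(v, bit_depth=8):
    """Clamp a sample to the valid range for the given bit depth (assumed step)."""
    return max(0, min((1 << bit_depth) - 1, v))


def apply_linear_model(block, scale, offset, shift=6):
    """Assumed linear model: p' = ((scale * p) >> shift) + offset.
    With shift=6, scale=64 leaves the sample value unchanged."""
    return [[((scale * p) >> shift) + offset for p in row] for row in block]


def refine_predictor(unrefined, refinement, model=None, model_first=True, bit_depth=8):
    """Add the per-sample refinement to the unrefined predictor.
    model is an optional (scale, offset) pair; model_first selects whether the
    linear model runs before or after the refinement is added."""
    pred = unrefined
    if model is not None and model_first:
        pred = apply_linear_model(pred, *model)
    pred = [[p + r for p, r in zip(prow, rrow)]
            for prow, rrow in zip(pred, refinement)]
    if model is not None and not model_first:
        pred = apply_linear_model(pred, *model)
    return [[clip(p, bit_depth) for p in row] for row in pred]
```

The two orderings generally give different results: with a scale of 2, a sample of 100 and a refinement of 3 yield 203 when the model is applied first and 206 when it is applied last.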

[0141] FIG. 13 conceptually illustrates a process 1300 that decodes a pixel block with prediction refinement using template areas identified by block vectors. In some embodiments, one or more processing units (e.g., a processor) of a computing device implementing the decoder 1100 perform the process 1300 by executing instructions stored in a computer readable medium. In some embodiments, an electronic apparatus implementing the decoder 1100 performs the process 1300.

[0142] The decoder receives (at block 1310) data to be decoded as a current block of pixels in a current picture. The decoder determines (at block 1320) a template area of the current block, wherein the template area of the current block comprises a set of template samples. The decoder uses (at block 1330) a block vector associated with the current block to locate a reference block in the current picture.

[0143] The decoder determines (at block 1340) one or more sets of Position-Related Weights (PRWs) for the set of template samples. The decoder derives (at block 1350) a Sample-Based Prediction Offset (SPO) for a target prediction sample of the current block from a sum of derived template samples weighted by the one or more sets of respective PRWs. The template samples are derived from a reconstructed template and a target template. Each weight of each set of the one or more sets of PRWs is dependent on a first position of a target template sample with respect to the template area and a second position of the target prediction sample with respect to the current block. The target prediction sample is generated by using the located reference block.
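The weighted sum that produces the SPO can be sketched as follows. Several points are assumptions rather than statements of the disclosure: each derived template sample is taken to be the difference between a reconstructed template sample neighboring the current block and the co-located sample of the reference (target) template; the weight function prw is a hypothetical stand-in that halves with each unit of city-block distance; and the fixed-point shift with rounding is illustrative. Positions are expressed relative to the top-left sample of the current block, so template samples have a negative x or y coordinate.

```python
def prw(template_pos, sample_pos, shift=6):
    """Hypothetical position-related weight: depends on the template sample
    position and the prediction sample position, larger when they are close."""
    (tx, ty), (sx, sy) = template_pos, sample_pos
    dist = abs(tx - sx) + abs(ty - sy)
    return (1 << shift) >> (dist + 1)


def spo(sample_pos, template_positions, recon_template, ref_template, shift=6):
    """Sample-based prediction offset for one prediction sample:
    a weighted sum of derived template samples, scaled back by a rounding shift."""
    acc = 0
    for pos, rec, ref in zip(template_positions, recon_template, ref_template):
        derived = rec - ref                      # assumed derivation of a template sample
        acc += prw(pos, sample_pos, shift) * derived
    return (acc + (1 << (shift - 1))) >> shift


def refine_sample(pred_sample, offset):
    """Apply the SPO to the target prediction sample."""
    return pred_sample + offset
```

With a 2x2 block, a one-line template at positions (0,-1), (1,-1), (-1,0), (-1,1), and a uniform difference of 10 between the reconstructed and reference templates, this sketch gives an offset of 8 for the sample at (0,0) and 4 for the sample at (1,1), so samples nearer the template receive a larger correction.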

[0144] The target template may be a reference template neighboring the located reference block. The target prediction sample may be a sample within an initial predictor generated by intra block copy (IBC) and the block vector is signaled in a bitstream. The reference template may be from an area of the current picture that is valid for IBC mode. The target prediction sample may be a sample within an initial predictor generated by intra template matching prediction (IntraTMP) and the block vector is derived by searching a reconstructed portion of the current picture. The target template used to derive the SPO may be identically shaped as a template neighboring the current block used for deriving the block vector.

[0145] In some embodiments, a same set of weights is applicable to generate the SPO for when the block vector is signaled in a bitstream and when the block vector is derived by searching a reconstructed portion of the current picture. In some embodiments, different sets of weights are used to generate the SPO for when different coding tools are used to generate the target prediction sample.

[0146] The decoder applies (at block 1360) the SPO to the target prediction sample to generate a refined prediction sample. In some embodiments, a linear filter model is applied to an initial predictor, from which the target prediction sample is derived, before the SPO is applied to the target prediction sample. In some embodiments, the SPO is not applied when the current picture is coded with samples vertically flipped or horizontally flipped.

[0147] In some embodiments, N lines above the reference block and N lines right of the reference block are identified as the target template used to generate the SPO when the current picture is coded with samples horizontally flipped. In some embodiments, N lines below the reference block and N lines left of the reference block are identified as the reference template used to generate the SPO when the current picture is coded with samples vertically flipped. In some embodiments, one or more syntax elements are used to indicate whether the current picture is coded with samples horizontally flipped, vertically flipped, or not flipped. In some embodiments, the SPO is not applied when the current picture is coded with samples vertically flipped or horizontally flipped.

[0148] The decoder reconstructs (at block 1370) the current block by using prediction comprising the refined prediction sample. The video decoder may output the reconstructed current block as part of the reconstructed current picture for display. In some embodiments, a linear filter model is applied to the prediction comprising the refined prediction sample before it is used for decoding the current block.

VII. Example Electronic System

[0149] Many of the above-described features and applications are implemented as software processes that are specified as a set of instructions recorded on a computer readable storage medium (also referred to as computer readable medium). When these instructions are executed by one or more computational or processing unit(s) (e.g., one or more processors, cores of processors, or other processing units), they cause the processing unit(s) to perform the actions indicated in the instructions. Examples of computer readable media include, but are not limited to, CD-ROMs, flash drives, random-access memory (RAM) chips, hard drives, erasable programmable read-only memories (EPROMs), electrically erasable programmable read-only memories (EEPROMs), etc. The computer readable media do not include carrier waves and electronic signals passing wirelessly or over wired connections.

[0150] In this specification, the term “software” is meant to include firmware residing in read-only memory or applications stored in magnetic storage which can be read into memory for processing by a processor. Also, in some embodiments, multiple software inventions can be implemented as sub-parts of a larger program while remaining distinct software inventions. In some embodiments, multiple software inventions can also be implemented as separate programs. Finally, any combination of separate programs that together implement a software invention described here is within the scope of the present disclosure. In some embodiments, the software programs, when installed to operate on one or more electronic systems, define one or more specific machine implementations that execute and perform the operations of the software programs.

[0151] FIG. 14 conceptually illustrates an electronic system 1400 with which some embodiments of the present disclosure are implemented. The electronic system 1400 may be a computer (e.g., a desktop computer, personal computer, tablet computer, etc. ) , phone, PDA, or any other sort of electronic device. Such an electronic system includes various types of computer readable media and interfaces for various other types of computer readable media. Electronic system 1400 includes a bus 1405, processing unit (s) 1410, a graphics-processing unit (GPU) 1415, a system memory 1420, a network 1425, a read-only memory 1430, a permanent storage device 1435, input devices 1440, and output devices 1445.

[0152] The bus 1405 collectively represents all system, peripheral, and chipset buses that communicatively connect the numerous internal devices of the electronic system 1400. For instance, the bus 1405 communicatively connects the processing unit (s) 1410 with the GPU 1415, the read-only memory 1430, the system memory 1420, and the permanent storage device 1435.

[0153] From these various memory units, the processing unit (s) 1410 retrieves instructions to execute and data to process in order to execute the processes of the present disclosure. The processing unit (s) may be a single processor or a multi-core processor in different embodiments. Some instructions are passed to and executed by the GPU 1415. The GPU 1415 can offload various computations or complement the image processing provided by the processing unit (s) 1410.

[0154] The read-only-memory (ROM) 1430 stores static data and instructions that are used by the processing unit (s) 1410 and other modules of the electronic system. The permanent storage device 1435, on the other hand, is a read-and-write memory device. This device is a non-volatile memory unit that stores instructions and data even when the electronic system 1400 is off. Some embodiments of the present disclosure use a mass-storage device (such as a magnetic or optical disk and its corresponding disk drive) as the permanent storage device 1435.

[0155] Other embodiments use a removable storage device (such as a floppy disk, flash memory device, etc., and its corresponding disk drive) as the permanent storage device. Like the permanent storage device 1435, the system memory 1420 is a read-and-write memory device. However, unlike storage device 1435, the system memory 1420 is a volatile read-and-write memory, such as a random access memory. The system memory 1420 stores some of the instructions and data that the processor uses at runtime. In some embodiments, processes in accordance with the present disclosure are stored in the system memory 1420, the permanent storage device 1435, and/or the read-only memory 1430. For example, the various memory units include instructions for processing multimedia clips in accordance with some embodiments. From these various memory units, the processing unit(s) 1410 retrieves instructions to execute and data to process in order to execute the processes of some embodiments.

[0156] The bus 1405 also connects to the input and output devices 1440 and 1445. The input devices 1440 enable the user to communicate information and select commands to the electronic system. The input devices 1440 include alphanumeric keyboards and pointing devices (also called “cursor control devices” ) , cameras (e.g., webcams) , microphones or similar devices for receiving voice commands, etc. The output devices 1445 display images generated by the electronic system or otherwise output data. The output devices 1445 include printers and display devices, such as cathode ray tubes (CRT) or liquid crystal displays (LCD) , as well as speakers or similar audio output devices. Some embodiments include devices such as a touchscreen that function as both input and output devices.

[0157] Finally, as shown in FIG. 14, bus 1405 also couples electronic system 1400 to a network 1425 through a network adapter (not shown). In this manner, the computer can be a part of a network of computers (such as a local area network (“LAN”), a wide area network (“WAN”), or an Intranet), or a network of networks, such as the Internet. Any or all components of electronic system 1400 may be used in conjunction with the present disclosure.

[0158] Some embodiments include electronic components, such as microprocessors, storage and memory that store computer program instructions in a machine-readable or computer-readable medium (alternatively referred to as computer-readable storage media, machine-readable media, or machine-readable storage media) . Some examples of such computer-readable media include RAM, ROM, read-only compact discs (CD-ROM) , recordable compact discs (CD-R) , rewritable compact discs (CD-RW) , read-only digital versatile discs (e.g., DVD-ROM, dual-layer DVD-ROM) , a variety of recordable / rewritable DVDs (e.g., DVD-RAM, DVD-RW, DVD+RW, etc. ) , flash memory (e.g., SD cards, mini-SD cards, micro-SD cards, etc. ) , magnetic and / or solid state hard drives, read-only and recordable discs, ultra-density optical discs, any other optical or magnetic media, and floppy disks. The computer-readable media may store a computer program that is executable by at least one processing unit and includes sets of instructions for performing various operations. Examples of computer programs or computer code include machine code, such as is produced by a compiler, and files including higher-level code that are executed by a computer, an electronic component, or a microprocessor using an interpreter.

[0159] While the above discussion primarily refers to microprocessors or multi-core processors that execute software, many of the above-described features and applications are performed by one or more integrated circuits, such as application specific integrated circuits (ASICs) or field programmable gate arrays (FPGAs). In some embodiments, such integrated circuits execute instructions that are stored on the circuit itself. In addition, some embodiments execute software stored in programmable logic devices (PLDs), ROM, or RAM devices.

[0160] As used in this specification and any claims of this application, the terms “computer”, “server”, “processor”, and “memory” all refer to electronic or other technological devices. These terms exclude people or groups of people. For the purposes of the specification, the terms “display” and “displaying” mean displaying on an electronic device. As used in this specification and any claims of this application, the terms “computer readable medium,” “computer readable media,” and “machine readable medium” are entirely restricted to tangible, physical objects that store information in a form that is readable by a computer. These terms exclude any wireless signals, wired download signals, and any other ephemeral signals.

[0161] While the present disclosure has been described with reference to numerous specific details, one of ordinary skill in the art will recognize that the present disclosure can be embodied in other specific forms without departing from the spirit of the present disclosure. In addition, a number of the figures (including FIG. 10 and FIG. 13) conceptually illustrate processes. The specific operations of these processes may not be performed in the exact order shown and described. The specific operations may not be performed in one continuous series of operations, and different specific operations may be performed in different embodiments. Furthermore, the process could be implemented using several sub-processes, or as part of a larger macro process. Thus, one of ordinary skill in the art would understand that the present disclosure is not to be limited by the foregoing illustrative details, but rather is to be defined by the appended claims.

Additional Notes

[0162] The herein-described subject matter sometimes illustrates different components contained within, or connected with, different other components. It is to be understood that such depicted architectures are merely examples, and that in fact many other architectures can be implemented which achieve the same functionality. In a conceptual sense, any arrangement of components to achieve the same functionality is effectively "associated" such that the desired functionality is achieved. Hence, any two components herein combined to achieve a particular functionality can be seen as "associated with" each other such that the desired functionality is achieved, irrespective of architectures or intermediate components. Likewise, any two components so associated can also be viewed as being "operably connected" , or "operably coupled" , to each other to achieve the desired functionality, and any two components capable of being so associated can also be viewed as being "operably couplable" , to each other to achieve the desired functionality. Specific examples of operably couplable include but are not limited to physically mateable and / or physically interacting components and / or wirelessly interactable and / or wirelessly interacting components and / or logically interacting and / or logically interactable components.

[0163] Further, with respect to the use of substantially any plural and / or singular terms herein, those having skill in the art can translate from the plural to the singular and / or from the singular to the plural as is appropriate to the context and / or application. The various singular / plural permutations may be expressly set forth herein for sake of clarity.

[0164] Moreover, it will be understood by those skilled in the art that, in general, terms used herein, and especially in the appended claims, e.g., bodies of the appended claims, are generally intended as “open” terms, e.g., the term “including” should be interpreted as “including but not limited to,” the term “having” should be interpreted as “having at least,” the term “includes” should be interpreted as “includes but is not limited to,” etc. It will be further understood by those within the art that if a specific number of an introduced claim recitation is intended, such an intent will be explicitly recited in the claim, and in the absence of such recitation no such intent is present. For example, as an aid to understanding, the following appended claims may contain usage of the introductory phrases "at least one" and "one or more" to introduce claim recitations. However, the use of such phrases should not be construed to imply that the introduction of a claim recitation by the indefinite articles "a" or "an" limits any particular claim containing such introduced claim recitation to implementations containing only one such recitation, even when the same claim includes the introductory phrases "one or more" or "at least one" and indefinite articles such as "a" or "an," e.g., “a” and / or “an” should be interpreted to mean “at least one” or “one or more;” the same holds true for the use of definite articles used to introduce claim recitations. In addition, even if a specific number of an introduced claim recitation is explicitly recited, those skilled in the art will recognize that such recitation should be interpreted to mean at least the recited number, e.g., the bare recitation of "two recitations," without other modifiers, means at least two recitations, or two or more recitations. Furthermore, in those instances where a convention analogous to “at least one of A, B, and C, etc.” is used, in general such a construction is intended in the sense one having skill in the art would understand the convention, e.g., “a system having at least one of A, B, and C” would include but not be limited to systems that have A alone, B alone, C alone, A and B together, A and C together, B and C together, and / or A, B, and C together, etc. In those instances where a convention analogous to “at least one of A, B, or C, etc.” is used, in general such a construction is intended in the sense one having skill in the art would understand the convention, e.g., “a system having at least one of A, B, or C” would include but not be limited to systems that have A alone, B alone, C alone, A and B together, A and C together, B and C together, and / or A, B, and C together, etc. It will be further understood by those within the art that virtually any disjunctive word and / or phrase presenting two or more alternative terms, whether in the description, claims, or drawings, should be understood to contemplate the possibilities of including one of the terms, either of the terms, or both terms. For example, the phrase “A or B” will be understood to include the possibilities of “A” or “B” or “A and B.”

[0165] From the foregoing, it will be appreciated that various implementations of the present disclosure have been described herein for purposes of illustration, and that various modifications may be made without departing from the scope and spirit of the present disclosure. Accordingly, the various implementations disclosed herein are not intended to be limiting, with the true scope and spirit being indicated by the following claims.

Claims

1. A video coding method comprising: receiving data to be encoded or decoded as a current block of pixels of a current picture of a video; determining a template area of the current block, wherein the template area of the current block comprises a set of template samples; using a block vector associated with the current block to locate a reference block in the current picture; determining one or more sets of Position-Related Weights (PRWs) for the set of template samples; deriving a Sample-Based Prediction Offset (SPO) for a target prediction sample of the current block from a sum of derived template samples weighted by said one or more sets of respective PRWs, wherein the derived template samples are derived from a reconstructed template and a target template, and wherein each weight of each set of said one or more sets of PRWs is dependent on a first position of a target template sample with respect to the template area and a second position of the target prediction sample with respect to the current block, wherein the target prediction sample is generated by using the located reference block; applying the SPO to the target prediction sample to generate a refined prediction sample; and encoding or decoding the current block by using prediction comprising the refined prediction sample.

2. The video coding method of claim 1, wherein the target template is a reference template neighboring the located reference block.

3. The video coding method of claim 1, wherein the target prediction sample is a sample within an initial predictor generated by intra block copy (IBC) and the block vector is signaled in a bitstream.

4. The video coding method of claim 2, wherein the reference template is from an area of the current picture that is valid for IBC mode.

5. The video coding method of claim 1, wherein the target prediction sample is a sample within an initial predictor generated by intra template matching prediction (IntraTMP) and the block vector is derived by searching a reconstructed portion of the current picture.

6. The video coding method of claim 5, wherein the target template used to derive the SPO is identically shaped as a template neighboring the current block used for deriving the block vector.

7. The video coding method of claim 1, wherein a same set of weights is applicable to generate the SPO for when the block vector is signaled in a bitstream and when the block vector is derived by searching a reconstructed portion of the current picture.

8. The video coding method of claim 1, wherein different sets of weights are used to generate the SPO for when different coding tools are used to generate the target prediction sample.

9. The video coding method of claim 1, wherein a linear filter model is applied to an initial predictor, from which the target prediction sample is derived, before the SPO is applied to the target prediction sample.

10. The video coding method of claim 1, wherein a linear filter model is applied to the prediction comprising the refined prediction sample before being used for encoding or decoding the current block.

11. The video coding method of claim 1, wherein: N lines above the reference block and N lines right of the reference block are identified as the target template used to generate the SPO when the current picture is coded with samples horizontally flipped, N lines below the reference block and N lines left of the reference block are identified as the reference template used to generate the SPO when the current picture is coded with samples vertically flipped.

12. The video coding method of claim 11, wherein one or more syntax element is used to indicate whether the current picture is coded with samples horizontally flipped, vertically flipped type, or not flipped.

13. The video coding method of claim 11, wherein the SPO is not applied when the current picture is coded with samples vertically flipped or horizontally flipped.

14. An electronic apparatus comprising: a video coder circuit configured to perform operations comprising: receiving data to be encoded or decoded as a current block of pixels of a current picture of a video; determining a template area of the current block, wherein the template area of the current block comprises a set of template samples; using a block vector associated with the current block to locate a reference block in the current picture; determining one or more sets of Position-Related Weights (PRWs) for the set of template samples; deriving a Sample-Based Prediction Offset (SPO) for a target prediction sample of the current block from a sum of derived template samples weighted by said one or more sets of respective PRWs, wherein the derived template samples are derived from a reconstructed template and a target template, and wherein each weight of each set of said one or more sets of PRWs is dependent on a first position of a target template sample with respect to the template area and a second position of the target prediction sample with respect to the current block, wherein the target prediction sample is generated by using the located reference block; applying the SPO to the target prediction sample to generate a refined prediction sample; and encoding or decoding the current block by using prediction comprising the refined prediction sample.

15. A video decoding method comprising: receiving data to be decoded as a current block of pixels of a current picture of a video; determining a template area of the current block, wherein the template area of the current block comprises a set of template samples; using a block vector associated with the current block to locate a reference block in the current picture; determining one or more sets of Position-Related Weights (PRWs) for the set of template samples; deriving a Sample-Based Prediction Offset (SPO) for a target prediction sample of the current block from a sum of derived template samples weighted by said one or more sets of respective PRWs, wherein the derived template samples are derived from a reconstructed template and a target template, and wherein each weight of each set of said one or more sets of PRWs is dependent on a first position of a target template sample with respect to the template area and a second position of the target prediction sample with respect to the current block, wherein the target prediction sample is generated by using the located reference block; applying the SPO to the target prediction sample to generate a refined prediction sample; and reconstructing the current block by using prediction comprising the refined prediction sample.

16. A video encoding method comprising: receiving data to be encoded as a current block of pixels of a current picture of a video; determining a template area of the current block, wherein the template area of the current block comprises a set of template samples; using a block vector associated with the current block to locate a reference block in the current picture; determining one or more sets of Position-Related Weights (PRWs) for the set of template samples; deriving a Sample-Based Prediction Offset (SPO) for a target prediction sample of the current block from a sum of derived template samples weighted by said one or more sets of respective PRWs, wherein the derived template samples are derived from a reconstructed template and a target template, and wherein each weight of each set of said one or more sets of PRWs is dependent on a first position of a target template sample with respect to the template area and a second position of the target prediction sample with respect to the current block, wherein the target prediction sample is generated by using the located reference block; applying the SPO to the target prediction sample to generate a refined prediction sample; and encoding the current block by using prediction comprising the refined prediction sample.