Video decoding apparatus, video coding apparatus

By limiting template lines and exploration areas in the ITMP method, the computational complexity and redundancies are reduced, leading to faster encoding and decoding times without affecting coding efficiency or accuracy.

WO2026133706A1 · PCT designated stage · Publication Date: 2026-06-25 · SHARP KK

Patent Information

Authority / Receiving Office
WO · WO
Patent Type
Applications
Current Assignee / Owner
SHARP KK
Filing Date
2025-10-15
Publication Date
2026-06-25

AI Technical Summary

Technical Problem

The Intra Template Matching Prediction (ITMP) method in video coding has high computational complexity and redundancies among its sub-modes, which prolong encoding and decoding times.

Method used

Reduce the computational complexity of ITMP by limiting the number of template lines and the exploration (search) area during encoding and decoding, and by reducing redundancies among its sub-modes.

Benefits of technology

This approach shortens the encoding and decoding time without compromising coding efficiency and accuracy, thus optimizing the video coding process.

✦ Generated by Eureka AI based on patent content.

Smart Images

  • Figure JP2025036295_25062026_PF_FP_ABST

Abstract

The aim of this invention is to reduce the complexity of the ITMP mode in order to decrease the time required for encoding and decoding.

Description

VIDEO DECODING APPARATUS, VIDEO CODING APPARATUS

[0001] The embodiments of the present invention relate to a video decoding apparatus and a video coding apparatus.

[0002] A video coding apparatus which generates coded data by coding a video, and a video decoding apparatus which generates decoded images by decoding the coded data are used for efficient transmission or recording of videos.

[0003] For example, specific video coding schemes include H.264 / AVC, High-Efficiency Video Coding (HEVC), and Versatile Video Coding (VVC) schemes, and the like.

[0004] In such a video coding scheme, images (pictures) constituting a video are managed in a hierarchical structure including slices derived by splitting a picture, coding tree units (CTUs) derived by splitting a slice, coding units (CUs) derived by splitting a coding tree unit, and transform units (TUs) derived by splitting a coding unit, and are coded / decoded for each CU.

[0005] In such a video coding scheme, usually, a prediction image is generated based on a local decoded image that is derived by coding / decoding an input image (a source image), and prediction error components (which may be referred to also as "difference images" or "residual images") derived by subtracting the prediction image from the input image are coded. Methods of generating prediction images include inter-picture prediction (inter prediction) and intra-picture prediction (intra prediction).

[0006] In a recent video coding standardization meeting, an intra prediction mode called Intra Template Matching Prediction (ITMP), proposed in NPL 1 and NPL 2, was adopted. This method uses template matching to generate a block vector (BV) list containing multiple BVs with low template matching costs. The BV list is generated on both the encoder and decoder sides. On the encoder side, sum of absolute transformed differences (SATD) and rate-distortion optimization (RDO) costs are used to decide whether to apply the ITMP method and which BV from the list to use as the predictive BV. On the decoder side, the corresponding operations are performed based on the information derived from the bitstream.
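The SATD (sum of absolute transformed differences) cost used in such encoder decisions can be illustrated with a minimal sketch. It assumes 4x4 blocks and an unnormalized Hadamard transform. Reference encoders apply SATD over several block sizes and scale the result, and both are omitted here. The residual is transformed horizontally and then vertically, and the absolute coefficients are summed. The function names are hypothetical.

```python
def hadamard4(v):
    """Unnormalized 4-point Walsh-Hadamard transform in butterfly form."""
    a, b, c, d = v
    s0, s1 = a + b, a - b
    s2, s3 = c + d, c - d
    return [s0 + s2, s1 + s3, s0 - s2, s1 - s3]


def satd4x4(org, pred):
    """SATD between two 4x4 blocks given as lists of rows."""
    diff = [[org[y][x] - pred[y][x] for x in range(4)] for y in range(4)]
    rows = [hadamard4(r) for r in diff]                                   # horizontal pass
    cols = [hadamard4([rows[y][x] for y in range(4)]) for x in range(4)]  # vertical pass
    return sum(abs(c) for col in cols for c in col)


print(satd4x4([[1] * 4] * 4, [[0] * 4] * 4))  # 16
```

The transform concentrates the energy of a smooth residual into a few coefficients. As a result, SATD usually tracks the eventual coding cost more closely than a plain sum of absolute differences, at a small extra cost.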

[0007] NPL 1: K. Naser, T. Poirier, F. Galpin, A. Robert, "EE2-1.14: IntraTMP adaptation for camera-captured content", JVET-AB0130, 2022-10-14.
NPL 2: Fan Wang et al., "EE2-1.20i / j: Combination of IntraTMP tests", JVET-AD0086, 2023-04-15.

[0008] The ITMP method requires high computational complexity, and redundancies exist among some of the sub-modes it includes, significantly prolonging the encoding and decoding time.

[0009] The purpose of this invention is to reduce the computational complexity of the ITMP method by decreasing the number of template lines, limiting the exploration area, and reducing redundancies among various sub-modes. This is achieved without compromising coding efficiency and accuracy, thereby shortening the time required for the encoding and decoding process.

[0010] According to an aspect of the present invention, the encoding and decoding time is reduced.

[0011] FIG. 1 is a schematic diagram illustrating a configuration of an image transmission system according to the present embodiment.
FIG. 2 is a diagram showing the hierarchical structure of the coded stream data.
FIG. 3 is a schematic diagram showing the types of intra prediction mode (mode numbers).
FIG. 4 is a schematic diagram of the video decoding apparatus.
FIG. 5 shows the structure of the intra prediction image generation unit.
FIG. 6 is a diagram showing the details of the ITMP Prediction Unit.
FIG. 7 is a diagram showing the details of the Sub-mode Parameter Derivation Unit.
FIG. 8 is a diagram showing the position of the template area used in ITMP.
FIG. 9 is a diagram showing the position of the search area used in ITMP.
FIG. 10 is a block diagram showing the structure of a video coding apparatus.

[0012] (First Embodiment) Hereinafter, embodiments of the present disclosure are described with reference to the drawings.

[0013] FIG. 1 is a schematic diagram illustrating a configuration of an image transmission system 1 according to the present embodiment.

[0014] The image transmission system 1 is a system in which a coding stream derived by coding a coding target image is transmitted, the transmitted coding stream is decoded, and an image is displayed. The image transmission system 1 includes a video coding apparatus (image coding apparatus) 11, a network 21, a video decoding apparatus (image decoding apparatus) 31, and a video display apparatus (image display apparatus) 41.

[0015] An image T is input to the video coding apparatus 11.

[0016] The network 21 transmits a coding stream Te generated by the video coding apparatus 11 to the video decoding apparatus 31. The network 21 is the Internet, a Wide Area Network (WAN), a Local Area Network (LAN), or a combination thereof. The network 21 is not necessarily limited to a bidirectional communication network, and may be a unidirectional communication network configured to transmit broadcast waves of digital terrestrial television broadcasting, satellite broadcasting or the like. Furthermore, the network 21 may be substituted by a storage medium in which the coding stream Te is recorded, such as a Digital Versatile Disc (DVD: trademark) or a Blu-ray Disc (BD: trademark).

[0017] The video decoding apparatus 31 decodes each of the coding streams Te transmitted from the network 21 and generates one or multiple decoded images Td.

[0018] The video display apparatus 41 displays all or part of the one or multiple decoded images Td generated by the video decoding apparatus 31. For example, the video display apparatus 41 includes a display device such as a liquid crystal display and an organic Electro-Luminescence (EL) display. Forms of the display include a stationary type, a mobile type, an HMD type, and the like. In addition, in a case that the video decoding apparatus 31 has a high processing capability, an image having high image quality is displayed, and in a case that the apparatus only has a lower processing capability, an image which does not require high processing capability and display capability is displayed.

[0019] (Operators) Operators and notations used in the present specification are described below.

[0020] (Structure of Coding Stream Te) Prior to the detailed description of the video coding apparatus 11 and the video decoding apparatus 31 according to the present embodiment, a data structure of the coding stream Te generated by the video coding apparatus 11 and decoded by the video decoding apparatus 31 is described.

[0021] FIG. 2 is a diagram illustrating a hierarchical structure of data of the coding stream Te. The coding stream Te illustratively includes a sequence and multiple pictures constituting the sequence. (a) to (f) of FIG. 2 are diagrams illustrating a coded video sequence defining a sequence SEQ, a coded picture prescribing a picture PICT, a coding slice prescribing a slice S, coding slice data prescribing slice data, a coding tree unit included in the coding slice data, and a coding unit (CU) included in each coding tree unit, respectively.

[0022] (Coded Video Sequence) In the coded video sequence (CVS, coding stream), a set of data referred to by the video decoding apparatus 31 to decode the sequence SEQ to be processed is defined. As illustrated in FIG. 2, the CVS includes a Video Parameter Set (VPS), a Sequence Parameter Set (SPS), a Picture Parameter Set (PPS), a picture unit (PU), and Supplemental Enhancement Information (SEI).

[0023] access unit (AU): A set of PUs that belong to different layers and contain coded pictures associated with the same time for output from the decoded picture buffer (DPB).

[0024] coded video sequence (CVS): A sequence of AUs that consists, in decoding order, of a CVSS AU, followed by zero or more AUs that are not CVSS AUs, including all subsequent AUs up to but not including any subsequent AU that is a CVSS AU.

[0025] coded video sequence start (CVSS) AU: An IRAP AU or GDR AU for which the coded picture in each PU is a CLVSS picture.

[0026] coded layer video sequence (CLVS): A sequence of PUs with the same value of nuh_layer_id that consists, in decoding order, of a CLVSS PU, followed by zero or more PUs that are not CLVSS PUs, including all subsequent PUs up to but not including any subsequent PU that is a CLVSS PU.

[0027] In the video parameter set VPS, in a video including multiple layers, a set of coding parameters common to multiple videos and a set of coding parameters associated with the multiple layers and an individual layer included in the video are defined.

[0028] In the sequence parameter set SPS, a set of coding parameters referred to by the video decoding apparatus 31 to decode a target sequence is defined. For example, a width and a height of a picture are defined. Note that multiple SPSs may exist. In that case, one of the SPSs is selected by the PPS.

[0029] In the picture parameter set PPS, a set of coding parameters referred to by the video decoding apparatus 31 to decode each picture in a target sequence is defined. For example, a reference value (pic_init_qp_minus26) of a quantization step size used for decoding of a picture and a flag (weighted_pred_flag) indicating whether weighted prediction is applied are included. Note that multiple PPSs may exist. In that case, one of the PPSs is selected by each picture in the target sequence.

[0030] (Coded Picture) In the coded picture, a set of data referred to by the video decoding apparatus 31 to decode the picture PICT to be processed is defined. As illustrated in FIG. 2, the picture PICT includes slices 0 to NS-1 (NS is the total number of slices included in the picture PICT).

[0031] Note that in a case that it is not necessary to distinguish each of the slice 0 to the slice NS-1 below, subscripts of reference signs may be omitted. In addition, the same applies to other data with subscripts included in the coding stream Te which is described below.

[0032] (Coding Slice) In the coding slice, a set of data referred to by the video decoding apparatus 31 to decode the slice S to be processed is defined. As illustrated in FIG. 2, the slice includes a slice header and slice data.

[0033] The slice header includes a coding parameter group referred to by the video decoding apparatus 31 to determine a decoding method for a target slice. Slice type specification information (slice_type) indicating a slice type is one example of a coding parameter included in the slice header.

[0034] Examples of slice types that may be specified by the slice type specification information include (1) I slice using only an intra prediction in coding, (2) P slice using a unidirectional prediction or an intra prediction in coding, and (3) B slice using a unidirectional prediction, a bidirectional prediction, or an intra prediction in coding, and the like. Note that the inter prediction is not limited to a uni-prediction and a bi-prediction, and the prediction image may be generated by using a larger number of reference pictures. Hereinafter, in a case that a slice is referred to as the P or B slice, the slice indicates a slice that includes a block in which the inter prediction may be used.

[0035] Note that the slice header may include a reference to the picture parameter set PPS (pic_parameter_set_id).

[0036] (Coding Slice Data) In the coding slice data, a set of data referred to by the video decoding apparatus 31 to decode the slice data to be processed is defined. The slice data include CTUs as illustrated in FIG. 2. The CTU is a block of a fixed size (for example, 64 x 64) constituting a slice.

[0037] (Coding Tree Unit) In FIG. 2, a set of data referred to by the video decoding apparatus 31 to decode the CTU to be processed is defined. The CTU is split into coding units CUs, each of which is a basic unit of coding processing, by a recursive Quad Tree split (QT split), Binary Tree split (BT split), or Ternary Tree split (TT split). The BT split and the TT split are collectively referred to as a Multi Tree split (MT split). Nodes of a tree structure derived by recursive quad tree splits are referred to as Coding Nodes. Intermediate nodes of a quad tree, a binary tree, and a ternary tree are coding nodes, and the CTU itself is also defined as the highest coding node.
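The three split types can be sketched as a function that returns the child rectangles produced by one split. This is an illustration under stated assumptions:
- The 1:2:1 ternary ratio follows VVC.
- The mode labels are hypothetical.
- The split decisions themselves, which are carried in the coded data, are not modelled.

```python
from typing import List, Tuple

Rect = Tuple[int, int, int, int]  # (x, y, width, height)


def split(rect: Rect, mode: str) -> List[Rect]:
    """Child blocks of one QT / BT / TT split (VVC-style ratios)."""
    x, y, w, h = rect
    if mode == "QT":       # four equal quadrants
        hw, hh = w // 2, h // 2
        return [(x, y, hw, hh), (x + hw, y, hw, hh),
                (x, y + hh, hw, hh), (x + hw, y + hh, hw, hh)]
    if mode == "BT_HOR":   # horizontal split line: top and bottom halves
        return [(x, y, w, h // 2), (x, y + h // 2, w, h // 2)]
    if mode == "BT_VER":   # vertical split line: left and right halves
        return [(x, y, w // 2, h), (x + w // 2, y, w // 2, h)]
    if mode == "TT_HOR":   # 1:2:1 along the height
        q = h // 4
        return [(x, y, w, q), (x, y + q, w, 2 * q), (x, y + 3 * q, w, q)]
    if mode == "TT_VER":   # 1:2:1 along the width
        q = w // 4
        return [(x, y, q, h), (x + q, y, 2 * q, h), (x + 3 * q, y, q, h)]
    raise ValueError(f"unknown split mode: {mode}")


# A 64 x 64 CTU split by QT, then its first quadrant split by a vertical TT.
quads = split((0, 0, 64, 64), "QT")
print(split(quads[0], "TT_VER"))  # [(0, 0, 8, 32), (8, 0, 16, 32), (24, 0, 8, 32)]
```

Applying `split` recursively to the children yields the coding tree, whose leaves are the CUs.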

[0038] (Coding Unit) As illustrated in FIG. 2, a set of data referred to by the video decoding apparatus 31 to decode the coding unit to be processed is defined. Specifically, the CU includes a CU header CUH, a prediction parameter, a transform parameter, a quantization transform coefficient, and the like. In the CU header, a prediction mode and the like are defined.

[0039] Prediction processing may be performed in units of CU or in units of sub-CU derived by further splitting the CU. In a case that the sizes of the CU and the sub-CU are equal, the CU contains one sub-CU. In a case that the CU is larger than the sub-CU, the CU is split into sub-CUs. For example, in a case that the CU has a size of 8 x 8 and the sub-CU has a size of 4 x 4, the CU is split into four sub-CUs: two in the horizontal direction and two in the vertical direction.
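The 8 x 8 example can be checked with a small sketch that lists the positions of the sub-CUs tiling a CU. The raster scan order and the function name are assumptions; the text does not fix an order.

```python
def sub_cu_origins(cu_w, cu_h, sub_w, sub_h):
    """Top-left positions (relative to the CU) of the sub-CUs, in raster order."""
    if cu_w % sub_w or cu_h % sub_h:
        raise ValueError("sub-CU size must divide the CU size")
    return [(x, y) for y in range(0, cu_h, sub_h) for x in range(0, cu_w, sub_w)]


print(sub_cu_origins(8, 8, 4, 4))  # [(0, 0), (4, 0), (0, 4), (4, 4)]
```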

[0040] There are two types of predictions (prediction modes), which are an intra prediction and an inter prediction. The intra prediction refers to a prediction in an identical picture, and the inter prediction refers to prediction processing performed between different pictures (for example, between pictures of different display times).

[0041] Transform and quantization processing is performed in units of CU, but the quantization transform coefficient may be subjected to entropy coding in units of subblock such as 4 x 4.

[0042] (Prediction parameter) A prediction image is derived by a prediction parameter accompanying a block. The prediction parameter includes prediction parameters of the intra prediction and the inter prediction.

[0043] The prediction parameter of the intra prediction is described below. The intra prediction parameter includes a luma intra prediction mode IntraPredModeY and a chroma intra prediction mode IntraPredModeC. FIG. 3 is a schematic diagram indicating types (mode numbers) of the intra prediction mode. As illustrated in the diagram, for example, there are 67 intra prediction modes (0 to 66) and 28 wide angle intra prediction (WAIP) modes (-14 to -1 and 67 to 80), where WAIP is used depending on the aspect ratio of the CU. For example, a Planar prediction (0), a DC prediction (1), and Angular predictions (2 to 66) are present. Furthermore, for chroma, CCLM (Cross Component Linear Model) prediction modes (81 to 83), an MMLM (Multi Mode Linear Model) prediction mode, and a CCCM (Cross Component Convolution Model) prediction mode may be added.
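As a concrete reference point, VVC (H.266) replaces some signalled angular modes of non-square blocks with wide-angle modes, according to the width-to-height ratio. The sketch below follows that rule as understood from the VVC specification. Whether this embodiment uses the same thresholds is an assumption, and the function name is hypothetical.

```python
import math


def map_wide_angle(mode: int, width: int, height: int) -> int:
    """Map a signalled mode to a WAIP mode for non-square blocks (VVC-style rule).

    Planar (0) and DC (1) are never remapped, and square blocks are unchanged.
    """
    if width == height or mode < 2:
        return mode
    wh_ratio = abs(int(math.log2(width / height)))
    if width > height:
        limit = 8 + 2 * wh_ratio if wh_ratio > 1 else 8
        if 2 <= mode < limit:
            return mode + 65        # lands in 67..80
    else:
        limit = 60 - 2 * wh_ratio if wh_ratio > 1 else 60
        if limit < mode <= 66:
            return mode - 67        # lands in -14..-1
    return mode


print(map_wide_angle(2, 16, 8))  # 67
```

At the most extreme ratio (1:16 or 16:1), 14 modes are replaced on each side, which accounts for the 28 WAIP modes listed above.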

[0044] (Configuration of video decoding apparatus) A configuration of the video decoding apparatus 31 (FIG. 4) according to the present embodiment is described.

[0045] The video decoding apparatus 31 includes an entropy decoding unit 301, a parameter decoding unit (prediction image decoding apparatus) 302, a loop filter 305, a reference picture memory 306, a prediction parameter memory 307, a prediction image generation unit 308, an inverse quantization and inverse transform processing unit 311, an addition unit 312, and a prediction parameter derivation unit 320. Note that a configuration in which the loop filter 305 is not included in the video decoding apparatus 31 is also used in accordance with the video coding apparatus 11 described later.

[0046] The parameter decoding unit 302 further includes a header decoding unit 3020, a CT information decoding unit 3021, and a CU decoding unit 3022 (prediction mode decoding unit), and the CU decoding unit 3022 further includes a TU decoding unit 3024. These may be collectively referred to as a decoding module. The header decoding unit 3020 decodes, from coded data, parameter set information such as the VPS, the SPS, and the PPS, and a slice header (slice information). The CT information decoding unit 3021 decodes a CT from coded data. The CU decoding unit 3022 decodes a CU from coded data. In a case that a TU includes a prediction error, the TU decoding unit 3024 decodes QP update information (quantization correction value) and a quantization prediction error (residual_coding) from coded data.

[0047] Furthermore, an example in which a CTU and a CU are used as units of processing is described below, but the processing is not limited to this example, and processing in units of sub-CU may be performed. Alternatively, the CTU and the CU may be replaced by a block and the sub-CU by a subblock, so that processing is performed in units of blocks or subblocks.

[0048] The entropy decoding unit 301 performs entropy decoding on the coding stream Te input from the outside and separates and decodes individual codes (syntax elements). The separated codes include prediction information to generate a prediction image, a prediction error to generate a difference image, and the like. Entropy coding includes a method of variable-length coding syntax elements using a context (probability model) adaptively selected according to the type of syntax element and the surrounding conditions, and a method of variable-length coding syntax elements using a predetermined table or formula.

[0049] The parameter decoding unit 302 notifies the entropy decoding unit 301 of which syntax elements need to be decoded. The entropy decoding unit 301 outputs the syntax elements to the prediction parameter derivation unit 320.

[0050] (Configuration of Prediction Parameter Derivation Unit 320) The prediction parameter derivation unit 320 may derive the prediction parameters based on the output of the parameter decoding unit 302 and the prediction parameters saved in the prediction parameter memory 307. The derived prediction parameters are output to the prediction image generation unit 308 and are also saved in the prediction parameter memory 307. The prediction parameter derivation unit 320 may derive different prediction modes for luma and chroma prediction.

[0051] The loop filter 305 is a filter provided in the coding loop, and is a filter that removes block distortion and ringing distortion and improves image quality. The loop filter 305 applies a filter such as a deblocking filter (DF), a Sample Adaptive Offset (SAO), and an Adaptive Loop Filter (ALF) on a decoded image of a CU generated by the addition unit 312.

[0052] The reference picture memory 306 stores the decoded image of the CU generated by the addition unit 312 in a predetermined position for each target picture and target CU.

[0053] The prediction parameter memory 307 stores prediction parameters in a predetermined position for each CTU or CU to be decoded. Specifically, the prediction parameter memory 307 stores a parameter derived by the prediction parameter derivation unit 320, a prediction mode predMode separated by the entropy decoding unit 301, and the like.

[0054] The prediction image generation unit 308 receives input of the prediction parameter derived by the prediction parameter derivation unit 320, and the like. In addition, the prediction image generation unit 308 reads a reference picture from the reference picture memory 306. The prediction image generation unit 308 generates a prediction image of a block or a subblock by using the prediction parameter and the read reference picture (reference picture block) in the prediction mode indicated by the prediction mode predMode. Here, the reference picture block refers to a set of pixels (referred to as a block because they are normally rectangular) on a reference picture and is a region that is referred to in order to generate a prediction image.

[0055] (Prediction Image Generation Unit 308) In a case that the prediction mode predMode indicates an intra prediction mode, the intra prediction image generation unit 310 performs an intra prediction by using an intra prediction parameter (luma intra prediction mode IntraPredModeY and / or chroma intra prediction mode IntraPredModeC) input from the prediction parameter derivation unit 320 and reference pixels read from the reference picture memory 306. In a case that the prediction mode predMode indicates an inter prediction mode, the inter prediction image generation unit performs an inter prediction by using an inter prediction parameter input from the prediction parameter derivation unit 320 and reference pixels read from the reference picture memory 306.

[0056] Specifically, the prediction image generation unit 308 reads, from the reference picture memory 306, a neighbouring block in a predetermined range from a target block on a target picture. The predetermined range is neighbouring blocks on the left, the top left, the top, and the top right of the target block, and the region referred to is different depending on the intra prediction mode.

[0057] The prediction image generation unit 308 generates a prediction image of the target block with reference to the read decoded pixel values and the prediction mode indicated by predMode, IntraPredModeY and / or IntraPredModeC. The prediction image generation unit 308 outputs the generated prediction image of the block to the addition unit 312.

[0058] The generation of the prediction image based on the intra prediction mode is described below. In the Planar prediction, the DC prediction, and the Angular prediction, a decoded peripheral region adjacent to (proximate to) the prediction target block is configured as a reference region R. Then, the pixels on the reference region R are extrapolated in a specific direction to generate the prediction image. For example, the reference region R may be configured as an L-shaped region including the left and top (or further, top left, top right, bottom left) of the prediction target block.

[0059] (Intra prediction image generation unit 310) A configuration of the intra prediction image generation unit 310 is described using FIG. 5. The intra prediction image generation unit 310 includes a reference sample filter unit 3103 (second reference image configuration unit), an intra prediction unit 3104, and a prediction image corrector 3105 (prediction image corrector, filter switching unit, weight coefficient changing unit).

[0060] Based on each reference pixel (unfiltered reference image) on the reference region R, a filtered reference image generated by applying a reference pixel filter (first filter), and the intra prediction mode, the intra prediction unit 3104 generates a prediction image of the target block, and outputs the generated image to the prediction image corrector 3105. The prediction image corrector 3105 corrects the prediction image in accordance with the intra prediction mode, and outputs a corrected prediction image.

[0061] Hereinafter, the units included in the intra prediction image generation unit 310 are described.

[0062] (Reference sample filter unit 3103) The reference sample filter unit 3103 applies the reference pixel filter (first filter) to the unfiltered reference image to derive a filtered reference image s[x][y] at each position (x, y) on the reference region R, in accordance with the intra prediction mode. Specifically, a low pass filter is applied to the unfiltered reference image at each position (x, y) and its surroundings, and a filtered reference image is derived. Note that the low pass filter need not necessarily be applied in all the intra prediction modes, and the low pass filter may be applied in some intra prediction modes. Note that the filter applied to an unfiltered reference image on a reference region R in the reference sample filter unit 3103 is referred to as the "reference pixel filter (first filter)", whereas a filter that corrects the prediction image in the prediction image corrector 3105 described below is referred to as a "boundary filter (second filter)".

[0063] (Configuration of intra prediction unit 3104) The intra prediction unit 3104 generates, based on the intra prediction mode, the unfiltered reference image, and the filtered reference pixel value, a prediction image (prediction pixel value, uncorrected prediction image) of the prediction target block, and outputs the generated image to the prediction image corrector 3105. The intra prediction unit 3104 comprises a Planar prediction unit 31041, a DC prediction unit 31042, an Angular prediction unit 31043, an LM prediction unit 31044, an MIP (Matrix-based Intra Prediction) unit 31045, a DIMD (Decoder-side Intra Mode Derivation) unit 31046, and an EIP (extrapolation intra) prediction unit 31048. Additionally, the intra prediction unit 3104 may include an ITMP (Intra Template Matching Prediction) unit 31047, as shown in FIG. 5. The intra prediction unit 3104 selects a specific predictor in accordance with the intra prediction mode, and inputs an unfiltered reference image and a filtered reference image thereto. The relationship between the intra prediction mode and the corresponding predictor is as follows.

[0064] (Planar prediction) The Planar prediction unit 31041 generates a prediction image q[x][y] by linearly adding multiple filtered reference images s[x][y] in accordance with the distance between the prediction pixel position and the reference pixel position, and outputs the generated image to the prediction image corrector 3105.
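The distance-weighted linear addition can be made concrete with the planar formula of VVC. It is used here as a stand-in, because the text does not give the formula. A vertical and a horizontal linear interpolation, each toward a corner reference sample, are averaged with integer rounding. Reference filtering and bit-depth clipping are omitted, the block sizes are assumed to be powers of two, and the function name is hypothetical.

```python
def planar_predict(top, left, w, h):
    """VVC-style planar prediction returning h rows of w samples.

    top[i], i = 0..w: samples above the block (top[w] is the top-right sample).
    left[j], j = 0..h: samples left of the block (left[h] is the bottom-left sample).
    """
    log2w, log2h = w.bit_length() - 1, h.bit_length() - 1
    pred = []
    for y in range(h):
        row = []
        for x in range(w):
            pv = ((h - 1 - y) * top[x] + (y + 1) * left[h]) << log2w   # vertical interpolation
            ph = ((w - 1 - x) * left[y] + (x + 1) * top[w]) << log2h   # horizontal interpolation
            row.append((pv + ph + w * h) >> (log2w + log2h + 1))
        pred.append(row)
    return pred


print(planar_predict([0, 0, 0], [0, 0, 64], 2, 2))  # [[16, 16], [32, 32]]
```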

[0065] (DC prediction) The DC prediction unit 31042 derives a DC prediction value corresponding to the average value of the filtered reference image s[x][y], and outputs a prediction image q[x][y], which takes the DC prediction value as a pixel value.
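A minimal sketch of the DC value follows. It assumes the VVC convention: square blocks average the top and left references, while non-square blocks average only the longer side so that the division stays a shift. Block sizes are assumed to be powers of two, and the function names are hypothetical.

```python
def dc_value(top, left):
    """DC value; top holds the w samples above the block, left the h samples to its left."""
    w, h = len(top), len(left)
    if w == h:
        # Divide by 2w with rounding: log2(2w) equals w.bit_length() for powers of two.
        return (sum(top) + sum(left) + w) >> w.bit_length()
    longer = top if w > h else left
    n = len(longer)
    return (sum(longer) + (n >> 1)) >> (n.bit_length() - 1)


def dc_predict(top, left):
    """Prediction image q: len(left) rows of len(top) copies of the DC value."""
    dc = dc_value(top, left)
    return [[dc] * len(top) for _ in range(len(left))]


print(dc_value([10] * 4, [20] * 4))  # 15
```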

[0066] (Angular prediction) The Angular prediction unit 31043 generates a prediction image q[x][y] using the filtered reference image s[x][y] in a prediction direction (reference direction) indicated by the intra prediction mode, and outputs the generated image to the prediction image corrector 3105.

[0067] (LM prediction) The LM prediction unit 31044 predicts the pixel value of chroma based on the pixel value of luma. More specifically, a linear model is used to generate a prediction chroma image (Cb, Cr) based on the decoded luma image. An example of LM prediction is CCLM (cross component linear model) prediction, which uses a linear model to predict the chroma of a block from the luma of the same block.
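The linear model can be sketched as chroma = alpha * luma + beta, with alpha and beta fitted on neighbouring reconstructed samples. VVC itself derives the two parameters with integer arithmetic from the minimum and maximum of a few neighbouring samples. The least-squares fit below is a simpler stand-in chosen for illustration. It assumes the luma has already been downsampled to the chroma resolution, and the function names are hypothetical.

```python
def fit_linear_model(nb_luma, nb_chroma):
    """Least-squares fit of chroma = alpha * luma + beta on neighbouring samples."""
    n = len(nb_luma)
    mean_l = sum(nb_luma) / n
    mean_c = sum(nb_chroma) / n
    var_l = sum((l - mean_l) ** 2 for l in nb_luma)
    if var_l == 0:                 # flat luma: fall back to a constant chroma offset
        return 0.0, mean_c
    cov = sum((l - mean_l) * (c - mean_c) for l, c in zip(nb_luma, nb_chroma))
    alpha = cov / var_l
    return alpha, mean_c - alpha * mean_l


def cclm_predict(rec_luma_ds, alpha, beta):
    """Apply the model to the downsampled reconstructed luma of the current block."""
    return [[round(alpha * l + beta) for l in row] for row in rec_luma_ds]


alpha, beta = fit_linear_model([10, 20, 30, 40], [25, 45, 65, 85])
print(alpha, beta)                               # 2.0 5.0
print(cclm_predict([[12, 35]], alpha, beta))     # [[29, 75]]
```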

[0068] (MIP prediction) The MIP prediction unit 31045 generates a prediction image q[x][y] by a product-sum operation on the reference sample s[x][y] and the weight matrix derived from the neighboring region, and outputs the prediction image q[x][y] to the prediction image corrector 3105.

[0069] (DIMD prediction) The DIMD prediction unit 31046 employs a prediction method that generates prediction images using decoder-side derived (not signaled) intra prediction modes. During the encoding and decoding process, the DIMD method utilizes neighborhood information to derive multiple suitable intra prediction modes for the target block, along with the corresponding weights for each mode. Subsequently, the DIMD prediction unit 31046 utilizes these intra prediction modes to generate prediction images.

[0070] (EIP prediction: extrapolation intra prediction, extrapolation filter-based intra prediction mode) The EIP prediction unit 31048 uses an extrapolation filter to predict the current block. The EIP method may take as input of the extrapolation filter the 15 pixels of a 4x4 square excluding its bottom-right pixel, and derive the bottom-right pixel as the output of the extrapolation filter. It uses a function model (an EIP parameter) to predict the bottom-right target pixel from the reference pixels located around it, e.g. to its top-left. Additionally, the EIP method has a Merge mode that inherits the EIP parameters from adjacent and non-adjacent blocks coded with the EIP method and applies them to the current block. The EIP method uses two mode lists, namely curModelList and mergeModelList.
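The 15-input, 1-output filter can be sketched as follows. This section does not say how the EIP parameter (the filter coefficients) is derived or in which order samples are predicted. The sketch therefore assumes given coefficients and a raster scan, in which each sample is predicted from its 4x4 causal window and already predicted samples feed later ones. The buffer layout and the function name are hypothetical.

```python
def eip_predict(buf, w, h, coeffs):
    """Extrapolate a w x h block in raster order with a 15-tap linear filter.

    buf: (h + 3) rows x (w + 3) columns; the top 3 rows and the left 3 columns hold
         reconstructed reference samples, and the remaining area is overwritten.
    coeffs: 15 weights for the 4x4 window in raster order, without its bottom-right tap.
    """
    if len(coeffs) != 15:
        raise ValueError("expected 15 filter coefficients")
    for y in range(3, h + 3):
        for x in range(3, w + 3):
            window = [buf[y - 3 + j][x - 3 + i]
                      for j in range(4) for i in range(4)
                      if not (i == 3 and j == 3)]      # exclude the target sample
            buf[y][x] = sum(c * v for c, v in zip(coeffs, window))
    return [row[3:] for row in buf[3:]]


buf = [[0] * 5 for _ in range(5)]
buf[3][2], buf[4][2] = 7, 9
print(eip_predict(buf, 2, 2, [0] * 14 + [1]))  # [[7, 7], [9, 9]]
```

With all weight on the last tap (the left neighbour), each sample simply copies its left neighbour. This makes the reuse of previously predicted samples easy to see.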

[0071] (ITMP prediction: Intra Template Matching Prediction) ITMP takes multiple lines of already reconstructed pixels adjacent to the current block as templates and performs a search within six predefined rectangular regions neighbouring the current block. It stores multiple block vectors (BVs) with smaller template matching (TM) costs in a list derived by the ITMP method. Each BV in this list is then traversed to derive the parameters required for the various sub-modes included in ITMP. The BV list and its associated parameters constitute the output of the ITMP method.
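The search-and-keep-the-best idea can be sketched as follows. The sketch rests on several assumptions, and all names are hypothetical:
- A single square search window replaces the six predefined regions.
- A simplified causal-area rule decides which samples are reconstructed.
- SAD is the TM cost.
- One template thickness t stands in for both the number of lines above and to the left.

```python
import heapq


def template_coords(x, y, w, h, t):
    """(x, y) positions of an L-shaped template of thickness t around a w x h block:
    t rows above the block (including the top-left corner) and t columns to its left."""
    top = [(x + i, y - j) for j in range(1, t + 1) for i in range(-t, w)]
    left = [(x - i, y + j) for i in range(1, t + 1) for j in range(h)]
    return top + left


def itmp_search(pic, x0, y0, w, h, t=2, search=8, n_best=3):
    """Up to n_best block vectors (dx, dy), ordered by increasing template SAD.

    pic is a list of rows of reconstructed samples; (x0, y0) is the current block's
    top-left corner and must be at least t samples from the top and left edges.
    """
    pic_h, pic_w = len(pic), len(pic[0])

    def available(x, y):
        # Simplified causal area: rows above the current block, or samples to its
        # left within its own rows. Real codecs track reconstruction more precisely.
        return 0 <= x < pic_w and 0 <= y < pic_h and (y < y0 or (y < y0 + h and x < x0))

    cur = [pic[y][x] for x, y in template_coords(x0, y0, w, h, t)]
    heap = []  # (-cost, dx, dy): the heap root is the worst candidate kept so far
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            bx, by = x0 + dx, y0 + dy
            block = [(bx + i, by + j) for j in range(h) for i in range(w)]
            tmpl = template_coords(bx, by, w, h, t)
            if not all(available(x, y) for x, y in block + tmpl):
                continue  # touches unreconstructed or out-of-picture samples
            cost = sum(abs(pic[y][x] - c) for (x, y), c in zip(tmpl, cur))
            heapq.heappush(heap, (-cost, dx, dy))
            if len(heap) > n_best:
                heapq.heappop(heap)  # discard the highest-cost candidate
    return [(dx, dy) for _, dx, dy in sorted(heap, reverse=True)]


pic = [[(x % 4) * 10 + (y % 4) for x in range(32)] for y in range(32)]
print(itmp_search(pic, 16, 16, 4, 4))  # three zero-cost BVs, each a multiple of 4
```

The two knobs map onto the complexity reductions this invention targets. Reducing `t` (fewer template lines) shortens every cost evaluation, and reducing `search` (a smaller exploration area) cuts the number of candidates evaluated.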

[0072] (Configuration of prediction image corrector 3105) The prediction image corrector 3105 corrects the prediction image output from the intra prediction unit 3104 in accordance with the intra prediction mode. Specifically, the prediction image corrector 3105 derives, by performing weighted addition (weighted-averaging) on the unfiltered reference image and the prediction image for each pixel of the prediction image, in accordance with the distance between the reference region R and the target prediction pixel, the prediction image (corrected prediction image) Pred in which the prediction image is modified. Note that in some intra prediction modes (for example, Planar prediction, DC prediction, or the like), the prediction image corrector 3105 may not correct the prediction image, and the output of the intra prediction unit 3104 may be used as the prediction image.
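The distance-dependent weighted averaging resembles VVC's position dependent intra prediction combination (PDPC). The sketch below uses PDPC-style weights for a plain blend of each predicted sample with the unfiltered references above and to its left. The actual weights of the prediction image corrector 3105 are not given in this section, so this weighting is an assumption. Block sizes are powers of two, clipping is omitted, and the function name is hypothetical.

```python
def boundary_correct(pred, top, left):
    """Blend predicted samples with the unfiltered top/left references; the
    reference weights halve or faster as the distance from the block edge grows.

    pred: h rows of w samples; top: w samples above; left: h samples to the left.
    """
    h, w = len(pred), len(pred[0])
    n_scale = ((w.bit_length() - 1) + (h.bit_length() - 1) - 2) >> 2
    out = []
    for y in range(h):
        w_t = 32 >> ((y << 1) >> n_scale)       # weight of the reference above
        row = []
        for x in range(w):
            w_l = 32 >> ((x << 1) >> n_scale)   # weight of the reference to the left
            row.append((left[y] * w_l + top[x] * w_t
                        + (64 - w_l - w_t) * pred[y][x] + 32) >> 6)
        out.append(row)
    return out


out = boundary_correct([[0] * 4 for _ in range(4)], [64] * 4, [64] * 4)
print(out[0][0], out[3][3])  # 64 0
```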

[0073] (ITMP prediction) The parameter decoding unit 302 decodes an ITMP flag (tmpFlag) from the coded data for a target block. The ITMP flag specifies whether intra template matching is used in the target block. The target block may be a coding unit (CU), a transform unit (TU), a sub-block, or the like. If tmpFlag for the target block is false, the parameter decoding unit 302 further decodes the syntax elements related to intra prediction modes. Otherwise, if tmpFlag is true, the parameter decoding unit 302 may skip decoding these syntax elements from the coded data, and the intra prediction parameters for the current block are derived using the ITMP method to generate the prediction image.
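The parsing dependency can be sketched as follows. The three callables are hypothetical stand-ins for the bitstream-reading routines of the parameter decoding unit 302 and for the ITMP derivation. The sketch only shows that the intra prediction mode syntax is read when tmpFlag is false and skipped when it is true.

```python
def decode_intra_block(read_flag, read_intra_mode_syntax, derive_itmp_params):
    """Return ("ITMP", params) or ("INTRA", mode_syntax) for one target block."""
    tmp_flag = read_flag("tmpFlag")
    if tmp_flag:
        # Intra prediction mode syntax is not parsed; parameters come from ITMP.
        return "ITMP", derive_itmp_params()
    # tmpFlag is false: the usual intra prediction mode syntax follows.
    return "INTRA", read_intra_mode_syntax()


print(decode_intra_block(lambda name: False, lambda: {"intra_mode": 18}, lambda: None))
# ('INTRA', {'intra_mode': 18})
```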

[0074] (ITMP prediction unit 31047) The ITMP Prediction Unit 31047 includes an Input Data Derivation Unit 4701, a Search Range Derivation Unit 4711, and a Prediction Image Derivation Unit 4712. The Prediction Image Derivation Unit 4712 comprises a Candidate List Derivation Unit 4713, a Sub-mode Parameter Derivation Unit 4714, and a Prediction Image Generation Unit 4715. The Sub-mode Parameter Derivation Unit 4714 includes an ITMP-LIC (Local Illumination Compensation) Parameter Derivation Unit 47141, an ITMP-FLM (FiLter Mode) Parameter Derivation Unit 47142, an ITMP-Fusion Parameter Derivation Unit 47143, and an ITMP-SubPel Parameter Derivation Unit 47144. FIG. 6 illustrates the configuration of the ITMP Prediction Unit.

[0075] (Input Data Derivation Unit 4701) The Input Data Derivation Unit 4701 derives the pixels of the template regions used by the ITMP method. These template regions are derived from the reconstructed image recSamples[x][y]. The template regions may include up to five areas: topLeftArea, topArea, topRightArea, leftArea, and leftBottomArea, although fewer areas (e.g., only topArea and leftArea) may be used. The width of topArea and topRightArea equals the width of the current block (curWidth), and their height is rH. The width of leftArea and leftBottomArea is rW, and their height equals the height of the current block (curHeight). The width and height of topLeftArea are rW and rH, respectively. FIG. 8 shows the template areas. Given the top-left corner coordinates of the current block as (x0, y0), the coordinates of the template regions are as follows:
(Template Region Setting) The number of template regions is set to numOfTmpl (0 < numOfTmpl <= 5). The value of numOfTmpl may be fixed or adaptive (dynamic). When numOfTmpl is adaptive, it may be determined based on the size or shape of the current block.
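As a minimal sketch, the five template rectangles can be written out from (x0, y0), curWidth, curHeight, rW and rH. The sizes follow the paragraph above; the placement of each area directly adjacent to the block is an assumption based on the area names and FIG. 8, not coordinates taken from the text, and the helper name template_areas is hypothetical.

```python
from typing import Dict, Tuple

Rect = Tuple[int, int, int, int]  # (x, y, width, height)


def template_areas(x0: int, y0: int, cur_w: int, cur_h: int,
                   r_w: int, r_h: int) -> Dict[str, Rect]:
    """Hypothetical helper returning the five ITMP template rectangles.

    Sizes follow [0075]: topArea/topRightArea are cur_w x r_h,
    leftArea/leftBottomArea are r_w x cur_h, topLeftArea is r_w x r_h.
    The adjacent placement around (x0, y0) is an assumed layout.
    """
    return {
        "topLeftArea":    (x0 - r_w,   y0 - r_h,   r_w,   r_h),
        "topArea":        (x0,         y0 - r_h,   cur_w, r_h),
        "topRightArea":   (x0 + cur_w, y0 - r_h,   cur_w, r_h),
        "leftArea":       (x0 - r_w,   y0,         r_w,   cur_h),
        "leftBottomArea": (x0 - r_w,   y0 + cur_h, r_w,   cur_h),
    }


# 16x8 block at (64, 32) with four template lines in each direction
areas = template_areas(64, 32, 16, 8, 4, 4)
print(areas["topRightArea"])  # (80, 28, 16, 4)
```

Any subset of these rectangles (for example only topArea and leftArea) can then be used, as the following paragraphs describe.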

[0076] (Example) (Adaptive numOfTmpl) (numOfTmpl based on curBlock size) ITMP Prediction Unit 31047 may use different template areas depending on the current block size (curBlockSize): it may select fewer template areas if the current block is smaller than a predetermined size and more template areas if the current block is larger than the predetermined size. The curBlock size is derived as the product of the current block's width and height (curWidth * curHeight); its possible values are [16, 32, 64, 128, 256, 512, 1024, 4096]. In another example, the curBlock size is derived as the sum of the width and height (curWidth + curHeight). In yet another example, it is derived as the sum of the base-2 logarithms of the width and height (log2(curWidth) + log2(curHeight)).
(Embodiment 1) A threshold named theshBlockSize is pre-defined, and its value may be one of [16, 32, 64, 128, 256, 512, 1024, 4096].

[0077] In one example ((Ns, Nl) = (2, 3)), if curWidth * curHeight <= theshBlockSize (small blocks), two areas, e.g., [topArea, leftArea], are used as the template area. Otherwise (curWidth * curHeight > theshBlockSize, large blocks), three areas, e.g., [topArea, leftArea, topLeftArea], are used as the template area.

[0078] In another example ((Ns, Nl) = (3, 5)), if curWidth * curHeight <= theshBlockSize, three areas, e.g., [topArea, leftArea, topLeftArea], are used as the template area. Otherwise (curWidth * curHeight > theshBlockSize), five areas, e.g., [topArea, leftArea, topLeftArea, topRightArea, leftBottomArea], are used as the template area.

[0079] Note that if a template area is unavailable, for example because it lies outside the picture, slice, or tile, that area is excluded from the template.

[0080] Different numbers of template areas may be used: Ns for small blocks and Nl for large blocks.

[0081] Instead of {2, 3} or {3, 5}, (Ns, Nl) may also be {2, 4}, {2, 5}, {3, 4}, or {4, 5}.

[0082] (Embodiment 2) A different number of stages may be used. For example, ITMP Prediction Unit 31047 may classify the current block into three types (small, normal, and large) depending on the current block size, and use Ns, Nn, and Nl template areas for each type, respectively, where Ns < Nn < Nl. When Ns is two, [topArea, leftArea] may be used as the template. When Nn is three, [topArea, leftArea, topLeftArea] may be used. When Nl is five, [topArea, leftArea, topLeftArea, topRightArea, leftBottomArea] may be used.

[0083] Two thresholds, theshBlockSize1 and theshBlockSize2, are pre-defined; their values may be chosen from [16, 32, 64, 128, 256, 512, 1024, 4096], subject to theshBlockSize1 < theshBlockSize2. When curWidth * curHeight <= theshBlockSize1, [topArea, leftArea] is used as the template area. When theshBlockSize1 < curWidth * curHeight <= theshBlockSize2, [topArea, leftArea, topLeftArea] is used. When curWidth * curHeight > theshBlockSize2, [topArea, leftArea, topLeftArea, topRightArea, leftBottomArea] is used. Other selections of template areas are also possible.
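Embodiments 1 and 2, together with the availability rule of [0079], can be sketched as one staged lookup. This is an illustrative sketch only: the function names, the stage-list representation and the availability dictionary are assumptions, and the three block-size measures are the alternatives listed in [0076].

```python
import math
from typing import Dict, List, Tuple

AREAS_2 = ["topArea", "leftArea"]
AREAS_3 = ["topArea", "leftArea", "topLeftArea"]
AREAS_5 = ["topArea", "leftArea", "topLeftArea", "topRightArea", "leftBottomArea"]


def block_size(w: int, h: int, measure: str = "product") -> int:
    """The three curBlock size measures mentioned in [0076]."""
    if measure == "product":
        return w * h
    if measure == "sum":
        return w + h
    if measure == "log2sum":
        return int(math.log2(w)) + int(math.log2(h))
    raise ValueError(f"unknown measure: {measure}")


def select_areas(size: int, stages: List[Tuple[int, List[str]]],
                 largest: List[str], available: Dict[str, bool]) -> List[str]:
    """Return the area list of the first stage with size <= threshold, else `largest`.

    `stages` must be sorted by ascending threshold. Areas flagged as unavailable
    (outside the picture, slice or tile) are dropped, as in [0079].
    """
    chosen = largest
    for threshold, areas in stages:
        if size <= threshold:
            chosen = areas
            break
    return [a for a in chosen if available.get(a, True)]
```

With a single stage, for example `select_areas(block_size(8, 8), [(256, AREAS_2)], AREAS_3, {})`, this reproduces Embodiment 1 with (Ns, Nl) = (2, 3); two stages such as `[(64, AREAS_2), (256, AREAS_3)]` with `AREAS_5` as the largest list reproduce Embodiment 2.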

[0084] (numOfTmpl based on curBlock shape) ITMP Prediction Unit 31047 may use different template areas depending on the current block shape (curBlockShape). In one example, ITMP Prediction Unit 31047 may derive curBlockShape as one of three values: if curWidth equals curHeight, curBlockShape is 0; if curWidth is greater than curHeight, curBlockShape is 1; and if curWidth is less than curHeight, curBlockShape is 2.

[0085] (Embodiment 3) When curBlockShape is 0, ITMP Prediction Unit 31047 may derive numOfTmpl as one of 2, 3, or 5. In this case, [topArea, leftArea], [topArea, leftArea, topLeftArea], or [topArea, leftArea, topLeftArea, topRightArea, leftBottomArea] are used as templates, respectively.

[0086] When curBlockShape is 1, ITMP Prediction Unit 31047 may derive numOfTmpl as one of 1, 2, 3, or 4. In this case, [topArea] is used for numOfTmpl = 1; [topArea, topLeftArea] or [topArea, topRightArea] for numOfTmpl = 2; [topArea, topLeftArea, topRightArea] for numOfTmpl = 3; and [topArea, topLeftArea, topRightArea, leftArea] for numOfTmpl = 4.

[0087] When curBlockShape is 2, ITMP Prediction Unit 31047 may derive numOfTmpl as one of 1, 2, 3, or 4. In this case, [leftArea] is used for numOfTmpl = 1; [leftArea, topLeftArea] or [leftArea, leftBottomArea] for numOfTmpl = 2; [leftArea, topLeftArea, leftBottomArea] for numOfTmpl = 3; and [leftArea, topLeftArea, leftBottomArea, topArea] for numOfTmpl = 4.
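The shape rule of Embodiment 3 maps naturally onto a table keyed by (curBlockShape, numOfTmpl). In this sketch, where the text offers two lists for the same count, the table keeps the first alternative; that choice, and the function names, are assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def block_shape(w: int, h: int) -> int:
    """curBlockShape: 0 square, 1 wider than tall, 2 taller than wide."""
    if w == h:
        return 0
    return 1 if w > h else 2


# (curBlockShape, numOfTmpl) -> template areas. For shape 1 or 2 with
# numOfTmpl = 2 the text allows two lists; the first one is kept here.
SHAPE_TEMPLATES: Dict[Tuple[int, int], List[str]] = {
    (0, 2): ["topArea", "leftArea"],
    (0, 3): ["topArea", "leftArea", "topLeftArea"],
    (0, 5): ["topArea", "leftArea", "topLeftArea", "topRightArea", "leftBottomArea"],
    (1, 1): ["topArea"],
    (1, 2): ["topArea", "topLeftArea"],
    (1, 3): ["topArea", "topLeftArea", "topRightArea"],
    (1, 4): ["topArea", "topLeftArea", "topRightArea", "leftArea"],
    (2, 1): ["leftArea"],
    (2, 2): ["leftArea", "topLeftArea"],
    (2, 3): ["leftArea", "topLeftArea", "leftBottomArea"],
    (2, 4): ["leftArea", "topLeftArea", "leftBottomArea", "topArea"],
}


def shape_templates(w: int, h: int, num_of_tmpl: int) -> List[str]:
    """Template areas for a block of size w x h and a given numOfTmpl."""
    return SHAPE_TEMPLATES[(block_shape(w, h), num_of_tmpl)]
```

The design favours the long edge of a non-square block: wide blocks start from topArea and tall blocks from leftArea.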

[0088] Other selections of template regions are also possible.

[0089] (Template Line Setting) The number of template lines is set as numOfTmplLineW (where numOfTmplLineW > 0 and numOfTmplLineW <= curWidth) and numOfTmplLineH (where numOfTmplLineH > 0 and numOfTmplLineH <= curHeight). The values of numOfTmplLineW and numOfTmplLineH may be fixed or adaptive. When numOfTmplLineW and numOfTmplLineH are set as adaptive values, they are determined based on the size or shape of the current block.

[0090] (Fixed numOfTmplLineW and numOfTmplLineH) When numOfTmplLineW and numOfTmplLineH are fixed, they take values in the closed intervals [1, curWidth] and [1, curHeight], respectively, such as 1, 2, or 4, and the template dimensions follow directly: rW = numOfTmplLineW and rH = numOfTmplLineH. For example, setting both to 1 gives rW = rH = 1, setting both to 2 gives rW = rH = 2, and setting both to 4 gives rW = rH = 4. The two values may also differ, e.g., numOfTmplLineW = 1 and numOfTmplLineH = 2, in which case rW = 1 and rH = 2. Other configurations are also possible.

[0091] (Adaptive numOfTmplLineW and numOfTmplLineH) (Based on curBlock Size) The size of curBlock is defined as the product of its width and height (curWidth * curHeight). Possible values include [16, 32, 64, 128, 256, 512, 1024, 4096].

[0092] (Embodiment 4) A threshold named threshBlockSize is pre-defined, with its value being one of [16, 32, 64, 128, 256, 512, 1024, 4096].

[0093] When curWidth * curHeight <= threshBlockSize, numOfTmplLineW and numOfTmplLineH are set to n1, where n1 may take values such as 1, 2, 4, and so on.

[0094] When curWidth * curHeight > threshBlockSize, numOfTmplLineW and numOfTmplLineH are set to n2, where n2 can also take values such as 1, 2, 4, and so on.

[0095] Note that n2 may be either greater than or smaller than n1, and numOfTmplLineW and numOfTmplLineH may take different values.
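Embodiment 4 reduces to a single comparison. The sketch below returns (numOfTmplLineW, numOfTmplLineH), which equal (rW, rH); it uses the same count in both directions, and the clamping to the block dimensions (to respect numOfTmplLineW <= curWidth and numOfTmplLineH <= curHeight from [0089]) is an assumption about how a violating n1 or n2 would be handled.

```python
from typing import Tuple


def template_lines(cur_w: int, cur_h: int, thresh_block_size: int,
                   n1: int, n2: int) -> Tuple[int, int]:
    """Return (numOfTmplLineW, numOfTmplLineH), i.e. (rW, rH), per Embodiment 4."""
    n = n1 if cur_w * cur_h <= thresh_block_size else n2
    # Assumed clamping so the constraints of [0089] always hold.
    return min(n, cur_w), min(n, cur_h)
```

For example, with threshBlockSize = 64, n1 = 2 and n2 = 4, an 8x8 block gets two template lines while a 16x16 block gets four.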

[0096] (Based on curBlock Shape)
(Search Range Derivation Unit 4711) The Search Range Derivation Unit 4711 derives the exploration regions required by ITMP. ITMP has six exploration regions, named R1, R2, R3, R4, R5, and R6, as shown in FIG. 9. The reconstructed pixels within the exploration regions are used to calculate TM costs and generate the candidate BV list. The position and size of each exploration region are determined by a range called searchSize, a threshold called miniSize, the size (curCtuSize) and top-left coordinates (xc, yc) of the current CTU (curCtu), and the height (curHeight), width (curWidth), and top-left coordinates (xb, yb) of the current block (curBlock). The detailed positions of each region are as follows:
(Fixed Threshold Value) ITMP Prediction Unit 31047 (Search Range Derivation Unit 4711) may set the search area range searchSize and the threshold miniSize to fixed values. searchSize is set to a fixed positive integer, such as 1, 2, 3, 4, or 5, and miniSize is set to a fixed positive integer, such as 16, 32, 64, or 128.

[0097] (Adaptive Threshold Based on Current Block Size) Search Range Derivation Unit 4711 may select the search area range, searchSize and the threshold miniSize as adaptive values based on the size of the current block.

[0098] (Embodiment 6) searchSize is set to a positive integer associated with the size of the current block, following the principle that the larger the block, the greater the value of searchSize. For example, when the block size is less than or equal to 64, searchSize is set to 4; otherwise, it is set to 5. searchSize may also be set adaptively in three or more stages. For instance, when the block size is less than or equal to 64, searchSize is set to 3; when it is less than or equal to 256, it is set to 4; otherwise, it is set to 5. Other values of searchSize are also possible.

[0099] Similarly, miniSize is set to a positive integer associated with the size of the current block, following the principle that the larger the block, the greater the value of miniSize. For example, when the block size is less than or equal to 64, miniSize is set to 32; otherwise, it is set to 64. miniSize may also be set adaptively in three or more stages. For instance, when the block size is less than or equal to 64, miniSize is set to 32; when it is less than or equal to 256, it is set to 48; otherwise, it is set to 64. Other values of miniSize are also possible.
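The three-stage examples of Embodiment 6 can be expressed as a small staged lookup. This sketch assumes "block size" means curWidth * curHeight, as in the earlier embodiments; the helper names are hypothetical.

```python
from typing import List, Tuple


def staged_value(size: int, stages: List[Tuple[int, int]], otherwise: int) -> int:
    """Return the value of the first (upper_bound, value) stage with size <= upper_bound."""
    for upper, value in stages:
        if size <= upper:
            return value
    return otherwise


def search_params(block_size: int) -> Tuple[int, int]:
    """(searchSize, miniSize) using the three-stage example of Embodiment 6."""
    search_size = staged_value(block_size, [(64, 3), (256, 4)], 5)
    mini_size = staged_value(block_size, [(64, 32), (256, 48)], 64)
    return search_size, mini_size
```

Both parameters grow with the block, so larger blocks explore a wider area, consistent with the stated principle.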

[0100] (Prediction Image Derivation Unit 4712) (Candidate List Derivation Unit 4713) The Candidate List Derivation Unit 4713 generates a BV candidate list bvCandiList, with the length of numBV, based on the template cost. The size and position of the template are shown in FIG.8, represented as tmplArea[][]. The Candidate List Derivation Unit 4713 consists of two exploration processes: Coarse Exploration and Fine Exploration.

[0101] (Coarse Exploration) Coarse exploration derives the corresponding BV for pixels in the exploration regions and computes the template prediction image tempPredImage[][]. Using tempPredImage[][] and the TMcost, all BVs derived from the exploration regions are sorted, and the numBV2 BVs with the smallest TMcost, along with other related information, are stored in a list called firstBVCandList[]. The six exploration regions are traversed with a stride of firstStride, and the BV is computed for each traversed pixel. The following lists are defined:
(Fine Exploration) Fine exploration derives the corresponding BV for pixels based on the entries in firstBVCandList[] and computes the template prediction image tempPredImage[][]. Using tempPredImage[][] and the TMcost, all derived BVs are sorted, and the numBV BVs with the smallest template cost (TMcost), along with other related information, are stored in a list called bvCandList[]. The following lists are derived:
(Setting) (Fixed Value) The lengths of firstBvCandList[] and bvCandiList[] (numBV2 and numBV) are set to fixed positive integers, such as 5, 10, 15, 20, or 30.
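The two-pass search can be sketched as follows, assuming the fine pass re-tests every BV within plus or minus (firstStride - 1) of each coarse survivor, which the text does not spell out. `tm_cost` is a caller-supplied, hypothetical function; in a codec it would compare the current template with the template displaced by the BV (for example by SAD). The candidate grid being aligned to absolute BV coordinates is also an assumption.

```python
from typing import Callable, Dict, List, Sequence, Tuple

BV = Tuple[int, int]


def coarse_to_fine(valid_bvs: Sequence[BV], tm_cost: Callable[[BV], int],
                   first_stride: int, num_bv2: int, num_bv: int) -> List[Tuple[int, BV]]:
    """Return bvCandList as (TMcost, BV) pairs, cheapest first."""
    valid = set(valid_bvs)
    # Coarse exploration: only BVs on a first_stride grid are evaluated.
    coarse = sorted((tm_cost(bv), bv) for bv in valid
                    if bv[0] % first_stride == 0 and bv[1] % first_stride == 0)
    first_bv_cand_list = coarse[:num_bv2]
    # Fine exploration: evaluate every valid BV around each coarse survivor.
    costs: Dict[BV, int] = {}
    r = first_stride - 1
    for _, (bx, by) in first_bv_cand_list:
        for dy in range(-r, r + 1):
            for dx in range(-r, r + 1):
                bv = (bx + dx, by + dy)
                if bv in valid and bv not in costs:
                    costs[bv] = tm_cost(bv)
    return sorted((c, bv) for bv, c in costs.items())[:num_bv]
```

With a stride of 2 the coarse pass evaluates roughly a quarter of the candidates, and the fine pass still recovers an odd-coordinate optimum that the grid skipped.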

[0102] (Adaptive Value Based on Current Block Size) numBV2 and numBV are adaptively determined based on the size of the current block; for example, the larger the current block, the larger the values.

[0103] (Embodiment 7) When the size of the current block is less than or equal to 64, numBV2 and numBV are set to 30 and 19, respectively; otherwise, they are set to 24 and 12. Alternatively, numBV2 and numBV may be set in multiple stages. For instance, when the current block size is less than or equal to 32, numBV2 and numBV are set to 20 and 10; when it is less than or equal to 128, they are set to 24 and 12; otherwise, they are set to 30 and 19. Other configurations of numBV2 and numBV are also possible depending on specific requirements.

[0104] (Stride Setting) (Fixed Stride) The stride (firstStride) is set to a fixed positive integer, such as 1, 2, 3, 4, 5, and so on.

[0105] (Adaptive Stride Based on Current Block Size) The stride is adaptively determined based on the size of the current block; for example, the larger the current block, the larger the stride.

[0106] (Embodiment 8) When the size of the current block is less than or equal to 64, the stride is set to 2; otherwise, it is set to 4. Alternatively, the stride may be set in multiple stages. For instance, when the current block size is less than or equal to 32, the stride is set to 2; when it is less than or equal to 128, it is set to 4; otherwise, it is set to 6. Other stride configurations are also possible depending on specific requirements.
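The multi-stage examples of Embodiments 7 and 8 happen to share the same stage boundaries (32 and 128), so a single `bisect` lookup can return numBV2, numBV and firstStride together. The table layout and the function name are illustrative assumptions; block size is again taken as curWidth * curHeight.

```python
import bisect
from typing import Tuple

BOUNDS = [32, 128]                            # upper bounds of the first two stages
NUM_BV2_NUM_BV = [(20, 10), (24, 12), (30, 19)]
FIRST_STRIDE = [2, 4, 6]


def candidate_settings(block_size: int) -> Tuple[int, int, int]:
    """Return (numBV2, numBV, firstStride) for the given block size."""
    # bisect_left gives stage 0 for size <= 32, 1 for size <= 128, 2 otherwise.
    stage = bisect.bisect_left(BOUNDS, block_size)
    num_bv2, num_bv = NUM_BV2_NUM_BV[stage]
    return num_bv2, num_bv, FIRST_STRIDE[stage]
```

Using `bisect_left` makes each bound inclusive, which matches the "less than or equal to" wording of both embodiments.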

[0107] (Sub-mode Parameter Derivation Unit 4714) The Sub-mode Parameter Derivation Unit 4714 derives the parameters required for the ITMP sub-modes. It includes an ITMP-LIC Parameter Derivation Unit 47141, an ITMP-FLM Parameter Derivation Unit 47142, an ITMP-Fusion Parameter Derivation Unit 47143, and an ITMP-SubPel Parameter Derivation Unit 47144. FIG. 7 illustrates the configuration of the Sub-mode Parameter Derivation Unit 4714.

[0108] (ITMP-FLM Parameter Derivation Unit 47142) ITMP-FLM Parameter Derivation Unit 47142 derives the parameters required for the ITMP Filter Mode (ITMP-FLM). In the ITMP-FLM mode, for each BV in the bvCandList (denoted tempBv), the predicted sample predLumaVal is derived using the linear equation predLumaVal = c0*C + c1*N + c2*S + c3*E + c4*W + c5*B, where the parameters c0 to c5 are derived using the template region.

[0109] Inputs: the pixel C of the predicted template region derived using tempBv, and the four neighboring pixels of C: North (N), South (S), East (E), and West (W). B is half of 2 raised to the current bit depth, i.e., B = 2^(bitDepth - 1); for a bit depth of 10, B equals 512.

[0110] For example, C = recSamples[xP][yP], N = recSamples[xP][yP-1], S = recSamples[xP][yP+1], E = recSamples[xP+1][yP], W = recSamples[xP-1][yP], where (xP, yP) = (x + tempBv[0], y + tempBv[1]), x = 0..curWidth-1, y = 0..curHeight-1.
Output: the corresponding pixel value (predLumaVal) in the template region.

[0111] For each BV in the bvCandList, six parameters (c0, c1, c2, c3, c4, c5) are derived and stored sequentially in itmpFlmParam[].
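The text says c0 to c5 are derived from the template region but does not name the fitting method. The sketch below assumes an ordinary least-squares fit over the template, solved through the normal equations with Gaussian elimination; the function names are hypothetical, `rec` is indexed rec[y][x] (row-major) rather than recSamples[x][y], and East and West are taken as x + 1 and x - 1.

```python
from typing import List, Sequence, Tuple


def flm_features(rec: List[List[int]], x: int, y: int, bit_depth: int) -> List[int]:
    """[C, N, S, E, W, B] at (x, y); rec is indexed rec[y][x] here."""
    b = 1 << (bit_depth - 1)                       # B = 2^(bitDepth-1), 512 for 10-bit
    return [rec[y][x], rec[y - 1][x], rec[y + 1][x],
            rec[y][x + 1], rec[y][x - 1], b]


def solve(a: List[List[float]], rhs: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(rhs)
    m = [[float(v) for v in row] + [float(r)] for row, r in zip(a, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    coef = [0.0] * n
    for r in range(n - 1, -1, -1):                 # back substitution
        acc = sum(m[r][k] * coef[k] for k in range(r + 1, n))
        coef[r] = (m[r][n] - acc) / m[r][r]
    return coef


def derive_flm_coeffs(rec: List[List[int]], ref_pos: Sequence[Tuple[int, int]],
                      targets: Sequence[float], bit_depth: int) -> List[float]:
    """Least-squares c0..c5. ref_pos: template positions shifted by tempBv;
    targets: reconstructed samples of the current block's template."""
    feats = [flm_features(rec, x, y, bit_depth) for x, y in ref_pos]
    ata = [[sum(f[i] * f[j] for f in feats) for j in range(6)] for i in range(6)]
    atb = [sum(f[i] * t for f, t in zip(feats, targets)) for i in range(6)]
    return solve(ata, atb)


def flm_predict(rec: List[List[int]], x: int, y: int,
                coef: Sequence[float], bit_depth: int) -> float:
    """predLumaVal = c0*C + c1*N + c2*S + c3*E + c4*W + c5*B."""
    return sum(c * v for c, v in zip(coef, flm_features(rec, x, y, bit_depth)))
```

A fixed-point implementation would typically scale and round the coefficients; floating point is used here only to keep the sketch short.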

[0112] (ITMP-Fusion Parameter Derivation Unit 47143) ITMP-Fusion Parameter Derivation Unit 47143 derives the parameters required for the ITMP Fusion mode. The ITMP Fusion mode includes two methods for deriving weights: the first uses TMcost to calculate the weights, while the second uses a Wiener-filter approach. For each weighting method, a total of numItmpFusionCandi candidates are derived, and the maximum number of fusion modes is set to numItmpFusionMode. numItmpFusionCandi * numItmpFusionMode is set to be less than or equal to the length of bvCandList, i.e., numBV. To store the BV and the corresponding TMcost for each fusion mode, a three-dimensional list fusionInformation[numItmpFusionMode][numItmpFusionCandi][2] holds this bvCandList information. The fusionInformation is derived as follows:
numItmpFusionCandi is the predefined maximum number of fusion candidates used for each fusion mode. For each fusion mode, the actual number of fusion candidates needs to be determined, which may be decided by a threshold named costThresh. The actual number of fusion candidates for each fusion mode is stored in the list realFusionModeNum[numItmpFusionMode]. It may be derived by the following method:
The value of costThresh may be set as a multiple of the TMcost of the first candidate (the one with the smallest TMcost) in each fusion mode. For example, if the multiplier is 1.5, costThresh is derived as fusionInformation[mode][0][1] * 1.5. The multiplier may take other values and may be computed with a shift operation, e.g., costThresh = (3*TMCost)>>1, (5*TMCost)>>2, (7*TMCost)>>2, … For each fusion mode, two types of weights (TMcost type and Wiener-filter type) are derived separately. The final predicted image is derived as the weighted average of the prediction images derived from the BVs included in each fusion mode. The parameters used for each fusion mode (weights and realFusionModeNum[]) are stored in itmpFusParam[].
itmpFusParam[mode].realNum = realFusionModeNum[mode], itmpFusParam[mode].Idx = numItmpFusionCandi * mode.
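A sketch of the per-mode bookkeeping, assuming bvCandList costs are sorted in ascending order and using the 1.5x shift form costThresh = (3 * TMcost) >> 1. The mapping from TMcost to weight is not given in the text; inverse-cost weights normalized to sum to 1 are an assumption here, and the Wiener-filter variant is not shown. The function name is hypothetical.

```python
from typing import List, Tuple


def fusion_params(costs: List[int], num_candi: int,
                  num_modes: int) -> List[Tuple[int, int, List[float]]]:
    """For each fusion mode return (Idx, realNum, weights).

    costs: TMcost of the bvCandList entries, ascending.
    """
    params = []
    for mode in range(num_modes):
        idx = num_candi * mode                      # itmpFusParam[mode].Idx
        group = costs[idx:idx + num_candi]
        thresh = (3 * group[0]) >> 1                # costThresh, 1.5x first cost
        real_num = 0
        for c in group:                             # leading candidates within threshold
            if c > thresh:
                break
            real_num += 1
        inv = [1.0 / max(c, 1) for c in group[:real_num]]   # assumed weighting, zero-cost guard
        total = sum(inv)
        params.append((idx, real_num, [v / total for v in inv]))
    return params
```

For example, with costs [10, 12, 14, 20, 22, 30, 34, 60], four candidates per mode and two modes, mode 0 keeps three BVs (threshold 15) and mode 1 keeps two (threshold 33).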

[0113] (ITMP-SubPel Parameter Derivation Unit 47144) The ITMP-SubPel Parameter Derivation Unit 47144 provides ITMP with four types of pixel precision in eight directions. The eight directions are up, down, left, right, top-left, top-right, bottom-left, and bottom-right. The four types of precision are integer-pixel, quarter-pixel, half-pixel, and three-quarter-pixel precision. The index tmpSubPelIdx represents the pixel precision, where the values 0, 1, 2, and 3 correspond to integer, quarter-pixel, half-pixel, and three-quarter-pixel precision, respectively. The index tmpSubPelDir represents the direction, where the values 0, 1, 2, 3, 4, 5, 6, and 7 correspond to up, down, left, right, top-left, top-right, bottom-left, and bottom-right, respectively. For each candidate BV in the bvCandList there are therefore 25 precision/direction combinations: integer-pixel precision (for which the direction is not considered) plus the three non-integer precisions in each of the eight directions (24 combinations). The TM cost for integer precision is already stored in bvCandList. For each BV, the ITMP-SubPel Parameter Derivation Unit 47144 derives the TM costs for the non-integer combinations and stores the two with the smallest TM cost in itmpSubPelParam[], namely itmpSubPelParam[0] and itmpSubPelParam[1]. Each of these two entries includes two indices (tmpSubPelIdx and tmpSubPelDir).
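The 24 non-integer combinations and the "keep the two cheapest" rule can be sketched as below. The geometry (moving the BV to quarter-sample units and shifting it by tmpSubPelIdx quarters along the chosen direction, with y growing downward) is an assumption, and `cost_fn` stands in for a hypothetical TM-cost evaluation at a quarter-sample BV.

```python
from typing import Callable, List, Tuple

# tmpSubPelDir -> unit step (dx, dy); y grows downward.
DIRS = [(0, -1), (0, 1), (-1, 0), (1, 0),      # up, down, left, right
        (-1, -1), (1, -1), (-1, 1), (1, 1)]    # top-left, top-right, bottom-left, bottom-right


def best_two_subpel(bv: Tuple[int, int],
                    cost_fn: Callable[[Tuple[int, int]], int]) -> List[Tuple[int, int]]:
    """Return [(tmpSubPelIdx, tmpSubPelDir)] of the two cheapest fractional offsets."""
    scored = []
    for idx in (1, 2, 3):                      # quarter, half, three-quarter
        for d, (dx, dy) in enumerate(DIRS):
            qbv = (4 * bv[0] + dx * idx, 4 * bv[1] + dy * idx)   # quarter-sample BV
            scored.append((cost_fn(qbv), idx, d))
    scored.sort()                              # 24 entries, cheapest first
    return [(idx, d) for _, idx, d in scored[:2]]
```

The two returned entries correspond to itmpSubPelParam[0] and itmpSubPelParam[1], which the decoder can rebuild without extra signalling beyond tmpFracIdx.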

[0114] (ITMP-LIC Parameter Derivation Unit 47141) In ITMP-LIC mode, ITMP-LIC Parameter Derivation Unit 47141 generates a set of LIC parameters for each candidate in the bvCandList. The LIC parameters are the parameters a and b of a linear prediction model (y = a * x + b, or y = ((a * x) >> shift) + b in integer form) and may be derived by least-squares estimation, minimizing the sum of squared differences between the reconstructed template pixels of the current block (y) and the values predicted from the recSamples pixels (x) at the positions pointed to by the BV. The parameters are stored in itmpLicParam[].
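The closed-form least-squares solution for a and b, plus a conversion to the integer form y = ((a * x) >> shift) + b, can be sketched as follows. The fallback to an offset-only model when the reference template is flat, and the rounding used in the fixed-point conversion, are assumptions; the function names are hypothetical.

```python
from typing import Sequence, Tuple


def derive_lic(x: Sequence[int], y: Sequence[int]) -> Tuple[float, float]:
    """Least-squares (a, b) minimising sum((y_i - (a * x_i + b)) ** 2)."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(v * v for v in x)
    sxy = sum(u * v for u, v in zip(x, y))
    denom = n * sxx - sx * sx
    if denom == 0:                 # flat reference template: offset-only model (assumed)
        return 1.0, (sy - sx) / n
    a = (n * sxy - sx * sy) / denom
    return a, (sy - a * sx) / n


def to_fixed_point(a: float, b: float, shift: int) -> Tuple[int, int]:
    """Integer parameters for y = ((a_int * x) >> shift) + b_int."""
    return round(a * (1 << shift)), round(b)
```

For reference samples [100, 200, 300, 400] and template samples [210, 410, 610, 810], `derive_lic` returns a = 2.0 and b = 10.0; with shift = 6 the integer slope is 128.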

[0115] The four ITMP sub-modes (ITMP-FLM, ITMP-FUSION, ITMP-SubPel, ITMP-LIC) may be applied to all blocks or selectively based on the block size.

[0116] (Used on all blocks) The four sub-modes are used for all block sizes.

[0117] (Disabled based on block size) A threshold named blockSize is set as a positive integer. The comparison between the current block size and this threshold determines whether the four sub-modes are disabled.

[0118] (Embodiment 9) If blockSize is set to 128, then the sub-modes are disabled when the current block size is smaller than 128; otherwise, the sub-modes are allowed.

[0119] (Disabled based on block size for each sub-mode) Four separate thresholds are set for each sub-mode: blockSize1, blockSize2, blockSize3, and blockSize4, all being positive integers. The comparison between the current block size and these thresholds determines whether each of the four sub-modes is disabled individually.

[0120] (Embodiment 10) If blockSize1, blockSize2, blockSize3, and blockSize4 are set to 32, 64, 128, and 256 respectively, then ITMP-FLM, ITMP-FUSION, ITMP-SubPel, and ITMP-LIC are disabled when the current block size is smaller than these values. Otherwise, they are enabled.

[0121] The thresholds for blockSize described above may also be set in various other ways.
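The per-sub-mode gating of Embodiment 10 is a table of minimum sizes. This sketch assumes "block size" means curWidth * curHeight; setting all four thresholds to 128 gives the single-threshold behaviour of Embodiment 9. Names are illustrative.

```python
from typing import Dict

# Embodiment 10: a sub-mode is disabled when the block is smaller than its threshold.
SUBMODE_MIN_SIZE: Dict[str, int] = {
    "ITMP-FLM": 32, "ITMP-FUSION": 64, "ITMP-SubPel": 128, "ITMP-LIC": 256,
}


def enabled_submodes(cur_w: int, cur_h: int,
                     thresholds: Dict[str, int] = SUBMODE_MIN_SIZE) -> Dict[str, bool]:
    """Map each ITMP sub-mode to whether it is allowed for this block."""
    size = cur_w * cur_h           # block size taken as width * height (assumption)
    return {mode: size >= t for mode, t in thresholds.items()}
```

An 8x8 block (64 samples) would thus allow ITMP-FLM and ITMP-FUSION but not ITMP-SubPel or ITMP-LIC, so the encoder need not evaluate, and the decoder need not parse, the disabled sub-modes.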

[0122] (Decoding Syntax) On the decoder side, tmpFlag is first decoded from the bitstream. If tmpFlag is false, the current block does not use the ITMP method for prediction, and the decoding operations related to ITMP and its sub-modes are skipped.

[0123] If tmpFlag is true, the following operations are performed. First, tmpFusionFlag is decoded from the bitstream. If tmpFusionFlag is false, the ITMP-FUSION mode is not used, and the decoding operations related to ITMP-FUSION are skipped. If tmpFusionFlag is true, ITMP-FUSION is used for prediction, and tmpIdx is decoded from the bitstream. In this case tmpIdx ranges from 0 to 2*numItmpFusionMode - 1, each value corresponding to a unique fusion mode (numItmpFusionMode modes for each of the two weighting methods). In some cases, tmpLicFlag may also be decoded: if tmpLicFlag is false, the current block is predicted by ITMP-FUSION alone; if it is true, the current block is predicted jointly by ITMP-FUSION and ITMP-LIC.

[0124] When tmpFusionFlag is false, tmpIdx is decoded; in this case tmpIdx ranges from 0 to numBV - 1, each value corresponding to a unique BV in bvCandList. Then tmpFlmFlag is decoded. If tmpFlmFlag is false, the ITMP-FLM mode is not used, and the decoding operations related to ITMP-FLM are skipped. If tmpFlmFlag is true, the current block uses ITMP-FLM for prediction.

[0125] When tmpFlmFlag is false, ibcLicIdx is decoded. If ibcLicIdx is 0, the ITMP-LIC mode is not used; otherwise, ITMP-LIC is used for predicting the current block.

[0126] When tmpFlmFlag is false, tmpFracIdx is decoded, where the possible values of tmpFracIdx are 0 and 1. After obtaining tmpFracIdx, the values tmpIsSubPel and tmpSubPelIdx may be derived from itmpSubPelParam[], which is derived at the decoder side. These values specify the direction and type of pixel precision used for predicting the current block.
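The parsing order of [0122] to [0126] can be summarised as a small decision tree. In this sketch `read` is a hypothetical iterator that yields already-decoded syntax element values, standing in for the entropy decoder; binarization and context modelling are not modelled, and tmpLicFlag is assumed to be present whenever ITMP-FUSION is used, although the text only says it may be decoded.

```python
from typing import Dict, Iterator


def parse_itmp_syntax(read: Iterator[int]) -> Dict[str, int]:
    """Return the ITMP syntax elements in the order described in [0122]-[0126]."""
    syn: Dict[str, int] = {"tmpFlag": next(read)}
    if not syn["tmpFlag"]:
        return syn                         # regular intra-mode syntax follows instead
    syn["tmpFusionFlag"] = next(read)
    syn["tmpIdx"] = next(read)             # fusion-mode index or bvCandList index
    if syn["tmpFusionFlag"]:
        syn["tmpLicFlag"] = next(read)     # conditional in the text; assumed present here
        return syn
    syn["tmpFlmFlag"] = next(read)
    if not syn["tmpFlmFlag"]:
        syn["ibcLicIdx"] = next(read)
        syn["tmpFracIdx"] = next(read)     # 0 or 1: selects an itmpSubPelParam[] entry
    return syn
```

For example, the value sequence 1, 0, 5, 0, 1, 1 decodes as a non-fusion ITMP block using BV index 5, with FLM off, LIC on and tmpFracIdx = 1.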

[0127] (Prediction Image Generation Unit 4715) The Prediction Image Generation Unit 4715 generates the predicted image for the current block as follows.
(ITMP-LIC and ITMP-FLM) First, bvCandList is derived, and the BV is selected using tmpIdx: predBv = bvCandList[tmpIdx].bv. Starting from the position pointed to by predBv, a region of recSamples[][] with the same size and shape as the current block is copied into the prediction image predImage[][] as follows.

[0128] predImage[x][y] = recSamples[xP][yP], (xP, yP) = (x + predBv[0], y + predBv[1]), x = 0..curWidth-1, y = 0..curHeight-1
The corresponding parameters itmpLicParam[] and itmpFlmParam[] are derived. predImage is further post-processed using the linear prediction model with itmpLicParam[tmpIdx] and/or the filter operation with itmpFlmParam[tmpIdx] to generate the final prediction image predictionImage[][], which is used as the output.
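The block copy and the integer LIC post-processing can be sketched as below. The sketch assumes the BV is relative to the current block's top-left corner (x0, y0), indexes `rec` as rec[y][x], and clips the result to the valid sample range for the bit depth; these details, and the function names, are assumptions.

```python
from typing import List, Tuple

Block = List[List[int]]


def copy_block(rec: Block, x0: int, y0: int, w: int, h: int,
               bv: Tuple[int, int]) -> Block:
    """predImage for a w x h block at (x0, y0): samples displaced by bv."""
    return [[rec[y0 + y + bv[1]][x0 + x + bv[0]] for x in range(w)]
            for y in range(h)]


def apply_lic(block: Block, a: int, b: int, shift: int, bit_depth: int) -> Block:
    """y = ((a * x) >> shift) + b, clipped to [0, 2^bitDepth - 1]."""
    hi = (1 << bit_depth) - 1
    return [[min(max(((a * p) >> shift) + b, 0), hi) for p in row] for row in block]
```

For a synthetic picture with rec[y][x] = 10 * y + x, copying a 2x2 block at (4, 4) with BV (-4, -4) yields [[0, 1], [10, 11]], and applying a = 128, b = 5, shift = 6 doubles and offsets it to [[5, 7], [25, 27]].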

[0129] (ITMP and ITMP-SubPel) First, predBv is derived based on tmpIdx. From itmpSubPelParam[], the values tmpIsSubPel and tmpSubPelIdx are derived. These values indicate the direction and pixel precision to use. While copying the pixels starting from predBv, interpolation operations are performed to calculate and generate predictionImage[][], which is used as the output.
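The text does not specify the interpolation filter. As a stand-in, the sketch below uses bilinear interpolation at quarter-sample positions; a real codec would more likely use a longer separable filter, so treat this only as an illustration of fetching a block at a fractional BV. `rec` is indexed rec[y][x], and the BV `qbv` is given in quarter-sample units relative to the block origin (both assumptions).

```python
from typing import List, Tuple


def sample_qpel(rec: List[List[int]], qx: int, qy: int) -> int:
    """Bilinear sample at (qx/4, qy/4); qx, qy are quarter-sample coordinates."""
    ix, fx = qx >> 2, qx & 3       # integer part and quarter fraction (floor semantics)
    iy, fy = qy >> 2, qy & 3
    a, b = rec[iy][ix], rec[iy][ix + 1]
    c, d = rec[iy + 1][ix], rec[iy + 1][ix + 1]
    acc = ((4 - fx) * (4 - fy) * a + fx * (4 - fy) * b
           + (4 - fx) * fy * c + fx * fy * d)
    return (acc + 8) >> 4          # weights sum to 16; add half for rounding


def subpel_block(rec: List[List[int]], x0: int, y0: int, w: int, h: int,
                 qbv: Tuple[int, int]) -> List[List[int]]:
    """Prediction block whose BV qbv is expressed in quarter-sample units."""
    return [[sample_qpel(rec, 4 * (x0 + x) + qbv[0], 4 * (y0 + y) + qbv[1])
             for x in range(w)] for y in range(h)]
```

On a linear ramp rec[y][x] = 10 * y + x, sampling at (1.5, 1.25) returns 14, the exact ramp value.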

[0130] (ITMP-FUSION) The fusion parameters itmpFusParam[] are derived. Based on tmpIdx, the corresponding fusion mode parameters itmpFusParam[tmpIdx] are selected. itmpFusParam[tmpIdx].realNum indicates the number of BVs used for fusion, and itmpFusParam[tmpIdx].Idx indicates the position in bvCandList[] of the first of these BVs. The fusion mode uses itmpFusParam[tmpIdx].Idx and itmpFusParam[tmpIdx].realNum to derive a series of BVs: bvCandList[itmpFusParam[tmpIdx].Idx + N], where N = 0 .. itmpFusParam[tmpIdx].realNum - 1. The weights or equation parameters for fusion are stored in itmpFusParam[tmpIdx].weights, and the number of weights equals itmpFusParam[tmpIdx].realNum. These weights correspond to either the TMcost-based or the Wiener-filter-based fusion method. The Prediction Image Generation Unit 4715 generates a predImage for each BV, and these predImage results are fused using the specified weights or equations to produce the final predictionImage[][].
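The fusion step itself is a slice of bvCandList followed by a weighted average. This sketch covers only weight-based fusion (for example TMcost-type weights that sum to 1); the Wiener-filter equation form is not shown, and rounding to the nearest integer is an assumed convention. Function names are hypothetical.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def fusion_bvs(bv_cand_list: Sequence[T], idx: int, real_num: int) -> List[T]:
    """bvCandList[Idx + N] for N = 0 .. realNum - 1."""
    return list(bv_cand_list[idx:idx + real_num])


def fuse_predictions(preds: Sequence[List[List[int]]],
                     weights: Sequence[float]) -> List[List[int]]:
    """Weighted average of realNum prediction images, rounded to integers."""
    h, w = len(preds[0]), len(preds[0][0])
    return [[int(round(sum(wt * p[y][x] for wt, p in zip(weights, preds))))
             for x in range(w)] for y in range(h)]
```

For two one-row predictions [100, 200] and [200, 400] with weights 0.75 and 0.25, the fused row is [125, 250].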

[0131] The inverse quantization and inverse transform processing unit 311 performs inverse quantization on a quantization transform coefficient input from the prediction parameter derivation unit 320 to calculate a transform coefficient. This quantization transform coefficient is obtained in the coding process by applying a frequency transform such as a Discrete Cosine Transform (DCT) or a Discrete Sine Transform (DST) to the prediction error and quantizing the result. The inverse quantization and inverse transform processing unit 311 performs an inverse frequency transform such as an inverse DCT or an inverse DST on the derived transform coefficient to calculate a prediction error, and outputs the prediction error to the addition unit 312.

[0132] The addition unit 312 adds the prediction image of the block input from the intra prediction image generation unit 310 and the prediction error input from the inverse quantization and inverse transform processing unit 311 for each pixel and generates a decoded image of the block. The addition unit 312 stores the decoded image of the block in the reference picture memory 306 and outputs the image to the loop filter 305.

[0133] Configuration of video coding apparatus Next, a configuration of the video coding apparatus 11 according to the present embodiment is described. FIG. 10 is a block diagram illustrating a configuration of the video coding apparatus 11 according to the present embodiment. The video coding apparatus 11 is configured to include a prediction image generation unit 101, a subtraction unit 102, a transform and quantization processing unit 103, an inverse quantization and inverse transform processing unit 105, an addition unit 106, a loop filter 107, a prediction parameter memory (a prediction parameter storage unit, a frame memory) 108, a reference picture memory (a reference image storage unit, a frame memory) 109, a coding parameter determination unit 110, a parameter coding unit 111, a prediction parameter derivation unit 120, and an entropy coding unit 104.

[0134] The prediction image generation unit 101 generates a prediction image for each CU, which is a region obtained by splitting each picture of the image T. The operation of the prediction image generation unit 101 is the same as that of the intra prediction image generation unit 310 already described, and thus its description is omitted.

[0135] The subtraction unit 102 subtracts a pixel value of the prediction image of the block input from the prediction image generation unit 101 from a pixel value of the image T to generate a prediction error. The subtraction unit 102 outputs the prediction error to the transform and quantization processing unit 103.

[0136] The transform and quantization processing unit 103 calculates a transform coefficient by performing a frequency transform on the prediction error input from the subtraction unit 102, and derives a quantization transform coefficient by quantization. The transform and quantization processing unit 103 outputs the quantization transform coefficient to the entropy coding unit 104 and the inverse quantization and inverse transform processing unit 105.

[0137] The inverse quantization and inverse transform processing unit 105 is the same as the inverse quantization and inverse transform processing unit 311 (FIG. 4) in the video decoding apparatus 31, and descriptions thereof are omitted. The derived prediction error is output to the addition unit 106.

[0138] To the entropy coding unit 104, the quantization transform coefficient is input from the transform and quantization processing unit 103, and coding parameters are input from the parameter coding unit 111. The entropy coding unit 104 performs entropy coding on split information, the prediction parameters, the quantization transform coefficient, and the like to generate and output the coding stream Te.

[0139] The parameter coding unit 111 instructs the entropy coding unit 104 to encode the prediction parameters and quantization coefficients, derived from the prediction parameter derivation unit 120.

[0140] The prediction parameter derivation unit 120 derives syntax elements from the parameters input from the coding parameter determination unit 110. Some parts of the prediction parameter derivation unit 120 have the same structure as the prediction parameter derivation unit 320.

[0141] The addition unit 106 adds a pixel value of the prediction image of the block input from the prediction image generation unit 101 and the prediction error input from the inverse quantization and inverse transform processing unit 105 to each other for each pixel, and generates a decoded image. The addition unit 106 stores the generated decoded image in the reference picture memory 109.

[0142] The loop filter 107 applies a deblocking filter, an SAO, and an ALF to the decoded image generated by the addition unit 106. Note that the loop filter 107 need not necessarily include the above-described three types of filters, and may have a configuration of only the deblocking filter, for example.

[0143] The prediction parameter memory 108 stores the prediction parameters generated by the prediction parameter derivation unit 120 for each target picture and CU at a predetermined position. It may also store the transform coefficients generated by the transform and quantization processing unit 103.

[0144] The reference picture memory 109 stores the decoded image generated by the loop filter 107 for each target picture and CU at a predetermined position.

[0145] The coding parameter determination unit 110 selects one set among multiple sets of coding parameters. A coding parameter refers to the above-mentioned QT, BT, or TT split information, the prediction parameter, or a parameter to be coded, the parameter being generated in association therewith. The prediction image generation unit 101 generates the prediction image by using these coding parameters.

[0146] The coding parameter determination unit 110 calculates, for each of the multiple sets, an RD cost value indicating the magnitude of the amount of information and the coding error. The RD cost value is, for example, the sum of a code amount and the value obtained by multiplying a coefficient λ by a square error. The coding parameter determination unit 110 selects the set of coding parameters whose cost value is the minimum, and the entropy coding unit 104 outputs the selected set of coding parameters as the coding stream Te. The coding parameter determination unit 110 outputs the determined coding parameters to the parameter coding unit 111, the prediction parameter derivation unit 120, and the prediction image generation unit 101.
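The selection rule above is a minimum over J = R + λ·D, where R is the code amount and D the square error. A minimal sketch with hypothetical candidate tuples (name, rate in bits, squared error) is:

```python
from typing import Iterable, Tuple


def select_by_rd_cost(candidates: Iterable[Tuple[str, float, float]], lam: float) -> str:
    """Return the name of the parameter set minimising J = rate + lam * squared_error."""
    best = min(candidates, key=lambda c: c[1] + lam * c[2])
    return best[0]
```

With candidates ("a", 100, 50) and ("b", 80, 80), λ = 0.5 favours "b" (J = 120 against 125), while λ = 1 favours "a" (150 against 160); a larger λ weights distortion more heavily.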

[0147] Note that some units of the video coding apparatus 11 and the video decoding apparatus 31 in the above-described embodiment, for example, the entropy decoding unit 301, the parameter decoding unit 302, the loop filter 305, the intra prediction image generation unit 310, the inverse quantization and inverse transform processing unit 311, the addition unit 312, the prediction parameter derivation unit 320, the prediction image generation unit 101, the subtraction unit 102, the transform and quantization processing unit 103, the entropy coding unit 104, the inverse quantization and inverse transform processing unit 105, the loop filter 107, the coding parameter determination unit 110, the parameter coding unit 111, and the prediction parameter derivation unit 120, may be realized by a computer. In that case, this configuration may be realized by recording a program for realizing such control functions on a computer-readable recording medium and causing a computer system to read the program recorded on the recording medium for execution. Note that the “computer system” mentioned here refers to a computer system built into either the video coding apparatus 11 or the video decoding apparatus 31 and is assumed to include an OS and hardware components such as a peripheral apparatus. Furthermore, a “computer-readable recording medium” refers to a portable medium such as a flexible disk, a magneto-optical disk, a ROM, a CD-ROM, and the like, and a storage device such as a hard disk built into the computer system. 
Moreover, the “computer-readable recording medium” may include a medium that dynamically stores a program for a short period of time, such as a communication line used in a case that the program is transmitted over a network such as the Internet or over a communication line such as a telephone line, and may also include a medium that stores the program for a fixed period of time, such as a volatile memory included in the computer system functioning as a server or a client in such a case. Furthermore, the above-described program may realize only some of the above-described functions, or may realize the above-described functions in combination with a program already recorded in the computer system.

[0148] Furthermore, a part or all of the video coding apparatus 11 and the video decoding apparatus 31 in the embodiment described above may be realized as an integrated circuit such as a Large Scale Integration (LSI). Each function block of the video coding apparatus 11 and the video decoding apparatus 31 may be individually realized as a processor, or some or all of the function blocks may be integrated into a processor. The circuit integration technique is not limited to LSI, and the integrated circuits for the function blocks may be realized as dedicated circuits or a general-purpose processor. In a case that advances in semiconductor technology produce a circuit integration technology that replaces LSI, an integrated circuit based on that technology may be used.

[0149] The embodiment of the present disclosure has been described in detail above with reference to the drawings, but the specific configuration is not limited to the above embodiments, and various design changes can be made within a scope that does not depart from the gist of the present disclosure.

[0150] The embodiment of the present invention may be applied to a video decoding apparatus that decodes coded data of image data, and to a video coding apparatus that generates coded data from image data. The embodiment may also be applied to the data structure of coded data that is generated by the video coding apparatus and referenced by the video decoding apparatus.

[0151]
31 Image decoding apparatus
301 Entropy decoding unit
302 Parameter decoding unit
310 Prediction image generation unit
3104 Intra prediction unit
31046 DIMD prediction unit
310460 Reference sample derivation unit
310461 Gradient derivation unit
310462 Angular mode derivation unit
310463 Prediction image generation unit
310464 Non-angular mode derivation unit
31047 ITMP prediction unit
4701 Input data derivation unit
4711 Search range derivation unit
4712 Prediction image derivation unit
4713 Candidate list derivation unit
4714 Sub-mode parameter derivation unit
4715 Prediction image generation unit
47141 ITMP-LIC parameter derivation unit
47142 ITMP-FLM parameter derivation unit
47143 ITMP-Fusion parameter derivation unit
47144 ITMP-SubPel parameter derivation unit
311 Inverse quantization and inverse transform processing unit
312 Addition unit
11 Image coding apparatus
101 Prediction image generation unit
102 Subtraction unit
103 Transform and quantization processing unit
104 Entropy coding unit
105 Inverse quantization and inverse transform processing unit
107 Loop filter
110 Coding parameter determination unit
111 Parameter coding unit

<Cross Reference> This patent application claims priority to Japanese Patent Application No. 2024-225008 filed on December 20, 2024, the entire contents of which are hereby incorporated by reference.

Claims

1. A video decoding apparatus for generating a prediction image, the video decoding apparatus comprising an ITMP prediction unit, wherein the applicability of ITMP is controlled based on the size and shape of the current block.

2. The video decoding apparatus of Claim 1, wherein the size and number of template regions required by the ITMP prediction unit are derived from the size or shape of the current block.

3. The video decoding apparatus of Claim 1, wherein the size of the search area of the ITMP prediction unit and the related thresholds are derived from the size of the current block.

4. A video encoding apparatus comprising an ITMP prediction unit, wherein the stride for coarse and fine searches in the candidate BV list derivation operation is derived from the size of the current block.

5. A video encoding apparatus comprising an ITMP prediction unit, wherein the ITMP method includes multiple sub-modes, and whether each sub-mode is used for the current block is determined based on the size of the current block.

6. A video encoding apparatus comprising an ITMP prediction unit, wherein the ITMP prediction unit can employ different encoding techniques, and some key parameters and thresholds are derived from the size of the current block.
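Claims 1 to 6 state that ITMP applicability, the size and number of template regions, the search range, the coarse and fine search strides, and the use of each sub-mode are derived from the size or shape of the current block, but they give no concrete values. The following is a minimal sketch of one hypothetical way such a derivation could be organized. Every threshold, the rule for which template regions and sub-modes are kept, and the names `derive_itmp_config` and `ItmpConfig` are assumptions for illustration only, not values or rules from this application. The sub-mode names follow the LIC, FLM, Fusion, and SubPel parameter derivation units listed in [0151].

```python
from dataclasses import dataclass

# HYPOTHETICAL thresholds for illustration only; the claims give no values.
MIN_SIDE = 4        # smallest allowed block side (samples)
MAX_SIDE = 64       # largest allowed block side (samples)
MAX_ASPECT = 4      # largest allowed long-side / short-side ratio
SMALL_AREA = 256    # blocks with area <= this count as "small"
MAX_SEARCH = 128    # cap on the search range (samples)

SUB_MODES = ("LIC", "FLM", "Fusion", "SubPel")


@dataclass(frozen=True)
class ItmpConfig:
    enabled: bool
    template_lines: int = 0
    template_regions: tuple = ()
    search_range: int = 0
    coarse_stride: int = 0
    fine_stride: int = 0
    sub_modes: tuple = ()


def derive_itmp_config(width: int, height: int) -> ItmpConfig:
    """Derive a block-size-dependent ITMP configuration (illustrative sketch)."""
    short_side, long_side = min(width, height), max(width, height)
    # Claim 1 (assumed rule): disable ITMP for too-small, too-large,
    # or too-elongated blocks. The size check runs first so the
    # aspect-ratio division never sees a zero-sized side.
    if short_side < MIN_SIDE or long_side > MAX_SIDE:
        return ItmpConfig(enabled=False)
    if long_side // short_side > MAX_ASPECT:
        return ItmpConfig(enabled=False)

    small = width * height <= SMALL_AREA
    # Claim 2 (assumed rule): fewer template lines for large blocks, and
    # only the template along the longer side for elongated blocks.
    template_lines = 4 if small else 2
    if width >= 2 * height:
        regions = ("above",)
    elif height >= 2 * width:
        regions = ("left",)
    else:
        regions = ("above", "left")
    # Claim 3 (assumed rule): search range grows with block size, up to a cap.
    search_range = min(MAX_SEARCH, 4 * long_side)
    # Claim 4 (assumed rule): coarser first-pass stride for large blocks;
    # the fine pass is always evaluated at every full-sample position.
    coarse_stride = 2 if small else 4
    fine_stride = 1
    # Claim 5 (assumed rule): drop the Fusion and SubPel sub-modes on large blocks.
    sub_modes = SUB_MODES if small else ("LIC", "FLM")
    return ItmpConfig(True, template_lines, regions, search_range,
                      coarse_stride, fine_stride, sub_modes)


if __name__ == "__main__":
    print(derive_itmp_config(8, 8))
    print(derive_itmp_config(32, 16))
    print(derive_itmp_config(128, 8))
```

Collecting every size-dependent decision in one function reflects the idea behind the claims that these restrictions follow from the block dimensions alone; an encoder and a decoder that apply the same rule reach the same restrictions.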