Method for decoding image information, method for encoding image information, method for bitstream, and computer-readable storage medium

CIIP mode enhances video compression by deriving inter-frame and intra-frame prediction samples, addressing the high costs of high-resolution video transmission and storage through improved prediction and coding efficiency.

WO2026135171A1 · PCT designated stage · Publication Date: 2026-06-25 · LG ELECTRONICS INC

Patent Information

Authority / Receiving Office: WO · WO
Patent Type: Applications
Current Assignee / Owner: LG ELECTRONICS INC
Filing Date: 2025-12-16
Publication Date: 2026-06-25

AI Technical Summary

Technical Problem

The increasing demand for high-resolution, high-quality video has led to higher transmission and storage costs due to the increase in transmitted information or bits, necessitating improved video compression technologies.

Method used

The implementation of Combined Inter and Intra Prediction (CIIP) mode or modified CIIP mode to enhance prediction performance, coding efficiency, and data transmission efficiency by deriving prediction samples for current blocks using inter-frame and intra-frame prediction techniques, including luminance compensation and weighted sums of prediction samples.

Benefits of technology

This approach improves prediction performance, coding efficiency, and data transmission efficiency, reducing the costs associated with high-resolution video transmission and storage.

✦ Generated by Eureka AI based on patent content.

Figure KR2025021879_25062026_PF_FP_ABST

Abstract

A method for decoding image information, according to the present disclosure, comprises the steps of: acquiring image information including prediction information; deriving a prediction mode for the current block on the basis of the prediction information; and deriving prediction samples for the current block on the basis of the prediction mode. The deriving of prediction samples for the current block includes: deriving inter prediction samples for the current block on the basis of a reference block; deriving intra prediction samples for the current block on the basis of neighboring samples of the reference block; and deriving prediction samples for the current block on the basis of the inter prediction samples and the intra prediction samples.

Description

Method for decoding image information, method for encoding image information, method relating to a bitstream and a computer-readable storage medium

[0001] The present disclosure relates to a method for decoding image information, a method for encoding image information, a method for bitstreams, and a computer-readable storage medium.

[0002] Recently, the demand for high-resolution, high-quality video, such as HD (High Definition) and UHD (Ultra High Definition), has been increasing across various fields. As video data becomes higher in resolution and quality, the relative amount of information or bits transmitted increases compared to conventional video data. This increase in transmitted information or bits leads to higher transmission and storage costs.

[0003] Accordingly, high-efficiency video compression technology is required to effectively transmit, store, and play back high-resolution, high-quality video information.

[0004] The present disclosure aims to improve prediction performance by intra-frame prediction in CIIP (combined inter and intra prediction) mode or a modified CIIP mode.

[0005] The present disclosure aims to improve coding efficiency by CIIP mode or a modified CIIP mode.

[0006] The present disclosure aims to improve data transmission efficiency by CIIP mode or a modified CIIP mode.

[0007] The technical problems to be solved in this disclosure are not limited to those mentioned above, and other technical problems not mentioned will be clearly understood by those skilled in the art to which this disclosure belongs from the description below.

[0008] A method for decoding image information according to one aspect of the present disclosure comprises: acquiring the image information including prediction information; deriving a prediction mode for a current block based on the prediction information; and deriving prediction samples for the current block based on the prediction mode, wherein deriving prediction samples for the current block comprises deriving inter-frame prediction samples for the current block based on a reference block; deriving intra-frame prediction samples for the current block based on surrounding samples of the reference block; and deriving prediction samples for the current block based on the inter-frame prediction samples and the intra-frame prediction samples.

[0009] According to one aspect of the present disclosure, an apparatus for decoding image information comprises a memory and at least one processor connected to the memory, wherein the at least one processor acquires the image information including prediction information; derives a prediction mode for a current block based on the prediction information; and derives prediction samples for the current block based on the prediction mode, wherein the at least one processor derives inter-frame prediction samples for the current block based on a reference block; derives intra-frame prediction samples for the current block based on surrounding samples of the reference block; and derives prediction samples for the current block based on the inter-frame prediction samples and the intra-frame prediction samples.

[0010] In the above method or apparatus for decoding image information, the intra-frame prediction samples for the current block may be derived based on difference values between first intra-frame prediction samples derived based on the surrounding samples of the current block and second intra-frame prediction samples derived based on the surrounding samples of the reference block.
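
As a rough illustration of paragraph [0010], the sketch below uses a DC predictor (the rounded mean of the neighboring samples) for both intra predictions and takes their sample-wise difference. The DC choice, the function names, and the list-of-rows block layout are assumptions made for illustration; the disclosure does not tie this step to a particular intra mode, and how the difference is then used is left to the embodiment.

```python
def dc_predict(neighbors, width, height):
    """DC intra prediction: fill the block with the rounded mean of the neighbors."""
    dc = (sum(neighbors) + len(neighbors) // 2) // len(neighbors)
    return [[dc] * width for _ in range(height)]


def intra_difference(cur_neighbors, ref_neighbors, width, height):
    """Sample-wise difference between the first and second intra predictions."""
    first = dc_predict(cur_neighbors, width, height)   # from neighbors of the current block
    second = dc_predict(ref_neighbors, width, height)  # from neighbors of the reference block
    return [[f - s for f, s in zip(row_f, row_s)]
            for row_f, row_s in zip(first, second)]


# Hypothetical neighbor values: the current block's surroundings are brighter by 10.
diff = intra_difference([100, 102, 104, 106], [90, 92, 94, 96], width=2, height=2)
```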

[0011] In the above method or apparatus for decoding image information, the prediction information may include at least one of information regarding a first intra-frame prediction mode for generating the first intra-frame prediction samples and information regarding a second intra-frame prediction mode for generating the second intra-frame prediction samples.

[0012] In the above method or apparatus for decoding image information, the first intra-frame prediction mode may be different from the second intra-frame prediction mode.

[0013] In the above method or apparatus for decoding image information, the first intra-frame prediction mode and the second intra-frame prediction mode may each include decoder-side intra mode derivation (DIMD).

[0014] In the above method or apparatus for decoding image information, the intra-frame prediction samples for the current block may be derived based on difference values between third intra-frame prediction samples derived based on the surrounding samples of the reference block and fourth intra-frame prediction samples derived based on the third intra-frame prediction samples to which luminance compensation has been applied.

[0015] In the above method or apparatus for decoding image information, the luminance compensation parameter for the luminance compensation may be determined such that the difference between the surrounding samples of the current block and the surrounding samples of the reference block is minimized.
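
One common way to minimize such a difference is a least-squares fit of a linear model, cur ≈ a · ref + b, over the two sets of surrounding samples. The sketch below is written under that assumption. The linear model, the floating-point arithmetic, and every function name are illustrative choices, since the disclosure only states that the difference is minimized.

```python
def fit_luminance_compensation(ref_neighbors, cur_neighbors):
    """Least-squares scale a and offset b so that a * ref + b approximates cur."""
    n = len(ref_neighbors)
    sum_x = sum(ref_neighbors)
    sum_y = sum(cur_neighbors)
    sum_xx = sum(x * x for x in ref_neighbors)
    sum_xy = sum(x * y for x, y in zip(ref_neighbors, cur_neighbors))
    denom = n * sum_xx - sum_x * sum_x
    if denom == 0:
        # Flat reference neighborhood: fall back to a pure offset.
        return 1.0, (sum_y - sum_x) / n
    a = (n * sum_xy - sum_x * sum_y) / denom
    b = (sum_y - a * sum_x) / n
    return a, b


def apply_luminance_compensation(samples, a, b, bit_depth=8):
    """Apply a * s + b to each sample, rounding and clipping to the valid range."""
    max_val = (1 << bit_depth) - 1
    return [min(max_val, max(0, round(a * s + b))) for s in samples]


# Hypothetical neighbors related exactly by cur = 2 * ref + 10.
a, b = fit_luminance_compensation([50, 60, 70, 80], [110, 130, 150, 170])
compensated = apply_luminance_compensation([50, 200], a, b)
```

A practical codec would typically use integer arithmetic with a fixed-point scale; floating point is used here only to keep the fit readable.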

[0016] In the above method or apparatus for decoding image information, the intra-frame prediction samples for the current block may be derived based on difference values between fifth intra-frame prediction samples derived based on the surrounding samples of the reference block and sixth intra-frame prediction samples derived based on the surrounding samples of the reference block to which luminance compensation is applied.

[0017] In the above method or apparatus for decoding image information, the luminance compensation parameter for the luminance compensation may be determined such that the difference between the surrounding samples of the current block and the surrounding samples of the reference block is minimized.

[0018] In the above method or apparatus for decoding image information, the prediction information may include at least one of information regarding a fifth intra-frame prediction mode for generating the fifth intra-frame prediction samples and information regarding a sixth intra-frame prediction mode for generating the sixth intra-frame prediction samples.

[0019] In the above method or apparatus for decoding image information, the fifth intra-frame prediction mode may be the same as the sixth intra-frame prediction mode.

[0020] In the above method or apparatus for decoding image information, the prediction samples for the current block may be derived based on a weighted sum of the inter-frame prediction samples and the intra-frame prediction samples.

[0021] In the above method or apparatus for decoding image information, the weight of the weighted sum of the inter-frame prediction samples and the intra-frame prediction samples may vary depending on the region within the current block.

[0022] In the above method or apparatus for decoding image information, the weight of the weighted sum of the inter-frame prediction samples and the intra-frame prediction samples may be derived based on whether intra-frame prediction is applied to the surrounding blocks of the current block.
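
The weighted sum in paragraphs [0020] to [0022] can be sketched with the integer weighting that CIIP uses in VVC, where the intra weight grows with the number of intra-coded neighbors (above and left). The weight values 1 to 3 and the normalization by 4 follow that VVC convention and are an assumption here, since the disclosure does not fix specific values. The function names are hypothetical.

```python
def ciip_intra_weight(above_is_intra, left_is_intra):
    """Intra weight out of 4: 1, 2, or 3 depending on how many neighbors are intra-coded."""
    return 1 + int(above_is_intra) + int(left_is_intra)


def blend(inter, intra, wt):
    """Per-sample weighted sum ((4 - wt) * inter + wt * intra + 2) >> 2."""
    return [[((4 - wt) * p_inter + wt * p_intra + 2) >> 2
             for p_inter, p_intra in zip(row_inter, row_intra)]
            for row_inter, row_intra in zip(inter, intra)]


# Hypothetical 2x2 block: only the above neighbor is intra-coded, so wt is 2.
wt = ciip_intra_weight(above_is_intra=True, left_is_intra=False)
pred = blend([[100, 104], [108, 112]], [[120, 120], [120, 120]], wt)
```

For the region-dependent weighting of paragraph [0021], `wt` could instead be looked up per sample or per sub-region of the block; that variant is not shown.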

[0023] According to one aspect of the present disclosure, a method for encoding image information comprises determining a prediction mode for a current block; generating prediction samples for the current block based on the prediction mode; generating prediction information based on the prediction mode; and encoding image information including the prediction information, wherein generating prediction samples for the current block may include generating inter-frame prediction samples for the current block based on a reference block; generating intra-frame prediction samples for the current block based on surrounding samples of the reference block; and generating prediction samples for the current block based on the inter-frame prediction samples and the intra-frame prediction samples.

[0024] According to one aspect of the present disclosure, an apparatus for encoding image information comprises a memory and at least one processor connected to the memory, wherein the at least one processor determines a prediction mode for a current block; generates prediction samples for the current block based on the prediction mode; generates prediction information based on the prediction mode; and encodes image information including the prediction information, wherein the at least one processor generates inter-frame prediction samples for the current block based on a reference block; generates intra-frame prediction samples for the current block based on surrounding samples of the reference block; and generates prediction samples for the current block based on the inter-frame prediction samples and the intra-frame prediction samples.

[0025] A method for a bitstream according to one aspect of the present disclosure comprises generating a bitstream; transmitting data including said bitstream, wherein the bitstream is generated based on determining a prediction mode for a current block, generating prediction samples for the current block based on said prediction mode, generating prediction information based on said prediction mode, and encoding image information including said prediction information, wherein generating prediction samples for the current block comprises generating inter-frame prediction samples for the current block based on a reference block; generating intra-frame prediction samples for the current block based on surrounding samples of said reference block; and generating prediction samples for the current block based on said inter-frame prediction samples and said intra-frame prediction samples.

[0026] According to one aspect of the present disclosure, an apparatus for a bitstream comprises: at least one processor for generating a bitstream; and a transmission unit for transmitting data including said bitstream, wherein the at least one processor determines a prediction mode for a current block, generates prediction samples for the current block based on said prediction mode, generates prediction information based on said prediction mode, and generates said bitstream based on encoding image information including said prediction information, and the at least one processor generates inter-frame prediction samples for the current block based on a reference block; generates intra-frame prediction samples for the current block based on surrounding samples of said reference block; and generates prediction samples for the current block based on said inter-frame prediction samples and said intra-frame prediction samples.

[0027] A computer-readable medium according to one aspect of the present disclosure stores a bitstream generated based on determining a prediction mode for a current block, generating prediction samples for the current block based on the prediction mode, generating prediction information based on the prediction mode, and encoding image information including the prediction information. Generating prediction samples for the current block includes generating inter-frame prediction samples for the current block based on a reference block; generating intra-frame prediction samples for the current block based on surrounding samples of the reference block; and generating prediction samples for the current block based on the inter-frame prediction samples and the intra-frame prediction samples.

[0028] The features briefly summarized above regarding the present disclosure are merely exemplary aspects of the detailed description of the present disclosure that follows and do not limit the scope of the present disclosure.

[0029] According to the present disclosure, prediction performance by intra-frame prediction is improved in CIIP (combined inter and intra prediction) mode or modified CIIP mode.

[0030] According to the present disclosure, coding efficiency is improved by CIIP mode or modified CIIP mode.

[0031] According to the present disclosure, data transmission efficiency is improved by CIIP mode or modified CIIP mode.

[0032] The effects obtainable from the present disclosure are not limited to those mentioned above, and other unmentioned effects will be clearly understood by those skilled in the art to which the present disclosure belongs from the description below.

[0033] FIG. 1 is a schematic diagram illustrating a video coding system to which an embodiment according to the present disclosure can be applied.

[0034] FIG. 2 is a schematic diagram showing an encoding device to which an embodiment according to the present disclosure can be applied.

[0035] FIG. 3 is a schematic diagram showing a decoding device to which an embodiment according to the present disclosure can be applied.

[0036] FIG. 4 is a diagram showing the search area of intra-template matching prediction in an embodiment according to the present disclosure.

[0037] FIGS. 5 and 6 are drawings for illustrating a DIMD mode that can be applied to an embodiment according to the present disclosure.

[0038] FIG. 7 is a diagram illustrating a method for decoding image information according to one embodiment of the present disclosure.

[0039] FIG. 8 is a diagram illustrating an example of a method for deriving prediction samples according to one embodiment of the present disclosure.

[0040] FIG. 9 is a diagram illustrating an example of a method for deriving prediction samples according to one embodiment of the present disclosure.

[0041] FIG. 10 is a diagram illustrating an example of a method for deriving prediction samples according to one embodiment of the present disclosure.

[0042] FIG. 11 is a drawing illustrating a method for encoding image information according to one embodiment of the present disclosure.

[0043] FIG. 12 is a drawing illustrating an exemplary content streaming system to which an embodiment according to the present disclosure can be applied.

[0044] Hereinafter, embodiments of the present disclosure are described in detail with reference to the attached drawings so that those skilled in the art can easily implement them. However, the present disclosure may be embodied in various different forms and is not limited to the embodiments described herein.

[0045] In describing the embodiments of the present disclosure, detailed descriptions of known configurations or functions are omitted if it is determined that such descriptions could obscure the essence of the present disclosure. Additionally, parts of the drawings unrelated to the description of the present disclosure have been omitted, and similar parts are denoted by similar reference numerals.

[0046] In the present disclosure, when a component is described as being "connected," "combined," or "joined" with another component, this may include not only a direct connection but also an indirect connection in which another component exists in between. Furthermore, when a component is described as "comprising" or "having" another component, this means that, unless specifically stated otherwise, it does not exclude the other component but may include an additional component.

[0047] In the present disclosure, terms such as first, second, etc. are used solely for the purpose of distinguishing one component from another and do not limit the order or importance of the components unless specifically stated otherwise. Accordingly, within the scope of the present disclosure, a first component in one embodiment may be referred to as a second component in another embodiment, and likewise, a second component in one embodiment may be referred to as a first component in another embodiment.

[0048] In this disclosure, distinct components are intended to clearly describe their respective features and do not imply that the components are separate. That is, multiple components may be integrated to form a single hardware or software unit, or a single component may be distributed to form multiple hardware or software units. Accordingly, such integrated or distributed embodiments are included within the scope of this disclosure, unless otherwise noted.

[0049] In the present disclosure, the components described in various embodiments do not necessarily mean essential components, and some may be optional components. Accordingly, embodiments consisting of a subset of the components described in one embodiment are also included within the scope of the present disclosure. Furthermore, embodiments including additional components in addition to the components described in various embodiments are also included within the scope of the present disclosure.

[0050] The present disclosure relates to the encoding and decoding of images. For example, the methods and embodiments disclosed in this document may be applied to methods disclosed in the VVC (versatile video coding) standard, EVC (essential video coding) standard, AV1 (AOMedia Video 1) standard, AVS2 (2nd generation of audio video coding standard) or next-generation video / image coding standards (e.g., H.267 or H.268).

[0051] The present disclosure presents various embodiments relating to video / image coding, and unless otherwise stated, said embodiments may be performed in combination with one another.

[0052] Unless newly defined in this disclosure, the terms used herein may have the ordinary meanings commonly used in the technical field to which this disclosure belongs.

[0053] In this disclosure, "video" may refer to a set of images over time. In this disclosure, "picture" generally refers to a unit representing a single image at a specific time, and a slice / tile is a unit that constitutes a part of a picture in coding. A slice / tile may include one or more coding tree units (CTUs). A picture may be composed of one or more slices / tiles. A picture may be composed of one or more tile groups. A tile group may include one or more tiles. A brick may represent a rectangular area of rows of CTUs within a tile in a picture. In this document, tile groups and slices may be used interchangeably. For example, in this document, a tile group / tile group header may be referred to as a slice / slice header.

[0054] In the present disclosure, "pixel" or "pel" may refer to the smallest unit constituting a picture (or image). Additionally, "sample" may be used as a term corresponding to pixel. A sample may generally represent a pixel or a pixel value, may represent only the pixel / pixel value of the luminance component, or may represent only the pixel / pixel value of the chroma component.

[0055] In this disclosure, "unit" may represent a basic unit of image processing. A unit may include at least one of a specific area of a picture and information related to that area. A unit may include one luminance block and two chroma (e.g., Cb, Cr) blocks. Depending on the case, the term "unit" may be used interchangeably with terms such as "block" or "area." In general, an MxN block may include samples (or sample arrays) or a set (or array) of transform coefficients consisting of M columns and N rows.

[0056] In the present disclosure, "current block" may mean one of "current coding block," "current coding unit," "block to be encoded," "block to be decoded," or "block to be processed." When prediction is performed, "current block" may mean "current prediction block" or "block to be predicted." When transformation (inverse transformation) / quantization (inverse quantization) is performed, "current block" may mean "current transformation block" or "block to be transformed." When filtering is performed, "current block" may mean "block to be filtered."

[0057] In the present disclosure, "current block" may mean a block comprising both a luma component block and a chroma component block, or the "luma block of the current block," unless explicitly stated as a chroma block. The luma component block of the current block may be expressed by including an explicit description of a luma component block, such as "luma block" or "current luma block." Additionally, the chroma component block of the current block may be expressed by including an explicit description of a chroma component block, such as "chroma block" or "current chroma block."

[0058] In the present disclosure, " / " and "," may be interpreted as "and / or." For example, "A / B" and "A, B" may be interpreted as "A and / or B." Additionally, "A / B / C" and "A, B, C" may mean "at least one of A, B and / or C."

[0059] In the present disclosure, "or" may be interpreted as "and / or". For example, "A or B" may mean 1) "A" only, 2) "B" only, or 3) "A and B". Alternatively, in the present disclosure, "or" may mean "additionally or alternatively".

[0060] FIG. 1 is a schematic diagram illustrating a video / image coding system to which an embodiment according to the present disclosure can be applied.

[0061] Referring to FIG. 1, a video / image coding system may include a first device (source device) and a second device (receiving device). The source device may transmit encoded video / image or data in the form of a file or streaming to the receiving device via a digital storage medium or a network.

[0062] The source device may include a video source, an encoding device, and a transmitter. The receiving device may include a receiver, a decoding device, and a renderer. The encoding device may be called a video / image encoding device, and the decoding device may be called a video / image decoding device. The transmitter may be included in the encoding device. The receiver may be included in the decoding device. The renderer may include a display unit, and the display unit may be composed of a separate device or an external component.

[0063] A video source may acquire video / images through processes such as video / image capture, synthesis, or generation. The video source may include a video / image capture device and / or a video / image generation device. The video / image capture device may include, for example, one or more cameras, a video / image archive containing previously captured video / images, etc. The video / image generation device may include, for example, a computer, a tablet, and a smartphone, etc., and may generate video / images (electronically). For example, virtual video / images may be generated through a computer, etc., in which case the video / image capture process may be replaced by a process in which related data is generated.

[0064] The encoding device can encode input video / images. The encoding device can perform a series of procedures, such as prediction, transformation, and quantization, for compression and coding efficiency. The encoded data (encoded video / image information) can be output in the form of a bitstream.

[0065] The transmitter can transmit encoded video / image information or data output in the form of a bitstream to the receiver of the receiving device via a digital storage medium or a network in the form of a file or streaming. The digital storage medium may include various storage media such as USB, SD, CD, DVD, Blu-ray, HDD, and SSD. The transmitter may include elements for creating a media file through a predetermined file format and elements for transmission via a broadcasting / communication network. The receiver can receive / extract the bitstream and transmit it to the decoding device.

[0066] The decoding device can decode video / images by performing a series of procedures such as inverse quantization, inverse transform, and prediction corresponding to the operation of the encoding device.

[0067] The renderer can render the decoded video / image. The rendered video / image can be displayed through the display unit.

[0068] FIG. 2 is a schematic diagram illustrating an encoding device to which an embodiment according to the present disclosure can be applied.

[0069] Referring to FIG. 2, the encoding device (200) may be configured to include an image partitioner (210), a predictor (220), a residual processor (230), an entropy encoder (240), an adder (250), a filter (260), and a memory (270). The predictor (220) may include an inter-predictor (221) and an intra-predictor (222). The residual processor (230) may include a transformer (232), a quantizer (233), a dequantizer (234), and an inverse transformer (235). The residual processor (230) may further include a subtractor (231). The adder (250) may be referred to as a reconstructor or a reconstructed block generator. The above-described image partitioner (210), predictor (220), residual processor (230), entropy encoder (240), adder (250), and filter (260) may be configured by one or more hardware components (e.g., an encoder chipset or processor) according to the embodiment. Additionally, the memory (270) may include a DPB (Decoded Picture Buffer) and may be configured by a digital storage medium. The hardware component may further include the memory (270) as an internal / external component.

[0070] The image partitioner (210) can divide an input image (or picture, frame) input to the encoding device (200) into one or more processing units. For example, the processing unit may be called a coding unit (CU). In this case, a coding unit may be recursively divided from a coding tree unit (CTU) or a largest coding unit (LCU) according to a QTBTTT (quad-tree binary-tree ternary-tree) structure. For example, a single coding unit may be divided into multiple coding units of a deeper depth based on a quad-tree structure, a binary-tree structure, and / or a ternary-tree structure. For example, a quad-tree structure may be applied first, and a binary-tree structure and / or a ternary-tree structure may be applied later. Alternatively, a binary-tree structure may be applied first. A coding procedure according to the present disclosure may be performed based on the final coding unit that is no longer divided. In this case, based on coding efficiency according to image characteristics, the largest coding unit may be used directly as the final coding unit, or, if necessary, the coding unit may be recursively divided into coding units of a deeper depth so that a coding unit of the optimal size is used as the final coding unit. Here, the coding procedure may include procedures such as prediction, transformation, and reconstruction described later. As another example, the processing unit may further include a prediction unit (PU) or a transform unit (TU). The prediction unit and the transform unit may each be divided or partitioned from the final coding unit. The prediction unit may be a unit of sample prediction, and the transform unit may be a unit for deriving transform coefficients and / or a unit for deriving a residual signal from transform coefficients.
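
The recursive division described above can be sketched as follows. The split decision is supplied by a hypothetical `decide` callback (a real encoder would typically choose by rate-distortion cost), and the split geometry (four quadrants, two halves, or a 1/4, 1/2, 1/4 ternary split) follows the usual VVC convention, which is an assumption here rather than something this paragraph specifies.

```python
def partition(x, y, w, h, decide):
    """Recursively split a block; returns the list of leaf blocks (x, y, w, h).

    `decide` is a hypothetical callback returning one of
    "none", "quad", "bin_h", "bin_v", "tern_h", "tern_v".
    """
    mode = decide(x, y, w, h)
    if mode == "quad":
        hw, hh = w // 2, h // 2
        parts = [(x, y, hw, hh), (x + hw, y, hw, hh),
                 (x, y + hh, hw, hh), (x + hw, y + hh, hw, hh)]
    elif mode == "bin_h":   # horizontal split line: top and bottom halves
        parts = [(x, y, w, h // 2), (x, y + h // 2, w, h // 2)]
    elif mode == "bin_v":   # vertical split line: left and right halves
        parts = [(x, y, w // 2, h), (x + w // 2, y, w // 2, h)]
    elif mode == "tern_h":  # 1/4, 1/2, 1/4 of the height
        q = h // 4
        parts = [(x, y, w, q), (x, y + q, w, 2 * q), (x, y + 3 * q, w, q)]
    elif mode == "tern_v":  # 1/4, 1/2, 1/4 of the width
        q = w // 4
        parts = [(x, y, q, h), (x + q, y, 2 * q, h), (x + 3 * q, y, q, h)]
    else:
        return [(x, y, w, h)]  # final coding unit, not divided further
    leaves = []
    for part in parts:
        leaves.extend(partition(*part, decide))
    return leaves


def example_decide(x, y, w, h):
    """Toy rule: quad-split the 64x64 root, then binary-split only its top-left child."""
    if w == 64:
        return "quad"
    if (x, y, w, h) == (0, 0, 32, 32):
        return "bin_v"
    return "none"


leaves = partition(0, 0, 64, 64, example_decide)
```

The toy rule applies the quad-tree split first and a binary split afterwards, matching the ordering given as an example in the text.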

[0071] The term "unit" may be used interchangeably with terms such as "block" or "area" depending on the context. In general, an MxN block may represent a set of samples or transform coefficients consisting of M columns and N rows. A sample can generally represent a pixel or a pixel value, and may represent only the pixel / pixel value of the luminance component or only the pixel / pixel value of the chroma component. "Sample" may be used as a term corresponding to a pixel or pel of a single picture (or image).

[0072] The encoding device (200) can generate a residual signal (residual block, residual sample array) by subtracting a prediction signal (predicted block, prediction sample array) output from the inter-predictor (221) or the intra-predictor (222) from an input image signal (original block, original sample array), and the generated residual signal is transmitted to the transformer (232). In this case, as illustrated, the unit that subtracts the prediction signal (predicted block, prediction sample array) from the input image signal (original block, original sample array) within the encoding device (200) may be called a subtractor (231). The predictor (220) can perform prediction for a block to be processed (hereinafter, the current block) and generate a predicted block containing prediction samples for the current block. The predictor (220) can determine whether intra prediction or inter prediction is applied in units of the current block or CU. The predictor (220) can generate various information regarding prediction, such as prediction mode information, as described below in the description of each prediction mode, and transmit it to the entropy encoder (240). The information regarding prediction can be encoded in the entropy encoder (240) and output in the form of a bitstream.
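
The subtraction that produces the residual, and the matching addition performed by the reconstructor of FIG. 2, can be sketched as below. Transform and quantization are deliberately omitted, so the round trip here is lossless. The function names and sample values are illustrative assumptions.

```python
def residual_block(original, predicted):
    """Residual = original - prediction, sample by sample (the subtractor's role)."""
    return [[o - p for o, p in zip(row_o, row_p)]
            for row_o, row_p in zip(original, predicted)]


def reconstruct_block(predicted, residual, bit_depth=8):
    """Reconstruction = prediction + residual, clipped to the sample range."""
    max_val = (1 << bit_depth) - 1
    return [[min(max_val, max(0, p + r)) for p, r in zip(row_p, row_r)]
            for row_p, row_r in zip(predicted, residual)]


# Hypothetical 2x2 original block and its prediction.
original = [[120, 118], [119, 121]]
predicted = [[115, 121], [119, 125]]
residual = residual_block(original, predicted)
reconstructed = reconstruct_block(predicted, residual)
```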

[0073] The intra-predictor (222) can predict the current block by referring to samples within the current picture. The referenced samples may be located near the current block or away from it, depending on the prediction mode. In intra prediction, the prediction modes may include a plurality of non-directional modes and a plurality of directional modes. The non-directional modes may include, for example, a DC mode and a Planar mode. The directional modes may include, for example, 33 directional prediction modes or 65 directional prediction modes, depending on the granularity of the prediction direction. However, this is merely an example, and depending on the settings, more or fewer directional prediction modes may be used. The intra-predictor (222) may also determine the prediction mode applied to the current block by using the prediction mode applied to neighboring blocks.
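
To make the distinction between non-directional and directional modes concrete, the sketch below implements a simplified DC mode and the two axis-aligned directional modes. Planar and the angular modes that need interpolation are omitted. The mode names, the function signature, and the plain averaging over both neighbor sets in DC mode are simplifying assumptions, not the behavior of any specific standard.

```python
def intra_predict(mode, top, left):
    """Predict a len(left) x len(top) block from reconstructed neighbors.

    top  -- samples of the row directly above the block (one per column)
    left -- samples of the column directly left of the block (one per row)
    """
    width, height = len(top), len(left)
    if mode == "dc":          # non-directional: rounded mean of all neighbors
        total = sum(top) + sum(left)
        count = width + height
        dc = (total + count // 2) // count
        return [[dc] * width for _ in range(height)]
    if mode == "vertical":    # directional: copy the row above downwards
        return [list(top) for _ in range(height)]
    if mode == "horizontal":  # directional: copy the left column rightwards
        return [[left[y]] * width for y in range(height)]
    raise ValueError(f"unsupported mode: {mode}")


# Hypothetical neighbors of a 2x2 block.
dc_block = intra_predict("dc", [10, 20], [30, 40])
vertical_block = intra_predict("vertical", [10, 20], [30, 40])
horizontal_block = intra_predict("horizontal", [10, 20], [30, 40])
```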

[0074] The inter prediction unit (221) can derive a predicted block for the current block based on a reference block (reference block) identified by a motion vector on a reference picture. At this time, to reduce the amount of motion information transmitted in the inter prediction mode, motion information can be predicted in blocks, sub-blocks, or samples based on the correlation of motion information between neighboring blocks and the current block. The motion information may include a motion vector and a reference picture index. The motion information may further include information on the inter prediction direction (L0 prediction, L1 prediction, Bi prediction, etc.). In the case of inter prediction, neighboring blocks may include spatial neighboring blocks existing within the current picture and temporal neighboring blocks existing in the reference picture. The reference picture containing the reference blocks and the reference picture containing the temporal neighboring blocks may be the same or different from each other. The temporal neighboring blocks may be referred to by names such as collocated reference block, collocated CU (colCU), etc. A reference picture containing the aforementioned temporal surrounding blocks may be called a collocated picture (colPic). For example, the inter prediction unit (221) may construct a list of motion information candidates based on surrounding blocks and generate information indicating which candidate is used to derive the motion vector and / or reference picture index of the current block. Inter prediction may be performed based on various prediction modes, for example, in the case of skip mode and merge mode, the inter prediction unit (221) may use the motion information of surrounding blocks as motion information of the current block. 
In the case of skip mode, unlike merge mode, a residual signal may not be transmitted. In the motion vector prediction (MVP) mode, the motion vectors of surrounding blocks are used as motion vector predictors, and the motion vector of the current block can be indicated by signaling a motion vector difference.

[0075] The prediction unit (220) may generate a prediction signal based on various prediction methods and / or prediction techniques described below. For example, the prediction unit (220) may apply intra prediction or inter prediction for the prediction of the current block, and may also apply intra prediction and inter prediction simultaneously. A prediction method that applies intra prediction and inter prediction simultaneously for the prediction of the current block may be called combined inter and intra prediction (CIIP). Additionally, the prediction unit (220) may perform prediction of the block based on an intra block copy (IBC) prediction mode or a palette mode. The IBC prediction mode or palette mode may be used for image / video coding of content such as games, for example, screen content coding (SCC). IBC basically performs prediction within the current picture, but it may be performed similarly to inter prediction in that it derives a reference block within the current picture. That is, IBC may use at least one of the inter prediction techniques described in this document. Palette mode can be viewed as an example of intra coding or intra prediction. When palette mode is applied, sample values within a picture can be signaled based on information regarding palette tables and palette indices.
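As a minimal illustrative sketch, the simultaneous application of inter prediction and intra prediction may be expressed as a sample-wise weighted sum. The integer weighting shown here (weights summing to 4, with a rounding offset of 2) follows the form commonly used in VVC-style CIIP and is an assumption, not the weighting defined by the present disclosure; the function name `ciip_blend` and the parameter `wt` are hypothetical.

```python
def ciip_blend(inter_pred, intra_pred, wt):
    """Blend inter and intra prediction samples (illustrative only).

    wt is the intra weight in {1, 2, 3}; the inter weight is 4 - wt.
    This weighting is an assumed, VVC-style example.
    """
    if wt not in (1, 2, 3):
        raise ValueError("wt must be 1, 2, or 3")
    return [
        [((4 - wt) * p_inter + wt * p_intra + 2) >> 2
         for p_inter, p_intra in zip(row_inter, row_intra)]
        for row_inter, row_intra in zip(inter_pred, intra_pred)
    ]


inter_pred = [[100, 104]]
intra_pred = [[120, 96]]
print(ciip_blend(inter_pred, intra_pred, wt=2))  # [[110, 100]]
```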

[0076] The prediction signal generated through the prediction unit (220) can be used to generate a restoration signal or to generate a residual signal. The subtraction unit (231) can generate a residual signal (residual block, residual sample array) by subtracting the prediction signal (predicted block, prediction sample array) output from the prediction unit (220) from the input image signal (original block, original sample array). The generated residual signal can be transmitted to the transformation unit (232).
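The operation of the subtraction unit (231) may be sketched as a sample-by-sample difference between the original block and the predicted block. The function name `residual_block` and the sample values are hypothetical and serve only as an example.

```python
def residual_block(original, predicted):
    """Return original - predicted for each sample position."""
    return [
        [o - p for o, p in zip(orig_row, pred_row)]
        for orig_row, pred_row in zip(original, predicted)
    ]


original = [[52, 55], [61, 59]]
predicted = [[50, 50], [60, 60]]
print(residual_block(original, predicted))  # [[2, 5], [1, -1]]
```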

[0077] The transformation unit (232) can generate transform coefficients by applying a transformation technique to a residual signal. For example, the transformation technique may include at least one of a Discrete Cosine Transform (DCT), a Discrete Sine Transform (DST), a Karhunen-Loeve Transform (KLT), a Graph-Based Transform (GBT), or a Conditionally Non-linear Transform (CNT). Here, GBT refers to a transformation obtained from a graph when the relationship information between pixels is represented as a graph. CNT refers to a transformation obtained based on a prediction signal generated using all previously reconstructed pixels. The transformation process may be applied to a block of pixels of the same size in a square, or to a block of variable size that is not square.
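As one possible illustration of a transformation technique, the following sketch applies an orthonormal floating-point DCT-II separably to the rows and columns of a residual block. Practical codecs typically use integer approximations; the floating-point form and the function names `dct2_1d` and `dct2_2d` are assumptions made for clarity.

```python
import math


def dct2_1d(x):
    """Orthonormal 1-D DCT-II of a list of numbers."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                for i in range(n))
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        out.append(scale * s)
    return out


def dct2_2d(block):
    """Separable 2-D DCT-II: transform rows, then columns.

    Works for square and non-square blocks.
    """
    rows = [dct2_1d(row) for row in block]
    cols = [dct2_1d(list(col)) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]


# A flat 4x4 block concentrates all energy in the DC coefficient.
coeffs = dct2_2d([[10] * 4 for _ in range(4)])
print(round(coeffs[0][0], 6))  # 40.0
```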

[0078] The quantization unit (233) can quantize the transformation coefficients and transmit them to the entropy encoding unit (240). The entropy encoding unit (240) can encode the quantized signal (information regarding the quantized transformation coefficients) and output it as a bitstream. The information regarding the quantized transformation coefficients may be called residual information. The quantization unit (233) can rearrange the block-shaped quantized transformation coefficients into a one-dimensional vector form based on the coefficient scan order, and can also generate information regarding the quantized transformation coefficients based on the one-dimensional vector-shaped quantized transformation coefficients.
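The quantization and rearrangement described above may be sketched as follows. The uniform quantizer with a single step size and the simple diagonal scan order are assumptions chosen for illustration; the actual quantizer and coefficient scan order depend on the codec configuration. The names `quantize`, `diagonal_scan_order`, and `scan_to_vector` are hypothetical.

```python
def quantize(coeffs, step):
    """Uniform quantization with rounding to nearest (sign-symmetric)."""
    def q(c):
        sign = -1 if c < 0 else 1
        return sign * int(abs(c) / step + 0.5)
    return [[q(c) for c in row] for row in coeffs]


def diagonal_scan_order(height, width):
    """Hypothetical scan: visit anti-diagonals in order of increasing x + y."""
    positions = [(y, x) for y in range(height) for x in range(width)]
    return sorted(positions, key=lambda p: (p[0] + p[1], p[1]))


def scan_to_vector(block):
    """Rearrange a 2-D block of levels into a 1-D vector."""
    order = diagonal_scan_order(len(block), len(block[0]))
    return [block[y][x] for y, x in order]


levels = quantize([[40.0, -9.0], [6.0, 1.0]], step=4)
print(levels)                  # [[10, -2], [2, 0]]
print(scan_to_vector(levels))  # [10, 2, -2, 0]
```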

[0079] The entropy encoding unit (240) can perform various encoding methods such as, for example, exponential Golomb, CAVLC (context-adaptive variable length coding), CABAC (context-adaptive binary arithmetic coding), etc. The entropy encoding unit (240) may encode information required for video / image restoration (e.g., values of syntax elements) together or separately, in addition to quantized transform coefficients. The encoded information (e.g., encoded video / image information) may be transmitted or stored in the form of a bitstream in units of NAL (network abstraction layer) units. The video / image information may further include information regarding various parameter sets, such as an adaptation parameter set (APS), a picture parameter set (PPS), a sequence parameter set (SPS), or a video parameter set (VPS). Additionally, the video / image information may further include general constraint information. The signaling information, transmitted information, and / or syntax elements mentioned in the present disclosure may be included in the video / image information. The video / image information may be encoded through the encoding procedure described above and included in the bitstream.

[0080] The above bitstream may be transmitted via a network or stored in a digital storage medium. Here, the network may include a broadcasting network and / or a communication network, and the digital storage medium may include various storage media such as USB, SD, CD, DVD, Blu-ray, HDD, SSD, etc. A transmission unit (not shown) for transmitting a signal output from the entropy encoding unit (240) and / or a storage unit (not shown) for storing it may be provided as an internal / external element of the encoding device (200), or the transmission unit may be provided as a component of the entropy encoding unit (240).

[0081] The quantized transformation coefficients output from the quantization unit (233) can be used to generate a residual signal. For example, a residual signal (residual block or residual samples) can be restored by applying inverse quantization and inverse transformation to the quantized transformation coefficients through the inverse quantization unit (234) and the inverse transformation unit (235).

[0082] Meanwhile, LMCS (luma mapping with chroma scaling) may be applied during the picture encoding and / or restoration process.

[0083] The adder (250) can generate a reconstructed signal (reconstructed picture, reconstructed block, reconstructed sample array) by adding the reconstructed residual signal to the prediction signal output from the inter prediction unit (221) or the intra prediction unit (222). In cases where there is no residual for the block to be processed, such as when a skip mode is applied, the predicted block can be used as the reconstructed block. The adder (250) may be called a reconstructed unit or a reconstructed block generation unit. The generated reconstructed signal can be used for intra prediction of the next block to be processed within the current picture, and can also be used for inter prediction of the next picture after undergoing filtering as described below.
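The operation of the adder (250) may be sketched as adding the reconstructed residual to the prediction, sample by sample. Clipping the result to the valid sample range for the bit depth is an assumption typical of video codecs and is not stated in the paragraph above; the name `reconstruct` and the parameter `bit_depth` are hypothetical.

```python
def reconstruct(predicted, residual=None, bit_depth=8):
    """Add residual to prediction and clip to [0, 2**bit_depth - 1].

    If residual is None (e.g., skip mode), the predicted block is
    used directly as the reconstructed block.
    """
    if residual is None:
        return [row[:] for row in predicted]
    max_val = (1 << bit_depth) - 1
    return [
        [min(max(p + r, 0), max_val) for p, r in zip(pred_row, res_row)]
        for pred_row, res_row in zip(predicted, residual)
    ]


print(reconstruct([[250, 3]], [[10, -5]]))  # [[255, 0]]
print(reconstruct([[128, 64]]))             # [[128, 64]]
```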

[0084] The filtering unit (260) can improve subjective / objective image quality by applying filtering to the restored signal. For example, the filtering unit (260) can generate a modified restored picture by applying various filtering methods to the restored picture, and can store the modified restored picture in memory (270), specifically in the DPB of memory (270). The various filtering methods may include, for example, deblocking filtering, sample adaptive offset, adaptive loop filter, bilateral filter, etc. The filtering unit (260) can generate various information regarding filtering and transmit it to the entropy encoding unit (240), as described below in the description of each filtering method. The information regarding filtering can be encoded in the entropy encoding unit (240) and output in the form of a bitstream.

[0085] The modified restored picture transmitted to the memory (270) can be used as a reference picture in the inter-prediction unit (221). Through this, the encoding device (200) can avoid prediction mismatches between the encoding device (200) and the decoding device when inter-prediction is applied, and can also improve encoding efficiency.

[0086] The DPB in memory (270) can store a modified restored picture to be used as a reference picture in the inter prediction unit (221). Memory (270) can store motion information of blocks from which motion information is derived (or encoded) in the current picture and / or motion information of blocks in the picture that have already been restored. The stored motion information can be transmitted to the inter prediction unit (221) to be used as motion information of spatially surrounding blocks or motion information of temporally surrounding blocks. Memory (270) can store restoration samples of restored blocks in the current picture and transmit them to the intra prediction unit (222).

[0087] FIG. 3 is a schematic diagram illustrating a decoding device to which an embodiment according to the present disclosure can be applied.

[0088] As illustrated in FIG. 3, the decoding device (300) may be configured to include an entropy decoding unit (310), a residual processing unit (320), a prediction unit (330), an adder (340), a filtering unit (350), and a memory (360). The prediction unit (330) may include an inter prediction unit (332) and an intra prediction unit (331). The residual processing unit (320) may include an inverse quantization unit (321) and an inverse transformation unit (322). The aforementioned entropy decoding unit (310), residual processing unit (320), prediction unit (330), adder (340), and filtering unit (350) may be configured by a single hardware component (e.g., a decoder chipset or a processor) according to an embodiment. Additionally, the memory (360) may include a decoded picture buffer (DPB) and may be configured by a digital storage medium. The hardware component may further include the memory (360) as an internal / external component.

[0089] When a bitstream containing video / image information is input, the decoding device (300) can restore the image by performing a process corresponding to the process performed by the encoding device (200) of FIG. 2. For example, the decoding device (300) can perform decoding using the processing unit applied in the encoding device (200). Thus, the processing unit for decoding may be, for example, a coding unit. The coding unit may be obtained by dividing a coding tree unit or a maximum coding unit according to a quad tree structure, a binary tree structure, and / or a ternary tree structure. The restored image signal decoded and output through the decoding device (300) can be played back through a playback device (not shown).

[0090] The decoding device (300) can receive a signal output from the encoding device (200) of FIG. 2 in the form of a bitstream. The received signal can be decoded through the entropy decoding unit (310). For example, the entropy decoding unit (310) can parse the bitstream to derive information necessary for image restoration (or picture restoration) (e.g., video / image information). The video / image information may further include information regarding various parameter sets, such as an adaptation parameter set (APS), a picture parameter set (PPS), a sequence parameter set (SPS), or a video parameter set (VPS). Additionally, the video / image information may further include general constraint information. The decoding device (300) can decode the picture based on the information regarding the parameter sets and / or the general constraint information. The signaling / received information and / or syntax elements described below can be obtained from the bitstream by decoding through the decoding procedure. For example, the entropy decoding unit (310) can decode information within the bitstream based on coding methods such as exponential Golomb coding, CAVLC, or CABAC, and output values of syntax elements required for image restoration and quantized values of transformation coefficients regarding residuals. 
More specifically, the CABAC entropy decoding method can receive bins corresponding to each syntax element in the bitstream, determine a context model using information on the syntax element to be decoded and decoding information of neighboring blocks and the block to be decoded, or information on symbols / bins decoded in the previous step, predict the probability of occurrence of the bin according to the determined context model, and perform arithmetic decoding of the bin to generate a symbol corresponding to the value of each syntax element. At this time, after determining the context model, the CABAC entropy decoding method can update the context model using the decoded symbol / bin information for the context model of the next symbol / bin. Among the information decoded in the entropy decoding unit (310), information regarding prediction is provided to the prediction unit (330), and residual values for which entropy decoding was performed in the entropy decoding unit (310), i.e., quantized transformation coefficients and related parameter information, can be input to the residual processing unit (320). The residual processing unit (320) can derive residual signals (residual blocks, residual samples, residual sample array). Additionally, among the information decoded in the entropy decoding unit (310), information regarding filtering can be provided to the filtering unit (350). Meanwhile, a receiving unit (not shown) that receives a signal output from an encoding device may be further configured as an internal / external element of the decoding device (300), or the receiving unit may be a component of the entropy decoding unit (310). Meanwhile, the decoding device according to the present document may be called a video / image / picture decoding device, and the decoding device may be divided into an information decoder (video / image / picture information decoder) and a sample decoder (video / image / picture sample decoder). 
The information decoder may include the entropy decoding unit (310), and the sample decoder may include at least one of the inverse quantization unit (321), inverse transform unit (322), adder (340), filtering unit (350), memory (360), inter prediction unit (332), and intra prediction unit (331).
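As a simple example of parsing syntax element values from a bitstream, the following sketch decodes unsigned zeroth-order exponential Golomb codewords. The bitstream is modeled as a string of '0' and '1' characters for readability, which is an assumption; the function name `read_ue` is hypothetical.

```python
def read_ue(bits, pos=0):
    """Decode one unsigned Exp-Golomb codeword starting at bits[pos].

    Returns (value, next_pos). A codeword consists of N leading zeros,
    a '1', and then N suffix bits; value = 2**N - 1 + suffix.
    """
    leading_zeros = 0
    while bits[pos + leading_zeros] == "0":
        leading_zeros += 1
    pos += leading_zeros + 1  # skip the zeros and the terminating '1'
    suffix = bits[pos:pos + leading_zeros]
    value = (1 << leading_zeros) - 1 + (int(suffix, 2) if suffix else 0)
    return value, pos + leading_zeros


# "1" -> 0, "010" -> 1, "011" -> 2, concatenated into one stream.
stream = "1010011"
pos, values = 0, []
while pos < len(stream):
    value, pos = read_ue(stream, pos)
    values.append(value)
print(values)  # [0, 1, 2]
```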

[0091] In the inverse quantization unit (321), the quantized transformation coefficients can be inversely quantized to output transformation coefficients. The inverse quantization unit (321) can rearrange the quantized transformation coefficients into a two-dimensional block form. In this case, the rearrangement can be performed based on the coefficient scan order performed in the encoding device (200). The inverse quantization unit (321) can perform inverse quantization on the quantized transformation coefficients using quantization parameters (e.g., quantization step size information) and obtain transformation coefficients.
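The rearrangement and inverse quantization described above may be sketched as follows. The diagonal scan order and the single uniform step size are assumptions made for illustration and must match whatever scan and quantizer the encoding device actually used; the names `diagonal_scan_order`, `vector_to_block`, and `dequantize` are hypothetical.

```python
def diagonal_scan_order(height, width):
    """Hypothetical scan: visit anti-diagonals in order of increasing x + y."""
    positions = [(y, x) for y in range(height) for x in range(width)]
    return sorted(positions, key=lambda p: (p[0] + p[1], p[1]))


def vector_to_block(levels, height, width):
    """Place a 1-D vector of quantized levels back into a 2-D block."""
    block = [[0] * width for _ in range(height)]
    for level, (y, x) in zip(levels, diagonal_scan_order(height, width)):
        block[y][x] = level
    return block


def dequantize(block, step):
    """Scale each quantized level by the quantization step size."""
    return [[level * step for level in row] for row in block]


block = vector_to_block([10, 2, -2, 0], height=2, width=2)
print(block)                      # [[10, -2], [2, 0]]
print(dequantize(block, step=4))  # [[40, -8], [8, 0]]
```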

[0092] In the inverse transformation unit (322), the transformation coefficients can be inversely transformed to obtain a residual signal (residual block, residual sample array).

[0093] The prediction unit (330) can generate a prediction signal based on various prediction methods described below. For example, the prediction unit (330) may apply intra prediction or inter prediction for a single block, and may also apply intra prediction and inter prediction simultaneously. This may be called combined inter and intra prediction (CIIP). Additionally, the prediction unit (330) may perform prediction of a block based on an intra block copy (IBC) prediction mode or a palette mode. The IBC prediction mode or palette mode may be used for image / video coding of content such as games, for example, screen content coding (SCC). IBC basically performs prediction within the current picture, but it can be performed similarly to inter prediction in that it derives a reference block within the current picture. That is, IBC may use at least one of the inter prediction techniques described in this document. The palette mode can be viewed as an example of intra coding or intra prediction. When palette mode is applied, information regarding the palette table and palette index can be included in the above video / image information and signaled.

[0094] The intra prediction unit (331) can predict the current block by referring to samples within the current picture. The description of the intra prediction unit (222) may be applied equally to the intra prediction unit (331). The referenced samples may be located in the neighborhood of the current block or located away from it, depending on the prediction mode. In intra prediction, the prediction modes may include a plurality of non-directional modes and a plurality of directional modes. The intra prediction unit (331) may determine the prediction mode applied to the current block by using the prediction mode applied to the neighboring blocks.

[0095] The inter prediction unit (332) can derive a predicted block for the current block based on a reference block (reference block) identified by a motion vector on a reference picture. At this time, to reduce the amount of motion information transmitted in the inter prediction mode, motion information can be predicted in blocks, sub-blocks, or samples based on the correlation of motion information between neighboring blocks and the current block. The motion information may include a motion vector and a reference picture index. The motion information may further include information on the inter prediction direction (L0 prediction, L1 prediction, Bi prediction, etc.). In the case of inter prediction, neighboring blocks may include spatial neighboring blocks existing within the current picture and temporal neighboring blocks existing in the reference picture. For example, the inter prediction unit (332) may construct a motion information candidate list based on the neighboring blocks and derive the motion vector and / or reference picture index of the current block based on the received candidate selection information. Inter-prediction can be performed based on various prediction modes (techniques), and information regarding the prediction may include information indicating the mode (technique) of inter-prediction for the current block.

[0096] The adder (340) can generate a restoration signal (restored picture, restored block, restored sample array) by adding the acquired residual signal to the prediction signal (predicted block, predicted sample array) output from the prediction unit (330) (including the inter prediction unit (332) and / or intra prediction unit (331)). In cases where there is no residual for the block to be processed, such as when a skip mode is applied, the predicted block can be used as the restoration block. The description of the adder (250) can be applied equally to the adder (340). The adder (340) may be called a restoration unit or a restoration block generation unit. The generated restoration signal can be used for intra prediction of the next block to be processed within the current picture, and can also be used for inter prediction of the next picture after undergoing filtering as described below.

[0097] Meanwhile, LMCS (luma mapping with chroma scaling) may be applied during the picture decoding process.

[0098] The filtering unit (350) can improve subjective / objective image quality by applying filtering to the restored signal. For example, the filtering unit (350) can generate a modified restored picture by applying various filtering methods to the restored picture, and can store the modified restored picture in memory (360), specifically in the DPB of memory (360). The various filtering methods may include, for example, deblocking filtering, sample adaptive offset, adaptive loop filter, bilateral filter, etc.

[0099] The (modified) restored picture stored in the DPB of the memory (360) can be used as a reference picture in the inter-prediction unit (332). The memory (360) can store motion information of blocks from which motion information within the current picture has been derived (or decoded) and / or motion information of blocks within the picture that have already been restored. The stored motion information can be transmitted to the inter-prediction unit (332) to be used as motion information of spatially surrounding blocks or motion information of temporally surrounding blocks. The memory (360) can store restoration samples of blocks restored within the current picture and transmit them to the intra-prediction unit (331).

[0100] In this specification, the embodiments described in the filtering unit (260), inter prediction unit (221), and intra prediction unit (222) of the encoding device (200) may be applied to the filtering unit (350), inter prediction unit (332), and intra prediction unit (331) of the decoding device (300) in the same or corresponding manner.

[0101] The prediction unit of the encoding device / decoding device can derive a reference sample according to the intra prediction mode of the current block among the surrounding reference samples of the current block, and can generate a prediction sample of the current block based on the reference sample.

[0102] For example, (i) a prediction sample can be derived based on the average or interpolation of neighboring reference samples of the current block, and (ii) the prediction sample can be derived based on a reference sample located in a specific (prediction) direction with respect to the prediction sample among the neighboring reference samples of the current block. Case (i) may be called a non-directional mode or non-angular mode, and case (ii) may be called a directional mode or angular mode. Additionally, the prediction sample may be generated through interpolation between a first neighboring sample located in the prediction direction of the intra prediction mode of the current block and a second neighboring sample located in the opposite direction, relative to the prediction sample of the current block, among the neighboring reference samples. The above-described case may be called Linear Interpolation Intra Prediction (LIP). Additionally, a provisional prediction sample of the current block may be derived based on filtered neighboring reference samples, and a prediction sample of the current block may be derived by performing a weighted sum of the provisional prediction sample and at least one reference sample derived according to the intra prediction mode among the existing neighboring reference samples, i.e., unfiltered neighboring reference samples. The above case may be referred to as PDPC (position dependent intra prediction combination). Furthermore, intra prediction coding may be performed by selecting the reference sample line with the highest prediction accuracy among the multiple reference sample lines surrounding the current block, deriving a prediction sample using a reference sample located in the prediction direction on that line, and signaling the reference sample line used to the decoding device. 
The above case may be referred to as multi-reference line intra prediction (MRL) or MRL-based intra prediction. In addition, the current block may be divided into vertical or horizontal subpartitions to perform intra prediction based on the same intra prediction mode, while neighboring reference samples may be derived and utilized at the subpartition level. That is, in this case, the intra prediction mode for the current block is applied equally to the subpartitions, but by deriving and utilizing neighboring reference samples at the subpartition level, intra prediction performance may be improved depending on the circumstances. This prediction method may be called intra sub-partitions (ISP) or ISP-based intra prediction. Furthermore, when the prediction direction based on the prediction sample points between neighboring reference samples, that is, when the prediction direction points to a fractional sample location, the value of the prediction sample may be derived through the interpolation of multiple reference samples located around the prediction direction (around the fractional sample location).
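The derivation of a prediction sample at a fractional sample location may be illustrated with a two-tap linear interpolation between the two nearest reference samples. The 1/32-sample precision and the two-tap filter are assumptions modeled on common codec practice; longer interpolation filters may be used instead. The names `interpolate_fractional`, `ref`, `idx`, and `frac` are hypothetical.

```python
def interpolate_fractional(ref, idx, frac):
    """Linearly interpolate between ref[idx] and ref[idx + 1].

    frac is the fractional offset in 1/32-sample units (0..31).
    The +16 term rounds to nearest before the right shift by 5.
    """
    if not 0 <= frac < 32:
        raise ValueError("frac must be in 0..31")
    return ((32 - frac) * ref[idx] + frac * ref[idx + 1] + 16) >> 5


ref = [100, 132, 140]
print(interpolate_fractional(ref, 0, 0))  # 100 (integer position)
print(interpolate_fractional(ref, 0, 8))  # 108 (a quarter of the way to 132)
```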

[0103] The intra prediction methods described above may be referred to as intra prediction types to distinguish them from intra prediction modes. The intra prediction type may be referred to by various terms, such as intra prediction techniques or additional intra prediction modes. For example, the intra prediction type (or additional intra prediction mode, etc.) may include at least one of the aforementioned LIP, PDPC, MRL, and ISP. Information regarding the intra prediction type may be encoded in an encoding device, included in a bitstream, and signaled to a decoding device. The information regarding the intra prediction type may be implemented in various forms, such as flag information indicating whether each intra prediction type is applied, or index information indicating one of several intra prediction types.

[0104] The MPM list for deriving the aforementioned intra prediction mode may be configured differently depending on the intra prediction type. Alternatively, the MPM list may be configured commonly regardless of the intra prediction type.

[0105] FIG. 4 is a diagram showing the search area of intra template matching prediction in an embodiment according to the present disclosure.

[0106] Intra Template Matching Prediction (IntraTMP) is a special intra prediction mode that copies the optimal prediction block where an L-shaped template matches the current template from the reconstructed portion of the current frame. For a predefined search range, the encoder searches for the template most similar to the current template within the reconstructed portion of the current frame and uses that block as the prediction block. The encoder then signals the use of this mode, and the decoder performs the same prediction operation.

[0107] The prediction signal is generated by matching the L-shaped neighbor of the current block with other blocks in the predefined search area of FIG. 4. The search area consists of the following.

[0108] R1: Current CTU

[0109] R2: Top left CTU

[0110] R3: Top CTU

[0111] R4: Left CTU

[0112] The sum of absolute differences (SAD) is used as a cost function.

[0113] Within each region, the decoder searches for the template with the smallest SAD relative to the current template and uses the corresponding block as the prediction block.

[0114] The size of all regions (SearchRange_w, SearchRange_h) is set proportionally to the block size (BlkW, BlkH) and can have a fixed number of SAD comparisons per pixel.

[0115] SearchRange_w = a * BlkW

[0116] SearchRange_h = a * BlkH

[0117] Here, 'a' is a constant that controls the gain / complexity trade-off. For example, 'a' can be 5.
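The search range computation and the SAD cost function may be sketched as follows. Templates are modeled as flat lists of sample values, which is an assumption made for brevity; the function names `search_range` and `sad` are hypothetical.

```python
def search_range(blk_w, blk_h, a=5):
    """Return (SearchRange_w, SearchRange_h) = (a * BlkW, a * BlkH)."""
    return a * blk_w, a * blk_h


def sad(template_a, template_b):
    """Sum of absolute differences between two equally sized templates."""
    return sum(abs(x - y) for x, y in zip(template_a, template_b))


print(search_range(8, 4))                    # (40, 20)
print(sad([10, 12, 14], [11, 12, 10]))       # 5
```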

[0118] To speed up the template matching process, the search range of all search areas is subsampled by a factor of two. This allows the number of template matching searches to be reduced by a factor of four. After finding the optimal matching range, a refinement process is performed. Refinement is carried out through a second template matching search centered on the optimal matching range with the reduced range. The reduced range can be defined as min(BlkW, BlkH) / 2.
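The two-stage search described above may be sketched as a coarse search over every second position in each direction, followed by a refinement search around the best coarse position. The cost function is passed in as a callable; in practice it would be the SAD between the current template and the candidate template, while the toy cost used in the example is purely hypothetical. The name `two_stage_search` and the representation of ranges as inclusive (min, max) tuples are also assumptions.

```python
def two_stage_search(cost, x_range, y_range, blk_w, blk_h):
    """Coarse-to-fine search for the position with the lowest cost.

    cost: callable taking (x, y) and returning a number.
    x_range, y_range: inclusive (min, max) search bounds.
    """
    # Stage 1: subsample by two horizontally and vertically,
    # which cuts the number of candidates by about a factor of four.
    coarse = [(x, y)
              for y in range(y_range[0], y_range[1] + 1, 2)
              for x in range(x_range[0], x_range[1] + 1, 2)]
    best_x, best_y = min(coarse, key=cost)

    # Stage 2: refine within +/- min(BlkW, BlkH) / 2 of the coarse optimum.
    r = min(blk_w, blk_h) // 2
    fine = [(x, y)
            for y in range(max(y_range[0], best_y - r),
                           min(y_range[1], best_y + r) + 1)
            for x in range(max(x_range[0], best_x - r),
                           min(x_range[1], best_x + r) + 1)]
    return min(fine, key=cost)


# Toy cost with its minimum at an odd position that the coarse grid misses.
toy_cost = lambda p: abs(p[0] - 7) + abs(p[1] - 3)
print(two_stage_search(toy_cost, (0, 20), (0, 20), blk_w=4, blk_h=4))  # (7, 3)
```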

[0119] The intra template matching tool can be enabled for CUs with a width and height of 64 or less. The maximum CU size for intra template matching is configurable.

[0120] The intra-template matching prediction mode can be signaled at the coding unit level via a dedicated flag when DIMD is not used in the current coding unit.

[0121] FIGS. 5 and 6 are drawings for illustrating a DIMD mode that can be applied to an embodiment according to the present disclosure.

[0122] In the decoder-side intra mode derivation (DIMD) mode, the intra prediction mode is derived in both the encoder and the decoder without directly transmitting intra prediction mode information. First, the horizontal gradient and vertical gradient are obtained from the second neighboring sample column and row, and a Histogram of Gradients (HoG) can be constructed from them.

[0123] The HoG can be configured as shown in FIG. 5. The HoG can be obtained by applying a Sobel filter using an L-shaped row and column of 3 pixels around the current block. If the boundaries of the block exist in different CTUs, they are not used for texture analysis.
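The construction of a gradient histogram with a Sobel filter may be sketched as follows. The mapping from gradient orientation to histogram bin is simplified here to uniform angular bins, whereas an actual codec maps orientations to intra prediction mode indices; the bin count, the amplitude measure |Gx| + |Gy|, and the names `gradient` and `build_hog` are assumptions made for illustration.

```python
import math

SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def gradient(img, y, x):
    """Horizontal and vertical Sobel gradients at the 3x3 window centered on (y, x)."""
    gx = sum(SOBEL_X[j][i] * img[y - 1 + j][x - 1 + i]
             for j in range(3) for i in range(3))
    gy = sum(SOBEL_Y[j][i] * img[y - 1 + j][x - 1 + i]
             for j in range(3) for i in range(3))
    return gx, gy


def build_hog(img, centers, num_bins=8):
    """Accumulate gradient amplitude into orientation bins (simplified)."""
    hog = [0] * num_bins
    for y, x in centers:
        gx, gy = gradient(img, y, x)
        if gx == 0 and gy == 0:
            continue  # flat area contributes nothing
        angle = math.atan2(gy, gx) % math.pi  # orientation in [0, pi)
        b = min(int(angle / math.pi * num_bins), num_bins - 1)
        hog[b] += abs(gx) + abs(gy)
    return hog


# Three rows of reconstructed samples containing a vertical edge.
img = [[0, 0, 10, 10]] * 3
print(build_hog(img, centers=[(1, 1), (1, 2)]))  # [80, 0, 0, 0, 0, 0, 0, 0]
```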

[0124] Subsequently, up to five intra modes with the largest histogram amplitude can be selected, and the final prediction block can be constructed by blending the prediction block predicted using these modes with the planar mode. Weights can be derived from the histogram amplitude. Additionally, a DIMD flag is transmitted on a block-by-block basis to check whether DIMD is used.

[0125] FIG. 6 is an example of selecting two intra modes with the largest histogram amplitude and then blending the prediction blocks predicted using these modes with the planar mode prediction block to form a final prediction block.
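The blending of the selected modes with the planar mode may be sketched as follows. Giving the planar prediction a fixed weight and sharing the remainder among the selected modes in proportion to their histogram amplitudes is an assumption; the planar weight of 1/3, the use of exact fractions instead of integer arithmetic, and the names `dimd_weights` and `blend` are hypothetical.

```python
from fractions import Fraction


def dimd_weights(amplitudes, planar_weight=Fraction(1, 3)):
    """Weights for the selected modes, proportional to histogram amplitude.

    The returned weights plus planar_weight sum to 1.
    """
    total = sum(amplitudes)
    return [(1 - planar_weight) * Fraction(a, total) for a in amplitudes]


def blend(mode_preds, weights, planar_pred, planar_weight=Fraction(1, 3)):
    """Weighted sum of per-mode predictions and the planar prediction."""
    out = []
    for i, planar_sample in enumerate(planar_pred):
        value = planar_weight * planar_sample
        value += sum(w * pred[i] for w, pred in zip(weights, mode_preds))
        out.append(int(value + Fraction(1, 2)))  # round to nearest
    return out


weights = dimd_weights([30, 10])
print(weights)  # [Fraction(1, 2), Fraction(1, 6)]
print(blend([[120], [60]], weights, planar_pred=[90]))  # [100]
```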

[0126] For a W×H block, if the size of the upper or left histogram is twice the size of the other, the weights of each of the five derived modes are modified. In this case, the weights depend on the position and can be calculated as follows.

[0127] When the upper histogram is twice the size of the left histogram:

[0128] [Formula 1]

[0129]

[0130] When the left histogram is twice the size of the upper histogram:

[0131] [Formula 2]

[0132]

[0133] Here, wDimd_i is the unmodified uniform weight of the selected DIMD mode i, and Δ_i can be predefined and set to 10.

[0134] The prediction unit of the encoding / decoding device can derive prediction samples by performing inter-prediction on a block-by-block basis. Inter-prediction may represent a prediction derived in a manner dependent on data elements (e.g., sample values or motion information) of picture(s) other than the current picture. When inter-prediction is applied to the current block, a predicted block (prediction sample array) for the current block can be derived based on a reference block (reference sample array) specified by a motion vector on the reference picture indicated by the reference picture index. To reduce the amount of motion information transmitted in the inter-prediction mode, the motion information of the current block can be predicted on a block, sub-block, or sample basis, based on the correlation of motion information between neighboring blocks and the current block. The motion information may include a motion vector and a reference picture index. The motion information may further include information on the inter-prediction type (L0 prediction, L1 prediction, bi-prediction, etc.). When inter-prediction is applied, neighboring blocks may include spatial neighboring blocks existing within the current picture and temporal neighboring blocks existing in the reference picture. The reference picture containing the reference block and the reference picture containing the temporal neighboring block may be the same or different. The temporal neighboring block may be called a collocated reference block, a collocated CU (colCU), etc., and the reference picture containing the temporal neighboring block may be called a collocated picture (colPic). For example, a list of motion information candidates may be constructed based on the neighboring blocks of the current block, and flag or index information indicating which candidate is selected (used) to derive the motion vector and / or reference picture index of the current block may be signaled. Inter-prediction can be performed based on various prediction modes; for example, in the case of skip mode and merge mode, the motion information of the current block may be the same as the motion information of the selected neighboring block. In the case of skip mode, unlike merge mode, a residual signal may not be transmitted. In the case of motion vector prediction (MVP) mode, the motion vector of the selected neighboring block is used as a motion vector predictor, and the motion vector difference can be signaled. In this case, the motion vector of the current block can be derived as the sum of the motion vector predictor and the motion vector difference.

[0135] The above motion information may include L0 motion information and / or L1 motion information depending on the inter-prediction type (L0 prediction, L1 prediction, bi-prediction, etc.). A motion vector in the L0 direction may be called an L0 motion vector or MVL0, and a motion vector in the L1 direction may be called an L1 motion vector or MVL1. A prediction based on an L0 motion vector may be called an L0 prediction, a prediction based on an L1 motion vector may be called an L1 prediction, and a prediction based on both the L0 motion vector and the L1 motion vector may be called a bi-prediction. Here, the L0 motion vector may represent a motion vector associated with reference picture list L0, and the L1 motion vector may represent a motion vector associated with reference picture list L1. Reference picture list L0 may include, as reference pictures, pictures that precede the current picture in output order, and reference picture list L1 may include pictures that follow the current picture in output order. The preceding pictures may be referred to as forward (reference) pictures, and the following pictures may be referred to as backward (reference) pictures. Reference picture list L0 may additionally include, as reference pictures, pictures that follow the current picture in output order. In this case, within reference picture list L0, the preceding pictures may be indexed first, and the following pictures may be indexed next. Reference picture list L1 may additionally include, as reference pictures, pictures that precede the current picture in output order. In this case, within reference picture list L1, the following pictures may be indexed first, and the preceding pictures may be indexed next. Here, the output order may correspond to the picture order count (POC) order.
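The indexing order described above can be illustrated with a short sketch. The function name `build_ref_lists` is hypothetical, and ordering the pictures inside each group by distance from the current picture (nearest first) is an assumption that the text does not state.

```python
def build_ref_lists(current_poc, ref_pocs):
    """Return (L0, L1) as lists of POC values.

    L0 indexes preceding pictures first, then following pictures.
    L1 indexes following pictures first, then preceding pictures.
    Nearest-first ordering inside each group is an assumption.
    """
    preceding = sorted((p for p in ref_pocs if p < current_poc), reverse=True)
    following = sorted(p for p in ref_pocs if p > current_poc)
    list_l0 = preceding + following
    list_l1 = following + preceding
    return list_l0, list_l1


l0, l1 = build_ref_lists(4, [0, 2, 6, 8])
print(l0, l1)
```

With the current picture at POC 4 and reference pictures at POC 0, 2, 6, and 8, index 0 of L0 is a preceding picture and index 0 of L1 is a following picture.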

[0136] Various inter-prediction modes may be used to predict the current block within a picture. For example, various modes such as merge mode, skip mode, motion vector prediction (MVP) mode, affine mode, subblock merge mode, and merge with MVD (MMVD) mode may be used. Decoder-side motion vector refinement (DMVR) mode, adaptive motion vector resolution (AMVR) mode, bi-prediction with CU-level weight (BCW), and bi-directional optical flow (BDOF) may be used as additional or alternative modes. Affine mode may also be referred to as affine motion prediction mode. MVP mode may also be referred to as advanced motion vector prediction (AMVP) mode. In this document, some modes and / or motion information candidates derived by some modes may be included among the motion information candidates of other modes. For example, a history-based motion vector prediction (HMVP) candidate may be added as a merge candidate in the merge / skip mode, or may be added as an MVP candidate in the MVP mode.

[0137] Prediction mode information indicating the inter-prediction mode of the current block may be signaled from the encoding device to the decoding device. The prediction mode information may be received by the decoding device by being included in a bitstream. The prediction mode information may include index information indicating one of a plurality of candidate modes. Alternatively, the inter-prediction mode may be indicated through hierarchical signaling of flag information. In this case, the prediction mode information may include one or more flags. For example, a skip flag may be signaled to indicate whether the skip mode is applied, and if the skip mode is not applied, a merge flag may be signaled to indicate whether the merge mode is applied, and if the merge mode is not applied, an MVP mode may be applied, or additional flags for further distinction may be signaled. The affine mode may be signaled as an independent mode, or it may be signaled as a mode dependent on the merge mode or MVP mode, etc. For example, the affine mode may include an affine merge mode and an affine MVP mode.
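The hierarchical flag signaling in the example above can be sketched as follows. The flag names `skip_flag` and `merge_flag` and the `read_flag` callable are hypothetical stand-ins for bitstream parsing; the further flags mentioned in the text (for example, for affine mode) are left out.

```python
def derive_inter_mode(read_flag):
    """Derive the inter-prediction mode from hierarchically signaled flags.

    read_flag is a hypothetical callable that returns the value (0 or 1)
    of the named flag. A later flag is only read when the earlier one is 0.
    """
    if read_flag("skip_flag"):
        return "skip"
    if read_flag("merge_flag"):
        return "merge"
    # Neither skip nor merge: fall back to MVP mode in this sketch.
    return "mvp"


flags = {"skip_flag": 0, "merge_flag": 1}
print(derive_inter_mode(flags.get))
```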

[0138] Inter prediction can be performed using motion information of the current block. The encoding device can derive optimal motion information for the current block through a motion estimation procedure. For example, the encoding device can use the original block within the original picture for the current block to search, in fractional pixel units within a defined search range in the reference picture, for a similar reference block with high correlation, thereby deriving motion information. Block similarity can be derived based on the difference between sample values at corresponding positions (phase-based). For example, block similarity can be calculated based on the sum of absolute differences (SAD) between the current block (or the template of the current block) and the reference block (or the template of the reference block). In this case, motion information can be derived based on the reference block with the smallest SAD within the search area. The derived motion information can be signaled to the decoding device according to various methods based on the inter prediction mode.
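A minimal sketch of SAD-based motion estimation is given below. It is a simplification: it searches integer sample positions only, whereas the text describes a search in fractional pixel units, and the function names are hypothetical.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized 2-D blocks."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))


def crop(picture, x, y, w, h):
    """Extract the w x h block whose top-left sample is at (x, y)."""
    return [row[x:x + w] for row in picture[y:y + h]]


def full_search(cur_block, ref_picture, x0, y0, search_range):
    """Return (best_sad, (dx, dy)) over an integer-sample search window
    centred on (x0, y0). Positions outside the picture are skipped."""
    h, w = len(cur_block), len(cur_block[0])
    pic_h, pic_w = len(ref_picture), len(ref_picture[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            x, y = x0 + dx, y0 + dy
            if x < 0 or y < 0 or x + w > pic_w or y + h > pic_h:
                continue
            cost = sad(cur_block, crop(ref_picture, x, y, w, h))
            if best is None or cost < best[0]:
                best = (cost, (dx, dy))
    return best


ref = [[10 * y + x for x in range(6)] for y in range(6)]
cur = crop(ref, 3, 2, 2, 2)          # the true match sits at (3, 2)
print(full_search(cur, ref, 2, 2, 2))
```

The returned displacement plays the role of the motion vector of the reference block with the smallest SAD.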

[0139] When merge mode is applied, the motion information of the current prediction block is not transmitted directly; instead, the motion information of neighboring prediction blocks is used to derive the motion information of the current prediction block. Therefore, the motion information of the current prediction block can be indicated by transmitting flag information indicating that merge mode has been used and a merge index indicating which neighboring prediction block was used. This merge mode may be referred to as regular merge mode.

[0140] To perform merge mode, the encoder must search for merge candidate blocks used to derive the motion information of the current prediction block. For example, up to five merge candidate blocks may be used, but the number is not limited thereto. Additionally, the maximum number of merge candidate blocks may be transmitted in the slice header or tile group header, but is not limited thereto. After finding the merge candidate blocks, the encoder may generate a merge candidate list and select the merge candidate block with the smallest cost among them as the final merge candidate block.

[0141] Motion vector prediction (MVP) mode may be referred to as advanced motion vector prediction (AMVP) mode. When MVP mode is applied, a list of motion vector predictor (mvp) candidates can be generated using motion vectors of reconstructed spatial neighboring blocks and / or motion vectors corresponding to temporal neighboring blocks (or Col blocks). That is, motion vectors of reconstructed spatial neighboring blocks and / or motion vectors corresponding to temporal neighboring blocks can be used as motion vector predictor candidates. When bi-prediction is applied, a list of mvp candidates for deriving L0 motion information and a list of mvp candidates for deriving L1 motion information can be generated and used separately. The aforementioned prediction information (or information regarding prediction) may include selection information (e.g., an MVP flag or an MVP index) indicating the optimal motion vector predictor candidate selected from among the candidates included in the list. The prediction unit can use this selection information to select a motion vector predictor for the current block from among the candidates included in the mvp candidate list. The prediction unit of the encoding device can obtain the motion vector difference (MVD) between the motion vector of the current block and the motion vector predictor, encode it, and output it in the form of a bitstream. That is, the MVD can be obtained by subtracting the motion vector predictor from the motion vector of the current block. The prediction unit of the decoding device can obtain the motion vector difference included in the information regarding the prediction, and derive the motion vector of the current block by adding the motion vector difference and the motion vector predictor. The prediction unit of the decoding device can also obtain or derive a reference picture index indicating a reference picture, etc., from the information regarding the prediction.
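The encoder-side subtraction and decoder-side addition described above can be sketched as below. The function names are hypothetical, and choosing the predictor by the sum of absolute MVD components is an assumed cost measure, not one given in the text.

```python
def encode_mvd(mv, mvp_candidates):
    """Encoder side: select a predictor and return (mvp_index, mvd).

    The MVD is the motion vector minus the selected predictor. The
    selection cost (sum of absolute MVD components) is an assumption.
    """
    def mvd_for(i):
        return (mv[0] - mvp_candidates[i][0], mv[1] - mvp_candidates[i][1])

    best_idx = min(range(len(mvp_candidates)),
                   key=lambda i: abs(mvd_for(i)[0]) + abs(mvd_for(i)[1]))
    return best_idx, mvd_for(best_idx)


def decode_mv(mvp_idx, mvd, mvp_candidates):
    """Decoder side: motion vector = selected predictor + MVD."""
    mvp = mvp_candidates[mvp_idx]
    return (mvp[0] + mvd[0], mvp[1] + mvd[1])


candidates = [(4, -2), (10, 3)]
idx, mvd = encode_mvd((9, 5), candidates)
print(idx, mvd, decode_mv(idx, mvd, candidates))
```

Both sides must build the same candidate list for the index to select the same predictor.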

[0142] Conventional video coding systems use only a single motion vector to represent the motion of a coding block (i.e., a translational motion model). However, while this approach may represent the optimal motion at the block level, it does not represent the optimal motion of each actual pixel; if the optimal motion vector can be determined at the pixel level, coding efficiency can be improved. To this end, the present embodiment describes an affine motion prediction method that performs encoding using an affine motion model. The affine motion prediction method can represent the motion vector at the pixel level of a block using two, three, or four motion vectors.

[0143] Affine motion models can represent four types of motion (translation, scaling, rotation, and shear). According to the affine motion prediction method, three of these types of motion (translation, scaling, and rotation) can be represented.
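As an illustration, the sketch below derives a per-position motion vector from two control-point motion vectors using the four-parameter affine model known from VVC. That this is the model intended here is an assumption; the text only states that two, three, or four motion vectors may be used. The names `mv0`, `mv1`, and `affine_mv` are hypothetical.

```python
def affine_mv(mv0, mv1, width, x, y):
    """Four-parameter affine model (translation, scaling, rotation).

    mv0 is the motion vector at the top-left corner (0, 0) of the block,
    mv1 the motion vector at the top-right corner (width, 0).
    Returns the motion vector at position (x, y) inside the block.
    """
    a = (mv1[0] - mv0[0]) / width   # scaling / rotation term
    b = (mv1[1] - mv0[1]) / width   # rotation / scaling term
    mv_x = a * x - b * y + mv0[0]
    mv_y = b * x + a * y + mv0[1]
    return (mv_x, mv_y)


# Identical control points give pure translation everywhere in the block.
print(affine_mv((2, 0), (2, 0), 8, 4, 4))
# Differing control points give a position-dependent motion vector.
print(affine_mv((0, 0), (0, 8), 8, 4, 4))
```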

[0144] Combined inter and intra prediction (CIIP) can be applied to the current block. An additional flag (e.g., ciip_flag) indicating whether the CIIP mode is applied to the current CU may be signaled. For example, when a CU is coded in merge mode, if the CU contains at least 64 luma samples (i.e., CU width multiplied by CU height is 64 or greater) and both the CU width and CU height are less than 128 luma samples, the additional flag indicating whether the CIIP mode is applied to the current CU is signaled. As the name suggests, CIIP prediction combines an inter prediction signal and an intra prediction signal. The inter prediction signal P_inter in CIIP mode is generated using the same inter prediction process applied in regular merge mode, and the intra prediction signal P_intra is generated according to the regular intra prediction process using planar mode. Then, the intra and inter prediction signals are combined using weighted averaging. Here, the weight values are calculated as in Formula 3 according to the coding mode of the top and left adjacent blocks.

[0145] [Formula 3]

[0146]

[0147] Set isIntraTop to 1 if the top adjacent block is available and intra-coded, and set isIntraTop to 0 otherwise.

[0148] Set isIntraLeft to 1 if the left adjacent block is available and intra-coded, and set isIntraLeft to 0 otherwise.

[0149] Here, if (isIntraTop + isIntraLeft) is 2, set wt to 3.

[0150] Otherwise, if (isIntraTop + isIntraLeft) is 1, set wt to 2.

[0151] Otherwise, set wt to 1.
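The weight derivation above can be sketched as follows, assuming the sum tested in the two conditions is isIntraTop + isIntraLeft. The combining step uses the weighted average in the form known from VVC, ((4 - wt) * P_inter + wt * P_intra + 2) >> 2; that Formula 3 has this form is an assumption, and the function names are hypothetical.

```python
def ciip_weight(top_available_and_intra, left_available_and_intra):
    """Derive wt from the coding mode of the top and left adjacent blocks."""
    is_intra_top = 1 if top_available_and_intra else 0
    is_intra_left = 1 if left_available_and_intra else 0
    total = is_intra_top + is_intra_left
    if total == 2:
        return 3
    if total == 1:
        return 2
    return 1


def ciip_combine(p_inter, p_intra, wt):
    """Weighted average of the inter and intra prediction signals.

    Assumed VVC-style form: ((4 - wt) * P_inter + wt * P_intra + 2) >> 2.
    """
    return [[((4 - wt) * a + wt * b + 2) >> 2 for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(p_inter, p_intra)]


wt = ciip_weight(True, False)
print(wt, ciip_combine([[100, 80]], [[60, 120]], wt))
```

A larger wt gives the intra prediction signal more influence, which matches the intent of raising wt when more adjacent blocks are intra-coded.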

[0152] Based on motion information derived according to the prediction mode, a predicted block for the current block can be derived. The predicted block may include predicted samples (predicted sample array) of the current block. If the motion vector of the current block indicates fractional sample units, an interpolation procedure may be performed, through which predicted samples of the current block can be derived based on reference samples of fractional sample units within the reference picture. If affine inter-prediction is applied to the current block, predicted samples can be generated based on sample / sub-block unit MVs. If bi-prediction is applied, predicted samples derived through a weighted sum or weighted average (according to phase) of the predicted samples derived based on L0 prediction (i.e., prediction using the reference picture in reference picture list L0 and MVL0) and the predicted samples derived based on L1 prediction (i.e., prediction using the reference picture in reference picture list L1 and MVL1) can be used as predicted samples for the current block.
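The weighted average used for bi-prediction can be sketched as below. Expressing the weight in units of 1/8 with integer rounding follows the BCW style known from VVC and is an assumption; the default `w = 4` gives the plain average of the L0 and L1 predictions. The function name is hypothetical.

```python
def bi_predict(p_l0, p_l1, w=4):
    """Per-sample weighted average of L0 and L1 predicted samples.

    w is the L1 weight in eighths (assumption); the L0 weight is 8 - w.
    """
    return [[((8 - w) * a + w * b + 4) >> 3 for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(p_l0, p_l1)]


print(bi_predict([[100, 50]], [[110, 70]]))        # equal weights
print(bi_predict([[100, 50]], [[110, 70]], w=5))   # L1 weighted more
```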

[0153] The present disclosure may be applied as one of the inter-frame prediction technologies, and in particular, relates to a technology involving a combination of inter-frame and intra-frame prediction (CIIP mode or modified CIIP mode). More specifically, the present disclosure relates to a method for generating an intra-frame prediction block that is combined with a block generated by inter-frame prediction. In the present disclosure, the intra-frame prediction block (intra-frame prediction samples) is generated based on the difference value between an intra-frame prediction block using surrounding samples of the current block and an intra-frame prediction block using surrounding samples of the reference block. This method can improve the accuracy of the intra-frame prediction block by using surrounding samples of the reference block as well as surrounding samples of the current block. The present disclosure includes the following.

[0154] 1. A method for generating an intra-frame prediction block in a technique combining inter-frame prediction (which denotes the same prediction as the previously described inter-prediction and may be used interchangeably with it) and intra-frame prediction (which denotes the same prediction as the previously described intra-prediction and may be used interchangeably with it):

[0155] a. Generated based on the difference value between an intra-frame prediction block using neighboring samples of the current block and an intra-frame prediction block using neighboring samples of the reference block.

[0156] b. Generated based on the difference value between an intra-frame prediction block using surrounding samples of a reference block and a block to which illumination compensation parameters have been applied.

[0157] c. Generated based on the difference value between an intra-frame prediction block using surrounding samples of the current block and an intra-frame prediction block using surrounding samples of a reference block with illumination compensation parameters applied.

[0158] 2. A method for deriving the weight values applied to inter-frame prediction blocks and intra-frame prediction blocks in a technique combining inter-frame prediction and intra-frame prediction.

[0159] FIG. 7 is a diagram illustrating a method for decoding image information according to an embodiment of the present disclosure. FIG. 8 is a diagram illustrating an example of a method for deriving prediction samples according to an embodiment of the present disclosure. FIG. 9 is a diagram illustrating an example of a method for deriving prediction samples according to an embodiment of the present disclosure. FIG. 10 is a diagram illustrating an example of a method for deriving prediction samples according to an embodiment of the present disclosure.

[0160] The decoding method (S700) may include the operations described below.

[0161] The terms or names described below (e.g., names of syntax elements or variables, etc.) are merely examples, and the technical features of the present disclosure are not limited to the terms or names described below. For example, the image information described below may include various information according to the embodiments described in the present disclosure and may include information described in at least one of the tables described above.

[0162] The operations described below do not constitute an essential component of the decoding method according to one embodiment, and at least some of the operations described below may be omitted. Furthermore, the operations described below do not constitute a sufficient component of the decoding method according to one embodiment, and the previously described operations may be added.

[0163] The operations described below form a single embodiment integrally with the previously described operations, unless they contradict the previously described operations, and do not form a separate embodiment distinct from the previously described operations. Additionally, the operations of FIGS. 8, 9, and 10 form a single embodiment integrally with the operations of FIG. 7, and the operations of FIGS. 8, 9, and 10 do not each form a separate embodiment distinct from the previously described operations.

[0164] The decoding method (S700) can be executed by a decoding device including a memory and a processor electrically connected to the memory, for example, by a processor.

[0165] The decoding device can acquire image information (S710).

[0166] For example, the processor of the decoding device may acquire image information including prediction information, or image information including residual information, or image information including prediction information and residual information.

[0167] Image information may be in various forms. For example, image information may be a syntax element or a syntax structure containing one or more syntax elements. Additionally, image information may be a raw byte sequence payload (RBSP) containing one or more syntax elements or one or more syntax structures. Additionally, image information may be a Network Abstraction Layer (NAL) unit containing one or more RBSPs or a bitstream containing one or more NAL units.

[0168] Prediction information may include information related to the prediction of coding blocks included in each of the coded pictures. For example, prediction information may include information related to intra-frame prediction, inter-frame prediction, intra-block copy (IBC) prediction, etc. For example, regarding intra-frame prediction, prediction information may further include information related to non-directional mode, directional mode, Matrix-weighted Intra Prediction (MIP), Multi-Reference Line (MRL), or Intra-Sub-Partitions (ISP), etc. For example, regarding inter-frame prediction, the prediction information may further include information related to skip mode, Regular MERGE mode, MMVD (Merge with Motion Vector Difference) mode, CIIP (Combined Inter and Intra Prediction) mode, TRIANGULAR mode, SbTMVP (Subblock-based Temporal Motion Vector Prediction) mode, AFFINE MERGE mode, Regular AMVP (Regular Advanced Motion Vector Prediction) mode, SMVD (Symmetric MVD) mode, and AFFINE AMVP mode. For example, regarding IBC prediction, the prediction information may further include information related to MERGE mode. For example, regarding screen content coding, the prediction information may further include information related to Block-based Delta Pulse Code Modulation (BDPCM), Palette, etc.

[0169] Residual information may include residual samples of the coding block included in each of the coded pictures and information related to the processing of the residual samples. For example, residual information may include information related to residual samples (wherein the information related to residual samples is information for deriving residual samples and may be referred to in various ways, such as information related to quantized transform coefficient levels or information related to transform coefficients), information related to quantization parameters (QP), information related to multiple transform kernel selection (MTS), information related to sub-block transform (SBT), information related to low frequency non-separable transform (LFNST), etc.

[0170] The decoding device can derive a prediction mode for the current unit (or current block) within the current picture (S720).

[0171] For example, a processor of a decoding device may derive a combined inter- and intra-prediction (CIIP) mode or a modified CIIP mode for a current unit (or current block) within a current picture based on prediction information included in image information. For example, the prediction information may include information related to the CIIP mode or the modified CIIP mode for the current unit (or current block), and the information related to the CIIP mode or the modified CIIP mode may include CIIP flag information indicating whether the CIIP mode or the modified CIIP mode is applied to the current unit (or current block).

[0172] The processor can determine the application of a CIIP mode or a modified CIIP mode to the current unit based on CIIP flag information for the current unit (or current block).

[0173] The decoding device can derive predicted samples for the current block (S730).

[0174] For example, in a technique that combines inter-frame prediction and intra-frame prediction (e.g., CIIP mode or modified CIIP mode), the processor of the decoding device can derive inter-frame prediction samples for the current block based on prediction information contained in the image information. In the following, inter-frame prediction samples have the same meaning as inter-frame prediction blocks and may be used interchangeably with inter-frame prediction blocks.

[0175] The processor can derive a reference block for inter-frame prediction based on prediction information included in the image information. The image information may include information regarding inter-frame prediction for the current unit. For example, information regarding inter-frame prediction for the current unit may include information regarding a prediction mode indicating an inter-frame prediction mode (e.g., skip mode, merge mode, or AMVP mode, etc.) and motion information indicating a reference block. For example, the motion information may include information regarding a reference picture indicating a reference picture and information regarding a motion vector indicating a reference block within the reference picture.

[0176] The processor acquires information regarding the inter-frame prediction mode and motion information, and can derive a reference block for inter-frame prediction based on the information regarding the inter-frame prediction mode and motion information. The processor derives a reference picture based on information regarding the reference picture, and can derive a reference block within the reference picture based on information regarding motion vectors.

[0177] For example, the processor may generate a list of motion information candidates based on at least some of the motion information of spatially surrounding blocks of the current block, motion information of temporally surrounding blocks, or historical motion information, and may derive motion information from among the motion information candidates based on information about motion included in the prediction information (e.g., index information).
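A minimal sketch of building a motion information candidate list and selecting an entry by a signaled index is shown below. The insertion order (spatial, temporal, then history-based), the duplicate pruning, and the limit of five candidates are assumptions for illustration; a candidate is represented as a hypothetical `(motion_vector, reference_index)` tuple, with `None` for an unavailable neighbor.

```python
def build_candidate_list(spatial, temporal, history, max_candidates=5):
    """Collect available, non-duplicate candidates up to max_candidates."""
    candidates = []
    for cand in list(spatial) + list(temporal) + list(history):
        if len(candidates) == max_candidates:
            break
        if cand is not None and cand not in candidates:
            candidates.append(cand)
    return candidates


spatial = [((1, 0), 0), None, ((1, 0), 0)]   # one unavailable, one duplicate
temporal = [((2, 1), 0)]
history = [((0, 3), 1)]
cand_list = build_candidate_list(spatial, temporal, history)
signaled_index = 1                           # would come from the bitstream
print(cand_list[signaled_index])
```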

[0178] The processor can derive inter-frame prediction samples for the current block based on the samples (sample array) included in the reference block of the inter-frame prediction.

[0179] For example, in a technique that combines cross-frame prediction and intra-frame prediction (e.g., CIIP mode or modified CIIP mode), the processor of the decoding device can generate intra-frame prediction samples for the current block based on prediction information contained in the image information. In the following, intra-frame prediction samples have the same meaning as intra-frame prediction blocks and may be used interchangeably with intra-frame prediction blocks.

[0180] The processor can derive an intra-frame prediction mode for intra-frame prediction based on prediction information included in image information. The prediction information may include information regarding an intra-frame prediction mode for the current block (e.g., planar mode, DC mode, directional mode, DIMD, etc.) and / or information regarding an intra-frame prediction mode for the reference block of the inter-frame prediction mode (e.g., planar mode, DC mode, directional mode, DIMD, etc.). The intra-frame prediction mode for the current block may be different from or the same as the intra-frame prediction mode for the reference block.

[0181] The processor can acquire an intra-frame prediction mode for the current block based on prediction information, and generate intra-frame prediction samples of the current block based on surrounding samples of the current block according to the intra-frame prediction mode for the current block. Additionally, the processor can acquire an intra-frame prediction mode for the reference block of inter-frame prediction based on prediction information, and generate intra-frame prediction samples of the reference block based on surrounding samples of the reference block according to the intra-frame prediction mode for the reference block.

[0182] The processor can generate intra-frame prediction samples based on at least one of the intra-frame prediction samples derived from surrounding samples of the current block and the intra-frame prediction samples derived from surrounding samples of the reference block, and can generate prediction samples for the current block based on the intra-frame prediction samples and the inter-frame prediction samples. In the following, prediction samples have the same meaning as prediction blocks and may be used interchangeably with prediction blocks.

[0183] In other words, deriving prediction samples for the current block may include deriving inter-frame prediction samples for the current block based on a reference block; deriving intra-frame prediction samples for the current block based on surrounding samples of the reference block; and deriving prediction samples for the current block based on the inter-frame prediction samples and the intra-frame prediction samples.

[0184] The specific method for generating prediction samples for the current block is explained in detail below.

[0185] The intra-frame prediction samples for the current block can be derived based on the difference values between first intra-frame prediction samples derived based on the surrounding samples of the current block and second intra-frame prediction samples derived based on the surrounding samples of the reference block.

[0186] The prediction information may include at least one of information regarding a first intra-frame prediction mode for deriving the first intra-frame prediction samples and information regarding a second intra-frame prediction mode for deriving the second intra-frame prediction samples.

[0187] The first intra-frame prediction mode may be different from the second intra-frame prediction mode.

[0188] The first intra-frame prediction mode and the second intra-frame prediction mode may each include Decoder-side intra-mode derivation (DIMD).

[0189] For example, in a technique combining inter-frame prediction and intra-frame prediction (e.g., CIIP mode or modified CIIP mode), an intra-frame prediction block may be generated based on the difference value between an intra-frame prediction block using surrounding samples of the current block and an intra-frame prediction block using surrounding samples of the reference block. An embodiment according to the present disclosure may be applied to a P-slice or B-slice, in which inter-frame prediction is performed using a reference image.

[0190] Generating a prediction block by combining inter-frame prediction and intra-frame prediction according to an embodiment of the present disclosure may include the steps illustrated in FIG. 8. The operations described in FIG. 8 do not constitute an essential component of the method for generating a prediction block according to an embodiment, and at least some of the operations described in FIG. 8 may be omitted. Furthermore, the operations described in FIG. 8 do not constitute a sufficient component of the method for generating a prediction block according to an embodiment, and other previously described operations may be added. Moreover, the operations described in FIG. 8 form an integral embodiment with the previously described components and operations, provided they do not contradict the previously described components and operations, and do not form a separate embodiment distinct from the previously described components and operations.

[0191] The step (S810) of generating an inter-frame prediction block (inter-frame prediction samples) through motion compensation includes generating a prediction block (P_Inter) by performing motion compensation on the reference image selected by the reference image index, using the motion vector derived by a general inter-frame prediction mode such as (A)MVP mode, merge mode, or skip mode.

[0192] The step (S820) of generating an intra-frame prediction block (intra-frame prediction samples) using surrounding samples of the current block includes generating an intra-frame prediction block (P_IntraC) corresponding to the current block using reconstructed samples adjacent to the current block. The intra-frame prediction mode for generating the intra-frame prediction block may include predefined modes such as planar mode, DC mode, or an arbitrary directional mode. Alternatively, the processor may derive, in the decoder, an intra-frame prediction mode based on surrounding samples of the current block using the previously described DIMD method. Alternatively, a candidate list of specific intra-frame prediction modes for the current block may be generated, and one mode may be explicitly transmitted.

[0193] The step (S830) of generating an intra-frame prediction block (intra-frame prediction samples) using surrounding samples of a reference block includes generating an intra-frame prediction block (P_IntraR) corresponding to the reference block using surrounding samples adjacent to the reference block. The intra-frame prediction mode for generating the intra-frame prediction block may include predefined modes such as planar mode, DC mode, or an arbitrary directional mode. Alternatively, the processor may derive an intra-frame prediction mode in the decoder using the DIMD method with the surrounding samples of the reference block. That is, the intra-frame prediction mode for the reference block may be derived independently of the intra-frame prediction mode for the current block. Alternatively, a constraint may be added requiring that the intra-frame prediction mode for the reference block be identical or similar to the intra-frame prediction mode for the current block. For example, the processor may reuse the mode obtained using the DIMD method based on the surrounding samples of the current block. As another example, the processor may apply a DIMD-derived mode only in cases where the modes derived by the DIMD method for the current block and for the reference block are identical or similar. Alternatively, the processor may use the intra-frame prediction mode explicitly transmitted for the current block as is, or the intra-frame prediction mode for the reference block may be explicitly transmitted separately.

[0194] The step (S840) of calculating the difference value (final intra-frame prediction samples) between the intra-frame prediction block using surrounding samples of the current block and the intra-frame prediction block using surrounding samples of the reference block includes generating a difference value block (P_IntraD), computed per pixel position, between the intra-frame prediction block generated in step S820 and the intra-frame prediction block generated in step S830. The difference value block can be calculated as in Formula 4.

[0195] [Formula 4]

[0196]
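Steps S830 and S840 can be illustrated with a minimal Python sketch. It assumes DC mode for both intra-frame prediction blocks and assumes the difference is taken as the current-block prediction minus the reference-block prediction. The function names and sample values are hypothetical and are not taken from the disclosure.

```python
def dc_predict(top, left, width, height):
    """DC-mode intra prediction: fill the block with the rounded mean of the neighbors."""
    neighbors = list(top[:width]) + list(left[:height])
    total = sum(neighbors)
    dc = (total + len(neighbors) // 2) // len(neighbors)  # integer rounding
    return [[dc] * width for _ in range(height)]


def difference_block(p_intra_c, p_intra_r):
    """Per-position difference P_IntraD = P_IntraC - P_IntraR (sign is an assumption)."""
    return [[c - r for c, r in zip(row_c, row_r)]
            for row_c, row_r in zip(p_intra_c, p_intra_r)]


# Hypothetical 4x4 example: the neighborhood of the current block is about
# 10 levels brighter than the neighborhood of the reference block.
cur_top, cur_left = [110, 112, 111, 109], [110, 111, 113, 110]
ref_top, ref_left = [100, 102, 101, 99], [100, 101, 103, 100]

p_intra_c = dc_predict(cur_top, cur_left, 4, 4)   # from current-block neighbors
p_intra_r = dc_predict(ref_top, ref_left, 4, 4)   # from reference-block neighbors
p_intra_d = difference_block(p_intra_c, p_intra_r)
```

With DC mode the difference block is constant, so it behaves like a brightness offset estimated from the two neighborhoods. A directional mode would yield a position-dependent difference.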

[0197] The step (S850) of combining the inter-frame prediction block (inter-frame prediction samples) with the difference values of the intra-frame prediction blocks (final intra-frame prediction samples) to obtain P_Comb includes combining the difference block, generated in the step of calculating the difference values between the intra-frame prediction block based on surrounding samples of the current block and the intra-frame prediction block based on surrounding samples of the reference block, with the inter-frame prediction block generated in the step of generating the inter-frame prediction block through motion compensation. The final combined prediction block can be calculated as in Equation 5.

[0198] [Equation 5]

[0199]

[0200] Here, the weight value γ controls how strongly the difference values of the intra-frame prediction blocks are reflected in the inter-frame prediction block, and can be expressed as a real value between 0 and 1. The weight value may be predefined identically in the encoder and the decoder, explicitly transmitted to the decoder, derived in the decoder, or determined by a combination of these methods. When the weight value is derived in the decoder, alone or in combination with other methods, one or more pieces of information may be used as conditions for the derivation, such as the size of the current block, the shape of the current block, the quantization parameter (QP) value, the distribution of intra-frame prediction blocks among neighboring blocks, intra-frame prediction mode information derived in the decoder, the statistical distribution (mean, variance) of the difference block, and the frequency of use of the CIIP mode (or modified CIIP mode) in neighboring blocks. Furthermore, any information accessible to the decoder while decoding the current block may be used, even if it is not listed here. Additionally, the weight value may vary within a block. For example, smaller weight values may be assigned to positions farther from the left and right edges of the block. As another example, the block may be divided into regions according to the directionality of the intra-frame prediction mode, and a different weight value may be set for each region. If the weight value of the block is not 1, the P_Comb value may be converted to integer form.
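A minimal sketch of the combination step follows, assuming the combined block has the form P_Comb = P_Inter + γ · P_IntraD with γ expressed in units of 1/8. The rounding offset, the clipping to the sample range, and the names `combine` and `gamma_eighths` are illustrative assumptions rather than details stated in the disclosure.

```python
def combine(p_inter, p_intra_d, gamma_eighths, bit_depth=8):
    """Return P_Comb = P_Inter + gamma * P_IntraD with gamma = gamma_eighths / 8.

    The weighted difference is rounded to an integer (ties toward +infinity)
    and the result is clipped to the valid range for the given bit depth.
    """
    max_val = (1 << bit_depth) - 1
    out = []
    for row_inter, row_diff in zip(p_inter, p_intra_d):
        out_row = []
        for s_inter, s_diff in zip(row_inter, row_diff):
            scaled = (gamma_eighths * s_diff + 4) >> 3  # integer form of gamma * diff
            out_row.append(min(max(s_inter + scaled, 0), max_val))
        out.append(out_row)
    return out


# Hypothetical 2x2 blocks.
p_inter = [[100, 250], [3, 128]]
p_intra_d = [[10, 10], [-10, 5]]

half = combine(p_inter, p_intra_d, gamma_eighths=4)  # gamma = 0.5
full = combine(p_inter, p_intra_d, gamma_eighths=8)  # gamma = 1
none = combine(p_inter, p_intra_d, gamma_eighths=0)  # gamma = 0
```

With γ = 0 the inter-frame prediction block is returned unchanged, and with γ = 1 the full difference is added before clipping.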

[0201] In the step of combining the inter-frame prediction block (inter-frame prediction samples) with the difference values of the intra-frame prediction blocks (final intra-frame prediction samples), the combination may be limited to a specific region of the current block. More specifically, the difference block (P_IntraD) generated from the intra-frame prediction blocks for the current block and the reference block may be applied to only part of the block. For example, it may be applied only to the left and right edges of the block. Alternatively, similar to the weight derivation method above, the region may be determined using information derivable in the decoder. The combination region may be derived independently of the weight value derivation process. Alternatively, the region may be determined as part of the weight value derivation process, without a separate derivation process for the combination region. For example, for a region within the current block where the intra-frame prediction block is not combined, the weight value may be derived as 0.
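The region-limited combination can be sketched as a per-sample weight map in which positions outside the combination region simply receive weight 0. The band width, the 1/8 weight units, and the helper names below are hypothetical, and clipping is omitted for brevity.

```python
def edge_weight_map(width, height, band, gamma_eighths):
    """Weight map (1/8 units) that is non-zero only within `band` columns
    of the left and right block edges; all other positions get weight 0."""
    row = [gamma_eighths if (x < band or x >= width - band) else 0
           for x in range(width)]
    return [list(row) for _ in range(height)]


def combine_with_map(p_inter, p_intra_d, weight_map):
    """Add the weighted difference block to the inter-frame block per sample."""
    return [[s + ((w * d + 4) >> 3) for s, d, w in zip(r_s, r_d, r_w)]
            for r_s, r_d, r_w in zip(p_inter, p_intra_d, weight_map)]


# Hypothetical 8x1 row: only the two outermost columns on each side are modified.
weights = edge_weight_map(width=8, height=1, band=2, gamma_eighths=4)
p_comb = combine_with_map([[100] * 8], [[8] * 8], weights)
```

Treating the region as part of the weight map matches the option in which no separate region derivation is performed.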

[0202] The intra-frame prediction samples for the current block can be derived based on the difference values between third intra-frame prediction samples derived based on the surrounding samples of the reference block and fourth intra-frame prediction samples derived by applying luminance compensation to the third intra-frame prediction samples.

[0203] The luminance compensation parameter for the above luminance compensation can be determined such that the difference between the surrounding samples of the current block and the surrounding samples of the reference block is minimized.

[0204] For example, in a technique combining inter-frame prediction and intra-frame prediction (e.g., CIIP mode or modified CIIP mode), an intra-frame prediction block may be generated based on the difference values between an intra-frame prediction block using surrounding samples of a reference block and a prediction block obtained by applying a luminance compensation parameter to that intra-frame prediction block. An embodiment according to the present disclosure may be applied to a P-slice or B-slice that performs inter-frame prediction using a reference image.

[0205] Generating a prediction block generated by a combination of inter-frame prediction and intra-frame prediction according to an embodiment of the present disclosure may include the steps illustrated in FIG. 9. The operations described in FIG. 9 do not constitute an essential component of the method for generating a prediction block according to an embodiment, and at least some of the operations described in FIG. 9 may be omitted. Furthermore, the operations described in FIG. 9 do not constitute a sufficient component of the method for generating a prediction block according to an embodiment, and other previously described operations may be added. Moreover, the operations described in FIG. 9 form an integral embodiment with the previously described components and operations, provided they do not contradict the previously described components and operations, and do not form a separate embodiment distinct from the previously described components and operations.

[0206] The step (S910) of generating an inter-frame prediction block (inter-frame prediction samples) through motion compensation includes generating a prediction block (P_Inter) by performing motion compensation on a reference image, where the reference image is selected by a reference image index and the motion compensation uses a motion vector, both derived by a general inter-frame prediction mode such as (A)MVP mode, merge mode, or skip mode.

[0207] The step (S920) of generating an intra-frame prediction block (intra-frame prediction samples) using surrounding samples of a reference block includes generating an intra-frame prediction block (P_IntraR) corresponding to the current block using the surrounding samples adjacent to the reference block. The intra-frame prediction mode may be a predefined mode such as planar mode, DC mode, or an arbitrary directional mode. Alternatively, the processor may derive the intra-frame prediction mode in the decoder from the surrounding samples of the reference block using the previously described DIMD method. Alternatively, a candidate list of specific intra-frame prediction modes for the reference block may be generated, and one mode from the list may be explicitly transmitted.

[0208] The luminance compensation parameter derivation step (S930) includes obtaining parameters that compensate for the luminance difference using the surrounding samples (N_C) of the current block and the surrounding samples (N_R) of the reference block. For example, when a first-order linear model is used, two parameters (scale value α and offset value β) can be derived as in Equation 6.

[0209] [Equation 6]

[0210]

[0211] The parameters α and β can be derived as the values that minimize the sum of squared errors (SSE) or the mean squared error (MSE) between N_C and N_R. The linear model and parameter derivation method above are one example, and other forms of models and parameters, such as offset models, high-dimensional linear models, or convolutional models, may be used. In addition, multiple models rather than a single model may be used.
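As a hedged illustration of the first-order linear model, the sketch below fits N_C ≈ α · N_R + β by ordinary least squares, which minimizes the SSE (and equivalently the MSE) over the paired surrounding samples. The direction of the mapping (reference to current), the floating-point arithmetic, and the offset-only fallback for a flat reference neighborhood are assumptions made for this example.

```python
def derive_lic_params(n_c, n_r):
    """Least-squares fit of n_c ~= alpha * n_r + beta over paired neighbor samples."""
    n = len(n_r)
    if n == 0 or n != len(n_c):
        raise ValueError("need equally sized, non-empty neighbor sample lists")
    sum_r = sum(n_r)
    sum_c = sum(n_c)
    sum_rr = sum(r * r for r in n_r)
    sum_rc = sum(r * c for r, c in zip(n_r, n_c))
    denom = n * sum_rr - sum_r * sum_r
    if denom == 0:
        # Flat reference neighborhood: fall back to an offset-only model.
        return 1.0, (sum_c - sum_r) / n
    alpha = (n * sum_rc - sum_r * sum_c) / denom
    beta = (sum_c - alpha * sum_r) / n
    return alpha, beta


# Hypothetical neighbors that satisfy n_c = 2 * n_r + 5 exactly.
alpha, beta = derive_lic_params([205, 225, 245, 265], [100, 110, 120, 130])
```

A practical codec would typically use fixed-point arithmetic for such a fit. The floating-point version is used here only to keep the example short.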

[0212] The step (S940) of applying the luminance compensation parameters to the intra-frame prediction block includes applying the luminance compensation parameters derived in the luminance compensation parameter derivation step to the intra-frame prediction block generated in the step of generating an intra-frame prediction block using surrounding samples of the reference block. A corrected intra-frame prediction block (P'_IntraR) is generated by applying the two parameters (scale value α and offset value β) as in Equation 7.

[0213] [Equation 7]

[0214]

[0215] The step (S950) of calculating the difference values (final intra-frame prediction samples) between the intra-frame prediction block using surrounding samples of the reference block and the prediction block to which luminance compensation is applied includes generating a per-sample-position difference block (P_IntraD) between the intra-frame prediction block generated in the step of generating an intra-frame prediction block using surrounding samples of the reference block and the prediction block obtained in the step of applying the luminance compensation parameters to the intra-frame prediction block. The difference block can be calculated as in Equation 8.

[0216] [Equation 8]

[0217]
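Steps S940 and S950 can be sketched as follows, assuming the corrected block is P'_IntraR = α · P_IntraR + β and the difference is taken as the corrected block minus the original block. The sign convention, the rounding, the clipping, and the function names are assumptions for illustration only.

```python
import math


def apply_lic(block, alpha, beta, bit_depth=8):
    """Apply the linear model alpha * s + beta to every sample, round, and clip."""
    max_val = (1 << bit_depth) - 1
    return [[min(max(math.floor(alpha * s + beta + 0.5), 0), max_val) for s in row]
            for row in block]


def lic_difference(p_intra_r, alpha, beta, bit_depth=8):
    """Difference block between the compensated and the original intra block."""
    corrected = apply_lic(p_intra_r, alpha, beta, bit_depth)
    return [[c - r for c, r in zip(row_c, row_r)]
            for row_c, row_r in zip(corrected, p_intra_r)]


# Hypothetical 2x2 intra-frame prediction block and parameters.
p_intra_r = [[100, 120], [140, 250]]
p_intra_d = lic_difference(p_intra_r, alpha=1.25, beta=-5)
```

The last sample shows the effect of clipping: the compensated value exceeds the 8-bit range, so the difference is limited to 5.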

[0218] The step (S960) of combining the inter-frame prediction block (inter-frame prediction samples) with the difference values of the intra-frame prediction blocks (final intra-frame prediction samples) to obtain P_Comb combines the inter-frame prediction block generated in the step of generating an inter-frame prediction block through motion compensation with the corrected intra-frame prediction block generated in the step of applying the luminance compensation parameters to the intra-frame prediction block. The final combined prediction block can be calculated as in Equation 9.

[0219] [Equation 9]

[0220]

[0221] Here, the weight value γ controls how strongly the difference values of the intra-frame prediction blocks are reflected in the inter-frame prediction block, and can be expressed as a real value between 0 and 1. The weight value may be predefined identically in the encoder and the decoder, explicitly transmitted to the decoder, derived in the decoder, or determined by a combination of these methods. When the weight value is derived in the decoder, alone or in combination with other methods, one or more pieces of information may be used as conditions for the derivation, such as the size of the current block, the shape of the current block, the quantization parameter (QP) value, the distribution of intra-frame prediction blocks among neighboring blocks, intra-frame prediction mode information derived in the decoder, the statistical distribution (mean, variance) of the difference block, and the frequency of use of the CIIP mode (or modified CIIP mode) in neighboring blocks. Furthermore, any information accessible to the decoder while decoding the current block may be used, even if it is not listed here. Additionally, the weight value may vary within a block. For example, smaller weight values may be assigned to positions farther from the left and right edges of the block. As another example, the block may be divided into regions according to the directionality of the intra-frame prediction mode, and a different weight value may be set for each region. If the weight value of the block is not 1, the P_Comb value may be converted to integer form.

[0222] In the step of combining the inter-frame prediction block (inter-frame prediction samples) with the difference values of the intra-frame prediction blocks (final intra-frame prediction samples), the combination may be limited to a specific region of the current block. More specifically, the difference block (P_IntraD) generated from the intra-frame prediction blocks for the current block and the reference block may be applied to only part of the block. For example, it may be applied only to the left and right edges of the block. Alternatively, similar to the weight derivation method above, the region may be determined using information derivable in the decoder. The combination region may be derived independently of the weight value derivation process. Alternatively, the region may be determined as part of the weight value derivation process, without a separate derivation process for the combination region. For example, for a region within the current block where the intra-frame prediction block is not combined, the weight value may be derived as 0.

[0223] The intra-frame prediction samples for the current block can be derived based on the difference values between fifth intra-frame prediction samples derived based on the surrounding samples of the reference block and sixth intra-frame prediction samples derived based on the surrounding samples of the reference block to which luminance compensation is applied.

[0224] The luminance compensation parameter for the above luminance compensation can be determined such that the difference between the surrounding samples of the current block and the surrounding samples of the reference block is minimized.

[0225] The prediction information may include at least one of information regarding a fifth intra-frame prediction mode for generating the fifth intra-frame prediction samples and information regarding a sixth intra-frame prediction mode for generating the sixth intra-frame prediction samples.

[0226] The fifth intra-frame prediction mode may be the same as the sixth intra-frame prediction mode.

[0227] For example, in a technique combining inter-frame prediction and intra-frame prediction (e.g., CIIP mode or modified CIIP mode), an intra-frame prediction block may be generated based on the difference values between an intra-frame prediction block using surrounding samples of a reference block and an intra-frame prediction block generated from surrounding samples of the reference block to which a luminance compensation parameter has been applied. An embodiment according to the present disclosure may be applied to a P-slice or B-slice that performs inter-frame prediction using a reference image.

[0228] Generating a prediction block by combining inter-frame prediction and intra-frame prediction according to one embodiment of the present disclosure may include the steps illustrated in FIG. 10. The operations described in FIG. 10 do not constitute an essential component of the method for generating a prediction block according to one embodiment, and at least some of the operations described in FIG. 10 may be omitted. Furthermore, the operations described in FIG. 10 do not constitute a sufficient component of the method for generating a prediction block according to one embodiment, and other previously described operations may be added. Moreover, the operations described in FIG. 10 form an integral embodiment with the previously described components and operations, provided they do not contradict the previously described components and operations, and do not form a separate embodiment distinct from the previously described components and operations.

[0229] The step (S1010) of generating an inter-frame prediction block (inter-frame prediction samples) through motion compensation includes generating a prediction block (P_Inter) by performing motion compensation on a reference image, where the reference image is selected by a reference image index and the motion compensation uses a motion vector, both derived by a general inter-frame prediction mode such as (A)MVP mode, merge mode, or skip mode.

[0230] The step (S1020) of generating an intra-frame prediction block (intra-frame prediction samples) using surrounding samples of a reference block includes generating an intra-frame prediction block (P_IntraR) corresponding to the current block using the surrounding samples adjacent to the reference block. The intra-frame prediction mode may be a predefined mode such as planar mode, DC mode, or an arbitrary directional mode. Alternatively, the processor may derive the intra-frame prediction mode in the decoder from the surrounding samples of the reference block using the previously described DIMD method. Alternatively, a candidate list of specific intra-frame prediction modes for the reference block may be generated, and one mode from the list may be explicitly transmitted.

[0231] The luminance compensation parameter derivation step (S1030) includes obtaining parameters that compensate for the luminance difference using the surrounding samples (N_C) of the current block and the surrounding samples (N_R) of the reference block. For example, when a first-order linear model is used, two parameters (scale value α and offset value β) can be derived as in Equation 10.

[0232] [Equation 10]

[0233]

[0234] The parameters α and β can be derived as the values that minimize the sum of squared errors (SSE) or the mean squared error (MSE) between N_C and N_R. The linear model and parameter derivation method above are one example, and other forms of models and parameters, such as offset models, high-dimensional linear models, or convolutional models, may be used. In addition, multiple models rather than a single model may be used.

[0235] The step (S1040) of applying the luminance compensation parameters to the surrounding samples of the reference block includes applying the luminance compensation parameters derived in the luminance compensation parameter derivation step to the surrounding samples of the reference block. Corrected surrounding samples (N'_R) are generated by applying the two parameters (scale value α and offset value β) as in Equation 11.

[0236] [Equation 11]

[0237]

[0238] The step (S1050) of generating an intra-frame prediction block (intra-frame prediction samples) using the luminance-compensated surrounding samples of the reference block includes generating an intra-frame prediction block (P'_IntraR) using the luminance-compensated surrounding samples generated in the step of applying the luminance compensation parameters to the surrounding samples of the reference block.

[0239] The intra-frame prediction mode for intra-frame prediction may include predefined modes such as planar mode, DC mode, or arbitrary directional mode. Alternatively, the processor may derive an intra-frame prediction mode based on surrounding samples of a reference block in the decoder using the previously described DIMD method. Alternatively, a candidate list having a specific intra-frame prediction mode for a reference block may be generated, and one mode may be explicitly transmitted. The intra-frame prediction mode for generating an intra-frame prediction block may use the same intra-frame prediction mode that was used in the step of generating the intra-frame prediction block using surrounding samples of the reference block.

[0240] The step (S1060) of calculating the difference values (final intra-frame prediction samples) between the intra-frame prediction block using surrounding samples of the reference block and the intra-frame prediction block using the luminance-compensated surrounding samples of the reference block includes generating a per-sample-position difference block (P_IntraD) between the prediction block generated in the step of generating the intra-frame prediction block using surrounding samples of the reference block and the prediction block generated in the step of generating the intra-frame prediction block using the luminance-compensated surrounding samples of the reference block. The difference block can be calculated as in Equation 12.

[0241] [Equation 12]

[0242]
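Steps S1040 to S1060 differ from applying compensation to a finished prediction block in that the surrounding samples are compensated first and intra-frame prediction is then run on them. The sketch below assumes DC mode for both predictions, the model N'_R = α · N_R + β, and a difference taken as the compensated prediction minus the uncompensated prediction. All names and values are hypothetical.

```python
import math


def compensate_samples(samples, alpha, beta, bit_depth=8):
    """Apply alpha * s + beta to each surrounding sample, then round and clip."""
    max_val = (1 << bit_depth) - 1
    return [min(max(math.floor(alpha * s + beta + 0.5), 0), max_val) for s in samples]


def dc_value(top, left):
    """Rounded mean of the top and left surrounding samples (DC mode)."""
    neighbors = list(top) + list(left)
    return (sum(neighbors) + len(neighbors) // 2) // len(neighbors)


def difference_from_compensated_neighbors(top, left, alpha, beta, width, height):
    """Difference between DC prediction from compensated and original neighbors."""
    dc_plain = dc_value(top, left)
    dc_comp = dc_value(compensate_samples(top, alpha, beta),
                       compensate_samples(left, alpha, beta))
    return [[dc_comp - dc_plain] * width for _ in range(height)]


# Hypothetical reference-block neighbors and an offset-only compensation.
ref_top, ref_left = [100, 102, 101, 99], [100, 101, 103, 100]
p_intra_d = difference_from_compensated_neighbors(ref_top, ref_left, 1.0, 10, 4, 4)
```

Using the same intra-frame prediction mode for both predictions, as the text permits, keeps the difference attributable to the luminance compensation alone.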

[0243] The step (S1070) of combining the inter-frame prediction block (inter-frame prediction samples) with the difference values of the intra-frame prediction blocks (final intra-frame prediction samples) to obtain P_Comb combines the inter-frame prediction block generated in the step of generating an inter-frame prediction block through motion compensation with the corrected intra-frame prediction block generated in the step of applying the luminance compensation parameters to the intra-frame prediction block. The final combined prediction block can be calculated as in Equation 13.

[0244] [Equation 13]

[0245]

[0246] Here, the weight value γ controls how strongly the difference values of the intra-frame prediction blocks are reflected in the inter-frame prediction block, and can be expressed as a real value between 0 and 1. The weight value may be predefined identically in the encoder and the decoder, explicitly transmitted to the decoder, derived in the decoder, or determined by a combination of these methods. When the weight value is derived in the decoder, alone or in combination with other methods, one or more pieces of information may be used as conditions for the derivation, such as the size of the current block, the shape of the current block, the quantization parameter (QP) value, the distribution of intra-frame prediction blocks among neighboring blocks, intra-frame prediction mode information derived in the decoder, the statistical distribution (mean, variance) of the difference block, and the frequency of use of the CIIP mode (or modified CIIP mode) in neighboring blocks. Furthermore, any information accessible to the decoder while decoding the current block may be used, even if it is not listed here. Additionally, the weight value may vary within a block. For example, smaller weight values may be assigned to positions farther from the left and right edges of the block. As another example, the block may be divided into regions according to the directionality of the intra-frame prediction mode, and a different weight value may be set for each region. If the weight value of the block is not 1, the P_Comb value may be converted to integer form.

[0247] In the step of combining the inter-frame prediction block (inter-frame prediction samples) with the difference values of the intra-frame prediction blocks (final intra-frame prediction samples), the combination may be limited to a specific region of the current block. More specifically, the difference block (P_IntraD) generated from the intra-frame prediction blocks for the current block and the reference block may be applied to only part of the block. For example, it may be applied only to the left and right edges of the block. Alternatively, similar to the weight derivation method above, the region may be determined using information derivable in the decoder. The combination region may be derived independently of the weight value derivation process. Alternatively, the region may be determined as part of the weight value derivation process, without a separate derivation process for the combination region. For example, for a region within the current block where the intra-frame prediction block is not combined, the weight value may be derived as 0.

[0248] The prediction samples for the current block above can be derived based on the weighted sum between the inter-frame prediction samples and the intra-frame prediction samples.

[0249] The weights of the weighted sum between the inter-frame prediction samples and the intra-frame prediction samples may vary depending on the region within the current block.

[0250] The weights of the weighted sum between the inter-frame prediction samples and the intra-frame prediction samples can be derived based on whether intra-frame prediction is applied to the surrounding blocks of the current block.

[0251] For example, a method for determining the weight values for the inter-frame prediction block and the intra-frame prediction block in a technique that combines inter-frame prediction and intra-frame prediction (e.g., CIIP mode or modified CIIP mode) is described. The weight values applied to the final combined prediction block can be expressed as in Equation 14.

[0252] [Equation 14]

[0253]

[0254] The weight value δ for the inter-frame prediction block can be fixed at 1, as previously explained. Alternatively, the weight value δ can be set adaptively according to a predefined formula such as (2 - γ), depending on the weight value γ for the intra-frame prediction. For example, when the weight value γ is 1, P_IntraD and P_Inter can be combined in a 1:1 ratio. For example, if the sum of the weight values γ and δ is greater than 1, the sum can be normalized to 1. For example, if the weight value γ is 0, the weight value δ can be 1. Alternatively, the weight value γ for the intra-frame prediction block and/or the weight value δ may be predefined and fixed as one of the values in the list {0, 1/8, 2/8, 3/8, 4/8, 5/8, 6/8, 7/8, 1}. Alternatively, the weight values may be determined variably based on the encoding information of the current block and/or the encoding information of surrounding blocks. Available encoding information of surrounding blocks may include whether intra-frame prediction is used, whether luminance compensation is used, and so on. For example, a weight value γ of 6/8 may be used when both the left and upper surrounding blocks are encoded as intra-frame prediction blocks, 4/8 when only one of them is, and 2/8 when neither is. Similarly, when luminance compensation information is used, a weight value γ of 6/8 may be used when both the left and upper surrounding blocks are luminance compensated, 4/8 when only one of them is, and 2/8 when neither is. The weight list and the weight values based on surrounding block encoding information given above are one example, and the weight values may be real values between 0 and 1. In addition to the encoding information of the surrounding blocks listed above, other information from surrounding blocks that is available while the decoder decodes the current block may also be used.
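The neighbor-based rule above (6/8, 4/8, or 2/8 depending on how many of the left and upper surrounding blocks are intra-coded) can be sketched as a small lookup. The normalization helper shows one possible reading of "normalized to 1", namely scaling both weights by their sum. That reading and the function names are assumptions.

```python
from fractions import Fraction


def select_gamma(left_is_intra, above_is_intra):
    """Pick gamma from the number of intra-coded left/upper surrounding blocks."""
    count = int(left_is_intra) + int(above_is_intra)
    return {2: Fraction(6, 8), 1: Fraction(4, 8), 0: Fraction(2, 8)}[count]


def normalize_weights(gamma, delta):
    """If gamma + delta exceeds 1, scale both so that they sum to 1."""
    total = gamma + delta
    if total > 1:
        return gamma / total, delta / total
    return gamma, delta


gamma = select_gamma(left_is_intra=True, above_is_intra=True)   # 6/8
gamma_n, delta_n = normalize_weights(gamma, Fraction(1))        # delta fixed at 1
```

The same lookup applies unchanged when the condition is whether the surrounding blocks are luminance compensated instead of intra-coded.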

[0255] As another example, a predefined set of fixed values for the weight values γ and δ may be defined in the form of a list, and an index corresponding to the optimal weight value may be transmitted.
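Index-based signaling can be sketched as follows: the encoder evaluates each entry of a shared list and transmits the index of the best one, and the decoder looks the pair up by that index. The list contents, the use of the sum of absolute differences (SAD) as the selection criterion, and the weighted form δ · P_Inter + γ · P_IntraD in 1/8 units are hypothetical choices for this example.

```python
# Hypothetical shared list of (gamma, delta) pairs in 1/8 units.
WEIGHT_LIST = [(0, 8), (2, 8), (4, 8), (6, 8), (8, 8)]


def predict(p_inter, p_intra_d, gamma8, delta8):
    """Weighted combination delta * P_Inter + gamma * P_IntraD, rounded to integers."""
    return [[(delta8 * s + gamma8 * d + 4) >> 3 for s, d in zip(r_s, r_d)]
            for r_s, r_d in zip(p_inter, p_intra_d)]


def sad(a, b):
    """Sum of absolute differences between two blocks."""
    return sum(abs(x - y) for r_a, r_b in zip(a, b) for x, y in zip(r_a, r_b))


def choose_weight_index(original, p_inter, p_intra_d):
    """Encoder side: index of the list entry with the lowest SAD."""
    costs = [sad(original, predict(p_inter, p_intra_d, g, d)) for g, d in WEIGHT_LIST]
    return min(range(len(costs)), key=costs.__getitem__)


# Encoder picks an index; the decoder recovers the weights from the same list.
index = choose_weight_index([[110, 110]], [[100, 100]], [[10, 10]])
gamma8, delta8 = WEIGHT_LIST[index]
```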

[0256] The decoding device can derive residual samples for the current block (S740).

[0257] For example, the processor of a decoding device can derive residual samples based on residual information included in image information. The processor can derive quantized transform coefficient levels based on the residual information. The processor can derive transform coefficients by performing inverse quantization on the quantized transform coefficient levels. The processor can derive residual samples by performing an inverse secondary transform and an inverse primary transform on the transform coefficients.

[0258] The decoding device can derive reconstructed samples for the current block (S750).

[0259] For example, the processor of the decoding device may derive a reconstructed sample array based on predicted samples and / or residual samples for the current block. The processor may derive a reconstructed sample array based on predicted samples for the current block, a reconstructed sample array based on residual samples for the current block, or a reconstructed sample array based on the sum of predicted samples and residual samples for the current block.
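The reconstruction step can be sketched as a per-sample sum of prediction and residual, with the prediction used directly when no residual is present. The clipping to the sample range and the function name are assumptions for illustration.

```python
def reconstruct(pred, resid=None, bit_depth=8):
    """Reconstructed samples: prediction plus residual, clipped to the sample range.

    When no residual is available, the prediction is used as the reconstruction.
    """
    max_val = (1 << bit_depth) - 1
    if resid is None:
        return [list(row) for row in pred]
    return [[min(max(p + r, 0), max_val) for p, r in zip(row_p, row_r)]
            for row_p, row_r in zip(pred, resid)]


# Hypothetical 2x2 prediction and residual blocks.
recon = reconstruct([[100, 250], [5, 128]], [[-3, 10], [-9, 0]])
```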

[0260] As described above, the decoding device may use surrounding samples of the reference block used for inter-frame prediction to derive intra-frame prediction samples in CIIP mode or modified CIIP mode. For example, the decoding device may derive difference values between intra-frame prediction samples derived based on the surrounding samples of the reference block and intra-frame prediction samples derived based on the surrounding samples of the current block (or surrounding samples of the reference block to which luminance compensation has been applied), and derive prediction samples based on the weighted sum between the difference values and the inter-frame prediction samples.

[0261] Accordingly, prediction samples reflecting the luminance difference between the current block and the reference block used for inter-frame prediction can be derived. In other words, the texture of the current block is predicted through inter-frame prediction samples using the reference block, and the luminance difference between the reference block and the current block can be reflected through the difference values between the intra-frame prediction samples based on the surrounding samples of the reference block and the intra-frame prediction samples based on the surrounding samples of the current block (or the surrounding samples of the reference block with luminance compensation applied). As a result, prediction samples that reflect both the texture of the reference block and the luminance difference between the reference block and the current block can be derived, and the prediction performance of the prediction samples can be significantly improved.

[0262] Based on the above, the coding efficiency of the coding system can be significantly improved.

[0263] In addition, the data transmission efficiency of the coding system can be significantly improved.

[0264] FIG. 11 is a drawing illustrating a method for encoding image information according to one embodiment of the present disclosure.

[0265] The encoding method (S1100) may include operations described below.

[0266] The terms or names described below (e.g., names of syntax elements or variables, etc.) are merely examples, and the technical features of the present disclosure are not limited to the terms or names described below. For example, the image information described below may include various information according to the embodiments described in the present disclosure and may include information described in at least one of the tables described above.

[0267] The operations described below do not constitute an essential component of the encoding method according to one embodiment, and at least some of the operations described below may be omitted. Furthermore, the operations described below do not constitute a sufficient component of the encoding method according to one embodiment, and the previously described operations may be added.

[0268] The operations described below form a single embodiment integrally with the previously described operations, unless they contradict the previously described operations, and do not form a separate embodiment distinct from the previously described operations. Additionally, the operations of FIGS. 8, 9, and 10 form a single embodiment integrally with the operations of FIG. 11, and the operations of FIGS. 8, 9, and 10 do not each form a separate embodiment distinct from the others.

[0269] The encoding method (S1100) can be executed by an encoding device including a memory and a processor electrically connected to the memory, for example, by a processor.

[0270] The encoding device can determine the prediction mode for the current unit (or current block) within the current picture (S1110).

[0271] For example, the processor of the encoding device can compare the Rate Distortion (RD) cost for various prediction modes included in inter-frame prediction and intra-frame prediction, and based on the RD cost, determine a combined inter and intra prediction (CIIP) mode or a modified CIIP mode for the current unit in the current picture.

[0272] Prediction may include various prediction methods, such as intra-frame prediction, inter-frame prediction, and intra-block copy (IBC) prediction. Intra-frame prediction has various prediction modes (or prediction types), including non-directional prediction modes, directional prediction modes, Matrix-weighted Intra Prediction (MIP), Multi-Reference Line (MRL), and Intra Sub-Partitions (ISP). Likewise, inter-frame prediction has various prediction modes (or prediction types), including skip mode, regular merge mode, MMVD (Merge with Motion Vector Difference) mode, CIIP (Combined Inter and Intra Prediction) mode, triangular mode, SbTMVP (Subblock-based Temporal Motion Vector Prediction) mode, affine merge mode, regular AMVP (Advanced Motion Vector Prediction) mode, SMVD (Symmetric MVD) mode, and affine AMVP mode.

[0273] The processor can compare the Rate Distortion (RD) cost for intra-frame prediction modes and/or inter-frame prediction modes, and determine the CIIP mode or modified CIIP mode for the current unit in the current picture based on the RD cost.
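
As a non-limiting illustration, the RD-based mode decision described above can be sketched as follows. The sketch assumes the common Lagrangian form J = D + lambda * R, which the present disclosure does not specify; the mode names, the distortion and rate figures, and the parameter `lam` are hypothetical.

```python
# Hypothetical sketch: choose the prediction mode with the lowest RD cost.
# The Lagrangian cost J = D + lam * R is an assumption, not taken from the disclosure.

def rd_cost(distortion, rate_bits, lam):
    """Return the Lagrangian rate-distortion cost J = D + lam * R."""
    return distortion + lam * rate_bits


def select_mode(candidates, lam):
    """candidates maps a mode name to a (distortion, rate_bits) pair."""
    return min(candidates, key=lambda m: rd_cost(*candidates[m], lam))


# Made-up measurements for one block, for illustration only.
candidates = {
    "intra_planar": (1200.0, 40),
    "regular_merge": (900.0, 12),
    "ciip": (700.0, 18),
    "modified_ciip": (640.0, 22),
}
best = select_mode(candidates, lam=10.0)
```

With these illustrative figures the modified CIIP mode has the lowest cost, so it would be selected for the current unit.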

[0274] The encoding device can derive predicted samples for the current block (S1120).

[0275] For example, in a technique that combines inter-frame prediction and intra-frame prediction (e.g., CIIP mode or modified CIIP mode), the processor of the encoding device can derive inter-frame prediction samples for the current block based on the RD cost. In the following, the term inter-frame prediction samples has the same meaning as, and can be used interchangeably with, the term inter-frame prediction block.

[0276] The processor can compare the RD cost for various inter-frame prediction modes (or prediction types) and determine the inter-frame prediction mode (or prediction type) based on the RD cost. Inter-frame prediction modes (or prediction types) may include skip mode, regular merge mode, MMVD (Merge with Motion Vector Difference) mode, triangular mode, SbTMVP (Subblock-based Temporal Motion Vector Prediction) mode, affine merge mode, regular AMVP (Advanced Motion Vector Prediction) mode, SMVD (Symmetric MVD) mode, and affine AMVP mode.

[0277] The processor can derive a reference block for inter-frame prediction based on the derived inter-frame prediction mode and RD cost. Based on the derived reference block, the processor can generate image information containing information regarding inter-frame prediction for the current unit. For example, the information regarding inter-frame prediction for the current unit may include information regarding a prediction mode indicating an inter-frame prediction mode (e.g., skip mode, merge mode, or AMVP mode, etc.) and motion information indicating a reference block. For example, the motion information may include information regarding a reference picture indicating a reference picture and information regarding a motion vector indicating a reference block within the reference picture.

[0278] The processor can derive a reference block for inter-frame prediction and generate information regarding the inter-frame prediction mode and motion information based on the derived reference block. For example, the processor can generate information regarding a reference picture based on a reference picture and generate information regarding motion vectors based on the reference block within the reference picture.

[0279] For example, the processor may generate a list of motion information candidates based on at least some of the motion information candidates of spatial neighbor blocks of the current block, motion information candidates of temporal neighbor blocks, or history-based motion information candidates, and may generate information (e.g., index information) indicating the selected motion information among the motion information candidates.
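
As a non-limiting illustration, the construction of a motion information candidate list and the derivation of the signalled index can be sketched as follows. The candidate order, the duplicate-pruning rule, the list size, and the function names `build_candidate_list` and `signal_index` are assumptions made for this sketch and are not specified in the present disclosure.

```python
# Hypothetical sketch of building a motion information candidate list.
# Candidate order, pruning rule, and maximum list size are illustrative assumptions.

def build_candidate_list(spatial, temporal, history, max_candidates=6):
    """Merge candidates in a fixed order, skipping duplicates.

    Each candidate is a tuple (mv_x, mv_y, ref_idx).
    """
    merged = []
    for cand in list(spatial) + list(temporal) + list(history):
        if cand not in merged:
            merged.append(cand)
        if len(merged) == max_candidates:
            break
    return merged


def signal_index(candidates, chosen):
    """Return the index that would be signalled for the chosen motion information."""
    return candidates.index(chosen)


spatial = [(4, 0, 0), (4, 0, 0), (-2, 1, 0)]
temporal = [(3, 1, 1)]
history = [(-2, 1, 0), (0, 0, 0)]
cands = build_candidate_list(spatial, temporal, history)
idx = signal_index(cands, (3, 1, 1))
```

Signalling only the index lets a decoder that builds the same list recover the full motion information without it being transmitted explicitly.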

[0280] The processor can derive inter-frame prediction samples for the current block based on the samples (sample array) included in the reference block of the inter-frame prediction.
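
As a non-limiting illustration, deriving inter-frame prediction samples from the sample array of a reference block can be sketched as follows. The sketch assumes integer-sample motion vectors and clamps coordinates to the picture boundary; fractional-sample interpolation is omitted, and the function name `inter_prediction` is hypothetical.

```python
# Hypothetical sketch: integer-sample motion compensation.
# Fractional-sample interpolation filters are intentionally omitted.

def inter_prediction(ref_picture, x0, y0, width, height, mv):
    """Copy the block located at (x0 + mv_x, y0 + mv_y) in the reference picture."""
    mv_x, mv_y = mv
    max_y = len(ref_picture) - 1
    max_x = len(ref_picture[0]) - 1
    pred = []
    for y in range(height):
        ry = min(max(y0 + y + mv_y, 0), max_y)  # clamp row to picture bounds
        row = []
        for x in range(width):
            rx = min(max(x0 + x + mv_x, 0), max_x)  # clamp column to picture bounds
            row.append(ref_picture[ry][rx])
        pred.append(row)
    return pred


# Toy 6x6 reference picture whose sample value encodes its position (10 * row + column).
ref = [[10 * r + c for c in range(6)] for r in range(6)]
p_inter = inter_prediction(ref, x0=2, y0=2, width=2, height=2, mv=(1, -1))
```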

[0281] For example, in a technique that combines inter-frame prediction and intra-frame prediction (e.g., CIIP mode or modified CIIP mode), the processor of the encoding device can generate intra-frame prediction samples for the current block based on the RD cost. In the following, the term intra-frame prediction samples has the same meaning as, and can be used interchangeably with, the term intra-frame prediction block.

[0282] The processor can derive an intra-frame prediction mode for intra-frame prediction based on the RD cost. The processor can generate image information containing information regarding intra-frame prediction. For example, the information regarding intra-frame prediction may include information regarding an intra-frame prediction mode for the current block (e.g., planar mode, DC mode, directional mode, DIMD, etc.) and/or information regarding an intra-frame prediction mode for the reference block of the inter-frame prediction mode (e.g., planar mode, DC mode, directional mode, DIMD, etc.). The intra-frame prediction mode for the current block may be different from or the same as the intra-frame prediction mode for the reference block.

[0283] The processor derives an intra-frame prediction mode for the current block based on the RD cost and can generate intra-frame prediction samples based on surrounding samples of the current block according to the intra-frame prediction mode for the current block. Additionally, the processor obtains an intra-frame prediction mode for the reference block of inter-frame prediction based on the RD cost and can generate intra-frame prediction samples based on surrounding samples of the reference block according to the intra-frame prediction mode for the reference block.
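
As a non-limiting illustration, generating intra-frame prediction samples from the surrounding samples of the current block and, separately, from the surrounding samples of the reference block can be sketched as follows. DC mode is used only because it is the simplest of the modes listed above; the sample values and the function name `dc_intra_prediction` are hypothetical.

```python
# Hypothetical sketch: DC-mode intra prediction from top and left surrounding samples.
# The same routine is applied to the current block and to the reference block.

def dc_intra_prediction(top, left, width, height):
    """Fill the block with the rounded mean of the top and left surrounding samples."""
    neighbours = list(top) + list(left)
    dc = (sum(neighbours) + len(neighbours) // 2) // len(neighbours)
    return [[dc] * width for _ in range(height)]


# Surrounding samples of the current block (made-up values).
top_c, left_c = [100, 102, 104, 106], [98, 100, 102, 104]
# Surrounding samples of the reference block (made-up values, slightly darker).
top_r, left_r = [90, 92, 94, 96], [88, 90, 92, 94]

p_intra_c = dc_intra_prediction(top_c, left_c, width=4, height=4)
p_intra_r = dc_intra_prediction(top_r, left_r, width=4, height=4)
```

In this toy case the two blocks differ by a constant offset, which is the luminance difference that the later difference step is meant to capture.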

[0284] The processor generates intra-frame prediction samples based on at least one of the intra-frame prediction samples derived from surrounding samples of the current block and the intra-frame prediction samples derived from surrounding samples of the reference block, and can generate prediction samples for the current block based on the intra-frame prediction samples and the inter-frame prediction samples. In the following, the term prediction samples has the same meaning as, and may be used interchangeably with, the term prediction block.

[0285] Thus, deriving prediction samples for the current block may include deriving inter-frame prediction samples for the current block based on a reference block; deriving intra-frame prediction samples for the current block based on surrounding samples of the reference block; and deriving prediction samples for the current block based on the inter-frame prediction samples and the intra-frame prediction samples.

[0286] The intra-frame prediction samples for the current block can be derived based on the difference values between first intra-frame prediction samples derived based on the surrounding samples of the current block and second intra-frame prediction samples derived based on the surrounding samples of the reference block.

[0287] The prediction information may include at least one of information regarding a first intra-frame prediction mode for deriving the first intra-frame prediction samples and information regarding a second intra-frame prediction mode for deriving the second intra-frame prediction samples.

[0288] The first intra-frame prediction mode may be different from the second intra-frame prediction mode.

[0289] The first intra-frame prediction mode and the second intra-frame prediction mode may each include Decoder-side intra-mode derivation (DIMD).

[0290] The specific method for generating prediction samples for the current block may be the same as the method for deriving prediction samples for the current block of S810 to S850 of the operation described above in FIG. 8.

[0291] For example, the processor generates a prediction block (P_Inter) by performing motion compensation (S810); generates an intra-frame prediction block (P_IntraC) corresponding to the current block using reconstructed samples adjacent to the current block (S820); generates an intra-frame prediction block (P_IntraR) corresponding to the reference block using surrounding samples adjacent to the reference block (S830); generates a block of per-pixel-position difference values (P_IntraD) between the intra-frame prediction block generated from the surrounding samples of the current block and the intra-frame prediction block generated from the surrounding samples of the reference block (S840); and can combine the difference value block with the prediction block generated through motion compensation (S850).
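
As a non-limiting illustration, steps S840 and S850 can be sketched as follows, writing P_Inter, P_IntraC, P_IntraR, and P_IntraD for the blocks produced in S810 to S840. The sketch assumes that the difference is taken as P_IntraC minus P_IntraR and that the combination is a plain addition followed by clipping to the sample range; the disclosure only states that the blocks are combined, so both choices, the sample values, and the function names are assumptions.

```python
# Hypothetical sketch of S840 (difference block) and S850 (combination).
# Sign convention and additive combination with clipping are assumptions.

def difference_block(p_intra_c, p_intra_r):
    """S840: per-position difference P_IntraD = P_IntraC - P_IntraR."""
    return [[c - r for c, r in zip(row_c, row_r)]
            for row_c, row_r in zip(p_intra_c, p_intra_r)]


def combine(p_inter, p_intra_d, bit_depth=8):
    """S850: add the difference block to P_Inter and clip to the valid range."""
    max_val = (1 << bit_depth) - 1
    return [[min(max(i + d, 0), max_val) for i, d in zip(row_i, row_d)]
            for row_i, row_d in zip(p_inter, p_intra_d)]


p_inter = [[120, 124], [128, 250]]    # from motion compensation (S810)
p_intra_c = [[102, 102], [102, 102]]  # from surrounding samples of the current block (S820)
p_intra_r = [[92, 92], [92, 92]]      # from surrounding samples of the reference block (S830)

p_intra_d = difference_block(p_intra_c, p_intra_r)
pred = combine(p_inter, p_intra_d)
```

Here the texture comes from P_Inter, while P_IntraD shifts it by the luminance offset estimated from the two sets of surrounding samples.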

[0292] The intra-frame prediction samples for the current block can be derived based on the difference values between third intra-frame prediction samples derived based on the surrounding samples of the reference block and fourth intra-frame prediction samples derived by applying luminance compensation to the third intra-frame prediction samples.

[0293] The luminance compensation parameter for the above luminance compensation can be determined such that the difference between the surrounding samples of the current block and the surrounding samples of the reference block is minimized.
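
As a non-limiting illustration, determining luminance compensation parameters that minimize the difference between the two sets of surrounding samples can be sketched as follows. The sketch assumes a linear model N_C ≈ a * N_R + b fitted by least squares; the disclosure does not state the model or the error measure, and the function name `derive_lic_params` and the sample values are hypothetical.

```python
# Hypothetical sketch: least-squares fit of a linear luminance compensation model.
# The linear model and the squared-error criterion are assumptions.

def derive_lic_params(n_r, n_c):
    """Return (a, b) minimizing sum((a * r + b - c) ** 2) over paired samples."""
    n = len(n_r)
    sum_r = sum(n_r)
    sum_c = sum(n_c)
    sum_rr = sum(r * r for r in n_r)
    sum_rc = sum(r * c for r, c in zip(n_r, n_c))
    denom = n * sum_rr - sum_r * sum_r
    if denom == 0:
        # Flat reference neighbourhood: fall back to a pure offset.
        return 1.0, (sum_c - sum_r) / n
    a = (n * sum_rc - sum_r * sum_c) / denom
    b = (sum_c - a * sum_r) / n
    return a, b


n_r = [80, 90, 100, 110]   # surrounding samples of the reference block (made up)
n_c = [124, 139, 154, 169]  # surrounding samples of the current block (made up)
a, b = derive_lic_params(n_r, n_c)
```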

[0294] The specific method for generating prediction samples for the current block may be the same as the method for deriving prediction samples for the current block of S910 to S960 of the operation described above in FIG. 9.

[0295] For example, the processor generates a prediction block (P_Inter) by performing motion compensation (S910); generates an intra-frame prediction block (P_IntraR) corresponding to the current block using surrounding samples adjacent to the reference block (S920); derives parameters that compensate for the luminance difference using the surrounding samples (N_C) of the current block and the surrounding samples (N_R) of the reference block (S930); applies the derived luminance compensation parameters to the intra-frame prediction block generated from the surrounding samples of the reference block (S940); generates a block of per-pixel-position difference values (P_IntraD) between the intra-frame prediction block generated from the surrounding samples of the reference block and the prediction block to which the luminance compensation parameters were applied (S950); and can combine the inter-frame prediction block generated through motion compensation with the corrected intra-frame prediction block generated by applying the luminance compensation parameters to the intra-frame prediction block (S960).
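
As a non-limiting illustration, steps S940 to S960 can be sketched as follows, writing P_Inter, P_IntraR, and P_IntraD for the blocks named in those steps. The sketch assumes a linear compensation a * s + b, a difference taken as the compensated block minus P_IntraR, and an additive combination with clipping; these choices, the parameter values, and the function names are assumptions rather than details stated in the disclosure.

```python
# Hypothetical sketch of S940 to S960: compensate the intra block, take the
# difference, and combine it with the motion-compensated block.

def apply_lic(block, a, b):
    """S940: apply the linear luminance compensation a * s + b to every sample."""
    return [[int(round(a * s + b)) for s in row] for row in block]


def difference_block(compensated, original):
    """S950: per-position difference between compensated and original blocks."""
    return [[c - o for c, o in zip(row_c, row_o)]
            for row_c, row_o in zip(compensated, original)]


def combine(p_inter, p_intra_d, bit_depth=8):
    """S960 (assumed additive): add the difference block to P_Inter and clip."""
    max_val = (1 << bit_depth) - 1
    return [[min(max(i + d, 0), max_val) for i, d in zip(row_i, row_d)]
            for row_i, row_d in zip(p_inter, p_intra_d)]


p_inter = [[100, 101], [102, 240]]  # S910 output (made-up values)
p_intra_r = [[92, 96], [92, 96]]    # S920 output (made-up values)
a, b = 1.5, -30.0                   # S930 output (made-up parameters)

p_intra_r_comp = apply_lic(p_intra_r, a, b)
p_intra_d = difference_block(p_intra_r_comp, p_intra_r)
pred = combine(p_inter, p_intra_d)
```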

[0296] The intra-frame prediction samples for the current block can be derived based on the difference values between fifth intra-frame prediction samples derived based on the surrounding samples of the reference block and sixth intra-frame prediction samples derived based on the surrounding samples of the reference block to which luminance compensation is applied.

[0297] The luminance compensation parameter for the above luminance compensation can be determined such that the difference between the surrounding samples of the current block and the surrounding samples of the reference block is minimized.

[0298] The prediction information may include at least one of information regarding a fifth intra-frame prediction mode for generating the fifth intra-frame prediction samples and information regarding a sixth intra-frame prediction mode for generating the sixth intra-frame prediction samples.

[0299] The fifth intra-frame prediction mode may be the same as the sixth intra-frame prediction mode.

[0300] The specific method for generating prediction samples for the current block may be the same as the method for deriving prediction samples for the current block of S1010 to S1070 of the operation described above in FIG. 10.

[0301] For example, the processor generates a prediction block (P_Inter) by performing motion compensation (S1010); generates an intra-frame prediction block (P_IntraR) corresponding to the current block using surrounding samples adjacent to the reference block (S1020); derives parameters that compensate for the luminance difference using the surrounding samples (N_C) of the current block and the surrounding samples (N_R) of the reference block (S1030); applies the derived luminance compensation parameters to the surrounding samples of the reference block (S1040); generates an intra-frame prediction block (P'_IntraR) using the luminance-compensated surrounding samples (S1050); generates a block of per-pixel-position difference values (P_IntraD) between the prediction block generated from the surrounding samples of the reference block and the prediction block generated from the luminance-compensated surrounding samples of the reference block (S1060); and can combine the inter-frame prediction block generated through motion compensation with the corrected intra-frame prediction block generated using the luminance compensation parameters (S1070).
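
As a non-limiting illustration, steps S1040 to S1070 can be sketched as follows. Unlike the flow of FIG. 9, the luminance compensation is applied to the surrounding samples before intra-frame prediction is performed. The sketch assumes DC-mode prediction, a linear compensation a * s + b, a difference taken as the compensated prediction minus the uncompensated one, and an additive combination with clipping; all of these, together with the sample values and function names, are assumptions.

```python
# Hypothetical sketch of S1040 to S1070: compensate the surrounding samples of the
# reference block first, then predict, take the difference, and combine.

def dc_intra_prediction(top, left, width, height):
    """DC mode: fill the block with the rounded mean of the surrounding samples."""
    neighbours = list(top) + list(left)
    dc = (sum(neighbours) + len(neighbours) // 2) // len(neighbours)
    return [[dc] * width for _ in range(height)]


def compensate(samples, a, b):
    """S1040: apply a * s + b to each surrounding sample."""
    return [int(round(a * s + b)) for s in samples]


def combine(p_inter, p_intra_d, bit_depth=8):
    """S1070 (assumed additive): add the difference block to P_Inter and clip."""
    max_val = (1 << bit_depth) - 1
    return [[min(max(i + d, 0), max_val) for i, d in zip(row_i, row_d)]
            for row_i, row_d in zip(p_inter, p_intra_d)]


top_r, left_r = [90, 92, 94, 96], [88, 90, 92, 94]  # made-up surrounding samples
a, b = 1.5, -30.0                                   # made-up S1030 parameters
p_inter = [[100, 101], [102, 245]]                  # made-up S1010 output

p_intra_r = dc_intra_prediction(top_r, left_r, 2, 2)                  # S1020
p_intra_r_comp = dc_intra_prediction(compensate(top_r, a, b),
                                     compensate(left_r, a, b), 2, 2)  # S1040, S1050
p_intra_d = [[c - o for c, o in zip(rc, ro)]
             for rc, ro in zip(p_intra_r_comp, p_intra_r)]            # S1060
pred = combine(p_inter, p_intra_d)                                    # S1070
```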

[0302] The prediction samples for the current block can be derived based on a weighted sum of the inter-frame prediction samples and the intra-frame prediction samples.

[0303] The weights of the weighted sum of the inter-frame prediction samples and the intra-frame prediction samples may vary depending on the region within the current block.

[0304] The weight of the weighted sum of the inter-frame prediction samples and the intra-frame prediction samples can be derived based on whether intra-frame prediction is applied to the surrounding blocks of the current block.
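
As a non-limiting illustration, a weighted sum whose weight depends on whether the surrounding blocks are intra-coded can be sketched as follows. The rule shown mirrors the one used by CIIP in VVC (an intra weight of 1, 2, or 3 out of 4, depending on how many of the top and left neighbouring blocks are intra-coded); the present disclosure does not state that the same rule is used, and the function names and sample values are hypothetical.

```python
# Hypothetical sketch: weighted sum of inter and intra prediction samples.
# The weight rule follows VVC-style CIIP and is an assumption for this disclosure.

def intra_weight(top_is_intra, left_is_intra):
    """Return the intra weight (out of 4) from the coding type of the neighbours."""
    return 1 + int(top_is_intra) + int(left_is_intra)


def weighted_prediction(p_inter, p_intra, wt):
    """Blend with integer arithmetic: ((4 - wt) * inter + wt * intra + 2) >> 2."""
    return [[((4 - wt) * i + wt * a + 2) >> 2 for i, a in zip(row_i, row_a)]
            for row_i, row_a in zip(p_inter, p_intra)]


p_inter = [[100, 104]]
p_intra = [[120, 80]]

pred_none = weighted_prediction(p_inter, p_intra, intra_weight(False, False))
pred_one = weighted_prediction(p_inter, p_intra, intra_weight(True, False))
pred_both = weighted_prediction(p_inter, p_intra, intra_weight(True, True))
```

A region-dependent variant would compute `wt` per sample position instead of once per block.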

[0305] The encoding device can generate prediction information (S1130).

[0306] For example, the processor of the encoding device may generate prediction information including information regarding the CIIP mode or modified CIIP mode based on determining the CIIP mode or modified CIIP mode for the current unit (or current block).

[0307] The processor can generate prediction information including information regarding the inter-frame prediction mode based on determining the inter-frame prediction mode for the current unit (or current block).

[0308] Based on the motion information generated for inter-frame prediction, the processor can generate prediction information including information about the motion indicating the motion information (e.g., index information).

[0309] The processor can generate prediction information including information about the intra-frame prediction mode based on determining the intra-frame prediction mode for the current unit (or current block).

[0310] The processor may optionally generate prediction information including information about weight values based on determining weight values for a weighted sum of inter-frame prediction samples and intra-frame prediction samples.

[0311] The encoding device can generate residual information for the current block (S1140).

[0312] For example, the processor of the encoding device can derive residual samples for the current block and generate residual information based on the residual samples.

[0313] The processor can derive residual samples based on the difference between the original samples and the predicted samples for the current block.

[0314] The processor can generate residual information based on residual samples. The processor can derive transform coefficients by performing first and second transforms on the residual samples. The processor can derive quantized transform coefficient levels by performing quantization on the transform coefficients. The processor can generate residual information based on the quantized transform coefficient levels.

[0315] The encoding device can encode image information (S1150).
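
As a non-limiting illustration, deriving residual samples and quantized levels can be sketched as follows. To keep the sketch short, the transform stage is replaced by an identity (the residual samples are quantized directly), and uniform scalar quantization with a fixed step is assumed; neither choice reflects the transforms or quantizer of the disclosure, and the function names and values are hypothetical.

```python
# Hypothetical sketch: residual derivation and uniform scalar quantization.
# The transform stage is skipped (identity), which is a simplifying assumption.

def residual(original, predicted):
    """Residual samples: original minus predicted, per position."""
    return [[o - p for o, p in zip(row_o, row_p)]
            for row_o, row_p in zip(original, predicted)]


def quantize(coeffs, step):
    """Round each coefficient magnitude to the nearest multiple of step, keep the sign."""
    def q(c):
        level = (abs(c) + step // 2) // step
        return level if c >= 0 else -level
    return [[q(c) for c in row] for row in coeffs]


original = [[130, 120], [99, 140]]
predicted = [[118, 119], [118, 120]]

res = residual(original, predicted)
levels = quantize(res, step=8)
```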

[0316] For example, the processor of an encoding device can encode image information including prediction information and residual information.

[0317] Image information may be in various forms. For example, image information may be a syntax element or a syntax structure containing one or more syntax elements. Additionally, image information may be a raw byte sequence payload (RBSP) containing one or more syntax elements or one or more syntax structures. Additionally, image information may be a Network Abstraction Layer (NAL) unit containing one or more RBSPs or a bitstream containing one or more NAL units.

[0318] As described above, the encoding device may use surrounding samples of the reference block used for inter-frame prediction to derive intra-frame prediction samples in CIIP mode or a modified CIIP mode. For example, the encoding device may derive difference values between intra-frame prediction samples derived based on the surrounding samples of the reference block and intra-frame prediction samples derived based on the surrounding samples of the current block (or surrounding samples of the reference block to which luminance compensation has been applied), and derive prediction samples based on the weighted sum of the difference values and the inter-frame prediction samples.

[0319] Accordingly, prediction samples reflecting the luminance difference between the current block and the reference block used for inter-frame prediction can be derived. In other words, the texture of the current block is predicted through inter-frame prediction samples using the reference block, and the luminance difference between the reference block and the current block can be reflected through the difference values between the intra-frame prediction samples based on the reference block's surrounding samples and the intra-frame prediction samples based on the current block's surrounding samples (or the reference block's surrounding samples with luminance compensation applied). As a result, prediction samples reflecting both the texture of the reference block and the luminance difference between the reference block and the current block can be derived, and the prediction performance of the prediction samples can be significantly improved.

[0320] Based on the above, the coding efficiency of the coding system can be significantly improved.

[0321] In addition, the data transmission efficiency of the coding system can be significantly improved.

[0322] A bitstream is generated based on video information encoded according to the encoding method described above, and the bitstream can be stored on a computer-readable storage medium.

[0323] A bitstream is generated based on video information encoded according to the encoding method described above, and the bitstream can be transmitted through a transmission unit and / or a transmission medium.

[0324] FIG. 12 is a drawing illustrating an exemplary content streaming system to which an embodiment according to the present disclosure can be applied.

[0325] As illustrated in FIG. 12, a content streaming system to which an embodiment of the present disclosure is applied may largely include an encoding server, a streaming server, a web server, a media storage, a user device, and a multimedia input device.

[0326] The above encoding server compresses content input from multimedia input devices, such as smartphones, cameras, and camcorders, into digital data to generate a bitstream and transmits it to the streaming server. As another example, if multimedia input devices, such as smartphones, cameras, and camcorders, generate the bitstream directly, the encoding server may be omitted.

[0327] The bitstream may be generated by a video encoding method and / or encoding device to which an embodiment of the present disclosure is applied, and the streaming server may temporarily store the bitstream during the process of transmitting or receiving the bitstream.

[0328] The streaming server transmits multimedia data to a user device based on a user request through a web server, and the web server can act as a medium to inform the user of available services. When a user requests a desired service from the web server, the web server transmits it to the streaming server, and the streaming server can transmit multimedia data to the user. At this time, the content streaming system may include a separate control server, and in this case, the control server can perform the role of controlling commands and responses between each device within the content streaming system.

[0329] The streaming server can receive content from a media storage and / or an encoding server. For example, when receiving content from the encoding server, the content can be received in real time. In this case, to provide a seamless streaming service, the streaming server can store the bitstream for a certain period of time.

[0330] Examples of the above user devices may include mobile phones, smartphones, laptop computers, digital broadcasting terminals, PDAs (personal digital assistants), PMPs (portable multimedia players), navigation systems, slate PCs, tablet PCs, ultrabooks, wearable devices (e.g., smartwatches, smart glasses, HMDs (head-mounted displays)), digital TVs, desktop computers, digital signage, etc.

[0331] Each server within the above-mentioned content streaming system can be operated as a distributed server, and in this case, data received from each server can be processed in a distributed manner.

[0332] The scope of the present disclosure includes software or machine-executable instructions (e.g., operating system, application, firmware, program, etc.) that enable an operation according to a method of various embodiments to be executed on a device or computer, and a non-transitory computer-readable medium on which such software or instructions, etc. are stored and executable on a device or computer.

[0333] An embodiment according to the present disclosure can be used to encode / decode images.

Claims

1. A method for decoding image information, the method comprising: acquiring image information including prediction information; deriving a prediction mode for a current block based on the prediction information; and deriving prediction samples for the current block based on the prediction mode, wherein deriving the prediction samples for the current block comprises: deriving inter-frame prediction samples for the current block based on a reference block; deriving intra-frame prediction samples for the current block based on surrounding samples of the reference block; and deriving the prediction samples for the current block based on the inter-frame prediction samples and the intra-frame prediction samples.

2. The method of claim 1, wherein the intra-frame prediction samples for the current block are derived based on difference values between first intra-frame prediction samples derived based on surrounding samples of the current block and second intra-frame prediction samples derived based on the surrounding samples of the reference block.

3. The method of claim 2, wherein the prediction information includes at least one of information regarding a first intra-frame prediction mode for deriving the first intra-frame prediction samples and information regarding a second intra-frame prediction mode for deriving the second intra-frame prediction samples.

4. The method of claim 3, wherein the first intra-frame prediction mode is different from the second intra-frame prediction mode.

5. The method of claim 3, wherein the first intra-frame prediction mode and the second intra-frame prediction mode each include decoder-side intra mode derivation (DIMD).

6. The method of claim 1, wherein the intra-frame prediction samples for the current block are derived based on difference values between third intra-frame prediction samples derived based on the surrounding samples of the reference block and fourth intra-frame prediction samples derived by applying luminance compensation to the third intra-frame prediction samples.

7. The method of claim 6, wherein a luminance compensation parameter for the luminance compensation is determined such that the difference between surrounding samples of the current block and the surrounding samples of the reference block is minimized.

8. The method of claim 1, wherein the intra-frame prediction samples for the current block are derived based on difference values between fifth intra-frame prediction samples derived based on the surrounding samples of the reference block and sixth intra-frame prediction samples derived based on the surrounding samples of the reference block to which luminance compensation is applied.

9. The method of claim 8, wherein a luminance compensation parameter for the luminance compensation is determined such that the difference between surrounding samples of the current block and the surrounding samples of the reference block is minimized.

10. The method of claim 8, wherein the prediction information includes at least one of information regarding a fifth intra-frame prediction mode for generating the fifth intra-frame prediction samples and information regarding a sixth intra-frame prediction mode for generating the sixth intra-frame prediction samples.

11. The method of claim 10, wherein the fifth intra-frame prediction mode is the same as the sixth intra-frame prediction mode.

12. The method of claim 1, wherein the prediction samples for the current block are derived based on a weighted sum of the inter-frame prediction samples and the intra-frame prediction samples.

13. The method of claim 12, wherein the weights of the weighted sum of the inter-frame prediction samples and the intra-frame prediction samples differ depending on the region within the current block.

14. The method of claim 12, wherein the weight of the weighted sum of the inter-frame prediction samples and the intra-frame prediction samples is derived based on whether intra-frame prediction is applied to surrounding blocks of the current block.

15. A method for encoding image information, the method comprising: determining a prediction mode for a current block; generating prediction samples for the current block based on the prediction mode; generating prediction information based on the prediction mode; and encoding image information including the prediction information, wherein generating the prediction samples for the current block comprises: generating inter-frame prediction samples for the current block based on a reference block; generating intra-frame prediction samples for the current block based on surrounding samples of the reference block; and generating the prediction samples for the current block based on the inter-frame prediction samples and the intra-frame prediction samples.

16. A method relating to a bitstream, the method comprising: generating a bitstream; and transmitting data regarding the bitstream, wherein the bitstream is generated based on determining a prediction mode for a current block, generating prediction samples for the current block based on the prediction mode, generating prediction information based on the prediction mode, and encoding image information including the prediction information, and wherein generating the prediction samples for the current block comprises: generating inter-frame prediction samples for the current block based on a reference block; generating intra-frame prediction samples for the current block based on surrounding samples of the reference block; and generating the prediction samples for the current block based on the inter-frame prediction samples and the intra-frame prediction samples.

17. A computer-readable storage medium storing a bitstream generated based on the method of claim 15.