Image encoding / decoding method and device, and recording medium for storing bitstream

The image encoding/decoding method addresses inefficiencies in existing technologies by using a transformation kernel derived from intra-prediction modes and HoG to improve encoding/decoding efficiency for high-resolution video formats, reducing data volume and costs.

WO2026134732A1 · PCT designated stage · Publication Date: 2026-06-25 · HYUNDAI MOTOR CO LTD +1

Patent Information

Authority / Receiving Office: WO · WO
Patent Type: Applications
Current Assignee / Owner: HYUNDAI MOTOR CO LTD
Filing Date: 2025-11-21
Publication Date: 2026-06-25

Application Information

WO2026134732A1

IPC: H04N19/122; H04N19/61; H04N19/11

AI Tagging

Application Domain

Digital video signal modification

Technology Topics

Computer hardware · Coding decoding

AI Technical Summary

Technical Problem

Existing video encoding and decoding technologies have limited coding efficiency because they use transform kernels that do not adequately reflect the characteristics of the current block. This matters most for high-resolution, high-quality video such as UHD, VR, AR, and game footage, where it leads to increased data volume and higher transmission and storage costs.

Method used

An image encoding/decoding method that determines a transformation kernel based on intra-prediction modes using a Histogram of Gradient (HoG) and neural-network based intra prediction, deriving a transformation kernel from the first and second intra prediction modes, and applying it to improve encoding/decoding efficiency.

Benefits of technology

Enhances encoding/decoding efficiency by accurately determining a transformation kernel tailored to the block characteristics, reducing data volume and associated costs for high-resolution and high-quality video formats.

✦ Generated by Eureka AI based on patent content.

Abstract

Provided are image encoding / decoding method and device, a recording medium storing a bitstream, and a transmission method. The image decoding method may comprise the steps of: acquiring, from a bitstream, a transform coefficient for the current block; deriving a prediction block by predicting the current block; generating, on the basis of the prediction block, a histogram of gradient (HoG); deriving, on the basis of the HoG, a first intra prediction mode and a second intra prediction mode for determining a transform kernel; determining, on the basis of the first intra prediction mode and the second intra prediction mode, a transform kernel for the transform coefficient; and inversely transforming, on the basis of the transform kernel for the transform coefficient, the transform coefficient.

Description

Video encoding / decoding method, device, and recording medium storing a bitstream

[0001] The present disclosure relates to an image encoding / decoding method, an apparatus, and a recording medium storing a bitstream. Specifically, the present disclosure relates to an image encoding / decoding method, an apparatus, and a recording medium storing a bitstream based on a method for determining a transformation kernel based on intra-prediction mode derivation.

[0002] Recently, the demand for high-resolution, high-quality video, such as UHD (Ultra High Definition) video, is increasing across various application fields. Furthermore, interest in and demand for immersive media, including VR (Virtual Reality), AR (Augmented Reality), and holograms, are also on the rise. Additionally, the broadcasting of video with characteristics distinct from real-world footage, such as game footage, is also increasing. As video data becomes higher in resolution and quality, the volume of data increases relative to conventional video data; consequently, transmission and storage costs rise when video data is transmitted over existing wired or wireless broadband lines or stored on conventional storage media. To address these issues arising from the increase in resolution and quality, high-efficiency video encoding and decoding technologies for video with higher resolution and quality are required.

[0003] Conventionally, transformation is performed based on a transformation kernel that does not properly reflect the characteristics of the current block, which may limit coding efficiency.

[0004] The present disclosure aims to provide an image encoding / decoding method and apparatus with improved encoding / decoding efficiency.

[0005] In addition, the present disclosure aims to provide a recording medium storing a bitstream generated by an image encoding method or device according to the present disclosure.

[0006] In addition, the present disclosure aims to provide an intra-prediction mode derivation method for determining a transformation kernel and a transformation kernel determination method to solve the above-mentioned problems.

[0007] A video decoding method according to one embodiment of the present disclosure may include the steps of: obtaining a transformation coefficient for a current block from a bitstream; performing a prediction on the current block to derive a prediction block; generating a Histogram of Gradient (HoG) based on the prediction block; deriving a first intra prediction mode and a second intra prediction mode for determining a transformation kernel based on the Histogram of Gradient; determining a transformation kernel for the transformation coefficient based on the first intra prediction mode and the second intra prediction mode; and performing an inverse transformation of the transformation coefficient based on the transformation kernel for the transformation coefficient.

[0008] In the above-described image decoding method, the step of determining a transformation kernel for the transformation coefficient further includes the step of deriving an index representing the difference between the value of the first intra prediction mode and the value of the second intra prediction mode, and the transformation kernel for the transformation coefficient may be determined as one of a plurality of transformation kernels based on the index.
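The index derivation in [0008] can be sketched as follows. This is only an illustration under stated assumptions: the source says the index represents the difference between the two mode values, but it does not give the mapping, so the bucket width, the clamping, and the placeholder kernel names below are all hypothetical.

```python
# Hypothetical list of candidate kernels; the names are placeholders only.
KERNELS = ["kernel_A", "kernel_B", "kernel_C", "kernel_D"]


def kernel_index(mode1, mode2, num_kernels, bucket=8):
    """Map the absolute difference of two intra mode values to an index.

    The bucketing (difference // bucket, clamped to the last kernel) is an
    assumption made for this sketch, not a rule stated in the source.
    """
    diff = abs(mode1 - mode2)
    return min(diff // bucket, num_kernels - 1)


def select_kernel(mode1, mode2):
    """Pick one of the candidate kernels based on the derived index."""
    return KERNELS[kernel_index(mode1, mode2, len(KERNELS))]
```

With this hypothetical bucketing, two nearly identical directions select the first kernel, while two very different directions select the last one.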

[0009] In the above image decoding method, the step of deriving the prediction block is performed based on a neural-network based intra prediction mode, and the neural-network based intra prediction mode may use a neural network model that takes adjacent reference samples of the current block as input.

[0010] In the above image decoding method, the first intra prediction mode and the second intra prediction mode may be intra prediction modes corresponding to the top two directions in order of largest amplitude in the gradient histogram.
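Selecting the two modes with the largest amplitudes, as described in [0010], can be sketched as below. The histogram is assumed to be a dictionary from intra mode number to accumulated amplitude; that representation and the tie-breaking rule (lower mode number first) are assumptions of this sketch.

```python
def top_two_modes(hog):
    """Return the two mode numbers with the largest accumulated amplitude.

    hog: dict mapping an intra prediction mode number to its amplitude.
    Ties are broken toward the smaller mode number (an assumption).
    """
    if len(hog) < 2:
        raise ValueError("need at least two histogram entries")
    ranked = sorted(hog, key=lambda mode: (-hog[mode], mode))
    return ranked[0], ranked[1]
```

For example, a histogram `{2: 1.0, 18: 7.5, 34: 3.0, 50: 9.0}` yields mode 50 as the first mode and mode 18 as the second.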

[0011] In the above image decoding method, the gradient histogram is generated by accumulating the gradients of pixels included in the prediction block, and the gradients of pixels included in the prediction block can be obtained by applying a boundary detection filter to the pixels included in the prediction block.
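A minimal sketch of the accumulation in [0011] follows. It assumes a 3x3 Sobel operator as the boundary detection filter, uses |gx| + |gy| as the amplitude, and bins by gradient orientation into a fixed number of angle bins; the source does not fix any of these choices, and the mapping from angle bins to intra prediction modes is left out.

```python
import math

# 3x3 Sobel operators, assumed here as the boundary detection filter.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def build_hog(block, num_bins=8):
    """Accumulate gradient amplitudes of interior pixels into angle bins.

    block: 2-D list of sample values (rows of equal length).
    Returns a list of num_bins accumulated amplitudes.
    """
    height, width = len(block), len(block[0])
    hog = [0.0] * num_bins
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = 0
            gy = 0
            for j in range(3):
                for i in range(3):
                    sample = block[y + j - 1][x + i - 1]
                    gx += SOBEL_X[j][i] * sample
                    gy += SOBEL_Y[j][i] * sample
            if gx == 0 and gy == 0:
                continue  # flat area, no direction to record
            # Orientation folded into [0, pi); opposite gradients share a bin.
            angle = math.atan2(gy, gx) % math.pi
            bin_idx = min(int(angle / math.pi * num_bins), num_bins - 1)
            hog[bin_idx] += abs(gx) + abs(gy)
    return hog
```

A block containing a single vertical edge puts all of its amplitude into the first bin, because every non-zero gradient points horizontally.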

[0012] In the above image decoding method, the size of the boundary detection filter can be determined based on the size of the current block.

[0013] In the above image decoding method, the gradient of the pixels included in the prediction block can be accumulated in the gradient histogram based on the positions of the pixels included in the prediction block.

[0014] The above image decoding method further includes the step of sub-sampling the prediction block, and the gradient histogram can be generated by accumulating the gradients of pixels included in the sub-sampled prediction block.
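The sub-sampling step in [0014] could look like the sketch below, which keeps every second sample in each direction. The factor of 2 and the choice of plain decimation (rather than averaging) are assumptions; the source only states that the prediction block is sub-sampled before gradients are accumulated.

```python
def subsample(block, step=2):
    """Keep every `step`-th row and column of a 2-D list of samples."""
    return [row[::step] for row in block[::step]]
```

An 8x8 prediction block becomes 4x4 with the default step, so the gradient filter is applied to a quarter of the samples.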

[0015] In the above image decoding method, the transformation kernel for the transformation coefficient may be a separable transformation kernel.
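To illustrate what a separable kernel means in practice, the sketch below applies one 1-D kernel along the columns and another along the rows, and inverts the result with the transposed kernels. An orthonormal DCT-II matrix is used as the example kernel; that choice is an assumption, since the source does not say which separable kernels are candidates.

```python
import math


def dct2_matrix(n):
    """Orthonormal n x n DCT-II matrix (row k is the k-th basis vector)."""
    return [
        [
            math.sqrt((1 if k == 0 else 2) / n)
            * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for i in range(n)
        ]
        for k in range(n)
    ]


def matmul(a, b):
    """Plain matrix product of two 2-D lists."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(m):
    return [list(row) for row in zip(*m)]


def forward_transform(residual, v_kernel, h_kernel):
    """Separable forward transform: V * X * H^T."""
    return matmul(matmul(v_kernel, residual), transpose(h_kernel))


def inverse_transform(coeffs, v_kernel, h_kernel):
    """Separable inverse for orthonormal kernels: V^T * Y * H."""
    return matmul(matmul(transpose(v_kernel), coeffs), h_kernel)
```

Because the vertical and horizontal kernels are passed separately, a decoder following this sketch could pair different 1-D kernels per direction without changing the transform routines.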

[0016] In the above image decoding method, the transformation kernel for the transformation coefficient can be determined as one of a plurality of transformation kernels based on the size of the current block.

[0017] A video encoding method according to one embodiment of the present disclosure may include: a step of performing a prediction on a current block to derive a prediction block; a step of deriving a residual block of the current block based on the prediction block; a step of generating a Histogram of Gradient (HoG) based on the prediction block; a step of deriving a first intra prediction mode and a second intra prediction mode for determining a transformation kernel based on the Histogram of Gradient; a step of determining a transformation kernel for the residual block based on the first intra prediction mode and the second intra prediction mode; and a step of performing a transformation of the residual block based on the transformation kernel for the residual block.

[0018] A non-transitory computer-readable recording medium according to one embodiment of the present disclosure can store a bitstream generated by the image encoding method.

[0019] A bitstream transmission method according to one embodiment of the present disclosure can transmit a bitstream generated by the image encoding method.

[0020] The features briefly summarized above regarding the present disclosure are merely exemplary aspects of the detailed description of the present disclosure that follows and do not limit the scope of the present disclosure.

[0021] According to the present disclosure, an image encoding / decoding method and apparatus with improved encoding / decoding efficiency may be provided.

[0022] Additionally, according to the present disclosure, a method for deriving an intra-prediction mode for determining a transformation kernel may be provided.

[0023] Additionally, according to the present disclosure, a method for determining a transformation kernel for transformation / inverse transformation may be provided.

[0024] In addition, according to the present disclosure, a suitable transformation kernel can be determined, thereby improving transformation efficiency.

[0025] The effects obtainable from the present disclosure are not limited to those mentioned above, and other unmentioned effects will be clearly understood by those skilled in the art to which the present disclosure pertains from the description below.

[0026] FIG. 1 is a block diagram showing the configuration according to one embodiment of an encoding device to which the present disclosure applies.

[0027] FIG. 2 is a block diagram showing the configuration according to one embodiment of a decoding device to which the present disclosure is applied.

[0028] FIG. 3 is a schematic diagram illustrating a video coding system to which the present disclosure can be applied.

[0029] FIG. 4 is a drawing for explaining the expansion of a peripheral area according to one embodiment of the present disclosure.

[0030] FIG. 5 is a diagram illustrating the addition of blocks in which an intra prediction mode is accumulated according to one embodiment of the present disclosure.

[0031] FIG. 6 is a flowchart illustrating a decoding method according to one embodiment of the present disclosure.

[0032] FIG. 7 is a flowchart illustrating an encoding method according to one embodiment of the present disclosure.

[0033] FIG. 8 is a drawing illustrating an exemplary content streaming system to which an embodiment according to the present disclosure can be applied.

[0034] A video decoding method according to one embodiment of the present disclosure may include: obtaining a transformation coefficient for a current block from a bitstream; performing a prediction on the current block to derive a prediction block; generating a Histogram of Gradient (HoG) based on the prediction block; deriving a first intra prediction mode and a second intra prediction mode for determining a transformation kernel based on the Histogram of Gradient; determining a transformation kernel for the transformation coefficient based on the first intra prediction mode and the second intra prediction mode; and performing an inverse transformation of the transformation coefficient based on the transformation kernel for the transformation coefficient.

[0035] The present disclosure is subject to various modifications and may have various embodiments, and specific embodiments are illustrated in the drawings and described in detail in the detailed description. However, this is not intended to limit the present disclosure to specific embodiments, and it should be understood that it includes all modifications, equivalents, and substitutions that fall within the spirit and scope of the present disclosure. Similar reference numerals in the drawings refer to the same or similar functions across various aspects. The shapes and sizes of elements in the drawings may be provided illustratively for clearer explanation. The detailed description of the exemplary embodiments described below refers to the accompanying drawings, which illustrate specific embodiments. These embodiments are described in sufficient detail to enable those skilled in the art to practice the embodiments. It should be understood that various embodiments are different but need not be mutually exclusive. For example, specific shapes, structures, and characteristics described herein may be implemented in other embodiments without departing from the spirit and scope of the present disclosure in relation to one embodiment. It should also be understood that the location or arrangement of individual components within each disclosed embodiment may be changed without departing from the spirit and scope of the embodiment. Accordingly, the following detailed description is not intended to be taken in a limiting sense, and the scope of exemplary embodiments is limited only by the appended claims, together with all equivalents to those claimed therein, provided they are properly described.

[0036] In this disclosure, terms such as first, second, etc. may be used to describe various components, but said components should not be limited by said terms. Such terms are used solely for the purpose of distinguishing one component from another. For example, without departing from the scope of this disclosure, the first component may be named the second component, and similarly, the second component may be named the first component. The term "and / or" includes a combination of a plurality of related described items or any of a plurality of related described items.

[0037] The components shown in the embodiments of the present disclosure are depicted independently to represent different characteristic functions and do not imply that each component consists of separate hardware or a single software unit. That is, each component is listed and included as a separate component for convenience of explanation; however, at least two of the components may be combined to form a single component, or a single component may be divided into multiple components to perform a function, and such integrated and separated embodiments of each component are included within the scope of the rights of the present disclosure as long as they do not deviate from the essence of the present disclosure.

[0038] The terms used in this disclosure are used merely to describe specific embodiments and are not intended to limit this disclosure. Singular expressions include plural expressions unless the context clearly indicates otherwise. Additionally, some components of this disclosure may not be essential components performing an essential function in this disclosure, but may be optional components merely for enhancing performance. This disclosure may be implemented by including only the components essential to embody the essence of this disclosure, excluding components used merely for performance enhancement, and a structure including only the essential components, excluding optional components used merely for performance enhancement, is also included within the scope of this disclosure.

[0039] In the embodiments, the term "at least one" may mean one of a number of 1 or more, such as 1, 2, 3, and 4. In the embodiments, the term "a plurality of" may mean one of a number of 2 or more, such as 2, 3, and 4.

[0040] Hereinafter, embodiments of the present disclosure will be described in detail with reference to the drawings. In describing the embodiments of this specification, if it is determined that a detailed description of related known configurations or functions may obscure the gist of this specification, such detailed description is omitted, and the same reference numerals are used for identical components in the drawings, and redundant descriptions of identical components are omitted.

[0041] Glossary

[0042] In the following, “image” may refer to a single picture constituting a video, or it may refer to the video itself. For example, “encoding and / or decoding of an image” may mean “encoding and / or decoding of a video”, and may also mean “encoding and / or decoding of one of the images constituting the video”.

[0043] In the following, "image" and "video" may be used interchangeably with the same meaning. Additionally, the target image may be an image to be encoded and / or an image to be decoded. Furthermore, the target image may be an input image fed into an encoding device or an input image fed into a decoding device. Here, the target image may have the same meaning as the current image.

[0044] In the following, the terms encoder and image encoding device may be used interchangeably.

[0045] In the following, the terms decoder and image decoding device may be used interchangeably.

[0046] In the following, "image," "picture," "frame," and "screen" may be used interchangeably with the same meaning.

[0047] In the following, “target block” may be an encoding target block that is the target of encoding and / or a decoding target block that is the target of decoding. Additionally, the target block may be a current block that is the target of current encoding and / or decoding. For example, “target block” and “current block” may be used interchangeably.

[0048] In the following description, "block" and "unit" may be used interchangeably. Additionally, to distinguish it from a block, "unit" may refer to a block containing a luminance (Luma) component block and a corresponding chroma (Chroma) component block. For example, a Coding Tree Unit (CTU) may consist of a single luminance component (Y) coding tree block (CTB) and two chroma component (Cb, Cr) coding tree blocks associated with it.

[0049] In the following, “sample,” “pixel,” and “pel” may be used interchangeably with the same meaning. Here, a sample may represent a basic unit constituting a block.

[0050] In the following, “inter” and “inter-screen” may be used interchangeably with the same meaning.

[0051] In the following, “intra” and “in-screen” may be used interchangeably with the same meaning.

[0052]

[0053] FIG. 1 is a block diagram showing the configuration according to one embodiment of an encoding device to which the present disclosure applies.

[0054] The encoding device (100) may be an encoder, a video encoding device, or an image encoding device. The video may include one or more images. The encoding device (100) may sequentially encode one or more images.

[0055] Referring to FIG. 1, the encoding device (100) may include an image segmentation unit (110), an intra prediction unit (120), a motion prediction unit (121), a motion compensation unit (122), a switch (115), a subtractor (113), a transformation unit (130), a quantization unit (140), an entropy encoding unit (150), an inverse quantization unit (160), an inverse transformation unit (170), an adder (117), a filter unit (180), and a reference picture buffer (190).

[0056] Additionally, the encoding device (100) can generate a bitstream containing encoded information through encoding of an input image and can output the generated bitstream. The generated bitstream can be stored on a computer-readable recording medium or streamed via a wired / wireless transmission medium.

[0057] The image segmentation unit (110) can divide the input video into various forms to increase the efficiency of video encoding / decoding. That is, the input video consists of multiple pictures, and a single picture can be hierarchically divided for compression efficiency, parallel processing, etc. For example, a single picture can be divided into one or more tiles or slices and then divided again into multiple CTUs (Coding Tree Units). Alternatively, a single picture can first be divided into multiple sub-pictures defined as groups of rectangular slices, and each sub-picture can be divided into the tiles / slices. Here, the sub-pictures can be used to support partially and independently encoding / decoding and transmitting the picture. Since multiple sub-pictures can each be reconstructed individually, they are easy to edit in applications that compose multi-channel inputs into a single picture. In addition, the tiles can be divided horizontally to create bricks. Here, a brick can be used as the basic unit of parallel processing within a picture. Additionally, a single CTU can be recursively partitioned by a Quadtree (QT), and a terminal node of the partition can be defined as a Coding Unit (CU). The CU can be divided into a Prediction Unit (PU) and a Transform Unit (TU) to perform prediction and transformation. Meanwhile, the CU itself can be used as the prediction unit and / or the transform unit. Here, for flexible partitioning, each CTU can be recursively partitioned by a Multi-Type Tree (MTT) as well as a Quadtree (QT). The partitioning of the CTU by a Multi-Type Tree can begin at a terminal node of the QT, and the MTT can be composed of a Binary Tree (BT) and a Ternary Tree (TT). For example, the MTT structure can be classified into vertical binary splitting mode (SPLIT_BT_VER), horizontal binary splitting mode (SPLIT_BT_HOR), vertical ternary splitting mode (SPLIT_TT_VER), and horizontal ternary splitting mode (SPLIT_TT_HOR). Additionally, when splitting, the minimum block size (MinQTSize) of the quad tree for the luminance block can be set to 16x16, the maximum block size (MaxBtSize) of the binary tree to 128x128, and the maximum block size (MaxTtSize) of the ternary tree to 64x64. Furthermore, the minimum block size (MinBtSize) of the binary tree and the minimum block size (MinTtSize) of the ternary tree can be set to 4x4, and the maximum depth (MaxMttDepth) of the multi-type tree can be set to 4. Additionally, to increase the encoding efficiency of the I slice, a dual tree can be applied that uses different CTU splitting structures for the luminance and chrominance components. On the other hand, in P and B slices, the luminance and chrominance CTBs (Coding Tree Blocks) within the CTU can be divided by a single tree that shares a coding tree structure.
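The size and depth limits quoted in [0057] can be turned into a simple check of which multi-type tree splits are permitted for a block. The sketch below is a deliberate simplification: it only applies the MaxBtSize, MaxTtSize, MinBtSize, MinTtSize, and MaxMttDepth values given above, and the function name and rule details are assumptions. Real codecs apply further restrictions that are not modeled here.

```python
# Example limits taken from the paragraph above (luminance block sizes).
MAX_BT_SIZE = 128
MAX_TT_SIZE = 64
MIN_BT_SIZE = 4
MIN_TT_SIZE = 4
MAX_MTT_DEPTH = 4


def allowed_mtt_splits(width, height, mtt_depth):
    """List the MTT split modes permitted by the size and depth limits only."""
    splits = []
    if mtt_depth >= MAX_MTT_DEPTH:
        return splits
    if max(width, height) <= MAX_BT_SIZE:
        # A binary split halves one dimension.
        if width // 2 >= MIN_BT_SIZE:
            splits.append("SPLIT_BT_VER")
        if height // 2 >= MIN_BT_SIZE:
            splits.append("SPLIT_BT_HOR")
    if max(width, height) <= MAX_TT_SIZE:
        # A ternary split produces 1/4, 1/2, 1/4 parts; check the smallest.
        if width // 4 >= MIN_TT_SIZE:
            splits.append("SPLIT_TT_VER")
        if height // 4 >= MIN_TT_SIZE:
            splits.append("SPLIT_TT_HOR")
    return splits
```

Under these limits a 128x128 block allows only the two binary splits, since it exceeds the 64x64 ternary maximum.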

[0058] The encoding device (100) may perform encoding on an input image in an intra mode and / or inter mode. Alternatively, the encoding device (100) may perform encoding on an input image in a third mode other than the intra mode and inter mode (e.g., IBC mode, Palette mode, etc.). However, if the third mode has functional characteristics similar to the intra mode or inter mode, it may be classified as an intra mode or inter mode for convenience of explanation. In this disclosure, the third mode will be classified and described separately only when a specific description of the third mode is required.

[0059] When intra mode is used as the prediction mode, the switch (115) can be switched to intra, and when inter mode is used as the prediction mode, the switch (115) can be switched to inter. Here, intra mode may mean an intra-frame prediction mode, and inter mode may mean an inter-frame prediction mode. The encoding device (100) can generate a prediction block for an input block of an input image. Additionally, after the prediction block is generated, the encoding device (100) can encode a residual block using the residual of the input block and the prediction block. The input image may be referred to as the current image that is the subject of current encoding. The input block may be referred to as the current block that is the subject of current encoding or the encoding target block.

[0060] When the prediction mode is an intra mode, the intra prediction unit (120) may use a sample of a block that has already been encoded / decoded around the current block as a reference sample. The intra prediction unit (120) may perform spatial prediction for the current block using the reference sample and generate prediction samples for the input block through spatial prediction. Here, intra prediction may mean intra-frame prediction.

[0061] In the intra prediction method, non-directional prediction modes such as DC mode and Planar mode, and directional prediction modes (e.g., 65 directions) may be applied. Here, the intra prediction method can be expressed as an intra prediction mode or an intra-frame prediction mode.

[0062] When the prediction mode is an inter mode, the motion prediction unit (121) can search the reference image for the region that best matches the input block during the motion prediction process and derive a motion vector using the found region. Here, a search region may be used as the region to be searched. The reference image can be stored in the reference picture buffer (190) once encoding / decoding of the reference image has been processed.

[0063] The motion compensation unit (122) can generate a prediction block for the current block by performing motion compensation using a motion vector. Here, inter-prediction may mean inter-frame prediction or motion compensation.

[0064] The motion prediction unit (121) and motion compensation unit (122) can generate a prediction block by applying an interpolation filter to a portion of the reference image when the motion vector does not have an integer value. To perform inter-frame prediction or motion compensation, it can be determined, on a coding unit basis, whether the motion prediction and motion compensation method of the prediction unit included in the corresponding coding unit is a Skip Mode, Merge Mode, Advanced Motion Vector Prediction (AMVP) Mode, or Intra Block Copy (IBC) Mode, and inter-frame prediction or motion compensation can be performed according to each mode.

[0065] In addition, based on the above-mentioned inter-frame prediction method, the AFFINE mode of sub-PU-based prediction, the SbTMVP (Subblock-based Temporal Motion Vector Prediction) mode, and the MMVD (Merge with MVD) mode and GPM (Geometric Partitioning Mode) mode of PU-based prediction may be applied. Furthermore, to improve the performance of each mode, HMVP (History based MVP), PAMVP (Pairwise Average MVP), CIIP (Combined Intra / Inter Prediction), AMVR (Adaptive Motion Vector Resolution), BDOF (Bi-Directional Optical-Flow), BCW (Bi-predictive with CU Weights), LIC (Local Illumination Compensation), TM (Template Matching), OBMC (Overlapped Block Motion Compensation), etc. may be applied.

[0066] Among these, AFFINE mode is a technology used in both AMVP and MERGE modes and offers high encoding efficiency. Conventional video coding standards perform Motion Compensation (MC) by considering only the translation of blocks, and therefore fail to properly compensate for real-world movements such as zoom in / out and rotation. To address this, a 4-parameter affine motion model using two control point motion vectors (CPMV) and a 6-parameter affine motion model using three control point motion vectors can be applied to inter-prediction. Here, a CPMV is a motion vector at the top-left, top-right, or bottom-left corner of the current block that is used to represent the affine motion model.
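The two affine models in [0066] can be sketched as a per-position motion vector derivation. The formulas below are the commonly used 4-parameter and 6-parameter forms with control points at the top-left (v0), top-right (v1), and bottom-left (v2) corners; the source does not spell them out, so treat the function and its floating-point arithmetic as an illustrative assumption rather than the codec's fixed-point procedure.

```python
def affine_mv(x, y, width, height, v0, v1, v2=None):
    """Motion vector at position (x, y) inside a width x height block.

    v0, v1: control point motion vectors (top-left, top-right) as (mvx, mvy).
    v2: optional bottom-left control point; when given, the 6-parameter
        model is used, otherwise the 4-parameter (rotation / zoom) model.
    """
    ax = (v1[0] - v0[0]) / width   # horizontal change of mvx
    ay = (v1[1] - v0[1]) / width   # horizontal change of mvy
    if v2 is None:
        # 4-parameter model: vertical changes follow from the horizontal ones.
        bx, by = -ay, ax
    else:
        # 6-parameter model: vertical changes come from the third control point.
        bx = (v2[0] - v0[0]) / height
        by = (v2[1] - v0[1]) / height
    return (ax * x + bx * y + v0[0], ay * x + by * y + v0[1])
```

When all control point vectors are equal, both models reduce to the purely translational motion of conventional motion compensation.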

[0067] The subtractor (113) can generate a residual block using the difference between the input block and the prediction block. The residual block may also be referred to as a residual signal. The residual signal may represent the difference between the original signal and the prediction signal. Alternatively, the residual signal may be a signal generated by transforming, quantizing, or both transforming and quantizing the difference between the original signal and the prediction signal. The residual block may be a residual signal in block units.

[0068] The transformation unit (130) can generate a transform coefficient by performing a transform on the residual block and output the generated transform coefficient. Here, the transform coefficient may be a coefficient value generated by performing a transform on the residual block. When a transform skip mode is applied, the transformation unit (130) may skip the transform on the residual block.

[0069] A quantized level can be generated by applying quantization to a transform coefficient or a residual signal. In the following embodiments, the quantized level may also be referred to as a transform coefficient.

[0070] For example, a 4x4 luminance residual block generated through intra prediction can be transformed using a Discrete Sine Transform (DST)-based basis vector, while the remaining residual blocks can be transformed using a Discrete Cosine Transform (DCT)-based basis vector. Additionally, the transformation blocks for a single block can be divided into a quad tree form using Residual Quad Tree (RQT) technology, and after performing transformation and quantization on each transformation block divided by RQT, a coded block flag (cbf) can be transmitted to increase coding efficiency in the case where all coefficients become zero.

[0071] As another alternative, the Multiple Transform Selection (MTS) technique can be applied to perform transformations by selectively using multiple transformation bases. In addition, instead of dividing a CU into TUs via RQT, a function similar to TU division can be performed using the Sub-block Transform (SBT) technique. Specifically, SBT is applied only to inter-frame prediction blocks and, unlike RQT, divides the current block into ½ or ¼ sizes in the vertical or horizontal direction and then performs a transformation on only one of the resulting blocks. For example, if divided vertically, a transformation can be performed on the leftmost or rightmost block, and if divided horizontally, a transformation can be performed on the topmost or bottommost block.
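The SBT geometry described in [0071] can be sketched as a function that returns the rectangle of the one sub-block that is actually transformed. The function name, its boolean parameters, and the (x, y, width, height) return convention are hypothetical and chosen only for this illustration.

```python
def sbt_region(width, height, vertical, quarter, first):
    """Return (x, y, w, h) of the transformed sub-block inside the block.

    vertical: True for a vertical split (left / right parts),
              False for a horizontal split (top / bottom parts).
    quarter:  True if the transformed part is 1/4 of the block, else 1/2.
    first:    True for the leftmost / topmost part,
              False for the rightmost / bottommost part.
    """
    divisor = 4 if quarter else 2
    if vertical:
        sub_w = width // divisor
        x = 0 if first else width - sub_w
        return (x, 0, sub_w, height)
    sub_h = height // divisor
    y = 0 if first else height - sub_h
    return (0, y, width, sub_h)
```

For a 16x8 block split vertically at ¼ with the rightmost part selected, only the 4x8 region starting at x = 12 is transformed.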

[0072] In addition, Low Frequency Non-Separable Transform (LFNST), a secondary transform technique that further transforms the residual signal converted to the frequency domain through DCT or DST, can also be applied. LFNST performs additional transformation on the 4x4 or 8x8 low-frequency region in the upper left corner, thereby allowing the residual coefficients to be concentrated in the upper left corner.

[0073] The quantization unit (140) can generate a quantized level by quantizing a transformation coefficient or residual signal according to a quantization parameter (QP) and can output the generated quantized level. At this time, the quantization unit (140) can quantize the transformation coefficient using a quantization matrix.

[0074] For example, a quantizer using QP values from 0 to 51 can be used. Alternatively, if the image size is larger and higher coding efficiency is required, QP values from 0 to 63 can be used. Additionally, a Dependent Quantization (DQ) method using two quantizers instead of a single one can be applied. DQ performs quantization using two quantizers (e.g., Q0, Q1), but can be applied so that the quantizer to be used for the next transform coefficient is selected based on the current state through a state transition model, even without signaling information regarding the use of a specific quantizer.
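
As a non-limiting illustration, the state-driven quantizer selection of paragraph [0074] can be sketched as follows. The sketch assumes a four-state machine in which states 0 and 1 select Q0, states 2 and 3 select Q1, and the next state depends on the parity of the current quantized level. The transition table is modeled on the four-state design used in VVC-style dependent quantization and is an assumption here, since the present specification does not fix a particular state transition model.

```python
# Assumed transition table: STATE_TRANSITION[state][parity] -> next state.
STATE_TRANSITION = [[0, 2], [2, 0], [1, 3], [3, 1]]


def quantizer_for_state(state):
    """States 0 and 1 use quantizer Q0; states 2 and 3 use quantizer Q1."""
    return "Q0" if state < 2 else "Q1"


def quantizer_sequence(levels, initial_state=0):
    """Return the quantizer used for each level, in coding order.

    No per-coefficient signaling is needed: the decoder can repeat the same
    walk because the state depends only on already decoded levels.
    """
    state = initial_state
    used = []
    for level in levels:
        used.append(quantizer_for_state(state))
        state = STATE_TRANSITION[state][abs(level) & 1]
    return used
```

Because both sides start from the same initial state and see the same levels, the encoder and decoder arrive at the same quantizer for every coefficient.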

[0075] The entropy encoding unit (150) can generate and output a bitstream by performing entropy encoding, according to a probability distribution, on values calculated by the quantization unit (140) or on coding parameter values calculated during the encoding process. The entropy encoding unit (150) can perform entropy encoding on information regarding a sample of an image and information for decoding an image. For example, information for decoding an image may include syntax elements, etc.

[0076] When entropy encoding is applied, a small number of bits is allocated to symbols with a high probability of occurrence and a large number of bits is allocated to symbols with a low probability of occurrence, thereby reducing the size of the bit sequence for the symbols to be encoded. The entropy encoding unit (150) may use encoding methods such as exponential Golomb, CAVLC (Context-Adaptive Variable Length Coding), and CABAC (Context-Adaptive Binary Arithmetic Coding) for entropy encoding. For example, the entropy encoding unit (150) may perform entropy encoding using a Variable Length Coding (VLC) table. In addition, the entropy encoding unit (150) may derive a binarization method of the target symbol and a probability model of the target symbol / bin, and then perform arithmetic encoding using the derived binarization method, probability model, and context model.
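
As a non-limiting illustration of a variable-length code that assigns fewer bits to more probable symbols, the order-0 exponential Golomb code mentioned in paragraph [0076] can be sketched as follows. The sketch assumes that smaller non-negative integers occur more often; the function names are hypothetical, and bit strings are used in place of a real bit writer for readability.

```python
def exp_golomb_encode(value):
    """Order-0 exponential Golomb codeword for a non-negative integer."""
    if value < 0:
        raise ValueError("value must be non-negative")
    binary = bin(value + 1)[2:]              # binary digits of value + 1
    return "0" * (len(binary) - 1) + binary  # zero prefix, then the digits


def exp_golomb_decode(bits):
    """Decode one order-0 exponential Golomb codeword from the start of bits."""
    zeros = 0
    while bits[zeros] == "0":                # count the zero prefix
        zeros += 1
    return int(bits[zeros:2 * zeros + 1], 2) - 1
```

The value 0 is coded with a single bit (`1`), while 4 needs five bits (`00101`), so short codewords go to the values assumed to be most frequent.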

[0077] In this regard, when applying CABAC, in order to reduce the size of the probability table stored in the decoder, the table-based probability update method may be replaced with an update method using a simple formula. In addition, two different probability models may be used to obtain more accurate symbol probability values.

[0078] The entropy encoding unit (150) can convert a 2-dimensional block form coefficient into a 1-dimensional vector form through a transform coefficient scanning method to encode a transform coefficient level (quantized level).
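
As a non-limiting illustration, the conversion between a 2-dimensional coefficient block and a 1-dimensional vector can be sketched as follows. The specification does not fix a scanning pattern; the diagonal order used here, and all function names, are assumptions chosen only to show the principle. The same scan order is reused in reverse on the decoder side.

```python
def diagonal_scan_order(height, width):
    """List (row, col) positions along anti-diagonals from the top-left."""
    order = []
    for d in range(height + width - 1):
        for row in range(height):
            col = d - row
            if 0 <= col < width:
                order.append((row, col))
    return order


def block_to_vector(block):
    """Encoder side: read a 2D block into a 1D list in scan order."""
    order = diagonal_scan_order(len(block), len(block[0]))
    return [block[r][c] for r, c in order]


def vector_to_block(vector, height, width):
    """Decoder side: place a 1D list back into a 2D block in scan order."""
    block = [[0] * width for _ in range(height)]
    for value, (r, c) in zip(vector, diagonal_scan_order(height, width)):
        block[r][c] = value
    return block
```

With coefficients concentrated in the upper left corner, such a scan places the non-zero levels at the front of the vector and leaves a run of zeros at the end.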

[0079] Coding parameters may include information (flags, indexes, etc.) that is encoded in the encoding device (100) and signaled to the decoding device (200), such as syntax elements, as well as information derived during the encoding process or decoding process, and may refer to information required when encoding or decoding images.

[0080] Here, signaling a flag or index may mean that in an encoder, the corresponding flag or index is entropy encoded and included in a bitstream, and in a decoder, the corresponding flag or index is entropy decoded from the bitstream.

[0081] The encoded current image can be used as a reference image for other images processed later. Accordingly, the encoding device (100) can restore or decode the encoded current image again, and can store the restored or decoded image as a reference image in the reference picture buffer (190).

[0082] The quantized level can be dequantized in the dequantization unit (160) and inverse transformed in the inverse transform unit (170). The dequantized and / or inverse transformed coefficients can be added to the prediction block through the adder (117). A reconstructed block can be generated by adding the dequantized and / or inverse transformed coefficients and the prediction block. Here, the dequantized and / or inverse transformed coefficients refer to coefficients for which at least one of dequantization and inverse transformation has been performed, and may refer to the reconstructed residual block. The dequantization unit (160) and the inverse transform unit (170) can perform the reverse of the processes performed by the quantization unit (140) and the transformation unit (130), respectively.
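
As a non-limiting illustration, the addition performed by the adder can be sketched as follows. Clipping the sum to the valid sample range and the `bit_depth` parameter are assumptions of this sketch; the paragraph above describes only the addition itself.

```python
def reconstruct_block(prediction, residual, bit_depth=8):
    """Add the reconstructed residual to the prediction, sample by sample.

    The result is clipped to [0, 2**bit_depth - 1] (an assumption here) so
    that it remains a valid sample value.
    """
    max_value = (1 << bit_depth) - 1
    return [
        [min(max(p + r, 0), max_value) for p, r in zip(pred_row, res_row)]
        for pred_row, res_row in zip(prediction, residual)
    ]
```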

[0083] The restored block may pass through a filter unit (180). The filter unit (180) may apply all or some of the filtering techniques, such as a deblocking filter, Sample Adaptive Offset (SAO), Adaptive Loop Filter (ALF), Bilateral Filter (BIF), and Luma Mapping with Chroma Scaling (LMCS), to the restored sample, restored block, or restored image. The filter unit (180) may also be referred to as an in-loop filter. In this case, the term in-loop filter may also be used as a name that excludes LMCS.

[0084] Deblocking filters can remove block distortion occurring at the boundaries between blocks. To determine whether to perform deblocking, the decision to apply the filter to the current block can be made based on samples contained in a few columns or rows within the block. When applying a deblocking filter to a block, different filters can be applied depending on the required deblocking filtering intensity.

[0085] To compensate for encoding errors using a sample adaptive offset, an appropriate offset value can be added to the sample value. The sample adaptive offset can correct the offset from the original image on a sample-by-sample basis for the deblocked image. One method may be to divide the samples included in the image into a certain number of regions, determine the region to be offset, and apply the offset to that region, or to apply the offset by considering the edge information of each sample.

[0086] A bilateral filter (BIF) can also correct the offset from the original image on a sample-by-sample basis for the deblocked image.

[0087] An adaptive loop filter can perform filtering based on a comparison of the reconstructed image and the original image. After dividing the samples included in the image into predetermined groups, a filter to be applied to each group can be determined, thereby performing filtering differently for each group. Information regarding whether to apply an adaptive loop filter can be signaled per coding unit (CU), and the shape and filter coefficients of the adaptive loop filter to be applied may vary depending on each block.

[0088] In LMCS (Luma Mapping with Chroma Scaling), Luma mapping (LM) refers to remapping luminance values through a piece-wise linear model, and Chroma scaling (CS) refers to a technique that scales the residual values of the chrominance component according to the average luminance value of the predicted signal. In particular, LMCS can be utilized as an HDR correction technique that reflects the characteristics of HDR (High Dynamic Range) video.

[0089] The restored block or restored image that has passed through the filter unit (180) can be stored in the reference picture buffer (190). The restored block that has passed through the filter unit (180) may be part of the reference image. That is to say, the reference image may be a restored image composed of the restored blocks that have passed through the filter unit (180). The stored reference image may subsequently be used for inter-frame prediction or motion compensation.

[0090] FIG. 2 is a block diagram showing the configuration according to one embodiment of a decoding device to which the present disclosure is applied.

[0091] The decoding device (200) may be a decoder, a video decoding device, or an image decoding device.

[0092] Referring to FIG. 2, the decoding device (200) may include an entropy decoding unit (210), an inverse quantization unit (220), an inverse transformation unit (230), an intra prediction unit (240), a motion compensation unit (250), an adder (201), a switch (203), a filter unit (260), and a reference picture buffer (270).

[0093] The decoding device (200) can receive a bitstream output from the encoding device (100). The decoding device (200) can receive a bitstream stored in a computer-readable recording medium or a bitstream streamed through a wired / wireless transmission medium. The decoding device (200) can perform decoding on the bitstream in intra mode or inter mode. Additionally, the decoding device (200) can generate a restored image or a decoded image through decoding and can output the restored image or the decoded image.

[0094] If the prediction mode used for decoding is intra mode, the switch (203) can be switched to intra. If the prediction mode used for decoding is inter mode, the switch (203) can be switched to inter.

[0095] The decoding device (200) can decode the input bitstream to obtain a reconstructed residual block and generate a prediction block. Once the reconstructed residual block and the prediction block are obtained, the decoding device (200) can generate a reconstructed block to be decoded by adding the reconstructed residual block and the prediction block. The block to be decoded may be referred to as the current block.

[0096] The entropy decoding unit (210) can generate symbols by performing entropy decoding according to the probability distribution of the bitstream. The generated symbols may include symbols in the form of quantized levels. Here, the entropy decoding method may be the inverse process of the entropy encoding method described above.

[0097] The entropy decoding unit (210) can convert coefficients in one-dimensional vector form into two-dimensional block form through a transform coefficient scanning method in order to decode the transform coefficient levels (quantized levels).

[0098] The quantized level can be dequantized in the dequantization unit (220) and inversely transformed in the inverse transformation unit (230). The quantized level can be generated as a restored residual block as a result of performing dequantization and / or inverse transformation. At this time, the dequantization unit (220) can apply a quantization matrix to the quantized level. The dequantization unit (220) and the inverse transformation unit (230) applied to the decoding device can apply the same technology as the dequantization unit (160) and the inverse transformation unit (170) applied to the aforementioned encoding device.

[0099] When an intra mode is used, the intra prediction unit (240) can generate a prediction block by performing a spatial prediction on the current block using sample values of already decoded blocks around the block to be decoded. The intra prediction unit (240) applied to the decoding device can apply the same technology as the intra prediction unit (120) applied to the aforementioned encoding device.

[0100] When an inter mode is used, the motion compensation unit (250) can generate a prediction block by performing motion compensation on the current block using a motion vector and a reference image stored in the reference picture buffer (270). The motion compensation unit (250) can generate a prediction block by applying an interpolation filter to a portion of the reference image when the motion vector does not have an integer value. To perform motion compensation, it can be determined, on a coding unit basis, whether the motion compensation method of the prediction unit included in the corresponding coding unit is a skip mode, merge mode, AMVP mode, or current picture reference mode, and motion compensation can be performed according to each mode. The motion compensation unit (250) applied to the decoding device can apply the same technology as the motion compensation unit (122) applied to the aforementioned encoding device.

[0101] The adder (201) can generate a restored block by adding the restored residual block and the prediction block. The filter unit (260) can apply at least one of the following to the restored block or the restored image: an inverse-LMCS, a deblocking filter, a sample adaptive offset, and an adaptive loop filter. The filter unit (260) applied to the decoder can apply the same filtering technology as the filter unit (180) applied to the aforementioned encoding device.

[0102] The filter unit (260) can output a restored image. The restored block or the restored image can be stored in a reference picture buffer (270) and used for inter-frame prediction. The restored block that has passed through the filter unit (260) may be part of the reference image. That is to say, the reference image may be a restored image composed of the restored blocks that have passed through the filter unit (260). The stored reference image may subsequently be used for inter-frame prediction or motion compensation.

[0103] FIG. 3 is a schematic diagram illustrating a video coding system to which the present disclosure can be applied.

[0104] A video coding system according to one embodiment may include an encoding device (10) and a decoding device (20). The encoding device (10) may transmit encoded video and / or image information or data to the decoding device (20) via a digital storage medium or network in the form of a file or streaming.

[0105] An encoding device (10) according to one embodiment may include a video source generation unit (11), an encoding unit (12), and a transmission unit (13). A decoding device (20) according to one embodiment may include a receiving unit (21), a decoding unit (22), and a rendering unit (23). The encoding unit (12) may be called a video / image encoding unit, and the decoding unit (22) may be called a video / image decoding unit. The transmission unit (13) may be included in the encoding unit (12). The receiving unit (21) may be included in the decoding unit (22). The rendering unit (23) may include a display unit, and the display unit may be composed of a separate device or an external component.

[0106] The video source generation unit (11) can acquire video / image through a process of capturing, synthesizing, or generating video / image. The video source generation unit (11) may include a video / image capture device and / or a video / image generation device. The video / image capture device may include, for example, one or more cameras, a video / image archive containing previously captured video / image, etc. The video / image generation device may include, for example, a computer, a tablet, and a smartphone, etc., and can generate video / image (electronically). For example, a virtual video / image may be generated through a computer, etc., in which case the video / image capture process may be replaced by a process of generating related data.

[0107] The encoding unit (12) can encode the input video / image. The encoding unit (12) can perform a series of procedures such as prediction, transformation, and quantization for compression and encoding efficiency. The encoding unit (12) can output the encoded data (encoded video / image information) in the form of a bitstream. The detailed configuration of the encoding unit (12) can be the same as that of the encoding device (100) of FIG. 1 described above.

[0108] The transmission unit (13) can transmit encoded video / image information or data output in the form of a bitstream to the receiving unit (21) of the decoding device (20) via a digital storage medium or network in the form of a file or streaming. The digital storage medium may include various storage media such as USB, SD, CD, DVD, Blu-ray, HDD, SSD, etc. The transmission unit (13) may include elements for creating a media file through a predetermined file format and elements for transmission via a broadcasting / communication network. The receiving unit (21) can extract / receive the bitstream from the storage medium or network and transmit it to the decoding unit (22).

[0109] The decoding unit (22) can decode a video / image by performing a series of procedures such as inverse quantization, inverse transformation, and prediction corresponding to the operation of the encoding unit (12). The detailed configuration of the decoding unit (22) can also be configured to be identical to the decoding device (200) of FIG. 2 described above.

[0110] The rendering unit (23) can render the decoded video / image. The rendered video / image can be displayed through the display unit.

[0111]

[0112] In this specification, a transform kernel may refer to a transform core used when applying a transform from a spatial domain to a frequency domain. Additionally, a transform set may refer to a group containing transform kernels.

[0113] Meanwhile, the transformation set may also be referred to as a kernel cluster.

[0114] A secondary transform refers to a transformation that is performed, after the primary transform from the spatial domain to the frequency domain, based on the correlation of the coefficients generated by the primary transform. Here, the primary transform and the first-order transform may have the same meaning. By performing a secondary transform that converts the coefficients to a more compact representation than the primary transform alone, higher compression efficiency can be achieved than when only the primary transform is performed.

[0115] The primary and secondary transforms can each be performed as a separable transform or a non-separable transform.

[0116] A separable transform can refer to a transformation in which the vertical and horizontal transformations are performed independently, each using its own transformation kernel. Here, the vertical transformation refers to a transformation in the vertical direction to which a vertical transformation kernel is applied. Similarly, the horizontal transformation refers to a transformation in the horizontal direction to which a horizontal transformation kernel is applied. For example, in a separable transform, the vertical transformation may be performed after the horizontal transformation has been performed. Alternatively, the horizontal transformation may be performed after the vertical transformation has been performed.

[0117] A non-separable transform can mean that the transformation is performed in a single step, rather than separately in the horizontal and vertical directions, using a non-separable transformation kernel in the form of a matrix.
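
As a non-limiting illustration, the difference between the two kinds of transform in paragraphs [0116] and [0117] can be sketched as follows. The orthonormal DCT-II kernel, the row-by-row flattening of the block, and all function names are assumptions of this sketch. It also shows that a separable transform is a special case of a non-separable one whose kernel is the Kronecker product of the vertical and horizontal kernels.

```python
import math


def dct2_matrix(n):
    """Orthonormal DCT-II kernel; row k holds the k-th basis vector."""
    return [[math.sqrt((1 if k == 0 else 2) / n)
             * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
             for i in range(n)] for k in range(n)]


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def separable_transform(block, vertical_kernel, horizontal_kernel):
    """Horizontal pass over the rows, then vertical pass over the columns."""
    after_horizontal = matmul(block, transpose(horizontal_kernel))
    return matmul(vertical_kernel, after_horizontal)


def non_separable_transform(block, kernel):
    """Flatten the block row by row and apply one (H*W) x (H*W) matrix."""
    vector = [v for row in block for v in row]
    return [sum(k * v for k, v in zip(kernel_row, vector))
            for kernel_row in kernel]


def kronecker(a, b):
    """Kronecker product, used to express a separable pair as one matrix."""
    return [[x * y for x in row_a for y in row_b]
            for row_a in a for row_b in b]
```

A general non-separable kernel need not factor this way, which is why it can capture correlations that a pair of one-dimensional kernels cannot, at the cost of a larger matrix.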

[0118] In separable and non-separable transforms, basis vectors or combinations of basis vectors may be used. Basis vectors are vectors whose size corresponds to the pixels or coefficients in the 2D space where the transformation is performed, and can be designed by considering the overall characteristics and similarities of the pixels and / or coefficients in that 2D space.

[0119] For second-order transforms such as LFNST (Low frequency non-separable transform), various sizes of non-separable second-order transform kernels can be used to improve coding efficiency and transform complexity for specific input block sizes.

[0120] In the case of NSPT (Non-separable primary transform), although it is a primary transform, various sizes of non-separable primary transform kernels can be used for specific input block sizes to improve coding efficiency and transform complexity, similar to LFNST.

[0121] MTS (Multiple transform selection) may mean using a selected transform kernel among multiple transform kernels for the transformation of the current block.

[0122] MTSS (Multiple transform set selection) may mean that a transform set is selected from among multiple transform sets, and a selected transform kernel from among multiple transform kernels included in the selected transform set is used for the transformation of the current block. In this case, the transform kernel may be selected after the transform set is selected, or the selection of the transform set and the selection of the transform kernel may be performed simultaneously.

[0123]

[0124] For high compression performance, it is important to select a transformation set and / or transformation kernel suitable for the characteristics of the pixels and / or coefficients to be coded. Conventionally, the transformation set and / or transformation kernel are determined based on the intra prediction mode of the block in which the transformation is performed. In such cases, information regarding the intra prediction mode of the current block may not be beneficial to the performance of the transformation technique or the coding structure. Consequently, the transformation set and / or transformation kernel suitable for the characteristics of the pixels and / or coefficients to be coded are not selected, which may limit transformation efficiency.

[0125]

[0126] Hereinafter, the present specification provides a method for efficiently deriving a virtual intra prediction mode (VIM) for determining a transformation kernel. Here, the method for deriving a virtual intra prediction mode may refer to a method for deriving an intra prediction mode used when determining a transformation kernel for transformation / inverse transformation among a plurality of transformation kernels. Alternatively, it may refer to a method for deriving an intra prediction mode used when determining a transformation set for transformation / inverse transformation among a plurality of transformation sets.

[0127] Meanwhile, in this specification, a virtual intra-prediction mode for determining a transformation set and / or transformation kernel and a transformation kernel determination intra-prediction mode may have the same meaning.

[0128] If information regarding the intra-prediction mode of the current block does not benefit the performance and coding structure of the transformation technique, a transformation kernel determination intra-prediction mode is derived and can be used to determine the transformation set and / or transformation kernel.

[0129] In addition, a transformation kernel determination intra prediction mode can be derived even when the current block is predicted by a method other than prediction based on the existing directional and non-directional intra prediction modes.

[0130] Meanwhile, other methods mentioned above may include the DIMD (Decoder Side Intra Mode Derivation) method, TIMD (Template-based Intra Mode Derivation) method, intraTMP (intra Template Matching Prediction) method, CIIP (Combined Intra-Inter Prediction) method, Spatial CIIP (SCIIP, Spatial Combined Intra-Intra Prediction) method, SGPM (Spatial Geometric Prediction mode) method, EIP (Extrapolation-based Intra Prediction) method, MIP (Matrix-based Intra Prediction) method, IBC (Intra Block Copy) method, MHog (Merged Histogram of Gradients) method, OBIC (Occurrence-Based Intra Coding) method, and neural network-based intra prediction (NNIP).

[0131] Here, the DIMD method may refer to a method in which an intra prediction mode is derived at the decoder side, and TIMD may refer to a method in which the prediction mode of the current block is derived using a template reference region. IntraTMP may refer to a method of performing intra prediction using template matching. Additionally, CIIP may refer to a method in which prediction is performed by adding weights to intra prediction and inter prediction, and SCIIP may refer to the use of multiple intra prediction methods. Furthermore, SGPM may refer to a method of performing prediction by partitioning blocks. Additionally, EIP may refer to a method of performing prediction using extrapolation, and MIP may refer to a method of performing prediction using a weight matrix. Moreover, IBC may refer to a method of performing prediction using block vectors, and MHog may refer to a method of combining the gradient histograms of surrounding blocks. Furthermore, OBIC may refer to a method using occurrence rate histograms.

[0132] Meanwhile, a neural network-based intra-prediction method may refer to a method of performing prediction using a neural network model. For example, when prediction is performed based on a neural network intra-prediction mode, reference samples adjacent to the current block can be used as input to the neural network model, and the neural network model can receive these reference samples as input and output the predicted value of the current block.

[0133] Meanwhile, the NNIP method and intraNN (intra Neural network based prediction) may have the same meaning.

[0134] In addition, the aforementioned other methods may include cases where inter-prediction is performed in the current block. In addition, it may include cases where two or more intra-prediction methods are fused to perform prediction in the current block. In addition, it may include cases where two or more inter-prediction methods are fused to perform prediction in the current block. In addition, it may include cases where an intra-prediction method and an inter-prediction method are fused to perform prediction in the current block.

[0135] In addition, a transformation kernel determination intra-prediction mode can be derived even when two or more of the aforementioned other methods are fused to perform prediction.

[0136] In addition, a transformation kernel determination intra-prediction mode can be derived even when the current block is divided into multiple sub-blocks. In this case, a transformation kernel determination intra-prediction mode may be derived for the transformation of some of the sub-blocks, or for the transformation of all of the sub-blocks.

[0137] Meanwhile, in the present disclosure, the transformation set may include a transformation set of first-order transformations and a transformation set of second-order transformations. Additionally, the transformation set may include a transformation set of separable transformations and a transformation set of inseparable transformations.

[0138] Meanwhile, in the present disclosure, the transformation kernel may include a transformation kernel used for a first-order transformation and a transformation kernel used for a second-order transformation. Additionally, the transformation kernel may include a transformation kernel used for a separable transformation and a transformation kernel used for a non-separable transformation.

[0139]

[0140] According to one embodiment of the present disclosure, a transformation kernel determination intra-prediction mode can be derived using the gradient of a prediction block. Specifically, a transformation kernel determination intra-prediction mode can be derived based on directional information of pixels included in the prediction block, and a transformation kernel can be determined based on the derived transformation kernel determination intra-prediction mode.

[0141] Meanwhile, directional information may refer to information obtained by calculating the gradients of pixels. Specifically, directional information may refer to information obtained by accumulating or processing gradient information calculated by applying filters to the corresponding pixels.

[0142] First, the gradients of the pixels included in the prediction block can be calculated. At this time, the applied filter may be a boundary detection filter such as a Sobel filter, a Roberts cross filter, a Prewitt filter, a Scharr filter, and a Laplacian filter. Here, the boundary detection filter and the edge filter may have the same meaning.

[0143] Meanwhile, when the gradient of pixels included in the prediction block is calculated, the size of the boundary detection filter applied to the pixels may always be the same, but it may also be determined based on either the size of the current block or the size of the prediction block.

[0144] For example, if the size of the current block (or prediction block) is smaller than or equal to a given size, the size of the boundary detection filter can be determined as VxW, and if the size of the current block (or prediction block) is larger than that size, the size of the boundary detection filter can be determined as RxU.

[0145] Here, V, W, R, and U are any positive numbers, R may be greater than or equal to V, and U may be greater than or equal to W. For example, V and W may be 2, and R and U may be 3.

[0146] Meanwhile, the size of the current block (or prediction block) may be determined based on at least one of the width and height of the current block (or prediction block). For example, the size of the current block may refer to the value obtained by multiplying the width and the height.

[0147] Meanwhile, the horizontal length of the current block (or prediction block) may mean width, and the vertical length may mean height. For example, if at least one of the width and height of the current block (or prediction block) is equal to or smaller than 8, the size of the boundary detection filter may be determined to be 2x2.
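
As a non-limiting illustration, the size-dependent choice of the boundary detection filter can be sketched as follows. The 2x2 case follows the example in paragraph [0147]; the 3x3 fallback uses the example values R = U = 3 from paragraph [0145]. The function name and the `threshold` parameter are hypothetical.

```python
def boundary_filter_size(width, height, threshold=8):
    """Pick the boundary detection filter size from the block dimensions.

    If at least one of width and height is equal to or smaller than the
    threshold, a 2x2 filter is used; otherwise a 3x3 filter is assumed.
    """
    if width <= threshold or height <= threshold:
        return (2, 2)
    return (3, 3)
```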

[0148] In addition, a Histogram of Gradient (HoG) can be generated based on the calculated gradients. Specifically, the Histogram of Gradient can be generated by accumulating the gradients of the pixels included in the prediction block.
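
As a non-limiting illustration, the generation of a gradient histogram from a prediction block can be sketched as follows. The sketch assumes a 3x3 Sobel filter, an amplitude of |Gx| + |Gy|, eight orientation bins over [0, π), and the skipping of border samples that lack a full 3x3 neighborhood. None of these choices, nor the function names, are fixed by the present specification, and the mapping from a bin to an intra prediction mode is left out.

```python
import math

SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def sobel_gradient(block, row, col):
    """Horizontal and vertical gradient (gx, gy) at an interior sample."""
    gx = gy = 0
    for dr in range(3):
        for dc in range(3):
            sample = block[row + dr - 1][col + dc - 1]
            gx += SOBEL_X[dr][dc] * sample
            gy += SOBEL_Y[dr][dc] * sample
    return gx, gy


def gradient_histogram(block, num_bins=8):
    """Accumulate gradient amplitudes per orientation bin."""
    histogram = [0] * num_bins
    for row in range(1, len(block) - 1):
        for col in range(1, len(block[0]) - 1):
            gx, gy = sobel_gradient(block, row, col)
            amplitude = abs(gx) + abs(gy)
            if amplitude == 0:
                continue  # flat area: no direction to record
            angle = math.atan2(gy, gx) % math.pi  # orientation in [0, pi)
            bin_index = int(angle / math.pi * num_bins) % num_bins
            histogram[bin_index] += amplitude
    return histogram
```

For a block containing a single vertical edge, all amplitude falls into the bin for a horizontal gradient; for a horizontal edge, it falls into the bin half a turn of the histogram away.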

[0149] Meanwhile, when the gradients of pixels included in a prediction block are accumulated in a histogram, the same amplitude may be accumulated for a single gradient, or weighted amplitudes may be accumulated. A gradient histogram generated in this way can reflect the directional distribution more precisely.

[0150] For example, weights can be determined based on the pixel location where the corresponding gradient is calculated, and the amplitude with these weights applied can be accumulated in the gradient histogram. Here, the pixel location may refer to the distance from the center of a block, the area containing the pixel, etc.
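
As a non-limiting illustration, position-dependent weighting of the accumulated amplitudes can be sketched as follows. The weighting rule (samples in the central half of the block count double) and all names are hypothetical; the specification states only that the weights depend on the pixel location.

```python
def position_weight(row, col, height, width):
    """Hypothetical rule: weight 2 in the central half of the block, else 1."""
    in_center_rows = height / 4 <= row < 3 * height / 4
    in_center_cols = width / 4 <= col < 3 * width / 4
    return 2 if in_center_rows and in_center_cols else 1


def accumulate_weighted(entries, height, width, num_bins):
    """Accumulate (row, col, bin_index, amplitude) entries with weights."""
    histogram = [0] * num_bins
    for row, col, bin_index, amplitude in entries:
        weight = position_weight(row, col, height, width)
        histogram[bin_index] += weight * amplitude
    return histogram
```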

[0151] Meanwhile, the aforementioned weights may be set to predefined values according to the protocols of the encoder and decoder.

[0152] Meanwhile, the gradient histogram may be generated by calculating the gradients of all pixels included in the prediction block, or by calculating the gradients of only some pixels. This can reduce computational complexity.

[0153] For example, after performing sub-sampling on a prediction block, the gradients of the pixels included in the sub-sampled prediction block can be calculated to generate a gradient histogram. Sub-sampling can be performed by selecting pixels within the prediction block at regular intervals (e.g., 2 pixels or 4 pixels) or by selecting them according to a predefined pattern (e.g., a grid pattern).
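
As a non-limiting illustration, the selection of pixels at regular intervals can be sketched as follows. The function name and the choice of starting at the top-left sample are assumptions of this sketch.

```python
def subsample_positions(height, width, step=2):
    """Return the (row, col) positions kept when sampling every step pixels."""
    return [(row, col)
            for row in range(0, height, step)
            for col in range(0, width, step)]
```

With a step of 2, gradients are calculated for one quarter of the positions, and with a step of 4 for one sixteenth, which reduces the filtering work accordingly.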

[0154] As another example, a gradient histogram can be generated by calculating the gradients of pixels included in specific regions (e.g., center, edge regions, etc.) within a prediction block.

[0155] Then, based on the generated gradient histogram, a transformation kernel determination intra-prediction mode can be derived.

[0156] Meanwhile, the transformation kernel determination intra-prediction mode can be derived as the intra-prediction mode corresponding to the direction with the highest accumulation in the gradient histogram. Here, the highest accumulation may mean that the direction has the largest accumulated amplitude, that is, the largest value in the histogram.

[0157] Alternatively, N intra prediction modes corresponding to the N directions with the highest accumulation in the gradient histogram may be derived as transformation kernel determination intra prediction modes. Here, N is any positive integer greater than or equal to 2.

[0158] For example, the intra prediction modes corresponding to the top two directions, in order of highest accumulation in the gradient histogram, can be derived as transformation kernel determination intra prediction modes. In this case, these intra prediction modes can be referred to as the first and second transformation kernel determination intra prediction modes, respectively.
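
As a non-limiting illustration, the derivation of one or more intra prediction modes from the histogram can be sketched as follows. The `bin_to_mode` table that maps a histogram bin to an intra prediction mode number is hypothetical, as are the mode numbers in the usage example; ties are broken in favor of the lower bin index, which is also an assumption.

```python
def top_modes(histogram, bin_to_mode, count=2):
    """Return the modes of the `count` bins with the highest accumulation."""
    ranked = sorted(range(len(histogram)), key=lambda b: -histogram[b])
    return [bin_to_mode[b] for b in ranked[:count]]


# Hypothetical example: four bins mapped to placeholder mode numbers.
example_histogram = [5, 40, 0, 12]
example_bin_to_mode = [18, 34, 50, 66]
first_mode, second_mode = top_modes(example_histogram, example_bin_to_mode)
```

Setting `count` to 1 corresponds to using only the direction with the highest accumulation, and larger values of `count` correspond to deriving N modes.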

[0159] In addition, based on the transformation kernel determination intra-prediction mode, a transformation kernel for transformation / inverse transformation among multiple transformation kernels can be determined.

[0160] Meanwhile, the derivation of the transformation kernel determination intra-prediction mode according to the present embodiment can be performed in both the encoder and the decoder.

[0161]

[0162] According to one embodiment of the present disclosure, a transformation kernel determination intra prediction mode can be derived using the gradients of surrounding blocks of the current block. Specifically, the transformation kernel determination intra prediction mode can be derived based on directional information of pixels included in surrounding blocks, and a transformation kernel can be determined based on the derived transformation kernel determination intra prediction mode.

[0163] Meanwhile, directional information may refer to information obtained by calculating the gradients of pixels. Specifically, directional information may refer to information obtained by accumulating or processing gradient information calculated by applying filters to the corresponding pixels.

[0164] Meanwhile, surrounding blocks may refer to blocks included in the surrounding area of the current block. Here, the surrounding area may include at least one of an adjacent area and a non-adjacent area. An adjacent area may refer to a specific area that shares a boundary with the current block. And, a non-adjacent area may refer to a specific area that does not share a boundary with the current block, and may be determined by an agreement between the encoder and decoder or indicated by additional information regarding the current block.

[0165] First, the gradients of the pixels included in the surrounding block can be calculated. At this time, the applied filter may be a boundary detection filter such as a Sobel filter, a Roberts cross filter, a Prewitt filter, a Scharr filter, or a Laplacian filter. Here, boundary detection filter and edge filter may have the same meaning.
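As an illustration of applying a boundary detection filter, the following sketch computes a 3x3 Sobel gradient at one interior sample position. The amplitude measure |gx| + |gy| and the function names are assumptions; the disclosure does not fix how the amplitude or direction is computed.

```python
import math

SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def sobel_gradient(samples, y, x):
    """Return (gx, gy) at an interior position (y, x) of a 2D sample array."""
    gx = gy = 0
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            s = samples[y + dy][x + dx]
            gx += SOBEL_X[dy + 1][dx + 1] * s
            gy += SOBEL_Y[dy + 1][dx + 1] * s
    return gx, gy


def amplitude_and_angle(gx, gy):
    """Amplitude as |gx| + |gy| (assumed measure) and angle in degrees."""
    return abs(gx) + abs(gy), math.degrees(math.atan2(gy, gx))


# A vertical edge: the two left columns are dark, the two right columns bright.
block = [[10, 10, 90, 90] for _ in range(4)]
gx, gy = sobel_gradient(block, 1, 1)
amplitude, angle = amplitude_and_angle(gx, gy)
```

For this vertical edge the gradient is purely horizontal, so `gy` is zero and the angle is 0 degrees.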

[0166] Meanwhile, when the gradient of pixels included in the surrounding block is calculated, the size of the boundary detection filter applied to the pixels may always be the same, but the size of the boundary detection filter may also be determined based on at least one of the size of the current block, the size of the adjacent region, and the size of the non-adjacent region.

[0167] For example, if the current block size is smaller than or equal to an arbitrary size, the size of the boundary detection filter can be determined as VxW, and if the current block size is larger than an arbitrary size, the size of the boundary detection filter can be determined as RxU.

[0168] Here, V, W, R, and U are arbitrary positive integers, R may be greater than or equal to V, and U may be greater than or equal to W. For example, V and W may be 2, and R and U may be 3.

[0169]

[0170] Meanwhile, the size of the current block may be determined based on at least one of the width and height of the current block. For example, the size of the current block may refer to the value obtained by multiplying the width and the height.

[0171] Meanwhile, the horizontal length of the current block may mean width, and the vertical length may mean height. For example, if at least one of the width and height of the current block is equal to or smaller than 8, the size of the boundary detection filter may be determined to be 2x2.
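The size rule in the examples above can be sketched as follows. The threshold of 8 and the 2x2 / 3x3 sizes are taken from the examples in the text; the function name and the choice to test the smaller of width and height are illustrative assumptions.

```python
def edge_filter_size(width, height, size_threshold=8):
    """Pick the boundary detection filter size from the block dimensions.

    Returns 2x2 when at least one of width and height is at most
    `size_threshold`, otherwise 3x3 (values follow the examples only).
    """
    if min(width, height) <= size_threshold:
        return (2, 2)
    return (3, 3)
```

For instance, an 8x32 block would use a 2x2 filter under this rule, while a 16x16 block would use a 3x3 filter.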

[0172] In addition, a Histogram of Gradient (HoG) can be generated based on the calculated gradient. Specifically, the Histogram of Gradient can be generated by accumulating the gradients of pixels included in surrounding blocks.

[0173] Meanwhile, when the gradients of pixels included in surrounding blocks are accumulated in a gradient histogram, the same amplitude may be accumulated for each gradient, or weighted amplitudes may be accumulated. A gradient histogram generated with weighted amplitudes can reflect the directional distribution more precisely.

[0174] For example, weights can be determined based on the pixel location where the corresponding gradient is calculated, and the amplitude to which these weights are applied can be accumulated in the gradient histogram. Here, the pixel location may refer to the distance from the current block, the area containing the pixel, etc. Additionally, the distance from the current block may refer to the distance from the boundary of the current block, the distance from the center of the current block, etc.

[0175] Meanwhile, the aforementioned weights may be set to predefined values according to the agreement between the encoder and decoder.
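A weighted accumulation of this kind might look like the sketch below. The tuple layout, the weight rule (nearer pixel lines count double), and all names are hypothetical; the disclosure only states that weights depend on pixel location and are agreed between encoder and decoder.

```python
from collections import defaultdict


def accumulate_hog(gradients, weight_for_distance):
    """Build a gradient histogram with position-dependent weights.

    `gradients` is an iterable of (direction_bin, amplitude, distance),
    where `distance` is the pixel's distance from the current block
    boundary. `weight_for_distance` is an assumed, predefined rule.
    """
    hog = defaultdict(float)
    for direction_bin, amplitude, distance in gradients:
        hog[direction_bin] += weight_for_distance(distance) * amplitude
    return dict(hog)


def weight(distance):
    # Assumed rule: pixels within one line of the boundary count double.
    return 2.0 if distance <= 1 else 1.0


samples = [(18, 10.0, 1), (18, 4.0, 3), (50, 6.0, 1)]
hog = accumulate_hog(samples, weight)
```

Passing a function that always returns 1.0 reproduces the unweighted case in which the same amplitude is accumulated for each gradient.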

[0176] Meanwhile, the gradient histogram can be generated by calculating the gradients of all pixels included in the surrounding blocks, or by calculating the gradients of only some of the pixels. Using only some of the pixels can reduce computational complexity.

[0177] For example, after performing sub-sampling on a surrounding area (or surrounding block), the gradients of the pixels included in the sub-sampled surrounding area can be calculated to generate a gradient histogram. Sub-sampling can be performed by selecting pixels within the surrounding area (or surrounding block) at regular intervals (e.g., 2 pixels or 4 pixels) or by selecting them according to a predefined pattern (e.g., a grid pattern).

[0178] As another example, a gradient histogram can be generated by calculating the gradients of pixels included in specific regions within a surrounding area (e.g., the center, edge regions, etc.).

[0179] Meanwhile, when the total sum of the amplitudes of the gradient histogram is below a predetermined threshold, the surrounding area where the gradient is calculated can be expanded.

[0180] FIG. 4 is a drawing for explaining the expansion of a peripheral area according to one embodiment of the present disclosure.

[0181] As shown in FIG. 4, the size of the surrounding area can be expanded so that the total sum of the amplitudes of the histogram of gradient (HoG) exceeds a predetermined threshold.

[0182] At this time, the expansion of the size of the surrounding area can be performed by increasing the number of pixel lines included in the adjacent or non-adjacent area, or by increasing the size or number of the area determined by the agreement of the encoder and decoder.

[0183] That is, when the total sum of the amplitudes (total sum of the HoG) is below a predetermined threshold, the size of the adjacent or non-adjacent area can be expanded, so that more gradients are extracted and the transformation kernel determination intra prediction mode is derived more accurately.

[0184] Meanwhile, a predetermined threshold value can be determined based on the size of the current block. For example, as the size of the current block increases, the size of the predetermined threshold value may also increase.
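The expansion of the surrounding area can be sketched as a loop that adds pixel lines until the histogram total exceeds the threshold. The callback `build_hog`, the upper bound `max_lines`, and the toy histogram are assumptions introduced so that the example is self-contained.

```python
def hog_with_expansion(build_hog, threshold, start_lines=1, max_lines=4):
    """Grow the surrounding area until the histogram total exceeds `threshold`.

    `build_hog(num_lines)` is a caller-supplied (hypothetical) function that
    returns the histogram computed from `num_lines` pixel lines around the
    current block. `max_lines` is an assumed bound so that the loop ends.
    """
    num_lines = start_lines
    hog = build_hog(num_lines)
    while sum(hog.values()) <= threshold and num_lines < max_lines:
        num_lines += 1
        hog = build_hog(num_lines)
    return hog, num_lines


def toy_build_hog(num_lines):
    # Toy stand-in: each extra pixel line adds 10 units to direction bin 18.
    return {18: 10.0 * num_lines}


hog, used_lines = hog_with_expansion(toy_build_hog, threshold=25.0)
```

A block-size-dependent threshold, as described above, could be supplied by computing `threshold` from the width and height before the call.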

[0185] Meanwhile, even when the current block size is smaller than or equal to a predetermined size, the size of the surrounding area where gradients are calculated can be expanded. Specifically, when the current block size is below a predetermined threshold, the size of the adjacent or non-adjacent area can be expanded, thereby extracting more gradients and enabling the transformation kernel decision intra-prediction mode to be derived more accurately.

[0186] At this time, the expansion of the size of the surrounding area can be performed by increasing the number of pixel lines included in the adjacent or non-adjacent area, or by increasing the size or number of the area determined by the agreement of the encoder and decoder.

[0187] Then, based on the generated gradient histogram, a transformation kernel determination intra-prediction mode can be derived.

[0188] Meanwhile, the transformation kernel determination intra-prediction mode can be derived as the intra-prediction mode corresponding to the direction with the highest accumulation in the gradient histogram. Here, the highest accumulation means the largest amplitude, that is, the largest value, in the histogram.

[0189] Alternatively, the transformation kernel determination intra prediction mode may be derived as N intra prediction modes corresponding to the N most accumulated directions in the gradient histogram. Here, N is any positive integer greater than or equal to 2.

[0190] For example, the transformation kernel determination intra prediction modes can be derived as the intra prediction modes corresponding to the top two directions in order of highest accumulation in the gradient histogram. In this case, the two intra prediction modes can be referred to as the first intra prediction mode and the second intra prediction mode for transformation kernel determination, respectively.

[0191] In addition, based on the transformation kernel determination intra-prediction mode, a transformation kernel for transformation / inverse transformation among multiple transformation kernels can be determined.

[0192] Meanwhile, the derivation of the transformation kernel determination intra-prediction mode according to the present embodiment can be performed in both the encoder and the decoder.

[0193]

[0194] According to one embodiment of the present disclosure, a transformation kernel determination intra prediction mode can be derived using the intra prediction modes of surrounding blocks. Specifically, a histogram of occurrence (HoC) can be generated based on the intra prediction modes of surrounding blocks, and a transformation kernel determination intra prediction mode can be derived based on the generated histogram of occurrence.

[0195] Meanwhile, the terms occurrence histogram, occurrence rate histogram, incidence histogram, incidence rate histogram, and incidence frequency histogram have the same meaning and all refer to the histogram of occurrence (HoC).

[0196] Meanwhile, the method for deriving an intra prediction mode for determining a transformation kernel according to the present embodiment can be referred to as an OBIC (Occurrence-Based Intra Coding) method. Here, the OBIC method may refer to a method for deriving an intra prediction mode based on the frequency of occurrence of an intra prediction mode of a surrounding block.

[0197] Meanwhile, surrounding blocks may refer to blocks included in the surrounding area of the current block. Here, the surrounding area may include at least one of an adjacent area and a non-adjacent area. An adjacent area may refer to a specific area that shares a boundary with the current block. And, a non-adjacent area may refer to a specific area that does not share a boundary with the current block, and may be determined by an agreement between the encoder and decoder or indicated by additional information regarding the current block.

[0198] The incidence rate histogram can be generated based on the intra prediction modes of the surrounding blocks of the current block. Specifically, the incidence rate histogram can be generated by accumulating the intra prediction modes of the surrounding blocks.

[0199] Meanwhile, when the intra prediction modes of surrounding blocks are accumulated in the incidence histogram, the same amplitude may be accumulated for a single intra prediction mode, or weighted amplitudes may be accumulated. An incidence histogram generated in this way can better reflect the occurrence trends of intra prediction modes.

[0200] For example, weights can be determined based on the locations of surrounding blocks, and the amplitudes to which these weights are applied can be accumulated in the occurrence rate histogram. Here, the locations of surrounding blocks may refer to the distance from the current block, the area containing the surrounding blocks, etc. Additionally, the distance from the current block may refer to the distance from the boundary of the current block, the distance from the center of the current block, etc.

[0201] Meanwhile, the aforementioned weights may be set to predefined values according to the agreement between the encoder and decoder.

[0202] Meanwhile, the occurrence rate histogram may be generated by accumulating the intra prediction modes of all surrounding blocks included in the surrounding area, or by accumulating the intra prediction modes of some surrounding blocks. This can reduce computational complexity.

[0203] For example, after performing sub-sampling on a surrounding area, an occurrence rate histogram can be generated by accumulating the intra-prediction modes of surrounding blocks included in the sub-sampled surrounding area. Sub-sampling can be performed by selecting pixels within the surrounding area at regular intervals (e.g., 2 pixels or 4 pixels) or by selecting them according to a predefined pattern (e.g., a grid pattern).

[0204] As another example, an incidence histogram can be generated by accumulating the intra-prediction modes of surrounding blocks included in a specific region within the surrounding area (e.g., center, edge region, etc.).

[0205] Meanwhile, even when the incidence rate histogram is generated by accumulating the intra-prediction mode of a portion of the surrounding blocks, weighted amplitudes may also be accumulated as described above.
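Accumulating the intra prediction modes of surrounding blocks into a histogram of occurrence can be sketched as follows. The mode numbers, the optional per-block weights, and the tie-breaking rule are hypothetical and serve only to make the example concrete.

```python
from collections import Counter


def build_hoc(neighbor_modes, weights=None):
    """Accumulate surrounding blocks' intra prediction modes into a HoC.

    Each block adds 1 when `weights` is None, otherwise its own weight
    (for example, a predefined weight based on the block's location).
    """
    hoc = Counter()
    for i, mode in enumerate(neighbor_modes):
        hoc[mode] += 1 if weights is None else weights[i]
    return hoc


# Hypothetical modes of six surrounding blocks.
modes = [18, 18, 50, 0, 18, 50]
hoc = build_hoc(modes)

# Most accumulated mode; ties go to the smaller mode number (assumed rule).
best_mode = max(hoc, key=lambda m: (hoc[m], -m))
```

Restricting `neighbor_modes` to a subset of the surrounding blocks corresponds to the sub-sampled or region-limited variants described above.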

[0206] Meanwhile, when the total sum of the amplitudes of the occurrence rate histogram is below a predetermined threshold, the number of blocks in which intra prediction modes are accumulated can be increased. Through this, the number of intra prediction modes accumulated in the occurrence rate histogram increases, allowing the transformation kernel decision intra prediction mode to be derived more accurately.

[0207] FIG. 5 is a diagram illustrating the addition of blocks in which an intra prediction mode is accumulated according to one embodiment of the present disclosure.

[0208] As in the case of FIG. 5, blocks in which intra-prediction modes are accumulated may be added so that the total sum of the amplitudes of the histogram of occurrence (HoC) exceeds a predetermined threshold. Here, the blocks may be neighboring blocks included in the surrounding area, or blocks at a location determined by the agreement between the encoder and decoder.

[0209] Meanwhile, a predetermined threshold value can be determined based on the size of the current block. For example, as the size of the current block increases, the size of the predetermined threshold value may also increase.

[0210] Meanwhile, even if the current block size is smaller than or equal to a predetermined size, blocks in which intra prediction modes are accumulated may be added. For example, when the current block size is below a predetermined threshold, the size of adjacent or non-adjacent regions may be expanded, thereby accumulating more blocks' intra prediction modes in the histogram, which can lead to more accurate derivation of the transformation kernel decision intra prediction mode or a reduction in computational complexity.

[0211] Meanwhile, even if the current block size is greater than or equal to a predetermined size, blocks in which intra prediction modes are accumulated may be added. For example, if the current block size is greater than a predetermined threshold, the size of adjacent or non-adjacent regions may be expanded, thereby accumulating more blocks' intra prediction modes in the histogram and enabling the transformation kernel decision intra prediction mode to be derived more accurately.

[0212] Meanwhile, even if the number of intra-prediction modes in the incidence rate histogram is below a predetermined threshold, a block in which intra-prediction modes are accumulated in the histogram can be added as in the methods described above.

[0213] Then, based on the generated occurrence histogram, a transformation kernel determination intra-prediction mode can be derived.

[0214] Meanwhile, the transformation kernel determination intra-prediction mode can be derived as the most accumulated intra-prediction mode in the occurrence histogram. Here, being most accumulated means having the largest amplitude, that is, the largest value, in the histogram.

[0215] Alternatively, the transformation kernel determination intra-prediction mode may be derived as the N most accumulated intra-prediction modes in the occurrence histogram. Here, N is any positive integer greater than or equal to 2.

[0216] For example, the transformation kernel determination intra-prediction modes can be derived as the top two intra-prediction modes in order of highest accumulation in the occurrence histogram. In this case, the two intra-prediction modes can be referred to as the first intra-prediction mode and the second intra-prediction mode for transformation kernel determination, respectively.

[0217] In addition, based on the transformation kernel determination intra-prediction mode, a transformation kernel for transformation / inverse transformation among multiple transformation kernels can be determined.

[0218] Meanwhile, the derivation of the transformation kernel determination intra-prediction mode according to the present embodiment can be performed identically in the encoder and the decoder.

[0219]

[0220] When a transformation kernel determination intra-prediction mode is derived, the transformation kernel for transformation / inverse transformation based on the mode can be determined as any one of S transformation kernels. Here, S is an arbitrary positive integer. Meanwhile, according to the embodiments of the method for deriving a transformation kernel determination intra-prediction mode described above, a plurality of transformation kernel determination intra-prediction modes may be derived.

[0221] According to one embodiment, if there are S transformation kernels, a predetermined intra-prediction mode (or mode interval) may be mapped to each transformation kernel. For example, if one transformation kernel determination intra-prediction mode is derived, the transformation kernel may be determined as the kernel mapped to that transformation kernel determination intra-prediction mode.

[0222] Meanwhile, mapping can be implemented in the form of a lookup table (LUT).

[0223] And, the transformation kernel for the transformation / inverse transformation can be determined as a kernel among S transformation kernels that maps to the transformation kernel determination intra-prediction mode.

[0224] Alternatively, the transformation kernel for transformation / inverse transformation may be determined by considering the transformation kernel determination intra-prediction mode and additional information related to the current block. Here, the additional information related to the current block may include the intra-prediction mode of the current block, the size of the current block, relevant syntax elements, etc.

[0225] For example, multiple lookup tables may be defined depending on the size of the current block, and each lookup table may be configured to map different transformation kernels even for the same intra prediction mode. And, the transformation kernel for transformation / inverse transformation may be determined from the lookup table corresponding to the size of the current block as the transformation kernel mapped to the transformation kernel determination intra prediction mode.
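A size-dependent lookup of this kind can be sketched as below. Everything concrete here is an assumption for illustration: the kernel names `K0` to `K2`, the mode intervals, the mode range of 0 to 66, and the rule that blocks of at most 64 samples use the "small" table.

```python
# Hypothetical tables: each entry maps a mode interval to a kernel name.
LUTS_BY_SIZE = {
    "small": [(range(0, 2), "K0"), (range(2, 35), "K1"), (range(35, 67), "K2")],
    "large": [(range(0, 2), "K0"), (range(2, 35), "K0"), (range(35, 67), "K1")],
}


def select_kernel(mode, width, height):
    """Return the kernel mapped to `mode` in the table for this block size."""
    table = LUTS_BY_SIZE["small" if width * height <= 64 else "large"]
    for interval, kernel in table:
        if mode in interval:
            return kernel
    raise ValueError(f"mode {mode} is not covered by the lookup table")
```

With these tables the same mode 18 maps to `K1` for a 4x4 block and to `K0` for a 16x16 block, which illustrates how different tables can assign different kernels to the same intra prediction mode.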

[0226] According to another embodiment, a transformation kernel for transformation / inverse transformation may be determined as any one of a plurality of transformation kernels based on the cost of N transformation kernel determination intra-prediction modes. Here, N is a positive integer greater than or equal to 2.

[0227] According to the embodiments of the method for deriving transformation kernel determination intra-prediction modes described above, N transformation kernel determination intra-prediction modes may be derived.

[0228] Specifically, the rate-distortion cost (RD cost) can be calculated in the encoder for each of the N transformation kernel determination intra-prediction modes.

[0229] Then, the transformation kernel for the transformation can be determined as the kernel, among the S transformation kernels, that is mapped to the intra-prediction mode with the smallest cost.

[0230] Additionally, information regarding the determined transformation kernel can be signaled to a decoder. Here, the information regarding the determined transformation kernel may include at least one of a flag regarding whether to use the transformation kernel determination intra-prediction mode, a transformation kernel index, and an index regarding the transformation kernel determination intra-prediction mode. Based on this information, the decoder may determine a transformation kernel for inverse transformation.
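The encoder-side choice by RD cost, together with the index that would be signaled, can be sketched as follows. The cost values, kernel names, and the use of a list position as the signaled index are assumptions; the actual syntax elements are not specified here.

```python
def choose_by_rd_cost(candidate_modes, rd_cost, kernel_for_mode):
    """Pick the candidate mode with the smallest RD cost.

    Returns (index_to_signal, kernel). `rd_cost` and `kernel_for_mode` are
    caller-supplied stand-ins for the encoder's cost evaluation and the
    mode-to-kernel mapping.
    """
    best_index = min(
        range(len(candidate_modes)), key=lambda i: rd_cost(candidate_modes[i])
    )
    return best_index, kernel_for_mode(candidate_modes[best_index])


# Hypothetical costs and kernels for two derived candidate modes.
candidates = [18, 50]
costs = {18: 120.5, 50: 97.25}
kernels = {18: "K1", 50: "K2"}
index, kernel = choose_by_rd_cost(candidates, costs.__getitem__, kernels.__getitem__)

# Decoder side: derive the same candidate list, then apply the signaled index.
decoder_kernel = kernels[candidates[index]]
```

Because the decoder derives the same candidate list, only the index needs to be signaled for it to arrive at the same kernel.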

[0231] According to another embodiment, a transformation kernel for transformation / inverse transformation can be derived based on two transformation kernel determination intra-prediction modes.

[0232] Specifically, when a first intra prediction mode and a second intra prediction mode for determining a transformation kernel are derived, an index indicating the difference between the value of the first intra prediction mode and the value of the second intra prediction mode can be derived, and a transformation kernel can be determined based on the index.

[0233] For example, an index indicating the difference between the value of the first intra-prediction mode and the value of the second intra-prediction mode can be determined as in Equation 1.

[0234]

[0235]

[0236] In Equation 1, $ipm_{diff}$ represents the difference between the value of the first intra prediction mode and the value of the second intra prediction mode, and $ipm_{1st}$ and $ipm_{2nd}$ represent the values of the first intra prediction mode and the second intra prediction mode for determining the transformation kernel, respectively. In addition, $ipm_{diff,idx}$ represents an index indicating the difference between the value of the first intra prediction mode and the value of the second intra prediction mode.

[0237] Meanwhile, the value of the prediction mode may refer to a predetermined index indicating the corresponding prediction mode. For example, the value of the prediction mode may refer to the mode number of the corresponding prediction mode. In this case, if the first intra prediction mode is mode 11 and the second intra prediction mode is mode 24, $ipm_{diff}$ is calculated as 13, the absolute value of the difference between the two mode numbers, and $ipm_{diff,idx}$ can be determined to be 1.

[0238] Meanwhile, Equation 1 is merely one example, and the number of intervals, the number of indices, and the threshold values for determining the intervals can be determined in various ways.

[0239] The $ipm_{diff,idx}$ value determined in this way can be used as an index to determine the transformation kernel.

[0240] For example, the transformation kernel for the transformation / inverse transformation can be determined as shown in Equation 2.

[0241]

[0242]

[0243] In Equation 2, $kernel_{selected}$ refers to the transformation kernel selected for the transformation / inverse transformation, and LUT refers to a lookup table to which multiple transformation kernels are mapped. Also, $size_{idx}$ is an index corresponding to the size of the block to be transformed, and can take different index values depending on block sizes such as 4x4, 8x8, or 16x16. In addition, $ipm_{diff,idx}$ represents an index indicating the difference between the value of the first intra prediction mode and the value of the second intra prediction mode.

[0244] Accordingly, referring to Equation 2, the transformation kernel can be determined based on the size of the transformation target block and on an index representing the difference between the value of the first intra prediction mode and the value of the second intra prediction mode.
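The two-step selection described for Equations 1 and 2 can be sketched as below. The interval boundaries in `DIFF_THRESHOLDS`, the table contents, and the kernel names are assumptions: the boundaries were chosen only so that the worked example in the text (modes 11 and 24 giving an index of 1) holds, and they are not the thresholds of Equation 1.

```python
import bisect

# Assumed interval boundaries for the mode difference (not from Equation 1).
DIFF_THRESHOLDS = [8, 16, 32]


def ipm_diff_idx(ipm_1st, ipm_2nd):
    """Map the absolute mode difference to an interval index."""
    ipm_diff = abs(ipm_1st - ipm_2nd)
    return bisect.bisect_right(DIFF_THRESHOLDS, ipm_diff)


def size_idx(width, height):
    """Map 4x4, 8x8, 16x16 to 0, 1, 2 (assumed; other sizes raise KeyError)."""
    return {(4, 4): 0, (8, 8): 1, (16, 16): 2}[(width, height)]


# LUT[size_idx][ipm_diff_idx] with placeholder kernel names.
LUT = [
    ["K0", "K1", "K2", "K3"],
    ["K0", "K2", "K3", "K1"],
    ["K1", "K0", "K3", "K2"],
]


def kernel_selected(width, height, ipm_1st, ipm_2nd):
    """Look up the kernel from the block size and the mode-difference index."""
    return LUT[size_idx(width, height)][ipm_diff_idx(ipm_1st, ipm_2nd)]
```

Since both inputs are available at the decoder, the same lookup can be repeated there without additional signaling.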

[0245] Meanwhile, the method for determining a transformation kernel for transformation / inverse transformation according to the aforementioned embodiments can be performed identically in the encoder and the decoder.

[0246]

[0247] Once a transformation kernel for transformation / inverse transformation is determined, the encoder can perform the transformation based on the corresponding transformation kernel, and the decoder can perform the inverse transformation based on the corresponding transformation kernel.

[0248] Specifically, in the encoder, a prediction for the current block can be performed to derive a prediction block, and a residual block can be derived based on the difference between the current block and the prediction block. Then, when a transformation kernel for transforming the residual block is determined according to the aforementioned embodiments, a transformation for the residual block can be performed based on the transformation kernel. The transformation coefficients obtained by performing the transformation can be quantized and encoded into a bitstream.

[0249] In the decoder, transformation coefficients for the current block can be obtained from the bitstream. Then, when a transformation kernel is determined according to the embodiments described above, an inverse transformation of the transformation coefficients can be performed based on the transformation kernel.
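The residual, transformation, and inverse transformation steps can be illustrated with the following sketch. It uses an orthonormal DCT-II matrix purely as a stand-in kernel, assumes a square block, and omits quantization and entropy coding; the sample values are made up for the example.

```python
import math


def dct2_matrix(n):
    """Orthonormal DCT-II basis, used here only as a stand-in kernel."""
    rows = []
    for k in range(n):
        scale = math.sqrt((1 if k == 0 else 2) / n)
        rows.append(
            [scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n)]
        )
    return rows


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(r) for r in zip(*m)]


def forward_transform(residual, kernel):
    # Separable transform applied to columns and rows: C = K * R * K^T
    return matmul(matmul(kernel, residual), transpose(kernel))


def inverse_transform(coeffs, kernel):
    # For an orthonormal kernel the inverse is R = K^T * C * K
    return matmul(matmul(transpose(kernel), coeffs), kernel)


current = [[52, 55, 61, 66], [70, 61, 64, 73], [63, 59, 55, 90], [67, 61, 68, 104]]
prediction = [[50, 50, 60, 60] for _ in range(4)]
residual = [[c - p for c, p in zip(cr, pr)] for cr, pr in zip(current, prediction)]

kernel = dct2_matrix(4)
coeffs = forward_transform(residual, kernel)        # encoder side
reconstructed = inverse_transform(coeffs, kernel)   # decoder side
```

Without quantization the inverse transformation recovers the residual up to floating-point error; in an actual codec the quantization step between the two makes the reconstruction approximate.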

[0250]

[0251] FIG. 6 is a flowchart illustrating a decoding method according to one embodiment of the present disclosure. The decoding method of FIG. 6 can be performed by an image decoding device.

[0252] The video decoder can obtain transformation coefficients for the current block from the bitstream (S600).

[0253] And, the video decoder can perform a prediction on the current block to derive a prediction block (S610).

[0254] The step of deriving the prediction block is performed based on a neural-network based intra prediction mode, and the neural-network based intra prediction mode may use a neural network model that takes adjacent reference samples of the current block as input.

[0255] And, the image decoder can generate a histogram of gradient (HoG) based on the prediction block (S620).

[0256] Meanwhile, the gradient histogram is generated by accumulating the gradients of the pixels included in the prediction block, and the gradients of the pixels included in the prediction block can be obtained by applying a boundary detection filter to the pixels included in the prediction block.

[0257] Meanwhile, the size of the boundary detection filter can be determined based on the size of the current block.

[0258] Meanwhile, the gradients of the pixels included in the prediction block can be accumulated in the gradient histogram based on the positions of the pixels included in the prediction block.

[0259] And, the image decoding device can derive a first intra prediction mode and a second intra prediction mode for determining a transformation kernel based on the gradient histogram (S630).

[0260] Meanwhile, the first intra prediction mode and the second intra prediction mode can be derived as the intra prediction modes corresponding to the top two directions in order of largest amplitude in the gradient histogram.

[0261] And, the video decoder can determine a transformation kernel for the transformation coefficients based on the first intra prediction mode and the second intra prediction mode (S640).

[0262] Meanwhile, the step of determining a transformation kernel for the transformation coefficient further includes the step of deriving an index representing the difference between the value of the first intra prediction mode and the value of the second intra prediction mode, and the transformation kernel for the transformation coefficient may be determined as one of a plurality of transformation kernels based on the index.

[0263] Meanwhile, the transformation kernel for the above transformation coefficient may be a separable transformation kernel.

[0264] Meanwhile, the transformation kernel for the above transformation coefficient can be determined as one of a plurality of transformation kernels based on the size of the current block.

[0265] And, the video decoder can perform an inverse transformation of the transformation coefficients based on a transformation kernel for the transformation coefficients (S650).

[0266] Meanwhile, the method may further include a step of sub-sampling the prediction block, and the gradient histogram may be generated by accumulating the gradients of the pixels included in the sub-sampled prediction block.

[0267]

[0268] FIG. 7 is a flowchart illustrating an encoding method according to one embodiment of the present disclosure. The encoding method of FIG. 7 can be performed by an image encoding device.

[0269] First, the video encoding device can perform a prediction on the current block to derive a prediction block (S700).

[0270] Meanwhile, the step of deriving the prediction block is performed based on a neural-network based intra prediction mode, and the neural-network based intra prediction mode may use a neural network model that takes adjacent reference samples of the current block as input.

[0271] And, the video encoding device can derive a residual block of the current block based on the prediction block (S710).

[0272] And, the video encoding device can generate a histogram of gradient (HoG) based on the prediction block (S720).

[0273] The above gradient histogram is generated by accumulating the gradients of the pixels included in the prediction block, and the gradients of the pixels included in the prediction block can be obtained by applying a boundary detection filter to the pixels included in the prediction block.

[0274] Meanwhile, the size of the boundary detection filter can be determined based on the size of the current block.

[0275] Meanwhile, the gradients of the pixels included in the prediction block can be accumulated in the gradient histogram based on the positions of the pixels included in the prediction block.

[0276] And, the video encoding device can derive a first intra prediction mode and a second intra prediction mode for determining a transform kernel based on the gradient histogram (S730).

[0277] Meanwhile, the first intra prediction mode and the second intra prediction mode can be derived as the intra prediction modes corresponding to the top two directions in order of largest amplitude in the gradient histogram.

[0278] And, the video encoding device can determine a transformation kernel for the residual block based on the first intra prediction mode and the second intra prediction mode (S740).

[0279] Meanwhile, the step of determining a transformation kernel for the above residual block further includes the step of deriving an index representing the difference between the value of the first intra prediction mode and the value of the second intra prediction mode, and the transformation kernel for the above residual block may be determined as one of a plurality of transformation kernels based on the index.

[0280] Meanwhile, the transformation kernel for the above residual block may be a separable transformation kernel.

[0281] Meanwhile, the transformation kernel for the above residual block can be determined as one of a plurality of transformation kernels based on the size of the current block.

[0282] And, the video encoding device can perform a transformation of the residual block based on a transformation kernel for the residual block (S750).

[0283] Meanwhile, the method may further include a step of sub-sampling the prediction block, and the gradient histogram may be generated by accumulating the gradients of the pixels included in the sub-sampled prediction block.

[0284] Meanwhile, a bitstream can be generated by a video encoding method including the steps described in FIG. 7. The bitstream can be stored on a non-transitory computer-readable recording medium and can also be transmitted (or streamed).

[0285]

[0286] FIG. 8 is a drawing illustrating an exemplary content streaming system to which an embodiment according to the present disclosure can be applied.

[0287] As illustrated in FIG. 8, a content streaming system to which an embodiment of the present disclosure is applied may largely include an encoding server, a streaming server, a web server, a media storage, a user device, and a multimedia input device.

[0288] The encoding server described above compresses content input from multimedia input devices, such as smartphones, cameras, and CCTVs, into digital data to generate a bitstream and transmits it to the streaming server. As another example, if multimedia input devices, such as smartphones, cameras, and CCTVs, generate the bitstream directly, the encoding server may be omitted.

[0289] The bitstream may be generated by a video encoding method and / or video encoding device to which an embodiment of the present disclosure is applied, and the streaming server may temporarily store the bitstream during the process of transmitting or receiving the bitstream.

[0290] The streaming server transmits multimedia data to a user device based on a user request made through the web server, and the web server acts as an intermediary that informs the user of the available services. When a user requests a desired service from the web server, the web server forwards the request to the streaming server, and the streaming server transmits multimedia data to the user. The content streaming system may also include a separate control server, in which case the control server controls commands and responses between the devices within the content streaming system.

[0291] The streaming server can receive content from a media storage and / or an encoding server. For example, when receiving content from the encoding server, the content can be received in real time. In this case, to provide a seamless streaming service, the streaming server can store the bitstream for a certain period of time.
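
The temporary storage described here could, for example, take the form of a time-bounded buffer. The sketch below is only one possible arrangement: the class name `BitstreamBuffer`, the retention window expressed in seconds, and the chunk-based interface are assumptions, not details from the disclosure.

```python
from collections import deque
from typing import Deque, List, Tuple


class BitstreamBuffer:
    """Keeps bitstream chunks for a fixed retention window (hypothetical design)."""

    def __init__(self, retention_seconds: float) -> None:
        self.retention_seconds = retention_seconds
        self._chunks: Deque[Tuple[float, bytes]] = deque()

    def push(self, timestamp: float, chunk: bytes) -> None:
        """Store a chunk received at `timestamp` and drop expired ones."""
        self._chunks.append((timestamp, chunk))
        self._evict(timestamp)

    def _evict(self, now: float) -> None:
        # Chunks arrive in time order, so expired ones are always at the front.
        while self._chunks and now - self._chunks[0][0] > self.retention_seconds:
            self._chunks.popleft()

    def snapshot(self) -> List[bytes]:
        """Chunks currently retained, oldest first."""
        return [chunk for _, chunk in self._chunks]
```

A buffer of this kind lets the streaming server absorb short gaps in real-time delivery from the encoding server without keeping the whole stream.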

[0292] Examples of the user device may include mobile phones, smartphones, laptop computers, digital broadcasting terminals, PDAs (personal digital assistants), PMPs (portable multimedia players), navigation systems, slate PCs, tablet PCs, ultrabooks, wearable devices (e.g., smartwatches, smart glasses, and HMDs (head-mounted displays)), digital TVs, desktop computers, and digital signage.

[0293] Each server within the above-mentioned content streaming system can be operated as a distributed server, and in this case, data received from each server can be processed in a distributed manner.

[0295] The above embodiments may be performed in the same or a corresponding way in the encoding device and the decoding device. Additionally, an image may be encoded / decoded using at least one of the above embodiments or a combination thereof.

[0296] The order in which the above embodiments are applied may differ between the encoding device and the decoding device. Alternatively, the order in which the above embodiments are applied may be the same in the encoding device and the decoding device.

[0297] The above embodiments may be performed separately for each of the luminance and chrominance signals. Alternatively, the above embodiments may be performed in the same way for the luminance and chrominance signals.

[0298] In the above embodiments, methods are described based on flowcharts as a series of steps or units; however, the present disclosure is not limited to the order of the steps, and some steps may occur in a different order from, or simultaneously with, other steps described above. Furthermore, those skilled in the art will understand that the steps shown in the flowcharts are not exclusive, that other steps may be included, and that one or more steps of the flowcharts may be omitted without affecting the scope of the present disclosure.

[0299] The above embodiments may be implemented in the form of program instructions that can be executed through various computer components and recorded on a computer-readable recording medium. The computer-readable recording medium may include program instructions, data files, data structures, etc., either alone or in combination. The program instructions recorded on the computer-readable recording medium may be those specifically designed and configured for the present disclosure, or they may be those known and available to those skilled in the art of computer software.

[0300] The bitstream generated by the encoding method according to the above embodiment may be stored in a non-transitory computer-readable recording medium. Additionally, the bitstream stored in the non-transitory computer-readable recording medium may be decoded by the decoding method according to the above embodiment.

[0301] Herein, examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tapes; optical recording media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specifically configured to store and execute program instructions such as ROM, RAM, and flash memory. Examples of program instructions include machine code, such as that generated by a compiler, as well as high-level language code that can be executed by a computer using an interpreter, etc. The hardware devices may be configured to operate as one or more software modules to perform processing according to the present disclosure, and vice versa.

[0302] Although the present disclosure has been described above with specific details such as specific components, limited embodiments, and drawings, these are provided only to aid in a more comprehensive understanding of the present disclosure. The present disclosure is not limited to the above embodiments, and a person skilled in the art to which the present disclosure belongs can make various modifications and variations from this description.

[0303] Accordingly, the scope of the present disclosure is not limited to the embodiments described above, and the claims set forth below, as well as all equivalents of or equivalent modifications to those claims, shall be considered to be within the scope of the present disclosure.

[0304] The present disclosure may be used in an apparatus for encoding / decoding images and a recording medium storing a bitstream.

Claims

1. An image decoding method comprising: a step of obtaining transformation coefficients for a current block from a bitstream; a step of deriving a prediction block by performing a prediction on the current block; a step of generating a Histogram of Gradient (HoG) based on the prediction block; a step of deriving, based on the gradient histogram, a first intra prediction mode and a second intra prediction mode for determining a transformation kernel; a step of determining a transformation kernel for the transformation coefficients based on the first intra prediction mode and the second intra prediction mode; and a step of performing an inverse transformation of the transformation coefficients based on the transformation kernel for the transformation coefficients.

2. The image decoding method of claim 1, wherein the step of determining the transformation kernel for the transformation coefficients further includes a step of deriving an index representing the difference between the value of the first intra prediction mode and the value of the second intra prediction mode, and the transformation kernel for the transformation coefficients is determined as one of a plurality of transformation kernels based on the index.

3. The image decoding method of claim 1, wherein the step of deriving the prediction block is performed based on a neural network-based intra prediction mode, and the neural network-based intra prediction mode uses a neural network model that takes adjacent reference samples of the current block as input.

4. The image decoding method of claim 1, wherein the first intra prediction mode and the second intra prediction mode are derived as the intra prediction modes corresponding to the two directions having the largest amplitudes in the gradient histogram.

5. The image decoding method of claim 1, wherein the gradient histogram is generated by accumulating the gradients of pixels included in the prediction block, and the gradients of the pixels included in the prediction block are obtained by applying a boundary detection filter to the pixels included in the prediction block.

6. The image decoding method of claim 5, wherein the size of the boundary detection filter is determined based on the size of the current block.

7. The image decoding method of claim 5, wherein the gradients of the pixels included in the prediction block are accumulated in the gradient histogram based on the positions of those pixels within the prediction block.

8. The image decoding method of claim 1, further including a step of sub-sampling the prediction block, wherein the gradient histogram is generated by accumulating the gradients of pixels included in the sub-sampled prediction block.

9. The image decoding method of claim 1, wherein the transformation kernel for the transformation coefficients is a separable transformation kernel.

10. The image decoding method of claim 1, wherein the transformation kernel for the transformation coefficients is determined as one of a plurality of transformation kernels based on the size of the current block.

11. An image encoding method comprising: a step of deriving a prediction block by performing a prediction on a current block; a step of deriving a residual block of the current block based on the prediction block; a step of generating a Histogram of Gradient (HoG) based on the prediction block; a step of deriving, based on the gradient histogram, a first intra prediction mode and a second intra prediction mode for determining a transformation kernel; a step of determining a transformation kernel for the residual block based on the first intra prediction mode and the second intra prediction mode; and a step of performing a transformation of the residual block based on the transformation kernel for the residual block.

12. A method of transmitting a bitstream generated by an image encoding method, the transmission method comprising a step of transmitting the bitstream, wherein the image encoding method comprises: a step of deriving a prediction block by performing a prediction on a current block; a step of deriving a residual block of the current block based on the prediction block; a step of generating a Histogram of Gradient (HoG) based on the prediction block; a step of deriving, based on the gradient histogram, a first intra prediction mode and a second intra prediction mode for determining a transformation kernel; a step of determining a transformation kernel for the residual block based on the first intra prediction mode and the second intra prediction mode; and a step of performing a transformation of the residual block based on the transformation kernel for the residual block.